Merge pull request #7 from phuse-org/feedback-pharmaverse
Make changes based on Pharmaverse council feedback This will close #4
epijim authored Feb 20, 2023
2 parents f89637c + 6d81732 commit 26f8d6f
Showing 4 changed files with 22 additions and 19 deletions.
6 changes: 3 additions & 3 deletions definitions.qmd
@@ -1,14 +1,14 @@
# Definitions {.unnumbered}

CLA: Contributor License Agreement. Has a similar purpose to a DCO (Developer Certificate of Origin).
CLA: Contributor licence Agreement. Has a similar purpose to a DCO (Developer Certificate of Origin).

CSR: Clinical Study Report

eCRF: electronic Case Report Form

GPL: GNU General Public License
GPL: GNU General Public Licence

MIT: Common acronym for a license released by the Massachusetts Institute of Technology
MIT: Common acronym for a licence released by the Massachusetts Institute of Technology

OS: Open-Source

14 changes: 7 additions & 7 deletions releasing.qmd
@@ -18,7 +18,7 @@ Competitive IP

Post-competitive IP

: Code that translates existing eCRF data into a CSR for submission. In the context of PhUSE collaborators, this will often be packages that take data and apply templated data steps and visualizations, like those seen in the [pharmaverse](https://pharmaverse.org/).
: A less common term we have defined to be where code collaboration improves the efficiency of insights, rather than the creation of insights that would otherwise not be possible. In the context of PhUSE collaborators, this includes packages that take CDISC data and apply templated data steps and visualizations to prepare a CSR, like those seen in the [pharmaverse](https://pharmaverse.org/).

## Preparing for release

@@ -30,7 +30,7 @@ As a general rule *arising IP* [@LawArising], that is IP generated as part of th

What are the differences between GitHub organizations that host packages like [phuse-org](https://github.com/phuse-org), [rinpharma](https://github.com/rinpharma), [ropensci](https://github.com/ropensci), [openpharma](https://github.com/openpharma), [pharmaverse](https://github.com/pharmaverse), [pharmar](https://github.com/pharmar), personal organisations, company owned organisations and organisations created to host a single project?

Ultimately, the license chosen has an impact on how a package can be used, rather than the location the code is shared from. The location though can influence how a project is perceived. If it is hosted on a GitHub organisation with the name of a pharma company, relative to a pan-company organisation, it may imply that the project is '*Company A's*' project rather than something they wish to co-create. As a general rule, the recommendation would be to place it in a company's organisation if you wish to remain control of the roadmap, but look to pan-company organisations if you wish to co-create and co-own the packages trajectory. Some examples are;
Ultimately, the licence chosen has an impact on how a package can be used, rather than the location the code is shared from. The location though can influence how a project is perceived. If it is hosted on a GitHub organisation with the name of a pharma company, relative to a pan-company organisation, it may imply that the project is '*Company A's*' project rather than something they wish to co-create. As a general rule, the recommendation would be to place it in a company's organisation if you wish to remain control of the roadmap, but look to pan-company organisations if you wish to co-create and co-own the packages trajectory. Some examples are;

- Personal Github orgs
- diffdf ([gowerc/diffdf](https://github.com/gowerc/diffdf)) and survival ([therneau/survival](https://github.com/therneau/survival)) are examples of two repositories used in pharma hosted in Github orgs belonging to a specific individual
@@ -52,21 +52,21 @@ If a package started its development on an internal git server, or a private rep

When discussing the open sourcing of a codebase, it is important to flag to internal counsel existing external projects, and the overlap of scope with the project you intend to release.

It is possible that decisions made before open sourcing could become a risk after open sourcing. As an example of a plausible scenario; a team need to implement a new function. This function exists in another GPL-3 copy left licenced project. To add that project would introduce multiple dependencies that aren't used by that particular function so a member of the team decides to copy the function into the package. One year later, the package is open sourced with the licence infringing code. Such an occurrence could be lessened by a Contributor License Agreement (CLA; see https://github.com/contributor-assistant/github-action for an example of CLA automation). A CLA helps ensure that anyone contributing to a project acknowledges specific terms expected of contributions, like the contributions are novel code and the author will abide by the projects license terms. In the absence of a CLA it is important to ensure that all code within the package is original, and there is no culture of cannibalising external code and infringing on people's copyright within the development team even for internal projects.
It is possible that decisions made before open sourcing could become a risk after open sourcing. As an example of a plausible scenario; a team need to implement a new function. This function exists in another GPL-3 copy left licenced project. To add that project would introduce multiple dependencies that aren't used by that particular function so a member of the team decides to copy the function into the package. One year later, the package is open sourced with the licence infringing code. Such an occurrence could be lessened by a Contributor Licence Agreement (CLA; see https://github.com/contributor-assistant/github-action for an example of CLA automation). A CLA helps ensure that anyone contributing to a project acknowledges specific terms expected of contributions, like the contributions are novel code and the author will abide by the projects licence terms. In the absence of a CLA it is important to ensure that all code within the package is original, and there is no culture of cannibalising external code and infringing on people's copyright within the development team even for internal projects.

## Reputational risks and supporting others

What are the expectations when I release a package? Are there risks to my company's brand having abandoned non-maintained packages?

In this guidance it is suggested to open-source early, yet doing so could expose projects that are not ready for use, might be cancelled before reaching v1.0 or are never successfully adopted. The ratio of failed to successful projects is an important consideration, but a skew in that ratio being a negative indicator can be mitigated if repositories are clear on what stage of the product life cycle they are at (https://lifecycle.r-lib.org/) and make use of tools to inform users if a project has been deprecated (https://docs.github.com/en/repositories/archiving-a-github-repository/archiving-repositories), or are looking for new maintainers to take over the project.

## Licenses: releasing a project
## Licences: releasing a project

Ultimately, the license used for a project would require in-house counsel guidance on what license is preferred.
Ultimately, the licence used for a project would require in-house counsel guidance on what licence is preferred.

All code open-sourced should have a license. The license has a standard location of being a text file called 'LICENSE' in the root of the project folder, or a markdown file called 'LICENSE.md'. Of particular note is that R packages often have the license specified in the R specific location of the DESCRIPTION file, or may have it in both the standard and R specific locations (in rare cases these can also contradict so it is important to read both).
All code open-sourced should have a licence. The licence has a standard location of being a text file called 'LICENSE' in the root of the project folder, or a markdown file called 'LICENSE.md'. Of particular note is that R packages often have the licence specified in the R specific location of the DESCRIPTION file, or may have it in both the standard and R specific locations (in rare cases these can also contradict so it is important to read both).

Generally, permissive licenses are more common in clinical reporting, with the majority of pharmaverse R packages using an MIT (https://choosealicense.com/licenses/mit/) or Apache 2.0 (https://choosealicense.com/licenses/apache-2.0/) license (add ref). These licences allow distribution, commercial use and modification. One primary difference between MIT and Apache 2.0 is that the latter has patent protection language and rules around trademark usage, and may be preferred in larger projects due to its focus on more explicitly spelling out the terms.
Generally, permissive licences are more common in clinical reporting, with the majority of pharmaverse R packages using an MIT (https://choosealicense.com/licenses/mit/) or Apache 2.0 (https://choosealicense.com/licenses/apache-2.0/) license (add ref). These licences allow distribution, commercial use and modification. One primary difference between MIT and Apache 2.0 is that the latter has patent protection language and rules around trademark usage, and may be preferred in larger projects due to its focus on more explicitly spelling out the terms.

As a general guidance, if the purpose of the project is to let future contributors freely use the code, MIT license is a concise permissive license to adopt. In the pharmaceutical industry, however, the patent of the code is often of concern in a post-competitive environment across companies, and thus an Apache 2.0 license could be more suitable. On the other hand, the copyleft license (e.g. GPLv2, GPLv3) demands any downstream derivatives to follow the same copyleft license of the source project and generally should be avoided. Sometimes, a company's legal team might come up with their own license that is not listed as one of the approved open-source licenses. It is highly recommended to only use standard open-source licenses, as these are verified by the Open-Source Initiative, so others can easily understand the governance model of your project.

17 changes: 10 additions & 7 deletions using.qmd
@@ -14,7 +14,10 @@ Some sites like openpharma.pharmaverse.org (specific to R and python packages in

## How active are the community behind a project? 

Projects can go through lifecycles, so activity on a repo could have a variety of positive or negative implications. A project could have almost no active community in terms of recent contributions or response to issues, much like the R package survival (<https://github.com/therneau/survival>), yet be a stable and critical package in R installations. Alternatively, a lack of activity could indicate a package is abandoned or deprecated. CROSS LINK LUFECYCLES
The activity on a project does not tell you the quality and extent of use of a project. Two examples are:

- A project could have almost no active community in terms of recent contributions or response to issues, much like the R package survival (<https://github.com/therneau/survival>), yet be a stable and critical package in R installations.
- A project could also have no activity as it has been abandoned after or before it reached v1.0.

The community behind a project is also not limited to the people that contribute code. Users can also engage with a project via giving feedback via mechanisms like GitHub issues, emailing authors or engaging in discussions on GitHub issues. @fig-teal is an example of an issue page for the teal R package. The figure shows that teal has 24 open issues, and 266 closed issues. Small speech bubbles on the right of the figure show discussion have occurred on some issues. By looking through the issues, subjective impressions on community health can be made, for instance whether it's a few people giving feedback and one person developing, does it have stale issues no-one replies to, or does it have a lively community engaged in discussion and coordination.

@@ -50,7 +53,7 @@ Numerous methods exist to find projects. Specific to R projects, the following s

Using R packages as an example, if your analysis plan requires creating a Kaplan Meier plot, you could implement this using open code you program using R base plotting functions. Alternatively, you could introduce a dependency on a package that provides that functionality as a parameterised function, like [survminer](https://github.com/kassambara/survminer/), [visR](https://github.com/openpharma/visR/) or [tern](https://github.com/insightsengineering/tern/). Occasionally an existing package may be missing a feature you want, as can be derived from the presence of at least 3 R packages with a Kaplan Meier plotting function. In such cases, you may need to extend, or start a new package.   

In the case of wanting to change default behaviour of package beyond what is possible in the current function, the user has several options if an open-source license is specified -- ranging from extending the package function to meet your needs, through to initiating a new package. It can be difficult to decide whether to extend an existing package, or whether it may be worth starting a new one, some resources to help understand how to contribute to a new package: 
In the case of wanting to change default behaviour of package beyond what is possible in the current function, the user has several options if an open-source licence is specified -- ranging from extending the package function to meet your needs, through to initiating a new package. It can be difficult to decide whether to extend an existing package, or whether it may be worth starting a new one, some resources to help understand how to contribute to a new package: 

- A blog post by Jim Hester on contributing to the tidyverse: <https://www.tidyverse.org/blog/2017/08/contributing/> 

@@ -66,19 +69,19 @@ Risk can come from several domains including;

- Accuracy, the package does not correctly reference what it does, or implements it incorrectly.  

The [R validation hub](https://pharmar.org) is a pan-pharma organisation, that aims to coordinate between pharma companies how the validation (and by extension risk) in R packages is undertaken and documented. Of particular relevance is the [Case Studies repository](https://github.com/pharmaR/case_studies), which contains examples from Roche, Merck and Novartis (as of July 2022) on how they approach validation and risk mitigation. The R Validation Hub is also continuing work on the [Risk Assessment App](https://github.com/pharmaR/risk_assessment), which aims to provide an application that will surface metrics to a user to help evaluate an R package. 
The [R validation hub](https://pharmar.org) is a pan-pharma organisation, that aims to coordinate between pharma companies how the validation (and by extension risk) in R packages is undertaken and documented. Of particular relevance is the [Case Studies repository](https://github.com/pharmaR/case_studies), which contains examples from Roche, Merck and Novartis (as of July 2022) on how they approach validation and risk mitigation. The R Validation Hub also created [`riskmetric`](https://www.pharmar.org/risk/) as a tool to extract metrics relevant to validation, and is continuing work on the [Risk Assessment App](https://github.com/pharmaR/risk_assessment), which aims to provide an application that will surface these metrics to a user to help evaluate an R package. 

Roche has also open sourced a github-action called [thevalidatoR, which is available on Github Marketplace](https://github.com/marketplace/actions/r-package-validation-report), which will generate a PDF with the unit testing results, as well as a traceability of matrix of documentation to tested functionality against a standard rocker R container.

## Licenses: using a project 
## Licences: using a project 

The licence of projects you depend on, particularly if you incorporate the source code into your compiled/shared product, can have drastic effects on what you can do with your project. It is always important to seek in-house counsel advice on your companies position on different license types.  
The licence of projects you depend on, particularly if you incorporate the source code into your compiled/shared product, can have drastic effects on what you can do with your project. It is always important to seek in-house counsel advice on your companies position on different licence types.  

As a general guidance: 

- There are permissive licenses that allow people to use a project in almost any way, through to copy-left licenses that prevent distributing and, in some cases, monetizing any project that incorporates the dependency into its codebase.  
- There are permissive licences that allow people to use a project in almost any way, through to copy-left licences that prevent distributing and, in some cases, monetizing any project that incorporates the dependency into its codebase.  

```{=html}
<!-- -->
```
- Two key resources to understand license types are <https://choosealicense.com/> and <https://opensource.org/licenses>.  
- Two key resources to understand licence types are <https://choosealicense.com/> and <https://opensource.org/licenses>.  