diff --git a/definitions.qmd b/definitions.qmd index 885b6bc..b6ad891 100644 --- a/definitions.qmd +++ b/definitions.qmd @@ -1,14 +1,14 @@ # Definitions {.unnumbered} -CLA: Contributor License Agreement. Has a similar purpose to a DCO (Developer Certificate of Origin). +CLA: Contributor Licence Agreement. Has a similar purpose to a DCO (Developer Certificate of Origin). CSR: Clinical Study Report eCRF: electronic Case Report Form -GPL: GNU General Public License +GPL: GNU General Public License -MIT: Common acronym for a license released by the Massachusetts Institute of Technology +MIT: Common short name for a licence that originated at the Massachusetts Institute of Technology OS: Open-Source diff --git a/releasing.qmd b/releasing.qmd index 5eed2c3..42e1cf1 100644 --- a/releasing.qmd +++ b/releasing.qmd @@ -18,7 +18,7 @@ Competitive IP Post-competitive IP -: Code that translates existing eCRF data into a CSR for submission. In the context of PhUSE collaborators, this will often be packages that take data and apply templated data steps and visualizations, like those seen in the [pharmaverse](https://pharmaverse.org/). +: A less common term, which we define as code collaboration that improves the efficiency of generating insights, rather than enabling insights that would otherwise not be possible. In the context of PhUSE collaborators, this includes packages that take CDISC data and apply templated data steps and visualizations to prepare a CSR, like those seen in the [pharmaverse](https://pharmaverse.org/). 
## Preparing for release @@ -30,7 +30,7 @@ As a general rule *arising IP* [@LawArising], that is IP generated as part of th What are the differences between GitHub organizations that host packages like [phuse-org](https://github.com/phuse-org), [rinpharma](https://github.com/rinpharma), [ropensci](https://github.com/ropensci), [openpharma](https://github.com/openpharma), [pharmaverse](https://github.com/pharmaverse), [pharmar](https://github.com/pharmar), personal organisations, company owned organisations and organisations created to host a single project? -Ultimately, the license chosen has an impact on how a package can be used, rather than the location the code is shared from. The location though can influence how a project is perceived. If it is hosted on a GitHub organisation with the name of a pharma company, relative to a pan-company organisation, it may imply that the project is '*Company A's*' project rather than something they wish to co-create. As a general rule, the recommendation would be to place it in a company's organisation if you wish to remain control of the roadmap, but look to pan-company organisations if you wish to co-create and co-own the packages trajectory. Some examples are; +Ultimately, it is the licence chosen, rather than the location the code is shared from, that determines how a package can be used. The location can, however, influence how a project is perceived. If it is hosted in a GitHub organisation carrying the name of a pharma company, rather than in a pan-company organisation, it may imply that the project is '*Company A's*' project rather than something they wish to co-create. As a general rule, place the project in a company's organisation if you wish to retain control of the roadmap, but look to pan-company organisations if you wish to co-create and co-own the package's trajectory. 
Some examples are: - Personal Github orgs - diffdf ([gowerc/diffdf](https://github.com/gowerc/diffdf)) and survival ([therneau/survival](https://github.com/therneau/survival)) are examples of two repositories used in pharma hosted in Github orgs belonging to a specific individual @@ -52,7 +52,7 @@ If a package started its development on an internal git server, or a private rep When discussing the open sourcing of a codebase, it is important to flag to internal counsel existing external projects, and the overlap of scope with the project you intend to release. -It is possible that decisions made before open sourcing could become a risk after open sourcing. As an example of a plausible scenario; a team need to implement a new function. This function exists in another GPL-3 copy left licenced project. To add that project would introduce multiple dependencies that aren't used by that particular function so a member of the team decides to copy the function into the package. One year later, the package is open sourced with the licence infringing code. Such an occurrence could be lessened by a Contributor License Agreement (CLA; see https://github.com/contributor-assistant/github-action for an example of CLA automation). A CLA helps ensure that anyone contributing to a project acknowledges specific terms expected of contributions, like the contributions are novel code and the author will abide by the projects license terms. In the absence of a CLA it is important to ensure that all code within the package is original, and there is no culture of cannibalising external code and infringing on people's copyright within the development team even for internal projects. +It is possible that decisions made before open sourcing could become a risk after open sourcing. As an example of a plausible scenario, a team needs to implement a new function that already exists in another project released under the GPL-3, a copyleft licence. 
Adding that project as a dependency would pull in multiple packages that the function does not use, so a member of the team copies the function into the package instead. One year later, the package is open sourced with the licence-infringing code. The risk of such an occurrence can be reduced by a Contributor Licence Agreement (CLA; see https://github.com/contributor-assistant/github-action for an example of CLA automation). A CLA helps ensure that anyone contributing to a project acknowledges specific terms expected of contributions, such as that the contributions are original code and that the author will abide by the project's licence terms. In the absence of a CLA, it is important to ensure that all code within the package is original, and that the development team has no culture of cannibalising external code and infringing on people's copyright, even for internal projects. ## Reputational risks and supporting others @@ -60,13 +60,13 @@ What are the expectations when I release a package? Are there risks to my compan In this guidance it is suggested to open-source early, yet doing so could expose projects that are not ready for use, might be cancelled before reaching v1.0 or are never successfully adopted. The ratio of failed to successful projects is an important consideration, but a skew in that ratio being a negative indicator can be mitigated if repositories are clear on what stage of the product life cycle they are at (https://lifecycle.r-lib.org/) and make use of tools to inform users if a project has been deprecated (https://docs.github.com/en/repositories/archiving-a-github-repository/archiving-repositories), or are looking for new maintainers to take over the project. -## Licenses: releasing a project +## Licences: releasing a project -Ultimately, the license used for a project would require in-house counsel guidance on what license is preferred. +Ultimately, the choice of licence for a project requires guidance from in-house counsel on which licence is preferred. 
-All code open-sourced should have a license. The license has a standard location of being a text file called 'LICENSE' in the root of the project folder, or a markdown file called 'LICENSE.md'. Of particular note is that R packages often have the license specified in the R specific location of the DESCRIPTION file, or may have it in both the standard and R specific locations (in rare cases these can also contradict so it is important to read both). +All open-sourced code should have a licence. By convention, the licence is a text file called 'LICENSE' in the root of the project folder, or a markdown file called 'LICENSE.md'. Note that R packages often specify the licence in the R-specific `License` field of the DESCRIPTION file, or in both the standard and R-specific locations (in rare cases these contradict each other, so it is important to read both). -Generally, permissive licenses are more common in clinical reporting, with the majority of pharmaverse R packages using an MIT (https://choosealicense.com/licenses/mit/) or Apache 2.0 (https://choosealicense.com/licenses/apache-2.0/) license (add ref). These licences allow distribution, commercial use and modification. One primary difference between MIT and Apache 2.0 is that the latter has patent protection language and rules around trademark usage, and may be preferred in larger projects due to its focus on more explicitly spelling out the terms. +Generally, permissive licences are more common in clinical reporting, with the majority of pharmaverse R packages using an MIT (https://choosealicense.com/licenses/mit/) or Apache 2.0 (https://choosealicense.com/licenses/apache-2.0/) licence (add ref). These licences allow distribution, commercial use and modification. 
One primary difference between MIT and Apache 2.0 is that the latter has patent protection language and rules around trademark usage, and may be preferred in larger projects because it spells out its terms more explicitly. As a general guidance, if the purpose of the project is to let future contributors freely use the code, MIT license is a concise permissive license to adopt. In the pharmaceutical industry, however, the patent of the code is often of concern in a post-competitive environment across companies, and thus an Apache 2.0 license could be more suitable. On the other hand, the copyleft license (e.g. GPLv2, GPLv3) demands any downstream derivatives to follow the same copyleft license of the source project and generally should be avoided. Sometimes, a company's legal team might come up with their own license that is not listed as one of the approved open-source licenses. It is highly recommended to only use standard open-source licenses, as these are verified by the Open-Source Initiative, so others can easily understand the governance model of your project. diff --git a/using.qmd b/using.qmd index 426641b..5b703ed 100644 --- a/using.qmd +++ b/using.qmd @@ -14,7 +14,10 @@ Some sites like openpharma.pharmaverse.org (specific to R and python packages in ## How active are the community behind a project?  -Projects can go through lifecycles, so activity on a repo could have a variety of positive or negative implications. A project could have almost no active community in terms of recent contributions or response to issues, much like the R package survival (), yet be a stable and critical package in R installations. Alternatively, a lack of activity could indicate a package is abandoned or deprecated. CROSS LINK LUFECYCLES +Activity on a project does not, on its own, tell you about the project's quality or how widely it is used. 
Two examples are: + +- A project could have almost no active community in terms of recent contributions or responses to issues, much like the R package survival (), yet be a stable and critical package in R installations. +- A project could also have no activity because it has been abandoned, either before or after it reached v1.0. The community behind a project is also not limited to the people that contribute code. Users can also engage with a project via giving feedback via mechanisms like GitHub issues, emailing authors or engaging in discussions on GitHub issues. @fig-teal is an example of an issue page for the teal R package. The figure shows that teal has 24 open issues, and 266 closed issues. Small speech bubbles on the right of the figure show discussion have occurred on some issues. By looking through the issues, subjective impressions on community health can be made, for instance whether it's a few people giving feedback and one person developing, does it have stale issues no-one replies to, or does it have a lively community engaged in discussion and coordination. @@ -50,7 +53,7 @@ Numerous methods exist to find projects. Specific to R projects, the following s Using R packages as an example, if your analysis plan requires creating a Kaplan Meier plot, you could implement this using open code you program using R base plotting functions. Alternatively, you could introduce a dependency on a package that provides that functionality as a parameterised function, like [survminer](https://github.com/kassambara/survminer/), [visR](https://github.com/openpharma/visR/) or [tern](https://github.com/insightsengineering/tern/). Occasionally an existing package may be missing a feature you want, as can be derived from the presence of at least 3 R packages with a Kaplan Meier plotting function. In such cases, you may need to extend, or start a new package.    
-In the case of wanting to change default behaviour of package beyond what is possible in the current function, the user has several options if an open-source license is specified -- ranging from extending the package function to meet your needs, through to initiating a new package. It can be difficult to decide whether to extend an existing package, or whether it may be worth starting a new one, some resources to help understand how to contribute to a new package:  +If you want to change the default behaviour of a package beyond what its current functions allow, and the package has an open-source licence, you have several options, ranging from extending the package function to meet your needs through to starting a new package. It can be difficult to decide whether to extend an existing package or start a new one. Some resources to help understand how to contribute to an existing package: - A blog post by Jim Hester on contributing to the tidyverse:   @@ -66,19 +69,19 @@ Risk can come from several domains including;   - Accuracy, the package does not correctly reference what it does, or implements it incorrectly.   -The [R validation hub](https://pharmar.org) is a pan-pharma organisation, that aims to coordinate between pharma companies how the validation (and by extension risk) in R packages is undertaken and documented. Of particular relevance is the [Case Studies repository](https://github.com/pharmaR/case_studies), which contains examples from Roche, Merck and Novartis (as of July 2022) on how they approach validation and risk mitigation. The R Validation Hub is also continuing work on the [Risk Assessment App](https://github.com/pharmaR/risk_assessment), which aims to provide an application that will surface metrics to a user to help evaluate an R package.  
+The [R Validation Hub](https://pharmar.org) is a pan-pharma organisation that aims to coordinate how pharma companies undertake and document the validation of R packages (and, by extension, the assessment of their risk). Of particular relevance is the [Case Studies repository](https://github.com/pharmaR/case_studies), which contains examples from Roche, Merck and Novartis (as of July 2022) on how they approach validation and risk mitigation. The R Validation Hub also created [`riskmetric`](https://www.pharmar.org/risk/) as a tool to extract metrics relevant to validation, and is continuing work on the [Risk Assessment App](https://github.com/pharmaR/risk_assessment), which aims to provide an application that will surface these metrics to a user to help evaluate an R package.  Roche has also open sourced a github-action called [thevalidatoR, which is available on Github Marketplace](https://github.com/marketplace/actions/r-package-validation-report), which will generate a PDF with the unit testing results, as well as a traceability of matrix of documentation to tested functionality against a standard rocker R container. -## Licenses: using a project  +## Licences: using a project  -The licence of projects you depend on, particularly if you incorporate the source code into your compiled/shared product, can have drastic effects on what you can do with your project. It is always important to seek in-house counsel advice on your companies position on different license types.   +The licence of projects you depend on, particularly if you incorporate the source code into your compiled/shared product, can have drastic effects on what you can do with your project. It is always important to seek advice from in-house counsel on your company's position on different licence types.   
As a general guidance:  -- There are permissive licenses that allow people to use a project in almost any way, through to copy-left licenses that prevent distributing and, in some cases, monetizing any project that incorporates the dependency into its codebase.   +- There are permissive licences that allow people to use a project in almost any way, through to copyleft licences that place conditions on distributing any project that incorporates the dependency into its codebase, typically requiring that the combined work is released under the same licence.   ```{=html} ``` -- Two key resources to understand licence types are and .   +- Two key resources to understand licence types are and .   diff --git a/why.qmd b/why.qmd index 08452c1..0ed3676 100644 --- a/why.qmd +++ b/why.qmd @@ -1,5 +1,5 @@ # Open source: the what and why -'Open Source' software is software covered by a license that legally allows access and inspection of the software's source code. The many varieties of open-source licenses determine what you can then do with the software's source code, i.e. copy, modify, contribute or redistribute. Being able to view and then do something with source code wasn't always so. The term 'open source' has been in use at least since the 1990's [@Peterson2018] and the principles behind the term pre-date computer software. Thus, as long as there has been source code there have been efforts to make it 'open source'. As computing systems became widely adopted in universities and beyond, so to the value of freely accessing the source code of the software they ran became apparent. This effort was described as making software 'free' by Richard Stallman and formalised by the creation of the Free Software Foundation in 1985, including the creation of a legally enforceable licenses (the GNU Public License) to enshrine source code as 'free', that is, having the free-dom to access. 
Although this effort was the genesis of today's open-source communities, many people mistakenly understood 'free' to mean gratis, which was incorrect: most open-source licences allow the software to be sold for a fee [@OSI]. As is the case, even if the main goal of open source is not creating software gratis, it so happens that the majority of open-source software is made available at no cost. Regardless of whether it is sold for a fee or not, the term 'open source' is the preferred term by most with respect to software with a license that allows access to the source code. +'Open Source' software is software covered by a licence that legally allows access to, and inspection of, the software's source code. The many varieties of open-source licences determine what you can then do with the software's source code, i.e. copy, modify, contribute or redistribute. Being able to view, and then do something with, source code has not always been possible. The term 'open source' has been in use at least since the 1990s [@Peterson2018], and the principles behind the term pre-date computer software. Thus, for as long as there has been source code, there have been efforts to make it 'open source'. As computing systems became widely adopted in universities and beyond, the value of freely accessing the source code of the software they ran also became apparent. Richard Stallman described this effort as making software 'free', and it was formalised by the creation of the Free Software Foundation in 1985, including a legally enforceable licence (the GNU General Public License) to enshrine source code as 'free', that is, having the freedom to access it. Although this effort was the genesis of today's open-source communities, many people mistakenly understood 'free' to mean gratis, which was incorrect: most open-source licences allow the software to be sold for a fee [@OSI]. 
In practice, even though the main goal of open source is not to create gratis software, the majority of open-source software is made available at no cost. Whether or not it is sold for a fee, 'open source' is the term most people prefer for software with a licence that allows access to the source code. -Readers coming from the pharmaceutical industry probably perceive a contradiction here: how can software which is typically gratis to use, have any intrinsic value to either business or private users? Fair enough: this industry depends on capital investment which then depends on retaining the details of their drugs and production secret. The difference lies in the utility of (some) software, versus, in this example, a drug or therapy. Certain categories of software enable the creation of new value. Obvious examples being programming languages enabling creation of specialized applications which can support a specific business process, e.g. C, Python, R and many others. The ability to use and improve these open source languages freely accelerate in multiple dimensions the ability to create business value, e.g. specialized smart phone apps that offers a service to end-users. Imagine if you have an idea for a smart phone app, but before you can write a line of code, you need to buy a license to install that language. And after investing the time and money to access this language you realise it doesn't work as well as you need for your particular app. Or worse yet, it has a bug which renders it unfit for your purpose. Little chance you can resolve this quickly. Open-source software does not have these restrictions so you can focus all your resources on end-user value, not the tools needed for creation. The drugs and therapies manufactured by the pharmaceutical industry are the equivalent of a smart phone app: they provide end-user value. 
It’s sound business logic to open source the tools used to create these products: remove the restrictions to creating drugs and enable each company to sharpen their focus on developing and delivering them. +Readers coming from the pharmaceutical industry may perceive a contradiction here: how can software which is typically gratis to use have any intrinsic value to either business or private users? It is a fair question: this industry depends on capital investment, which in turn depends on keeping the details of its drugs and production processes secret. The difference lies in the utility of (some) software versus, in this example, a drug or therapy. Certain categories of software enable the creation of new value. Obvious examples are programming languages such as C, Python and R, which enable the creation of specialised applications that support specific business processes. Being able to use and improve these open-source languages freely accelerates the creation of business value in many ways, for example through specialised smartphone apps that offer a service to end-users. Imagine you have an idea for a smartphone app, but before you can write a line of code, you need to buy a licence to install the programming language. After investing the time and money to access this language, you realise it doesn't work as well as you need for your particular app. Or worse yet, it has a bug which renders it unfit for your purpose. There is little chance you can resolve this quickly. Open-source software does not have these restrictions, so you can focus all your resources on end-user value rather than on the tools needed for creation. The drugs and therapies manufactured by the pharmaceutical industry are the equivalent of a smartphone app: they provide end-user value. It is therefore sound business logic to open source the tools used to create these products: doing so removes restrictions on creating drugs and enables each company to sharpen its focus on developing and delivering them.
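The releasing guidance above notes that an R package may declare its licence in a root 'LICENSE' or 'LICENSE.md' file, in the `License` field of its DESCRIPTION file, or in both, and that these can occasionally contradict each other. The following is a minimal sketch, assuming a package checked out locally, of how every declared licence could be collected side by side so a reviewer can compare them. The helper names `read_description_licence` and `licence_sources` are hypothetical, written for illustration only; they are not part of `riskmetric` or any other R Validation Hub tool, and the DESCRIPTION parser is deliberately simplified. The sketch does not interpret licence terms; that remains a question for in-house counsel.

```python
from pathlib import Path


def read_description_licence(pkg_dir):
    """Return the License field of an R package's DESCRIPTION file, or None.

    DESCRIPTION files use the Debian Control File (DCF) format: 'Field: value'
    lines, where indented lines continue the previous field. This is a
    simplified parser for illustration, not a full DCF implementation.
    """
    desc = Path(pkg_dir) / "DESCRIPTION"
    if not desc.is_file():
        return None
    fields = {}
    current = None
    for line in desc.read_text(encoding="utf-8").splitlines():
        if line[:1].isspace() and current is not None:
            # Indented line: append it to the field started on an earlier line.
            fields[current] += " " + line.strip()
        elif ":" in line:
            # New field: split on the first colon only, so URLs in values survive.
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields.get("License")


def licence_sources(pkg_dir):
    """Map each place a licence can be declared to what it says, for side-by-side review."""
    root = Path(pkg_dir)
    found = {"DESCRIPTION License field": read_description_licence(root)}
    for name in ("LICENSE", "LICENSE.md"):
        path = root / name
        if path.is_file():
            # Keep the first non-empty line as a short summary of the file.
            lines = [s.strip() for s in path.read_text(encoding="utf-8").splitlines() if s.strip()]
            found[name] = lines[0] if lines else ""
    return found
```

Calling `licence_sources("path/to/package")` returns a dictionary with one entry per location found. For an MIT-licensed R package following the common `License: MIT + file LICENSE` convention, the LICENSE file typically holds only the year and copyright holder, with the full text in LICENSE.md, so differing first lines are expected there and are not by themselves a contradiction.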