Skip to content

Commit

Permalink
sharing and zenodo episode
Browse files Browse the repository at this point in the history
  • Loading branch information
bast committed Sep 16, 2024
1 parent cc8b564 commit 5a37e45
Show file tree
Hide file tree
Showing 9 changed files with 295 additions and 8 deletions.
Binary file added content/img/ai/record-player.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added content/img/ai/turntable.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added content/img/license-models.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added content/img/turing-way/8-fair-principles.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
4 changes: 4 additions & 0 deletions content/img/turing-way/README.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Obtained from https://zenodo.org/record/3332808.

When using any of the images, please credit it with
"This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence."
6 changes: 4 additions & 2 deletions content/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,8 @@ them to own projects**.
- {ref}`refactoring-concepts` (15 min)

- 15:00-16:30 - How to release and publish your code
- {ref}`licensing-publishing` (45 min)
- {ref}`software-licensing` (30 min)
- {ref}`publishing` (15 min)
- {ref}`packaging` (45 min)

- 16:45-18:00 - **Debriefing and Q&A**
Expand Down Expand Up @@ -120,7 +121,8 @@ testing
reusable
refactoring-demo
refactoring-concepts
licensing-publishing
software-licensing
publishing
packaging
profiling
```
Expand Down
6 changes: 0 additions & 6 deletions content/licensing-publishing.md

This file was deleted.

134 changes: 134 additions & 0 deletions content/publishing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
(publishing)=

# How to publish your code

:::{objectives}
- Make our code citable and persistent.
- Make our Notebook reusable and persistent.
:::


## Is putting software on GitHub/GitLab/... publishing?

```{figure} img/turing-way/8-fair-principles.jpg
:alt: FAIR principles
:width: 70%
FAIR principles. (c) [Scriberia](http://www.scriberia.co.uk) for [The Turing Way](https://the-turing-way.netlify.com), CC-BY.
```

Is it enough to make the code public for the code to remain **findable and accessible**?
- No. Because nothing prevents me from deleting my GitHub repository or
rewriting the Git history and we have no guarantee that GitHub will still be around in 10 years.
- **Make your code citable and persistent**:
Get a persistent identifier (PID) such as DOI in addition to sharing the
code publicly, by using services like [Zenodo](https://zenodo.org) or
similar services.


## How to make your software citable

```{discussion} Discussion (Citation-1): Explain how you currently cite software
- Do you cite software that you use? How?
- If I wanted to cite your code/scripts, what would I need to do?
```

**Checklist for making a release of your software citable**:

- Assigned an appropriate license
- Described the software using an appropriate metadata format
- Clear version number
- Authors credited
- Procured a persistent identifier
- Added a recommended citation to the software documentation

This checklist is adapted from: N. P. Chue Hong, A. Allen, A. Gonzalez-Beltran,
et al., Software Citation Checklist for Developers (Version 0.9.0). Zenodo.
2019b. ([DOI](https://doi.org/10.5281/zenodo.3482769))

**Our practical recommendations**:
- Add a file called [CITATION.cff](https://citation-file-format.github.io/) ([example](https://github.com/bast/runtest/blob/main/CITATION.cff)).
- Get a [digital object identifier
(DOI)](https://en.wikipedia.org/wiki/Digital_object_identifier) for your code
on [Zenodo](https://zenodo.org/) ([example](https://zenodo.org/record/8003695)).
- Make it as easy as possible: clearly say what you want cited.

This is an example of a simple `CITATION.cff` file:
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Doe
given-names: Jane
orcid: https://orcid.org/1234-5678-9101-1121
title: "My Research Software"
version: 2.0.4
doi: 10.5281/zenodo.1234
date-released: 2021-08-11
```
More about `CITATION.cff` files:
- [GitHub now supports CITATION.cff files](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files)
- [Web form to create, edit, and validate CITATION.cff files](https://citation-file-format.github.io/cff-initializer-javascript/)
- [Video: "How to create a CITATION.cff using cffinit"](https://www.youtube.com/watch?v=zcgLIT5Qd4M)


## Papers with focus on scientific software

Where can I publish papers which are primarily focused on my scientific
software? Great list/summary is provided in this blog post: ["In which
journals should I publish my software?" (Neil P. Chue
Hong)](https://www.software.ac.uk/top-tip/which-journals-should-i-publish-my-software)


## How to cite software

```{admonition} Great resources
- A. M. Smith, D. S. Katz, K. E. Niemeyer, and FORCE11 Software Citation
Working Group, "Software citation principles," PeerJ Comput. Sci., vol. 2,
no. e86, 2016 ([DOI](https://doi.org/10.7717/peerj-cs.86))
- D. S. Katz, N. P. Chue Hong, T. Clark, et al., Recognizing the value of
software: a software citation guide [version 2; peer review: 2 approved].
F1000Research 2021, 9:1257 ([DOI](https://doi.org/10.12688/f1000research.26932.2))
- N. P. Chue Hong, A. Allen, A. Gonzalez-Beltran, et al., Software Citation
Checklist for Authors (Version 0.9.0). Zenodo. 2019a. ([DOI](https://doi.org/10.5281/zenodo.3479199))
- N. P. Chue Hong, A. Allen, A. Gonzalez-Beltran, et al., Software Citation
Checklist for Developers (Version 0.9.0). Zenodo. 2019b. ([DOI](https://doi.org/10.5281/zenodo.3482769))
```

Recommended format for software citation is to ensure the following information
is provided as part of the reference (from [Katz, Chue Hong, Clark,
2021](https://doi.org/10.12688/f1000research.26932.2) which also contains
software citation examples):
- Creator
- Title
- Publication venue
- Date
- Identifier
- Version
- Type



## Exercise/demo

:::{exercise}
- We will add a `CITATION.cff` file to our example repository.
- We will get a DOI using the [Zenodo sandbox](https://sandbox.zenodo.org):
- We will log into the [Zenodo sandbox](https://sandbox.zenodo.org) using
GitHub.
- We will follow [these steps](https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content)
and finally create a GitHub release and get a DOI.
- We will use the [Binder badge on our example repository](https://github.com/coderefinery/imgfilters) to run the Notebook
in the cloud and discuss how we could make it persistent and citable.
:::

:::{discussion}
- Why did we use the Zenodo sandbox and not the "real" Zenodo for our exercise?
:::


## More resources

- [Social coding lesson material](https://coderefinery.github.io/social-coding/software-citation/)
- [Sharing Jupiter Notebooks](https://coderefinery.github.io/jupyter/sharing/)
153 changes: 153 additions & 0 deletions content/software-licensing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,153 @@
(software-licensing)=

# Choosing a software license

:::{objectives}
- Knowing about what derivative work is and whether we can share it.
- Get familiar with terminology around licensing.
- We will add a license to our example project.
:::


## Copyright and derivative work: Sampling/remixing

:::{figure} img/ai/record-player.png
:alt: Generated image of a monk operating a record player
:width: 50%
:::
[Midjourney, CC-BY-NC 4.0]

:::{figure} img/ai/turntable.png
:alt: Generated image of a monk operating two record players
:width: 50%
:::
[Midjourney, CC-BY-NC 4.0]

- Copyright controls whether and how we can distribute
the original work or the **derivative work**.
- In the **context of software** it is more about
being able to change and **distribute changes**.
- Changing and distributing software is similar to changing and distributing
music
- You can do almost anything if you don't distribute it

**Often we don't have the choice**:
- We are expected to publish software
- Sharing can be good insurance against being locked out

**Can we distribute our changes** with the research community or our future selves?


## Why software licenses matter

You find some great code that you want to reuse for your own publication.

- This is good for the original author - you will cite them. Maybe other people who cite you will cite them.

- You modify and remix the code.

- Two years later ... ⌛

- Time to publish: You realize **there is no license to the original work** 😱

**Now we have a problem**:
- 😬 "Best" case: You manage to publish the paper without the software/data.
Others cannot build on your software and data.
- 😱 Worst case: You cannot publish it at all.
Journal requires that papers should come with data and software so that they are reproducible.


## Taxonomy of software licenses

:::{figure} img/license-models.png
:alt: "European Union Public Licence (EUPL): guidelines July 2021"

European Commission, Directorate-General for Informatics, Schmitz, P., European Union Public Licence (EUPL): guidelines July 2021, Publications Office, 2021, <https://data.europa.eu/doi/10.2799/77160>
:::

Comments:
- Arrows represent compatibility (A -> B: B can reuse A)
- Proprietary/custom: Derivative work typically not possible (no arrow goes from proprietary to open)
- Permissive: Derivative work does not have to be shared
- Copyleft/reciprocal: Derivative work must be made available under the same license terms
- NC (non-commercial) and ND (non-derivative) exist for data licenses but not really for software licenses

**Great resource for comparing software licenses**: [Joinup Licensing Assistant](https://joinup.ec.europa.eu/collection/eupl/solution/joinup-licensing-assistant/jla-find-and-compare-software-licenses)
- Provides comments on licenses
- Easy to compare licenses ([example](https://joinup.ec.europa.eu/licence/compare/BSD-3-Clause;Apache-2.0))
- [Joinup Licensing Assistant - Compatibility Checker](https://joinup.ec.europa.eu/collection/eupl/solution/joinup-licensing-assistant/jla-compatibility-checker)
- Not biased by some company agenda


## Exercise/demo

:::{exercise}
- Let us choose a license for our example project.
- We will add a LICENSE to the repository.
:::

:::{discussion}
- What if my code uses libraries like `scikit-image`, `scikit-learn`, `numpy`,
`matplotlib`, etc. Do we need to look at their licenses? In other words,
**is our project derivative work** of something else?
:::


## More resources

- Presentation slides "Practical software licensing" (R. Bast): <https://doi.org/10.5281/zenodo.11554001>
- [Social coding lesson material](https://coderefinery.github.io/social-coding/)
- [UiT research software licensing guide (draft)](https://research-software.uit.no/blog/2023-software-licensing-guide/)
- [Research institution policies to support research software (compiled by the Research Software Alliance)](https://www.researchsoft.org/software-policies/)
- More [reading material](https://coderefinery.github.io/social-coding/software-licensing/#great-resources)


## More exercises

:::{exercise} Exercise: What constitutes derivative work?

Which of these are derivative works? Also reflect/discuss how this affects the
choice of license.
- A. Download some code from a website and add on to it
- B. Download some code and use one of the functions in your code
- C. Changing code you got from somewhere
- D. Extending code you got from somewhere
- E. Completely rewriting code you got from somewhere
- F. Rewriting code to a different programming language
- G. Linking to libraries (static or dynamic), plug-ins, and drivers
- H. Clean room design (somebody explains you the code but you have never seen it)
- I. You read a paper, understand algorithm, write own code

```{solution}
- Derivative work: A-F
- Not derivative work: G-I
- E and F: This depends on how you do it, see "clean room design".
```
:::

:::{exercise} Exercise: Licensing situations

Consider some common licensing situations. If you are part of an exercise
group, discuss these with others:
1. What is the StackOverflow license for code you copy and paste?
2. A journal requests that you release your software during publication. You have
copied a portion of the code from another package, which you have forgotten.
Can you satisfy the journal's request?
3. You want to fix a bug in a project someone else has released, but there is no license. What risks are there?
4. How would you ask someone to add a license?
5. You incorporate MIT, GPL, and BSD3 licensed code into your project. What possible licenses can you pick for your project?
6. You do the same as above but add in another license that looks strong copyleft. What possible licenses can you use now?
7. Do licenses apply if you don't distribute your code? Why or why not?
8. Which licenses are most/least attractive for companies with proprietary software?

```{solution}
1. As indicated [here](https://stackoverflow.com/help/licensing), all publicly accessible user contributions are licensed under [Creative Commons Attribution-ShareAlike](https://creativecommons.org/licenses/by-sa/4.0/) license. See Stackoverflow [Terms of service](https://stackoverflow.com/legal/terms-of-service/public#licensing) for more detailed information.
2. "Standard" licensing rules apply. So in this case, you would need to remove the portion of code you have copied from another package before being able to release your software.
3. By default you are no authorized to use the content of a repository when there is no license. And derivative work is also not possible by default. Other risks: it may not be clear whether you can use and distribute (publish) the bugfixed code. For the repo owners it may not be clear whether they can use and distributed the bugfixed code. However, the authors may have forgotten to add a license so we suggest you to contact the authors (e.g. make an issue) and ask whether they are willing to add a license.
4. As mentionned in 3., the easiest is to fill an issue and explain the reasons why you would like to use this software (or update it).
5. Combining software with different licenses can be tricky and it is important to understand compatibilities (or lack of compatibilities) of the various licenses. GPL license is the most protective (BSD and MIT are quite permissive) so for the resulting combined software you could use a GPL license. However, re-licensing may not be necessary.
6. Derivative work would need to be shared under this strong copyleft license (e.g. AGPL or GPL), unless the components are only plugins or libraries.
7. If you keep your code for yourself, you may think you do not need a license. However, remember that in most companies/universities, your employer is "owning" your work and when you leave you may not be allowed to "distribute your code to your future self". So the best is always to add a license!
8. The least attractive licenses for companies with proprietary software are licenses where you would need to keep an open license when creating derivative work. For instance GPL and and AGPL. The most attractive licenses are permissive licenses where they can reuse, modify and relicense with no conditions. For instance MIT, BSD and Apache License.
```
:::

0 comments on commit 5a37e45

Please sign in to comment.