Feature request: Add options to show packages with only-FSF, only-OSI or only-DFSG compatible licenses #8

Open
mahlzahn opened this issue Oct 15, 2023 · 7 comments


@mahlzahn

mahlzahn commented Oct 15, 2023

Idea (see, also #5) by @im397:

-f, --list-fsf List only packages with FSF compatible licenses
-o, --list-osi List only packages with OSI compatible licenses
-d, --list-DFSG List only packages with DFSG compatible licenses

I created this issue to discuss the implementation; I’d be happy to implement it myself.
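As a starting point for discussion, the three proposed flags could be wired up roughly as follows. This is only a sketch: the program name `vrms`, the helper `select_packages`, the sample package names, and the `fsf`/`osi`/`dfsg` keys are all hypothetical, and treating combined flags as an intersection is an assumption, not something decided in this issue.

```python
import argparse


def build_parser():
    # prog name is an assumption; the option names are the ones proposed above.
    parser = argparse.ArgumentParser(prog="vrms")
    parser.add_argument("-f", "--list-fsf", action="store_true",
                        help="List only packages with FSF compatible licenses")
    parser.add_argument("-o", "--list-osi", action="store_true",
                        help="List only packages with OSI compatible licenses")
    parser.add_argument("-d", "--list-DFSG", dest="list_dfsg", action="store_true",
                        help="List only packages with DFSG compatible licenses")
    return parser


def select_packages(packages, args):
    """Return names of packages that satisfy every requested criterion.

    `packages` maps a package name to a dict of booleans (hypothetical layout).
    With no flag given, every package is returned.
    """
    wanted = [key for key, enabled in (("fsf", args.list_fsf),
                                       ("osi", args.list_osi),
                                       ("dfsg", args.list_dfsg)) if enabled]
    return [name for name, flags in packages.items()
            if all(flags[key] for key in wanted)]


# Made-up sample data, for illustration only.
SAMPLE_PACKAGES = {
    "pkg-a": {"fsf": True, "osi": True, "dfsg": True},
    "pkg-b": {"fsf": False, "osi": True, "dfsg": True},
    "pkg-c": {"fsf": False, "osi": False, "dfsg": True},
}

args = build_parser().parse_args(["-o", "-d"])
print(select_packages(SAMPLE_PACKAGES, args))
```

Whether `-f -o` should mean "both" or "either" is an open design question; the sketch picks "both" because it keeps each flag a pure filter.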

Based on the information from spdx/license-list-data, I suggest creating a simple license table file, which we then load in the main program to look up the licenses. For example, here are some licenses:

name                                            id                FSF  OSI  DFSG  vrms  alternatives
BSD Zero Clause License                         0BSD              -    Y    Y     Y     [Zero-Clause BSD]
Beerware License                                Beerware          -    -    Y     Y
Creative Commons Attribution                    CC-BY             -    -    -     Y     [CCPL:by]
Creative Commons Attribution 4.0 International  CC-BY-4.0         Y    -    Y     Y     [CCPL:by-4.0]
GNU General Public License v2.0 or later        GPL-2.0-or-later  Y    Y    Y     Y     [GPL2+, GPL2-or-later, GPL2 or any later version]

Technically, I suggest using the JSON format provided by SPDX and adding our own fields for DFSG and alternative IDs in a separate JSON file of our own, with content such as:

{
  "licenseListVersion": "2.01",
  "releaseDate": "2023-10-15",
  "licenses": [
    {
      "licenseId": "0BSD",
      "alternativeIds": [
        "Zero-Clause BSD"
      ],
      "isDfsgFree": true,
      "isVrmsFree": true
    },
    {
      "licenseId": "Beerware",
      "isDfsgFree": true,
      "isVrmsFree": true
    },
    {
      "licenseId": "CC-BY",
      "alternativeIds": [
        "CCPL:by"
      ],
      "isVrmsFree": true
    },
    {
      "licenseId": "CC-BY-4.0",
      "alternativeIds": [
        "CCPL:by-4.0"
      ],
      "isDfsgFree": true,
      "isVrmsFree": true
    },
    {
      "licenseId": "GPL-2.0-or-later",
      "alternativeIds": [
        "GPL2+",
        "GPL2-or-later",
        "GPL2 or any later version"
      ],
      "isDfsgFree": true,
      "isVrmsFree": true
    }
  ]
}

If you agree with this approach, I’d be happy to start implementing it.

Edit: DSFG -> DFSG

@gardenappl
Owner

So we have two JSON files: one from SPDX and one of our own, and we merge them at runtime? That sounds good to me. I was thinking of doing something similar but never got around to it. In theory we could pull the SPDX one dynamically and cache it.
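To make the "merge at runtime" idea concrete, here is a minimal sketch. It assumes the SPDX file uses the `licenseId`, `name`, `isOsiApproved` and `isFsfLibre` fields, and that our own file uses the `alternativeIds` and `isDfsgFree` fields proposed above; the function name `merge_license_data` and the inline sample strings are made up for illustration, and no download or caching is attempted.

```python
import json

# Tiny stand-ins for the two files (in practice they would be read with json.load).
SPDX_JSON = """
{"licenses": [
  {"licenseId": "0BSD", "name": "BSD Zero Clause License", "isOsiApproved": true},
  {"licenseId": "GPL-2.0-or-later", "name": "GNU General Public License v2.0 or later",
   "isOsiApproved": true, "isFsfLibre": true}
]}
"""

LOCAL_JSON = """
{"licenses": [
  {"licenseId": "0BSD", "alternativeIds": ["Zero-Clause BSD"], "isDfsgFree": true},
  {"licenseId": "CC-BY", "alternativeIds": ["CCPL:by"]}
]}
"""


def merge_license_data(spdx_data, local_data):
    """Return a dict keyed by licenseId, with our fields layered over the SPDX ones."""
    merged = {lic["licenseId"]: dict(lic) for lic in spdx_data["licenses"]}
    for extra in local_data["licenses"]:
        # IDs unknown to SPDX (e.g. the unversioned CC-BY) still get an entry.
        entry = merged.setdefault(extra["licenseId"], {"licenseId": extra["licenseId"]})
        entry.update(extra)
    return merged


merged = merge_license_data(json.loads(SPDX_JSON), json.loads(LOCAL_JSON))
print(merged["0BSD"])
```

Layering our file on top means a local entry could also override an SPDX value if we ever disagree with it; whether that is desirable is a separate decision.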

@gardenappl
Owner

gardenappl commented Oct 15, 2023

Personally, though, I'd prefer to keep our data in a simpler format like CSV or TSV, because then it's easier to sort the license list in the source code. And since the data is pretty much supplied manually, I think tab-separated values will make for less typing.

@gardenappl
Owner

gardenappl commented Oct 15, 2023

License ID	DFSG?	vrms?	Aliases...
0BSD	true	true	Zero-Clause BSD
GPL-2.0-or-later	true	true	GPL2+	GPL2-or-later	GPL2 or any later version

Is this too hacky or not? I know a bit of jq, so if we need to convert this to JSON at some point, that shouldn't be a huge issue.
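For comparison with the jq route, the same conversion can be sketched with only the Python standard library. The function name `tsv_to_records` and the JSON key names (borrowed from the JSON proposal earlier in this thread) are assumptions; the column layout is the one from the sample above.

```python
import csv
import io
import json

# The sample table from above, with explicit tab characters.
SAMPLE_TSV = (
    "License ID\tDFSG?\tvrms?\tAliases...\n"
    "0BSD\ttrue\ttrue\tZero-Clause BSD\n"
    "GPL-2.0-or-later\ttrue\ttrue\tGPL2+\tGPL2-or-later\tGPL2 or any later version\n"
)


def tsv_to_records(text):
    """Convert the TSV layout above into a list of JSON-ready dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    records = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Pad short rows so the three fixed columns always unpack.
        row = row + [""] * max(0, 3 - len(row))
        license_id, dfsg, vrms, *aliases = row
        records.append({
            "licenseId": license_id,
            "isDfsgFree": dfsg == "true",
            "isVrmsFree": vrms == "true",
            "alternativeIds": [alias for alias in aliases if alias],
        })
    return records


records = tsv_to_records(SAMPLE_TSV)
print(json.dumps({"licenses": records}, indent=2))
```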

@mahlzahn
Author

mahlzahn commented Oct 15, 2023

I also thought of CSV or TSV as a simpler format, but then I thought that JSON should be the preferred format because it can be read with Python without extra packages. Nevertheless, I tried it with the following sample free_licenses.tsv file:

#ID	DFSG?	Aliases
0BSD	True	Zero-Clause BSD
Beerware	True
# CC-BY is ambiguous for versions 1.0, 2.0, etc.
CC-BY		CCPL:by
CC-BY-4.0	True	CCPL:by-4.0
GPL-2.0-or-later	True	GPL2+	GPL2-or-later	GPL2 or any later version

and implemented this small piece of code to read the file:

import spdx_license_list as spdx

class License:
    """One row of free_licenses.tsv, completed with data from the SPDX list."""

    def __init__(self, license_id, dfsg_free=False, *aliases, osi_approved=None, fsf_libre=None):
        self.license_id = license_id
        # Values read from the TSV arrive as strings; accept common truthy spellings.
        if isinstance(dfsg_free, str):
            self.dfsg_free = dfsg_free.lower() in ['true', 'yes', 'y', '1']
        else:
            self.dfsg_free = bool(dfsg_free)
        # Drop empty alias columns.
        self.aliases = list(filter(bool, aliases))
        self.osi_approved = osi_approved
        self.fsf_libre = fsf_libre
        # Fill in the full name and the OSI/FSF flags when SPDX knows the ID.
        if license_id in spdx.LICENSES:
            spdx_license = spdx.LICENSES[license_id]
            if spdx_license.name not in self.aliases:
                self.aliases.append(spdx_license.name)
            if osi_approved is None:
                self.osi_approved = spdx_license.osi_approved
            if fsf_libre is None:
                self.fsf_libre = spdx_license.fsf_libre

with open('free_licenses.tsv') as f:
    for line in f.read().splitlines():
        # Skip blank lines and comment lines starting with '#'.
        if line and line[0] != '#':
            print(License(*line.split('\t')).__dict__)

which yields

{'license_id': '0BSD', 'dfsg_free': True, 'aliases': ['Zero-Clause BSD', 'BSD Zero Clause License'], 'osi_approved': True, 'fsf_libre': False}
{'license_id': 'Beerware', 'dfsg_free': True, 'aliases': ['Beerware License'], 'osi_approved': False, 'fsf_libre': False}
{'license_id': 'CC-BY', 'dfsg_free': False, 'aliases': ['CCPL:by'], 'osi_approved': None, 'fsf_libre': None}
{'license_id': 'CC-BY-4.0', 'dfsg_free': True, 'aliases': ['CCPL:by-4.0', 'Creative Commons Attribution 4.0 International'], 'osi_approved': False, 'fsf_libre': True}
{'license_id': 'GPL-2.0-or-later', 'dfsg_free': True, 'aliases': ['GPL2+', 'GPL2-or-later', 'GPL2 or any later version', 'GNU General Public License v2.0 or later'], 'osi_approved': True, 'fsf_libre': True}

Also, I realized that we probably don’t need the field/parameter is_vrms_free, because all licenses we add to the TSV file can be considered free for vrms. If needed, we can later add other licenses in separate files.

Edit: I found a very nice and always up-to-date SPDX database for Python: https://github.com/JJMC89/spdx-license-list. A bot automatically pushes the latest SPDX release, and it has all the information we need. I incorporated its information in the source code above.
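One thing the aliases make possible is mapping the license strings found in package metadata back to an SPDX ID. A minimal sketch of that lookup follows, using plain dicts copied from two of the records printed above so that it runs without the third-party package. The helpers `build_alias_index` and `resolve` are hypothetical names, and matching case-insensitively is an assumption rather than agreed behaviour.

```python
# Two of the records printed above, reduced to the fields needed here.
LICENSES = [
    {"license_id": "0BSD",
     "aliases": ["Zero-Clause BSD", "BSD Zero Clause License"]},
    {"license_id": "GPL-2.0-or-later",
     "aliases": ["GPL2+", "GPL2-or-later", "GPL2 or any later version",
                 "GNU General Public License v2.0 or later"]},
]


def build_alias_index(licenses):
    """Map every known spelling (ID or alias, case-folded) to its license ID."""
    index = {}
    for lic in licenses:
        for name in (lic["license_id"], *lic["aliases"]):
            # setdefault keeps the first owner if two licenses share a spelling.
            index.setdefault(name.casefold(), lic["license_id"])
    return index


def resolve(index, name):
    """Return the license ID for a metadata string, or None if it is unknown."""
    return index.get(name.strip().casefold())


index = build_alias_index(LICENSES)
print(resolve(index, "GPL2+"))
print(resolve(index, "Proprietary"))
```

Strings that resolve to None would then be the ones to report as unknown or non-free.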

@gardenappl
Owner

Should we do anything special with the "ethical" licenses? They could just be an extra field in the TSV.

I know that's a niche feature, and the movement behind it is... well, it was never tremendously popular. But I do have at least one AUR package on my system which uses the Hippocratic license: https://aur.archlinux.org/packages/zsh-abbr

So I'd rather not exclude them.

@gardenappl
Owner

gardenappl commented Apr 15, 2024

Should we do anything special with the "ethical" licenses?

I'll just remove the "ethical source" licenses. After looking through the AUR metadata archives, I found that literally nobody in the AUR uses any of those licenses, except for the one aforementioned package using the Hippocratic license, and that license is already in the SPDX database.

@gardenappl
Owner

I have some other suggestions regarding this, but I will implement them in a re-write. I'm not sure when exactly I'll end up rewriting vrms, but I need to do that sooner rather than later because of Arch's adoption of SPDX expressions.

2 participants