[WIP] update licenses #675

Open: wants to merge 6 commits into base: 2023.06-software.eessi.io


MartinsNadia: Opened this new PR to fix the origin branch.

eessi-bot bot commented Aug 19, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software


eessi-bot bot commented Aug 19, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software

MartinsNadia (Author) commented Aug 19, 2024

WIP:

  • adding a new license update script (PR 457);
  • adding a draft workflow .yml file.

ocaisa (Member) left a comment:

I just had an initial look. Perhaps you can add the current output of the script to a gist and link to it from this issue?
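For a gist like this, a per-outcome summary may be easier to scan than the raw dump. A minimal sketch, assuming the output is a JSON object keyed by module name whose values carry a "License" field that is either the string "not found" or a list of names (the shape of the licenses.json excerpts in this thread); `summarize_licenses` is a hypothetical helper, not part of the PR:

```python
from collections import Counter


def summarize_licenses(entries: dict) -> Counter:
    """Hypothetical helper: count entries as "found", "other" or "not found"."""
    counts = Counter()
    for info in entries.values():
        lic = info.get("License")
        if not lic or lic == "not found":
            counts["not found"] += 1
            continue
        names = [lic] if isinstance(lic, str) else list(lic)
        # Case-insensitive, so "Other", "other" and "OTHER" are treated alike.
        if any(name.lower() == "other" for name in names):
            counts["other"] += 1
        else:
            counts["found"] += 1
    return counts


if __name__ == "__main__":
    # Two entries taken from this thread plus one made-up "found" entry.
    sample = {
        "ATK/2.38.0-GCCcore-13.2.0": {"License": ["Other"]},
        "Bazel/6.3.1-GCCcore-12.3.0": {"License": "not found"},
        "Example/1.0-GCCcore-13.2.0": {"License": ["MIT"]},
    }
    for outcome, n in summarize_licenses(sample).most_common():
        print(f"{outcome}: {n}")
```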

  uses: actions/checkout@v3

- name: Set up Python
  uses: actions/setup-python@v4

Member:

Should use a specific version here; see the other workflows.

    fi

- name: Create a PR (if changes detected)
  uses: peter-evans/create-pull-request@v5

Member:

This also needs a specific commit.
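Pinning can be checked mechanically. A rough sketch (hypothetical helper, not an existing EESSI tool) that lists `uses:` references not pinned to a full 40-character commit SHA, which is the form the later revision of this workflow uses for peter-evans/create-pull-request. It scans the raw text with a regex instead of parsing YAML, so treat it as a heuristic:

```python
import re

# Matches "uses: <ref>" whether or not it starts a list item ("- uses: ...").
# [ \t] instead of \s keeps each match on a single line.
USES_RE = re.compile(r"^[ \t]*-?[ \t]*uses:[ \t]*([^\s#]+)", re.MULTILINE)
# A full commit SHA is 40 lowercase hex characters after the "@".
PINNED_RE = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list:
    """Hypothetical helper: action references not pinned to a full commit SHA."""
    unpinned = []
    for ref in USES_RE.findall(workflow_text):
        if ref.startswith("./"):
            continue  # local actions live in the repository itself
        if not PINNED_RE.search(ref):
            unpinned.append(ref)
    return unpinned
```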

    commit-message: "Auto PR: Update licenses"
    title: "Auto PR: Update licenses"
    body: ${{ steps.check_licenses.outputs.patch }}
    branch: main #fork branch

Member:

This is not the correct branch for EESSI

Comment on lines +36 to +45
- name: Create a PR (if changes detected)
  id: create_pull_request
  uses: peter-evans/create-pull-request@5e914681df9dc83aa4e4905692ca88beb2f9e91f # v7.0.5
  if: steps.check_licenses.outputs.patch != ''
  with:
    commit-message: "Auto PR: Update licenses"
    title: "Auto PR: Update licenses"
    body: ${{ steps.check_licenses.outputs.patch }}
    branch: update-licenses-${{ github.run_number }}
    base: [ "*-software.eessi.io" ]

Member:

Doesn't this mean that if a PR adds software, then it will trigger another PR to add the licence of the software to the yaml file? Instead, wouldn't you want to make a suggestion that they add the licence as part of the original PR?
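One way to act on this suggestion, sketched under assumptions: instead of opening a follow-up PR, the workflow could post a comment on the original PR listing the missing entries. The helper below only builds the comment text; the key format (`Name/version-toolchain`, as in licenses.json) and the `"License URL"` placeholder field are assumptions, and actually posting the comment is left out:

```python
import json


def missing_license_comment(new_modules: list, licenses: dict) -> str:
    """Hypothetical helper: PR comment listing modules without a licence entry ("" if none)."""
    missing = sorted(m for m in set(new_modules) if m not in licenses)
    if not missing:
        return ""  # nothing to ask for, so the workflow can skip commenting
    # Placeholder entries for the PR author to fill in; "License URL" is a
    # hypothetical field name, not one the current script writes.
    template = {m: {"License": ["<SPDX identifier>"], "License URL": "<url>"} for m in missing}
    return (
        "This PR adds software without a licence entry. "
        "Could you add the following to the licences file as part of this PR?\n\n"
        + json.dumps(template, indent=2)
    )
```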

  pull_request:
    branches: [ "*-software.eessi.io" ]
permissions:
  contents: write # set permissions for writing

Member:

This action is being given a lot of power at a global level. Do we really need it to have write permissions? Isn't it enough that we get a notification if it fails?

Comment on lines +47 to +59
- name: Apply patch (if no PR created)
  if: steps.create_pull_request.outputs.number == '' && steps.check_licenses.outputs.patch != ''
  run: |
    if [ -f license_update.patch ] && [ -s license_update.patch ]; then
      git apply license_update.patch
    else
      echo "No changes to apply"
    fi
    git add licenses.json
    git diff --cached --exit-code || git commit -m "Update licenses.json"
    git push
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Member:

Rather than auto-apply a patch, can a code suggestion be made (or a PR to the source repo using a token without write permission to EESSI)?

@@ -0,0 +1,59 @@
name: Check and update licenses

Member:

Duplicate file?

Comment on lines +29 to +30
python update_licenses.py --source=pypi TensorFlow
python update_licenses.py --source=github:easybuilders/easybuild EasyBuild

Member:

You are going to need to extract the software name from the easystack file, which is complicated. Perhaps you should instead add easystack (and easyconfig) support to update_licenses.py?
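To give an idea of what the extraction involves, here is a heuristic sketch, not a full easystack parser: it pulls `*.eb` entries out of the easystack with a regex (avoiding a PyYAML dependency) and takes the software name to be every `-`-separated token before the first one that starts with a digit. Both function names are hypothetical, and names that themselves contain a digit-leading token after the first one (or unusual version strings) would still need EasyBuild's own parsing:

```python
import re

# Matches list items such as "  - ATK-2.38.0-GCCcore-13.2.0.eb" or
# "  - Archive-Zip-1.68-GCCcore-12.2.0.eb:" (an entry followed by options).
EB_ENTRY_RE = re.compile(r"^[ \t]*-[ \t]*([^\s:]+\.eb)[ \t]*:?[ \t]*$", re.MULTILINE)


def easyconfigs_in_easystack(text: str) -> list:
    """Hypothetical helper: easyconfig file names listed in an easystack file."""
    return EB_ENTRY_RE.findall(text)


def software_name(easyconfig: str) -> str:
    """Heuristic: tokens before the first digit-leading token form the name."""
    tokens = easyconfig[: -len(".eb")].split("-")
    name_tokens = []
    for tok in tokens:
        # Keep the first token even if it starts with a digit (e.g. "4ti2").
        if tok[:1].isdigit() and name_tokens:
            break
        name_tokens.append(tok)
    return "-".join(name_tokens)
```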

Member:

This could also help you decide where to look for licenses (since you have access to source_urls)
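A sketch of that idea, assuming only the two `--source` forms seen in this PR (`pypi` and `github:<owner>/<repo>`) need to be produced; `source_hint` is a hypothetical name, and any other host is left for homepage scraping or manual handling:

```python
from typing import Optional
from urllib.parse import urlparse

PYPI_HOSTS = {"pypi.org", "pypi.python.org", "files.pythonhosted.org"}


def source_hint(url: str) -> Optional[str]:
    """Hypothetical helper: suggest an update_licenses.py --source value, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    parts = [p for p in parsed.path.split("/") if p]
    if host in PYPI_HOSTS:
        return "pypi"
    if host == "github.com" and len(parts) >= 2:
        return f"github:{parts[0]}/{parts[1]}"
    return None  # other hosts: fall back to homepage scraping or manual review
```

Templated source_urls (for example containing `%(name)s`) need to be resolved before this mapping is meaningful.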

Member:

The other option is to gather software (and extension) names via Lmod and then check them against the yaml file.
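A sketch of the Lmod route, assuming the output of `module -t avail` has been captured (Lmod may print it to stderr, so redirect with `2>&1`), with module-path headers ending in `:` and optional markers such as `(default)` after a module name. Extensions are not covered and would need a separate query; both function names are hypothetical:

```python
def modules_from_terse_avail(output: str) -> set:
    """Hypothetical helper: parse terse avail output into "Name/version" strings."""
    modules = set()
    for line in output.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue  # blank line or module-path header
        name = line.split("(")[0].strip()  # drop markers like "(default)"
        if "/" in name and not name.endswith("/"):
            modules.add(name)
    return modules


def missing_from_licenses(modules, licenses: dict) -> list:
    """Modules with no entry in the licence mapping (keys assumed to be Name/version)."""
    return sorted(m for m in modules if m not in licenses)
```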

    branch: update-licenses-${{ github.run_number }}
    base: [ "*-software.eessi.io" ]

- name: Apply patch (if no PR created)

Member:

Why wouldn't the PR be created? Wouldn't that mean the previous step will have failed?

hvelab (Contributor) commented Feb 27, 2025

Some loose thoughts for now:

  • I kept the retrieval date because the original script did so, but I don't see much sense in keeping it; I would keep the license URL instead.
  • There are a lot of "not found" entries, but I expect fewer of them once the source_urls that are still written in EasyBuild (EB) template syntax are resolved; fixing that is my priority for tomorrow.
  • I need to fix the "needs manual action" part, as it is currently case-sensitive when matching "Other" licenses.
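Both points can be handled with small helpers. A sketch, assuming the unresolved source_urls use EasyBuild's `%(...)s` template style (only a handful of template names are filled in here, not EasyBuild's full set) and that "needs manual action" means the licence is "Other" or "not found" in any letter case; both helpers are hypothetical:

```python
def resolve_eb_templates(url: str, name: str, version: str) -> str:
    """Hypothetical helper: fill in a few %(...)s templates; keep the URL on failure."""
    values = {
        "name": name,
        "namelower": name.lower(),
        "nameletter": name[0],
        "version": version,
        "version_major_minor": ".".join(version.split(".")[:2]),
    }
    try:
        return url % values
    except (KeyError, TypeError, ValueError):
        # Unknown template name, or a literal '%' (e.g. "%2F") in the URL.
        return url


def needs_manual_action(license_field) -> bool:
    """True if the licence is "Other" or "not found", in any letter case."""
    names = [license_field] if isinstance(license_field, str) else list(license_field)
    return any(n.strip().lower() in ("other", "not found") for n in names)
```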

},
"ATK/2.38.0-GCCcore-13.2.0": {
"License": [
"Other"

Member:

How did this end up with Other? I see it is GPL v2 at https://gitlab.gnome.org/Archive/atk

},
"Archive-Zip/1.68-GCCcore-12.2.0": {
"License": [
"Other"

Member:

Hmm, some of these are going to need some discussion. The license of this is at https://github.com/redhotpenguin/perl-Archive-Zip?tab=License-1-ov-file#readme and it is GPL 1+ or an Artistic licence. If we are going to use Other, we'll need a link to the licence where possible, and possibly another field, permission_to_redistribute.
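A possible shape for such an entry, sketched as a dataclass. All field names are hypothetical (the current output uses "License" and "Retrieved From"); the idea is that an "Other" entry only counts as complete once it has a licence URL and an explicit redistribution decision, which would also cover keeping the licence URL instead of the retrieval date:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class LicenseEntry:
    # Hypothetical schema, not the one the script currently writes.
    license: List[str] = field(default_factory=list)  # SPDX ids, or ["Other"]
    license_url: Optional[str] = None
    permission_to_redistribute: Optional[bool] = None  # None = not yet decided

    def is_complete(self) -> bool:
        if not self.license:
            return False
        if any(name.lower() == "other" for name in self.license):
            # "Other" needs a link to the licence text and an explicit decision.
            return self.license_url is not None and self.permission_to_redistribute is not None
        return True

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


entry = LicenseEntry(
    license=["Other"],
    license_url="https://github.com/redhotpenguin/perl-Archive-Zip?tab=License-1-ov-file#readme",
)
print(entry.is_complete())  # False until permission_to_redistribute is set
```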

Contributor:

Agreed; this is the raw response from the ecosyste.ms API: https://repos.ecosyste.ms/api/v1/repositories/lookup?url=https%3A%2F%2Fgithub.com%2Fredhotpenguin%2Fperl-Archive-Zip

I will work today on a "smarter" way to do this and on scraping other sources, because there are many more "not found" licenses than found or "Other" ones.
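For reference, the lookup above can be reproduced with the standard library. The URL construction matches the link above; the assumption that the licence sits in a top-level `"license"` field of the JSON response should be checked against a real response before relying on it. `urlopen` raises `HTTPError` on an HTTP error status. Both function names are hypothetical:

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

LOOKUP = "https://repos.ecosyste.ms/api/v1/repositories/lookup"


def lookup_url(repo_url: str) -> str:
    """Build the lookup URL; urlencode percent-encodes ':' and '/' in the repo URL."""
    return f"{LOOKUP}?{urlencode({'url': repo_url})}"


def fetch_license(repo_url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the licence reported by ecosyste.ms, or None (field name assumed)."""
    with urlopen(lookup_url(repo_url), timeout=timeout) as resp:
        data = json.load(resp)
    return data.get("license")
```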

},
"Arrow/16.1.0-gfbf-2023b": {
"License": [
"Other"

Member:

The license for this is https://github.com/apache/arrow/blob/main/LICENSE.txt (Apache-2.0).

"Retrieved From": "not found"
},
"Bazel/6.3.1-GCCcore-12.3.0": {
"License": "not found",
},
"BeautifulSoup/4.12.2-GCCcore-12.3.0": {
"License": [
"Other"
"Retrieved From": "https://www.fftw.org"
},
"FFmpeg/6.0-GCCcore-13.2.0": {
"License": "not found",

hvelab (Contributor) commented Mar 3, 2025

Updates:

  • Fixed a bug; the script now finds more licenses.
  • It now shows the actual API call the license was retrieved from.
  • For "Other" and "not found" entries, it scrapes for the LICENSE file; we still need a way to retrieve the SPDX identifier from that file.
  • This needs improvement for packages marked "Other".
  • For entries that are entirely "not found", it shows the source_urls and homepages; most of these seem to need sanitizing, or scraping from there with more keywords.
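For getting an SPDX identifier out of a scraped LICENSE file, a naive keyword-matching sketch follows. The rule list is illustrative, not exhaustive, and the first matching rule wins, so files that append third-party notices (like Arrow's LICENSE.txt) resolve according to rule order rather than file order. The plain GPL/LGPL licence text does not say whether "-only" or "-or-later" applies, so those results use the deprecated bare SPDX identifiers and still need a manual check; `guess_spdx` is a hypothetical name:

```python
import re
from typing import Optional

# Ordered (pattern, SPDX id) pairs matched against whitespace-normalized,
# lower-cased text. More specific texts come before ones they would also match
# (LGPL before GPL, Boost before MIT, BSD-3 before BSD-2).
RULES = [
    (r"apache license,? version 2\.0", "Apache-2.0"),
    (r"gnu lesser general public license version 2\.1", "LGPL-2.1"),
    (r"gnu lesser general public license version 3", "LGPL-3.0"),
    (r"gnu general public license version 3", "GPL-3.0"),
    (r"gnu general public license version 2", "GPL-2.0"),
    (r"boost software license", "BSL-1.0"),
    (r"permission is hereby granted, free of charge", "MIT"),
    (r"neither the name of", "BSD-3-Clause"),
    (r"redistribution and use in source and binary forms", "BSD-2-Clause"),
]


def guess_spdx(license_text: str) -> Optional[str]:
    """Hypothetical helper: first SPDX id whose keywords appear, else None."""
    normalized = re.sub(r"\s+", " ", license_text).lower()
    for pattern, spdx in RULES:
        if re.search(pattern, normalized):
            return spdx
    return None  # leave for "needs manual action"
```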

3 participants