Make scancode parallelism configurable #610

RomanIakovlev · 2024-10-23T11:15:50Z

Fixes #609

1. In AbstractProcessor, _schemaVersion is the combination of schemaVersion or toolVersion along the class hierarchy.
2. Most component-related processors, e.g. mavenExtract or npmExtract, which are subclasses of AbstractClearlyDefinedProcessor, override toolVersion(); see the comments at AbstractProcessor.toolVersion(). This convention was introduced in the commit "isolate toolVersion from schemaVersion". The exception is PodExtract. This commit aligns PodExtract with the rest of the component-related processors.
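A minimal sketch of how a version could be combined along a class hierarchy is shown here. The description only says the versions are combined, so the component-wise addition used below is an assumption made for illustration. The class names `BaseProcessor`, `ComponentProcessor`, and `ExamplePodExtract`, the `combinedVersion` getter, and the version values are all hypothetical and do not reproduce the crawler's actual code.

```javascript
// Hypothetical sketch: combine per-class versions while walking the prototype chain.
// Assumption: "combination" is modelled as component-wise addition of x.y.z parts.
function addVersions(a, b) {
  const [x, y] = [a, b].map(v => v.split('.').map(Number))
  return x.map((n, i) => n + y[i]).join('.')
}

class BaseProcessor {
  get schemaVersion() {
    return '1.2.0'
  }

  // Each class contributes its own schemaVersion if it declares one,
  // otherwise its own toolVersion.
  get combinedVersion() {
    let result = '0.0.0'
    for (let p = Object.getPrototypeOf(this); p !== Object.prototype; p = Object.getPrototypeOf(p)) {
      for (const name of ['schemaVersion', 'toolVersion']) {
        const descriptor = Object.getOwnPropertyDescriptor(p, name)
        if (descriptor && descriptor.get) {
          result = addVersions(result, descriptor.get.call(this))
          break
        }
      }
    }
    return result
  }
}

// Intermediate class for component-related processors (hypothetical).
class ComponentProcessor extends BaseProcessor {
  get toolVersion() {
    return '0.0.0'
  }
}

// A leaf processor overrides toolVersion only, leaving schemaVersion alone.
class ExamplePodExtract extends ComponentProcessor {
  get toolVersion() {
    return '2.0.0'
  }
}

console.log(new ExamplePodExtract().combinedVersion) // 3.2.0
```

In this sketch, bumping only the leaf class's `toolVersion` changes the combined version, which is why a stored path or blob name derived from it would also change.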

The "always" traversal policy behaves as follows:
- If the tool result (e.g. licensee) for a specific component exists, the component is refetched and the tool is rerun.
- If the tool result for a specific component is missing, the "always" policy leads to an "Unreachable for reprocessing" status and the tool is skipped.

The "always" traversal policy is essentially a rerun of all previously run tools. This makes it cumbersome to retrigger a harvest, especially for integration tests. The proposed new policy makes reharvesting simpler:
- When the tool result for a component is available, the tool is rerun and the tool result is updated, as with the "always" policy.
- When the tool result for a component is not available, the component is fetched and the tool is run.

In summary, the "reharvestAlways" policy reruns the harvest tools if results exist and runs them if results are missing.
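The difference between the two policies can be sketched as a small decision function. This is an illustration of the behaviour described above, not the crawler's implementation: the function name `decideAction` and the returned action strings are hypothetical.

```javascript
// Hypothetical sketch of the decision each policy makes for one tool and one component.
// hasExistingResult: whether a stored tool result already exists for the component.
function decideAction(policy, hasExistingResult) {
  switch (policy) {
    case 'always':
      // Reruns tools that already ran; a missing result ends up as
      // "Unreachable for reprocessing" and the tool is skipped.
      return hasExistingResult ? 'refetch-and-rerun' : 'skip-unreachable'
    case 'reharvestAlways':
      // Reruns when a result exists, and runs for the first time when it does not.
      return hasExistingResult ? 'refetch-and-rerun' : 'fetch-and-run'
    default:
      throw new Error(`Unknown traversal policy: ${policy}`)
  }
}

for (const policy of ['always', 'reharvestAlways']) {
  for (const hasExistingResult of [true, false]) {
    console.log(policy, hasExistingResult, decideAction(policy, hasExistingResult))
  }
}
```

The two policies differ only in the missing-result case, which is the case that matters when an integration test needs to trigger a harvest from a clean state.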

yashkohli88 and others added 30 commits January 30, 2024 13:42

Licensee, Ruby and its dependency updated in Dockerfile and DevDockerfile

f1f1433

Update Licensee, Ruby and its dependency

a43d7c8

Update scancode-toolkit to latest version and adjust CLI usage

a4d2fda

Remove commented-out lines from Dockerfile

26d53aa

Use CMD instead of ENTRYPOINT in DevDockerfile

b83414f

Update code to work with new scancode version output format

d987bef

Reformat and clean-up launch.json

39455b7

Update cdConfig.js with new scancode CLI arguments

740c762

Update fixtures

236234b

Add a unit test to verify self link for PodExtract

1dd0c45

document._metadata.links.self.href is used to construct the file path or blob name when storing the harvested data. It should reflect the _schemaVersion of PodExtract. Added a test to verify this.

Bump up the version in PodExtract

bbf6133

This follows the recent fix that excludes .git directory content (PR #525). Bump the version to allow pod components to be reharvested.

Add a unit test to PodExtractTests

9424f37

Bump up the tool version in PodExtract

32eaf2b

The recent fix to exclude content in the .git directory (#525) from pod packages will cause the file count to be different from the previous version. Update the toolVersion for PodExtract to 2.0.0 to reflect this.

Updated request-promise-native

14a3dca

Updated request-promise-native

8dc8733

Updated request-promise-native

fa37b7f

Merge branch 'master' into update-scancode-toolkit

a1bff29

Merge branch 'master' into yk/licensee-upgrade

347047b

Merge branch 'master' into qt/pod_tool_version

1edb1bb

Updated request-promise-native

39ddcda

Updated request-promise-native

b83702b

Merge branch 'master' into qt/pod_tool_version

a3ffa52

Update rimraf to latest version

527b1d6

Replace rimraf with native function

ab0c920

Merge branch 'master' into update-scancode-toolkit

6dc32f1

Use latest ScanCode version

a4e59a5

Add .prettierignore file

af3e7f7

Update fixtures for ScanCode version 32.1.0

64ab636

Merge branch 'master' into qt/pod_tool_version

766cb35

qtomlinson and others added 27 commits August 12, 2024 13:32

Merge branch 'master' into qt/fix_pod_latest_version

6a55052

Merge pull request #588 from qtomlinson/qt/fix_pod_latest_version

dc8d5a2

Fix fetching latest version for some pod components

Merge branch 'master' into qt/fix_pypi_license

2c1f105

Action to deploy to dev

7655ebc

Don't automatically run on master push yet

da414ce

Merge pull request #586 from qtomlinson/qt/fix_pypi_license

85544f8

Derive license from info.license over classifiers in pypi registry data

Merge branch 'master' into deploy-dev-action

4728d99

Merge pull request #599 from clearlydefined/deploy-dev-action

a394fde

Deploy dev crawler via GitHub action

Deploys to dev on master merge

c27d0de

Merge branch 'master' into qt/add_policy

20d440b

Merge pull request #598 from qtomlinson/qt/add_policy

f0fb76a

Introduce a new traversal policy

Merge branch 'master' into enable-dev-deploy-on-master-merge

be282d4

Merge pull request #601 from clearlydefined/enable-dev-deploy-on-master-merge

3e1fdc8

Deploys to dev on master merge

add sha and version to '/' endpoint

ce50ee8

Add app version to logger

506fa2a

point to branch of deploy workflow to test

7c43743

Update action branch

9ca70dd

Update workflow branch

7155482

Full branch needed

a4f31b4

Update to current release of workflow

79fc235

Remove build number

99cb4a3

APP_VERSION replaces it

Merge pull request #574 from clearlydefined/elr/ver-sha

ef89a38

add sha and version to '/' endpoint

Update spdx

52e7c26

Merge branch 'prod' into master

edbcaf5

Merge pull request #606 from clearlydefined/ljones140/bump-spdx

0738316

Update spdx

Make scancode parallelism configurable

e1ed7bd

RomanIakovlev changed the base branch from master to prod October 23, 2024 11:17

RomanIakovlev closed this Oct 23, 2024

RomanIakovlev deleted the roman/scancode_parallelism branch October 23, 2024 11:17
