Make scancode parallelism configurable #610
Closed
document._metadata.links.self.href is used to construct the file path or blob name when storing the harvested data. It should reflect the _schemaVersion of PodExtract. Added a test to verify this.
1. In AbstractProcessor, _schemaVersion is the combination of the schemaVersion and toolVersion values along the class hierarchy.
2. Most component-related processors, e.g. mavenExtract or npmExtract, which are subclasses of abstractClearlyDefinedProcessor, override toolVersion(); see the comments at AbstractProcessor.toolVersion(). This convention was introduced in the commit "isolate toolVersion from schemaVersion". The exception is PodExtract. This commit aligns PodExtract with the rest of the component-related processors.
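A minimal sketch of the convention described above: the class names mirror the processors mentioned here, but the bodies, values, and the version-combination format are assumptions for illustration only.

```javascript
// Hypothetical sketch; the crawler's actual combination logic may differ.
class AbstractProcessor {
  get schemaVersion() {
    return '1.0.0' // illustrative base schema version
  }

  get toolVersion() {
    // Component-related subclasses override this to version their own output format.
    return '1.0.0'
  }

  get _schemaVersion() {
    // Combined along the class hierarchy from schemaVersion and toolVersion.
    return `${this.schemaVersion}-${this.toolVersion}` // assumed combination format
  }
}

class AbstractClearlyDefinedProcessor extends AbstractProcessor {}

// An npmExtract-style processor following the convention: override toolVersion().
class NpmExtract extends AbstractClearlyDefinedProcessor {
  get toolVersion() {
    return '1.1.0' // illustrative value
  }
}
```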
This is for the fix that excludes .git directory content, from a recent PR (#525). Bump the version to allow reharvesting of pod components.
The recent fix (#525) to exclude .git directory content from pod packages causes the file count to differ from the previous version. Update the toolVersion for PodExtract to 2.0.0 to reflect this.
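A hedged illustration of the bump and where the version surfaces; the blob name layout shown in the final comment is an assumption for illustration, not necessarily the crawler's exact storage path format.

```javascript
// PodExtract aligned with the convention: the overridden toolVersion feeds
// _schemaVersion, which in turn shows up in links.self.href and the blob name
// used to store the harvested data.
class PodExtract extends AbstractClearlyDefinedProcessor {
  get toolVersion() {
    return '2.0.0' // bumped: excluding .git content changes the extracted file count
  }
}

// Assumed blob name shape for a pod harvest result (coordinates are illustrative):
// pod/cocoapods/-/SwiftLCS/revision/1.3.4/tool/podextract/2.0.0.json
```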
Fix fetching the latest version for some pod components
The "always" traversal policy behaves as follows: - if the tool result (e.g. licensee) for a specific component exist, the component will be refetched and the tool will be rerun. - if the tool result for a specific component is missing, using the "always" policy leads to a "Unreachable for reprocessing" status and the tool being skipped. The "always" traversal policy is basically a rerun for all the previously ran tools. It is somewhat cumbersome in the case to retriger harvest, especially for integration tests. The proposed new policy make reharvest simpler: - When the tool result for a component is available, the tool will be rerun and tool result updated, similar to the "always" policy. - When the tool result for a component is not available, the component will be fetched and the tool will be run. In summary, this "reharvestAlways" policy is to rerun the harvest tools if results exist and run the harvest tools if results are missing.
Derive license from info.license over classifiers in PyPI registry data
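A hedged sketch of that preference: the info.license and info.classifiers fields match the PyPI JSON registry shape, but the helper name and the classifier parsing are assumptions.

```javascript
// Prefer the explicit info.license field; fall back to license classifiers.
function deriveDeclaredLicense(registryData) {
  const info = registryData && registryData.info
  if (!info) return null
  if (info.license && info.license.trim()) return info.license.trim()
  const classifiers = info.classifiers || []
  const licenseClassifier = classifiers.find(entry => entry.startsWith('License ::'))
  // e.g. "License :: OSI Approved :: MIT License" -> "MIT License"
  return licenseClassifier ? licenseClassifier.split('::').pop().trim() : null
}
```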
Deploy dev crawler via GitHub action
Introduce a new traversal policy
…er-merge Deploys to dev on master merge
APP_VERSION replaces it
Add sha and version to '/' endpoint
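A minimal sketch of a root endpoint exposing the deployed sha and version, assuming an Express app and that the values arrive via environment variables; APP_VERSION comes from the lines above, while BUILD_SHA and the response shape are assumptions.

```javascript
const express = require('express')
const app = express()

// Report what is deployed: APP_VERSION is set by the deployment,
// BUILD_SHA is an assumed name for the commit sha variable.
app.get('/', (request, response) => {
  response.json({
    status: 'OK',
    version: process.env.APP_VERSION,
    sha: process.env.BUILD_SHA
  })
})

app.listen(process.env.PORT || 5000)
```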
Fixes #609
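Since #609 asks for configurable scancode parallelism, here is a minimal sketch of what that could look like, assuming the crawler shells out to the scancode CLI and reads the process count from an environment variable; the SCANCODE_PARALLELISM name and the config plumbing are assumptions, while --processes is a real scancode-toolkit option.

```javascript
// Illustrative only: read scancode parallelism from configuration instead of hard-coding it.
const parallelism = parseInt(process.env.SCANCODE_PARALLELISM, 10) || 2

// Assemble a scancode invocation; --processes controls the number of worker processes.
const scancodeCommand = [
  'scancode',
  '--copyright',
  '--license',
  '--info',
  `--processes ${parallelism}`,
  '--json-pp', 'output.json',
  'path/to/package'
].join(' ')
```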