Skip indexing PCAWG excluded donors by using new metadata JSONL file #40

Open
junjun-zhang opened this issue Mar 27, 2019 · 0 comments

Comments

@junjun-zhang

So that the index correctly reflects the datasets associated with the PCAWG publication, data files belonging to PCAWG-excluded donors should not be indexed. Previous issue: icgc-dcc/dcc-portal#524

The repository indexer currently retrieves PCAWG metadata from a JSONL file, which is pulled from http://pancancer.info/data_releases/latest/release_may2016.v1.4.with_consensus_calls.jsonl

The DCC bioinfo team has created a new JSONL file with PCAWG excluded donors removed: http://pancancer.info/data_releases/latest/release_may2016.v1.4.with_consensus_calls.excluded_donors_removed.jsonl

Switching to the new JSONL should get us the desired index.

This needs to be tested thoroughly, since there could be unintended side effects when some PCAWG donors are removed from the JSONL. The DCC Bioinfo team can help with testing and investigating if things break unexpectedly.
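
As a starting point for testing, a rough sketch like the following could list which donors disappear when switching files, so the index can be checked against that list. It is not part of the indexer. The `donor_unique_id` field name, the local file names, and both function names are assumptions for illustration and should be adjusted to whatever the JSONL actually contains.

```python
import json
from typing import Iterable, Set


def donor_ids(lines: Iterable[str], id_field: str = "donor_unique_id") -> Set[str]:
    """Collect donor IDs from JSONL lines, skipping blank lines.

    `id_field` is an assumed key name; change it if the file uses another one.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        ids.add(record[id_field])
    return ids


def removed_donors(
    old_lines: Iterable[str],
    new_lines: Iterable[str],
    id_field: str = "donor_unique_id",
) -> Set[str]:
    """Return donor IDs present in the old JSONL but absent from the new one.

    Raises ValueError if the new file contains donors the old file lacks,
    since the new file is expected to be a strict subset.
    """
    old_ids = donor_ids(old_lines, id_field)
    new_ids = donor_ids(new_lines, id_field)
    unexpected = new_ids - old_ids
    if unexpected:
        raise ValueError(f"new file has donors not in old file: {sorted(unexpected)}")
    return old_ids - new_ids
```

With both files downloaded locally (the file names here are placeholders), usage might look like:

```python
with open("old.jsonl") as old, open("new.jsonl") as new:
    for donor in sorted(removed_donors(old, new)):
        print(donor)
```

Any data file in the resulting index that still belongs to one of the printed donors would point to a place where the switch did not take effect.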

@rosibaj rosibaj added the SP:2 Agile Points 2 label Mar 27, 2019
@rosibaj rosibaj added this to the ARGO - Sprint 1907 milestone Mar 27, 2019
@rosibaj rosibaj added Scope Change SP:3 backlog AB:1:Backlog and removed SP:2 Agile Points 2 labels Mar 27, 2019
@rosibaj rosibaj removed this from the ARGO - Sprint 1907 milestone Apr 4, 2019