Add WES and Discovery Search to FASP Python examples #108

Open
3 of 7 tasks
ianfore opened this issue Aug 11, 2020 · 2 comments

@ianfore
Collaborator

ianfore commented Aug 11, 2020

https://github.com/ianfore/FASPclient has example Python code to:

  1. query data
  2. obtain URLs via DRS
  3. run a compute

It currently queries BigQuery directly and submits pipelines directly to GCP Life Sciences.
We want to convert the script so that steps 1 and 3 use the equivalent GA4GH APIs.
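
For step 2, here is a minimal sketch of turning a DRS id into a URL. It assumes a server that follows the GA4GH DRS v1 path layout (`/ga4gh/drs/v1/objects/{object_id}` and `.../access/{access_id}`). The function names and the base URL are made up for illustration and are not taken from FASPclient.

```python
from urllib.parse import quote


def drs_object_endpoint(base_url, object_id):
    """URL of the DRS object record (assumes the DRS v1 path layout)."""
    return f"{base_url.rstrip('/')}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


def drs_access_endpoint(base_url, object_id, access_id):
    """URL that returns an access URL for one access method of an object."""
    return f"{drs_object_endpoint(base_url, object_id)}/access/{quote(access_id, safe='')}"


def choose_access(drs_object, preferred_types=("https", "gs", "s3")):
    """Pick an access method from a DRS object record (a parsed JSON dict).

    Returns ("url", url) when the method carries a direct access_url,
    ("access_id", id) when a second call to the access endpoint is needed,
    or None when no method of a preferred type exists.
    """
    methods = drs_object.get("access_methods", [])
    for wanted in preferred_types:
        for method in methods:
            if method.get("type") != wanted:
                continue
            if "access_url" in method:
                return ("url", method["access_url"]["url"])
            if "access_id" in method:
                return ("access_id", method["access_id"])
    return None
```

When `choose_access` returns an `access_id`, a GET on `drs_access_endpoint(...)` (usually with a bearer token) should return the URL to pass to the compute step.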

Creating this issue to track the following:

  • Modify FASPScript1.py to submit jobs via WES instead of Life Sciences Pipeline
  • Make BigQuery datasets searchable through Discovery Search
    • 1000 Genomes
    • COPDGene
    • GECCO
  • Modify FASPScript1.py to use Discovery Search API to search the data
  • Create scrambled/fake versions of COPDGene, GECCO
  • Put Discovery Search datasets under authentication and authorization – stretch?
@ianfore
Collaborator Author

ianfore commented Aug 11, 2020

8/11 session
WES
Worked through submitting a WES job (an MD5 checksum) on a file specified by a URL obtained from DRS; see checksum.wdl in FASPclient. For debugging, we ended up using a relatively small file. Everything was submitted via Postman.
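
As a starting point for the Python equivalent of the Postman request, here is a minimal sketch of the form fields for a WES run. It assumes the standard GA4GH WES v1 `POST /ga4gh/wes/v1/runs` endpoint. The input name `md5.inputFile` is a placeholder, not the real input name from checksum.wdl, and the URLs in the test values are hypothetical.

```python
import json


def wes_runs_endpoint(base_url):
    """URL for submitting a run (assumes the WES v1 path layout)."""
    return base_url.rstrip("/") + "/ga4gh/wes/v1/runs"


def build_wes_run_fields(workflow_url, input_url,
                         input_name="md5.inputFile",  # placeholder input name
                         workflow_type="WDL",
                         workflow_type_version="1.0"):
    """Build the form fields for a WES RunWorkflow request.

    workflow_params is sent as a JSON string, as the WES spec describes.
    """
    params = {input_name: input_url}
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        "workflow_params": json.dumps(params),
    }
```

If the third-party `requests` package is used to send this, passing `files={k: (None, v) for k, v in fields.items()}` makes it encode the fields as multipart/form-data, which is the content type the WES specification names for this endpoint. The server's authentication requirements (e.g. a bearer token header) would need to be added.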
Discovery
Identified 1000 Genomes views in BigQuery that link subject and specimen data with ids for .bam files. These ids work as DRS ids for the BioDataCatalyst (BDC) DRS server. The views were created as queries on a table that is an import of PFB (Avro) from BDC. Because of Presto's preferences when working with views, we created a table from them to be used for Discovery Search.
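
Once the table is exposed, querying it from Python could look roughly like the sketch below. It assumes the Search API shape from the GA4GH Search specification: a `POST {base}/search` with a JSON body `{"query": "<SQL>"}`, and responses carrying `data` plus `pagination.next_page_url`. The deployed server may differ, and the function names and base URL are hypothetical.

```python
import json
from urllib.request import Request


def build_search_request(base_url, sql, token=None):
    """Build (but do not send) a POST /search request carrying a SQL query."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": sql}).encode("utf-8")
    return Request(base_url.rstrip("/") + "/search",
                   data=body, headers=headers, method="POST")


def iter_rows(first_page, fetch_page):
    """Yield rows from a paginated response.

    first_page is the parsed JSON of the first response; fetch_page is a
    caller-supplied function mapping a next_page_url to the next parsed page.
    """
    page = first_page
    while True:
        yield from page.get("data", [])
        next_url = (page.get("pagination") or {}).get("next_page_url")
        if not next_url:
            return
        page = fetch_page(next_url)
```

The request can be sent with `urllib.request.urlopen(req)`. The SQL would name the table as the Presto adapter exposes it; the catalog and schema prefix depend on how the adapter is configured.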

Additional tasks based on 8/11 session

  • Give Ian an account on the DNAStack WES - Max
  • Write Python call to the DNAStack WES server - Ian
  • Add 1000 Genomes dataset to GA4GH Presto adapter - Jonathan
  • Check that the MD5 workflow works on a larger file, i.e. a BAM - Ian

@ianfore
Collaborator Author

ianfore commented Aug 13, 2020

Created the onek_genomes dataset with a lower-case name per Jonathan's request, since Presto needs lower-case names. The ssd_drs table name is also lower case. Granted the BigQuery Viewer role on the dataset to the DNAStack service accounts used for Presto.
