davecap changed the title from "Confirm fastq data source from public URL matches SolveBio internal" to "Confirm correct fastq data source is used" on Oct 3, 2016
davecap changed the title from "Confirm correct fastq data source is used" to "Confirm correct FASTA data source is used" on Oct 3, 2016
SolveBio documents using:
https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13
The primary assembled chromosomes can be downloaded like so:
The resulting Homo_sapiens.GRCh37.75.genbank.fa file does not match the original FASTA MD5:
The mismatch is due, at least in part, to the Homo_sapiens.GRCh37.75.genbank.fa file from SolveBio having already been "flattened in place" by pyfasta, which removes headers.
It would be useful to have the original "non-flattened" FASTA file, if it's available.
Otherwise, for now I am going to modify the stored MD5 check in a "build source data" branch so that it matches the FASTA file I generate from current public sources. Later I will compare by other means to make sure the sequence content is the same between the FASTA data we can currently download from the public source and SolveBio's.
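One possible way to do the "compare by other means" step is to hash only the sequence characters, so that headers and line wrapping no longer affect the digest. The sketch below is a rough illustration and not what the repository actually does: `sequence_md5` is a hypothetical helper, and it assumes the two files differ only in `>` header lines and whitespace, with records in the same order. It does not normalize case, so soft-masked (lowercase) bases would still produce a different digest, and because it concatenates all records it will not detect a shifted boundary between chromosomes.

```python
import hashlib


def sequence_md5(path, chunk_size=1 << 20):
    """Return the MD5 of the sequence characters in a FASTA-like file.

    Hypothetical helper: '>' header lines and all whitespace are skipped,
    so a wrapped file with headers and a flattened file without them give
    the same digest when their sequence content is identical.
    """
    digest = hashlib.md5()
    in_header = False      # True while we are inside a '>' header line
    at_line_start = True   # True until the first byte of a line is seen
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a flattened file with one huge line
        # is never loaded into memory all at once.
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            for i, part in enumerate(chunk.split(b"\n")):
                if i > 0:
                    # We just crossed a newline: a new line begins here.
                    in_header = False
                    at_line_start = True
                if not part:
                    continue
                if at_line_start:
                    in_header = part.startswith(b">")
                    at_line_start = False
                if not in_header:
                    # Drop stray whitespace (e.g. '\r' from CRLF files).
                    digest.update(part.translate(None, b" \t\r"))
    return digest.hexdigest()
```

Calling `sequence_md5` on the file built from the public source and on SolveBio's flattened file and comparing the two hex strings would then show whether the sequence content agrees, independent of the whole-file MD5.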