Our colleagues routinely use the fastq-peek.sh software to assess NGS data generated in the laboratory. Its output, however, does not capture GC content, which is necessary for more robust quality assessment of specimens.
Provide a solution to calculate and report GC content for our colleagues.
Update the fastq-peek.sh software to calculate and report GC content.
To calculate GC content, we will:
- Count the number of G and C nucleotides within the input fastq file (GC_COUNT)
- Count the total number of nucleotides within the input fastq file (TOTAL_BASE_COUNT)
- Calculate GC content as a percentage (GC_PERCENT): (GC_COUNT / TOTAL_BASE_COUNT) * 100
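
The three quantities above could be computed along the following lines. This is a minimal sketch, not the fastq-peek.sh implementation: the helper names (count_gc, count_bases, gc_percent) and the INPUT_FASTQ variable are hypothetical, and it assumes an uncompressed fastq file with exactly four lines per record and Unix line endings. Sequence lines are selected by position (line 2 of each record) rather than by pattern, because quality lines may themselves begin with "@".

```shell
# Hypothetical helper names; fastq-peek.sh may structure this differently.
# Assumes an uncompressed fastq file with exactly four lines per record.

# GC_COUNT: number of G and C bases on the sequence lines (line 2 of each record).
count_gc() {
    awk 'NR % 4 == 2' "$1" | tr -cd 'GCgc' | wc -c | tr -d '[:space:]'
}

# TOTAL_BASE_COUNT: number of bases on the sequence lines (newlines excluded).
count_bases() {
    awk 'NR % 4 == 2' "$1" | tr -d '\n' | wc -c | tr -d '[:space:]'
}

# GC_PERCENT: (GC_COUNT / TOTAL_BASE_COUNT) * 100 with two decimal places.
# Prints 0.00 when the total is zero to avoid a division by zero.
gc_percent() {
    awk -v gc="$1" -v total="$2" 'BEGIN {
        if (total > 0) {
            printf "%.2f\n", (gc / total) * 100
        } else {
            printf "0.00\n"
        }
    }'
}

# Usage (INPUT_FASTQ is a hypothetical variable holding the input path):
#   GC_COUNT=$(count_gc "$INPUT_FASTQ")
#   TOTAL_BASE_COUNT=$(count_bases "$INPUT_FASTQ")
#   GC_PERCENT=$(gc_percent "$GC_COUNT" "$TOTAL_BASE_COUNT")
```

Note that ambiguous bases such as N count toward TOTAL_BASE_COUNT in this sketch; whether they should be excluded is a design choice the text above does not specify.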
To report the GC content, we will print the GC_PERCENT value to stdout, i.e. "GC content in {input_fastq_file}: {GC_PERCENT}%"
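
The reporting step could look like the sketch below. The function name report_gc is hypothetical; only the message format is taken from the text above. printf is used instead of echo so that the literal percent sign is written explicitly as %%.

```shell
# Hypothetical function name; the message format follows the description above.
# $1: path to the input fastq file
# $2: the already calculated GC_PERCENT value
report_gc() {
    printf 'GC content in %s: %s%%\n' "$1" "$2"
}

# Usage:
#   report_gc "$INPUT_FASTQ" "$GC_PERCENT"
```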
- Create a dev environment and a Git branch to ensure development does not interfere with production software
- Add bash one-liners to fastq-peek.sh to calculate GC_COUNT, TOTAL_BASE_COUNT, and GC_PERCENT
- Print the calculated GC_PERCENT value to stdout
- Test solution using benchmark read data
  - Calculated GC content for this input fastq file should equal 50%
- Create a PR and merge these changes to main
- Issue a static version release with semantic versioning (minor release)
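
The benchmark test in the steps above could be sketched as follows. The synthetic two-read fastq file here is an assumption for illustration only, not the project's actual benchmark read data, and check_gc_benchmark is a hypothetical helper name. It computes GC content in a single awk pass and compares the result with an expected percentage.

```shell
# Sketch of a benchmark check. The synthetic reads below are an assumption
# for illustration; they are not the project's actual benchmark data.
check_gc_benchmark() {
    expected="$1"   # expected GC percentage, formatted with two decimals
    fastq="$2"      # path to an uncompressed, four-line-per-record fastq file
    observed=$(awk 'NR % 4 == 2 {
                        total += length($0)           # count bases before gsub edits the line
                        gc += gsub(/[GCgc]/, "")      # gsub returns the number of G/C bases removed
                    }
                    END {
                        if (total > 0) {
                            printf "%.2f", (gc / total) * 100
                        } else {
                            printf "0.00"
                        }
                    }' "$fastq")
    if [ "$observed" = "$expected" ]; then
        echo "PASS: GC content is ${observed}%"
    else
        echo "FAIL: expected ${expected}%, got ${observed}%"
        return 1
    fi
}

# Build a two-read fastq file in which exactly half of the bases are G or C.
bench=$(mktemp)
printf '@read1\nGGCCAATT\n+\nIIIIIIII\n@read2\nACGTACGT\n+\nIIIIIIII\n' > "$bench"
check_gc_benchmark 50.00 "$bench"
rm -f "$bench"
```

In practice the same check would be pointed at the real benchmark file, with the expected value of 50% taken from the test step above.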