-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First stable version #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 22 of 55 files at r1.
Reviewable status: 22 of 55 files reviewed, 5 unresolved discussions (waiting on @g-pyxl)
README.md
line 28 at r1 (raw file):
- Integration with PanelApp for importing gene lists from predefined panels - Custom padding to gene regions - Live edit regions of interest in results viewer
I think you said you'd removed this option
README.md
line 35 at r1 (raw file):
- BED file management system with draft and published states ## Functionality schema
in this image there's a user action "modify" but I thought this had been removed?
README.md
line 108 at r1 (raw file):
1. Build the Docker image:docker build -t bed-file-generator .
specify tag (for Readme could just be generic) ??
README.md
line 113 at r1 (raw file):
2. Run the container:docker run -p 5000:5000 -e BED_GENERATOR_FLASK_KEY=<your_secret_key> bed-file-generator
again specify tag?
README.md
line 152 at r1 (raw file):
- Uses the IGV.js library for genomic visualization - Integrates with PanelApp for gene panel data - Utilises Ensembl VEP and TARK RESTful APIs for metadata
is there any version information for these that should be noted here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 25 of 55 files at r1.
Reviewable status: 46 of 55 files reviewed, 13 unresolved discussions (waiting on @g-pyxl)
README.md
line 151 at r1 (raw file):
- Developed by the Synnovis Bioinformatics Team @ Guy's and St Thomas' NHS Foundation Trust (2024) - Uses the IGV.js library for genomic visualization - Integrates with PanelApp for gene panel data
also version info for the panelapp API?
app/bed_generator/api.py
line 207 at r1 (raw file):
return [max(matching_grch37, key=lambda x: int(x['stable_id_version']))] # If no MANE transcripts or matching GRCh37 transcript, selects the transcript with the highest version number.
not sure about this logic (not that the code is wrong, but that this isn't necessarily the most appropriate choice). I think if the transcript hasn't been provided and there's no MANE this needs to be a decision made by a clinical scientist. As a minimum I think any transcripts selected by this logic should be flagged?
app/bed_generator/api.py
line 370 at r1 (raw file):
if len(unknown_features) > 1: for feature in unknown_features: feature['alert'] = f"Coordinate {coord} overlaps multiple unknown genes."
what would this actually mean? how can a feature overlap "multiple unknown genes"?
app/bed_generator/api.py
line 413 at r1 (raw file):
'version': panel['version'], 'version_created': panel['version_created'], 'genes': fetch_genes_for_panel(panel['id'], include_amber=True, include_red=True)
does the user have the option to exclude amber and red genes?
app/bed_generator/bed_generator.py
line 32 at r1 (raw file):
""" loc_region = f"chr{r['loc_region']}" if add_chr_prefix else r['loc_region'] loc_start = r['loc_start'] - padding
Just want to check that this is fine for reverse strand genes, i.e. that elsewhere the code is handling that start and end in that context might be flipped so needs to be handled first?
app/bed_generator/routes.py
line 95 at r1 (raw file):
print("Results:", results) print("assembly:", assembly) print("initial_query:", initial_query) # Debug print
is this debug statement still needed?
app/bed_generator/routes.py
line 120 at r1 (raw file):
# Recalculate loc_start and loc_end based on new padding for result in results: result['loc_start'] = int(result['original_loc_start']) - padding_5
same comment as before re padding- assume reverse strand gene start and stop coords are handled so Start isn't > than stop?
app/bed_generator/routes.py
line 197 at r1 (raw file):
@bed_generator_bp.route('/submit_for_review', methods=['POST']) def submit_for_review(): try:
This is a very long try/except statement which is a bit hard to follow, especially with all the next ifs/fors. If it's working I'm happy for it to be left for now, but could an issue be added to the repo to consider reviewing/refining this in a future update?
…ion to simplify and futureproof
…d for BED file publishing. Logic to retrieve all RNA and empty hg19 entries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 37 of 56 files reviewed, 13 unresolved discussions (waiting on @rebeccahaines1)
README.md
line 28 at r1 (raw file):
Previously, rebeccahaines1 wrote…
I think you said you'd removed this option
README updated to reflect this.
README.md
line 35 at r1 (raw file):
Previously, rebeccahaines1 wrote…
in this image there's a user action "modify" but I thought this had been removed?
Have updated the schema to remove this.
README.md
line 108 at r1 (raw file):
Previously, rebeccahaines1 wrote…
specify tag (for Readme could just be generic) ??
README updated for this.
README.md
line 113 at r1 (raw file):
Previously, rebeccahaines1 wrote…
again specify tag?
README updated.
README.md
line 151 at r1 (raw file):
Previously, rebeccahaines1 wrote…
also version info for the panelapp API?
README updated to include versions.
README.md
line 152 at r1 (raw file):
Previously, rebeccahaines1 wrote…
is there any version information for these that should be noted here?
Have also added versions used for these tools.
app/bed_generator/api.py
line 207 at r1 (raw file):
Previously, rebeccahaines1 wrote…
not sure about this logic (not that the code is wrong, but that this isn't necessarily the most appropriate choice). I think if the transcript hasn't been provided and there's no MANE this needs to be a decision made by a clinical scientist. As a minimum I think any transcripts selected by this logic should be flagged?
I have added a new field within the results that informs the user of logical decisions made in selecting transcripts. I have also added a check within the bed review process that informs the user that a clinical review must be made for any non-default (MANE) transcript selections within the BED file. User now has to review and tick a box accepting that this has been reviewed before submission will be accepted.
app/bed_generator/api.py
line 370 at r1 (raw file):
Previously, rebeccahaines1 wrote…
what would this actually mean? how can a feature overlap "multiple unknown genes"?
The only scenario that this will pick up is where a significantly large genomic region in coordinates is used as a query. This is very much an edge case, but the VEP overlap endpoint may return multiple "unknown" features if the region queried is large enough. I think the term "uncharacterised genomic regions" as opposed to genes is probably more appropriate and have updated to reflect this.
app/bed_generator/api.py
line 413 at r1 (raw file):
Previously, rebeccahaines1 wrote…
does the user have the option to exclude amber and red genes?
These are excluded from the frontend by default. The approach here is to retrieve the data in the backend to avoid performing secondary API calls if the user chooses to include amber/red genes.
app/bed_generator/bed_generator.py
line 32 at r1 (raw file):
Previously, rebeccahaines1 wrote…
Just want to check that this is fine for reverse strand genes, i.e. that elsewhere the code is handling that start and end in that context might be flipped so needs to be handled first?
Have included screenshots within another reply - padding is applied based on strand.
app/bed_generator/routes.py
line 95 at r1 (raw file):
Previously, rebeccahaines1 wrote…
is this debug statement still needed?
No longer needed and removed in latest commit.
app/bed_generator/routes.py
line 120 at r1 (raw file):
Previously, rebeccahaines1 wrote…
same comment as before re padding- assume reverse strand gene start and stop coords are handled so Start isn't > than stop?
Have adjusted the logic to simplify this and handle in routes. Forward and reverse are handled as you've described and have performed some double checks with a mixed range. Attaching image:
- Query using APOE (forward) and BRCA1 (reverse)
- 50bp padding has been applied to 5'.
- Padding applied discretely based on identifier strand direction within the same session
app/bed_generator/routes.py
line 197 at r1 (raw file):
Previously, rebeccahaines1 wrote…
This is a very long try/except statement which is a bit hard to follow, especially with all the next ifs/fors. If it's working I'm happy for it to be left for now, but could an issue be added to the repo to consider reviewing/refining this in a future update?
Agreed. I've refactored this down into smaller, (hopefully) easier to follow functions in relevant modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 55 files at r1, 14 of 14 files at r3, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @g-pyxl)
app/bed_generator/panels.json
line 0 at r3 (raw file):
this is a dump of panelapp data. please can it be removed before this branch is merged in to main
app/bed_generator/utils.py
line 234 at r3 (raw file):
panel_id: The ID of the panel. """ genes = fetch_genes_for_panel(panel_id, include_amber=True, include_red=True)
is it clear to the user from the input page that amber and red genes will be includes and/or could/should there be an option to exclude them? Option to exclude can be a future development.
app/bed_manager/routes.py
line 33 at r3 (raw file):
@login_required def submit_bed_file(): # Logic for submitting a new BED file
? is something missing here?
app/bed_manager/routes.py
line 39 at r3 (raw file):
@login_required def review_bed_file(file_id): # Logic for reviewing a BED file
see line 33
app/static/js/results.js
line 12 at r3 (raw file):
settings = data; // Set the default assembly in the dropdown document.getElementById('genome-select').value = settings.defaultAssembly === 'GRCh38' ? 'hg38' : 'hg19';
? GRCh37 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last commit will ideally remove as much redundancy as possible, and will update the README to include some end-to-end testing from this latest version (including using generated BEDs with some of our pipeline components to compare outputs).
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @rebeccahaines1)
app/bed_generator/panels.json
line at r3 (raw file):
Previously, rebeccahaines1 wrote…
this is a dump of panelapp data. please can it be removed before this branch is merged in to main
Removed.
app/bed_generator/utils.py
line 234 at r3 (raw file):
Previously, rebeccahaines1 wrote…
is it clear to the user from the input page that amber and red genes will be includes and/or could/should there be an option to exclude them? Option to exclude can be a future development.
They are excluded by default, but stored in the session query to prevent the need for additional API calls if they user chooses to include these. A default selection would look like this:
And when an option is ticked:
Unticking will remove these genes again.
app/bed_manager/routes.py
line 33 at r3 (raw file):
Previously, rebeccahaines1 wrote…
? is something missing here?
Old logic, this was changed and function no longer needed. Removed.
app/bed_manager/routes.py
line 39 at r3 (raw file):
Previously, rebeccahaines1 wrote…
see line 33
Old logic, this was changed and function no longer needed. Removed.
app/static/js/results.js
line 12 at r3 (raw file):
Previously, rebeccahaines1 wrote…
? GRCh37 ?
This is the genome selection for IGV. The options available to IGV are 'hg38' and 'hg19'.
The code here is storing the initially query from the frontend as settings. If the assembly selected and passed from the initial query == "GRCh38", set IGV to 'hg38'. If anything else, set IGV to 'hg19' by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @g-pyxl)
app/bed_manager/routes.py
line 33 at r3 (raw file):
Previously, g-pyxl (George Doyle) wrote…
Old logic, this was changed and function no longer needed. Removed.
ok. please can you delete from the coed to avoid future confusion
app/bed_manager/routes.py
line 39 at r3 (raw file):
Previously, g-pyxl (George Doyle) wrote…
Old logic, this was changed and function no longer needed. Removed.
see line 33 above
Merged new changes from MVP branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 7 files at r5, 3 of 8 files at r6, 39 of 39 files at r7, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @g-pyxl)
app/bed_generator/transcript_processing.py
line 2 at r7 (raw file):
""" transcript_processing.py - Functions for processing transcript data.
I think all of these functions are duplicated from app/bed_generator/api.py
app/bed_generator/panels.json
line 0 at r7 (raw file):
I think this file needs deleting
app/bed_generator/routes.py
line 151 at r6 (raw file):
# Check both original coordinates and rsID presence is_variant = ( int(result['original_loc_start']) == int(result['original_loc_end']) or
is this assuming a variant is always a single base change? what about indels?
app/bed_generator/api.py
line 4 at r7 (raw file):
api.py - Fetches and processes variant information from Ensembl, TARK, and PanelApp APIs. Functions:
this list is incomplete- please can it be reviewed
app/bed_generator/genes.json
line 0 at r7 (raw file):
Think this file needs to be deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main update accounts for padding to variants over 1bp. Various files removed, docstrings updated.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @rebeccahaines1)
app/bed_generator/api.py
line 4 at r7 (raw file):
Previously, rebeccahaines1 wrote…
this list is incomplete- please can it be reviewed
Updated all module docstrings
app/bed_generator/genes.json
line at r7 (raw file):
Previously, rebeccahaines1 wrote…
Think this file needs to be deleted?
Deleted
app/bed_generator/panels.json
line at r7 (raw file):
Previously, rebeccahaines1 wrote…
I think this file needs deleting
Deleted
app/bed_generator/routes.py
line 151 at r6 (raw file):
Previously, rebeccahaines1 wrote…
is this assuming a variant is always a single base change? what about indels?
I have applied the following handler for this in routes.py to account for longer variants. Tested using various indel identifiers, downloads and db entries both applying correct padding for these cases.
Handle SNPs
if entry.get('is_snp', False) or entry.get('rsid'):
if snp_padding:
# For multi-bp SNPs, apply padding to start and end directly
if entry['loc_start'] != entry['loc_end']:
result['loc_start'] = int(entry['loc_start']) - snp_padding
result['loc_end'] = int(entry['loc_end']) + snp_padding
else:
# For single-bp SNPs, calculate from center point
center = int(entry['loc_start'])
result['loc_start'] = center - snp_padding
result['loc_end'] = center + snp_padding
return result
app/bed_generator/transcript_processing.py
line 2 at r7 (raw file):
Previously, rebeccahaines1 wrote…
I think all of these functions are duplicated from app/bed_generator/api.py
Removed - was part of a refactoring branch and accidentally included
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r8, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @g-pyxl)
Initial commit for review/testing.
This change is