The AnGeLi tool on the Bahler Lab website (http://bahlerweb.cs.ucl.ac.uk/cgi-bin/GLA/GLA_input) is used to look up terms associated with gene lists, e.g. FYPO or GO terms.
It runs on a webserver controlled by the UCL CS department.
The data is stored in a flat, tab-separated value file with a `.txt` extension.
The update script requires Python 3. Install the dependencies with `pip install gffutils requests`.
Please refer to the code for exact implementation :)
This script focuses on updating the GO and FYPO terms associated with the genes. Where a field is noted as not having a source (NA), the value from the original file is used. Otherwise, the latest values are deemed to be correct.
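The fallback rule above can be sketched as a small function. This is a minimal illustration, not the script's actual code: the name `merge_field` and the `NO_SOURCE` constant are hypothetical, and treating an empty latest value as "no value" is an assumption based on the notes further down.

```python
NO_SOURCE = "NA"  # assumed marker for a field that has no source


def merge_field(source, original_value, latest_value):
    """Choose the value to write for one field of one gene.

    Fields without a source keep the value from the original file.
    Fields with a source take the latest value, unless the source
    returned nothing, in which case the original value is kept.
    """
    if source == NO_SOURCE:
        return original_value
    if latest_value is None or latest_value == "":
        return original_value
    return latest_value
```

Keeping the rule in one place means every field is merged the same way, whatever source it comes from.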
The script opens the original AnGeLiDatabase file and parses the content into a map. The sources are then downloaded into memory and parsed. Values are calculated (see the code for particular details) and a new results map is created. Finally, the script outputs the new AnGeLiDatabase file.
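The parse and output steps might look roughly like the sketch below. It assumes a simplified layout with a single header row and one row per gene keyed by a `Systematic_ID` column; the real file's header layout is more involved, so refer to the code for the exact handling. The function names `parse_database` and `write_database` are hypothetical.

```python
import csv
import io


def parse_database(handle):
    """Read a tab-separated file into a map keyed by Systematic_ID."""
    reader = csv.DictReader(handle, delimiter="\t")
    genes = {row["Systematic_ID"]: row for row in reader}
    return genes, reader.fieldnames


def write_database(handle, fieldnames, genes):
    """Write the map back out as tab-separated text, one gene per row."""
    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for systematic_id in sorted(genes):
        writer.writerow(genes[systematic_id])


# Illustrative data only; the values are made up for the example.
original = io.StringIO("Systematic_ID\tMass\nSPAC1002.01\t20.1\n")
genes, fieldnames = parse_database(original)

updated = io.StringIO()
write_database(updated, fieldnames, genes)
```

Here `io.StringIO` stands in for the real files so the sketch runs without touching disk.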
The original file, AnGeLiDatabase.txt downloaded 18/12/2024, is stored alongside this README.md file. It is used as the original source for some values that are considered static.
There is also a test list of genes in the file XXXXX. The output from the updated database can be compared to the original output to confirm that the changes are working as expected.
NOTES:
- The primary field of joining is the Systematic_ID e.g. SPAC1002.01
- The mapped value is taken from the source noted below; if there is no value, the original value is used.
- GO and FYPO fields are ordered. It has not been verified whether this is a requirement, but for the sake of simplicity this convention has been retained.
- For GO and FYPO we scan the directory and fetch the latest file. This is because the date is embedded in the file name.
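Picking the latest file from a directory listing can be sketched as follows. The `YYYY-MM-DD` date pattern and the example file names are assumptions for illustration; the real naming scheme of the GO and FYPO files may differ, so check the code. `latest_file` is a hypothetical helper name.

```python
import re
from datetime import date

# Assumed date format embedded in the file name: YYYY-MM-DD.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def latest_file(filenames):
    """Return the file name carrying the most recent embedded date."""
    dated = []
    for name in filenames:
        match = DATE_PATTERN.search(name)
        if match:
            year, month, day = map(int, match.groups())
            dated.append((date(year, month, day), name))
    if not dated:
        raise ValueError("no dated file names found")
    # Tuples compare by date first, so max() picks the newest file.
    return max(dated)[1]


# Hypothetical directory listing.
listing = ["terms_2024-12-01.tsv", "terms_2025-01-05.tsv", "README"]
newest = latest_file(listing)
```

Names without a recognisable date are skipped rather than treated as errors, so unrelated files in the directory do not break the lookup.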
Heading details
Header Name | Description | Sample Value |
---|---|---|
Short name | A shorthand name for the field | Mass |
Long name | A slightly more descriptive name | Molecular weight (kDa) |
Scale of measurement | What is measured; typically a unit of measurement or a flag | Metric OR Binary |
Group | The type of the field | Protein Feature |
Source | Where the data comes from | PomBase |
Author | The author? | This will be copied from the "old" file |
Update | The date the data was last updated | 2025-01-05 |
Link | A link to the source or reference | http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005826 |
Field Details