issue #85 - data release helper code
davmlaw committed Aug 15, 2024
1 parent 6aa1db6 commit ab9a69c
Showing 3 changed files with 68 additions and 4 deletions.
14 changes: 11 additions & 3 deletions CHANGELOG.md
@@ -1,8 +1,14 @@
-## [unreleased]
+## [0.2.26] 2024-08-15
+
+Bumped version to 0.2.26 to catch up with data release. Only new client functionality is #81 'data_release' helper functions
+
+All other changes in this release were for data (and contained in data_v0.2.26)
+
### Added

-- New GFFs: RefSeq RS_2023_10, Ensembl 110, 111
+- #81 New 'data_release' code eg 'get_latest_combo_file_urls' that looks on GitHub to find latest data
+- New GFFs: RefSeq RS_2023_10, Ensembl 111, 112
- #79 - RefSeq MT transcripts
- #66 - We now store 'Note' field (thanks holtgrewe for suggestion)
- Added requirements.txt for 'generate_transcript_data' sections
- client / JSON data schema version compatibility check
@@ -15,6 +21,7 @@
- #64 - Split code/data versions. json.gz are now labelled according to data schema version (thanks holtgrewe)
- Renamed 'CHM13v2.0' to 'T2T-CHM13v2.0' so it could work with biocommons bioutils
- #72 - Correctly handle ncRNA_gene genes (thanks holtgrewe for reporting)
+- #73 - HGNC ID was missing for some chrMT genes in Ensembl

## [0.2.21] - 2023-08-14

@@ -209,7 +216,8 @@

- Initial commit

-[unreleased]: https://github.com/SACGF/cdot/compare/v0.2.21...HEAD
+[unreleased]: https://github.com/SACGF/cdot/compare/v0.2.26...HEAD
+[0.2.26]: https://github.com/SACGF/cdot/compare/v0.2.21...v0.2.26
[0.2.21]: https://github.com/SACGF/cdot/compare/v0.2.20...v0.2.21
[0.2.20]: https://github.com/SACGF/cdot/compare/v0.2.19...v0.2.20
[0.2.19]: https://github.com/SACGF/cdot/compare/v0.2.18...v0.2.19
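
The #81 entry describes 'get_latest_combo_file_urls' as looking on GitHub for the latest data. A rough offline sketch of the filename filtering this implies is shown here. The regular expression is the one used in cdot/data_release.py in this commit; the function name `filter_combo_urls` and the example URLs are made up for illustration and are not real release assets.

```python
import re

# Pattern copied from cdot/data_release.py in this commit.
COMBO_PATTERN = re.compile(r"cdot-(\d+\.\d+\.\d+)\.(refseq|ensembl)\.(.+)\.json\.gz")


def filter_combo_urls(asset_urls, annotation_consortia, genome_builds):
    """Keep URLs whose filename names a requested consortium and genome build.

    Hypothetical helper: the requested names are compared case-insensitively.
    """
    consortia = {c.lower() for c in annotation_consortia}
    builds = {b.lower() for b in genome_builds}
    selected = []
    for url in asset_urls:
        # The filename is everything after the last "/" in the URL.
        filename = url.rsplit("/", 1)[-1]
        m = COMBO_PATTERN.match(filename)
        if m is None:
            continue  # not a combo file, e.g. a README or a per-GFF file
        _version, consortium, build = m.groups()
        if consortium.lower() in consortia and build.lower() in builds:
            selected.append(url)
    return selected


# Hypothetical asset URLs, for illustration only.
EXAMPLE_URLS = [
    "https://example.org/download/data_v0.2.26/cdot-0.2.26.refseq.GRCh38.json.gz",
    "https://example.org/download/data_v0.2.26/cdot-0.2.26.ensembl.GRCh37.json.gz",
    "https://example.org/download/data_v0.2.26/README.txt",
]

print(filter_combo_urls(EXAMPLE_URLS, ["RefSeq"], ["grch38"]))
```

With these inputs only the first URL is kept, since it is the only RefSeq file for GRCh38.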
2 changes: 1 addition & 1 deletion cdot/__init__.py
@@ -1,4 +1,4 @@
-__version__ = "0.2.21"
+__version__ = "0.2.26"


def get_data_schema_int(version: str) -> int:
56 changes: 56 additions & 0 deletions cdot/data_release.py
@@ -0,0 +1,56 @@
import re
import requests
import cdot

from cdot import get_data_schema_int


def get_latest_data_release_tag_name():
    latest_data_release = get_latest_data_release()
    return latest_data_release.get('tag_name')

def _get_version_from_tag_name(tag_name, data_version=False):
    """ Returns None if doesn't match required prefix """
    release_prefix = "v"
    if data_version:
        release_prefix = "data_" + release_prefix

    if not tag_name.startswith(release_prefix):
        return None
    # Slice the prefix off: str.lstrip() strips a set of characters, not a prefix
    return tag_name[len(release_prefix):]


def get_latest_data_release():
    client_data_schema = get_data_schema_int(cdot.__version__)

    url = "https://api.github.com/repos/SACGF/cdot/releases"
    response = requests.get(url)
    # Fail early on HTTP errors (e.g. rate limiting) instead of iterating over an error payload
    response.raise_for_status()
    json_data = response.json()
    for release in json_data:
        tag_name = release['tag_name']  # Should look like 'v0.2.25' for code or 'data_v0.2.25' for data
        # We require a data version
        data_version = _get_version_from_tag_name(tag_name, data_version=True)
        if data_version is None:
            continue

        # Skip data releases whose schema differs from this client's schema
        data_schema = get_data_schema_int(data_version)
        if data_schema != client_data_schema:
            continue
        return release
    return {}

def get_latest_combo_file_urls(annotation_consortia, genome_builds):
    # lower case everything to be case insensitive
    annotation_consortia = {x.lower() for x in annotation_consortia}
    genome_builds = {x.lower() for x in genome_builds}

    file_urls = []
    if latest_data_release := get_latest_data_release():
        for asset in latest_data_release["assets"]:
            browser_download_url = asset["browser_download_url"]
            # Filename is the part after the last "/"
            filename = browser_download_url.rsplit("/", 1)[-1]
            if m := re.match(r"cdot-(\d+\.\d+\.\d+)\.(refseq|ensembl)\.(.+)\.json\.gz", filename):
                _version, annotation_consortium, genome_build = m.groups()
                if annotation_consortium.lower() in annotation_consortia and genome_build.lower() in genome_builds:
                    file_urls.append(browser_download_url)
    return file_urls
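
A note on the tag parsing in this file: release tags carry a 'v' prefix for code and a 'data_v' prefix for data. The sketch below shows one way to strip such a prefix by slicing, and why `str.lstrip` is not a prefix remover (it treats its argument as a set of characters). For well-formed tags like 'data_v0.2.25' the two approaches give the same result, because digits are not in that set. The name `version_from_tag` and the tag list are illustrative assumptions, not part of cdot.

```python
def version_from_tag(tag_name, data_version=False):
    """Return the version part of a release tag, or None if the prefix does not match.

    Hypothetical stand-in for the private helper in cdot/data_release.py.
    """
    prefix = "data_v" if data_version else "v"
    if not tag_name.startswith(prefix):
        return None
    # Slicing removes exactly the prefix and nothing else.
    return tag_name[len(prefix):]


# Illustrative tags following the 'v…' / 'data_v…' convention described in the code comment.
EXAMPLE_TAGS = ["v0.2.26", "data_v0.2.26", "v0.2.25", "data_v0.2.25"]

# Keep only data releases, preserving the order of the list.
data_versions = [
    version
    for tag in EXAMPLE_TAGS
    if (version := version_from_tag(tag, data_version=True)) is not None
]
print(data_versions)

# lstrip() removes any leading run of the characters d, a, t, _, v,
# so it can eat past the prefix on a string that is not a well-formed tag.
print("data_vault".lstrip("data_v"))
print(version_from_tag("data_vault", data_version=True))
```

On Python 3.9 and later, `str.removeprefix` is another option for the same slicing step.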
