Improve identifying run dirs #460

kedhammar · 2025-01-22T10:57:01Z

No description provided.

codecov-commenter · 2025-01-22T10:58:14Z

Codecov Report

Attention: Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

Project coverage is 27.90%. Comparing base (0fc69ca) to head (d7b8a92).
Report is 5 commits behind head on master.

Files with missing lines	Patch %	Lines
taca/utils/bioinfo_tab.py	0.00%	5 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #460   +/-   ##
=======================================
  Coverage   27.90%   27.90%           
=======================================
  Files          37       37           
  Lines        5487     5487           
=======================================
  Hits         1531     1531           
  Misses       3956     3956


kedhammar · 2025-01-22T10:58:27Z

Running interactively on preproc yields:

In [29]:     # Pattern explained:
    ...:     # 6-8Digits_(maybe ST-)AnythingLetterornumberNumber_Number_AorBLetterornumberordash
    ...:     illumina_rundir_re = re.compile(r"\d{6,8}_[ST-]*\w+\d+_\d+_[AB]?[A-Z0-9\-]+")
    ...:     # E.g. 20250121_AV242106_B2425434199
    ...:     element_rundir_re = re.compile(r"\d{8}_AV242106_[AB]\d+")
    ...:
    ...:     for inst_brand in CONFIG["bioinfo_tab"]["data_dirs"]:
    ...:         for data_dir in CONFIG["bioinfo_tab"]["data_dirs"][inst_brand]:
    ...:             if os.path.exists(data_dir):
    ...:                 potential_run_dirs = glob.glob(os.path.join(data_dir, "*"))
    ...:                 potential_run_dirs += glob.glob(os.path.join(data_dir, "nosync", "*"))
    ...:
    ...:                 for run_dir in potential_run_dirs:
    ...:                     if os.path.isdir(run_dir):
    ...:                         if (
    ...:                             (
    ...:                                 inst_brand == "illumina"
    ...:                                 and illumina_rundir_re.match(os.path.basename(run_dir))
    ...:                             )
    ...:                             or (
    ...:                                 inst_brand == "element"
    ...:                                 and element_rundir_re.match(os.path.basename(run_dir))
    ...:                             )
    ...:                             or (
    ...:                                 inst_brand == "ont"
    ...:                                 and ONT_RUN_PATTERN.match(os.path.basename(run_dir))
    ...:                             )
    ...:                         ):
    ...:                             print(f"Working on {run_dir}")
    ...:
Working on /srv/ngi_data/sequencing/NextSeq/Runs/nosync/241218_VH00203_464_AAGJCYYM5
Working on /srv/ngi_data/sequencing/NextSeq/Runs/nosync/250108_VH00203_466_AAG7CFHM5
Working on /srv/ngi_data/sequencing/NovaSeqXPlus/20250117_LH00202_0183_B22LLNLLT4
Working on /srv/ngi_data/sequencing/NovaSeqXPlus/nosync/20241220_LH00188_0192_A22VF5TLT3
Working on /srv/ngi_data/sequencing/AV242106/20250121_AV242106_B2425434199
Working on /srv/ngi_data/sequencing/promethion/20250121_1736_3D_PAY19545_3981329c
Working on /srv/ngi_data/sequencing/promethion/20250121_1736_3C_PAY19839_e006ba72
Working on /srv/ngi_data/sequencing/promethion/20250121_1736_3A_PAY19841_3f6862fe
Working on /srv/ngi_data/sequencing/promethion/20250121_1736_3E_PAY19824_f5ffbd6e
Working on /srv/ngi_data/sequencing/promethion/nosync/20241217_1323_3B_PAS97705_68752b2f
Working on /srv/ngi_data/sequencing/promethion/nosync/20240409_1101_MN19414_FAY65773_f0326a36
Working on /srv/ngi_data/sequencing/promethion/nosync/20240408_1620_MN19414_FAY65773_f7998efd

I.e., it looks like it's behaving as expected.
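
The name matching can also be checked without access to the data directories. Below is a minimal sketch that runs the two regexes from the session against a few directory names taken from its output. The `RUNDIR_PATTERNS` dict and the `matches_brand` helper are hypothetical names used for illustration, not part of TACA, and ONT is left out because the definition of `ONT_RUN_PATTERN` is not shown in this conversation.

```python
import re

# Patterns copied from the interactive session, written as raw strings.
RUNDIR_PATTERNS = {
    "illumina": re.compile(r"\d{6,8}_[ST-]*\w+\d+_\d+_[AB]?[A-Z0-9\-]+"),
    "element": re.compile(r"\d{8}_AV242106_[AB]\d+"),
}


def matches_brand(inst_brand: str, dir_name: str) -> bool:
    """Hypothetical helper: True if dir_name starts with a match for the brand's pattern."""
    pattern = RUNDIR_PATTERNS.get(inst_brand)
    # Brands without a known pattern never match (whitelist behaviour).
    return pattern is not None and pattern.match(dir_name) is not None


# Directory names taken from the session output, plus the "nosync" subdirectory
# that the glob also returns but that is not a run directory.
candidates = [
    ("illumina", "241218_VH00203_464_AAGJCYYM5"),
    ("illumina", "20250117_LH00202_0183_B22LLNLLT4"),
    ("illumina", "nosync"),
    ("element", "20250121_AV242106_B2425434199"),
    ("element", "nosync"),
]

for brand, name in candidates:
    verdict = "process" if matches_brand(brand, name) else "skip"
    print(f"{brand:9s} {name:35s} {verdict}")
```

Note that `re.match` only anchors at the start of the string, so a directory name with extra trailing characters would still be accepted by these patterns.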

kedhammar · 2025-01-22T11:00:40Z

The Ruff CI failure is due to a Ruff update and is addressed in #461.

alneberg · 2025-01-22T11:06:49Z

taca/utils/bioinfo_tab.py

@@ -320,7 +303,7 @@ def get_ss_projects_illumina(run_dir):
    proj_tree = Tree()
    lane_pattern = re.compile("^([1-8]{1,2})$")
    sample_proj_pattern = re.compile("^((P[0-9]{3,5})_[0-9]{3,5})")
-    run_name = os.path.basename(os.path.abspath(run_dir))
+    run_name = os.path.basename(run_dir)


Not sure why this change is needed. Is it just to clean up? I asked ChatGPT, and apparently the old version has an advantage if there are trailing slashes in the run_dir path?

It was mostly to clean up, yeah. When I tested the code interactively, I found that removing the .abspath call made no difference to how it behaved. The input should always come from a glob search, so I figured it would be fairly consistent as well. I can revert this particular change if you prefer to keep it?

Yes, normally, but I believe you can also run it manually and pass in a run_dir yourself?

Touché
0fc69ca
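
The trailing-slash concern from this thread can be checked directly. This is a minimal sketch assuming POSIX-style paths; the run directory path is borrowed from the session output earlier in the conversation and is only used as a string, so it does not need to exist on disk.

```python
import os

run_dir = "/srv/ngi_data/sequencing/AV242106/20250121_AV242106_B2425434199"
run_dir_slash = run_dir + "/"  # e.g. a manually typed path with a trailing slash

# Without a trailing slash, basename returns the run name.
plain = os.path.basename(run_dir)

# With a trailing slash, basename returns an empty string.
bare = os.path.basename(run_dir_slash)

# abspath normalizes the path first, which drops the trailing slash.
normalized = os.path.basename(os.path.abspath(run_dir_slash))

print(repr(plain), repr(bare), repr(normalized))
```

Paths returned by `glob.glob(os.path.join(data_dir, "*"))` do not end in a slash, which is consistent with the observation that removing `.abspath` made no difference in the interactive test.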

kedhammar · 2025-01-22T12:25:13Z

taca/utils/bioinfo_tab.py

                            )
-                        ) or (inst_brand == "element" or inst_brand == "ont"):


The bug is in this block: it only skips processing for "archived", so it hits an error when trying to instantiate an ONT run from the "no_backup" dir.

Right, and it's fixed by always checking against the ONT_RUN_PATTERN before updating statusdb? 👍

Yes, now it will only act on run dirs whose name matches the pattern for the instrument type. We've essentially moved from a blacklist to a whitelist approach.
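
The difference between the two approaches can be sketched as follows. This is a simplified illustration, not the TACA code: `process_by_blacklist` and `process_by_whitelist` are hypothetical names, and the ONT regex is a stand-in written to fit the promethion directory names listed earlier in this conversation, since the real `ONT_RUN_PATTERN` is not shown here.

```python
import re

# Stand-in pattern (assumption); the real ONT_RUN_PATTERN lives elsewhere in TACA.
ONT_RUN_PATTERN_SKETCH = re.compile(
    r"\d{8}_\d{4}_[0-9A-Za-z]+_[0-9A-Za-z]+_[0-9a-f]{8}$"
)


def process_by_blacklist(dir_name: str) -> bool:
    """Old behaviour as described in this thread: skip only known non-run dirs."""
    return dir_name != "archived"


def process_by_whitelist(dir_name: str) -> bool:
    """New behaviour: act only on names that look like a run dir."""
    return ONT_RUN_PATTERN_SKETCH.match(dir_name) is not None


for name in ["archived", "no_backup", "20250121_1736_3D_PAY19545_3981329c"]:
    print(
        f"{name:40s} blacklist={process_by_blacklist(name)} "
        f"whitelist={process_by_whitelist(name)}"
    )
```

With the blacklist, any directory that nobody thought to exclude (such as "no_backup") is treated as a run; with the whitelist, unexpected directories are ignored by default.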

alneberg

Thank you!

kedhammar added 4 commits January 22, 2025 11:43

base processing of run dirs on pattern matching for all platforms and remove superfluous .abspath method

b746830

remove unused dict

14955e8

simplify logic

115849a

vlog

0605160

kedhammar self-assigned this Jan 22, 2025

kedhammar added the bug, enhancement, and no validation labels Jan 22, 2025

kedhammar requested review from alneberg and aanil January 22, 2025 11:00

alneberg reviewed Jan 22, 2025

kedhammar commented Jan 22, 2025

use abspath for potential manual rundir input

0fc69ca

alneberg approved these changes Jan 22, 2025

Merge commit '048c78117c8552ec57492c823d238181ad1280da' into improve-finding-run-dirs

d7b8a92

kedhammar merged commit e60b055 into NationalGenomicsInfrastructure:master Jan 22, 2025
7 checks passed

kedhammar deleted the improve-finding-run-dirs branch January 22, 2025 16:06
