
Improve identifying run dirs #460


kedhammar

No description provided.

@codecov-commenter

codecov-commenter commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

Project coverage is 27.90%. Comparing base (0fc69ca) to head (d7b8a92).
Report is 5 commits behind head on master.

Files with missing lines     Patch %   Lines
taca/utils/bioinfo_tab.py    0.00%     5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #460   +/-   ##
=======================================
  Coverage   27.90%   27.90%           
=======================================
  Files          37       37           
  Lines        5487     5487           
=======================================
  Hits         1531     1531           
  Misses       3956     3956           


@kedhammar
Author

Running interactively on preproc yields:

In [29]:     # Pattern explained:
    ...:     # 6-8 digits _ (optional ST-) word chars ending in a digit _ digits _ (optional A or B) letters/digits/dashes
    ...:     illumina_rundir_re = re.compile(r"\d{6,8}_[ST-]*\w+\d+_\d+_[AB]?[A-Z0-9\-]+")
    ...:     # E.g. 20250121_AV242106_B2425434199
    ...:     element_rundir_re = re.compile(r"\d{8}_AV242106_[AB]\d+")
    ...:
    ...:     for inst_brand in CONFIG["bioinfo_tab"]["data_dirs"]:
    ...:         for data_dir in CONFIG["bioinfo_tab"]["data_dirs"][inst_brand]:
    ...:             if os.path.exists(data_dir):
    ...:                 potential_run_dirs = glob.glob(os.path.join(data_dir, "*"))
    ...:                 potential_run_dirs += glob.glob(os.path.join(data_dir, "nosync", "*"))
    ...:
    ...:                 for run_dir in potential_run_dirs:
    ...:                     if os.path.isdir(run_dir):
    ...:                         if (
    ...:                             (
    ...:                                 inst_brand == "illumina"
    ...:                                 and illumina_rundir_re.match(os.path.basename(run_dir))
    ...:                             )
    ...:                             or (
    ...:                                 inst_brand == "element"
    ...:                                 and element_rundir_re.match(os.path.basename(run_dir))
    ...:                             )
    ...:                             or (
    ...:                                 inst_brand == "ont"
    ...:                                 and ONT_RUN_PATTERN.match(os.path.basename(run_dir))
    ...:                             )
    ...:                         ):
    ...:                             print(f"Working on {run_dir}")
    ...:
Working on /srv/ngi_data/sequencing/NextSeq/Runs/nosync/241218_VH00203_464_AAGJCYYM5
Working on /srv/ngi_data/sequencing/NextSeq/Runs/nosync/250108_VH00203_466_AAG7CFHM5
Working on /srv/ngi_data/sequencing/NovaSeqXPlus/20250117_LH00202_0183_B22LLNLLT4
Working on /srv/ngi_data/sequencing/NovaSeqXPlus/nosync/20241220_LH00188_0192_A22VF5TLT3
Working on /srv/ngi_data/sequencing/AV242106/20250121_AV242106_B2425434199
Working on /srv/ngi_data/sequencing/promethion/20250121_1736_3D_PAY19545_3981329c
Working on /srv/ngi_data/sequencing/promethion/20250121_1736_3C_PAY19839_e006ba72
Working on /srv/ngi_data/sequencing/promethion/20250121_1736_3A_PAY19841_3f6862fe
Working on /srv/ngi_data/sequencing/promethion/20250121_1736_3E_PAY19824_f5ffbd6e
Working on /srv/ngi_data/sequencing/promethion/nosync/20241217_1323_3B_PAS97705_68752b2f
Working on /srv/ngi_data/sequencing/promethion/nosync/20240409_1101_MN19414_FAY65773_f0326a36
Working on /srv/ngi_data/sequencing/promethion/nosync/20240408_1620_MN19414_FAY65773_f7998efd

I.e., it looks like it's behaving as expected.
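The per-brand check in the session above can be condensed into a standalone predicate. This is a minimal sketch, not TACA's actual code: `is_run_dir_name` and `PATTERNS` are hypothetical names, and because `ONT_RUN_PATTERN` is not shown in this thread, the ONT regex below is an assumed stand-in written only to match the PromethION/MinION names logged above.

```python
import re

# Illumina and Element patterns as used in the interactive session
ILLUMINA_RUNDIR_RE = re.compile(r"\d{6,8}_[ST-]*\w+\d+_\d+_[AB]?[A-Z0-9\-]+")
ELEMENT_RUNDIR_RE = re.compile(r"\d{8}_AV242106_[AB]\d+")
# ASSUMPTION: hypothetical stand-in for TACA's ONT_RUN_PATTERN (not shown in this PR)
ONT_RUNDIR_RE = re.compile(r"\d{8}_\d{4}_[0-9A-Za-z]+_[0-9A-Za-z]+_[0-9a-f]{8}$")

# Hypothetical lookup table: instrument brand -> run dir name pattern
PATTERNS = {
    "illumina": ILLUMINA_RUNDIR_RE,
    "element": ELEMENT_RUNDIR_RE,
    "ont": ONT_RUNDIR_RE,
}


def is_run_dir_name(inst_brand: str, dir_name: str) -> bool:
    """Return True if dir_name looks like a run dir for the given brand.

    Unknown brands never match, so anything not explicitly allowed is skipped.
    """
    pattern = PATTERNS.get(inst_brand)
    # re.match anchors only at the start of the string, as in the session above
    return pattern is not None and pattern.match(dir_name) is not None
```

Called as `is_run_dir_name("ont", os.path.basename(run_dir))`, it accepts the logged run names and rejects helper directories such as `no_backup`.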

@kedhammar kedhammar requested review from alneberg and aanil January 22, 2025 11:00
@kedhammar
Author

The Ruff CI failure is due to a Ruff update and is addressed in #461.

@@ -320,7 +303,7 @@ def get_ss_projects_illumina(run_dir):
proj_tree = Tree()
lane_pattern = re.compile("^([1-8]{1,2})$")
sample_proj_pattern = re.compile("^((P[0-9]{3,5})_[0-9]{3,5})")
-run_name = os.path.basename(os.path.abspath(run_dir))
+run_name = os.path.basename(run_dir)
Member

Not sure why this change is needed; is it just cleanup? I asked ChatGPT, and it says the old version has an advantage if there are trailing slashes in the run_dir path?
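The trailing-slash point can be checked with the standard library alone. A minimal sketch assuming a POSIX system, with an illustrative path modeled on the ones logged above: `os.path.basename` returns whatever follows the last slash, so a trailing slash yields an empty string, whereas `os.path.abspath` normalizes the path first. `abspath` also resolves relative paths such as `.` against the current working directory, which matters when a run dir is passed by hand.

```python
import os

run_dir = "/srv/ngi_data/sequencing/NovaSeqXPlus/20250117_LH00202_0183_B22LLNLLT4/"

# basename() splits on the last "/", so a trailing slash gives an empty name
new_style = os.path.basename(run_dir)
# abspath() normalizes the path (dropping the trailing slash) before basename()
old_style = os.path.basename(os.path.abspath(run_dir))

print(repr(new_style))  # ''
print(repr(old_style))  # '20250117_LH00202_0183_B22LLNLLT4'

# A relative path like "." only resolves to a real directory name via abspath()
print(os.path.basename("."))                   # '.'
print(os.path.basename(os.path.abspath(".")))  # name of the current working directory
```

An empty `run_name` would then silently fail any downstream pattern match or lookup, which is why keeping `abspath` is the safer choice for manually supplied paths.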

Author

Yeah, it was mostly cleanup. When I tested the code interactively, I found that removing the os.path.abspath call made no difference to its behavior. The input to the function should always come from a glob search, so I figured it would be consistent anyway. I can revert this particular change if you prefer to keep it?

Member

Yes, normally, but I believe you can also run it manually and pass in a run_dir yourself?

Author

Touché
0fc69ca

)
) or (inst_brand == "element" or inst_brand == "ont"):
Author

The bug is in this block: it only skips processing for "archived", so it hits an error when trying to instantiate an ONT run from the "no_backup" dir.

Member

Right, and it's fixed by always checking against the ONT_RUN_PATTERN before updating statusdb? 👍

Author

Yes. Now it only acts on run dirs whose names match the pattern for the instrument type; we've essentially moved from a blacklist to a whitelist approach.
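To make the blacklist-to-whitelist change concrete, here is a simplified sketch. It is not the actual TACA code: both function names are hypothetical, the old check is reduced to the single skipped name mentioned in this thread ("archived"), and the ONT regex is an assumed stand-in for `ONT_RUN_PATTERN`. The point is that a blacklist lets any unexpected directory such as `no_backup` through to run instantiation, while a whitelist only admits names that look like runs.

```python
import re

# ASSUMPTION: simplified stand-in for ONT_RUN_PATTERN, which is not shown here
ONT_RUNDIR_RE = re.compile(r"\d{8}_\d{4}_[0-9A-Za-z]+_[0-9A-Za-z]+_[0-9a-f]{8}$")


def should_process_blacklist(dir_name: str) -> bool:
    """Old approach (simplified): process everything except known non-run dirs."""
    return dir_name not in {"archived"}


def should_process_whitelist(dir_name: str) -> bool:
    """New approach: process only names that match the run dir pattern."""
    return ONT_RUNDIR_RE.match(dir_name) is not None


dirs = ["20250121_1736_3D_PAY19545_3981329c", "archived", "no_backup"]
# Blacklist keeps "no_backup", which would then fail when instantiated as a run
print([d for d in dirs if should_process_blacklist(d)])
# Whitelist keeps only the real run dir
print([d for d in dirs if should_process_whitelist(d)])
```

The trade-off is that a genuine run dir with an unexpected name is now silently skipped instead of crashing the script, so the patterns need to be kept in sync with instrument naming conventions.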

Member

@alneberg alneberg left a comment

Thank you!

@kedhammar kedhammar merged commit e60b055 into NationalGenomicsInfrastructure:master Jan 22, 2025
7 checks passed
@kedhammar kedhammar deleted the improve-finding-run-dirs branch January 22, 2025 16:06
3 participants