Commit
Merge pull request #3 from moka-guys/development
v1.0.0 (#3) Co-Authored-By: Graeme Smith <[email protected]>
Showing 48 changed files with 3,211 additions and 19 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
name: Samplesheet Validator | ||
|
||
on: | ||
  push: | ||
    branches: | ||
      - master | ||
      - 'feature/**' | ||
      - 'development' | ||
jobs: | ||
  build: | ||
|
||
    runs-on: ubuntu-latest | ||
    strategy: | ||
      matrix: | ||
        python-version: ['3.10.6'] | ||
|
||
    steps: | ||
      - name: Checkout head | ||
        uses: actions/checkout@v3 | ||
        with: | ||
          fetch-depth: 2 | ||
      - run: git checkout HEAD^ | ||
      - name: Set up Python ${{ matrix.python-version }} | ||
        uses: actions/setup-python@v4 | ||
        with: | ||
          python-version: ${{ matrix.python-version }} | ||
      - name: Install dependencies | ||
        run: | | ||
          python3 -m pip install --upgrade pip | ||
          pip3 install flake8==6.0.0 wheel==0.38.4 pytest==7.2.1 | ||
          pip3 install -r requirements.txt | ||
      - name: Lint with flake8 | ||
        run: | | ||
          # stop the build if there are: | ||
          # - syntax errors (E9) | ||
          # - common assertion and comparison gotchas (F63) | ||
          # - control flow gotchas (F7) | ||
          # - undefined names (F82) | ||
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics | ||
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide | ||
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=120 --statistics | ||
      - name: Test with pytest | ||
        run: | | ||
          python3 -m pytest |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
venv/ | ||
__pycache__/ | ||
build/ | ||
.coverage | ||
dist | ||
*.egg-info | ||
.pytest_cache | ||
*.log | ||
*.vscode |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,21 +1,11 @@ | ||
- MIT License | ||
+ Copyright 2023 Synnovis | ||
|
||
- Copyright (c) 2020 Graeme | ||
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except | ||
+ in compliance with the License. You may obtain a copy of the License at | ||
|
||
- Permission is hereby granted, free of charge, to any person obtaining a copy | ||
- of this software and associated documentation files (the "Software"), to deal | ||
- in the Software without restriction, including without limitation the rights | ||
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
- copies of the Software, and to permit persons to whom the Software is | ||
- furnished to do so, subject to the following conditions: | ||
+ [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0) | ||
|
||
- The above copyright notice and this permission notice shall be included in all | ||
- copies or substantial portions of the Software. | ||
|
||
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
- SOFTWARE. | ||
+ Unless required by applicable law or agreed to in writing, software distributed under the License | ||
+ is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express | ||
+ or implied. See the License for the specific language governing permissions and limitations under | ||
+ the License. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,106 @@ | ||
- # samplesheet_verifier | ||
+ # Samplesheet Validator | ||
|
||
Checks sample sheet naming and contents. Carries out a series of checks on the sample sheet and collects any errors | ||
that it identifies (SamplesheetCheck.errors_list). It also identifies whether or not a run is a TSO run from the sample | ||
sheet (SamplesheetCheck.tso). | ||
|
||
## Protocol | ||
|
||
Runs a series of checks on the sample sheet and collects any errors identified. Checks whether: | ||
* Sample sheet exists | ||
* Samplesheet name is valid (validates using the [seglh-naming](https://github.com/moka-guys/seglh-naming/) library) | ||
* Sequencer ID is in the list of allowed sequencer IDs supplied to the script | ||
* Samplesheet is not empty (>10 bytes) | ||
* Samplesheet is for a development run, using the development pan number supplied to the script | ||
* Samplesheet contains the minimum expected `[Data]` section headers: `Sample_ID, Sample_Name, index` | ||
* `Sample_ID` and `Sample_Name` match for each sample in the data section of the samplesheet | ||
* Sample name does not contain any illegal characters | ||
* Sample name is valid (validates using the [seglh-naming](https://github.com/moka-guys/seglh-naming/) library) | ||
* Pan numbers are in the list of allowed pan numbers supplied to the script | ||
* Library prep name in the sample name is in the list of allowed library prep names supplied to the script | ||
* Samplesheet contains any TSO samples | ||
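
Two of the checks above (the minimum `[Data]` headers, and `Sample_ID` matching `Sample_Name`) can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `EXPECTED_HEADERS`, `check_data_section` and the parsing approach are hypothetical names and choices made for this example.

```python
import csv
import io

# Minimum headers named in the checklist above
EXPECTED_HEADERS = ["Sample_ID", "Sample_Name", "index"]


def check_data_section(samplesheet_text: str) -> list:
    """Hypothetical helper: return a list of error strings for the [Data] section."""
    errors = []
    lines = samplesheet_text.splitlines()
    # Locate the [Data] marker; the header row is assumed to be the next line
    data_idx = next(
        (i for i, line in enumerate(lines) if line.startswith("[Data]")), None
    )
    if data_idx is None or data_idx + 1 >= len(lines):
        return ["No [Data] section found"]
    reader = csv.DictReader(io.StringIO("\n".join(lines[data_idx + 1:])))
    missing = [h for h in EXPECTED_HEADERS if h not in (reader.fieldnames or [])]
    if missing:
        return [f"Header(s) missing from [Data] section: {missing}"]
    # Compare the two columns for every sample row
    for row in reader:
        if row["Sample_ID"] != row["Sample_Name"]:
            errors.append(
                f"Sample_ID {row['Sample_ID']} does not match "
                f"Sample_Name {row['Sample_Name']}"
            )
    return errors
```

Returning a list of messages rather than raising on the first failure mirrors the approach described above, where errors are collected while the remaining checks continue.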
|
||
## Usage | ||
|
||
### Python package | ||
|
||
The repository provides a python package which can be installed with: | ||
|
||
`python3 setup.py install` | ||
|
||
NB: Use the `--user` flag or install into a virtualenv/pipenv if not installing globally. | ||
|
||
```python | ||
|
||
from samplesheet_validator.samplesheet_validator import SamplesheetCheck | ||
|
||
sscheck_obj = SamplesheetCheck( | ||
samplesheet_path, # str | ||
sequencer_ids, # list | ||
panels, # list | ||
library_prep_names, # list | ||
tso_panels, # list | ||
dev_panno, # str | ||
logdir, # str | ||
) | ||
sscheck_obj.ss_checks() # Carry out samplesheet validation | ||
|
||
print(sscheck_obj.errors_dict) # View the dictionary of error messages | ||
``` | ||
|
||
### Command line | ||
|
||
The environment must be set up as follows: | ||
```bash | ||
python3 -m venv venv | ||
source venv/bin/activate | ||
pip3 install -r requirements.txt | ||
``` | ||
|
||
The script can then be used as follows: | ||
```bash | ||
usage: Used to validate a samplesheet using the seglh-naming conventions | ||
|
||
Given an input samplesheet, will validate the samplesheet using seglh-naming conventions and output a logfile | ||
|
||
options: | ||
-h, --help show this help message and exit | ||
-S SAMPLESHEET_PATH, --samplesheet_path SAMPLESHEET_PATH | ||
Path to samplesheet requiring validation | ||
-SI SEQUENCER_IDS, --sequencer_ids SEQUENCER_IDS | ||
Comma separated string of allowed sequencer IDS | ||
-P PANELS, --panels PANELS | ||
Comma separated string of allowed panel numbers | ||
-R LIBRARY_PREP_NAMES, --library_prep_names LIBRARY_PREP_NAMES | ||
Comma separated string of allowed library prep names | ||
-T TSO_PANELS, --tso_panels TSO_PANELS | ||
Comma separated string of tso panels | ||
-D DEV_PANNO, --dev_panno DEV_PANNO | ||
Development pan number | ||
-L LOGDIR, --logdir LOGDIR | ||
Directory to save the output logfile to | ||
``` | ||
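
Several of the options above take comma-separated strings. One way such strings might be turned into lists is to let argparse do the conversion through its `type` callable. This is a sketch only: `split_csv` is a hypothetical helper, and whether `SamplesheetCheck` performs an equivalent split internally is not shown here.

```python
import argparse


def split_csv(value: str) -> list:
    """Hypothetical helper: 'a, b,c' -> ['a', 'b', 'c'], dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


parser = argparse.ArgumentParser()
# argparse calls `type` on the raw string, so the parsed value is already a list
parser.add_argument("-SI", "--sequencer_ids", type=split_csv, required=True)
parser.add_argument("-P", "--panels", type=split_csv, required=True)

# Example values only; the real allowed lists are supplied by the caller
args = parser.parse_args(["-SI", "NB551068,M02353", "-P", "Pan4009, Pan5180"])
print(args.sequencer_ids)
print(args.panels)
```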
|
||
### Testing | ||
|
||
Test datasets are stored in [/test/data](../test/data). The script has a full test suite: | ||
* [test_samplesheet_validator.py](../test/test_samplesheet_validator.py) | ||
|
||
These tests should be run before pushing any code to ensure all tests in the GitHub Actions workflow pass. These can be run as follows: | ||
|
||
```bash | ||
python3 -m pytest | ||
``` | ||
**N.B. Tests and test cases/files MUST be maintained and updated accordingly in conjunction with script development** | ||
**N.B. This includes ensuring that the arguments passed to pytest in the [pytest.ini](pytest.ini) file are kept up to date** | ||
|
||
|
||
## Logging | ||
|
||
Logging is performed by [ss_logger](samplesheet_validator/ss_logger.py). The directory to save the log file to is supplied as an argument. The output log file is named by the script as follows: | ||
- `$LOGFILE_DIR/$RUNFOLDER_NAME_$TIMESTAMP_samplesheet_validator.log` | ||
|
||
The script also collects the error messages as it runs, which can be used by other scripts when this script is used as an import. | ||
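
The naming scheme above can be sketched with the standard `logging` module. The format string and timestamp pattern match the values in the package's config; `build_logfile_path`, `get_logger`, the logger name and the example runfolder name are assumptions made for illustration, not the contents of `ss_logger.py`.

```python
import datetime
import logging
import os
import tempfile

LOGGING_FORMATTER = "%(asctime)s - SAMPLESHEET_VALIDATOR - %(levelname)s - %(message)s"


def build_logfile_path(logdir: str, runfolder_name: str) -> str:
    """Hypothetical helper: $LOGFILE_DIR/$RUNFOLDER_NAME_$TIMESTAMP_samplesheet_validator.log"""
    timestamp = f"{datetime.datetime.now():%Y%m%d_%H%M%S}"
    return os.path.join(
        logdir, f"{runfolder_name}_{timestamp}_samplesheet_validator.log"
    )


def get_logger(logfile_path: str) -> logging.Logger:
    """Hypothetical helper: logger that writes formatted records to the log file."""
    logger = logging.getLogger("samplesheet_validator_sketch")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(logfile_path)
    handler.setFormatter(logging.Formatter(LOGGING_FORMATTER))
    logger.addHandler(handler)
    return logger


logdir = tempfile.mkdtemp()  # stand-in for the directory passed via --logdir
path = build_logfile_path(logdir, "EXAMPLE_RUNFOLDER")
logger = get_logger(path)
logger.info("Samplesheet with supplied name exists (%s)", "example_SampleSheet.csv")
```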
|
||
|
||
### Developed by the Synnovis Genome Informatics Team |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
[pytest] | ||
addopts = -v --ignore=test/data/ --ignore=test/temp/ --cov=. --cov-report term-missing --sequencer_ids=NB551068,NB552085,M02353,M02631,A01229 --library_prep_names=ADX,NGS,TSO,SNP,DEV --tso_panels=Pan4969,Pan5085,Pan5112,Pan5114 --dev_panno=Pan5180 --panels=Pan5180,Pan4009,Pan2835,Pan4940,Pan4396,Pan5113,Pan5115,Pan4969,Pan5085,Pan5112,Pan5114,Pan5007,Pan5008,Pan5009,Pan5010,Pan5011,Pan5012,Pan5013,Pan5014,Pan5015,Pan5016,Pan4119,Pan4121,Pan4122,Pan4125,Pan4126,Pan4974,Pan4975,Pan4976,Pan4977,Pan4978,Pan4979,Pan4980,Pan4981,Pan4982,Pan4983,Pan4984,Pan4821,Pan4822,Pan4823,Pan4824,Pan4825,Pan4149,Pan4150,Pan4129,Pan4964,Pan4130,Pan5121,Pan5185,Pan5186,Pan5143,Pan5147,Pan4816,Pan4817,Pan5122,Pan5144,Pan5148,Pan4819,Pan4820,Pan4145,Pan4146,Pan4132,Pan4134,Pan4136,Pan4137,Pan4138,Pan4143,Pan4144,Pan4151,Pan4314,Pan4351,Pan4387,Pan4390,Pan4826,Pan4827,Pan4828,Pan4829,Pan4830,Pan4831,Pan4832,Pan4833,Pan4834,Pan4835,Pan4836 --logdir=. |
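
The custom flags in `addopts` (`--sequencer_ids`, `--panels` and so on) are not built-in pytest options, so pytest only accepts them if something registers them, typically a `pytest_addoption` hook in a `conftest.py`. That file is not shown in this part of the diff; the sketch below is an assumption about how such registration could look, and `CUSTOM_OPTIONS` and `option_as_list` are hypothetical names.

```python
# conftest.py (hypothetical sketch, not the repository's file)
CUSTOM_OPTIONS = [
    "sequencer_ids",
    "panels",
    "library_prep_names",
    "tso_panels",
    "dev_panno",
    "logdir",
]


def pytest_addoption(parser):
    """pytest hook: register each custom option so addopts can supply it."""
    for name in CUSTOM_OPTIONS:
        parser.addoption(
            f"--{name}", action="store", default=None, help=f"Value for {name}"
        )


def option_as_list(config, name):
    """Hypothetical helper: read a comma-separated option as a list."""
    raw = config.getoption(f"--{name}")
    return raw.split(",") if raw else []
```

A fixture could then call `option_as_list(request.config, "panels")` to hand the allowed values to the tests.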
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
git+https://github.com/moka-guys/[email protected] | ||
setuptools==58.2.0 | ||
pytest==7.2.1 | ||
coverage==6.3.1 | ||
pytest==7.2.1 | ||
flake8==6.1.0 | ||
pytest-cov==4.1.0 |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
import os | ||
import argparse | ||
from .samplesheet_validator import SamplesheetCheck | ||
|
||
|
||
def get_arguments(): | ||
    """ | ||
    Uses argparse module to define and handle command line input arguments | ||
    and help menu | ||
    :return argparse.Namespace (object): Contains the parsed arguments | ||
    """ | ||
    parser = argparse.ArgumentParser( | ||
        description=( | ||
            "Given an input samplesheet, will validate the samplesheet using " | ||
            "seglh-naming conventions and output a logfile" | ||
        ), | ||
        usage="Used to validate a samplesheet using the seglh-naming conventions", | ||
    ) | ||
    parser.add_argument( | ||
        "-S", | ||
        "--samplesheet_path", | ||
        type=lambda x: is_valid_file(parser, x), | ||
        required=True, | ||
        help="Path to samplesheet requiring validation", | ||
    ) | ||
    parser.add_argument( | ||
        "-SI", | ||
        "--sequencer_ids", | ||
        type=str, | ||
        required=True, | ||
        help="Comma separated string of allowed sequencer IDS", | ||
    ) | ||
    parser.add_argument( | ||
        "-P", | ||
        "--panels", | ||
        type=str, | ||
        required=True, | ||
        help="Comma separated string of allowed panel numbers", | ||
    ) | ||
    parser.add_argument( | ||
        "-R", | ||
        "--library_prep_names", | ||
        type=str, | ||
        required=True, | ||
        help="Comma separated string of allowed library prep names", | ||
    ) | ||
    parser.add_argument( | ||
        "-T", | ||
        "--tso_panels", | ||
        type=str, | ||
        required=True, | ||
        help="Comma separated string of tso panels", | ||
    ) | ||
    parser.add_argument( | ||
        "-D", | ||
        "--dev_panno", | ||
        type=str, | ||
        required=True, | ||
        help="Development pan number", | ||
    ) | ||
    parser.add_argument( | ||
        "-L", | ||
        "--logdir", | ||
        type=lambda x: is_valid_dir(parser, x), | ||
        required=True, | ||
        help="Directory to save the output logfile to", | ||
    ) | ||
    return parser.parse_args() | ||
|
||
|
||
def is_valid_file(parser: argparse.ArgumentParser, file: str) -> str: | ||
    """ | ||
    Check file path is valid | ||
    :param parser (argparse.ArgumentParser): Holds necessary info to parse cmd | ||
                                             line into Python data types | ||
    :param file (str): Input argument | ||
    """ | ||
    if not os.path.exists(file): | ||
        parser.error(f"The file {file} does not exist!") | ||
    else: | ||
        return file | ||
|
||
|
||
def is_valid_dir(parser: argparse.ArgumentParser, dir: str) -> str: | ||
    """ | ||
    Check directory path is valid | ||
    :param parser (argparse.ArgumentParser): Holds necessary info to parse cmd | ||
                                             line into Python data types | ||
    :param dir (str): Input argument | ||
    """ | ||
    if not os.path.isdir(dir): | ||
        parser.error(f"The directory {dir} does not exist!") | ||
    else: | ||
        return dir | ||
|
||
|
||
if __name__ == "__main__": | ||
    parsed_args = get_arguments() | ||
    sscheck_obj = SamplesheetCheck( | ||
        parsed_args.samplesheet_path, | ||
        parsed_args.sequencer_ids, | ||
        parsed_args.panels, | ||
        parsed_args.library_prep_names, | ||
        parsed_args.tso_panels, | ||
        parsed_args.dev_panno, | ||
        parsed_args.logdir, | ||
    ) | ||
    sscheck_obj.ss_checks() # Carry out samplesheet validation |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
import datetime | ||
|
||
TIMESTAMP = str(f"{datetime.datetime.now():%Y%m%d_%H%M%S}") | ||
|
||
# Specifies the layout of log records in the final output | ||
LOGGING_FORMATTER = "%(asctime)s - SAMPLESHEET_VALIDATOR - %(levelname)s - %(message)s" | ||
|
||
LOG_MSGS = { | ||
"ss_present": "Samplesheet with supplied name exists (%s)", | ||
"ss_absent": "Samplesheet with supplied name does not exist (%s)", | ||
"ssname_valid": "Samplesheet name is valid (%s)", | ||
"ssname_invalid": "Samplesheet name is invalid (%s). Exception: %s", | ||
"sequencer_id_valid": "Sequencer ID in samplesheet name is valid", | ||
"sequencer_id_invalid": "Sequencer id not in allowed list (%s, %s)", | ||
"ss_not_empty": "Samplesheet is not empty (>10 bytes)", | ||
"ss_empty": "Samplesheet empty (<10 bytes)", | ||
"found_header_line": "Line in samplesheet identified as a header line", | ||
"found_sample_line": "Line in samplesheet identified as containing a sample", | ||
"error_extracting_headers": "An error was encountered when extracting headers from the samplesheet: %s", | ||
"found_empty_line": "Line in samplesheet is an empty line", | ||
"col_extraction_error": "Exception raised while attempting to extract %s from sample line %s: %s", | ||
"headers_as_expected": "Expected headers present in samplesheet", | ||
"headers_err": "Header(/s) missing from [Data] section: '%s'", | ||
"samplenames_match": "All sample names and sample IDS match", | ||
"nonmatching_samplenames": "The following Sample IDs do not match the corresponding Sample Name: (%s)", | ||
"no_illegal_chars": "Sample name %s contains no illegal characters in column %s", | ||
"illegal_chars": "Sample name contains invalid characters (%s: %s)", | ||
"sample_name_valid": "Sample name valid: %s (%s)", | ||
"sample_name_invalid": "Sample name invalid (%s). Exception: %s", | ||
"valid_panno": "Pan no is valid: %s", | ||
"invalid_panno": "Pan no is invalid: %s (%s: %s)", | ||
"valid_library_prep_name": "Library prep name is valid: %s", | ||
"library_prep_name_err": "Library prep name not in allowed list (%s, %s)", | ||
"dev_run": "Samplesheet is from a development run: %s", | ||
"not_dev_run": "Samplesheet is not from a development run: %s", | ||
"tso_run": "Samplesheet is for a TSO run", | ||
"not_tso_run": "Samplesheet is not for a TSO run", | ||
"sschecks_not_passed": "Samplesheet did not pass checks: %s", | ||
"sschecks_passed": "Samplesheet passed all checks %s", | ||
} |
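
The `%s` placeholders in these templates suit the standard `logging` call style, where the arguments are passed separately and interpolated by the logger. The sketch below shows that pattern with two of the templates above; `log_error`, the `errors` list and the logger name are hypothetical, and how the package itself stores collected errors is not shown here.

```python
import logging

# Two templates copied from LOG_MSGS above
LOG_MSGS = {
    "sequencer_id_invalid": "Sequencer id not in allowed list (%s, %s)",
    "tso_run": "Samplesheet is for a TSO run",
}

logger = logging.getLogger("log_msgs_sketch")
errors = []  # hypothetical store for collected error messages


def log_error(key: str, *args) -> None:
    """Hypothetical helper: log a templated message and keep a rendered copy."""
    # logging interpolates the %s placeholders itself when the record is emitted
    logger.error(LOG_MSGS[key], *args)
    # The same template rendered eagerly, so other code can read the message
    errors.append(LOG_MSGS[key] % args)


log_error("sequencer_id_invalid", "X00000", "NB551068,M02353")
```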