3. Tutorial for MAW Python

Mahnoor Zulfiqar edited this page Jun 29, 2023 · 5 revisions

Tutorial of MAW-Python Workflow with CWL

MAW-Python depends on the results obtained from MAW-R and MAW-MetFrag. The script Workflow_Python_Script_all.py begins by importing all required packages and then defines the functions. The command-line arguments for the python3 command are defined around line 3002.

# Define the command-line arguments
parser = argparse.ArgumentParser(description='MAW-Py')
parser.add_argument('--file_id', type=str, help='file_id')
parser.add_argument('--msp_file', type=str, help='path to spec result CSV file')
parser.add_argument('--gnps_dir', type=str, help='path to GNPS directory')
parser.add_argument('--hmdb_dir', type=str, help='path to HMDB directory')
parser.add_argument('--mbank_dir', type=str, help='path to MassBank directory')
#name it metfrag_candidate
parser.add_argument('--metfrag_candidate_list', type=str, action='append', help='path to MetFrag candidate table CSV file')
parser.add_argument('--ms1data', type=str, help='path to MS1 data CSV file')
parser.add_argument('--score_thresh', type=float, default=0.75, help='score threshold for MetFrag results (default: 0.75)')
# Parse the command-line arguments
args = parser.parse_args()
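
Because --metfrag_candidate_list is declared with action='append', the flag can be repeated, and each value is collected into a list. The sketch below copies only two of the arguments above and parses a hard-coded argument list instead of the real command line. The file names are hypothetical and serve only to illustrate the resulting values.

```python
import argparse

# Reduced copy of the parser above; only two of the arguments are kept.
parser = argparse.ArgumentParser(description='MAW-Py')
parser.add_argument('--metfrag_candidate_list', type=str, action='append',
                    help='path to MetFrag candidate table CSV file')
parser.add_argument('--score_thresh', type=float, default=0.75,
                    help='score threshold for MetFrag results (default: 0.75)')

# Hypothetical file names, passed as a list instead of reading sys.argv.
args = parser.parse_args([
    '--metfrag_candidate_list', 'candidates_1.csv',
    '--metfrag_candidate_list', 'candidates_2.csv',
])

print(args.metfrag_candidate_list)  # one list entry per repeated flag
print(args.score_thresh)            # default is used when the flag is omitted
```

If the flag is not given at all, argparse leaves metfrag_candidate_list as None rather than an empty list, so code that iterates over it may need to handle that case.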

1. MAW-R Results Post-processing

The first function post-processes the results from GNPS, MassBank and HMDB and writes new files with names like proc.csv. For GNPS, the compound names and SMILES need to be curated, so MAW-Python uses RDKit and PubChemPy to retrieve this information. For HMDB, the original results contain only the ID, so an SDF file is downloaded from this link to extract further information. The function also removes any low-scoring candidates.

print("spec_postproc starts")
msp_file_df = spec_postproc(msp_file, gnps_dir, hmdb_dir, mbank_dir, file_id)

2. MetFrag Results Post-processing

The second step post-processes the results from MetFrag by removing any low-scoring candidates.

print("metfrag_postproc starts")
ms1data_df = metfrag_postproc(ms1data, metfrag_candidate_list, file_id, score_thresh)
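
The idea of removing low-scoring candidates can be sketched in a few lines. This is not the body of metfrag_postproc. The helper filter_by_score, the "Score" key and the candidate rows are hypothetical, and keeping candidates whose score equals the threshold is an assumption.

```python
def filter_by_score(candidates, score_thresh=0.75, score_key="Score"):
    """Keep only candidates whose score is at or above the threshold.

    Hypothetical helper for illustration; not part of MAW-Python.
    """
    return [c for c in candidates if float(c[score_key]) >= score_thresh]


# Made-up candidate rows standing in for a MetFrag candidate table.
candidates = [
    {"Name": "candidate_A", "Score": 0.92},
    {"Name": "candidate_B", "Score": 0.40},
    {"Name": "candidate_C", "Score": 0.75},
]

kept = filter_by_score(candidates, score_thresh=0.75)
print([c["Name"] for c in kept])
```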

3. Candidate Selection

The candidate selection function creates a folder /CandidateSelection/, which stores a file called ChemMN.tsv for each feature. This .tsv file can be imported as a network in Cytoscape to visualize a chemical similarity network: keep the Source and Target nodes as the names of the candidates, such as M_1 (the MassBank candidate ranked 1), and use the Tanimoto score as the edge attribute. The second type of file is a ranked list of candidates generated with MAW, called sorted_candidate_list.csv. If more than one input file was processed, merge the results from the mergedResults-with-one-Candidates.csv file of each input file.

print("CandidateSelection starts")
CandidateSelection_SimilarityandIdentity_Metfrag(file_id=file_id, msp_file=msp_file_df,
                                                 ms1data=ms1data_df, standards=False)
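
To make the Source/Target/Tanimoto layout concrete, here is a minimal sketch of how a pairwise edge table of this shape could be built. It is not the MAW implementation: fingerprints are represented as plain Python sets of made-up bit positions instead of being computed from SMILES, the names G_1 and H_1 are assumed by analogy with M_1, and the column headers are illustrative.

```python
import csv
import io
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up fingerprint bits; real fingerprints would be derived from SMILES.
fingerprints = {
    "M_1": {1, 4, 7, 9},
    "G_1": {1, 4, 7, 12},
    "H_1": {2, 5},
}

# Write one tab-separated row per candidate pair into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Source", "Target", "Tanimoto"])
for (src, fp_src), (tgt, fp_tgt) in combinations(fingerprints.items(), 2):
    writer.writerow([src, tgt, round(tanimoto(fp_src, fp_tgt), 3)])

edge_table = buffer.getvalue()
print(edge_table)
```

A table of this shape maps directly onto Cytoscape's network import, with the first two columns as source and target nodes and the third as an edge attribute.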

4. Classification

The classification function is run for all features. Features are classified based on their SMILES, which ClassyFire takes as input to generate chemical classes.

classification(resultcsv = file_id + "_mergedResults-with-one-Candidates.csv")