ipaPy2 PR #629

Open
wants to merge 45 commits into base: master
Changes from 32 commits
45 commits
a338bbc
started ipapy2 clustering tool
hechth Dec 4, 2023
f95e14a
added empty lines
hechth Dec 4, 2023
4982aba
added help text
hechth Dec 4, 2023
8990fcf
added citation
hechth Dec 4, 2023
2e6e58f
added tests
hechth Dec 4, 2023
3c28584
lint
hechth Dec 4, 2023
357a3e5
Update macros.xml
ChangLiuuczlcl8 Dec 7, 2023
8286e5a
Add files via upload
ChangLiuuczlcl8 Dec 7, 2023
c5a98a0
Update ipapy2_clustering.py
ChangLiuuczlcl8 Dec 7, 2023
d9b87e9
Update ipapy2_clustering.xml
ChangLiuuczlcl8 Dec 7, 2023
9705593
Add files via upload
ChangLiuuczlcl8 Dec 8, 2023
8c5bafd
Add files via upload
ChangLiuuczlcl8 Dec 8, 2023
a7f5ff8
Add files via upload
ChangLiuuczlcl8 Dec 8, 2023
c217fe1
first
ChangLiuuczlcl8 Dec 11, 2023
94af151
Update tools/ipapy2/ipapy2_clustering.xml
hechth Dec 12, 2023
76dc71a
Merge branch 'ipa_stub' of github.com:RECETOX/galaxytools into ipa_stub
hechth Dec 12, 2023
04166cd
changed ionization to select
hechth Dec 12, 2023
651874d
Update tools/ipapy2/ipapy2_MS1_annotation.xml
ChangLiuuczlcl8 Dec 12, 2023
5e4d5e9
delete outdated files
ChangLiuuczlcl8 Dec 13, 2023
895aee6
add compute bio and gibbs sampler
ChangLiuuczlcl8 Dec 20, 2023
a37e307
fix error in macros
ChangLiuuczlcl8 Dec 20, 2023
f28a256
Merge pull request #474 from ChangLiuuczlcl8/ipa_stub
hechth Jul 25, 2024
f539a24
Merge branch 'master' into ipa_stub
hechth Jul 25, 2024
fc7b6e0
Merge branch 'master' into ipa_stub
hechth Aug 15, 2024
b088e09
Merge branch 'master' into ipa
Jan 10, 2025
3fabee0
fixed 3 tests
acquayefrank Jan 23, 2025
cf93eb6
fixed a few more tests
acquayefrank Jan 23, 2025
822ee55
cleaner test data
acquayefrank Jan 24, 2025
67f7cfe
cleaner test data
acquayefrank Jan 24, 2025
2c265d3
Merge branch 'RECETOX:master' into ipa
acquayefrank Jan 27, 2025
c8bb50e
working state
acquayefrank Jan 27, 2025
6e41fbf
Merge branch 'ipa' of github.com:acquayefrank/galaxytools into ipa
acquayefrank Jan 27, 2025
5be1b0d
Update tools/ipapy2/.shed.yml
acquayefrank Jan 28, 2025
77a0c8b
made changes based on code review
acquayefrank Jan 29, 2025
4cfdafd
completed ms annotation code cleanup
acquayefrank Jan 30, 2025
b3b9e04
finished refactoring
acquayefrank Jan 31, 2025
fe475ce
lint
hechth Feb 3, 2025
40f4207
lint and fixed tests
hechth Feb 3, 2025
5411edf
removed not needed code
hechth Feb 3, 2025
0694a63
added min and max values
hechth Feb 3, 2025
d806e37
lint
hechth Feb 3, 2025
96b7fb8
Merge branch 'RECETOX:master' into ipa
acquayefrank Feb 4, 2025
c978d8b
Update tools/ipapy2/ipapy2_MS1_annotation.xml
acquayefrank Feb 4, 2025
fc5625d
added some extra references in the README
acquayefrank Feb 4, 2025
a8c80e2
added imporvements from code review
acquayefrank Feb 4, 2025
15 changes: 15 additions & 0 deletions tools/ipapy2/.shed.yml
@@ -0,0 +1,15 @@
name: ipaPy2
owner:
remote_repository_url: "https://github.com/RECETOX/galaxytools/tree/master/tools/ipapy2"
homepage_url: "https://github.com/francescodc87/ipaPy2"
categories:
  - Metabolomics
description: "Mass spectrometry data annotation tool."
long_description: "New Python implementation of the Integrated Probabilistic Annotation (IPA) - A Bayesian annotation method for LC/MS data integrating biochemical relations, isotope patterns and adduct formation."
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "{{ tool_name }} tool from the ipaPy2 package"
suite:
  name: suite_ipapy2
  description: tools from the ipaPy2 suite are used for annotation of mass spectrometry data
  type: repository_suite_definition
113 changes: 113 additions & 0 deletions tools/ipapy2/ipapy2_MS1_annotation.py
@@ -0,0 +1,113 @@
import argparse
import os

import pandas as pd
from ipaPy2 import ipa


def main(args):
    df = pd.read_csv(args.mapped_isotope_patterns, keep_default_na=False)
    df = df.replace("", None)
    all_adducts = pd.read_csv(args.all_adducts, keep_default_na=False)
    all_adducts = all_adducts.replace("", None)
    # Fall back to a single core when GALAXY_SLOTS is not set (e.g. outside Galaxy).
    ncores = int(os.environ.get("GALAXY_SLOTS", 1)) if args.ncores is None else args.ncores
    # A missing or zero value means "use the default derived from ppm".
    ppmunk = args.ppmunk if args.ppmunk else args.ppm
    ppmthr = args.ppmthr if args.ppmthr else 2 * args.ppm

    annotations = ipa.MS1annotation(
        df,
        all_adducts,
        ppm=args.ppm,
        me=args.me,
        ratiosd=args.ratiosd,
        ppmunk=ppmunk,
        ratiounk=args.ratiounk,
        ppmthr=ppmthr,
        pRTNone=args.pRTNone,
        pRTout=args.pRTout,
        ncores=int(ncores),
    )
    # Flatten the per-peak annotation tables into one table keyed by peak_id.
    annotations_flat = pd.DataFrame()
    for peak_id in annotations:
        annotation = annotations[peak_id]
        annotation["peak_id"] = peak_id
        annotations_flat = pd.concat([annotations_flat, annotation])
    annotations_file = (
        args.MS1_annotations if args.MS1_annotations else "MS1_annotations.csv"
    )
    annotations_flat.to_csv(annotations_file, index=False)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--mapped_isotope_patterns",
        type=str,
        required=True,
        help="A csv file containing the MS1 data. Ideally obtained from map_isotope_patterns",
    )
    parser.add_argument(
        "--all_adducts",
        type=str,
        required=True,
        help="A csv file containing the information on all the possible adducts given the database. Ideally obtained from compute_all_adducts",
    )
    parser.add_argument(
        "--ppm",
        type=float,
        required=True,
        help="accuracy of the MS instrument used.",
    )
    parser.add_argument(
        "--me",
        type=float,
        default=5.48579909065e-04,
        help="accurate mass of the electron. Default 5.48579909065e-04",
    )
    parser.add_argument(
        "--ratiosd",
        type=float,
        default=0.9,
        help="acceptable ratio between predicted intensity and observed intensity of isotopes.",
    )
    parser.add_argument(
        "--ppmunk",
        type=float,
        help="ppm associated to the 'unknown' annotation. If not provided equal to ppm.",
    )
    parser.add_argument(
        "--ratiounk",
        type=float,
        default=0.5,
        help="isotope ratio associated to the 'unknown' annotation.",
    )
    parser.add_argument(
        "--ppmthr",
        type=float,
        help="maximum ppm possible for the annotations. If not provided equal to 2*ppm.",
    )
    parser.add_argument(
        "--pRTNone",
        type=float,
        default=0.8,
        help="multiplicative factor for the RT if no RTrange present in the database.",
    )
    parser.add_argument(
        "--pRTout",
        type=float,
        default=0.4,
        help="multiplicative factor for the RT if measured RT is outside the RTrange present in the database.",
    )
    parser.add_argument(
        "--MS1_annotations",
        type=str,
        help="MS1 annotation file for outputting results.",
    )
    parser.add_argument(
        "--ncores",
        type=int,
        default=None,
        help="number of cores to use for the computation.",
    )
    args = parser.parse_args()
    main(args)
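
The wrapper takes its worker count from `--ncores` and otherwise from the `GALAXY_SLOTS` environment variable. The following is a minimal sketch of that resolution, assuming a fallback of one core when the variable is unset or malformed. The helper name `resolve_ncores` and its `fallback` parameter are illustrative and not part of the PR.

```python
import os


def resolve_ncores(cli_value=None, environ=None, fallback=1):
    """Pick the worker count: CLI value first, then GALAXY_SLOTS, then fallback.

    `resolve_ncores` and `fallback` are hypothetical names for this sketch.
    """
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return max(1, int(cli_value))
    raw = environ.get("GALAXY_SLOTS")
    try:
        return max(1, int(raw))
    except (TypeError, ValueError):
        # Unset (None) or non-numeric value: run single-threaded.
        return fallback


if __name__ == "__main__":
    print(resolve_ncores(None, {"GALAXY_SLOTS": "4"}))
    print(resolve_ncores(None, {}))
```

Passing the environment as an argument keeps the function testable without touching the real process environment.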
92 changes: 92 additions & 0 deletions tools/ipapy2/ipapy2_MS1_annotation.xml
@@ -0,0 +1,92 @@
<tool id="ipapy2_MS1_annotation" name="IPA MS1 annotation" version="@TOOL_VERSION@+galaxy0" profile="21.09">
<macros>
<import>macros.xml</import>
</macros>

<requirements>
<requirement type="package" version="@TOOL_VERSION@">ipapy2</requirement>
</requirements>

<command detect_errors="exit_code"><![CDATA[
python3 '${__tool_directory__}/ipapy2_MS1_annotation.py'
--mapped_isotope_patterns '${mapped_isotope_patterns}'
--all_adducts '${all_adducts}'
--ppm ${ppm}
--me ${me}
--ratiosd ${ratiosd}
#if $ppmunk
--ppmunk ${ppmunk}
#else
--ppmunk ${ppm}
#end if
--ratiounk ${ratiounk}
#if $ppmthr
--ppmthr ${ppmthr}
#else
--ppmthr 0
#end if
--pRTNone ${pRTNone}
--pRTout ${pRTout}
--MS1_annotations '${MS1_annotations}'
]]></command>

<inputs>
<param label="Mapped isotope patterns" name="mapped_isotope_patterns" type="data" format="csv" help="A csv file containing the MS1 data. Ideally obtained from map_isotope_patterns" />
<param label="all possible adducts" name="all_adducts" type="data" format="csv" help="A csv file containing the information on all the possible adducts given the database. Ideally obtained from compute_all_adducts" />
<param label="ppm" name="ppm" type="float" help="accuracy of the MS instrument used."/>
<section name="unknown" title="unknown settings">
<param name="ppmunk" type="float" optional="true">
<label>ppm for unknown</label>
<help>ppm associated to the 'unknown' annotation. If not provided equal to ppm.</help>
</param>
<param name="ratiounk" type="float" optional="true" value="0.5">
<label>isotope ratio for unknown</label>
<help>isotope ratio associated to the 'unknown' annotation.</help>
</param>
</section>
<section name="optional_settings" title="optional settings">
<param name="me" type="float" value="5.48579909065e-04">
<label>mass of the electron.</label>
<help>accurate mass of the electron. Default 5.48579909065e-04.</help>
</param>
<param name="ratiosd" type="float" value="0.9" optional="true">
<label>intensity ratio</label>
<help>acceptable ratio between predicted intensity and observed intensity of isotopes</help>
</param>
<param name="ppmthr" type="float" optional="true">
<label>ppm threshold</label>
<help>maximum ppm possible for the annotations. if not provided equal to 2*ppm.</help>
</param>
<param name="pRTNone" type="float" optional="true" value="0.8">
<label>no RT factor</label>
<help>multiplicative factor for the RT if no RTrange present in the database.</help>
</param>
<param name="pRTout" type="float" optional="true" value="0.4">
<label>outside RT factor</label>
<help>multiplicative factor for the RT if measured RT is outside the RTrange present in the database.</help>
</param>
</section>
</inputs>

<outputs>
<data label="${tool.name} on ${on_string}" name="MS1_annotations" format="csv"/>
</outputs>

<tests>
<test>
<param name="mapped_isotope_patterns" value="mapped_isotope_patterns.csv"/>
<param name="all_adducts" value="all_adducts.csv"/>
<param name="ppm" value="3"/>
<output name="MS1_annotations" file="MS1_annotations.csv"/>
</test>
</tests>

<help><![CDATA[
::
Annotation of the dataset based on the MS1 information. Prior probabilities
are based on mass only, while post probabilities are based on mass, RT,
previous knowledge and isotope patterns.
]]></help>

<expand macro="citations"/>
</tool>
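
The `#if`/`#else` branches in the `<command>` block decide which values reach the script when the optional ppm parameters are left empty. The sketch below mirrors that fallback logic in plain Python to show the resulting arguments. Galaxy itself renders the Cheetah template; `build_ms1_args` and the `params` dict are hypothetical and only cover the three ppm-related options.

```python
import shlex


def build_ms1_args(params):
    """Mirror the <command> template's #if/#else fallbacks in plain Python.

    `build_ms1_args` and `params` are hypothetical names for illustration.
    """
    ppm = params["ppm"]
    args = ["--ppm", str(ppm)]
    # #if $ppmunk ... #else --ppmunk ${ppm}
    ppmunk = params.get("ppmunk")
    args += ["--ppmunk", str(ppmunk if ppmunk else ppm)]
    # #if $ppmthr ... #else --ppmthr 0  (0 tells the script to use 2 * ppm)
    ppmthr = params.get("ppmthr")
    args += ["--ppmthr", str(ppmthr if ppmthr else 0)]
    return args


if __name__ == "__main__":
    print(shlex.join(build_ms1_args({"ppm": 3})))
```

With only `ppm` set, the script receives `--ppmunk` equal to `ppm` and `--ppmthr 0`, and the Python wrapper then replaces the zero with `2 * ppm`.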
165 changes: 165 additions & 0 deletions tools/ipapy2/ipapy2_MS2_annotation.py
@@ -0,0 +1,165 @@
import argparse
import os

import pandas as pd
from ipaPy2 import ipa


def main(args):
    df = pd.read_csv(args.mapped_isotope_patterns, keep_default_na=False)
    df = df.replace("", None)
    dfMS2 = pd.read_csv(args.MS2_fragmentation_data, keep_default_na=False)
    dfMS2 = dfMS2.replace("", None)
    all_adducts = pd.read_csv(args.all_adducts, keep_default_na=False)
    all_adducts = all_adducts.replace("", None)
    MS2_DB = pd.read_csv(args.MS2_DB, keep_default_na=False)
    MS2_DB = MS2_DB.replace("", None)

    # Fall back to a single core when GALAXY_SLOTS is not set (e.g. outside Galaxy).
    ncores = int(os.environ.get("GALAXY_SLOTS", 1)) if args.ncores is None else args.ncores
    ppmthr = args.ppmthr if args.ppmthr else 2 * args.ppm
Member:
Same comment as for the previous wrapper.

Author:
I implemented this in the XML wrapper. I did this based on the idea that a function should be self-contained and not depend too much on external conditions. Thus, the .py file exposes some functionality to the XML file.
I did not want to assume that said parameter will be set in the XML, even though the check is done there as well.
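
The fallback discussed in this thread can be kept in one small, self-contained function so the script behaves the same whether or not the XML supplies a value. This is a minimal sketch under that assumption: `resolve_ppm_settings` and `ppm_to_mz_window` are hypothetical helpers, not functions from the PR or from ipaPy2. It treats a missing or zero value as "not set", as the `x if x else default` expressions in the MS1 wrapper do.

```python
def resolve_ppm_settings(ppm, ppmunk=None, ppmthr=None):
    """Return (ppmunk, ppmthr) with the documented fallbacks applied.

    Hypothetical helper: missing or zero values fall back to ppm and 2 * ppm.
    """
    if ppm <= 0:
        raise ValueError("ppm must be positive")
    resolved_unk = ppmunk if ppmunk else ppm
    resolved_thr = ppmthr if ppmthr else 2 * ppm
    return resolved_unk, resolved_thr


def ppm_to_mz_window(mz, ppm):
    """Absolute m/z tolerance corresponding to a relative error in ppm."""
    return mz * ppm / 1e6


if __name__ == "__main__":
    print(resolve_ppm_settings(3))
    print(resolve_ppm_settings(3, ppmthr=0))
    print(ppm_to_mz_window(500.0, 3))
```

Because zero doubles as the "unset" sentinel, a caller cannot request a threshold of exactly 0 ppm. That is acceptable here, since a zero tolerance would match nothing.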


    annotations = ipa.MSMSannotation(
        df,
        dfMS2,
        all_adducts,
        MS2_DB,
        ppm=args.ppm,
        ratiosd=args.ratiosd,
        ppmunk=args.ppmunk,
        ratiounk=args.ratiounk,
        ppmthr=ppmthr,
        pRTNone=args.pRTNone,
        pRTout=args.pRTout,
        mzdCS=args.mzdCS,
        ppmCS=args.ppmCS,
        CSunk=args.CSunk,
        evfilt=args.evfilt,
        ncores=ncores,
    )
    # Flatten the per-peak annotation tables into one table keyed by peak_id.
    annotations_flat = pd.DataFrame()
    for peak_id in annotations:
        annotation = annotations[peak_id]
        annotation["peak_id"] = peak_id
        annotations_flat = pd.concat([annotations_flat, annotation])
    annotations_file = (
        args.MS2_annotations if args.MS2_annotations else "MS2_annotations.csv"
    )
    annotations_flat.to_csv(annotations_file, index=False)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--mapped_isotope_patterns",
        type=str,
        required=True,
        help="A csv file containing the MS1 data. Ideally obtained from map_isotope_patterns",
    )
    parser.add_argument(
        "--MS2_fragmentation_data",
        type=str,
        required=True,
        help="A csv file containing the MS2 fragmentation data",
    )
    parser.add_argument(
        "--all_adducts",
        type=str,
        required=True,
        help="A csv file containing the information on all the possible adducts given the database. Ideally obtained from compute_all_adducts",
    )
    parser.add_argument(
        "--MS2_DB",
        type=str,
        required=True,
        help="A csv file containing the MS2 database",
    )
    parser.add_argument(
        "--ppm",
        type=float,
        required=True,
        help="accuracy of the MS instrument used.",
    )
    parser.add_argument(
        "--me",
        type=float,
        default=5.48579909065e-04,
        help="accurate mass of the electron. Default 5.48579909065e-04",
    )
    parser.add_argument(
        "--ratiosd",
        type=float,
        default=0.9,
        help="acceptable ratio between predicted intensity and observed intensity of isotopes.",
    )
    parser.add_argument(
        "--ppmunk",
        type=float,
        help="ppm associated to the 'unknown' annotation. If not provided equal to ppm.",
    )
    parser.add_argument(
        "--ratiounk",
        type=float,
        default=0.5,
        help="isotope ratio associated to the 'unknown' annotation.",
    )
    parser.add_argument(
        "--ppmthr",
        type=float,
        help="maximum ppm possible for the annotations. If not provided equal to 2*ppm.",
    )
    parser.add_argument(
        "--pRTNone",
        type=float,
        default=0.8,
        help="multiplicative factor for the RT if no RTrange present in the database.",
    )
    parser.add_argument(
        "--pRTout",
        type=float,
        default=0.4,
        help="multiplicative factor for the RT if measured RT is outside the RTrange present in the database.",
    )
    parser.add_argument(
        "--mzdCS",
        # m/z differences are fractional, so parse as float rather than int.
        type=float,
        default=0,
        help="""maximum mz difference allowed when computing cosine similarity
        scores. If one wants to use this parameter instead of ppmCS, this
        must be set to 0. Default 0.""",
    )
    parser.add_argument(
        "--ppmCS",
        type=int,
        default=10,
        help="""maximum ppm allowed when computing cosine similarity scores.
        If one wants to use this parameter instead of mzdCS, this must be
        set to 0. Default 10.""",
    )
    parser.add_argument(
        "--CSunk",
        type=float,
        default=0.7,
        help="""cosine similarity score associated with the 'unknown' annotation.
        Default 0.7""",
    )
    parser.add_argument(
        "--evfilt",
        # type=bool would turn any non-empty string (including "False") into True.
        type=lambda value: value.strip().lower() in ("true", "1", "yes"),
        default=False,
        help="""Default value False. If true, only spectra acquired with the same
        collision energy are considered.""",
    )
    parser.add_argument(
        "--MS2_annotations",
        type=str,
        help="MS2 annotation file for outputting results.",
    )
    parser.add_argument(
        "--ncores",
        type=int,
        default=None,
        help="number of cores to use for the computation.",
    )
    args = parser.parse_args()
    main(args)
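
A boolean option such as `--evfilt` needs an explicit string-to-bool converter, because `bool("False")` is `True` in Python: any non-empty string is truthy. The sketch below shows one way to parse such a flag with `argparse`. The helper names `str_to_bool` and `parse_evfilt` are illustrative, and the set of accepted spellings is an assumption.

```python
import argparse


def str_to_bool(value):
    """Parse common textual booleans; reject anything else."""
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_evfilt(argv):
    """Parse only the --evfilt option from an argument list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--evfilt", type=str_to_bool, default=False)
    return parser.parse_args(argv).evfilt


if __name__ == "__main__":
    print(bool("False"))  # True: any non-empty string is truthy
    print(parse_evfilt(["--evfilt", "False"]))
```

Rejecting unknown spellings with `ArgumentTypeError` makes a typo fail loudly instead of silently enabling or disabling the filter.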