diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..22b5c14 --- /dev/null +++ b/.gitignore @@ -0,0 +1,5 @@ +.python-version +.venv/ +.github/ +docs/_build/ +specification/annotation-schema.md diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst new file mode 100644 index 0000000..fb9e4f5 --- /dev/null +++ b/CONTRIBUTING.rst @@ -0,0 +1,55 @@ +############ +Contributing +############ + +This document briefly describes how to contribute to +`mzPAF `_. + + + +Before you begin +################ + +If you have an idea for a feature, a use case to add, or an approach for a bugfix, +you are welcome to share it with the community by opening a +thread in `GitHub Issues `_. + + + +Documentation local setup +######################### + +To work on the documentation and get a live preview, install the requirements +and run ``sphinx-autobuild``: + +.. code-block:: sh + + pip install -r ./docs/requirements.txt + sphinx-autobuild ./docs/ ./docs/_build/ + +Then browse to http://localhost:8000 to watch the live preview. + + + +How to contribute +################# + +- Fork `mzPAF `_ on GitHub to + make your changes. +- Commit and push your changes to your + `fork `_. +- Ensure that the tests and documentation (both Python docstrings and files in + ``/docs/``) have been updated according to your changes. Python + docstrings are formatted in the + `numpydoc style `_. +- Open a + `pull request `_ + with these changes. Your pull request message should ideally include: + + - A description of why the changes should be made. + - A description of the implementation of the changes. + - A description of how to test the changes. + +- The pull request should pass all the continuous integration tests which are + automatically run by + `GitHub Actions `_. 
diff --git a/README.md b/README.md index 93e6f00..af536ed 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,118 @@ # mzPAF Peak Annotation Format -The mzPAF proposed standard is a specification for a fragment ion peak annotation format for mass spectra, focused on peptides. This provides for a standardized format for describing the origin of fragment ions to be used in spectral libraries, other formats that aim to describe fragment ions, and software tools that annotate fragment ions. +## About -The main home page for mzPAF is at the PSI web site: [https://psidev.info/mzPAF](https://psidev.info/mzPAF) +mzPAF is a specification for a fragment ion peak annotation format for mass spectra, focused on +peptides. This provides for a standardized format for describing the origin of fragment ions to be +used in spectral libraries, other formats that aim to describe fragment ions, and software tools +that annotate fragment ions. -# Status +- Official mzPAF homepage: [psidev.info/mzPAF](https://psidev.info/mzPAF) +- mzPAF documentation: [mzpaf.readthedocs.io](https://mzpaf.readthedocs.io) -Updated: 2024-10-15 +## Status -The specification has been resubmitted to the PSI Document Process and is undergoing final community review. It is anticipated to become a formal PSI standard near the end of 2024. +_Updated: 2024-10-15_ + +The specification has been resubmitted to the PSI Document Process and is undergoing final +community review. It is anticipated to become a formal PSI standard near the end of 2024. 
-# Available Materials - The current DRAFT specification: [mzPAF_specification_v1.0-draft15.pdf](https://github.com/HUPO-PSI/mzPAF/blob/main/specification/mzPAF_specification_v1.0-draft15.pdf?raw=true) - Example annotated spectra: [Examples](https://github.com/HUPO-PSI/mzPAF/tree/main/examples) -- The GitHub repo associated with mzPAF: [https://github.com/HUPO-PSI/mzPAF](https://github.com/HUPO-PSI/mzPAF) -- The GitHub repo assocated with the related mzSpecLib standard: [https://github.com/HUPO-PSI/mzSpecLib](https://github.com/HUPO-PSI/mzSpecLib) +## In short + +- An mzPAF annotation is a single case-sensitive string of characters with no length limit +- Multiple possible explanations are comma-separated +- Deltas (observed minus theoretical _m/z_) are prefixed with a slash (`/`) +- Confidence values of annotations are prefixed with an asterisk (`*`) + +The basic format of each annotation is: + +``` +annotation1/delta,annotation2/delta,... +``` + +or: + +``` +annotation1/delta*confidence,annotation2/delta*confidence,... 
+``` + +For example: + +``` +b2-H2O/3.2ppm,b4-H2O^2/3.2ppm +``` + +or: + +``` +b2-H2O/3.2ppm*0.75,b4-H2O^2/3.2ppm*0.25 +``` + +mzPAF supports: + +- Annotations of multiple analytes: `1@y12/0.13,2@b9-NH3/0.23` +- Mass deltas in ppm instead of _m/z_ unit: `y1/-1.4ppm` +- Confidence levels per annotation: `y1/-1.4ppm*0.75` +- Advanced ion notation: `[ion type](neutral loss)(isotope)(adduct type)(charge)`, e.g.: `y4-H2O+2i[M+H+Na]^2`: + - Ion types: + - Peptide ion series (a, b, c, x, y, z): `y4` + - Unknown ions: `?` + - Immonium ions: `IY` + - Internal fragment ions: `m3:6` + - Intact precursor ions: `p^2` + - A set of reference ions: `r[TMT127N]` + - Named compounds: `_{Urocanic Acid}` + - Chemical formulas: `f{C16H22O}` + - SMILES: `s{CN=C=O}[M+H]` + - Embedded ProForma annotations: `0@b2{LC[Carbamidomethyl]}` + - Neutral gains and losses: `y2+CO-H2O` + - Isotopes: `y2+2i` + - Adduct types: `y2[M+H]` + - Charge states: `^2` +- Multiple peaks per annotation: `&y7/-0.001` and `y7/0.000*0.95` + +Read the +[full DRAFT specification](https://github.com/HUPO-PSI/mzPAF/blob/main/specification/mzPAF_specification_v1.0-draft14.docx?raw=true) +for more details and examples. + +## Getting started + +### mzPAF in Python + +The [mzPAF Python package](https://mzpaf.readthedocs.io/en/latest/implementations/python/) can +parse mzPAF strings into their components, convert to the JSON representation, or serialize back +to an mzPAF string. 
+ +```python +>>> import mzpaf +>>> annotations = mzpaf.parse_annotation("b2-H2O/3.2ppm*0.75,b4-H2O^2/3.2ppm*0.25") +>>> print(annotations[0].to_json()) +{'neutral_losses': ['-H2O'], 'isotope': 0, 'adducts': [], 'charge': 1, 'analyte_reference': None, 'mass_error': {'value': 3.2, 'unit': 'ppm'}, 'confidence': 0.75, 'molecule_description': {'series_label': 'peptide', 'series': 'b', 'position': 2, 'sequence': None}} +>>> annotations[0].serialize() +'b2-H2O/3.2ppm*0.75' +``` + +Learn more at the +[package documentation](https://mzpaf.readthedocs.io/en/latest/implementations/python/). + +### mzPAF regular expressions + +The mzPAF specification includes regular expressions for parsing mzPAF strings. These can be used +in any programming language that supports regular expressions. + +Learn more at the +[mzPAF regex documentation](https://mzpaf.readthedocs.io/en/latest/implementations/regex/). + +### mzPAF Lark grammar + +mzPAF has also been defined as a +[Lark grammar](https://mzpaf.readthedocs.io/en/latest/implementations/lark/). 
+ +### Links + +- The mzPAF GitHub repo: [github.com/HUPO-PSI/mzPAF](https://github.com/HUPO-PSI/mzPAF) +- The GitHub repo for the related mzSpecLib standard: [github.com/HUPO-PSI/mzSpecLib](https://github.com/HUPO-PSI/mzSpecLib) +- HUPO-PSI homepage: [psidev.info](https://www.psidev.info/) diff --git a/docs/.readthedocs.yaml b/docs/.readthedocs.yaml index 31c93e2..d218f1f 100644 --- a/docs/.readthedocs.yaml +++ b/docs/.readthedocs.yaml @@ -19,3 +19,4 @@ python: path: implementations/python extra_requirements: - docs + - requirements: docs/requirements.txt diff --git a/docs/conf.py b/docs/conf.py index abc0a5a..5aa2081 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -1,5 +1,53 @@ """Configuration file for the Sphinx documentation builder.""" +# Scripts +import json +import shutil +from pathlib import Path + +import jsonschema2md +import pandas as pd + + +def get_jsonschema_docs(input_json, output_markdown): + """Generate markdown documentation from a JSON schema.""" + parser = jsonschema2md.Parser() + with open(input_json, encoding="utf-8") as f_in: + output_md = parser.parse_schema(json.load(f_in)) + + with open(output_markdown, "w", encoding="utf-8") as f_out: + f_out.writelines(output_md) + + +def get_reference_molecules_md(input_json, output_markdown): + """Generate a markdown table of reference molecules.""" + df = pd.read_json(input_json).T + buf = df.to_markdown().replace(' nan ', ' ') + with open(output_markdown, 'wt') as fh: + fh.write(buf) + + +get_jsonschema_docs( + "../specification/annotation-schema.json", + "../specification/annotation-schema.md" +) +get_jsonschema_docs( + "../specification/reference_data/reference_molecule_schema.json", + "../specification/reference_data/reference_molecule_schema.md" +) + +get_reference_molecules_md( + "../specification/reference_data/reference_molecules.json", + "../specification/reference_data/reference_molecules.md" +) + +if not Path("_static/img/lark-railroad-diagram.svg").exists(): + shutil.copy( + 
"../specification/grammars/schema_images/Annotation.svg", + "_static/img/lark-railroad-diagram.svg" + ) + + # Project information project = "mzPAF" author = "HUPO-PSI" @@ -16,7 +64,7 @@ "sphinx_click.ext", "myst_parser", ] -source_suffix = [".rst"] +source_suffix = [".rst", ".md"] master_doc = "index" exclude_patterns = ["_build"] @@ -46,6 +94,7 @@ "python": ("https://docs.python.org/3", None), "psims": ("https://mobiusklein.github.io/psims/docs/build/html/", None), "pyteomics": ("https://pyteomics.readthedocs.io/en/stable/", None), + "mzspeclib": ("https://mzspeclib.readthedocs.io/en/latest/", None), } diff --git a/docs/contributing.rst b/docs/contributing.rst new file mode 100644 index 0000000..a021d3e --- /dev/null +++ b/docs/contributing.rst @@ -0,0 +1 @@ +.. include:: ../CONTRIBUTING.rst diff --git a/docs/implementations/json/index.rst b/docs/implementations/json/index.rst new file mode 100644 index 0000000..c4ec07d --- /dev/null +++ b/docs/implementations/json/index.rst @@ -0,0 +1,36 @@ +########### +JSON Schema +########### + +About +===== + +Instead of representing mzPAF as a single string, it can alternatively be expressed as a JSON +object. This format is more compatible for inter-program communication, especially through web +APIs. You can find the JSON schema for mzPAF on GitHub via the following link: + +https://raw.githubusercontent.com/HUPO-PSI/mzPAF/main/specification/annotation-schema.json + +Replace ``main`` in the URL with the desired version tag to access the schema for a particular +version. + +Examples +======== + +.. literalinclude:: ../../../specification/annotation-example-1.json + :language: json + +.. literalinclude:: ../../../specification/annotation-example-2.json + :language: json + +.. literalinclude:: ../../../specification/annotation-example-3.json + :language: json + + + +Full schema documentation +========================= + +.. 
include:: ../../../specification/annotation-schema.md + :parser: myst_parser.sphinx_ + :start-line: 4 diff --git a/docs/implementations/lark/index.rst b/docs/implementations/lark/index.rst new file mode 100644 index 0000000..a309c39 --- /dev/null +++ b/docs/implementations/lark/index.rst @@ -0,0 +1,17 @@ +############ +Lark grammar +############ + + +About +===== + +[todo] + + +Railroad diagram +================ + +.. figure:: ../../_static/img/lark-railroad-diagram.svg + :alt: Lark grammar + diff --git a/docs/implementations/python/api.rst b/docs/implementations/python/api.rst index 08ec3f3..860b993 100644 --- a/docs/implementations/python/api.rst +++ b/docs/implementations/python/api.rst @@ -7,6 +7,8 @@ Python API :imported-members: + .. manually documented as parse_annotation is undocumented + .. autofunction:: parse_annotation Parse a string into one or more :class:`IonAnnotationBase` instances. diff --git a/docs/implementations/python/index.rst b/docs/implementations/python/index.rst index 7d589cf..fb7e568 100644 --- a/docs/implementations/python/index.rst +++ b/docs/implementations/python/index.rst @@ -2,10 +2,19 @@ Python implementation ##################### +About +===== + +.. include:: ../../../implementations/python/README.md + :parser: myst_parser.sphinx_ + + +Full API documentation +====================== + .. toctree:: :caption: Contents :maxdepth: 2 :glob: * - diff --git a/docs/implementations/regex/index.rst b/docs/implementations/regex/index.rst new file mode 100644 index 0000000..8178b3e --- /dev/null +++ b/docs/implementations/regex/index.rst @@ -0,0 +1,25 @@ +################### +Regular expressions +################### + +mzPAF has been defined in several regular expression dialects. + +.. tip:: + + Regex101.com is a great tool to test regular expressions. Try out the mzPAF regex there: + `regex101.com/r/gDPlJu/1 `_. + +Python +====== + +.. 
literalinclude:: ../../../specification/grammars/regex_sre.py + :language: python + :linenos: + + +Javascript ECMA +=============== + +.. literalinclude:: ../../../specification/grammars/regex_ecma.js + :language: javascript + :linenos: diff --git a/docs/index.rst b/docs/index.rst index 8069bc7..da495cf 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,7 +1,6 @@ .. include:: ../README.md :parser: myst_parser.sphinx_ - .. toctree:: :caption: About :hidden: @@ -9,5 +8,6 @@ :glob: Home - implementations/index - specification/index + Specification + Implementations + Contributing diff --git a/docs/requirements.txt b/docs/requirements.txt new file mode 100644 index 0000000..7ee5714 --- /dev/null +++ b/docs/requirements.txt @@ -0,0 +1,8 @@ +sphinx +pydata-sphinx-theme +numpydoc +sphinx_click +myst-parser +sphinx-autobuild +jsonschema2md +pandas diff --git a/docs/specification/index.rst b/docs/specification/index.rst index 130bf73..bba8ec3 100644 --- a/docs/specification/index.rst +++ b/docs/specification/index.rst @@ -1,4 +1,16 @@ -Specification Documents -======================= +###################### +Specification document +###################### -The mzPAF specification draft can be found on `GitHub `_. +.. toctree:: + :hidden: + :glob: + + Specification document + ./* + +.. + TODO: Add when released + +The mzPAF specification drafts can be found on +`GitHub `_. diff --git a/docs/specification/reference-molecules.rst b/docs/specification/reference-molecules.rst new file mode 100644 index 0000000..8efce1e --- /dev/null +++ b/docs/specification/reference-molecules.rst @@ -0,0 +1,35 @@ +################### +Reference molecules +################### + +About +===== + +.. include:: ../../specification/reference_data/README.md + :parser: myst_parser.sphinx_ + :start-line: 2 + :end-line: -1 +.. + skip including title and last line with reference to this page + +See :ref:`Reference molecule ions` in the specification document for more information. 
+ + +Reference molecule table +======================== + +The following analytes can be annotated as reference molecules with the ``r`` prefix and the +listed name between square brackets (e.g. ``r[TMT127N]``). + +.. include:: ../../specification/reference_data/reference_molecules.md + :parser: myst_parser.sphinx_ + + +JSON schema +=========== + +The ``reference_molecules.json`` file is defined by the following schema: + +.. include:: ../../specification/reference_data/reference_molecule_schema.md + :parser: myst_parser.sphinx_ + :start-line: 3 diff --git a/specification/reference_data/README.md b/specification/reference_data/README.md index 66ee22a..4bd14a4 100644 --- a/specification/reference_data/README.md +++ b/specification/reference_data/README.md @@ -1,13 +1,16 @@ # mzPAF specification reference data files -The mzPAF specification uses these files as auxiliary reference data so that enumerated values can be extended without altering the specification document. +The mzPAF specification uses `specification/reference_data/reference_molecules.json` as auxiliary +reference data. In this way, the set of reference molecules can be extended without updating the +specification document itself. -- reference_molecules.json - Easily software parsable list of "reference molecules" often seen in peptide fragmentation spectra, but - not normal peptide fragments, including isobaric labeling reagent related molecules, monosaccharides, nucleotides, etc. These - molecules may be inidividual charged ions (typically protonated), or may be used as neutral losses as appropriate. +The following files are available: -- reference_molecules.md - Human-readable markdown tabular version of reference_molecules.json +- `reference_molecules.json`: Software parsable list of "reference molecules" often seen in + peptide fragmentation spectra, but not normal peptide fragments. This includes isobaric labeling + reagent related molecules, monosaccharides, nucleotides, etc. 
These molecules may be individual + charged ions (typically protonated), or may be used as neutral losses as appropriate. -- reference_molecule_schema.json - JSON schema for reference_molecules.json +- `reference_molecule_schema.json`: JSON schema defining the structure of the JSON file -- reference_mol_to_md.py - Python script to transform reference_molecules.json into a markdown table \ No newline at end of file +A human-readable table with all reference molecules is available on https://mzpaf.readthedocs.io. diff --git a/specification/reference_data/reference_mol_to_md.py b/specification/reference_data/reference_mol_to_md.py deleted file mode 100644 index 8d9005a..0000000 --- a/specification/reference_data/reference_mol_to_md.py +++ /dev/null @@ -1,7 +0,0 @@ -import pandas as pd - -df = pd.read_json("reference_molecules.json").T -buf = df.to_markdown().replace(' nan ', ' ') - -with open('./reference_molecules.md', 'wt') as fh: - fh.write(buf) \ No newline at end of file diff --git a/specification/reference_data/reference_molecule_schema.md b/specification/reference_data/reference_molecule_schema.md new file mode 100644 index 0000000..8e99c95 --- /dev/null +++ b/specification/reference_data/reference_molecule_schema.md @@ -0,0 +1,34 @@ +# HUPO-PSI mzSpecLib reference molecule and ion list + +*Describe reference molecules or ions found in spectral libraries* + +## Pattern Properties + +- **`.{1,}`**: Refer to *[#/definitions/molecule](#definitions/molecule)*. +## Definitions + +- **`molecule`** *(object)*: A single molecule that may be present as a reporter ion or signature ion, or be a component of a neutral loss. + - **`name`** *(string)*: The formal name for this molecule by which it should be referenced. + - **`cv_term`** *(array)* + - **Items** *(string)* + - **`neutral_mass`** *(number)*: The neutral mass of the molecule not including any charge or charge carrier. + - **`molecule_type`** *(string)*: A categorical label for this molecule. 
+ + Examples: + ```json + "monosaccharide" + ``` + + ```json + "reporter" + ``` + + ```json + "reporter+balance" + ``` + + - **`ion_mz`** *(number)*: The m/z of the molecule if it is expected to be reasonably different from the uncharged version. + - **`chemical_formula`** *(string)*: The elemental formula of the neutral molecule. + - **`ion_chemical_formula`** *(string)*: The chemical formula of the charged molecule. + - **`references`** *(array)*: An array of sources and references describing this entity. + - **Items** *(string)*
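
Note on the README "In short" rules in this diff: alternative explanations are comma-separated, a slash (`/`) precedes the delta, and an asterisk (`*`) precedes the confidence. The following is a minimal sketch of only those three rules in plain Python, not the `mzpaf` package's parser. The names `NaiveAnnotation` and `split_mzpaf` are hypothetical and exist only for this illustration. The sketch assumes that no comma, slash, or asterisk occurs inside braces or brackets, which may not hold for named compounds, SMILES, or embedded ProForma annotations; for real input, the package, regular expressions, or Lark grammar referenced in the README are the better choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NaiveAnnotation:
    """Hypothetical container for this sketch; not part of the mzpaf package."""

    ion: str                            # everything before the '/' (e.g. 'b2-H2O')
    delta: Optional[str] = None         # text after '/', kept as-is (e.g. '3.2ppm')
    confidence: Optional[float] = None  # number after '*', if present


def split_mzpaf(text: str) -> List[NaiveAnnotation]:
    """Naively split an mzPAF string using only ',', '/' and '*'.

    Assumption: none of these three characters appears inside braces or
    brackets, so a plain string split is sufficient.
    """
    results = []
    for part in text.split(","):
        # The confidence, if any, follows the first asterisk.
        body, star, confidence = part.partition("*")
        # The delta, if any, follows the first slash.
        ion, slash, delta = body.partition("/")
        results.append(
            NaiveAnnotation(
                ion=ion,
                delta=delta if slash else None,
                confidence=float(confidence) if star else None,
            )
        )
    return results


for annotation in split_mzpaf("b2-H2O/3.2ppm*0.75,b4-H2O^2/3.2ppm*0.25"):
    print(annotation)
```

The delta is kept as a string so that the optional `ppm` suffix is preserved without interpreting the unit.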