Commit

Update example
bittremieux committed Jan 10, 2025
1 parent 58cc6c5 commit c5bb8c2
Showing 11 changed files with 18,775 additions and 18,738 deletions.
9 changes: 6 additions & 3 deletions docs/pages/examples.md
@@ -4,12 +4,15 @@ title: "mzQC Examples"
permalink: /examples/
---

The following use cases provide several hands-on examples of how mzQC files are structured and can be used:
These introductory use cases provide examples of how mzQC files are structured and can be used:

- [Representing QC data for an individual mass spectrometry run](intro_run/)
- [Deriving QC data from multiple related mass spectrometry runs](intro_set/)
- [Tracking instrument performance using controlled QC samples](intro_qc2/)
- [Batch correction](metabo-batches/)

The following use cases demonstrate how mzQC files can be used for real-life quality control reporting:

- [Tracking instrument performance using controlled QC samples](example_qc2_longitudinal/)
- [Batch correction in metabolomics](example_batch_correction/)

Additionally, for more advanced usage, mzQC can closely interoperate with several other file formats developed by the Proteomics Standards Initiative:

File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes.
150 changes: 150 additions & 0 deletions docs/pages/worked-examples/example_batch_correction.md
@@ -0,0 +1,150 @@
---
layout: page
title: "Batch Correction in Metabolomics with mzQC"
permalink: /examples/example_batch_correction/
---

This document demonstrates the use of the mzQC file format for capturing and comparing quality metrics before and after batch correction in a metabolomics study.
The mzQC example provided is based on data from GC-ToF-MS analysis of polar metabolites from an _Arabidopsis_ nucleotype-plasmotype diallel study, as described in [Wehrens et al. (2016)](https://dx.doi.org/10.1007%2Fs11306-016-1015-8).
The full mzQC file is available [here](https://github.com/HUPO-PSI/mzQC/tree/main/specification_documents/examples/example_batch_correction.mzQC).

Batch effects in mass spectrometry data can obscure biological signals and compromise downstream analyses.
This example illustrates how mzQC can store and compare quality metrics for evaluating the impact of batch correction methods.
By leveraging the structured format of mzQC, users can:
- Track the effects of batch correction using quality metrics.
- Easily compare data before and after correction.
- Visualize and analyze metrics for quality assurance.

## Data

The dataset includes 240 GC-ToF-MS runs from the _set3_ data of Wehrens et al. (2016).
We will use the following data files, derived from the `BC.RData` file on the [GitHub repository of the original study](https://github.com/rwehrens/BatchCorrMetabolomics):
- `set3.peakarea.csv`: Unprocessed peak area data.
- `set3.uncorrected.PCA.csv`: Principal component analysis (PCA) results before batch correction.
- `set3.corrected.PCA.csv`: PCA results after batch correction.

Batch correction was performed using the [BatchCorrMetabolomics](https://github.com/rwehrens/BatchCorrMetabolomics) R package, with scripts provided in [`example_batch_correction.R`](example_batch_correction.R).
The corrected PCA results (`set3.corrected.PCA.csv`) capture the batch-adjusted data.

## Quality metrics

The mzQC format organizes metrics into `runQuality` and `setQuality` objects: a `runQuality` stores the metrics for an individual run, whereas a `setQuality` captures metrics for a collection of runs.
This distinction supports analysis at both the single-run level and the level of the whole study.
In both cases, the quality metrics describe characteristics of the data and the outcomes of processing steps.

### Metrics for individual runs

Each run is represented by its own `runQuality`, with metrics specific to that run.
For instance:

```
"runQualities": [
  {
    "metadata": {
      "inputFiles": [
        {
          "location": "file://tmp/GCMS-ToF-sample-10.mzML",
          "name": "GCMS-ToF-sample-10",
          "fileFormat": {
            "accession": "MS:1000584",
            "name": "mzML format"
          },
          "fileProperties": [
            {
              "accession": "MS:1000031",
              "name": "instrument model",
              "value": "GC-ToF-MS (Agilent 6890 GC coupled to a Leco Pegasus III MS)"
            }
          ]
        },
        {
          "location": "file://tmp/GCMS-ToF-sample-10.mztab",
          "name": "GCMS-ToF-sample-10",
          "fileFormat": {
            "accession": "MS:1003389",
            "name": "mzTab-M"
          }
        }
      ]
    },
    "qualityMetrics": [
      {
        "accession": "MS:4000103",
        "name": "number of identified quantification data points",
        "description": "The number of identified data points for quantification purposes within the run after user defined acceptance criteria are applied. These data points may be for example XIC profiles, isotopic pattern areas, or reporter ions (see MS:1001805). The used type should be noted in the metadata or analysis methods section of the recording file for the respective run. In case of multiple acceptance criteria (FDR) available in proteomics, PSM-level FDR should be used for better comparability.",
        "value": 57,
        "unit": {
          "accession": "UO:0000189",
          "name": "count unit"
        }
      }
    ]
  }
]
```

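Here, the run `GCMS-ToF-sample-10` reports 57 identified quantification data points, and its metadata links that metric to the mzML and mzTab-M input files.
Recording metrics per run in this way keeps quality control granular.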
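
Reading such a run-level metric back out can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not an official mzQC API: the helper `run_metric_values` is hypothetical, the embedded JSON is a trimmed copy of the excerpt above, and a complete mzQC file wraps the `runQualities` array in an enclosing top-level object, so the lookup path would need adjusting there.

```python
import json

# Trimmed-down copy of the run-level excerpt above; only the fields used
# by this sketch are kept.
MZQC_FRAGMENT = """
{
  "runQualities": [
    {
      "metadata": {
        "inputFiles": [
          {
            "name": "GCMS-ToF-sample-10",
            "location": "file://tmp/GCMS-ToF-sample-10.mzML"
          }
        ]
      },
      "qualityMetrics": [
        {
          "accession": "MS:4000103",
          "name": "number of identified quantification data points",
          "value": 57
        }
      ]
    }
  ]
}
"""


def run_metric_values(document, accession):
    """Map each run name to the value of the metric with the given accession.

    Hypothetical helper: the name of the first input file is used as the
    run name, which is an assumption made for this sketch.
    """
    values = {}
    for run in document.get("runQualities", []):
        run_name = run["metadata"]["inputFiles"][0]["name"]
        for metric in run.get("qualityMetrics", []):
            if metric["accession"] == accession:
                values[run_name] = metric["value"]
    return values


document = json.loads(MZQC_FRAGMENT)
counts = run_metric_values(document, "MS:4000103")
print(counts)
```

Looking metrics up by accession rather than by name keeps the code independent of any wording changes in the controlled vocabulary.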

### Analysis of all runs

For multiple runs, metrics are aggregated in the `setQuality` section.
This allows for analysis of batch effects, performance trends, and overall data quality.
For example:

**Before batch correction:**

```
{
  "accession": "MS:4000092",
  "name": "identified MS1 feature area principal component analysis result",
  "description": "A table with the PCA results of identified MS1 feature areas.",
  "value": {
    "MS:4000086": ["GCMS-ToF-sample-10", "GCMS-ToF-sample-100", "GCMS-ToF-sample-101", ...],
    "MS:4000081": [-3.3489633839, 0.4191257477, 6.8241553933, ...],
    "MS:4000082": [-2.3414347017, 2.0552198422, 1.5142354815, ...],
    "MS:4000083": [-1.486755263, -0.3965900879, 1.1636677021, ...],
    "MS:4000084": [-0.2766203768, 1.7808802633, 0.1736233713, ...],
    "MS:4000085": [-2.6836316103, -2.0202377954, -3.0888055462, ...],
    "MS:4000089": [13, 16, 17, ...],
    "MS:4000088": [4, 7, 7, ...]
  }
}
```

**After batch correction:**

```
{
  "accession": "MS:4000094",
  "name": "batch-corrected identified MS1 feature area principal component analysis result",
  "description": "A table with the PCA results of identified MS1 feature areas after batch-correction.",
  "value": {
    "MS:4000086": ["GCMS-ToF-sample-10", "GCMS-ToF-sample-100", "GCMS-ToF-sample-101", ...],
    "MS:4000081": [-0.4378513055, 0.041082478, 5.464116568, ...],
    "MS:4000082": [-1.3379076029, 2.0719734906, 3.1049060343, ...],
    "MS:4000083": [2.4957145183, 2.0074886436, 2.6374608754, ...],
    "MS:4000084": [2.195431331, -1.3532219705, 1.9931159041, ...],
    "MS:4000085": [0.7936133863, -0.1016825037, -0.9434314272, ...],
    "MS:4000089": [13, 16, 17, ...],
    "MS:4000088": [4, 7, 7, ...]
  }
}
```

This specific QC metric is a table metric, with various columns represented by controlled vocabulary (CV) terms.
Each column corresponds to a specific aspect of the PCA results, such as the run names (`MS:4000086`), principal component values (`MS:4000081`, `MS:4000082`, and `MS:4000083`), batch labels (`MS:4000088`), or injection sequence labels (`MS:4000089`).
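
Because a table metric stores its data column by column, it is often convenient to turn it into one record per run before plotting or filtering. The following is a minimal sketch under stated assumptions: `table_to_rows` is a hypothetical helper, not part of any mzQC library, and the input holds only the first three entries of a few columns from the "before batch correction" excerpt.

```python
# First three entries of selected columns from the excerpt above.
table_value = {
    "MS:4000086": ["GCMS-ToF-sample-10", "GCMS-ToF-sample-100", "GCMS-ToF-sample-101"],
    "MS:4000081": [-3.3489633839, 0.4191257477, 6.8241553933],
    "MS:4000082": [-2.3414347017, 2.0552198422, 1.5142354815],
    "MS:4000089": [13, 16, 17],
    "MS:4000088": [4, 7, 7],
}


def table_to_rows(value):
    """Convert a column-oriented table metric into a list of per-run records.

    Hypothetical helper: each record is a dict keyed by the column's CV
    accession. All columns must have the same number of entries.
    """
    columns = list(value)
    lengths = {len(value[column]) for column in columns}
    if len(lengths) > 1:
        raise ValueError("table columns differ in length")
    return [
        dict(zip(columns, cells))
        for cells in zip(*(value[column] for column in columns))
    ]


rows = table_to_rows(table_value)
for row in rows:
    # Print run name, batch label, and injection sequence label.
    print(row["MS:4000086"], row["MS:4000088"], row["MS:4000089"])
```

The length check guards against silently dropping entries, since `zip` would otherwise stop at the shortest column.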

### Comparing metrics

The structured PCA results in mzQC allow side-by-side comparison of metrics before and after batch correction.

before | after
--- | ---
![PCA before batch correction](../../pages/figures/example_batch_correction_before.png) | ![PCA after batch correction](../../pages/figures/example_batch_correction_after.png)

This facilitates:
- Quantitative assessment of batch correction effectiveness.
- Visualization of improvements via PCA plots.
- Identification of residual batch effects.
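
As a rough illustration of the quantitative side, the batch labels and principal component scores stored in the two table metrics can be combined into a simple number. The sketch below computes the spread between per-batch mean scores on one principal component column (`MS:4000081`). This measure is a crude stand-in chosen for illustration, not the evaluation used in the original study, and `batch_centroid_spread` is a hypothetical helper. Only the first three entries shown in the excerpts are used, so the result demonstrates the mechanics rather than a meaningful assessment.

```python
from collections import defaultdict
from statistics import mean

# First three entries from the two PCA excerpts above.
batches = [4, 7, 7]  # MS:4000088, batch labels
pc_before = [-3.3489633839, 0.4191257477, 6.8241553933]  # MS:4000081, uncorrected
pc_after = [-0.4378513055, 0.041082478, 5.464116568]  # MS:4000081, corrected


def batch_centroid_spread(scores, batch_labels):
    """Return the range of per-batch mean scores along one component.

    Hypothetical measure: a smaller value means the batch centroids lie
    closer together on this component.
    """
    if len(scores) != len(batch_labels):
        raise ValueError("scores and batch labels differ in length")
    grouped = defaultdict(list)
    for score, batch in zip(scores, batch_labels):
        grouped[batch].append(score)
    centroids = [mean(values) for values in grouped.values()]
    return max(centroids) - min(centroids)


before_gap = batch_centroid_spread(pc_before, batches)
after_gap = batch_centroid_spread(pc_after, batches)
print(f"centroid spread before: {before_gap:.3f}, after: {after_gap:.3f}")
```

On the full 240-run tables, the same function could be applied to each principal component column in turn.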

This example highlights how mzQC can streamline quality control processes in metabolomics and foster collaboration through standardized and transparent reporting.
Original file line number Diff line number Diff line change
@@ -1,15 +1,15 @@
---
layout: page
title: "Introduction to mzQC – Tracking Instrument Performance"
permalink: /examples/intro_qc2/
title: "Tracking Instrument Performance with mzQC"
permalink: /examples/example_qc2_longitudinal/
---

This document outlines the use of an mzQC file for quality control (QC) of a mass spectrometry proteomics experiment.
The mzQC file discussed here is derived from a QC2 sample, following protocols established in the publication, [QCloud: A cloud-based quality control system for mass spectrometry-based proteomics laboratories](https://doi.org/10.1371/journal.pone.0189209).
A QC2 sample is defined as a high-complexity sample that mimics real samples analyzed in a proteomics laboratory, and is meant to be injected 1–5 times per week to test system suitability.

Here we demonstrate how real-life QC metrics are calculated for a single mass spectrometry run using tools such as QCloud.
You can view the complete structure of this mzQC example [here](https://github.com/HUPO-PSI/mzQC/tree/main/specification_documents/examples/intro_qc2.mzQC).
You can view the complete structure of this mzQC example [here](https://github.com/HUPO-PSI/mzQC/tree/main/specification_documents/examples/example_qc2_longitudinal.mzQC).

## File description

@@ -217,6 +217,6 @@ The structured data in mzQC allows for effective visualization and analysis, suc
This can help identify any deviations or potential issues with the mass spectrometry process, prompting timely maintenance and calibration actions to maintain optimal performance.
For example, Levey-Jennings charts can be used to enable quick visual assessment of instrument stability or drift, critical for high-stakes or high-throughput proteomics workflows:

![Levey-Jennings control chart](../../pages/figures/intro_qc2_ljcc.png)
![Levey-Jennings control chart](../../pages/figures/example_qc2_longitudinal_ljcc.png)
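
The control limits behind a Levey-Jennings chart can be sketched as follows. This is a minimal illustration, assuming limits at the mean plus or minus `k` standard deviations of a baseline period; the metric values are hypothetical placeholders rather than data from the QC2 example, and both helper functions are invented for this sketch.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3):
    """Return (lower, center, upper) limits at mean +/- k standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline runs
    return center - k * spread, center, center + k * spread


def out_of_control(values, baseline, k=3):
    """Return the indices of values that fall outside the control limits."""
    lower, _, upper = control_limits(baseline, k)
    return [index for index, value in enumerate(values) if not lower <= value <= upper]


# Hypothetical metric values, for illustration only.
baseline_runs = [100.0, 102.0, 98.0, 101.0, 99.0]
new_runs = [100.5, 97.0, 110.0]

lower, center, upper = control_limits(baseline_runs)
flagged = out_of_control(new_runs, baseline_runs)
print(f"limits: {lower:.2f} to {upper:.2f}, flagged run indices: {flagged}")
```

In practice the baseline would come from a period in which the instrument was known to perform well, and tighter limits (for example `k=2`) can serve as warning thresholds.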

This example demonstrates how QC information in mzQC files helps in monitoring instrument performance, ensuring that maintenance is proactive and timely, thereby preserving the integrity and effectiveness of subsequent analyses.
116 changes: 0 additions & 116 deletions docs/pages/worked-examples/metabo-batches.mzQC.md

This file was deleted.
