CV term for frames picked in IM dimension #361

Open
timosachsenberg opened this issue Jan 14, 2025 · 6 comments · May be fixed by #365

Comments

@timosachsenberg
Contributor

Describe the question or discussion

Hi,
I may have missed something, but I could not find a suitable CV term for a spectrum obtained by centroiding/peak picking a frame – essentially a concatenated spectrum with a float data array – in the ion mobility (IM) dimension. The resulting, much sparser spectrum has a representation that is essentially the same as the original frame. Without an appropriate CV term, one would need to inspect the actual data to distinguish between raw data in concatenated format and IM picked data.

@mobiusklein
Contributor

It's a larger issue about the data model of an ion mobility frame.

  • Is the spectrum in 3D? Having an ion mobility array of some kind indicates this, but that's hacky
  • Is the m/z dimension centroided or profile? For IM instruments, historically, Bruker only exports centroids and Waters only exports profiles through ProteoWizard. Using newer vendor bindings directly might give different results
  • Is the ion mobility dimension centroid or profile (or start/end bounded)?
  • What is the ion mobility range of the frame, like the m/z range of a spectrum? (see https://github.com/ProteoWizard/pwiz/blob/master/pwiz/data/vendor_readers/Bruker/SpectrumList_Bruker.cpp#L369-L372 for example userParams in ProteoWizard)

These are missing from the ion mobility + DIA acquisition modes for mzML document located at https://docs.google.com/document/d/13TpfPOQGShp-RsDPB-FE6abLtipzopHXkLZ3jP3FAA0/edit?usp=sharing

I haven't seen many tools that do IM feature detection that write out the results in mzML without going all the way through to collapsing the IM dimension and producing flat centroid peak lists. There isn't an ecosystem of producers/consumers to justify it in the same way we produce/consume profile and centroid spectra in mzML.

We could add children of spectrum representation for each state explicitly, but that's liable to cause older software to crash. That leaves creating new spectrum attribute terms. Ideally we'd modify the semantic validator for mzML too then.

New terms:

  • ion mobility frame representation is a spectrum attribute
    • ion mobility point is a ion mobility frame representation: This spectrum is a single IM point scan, not a 3D spectrum. Alternatively, this is the null case where the term is just not present.
    • ion mobility profile frame is a ion mobility frame representation: This spectrum has a continuous ion mobility dimension.
    • ion mobility centroid frame is a ion mobility frame representation: This spectrum has a centroided ion mobility dimension.
    • ion mobility feature frame is a ion mobility frame representation: This spectrum has some structured ion mobility dimension. (semantics TBD)
  • lowest observed ion mobility is a ion mobility frame representation
  • highest observed ion mobility is a ion mobility frame representation
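The representation terms proposed above could be consumed roughly as follows. This is only a sketch under stated assumptions: the terms have no CV accessions yet, so plain names are used as keys, and `PARENT`, `is_a`, and `frame_representation` are hypothetical names that do not come from any existing library. It covers only the four representation terms, not the lowest/highest observed ion mobility terms.

```python
from typing import Optional

# Hypothetical: child term -> parent term ("is a" relationship), using the
# proposed term names because no accessions have been assigned yet.
PARENT = {
    "ion mobility frame representation": "spectrum attribute",
    "ion mobility point": "ion mobility frame representation",
    "ion mobility profile frame": "ion mobility frame representation",
    "ion mobility centroid frame": "ion mobility frame representation",
    "ion mobility feature frame": "ion mobility frame representation",
}


def is_a(term: str, ancestor: str) -> bool:
    """Walk the parent links to test whether `term` descends from `ancestor`."""
    current: Optional[str] = term
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def frame_representation(spectrum_params: list[str]) -> str:
    """Return the frame representation named among a spectrum's param names.

    An absent term is treated as the null case, "ion mobility point".
    """
    root = "ion mobility frame representation"
    for name in spectrum_params:
        if name != root and is_a(name, root):
            return name
    return "ion mobility point"
```

A reader that only knows about `spectrum attribute` terms would simply skip these names, which is the reason for preferring attribute terms over new children of spectrum representation.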

@mobiusklein
Contributor

mobiusklein commented Jan 19, 2025

I slapped together an example using a timsTOF data file I had lying around: ims_example.mzML.gz that was converted using ProteoWizard in 3D format.

By index:

  • 0 - The unprocessed spectrum as produced by ProteoWizard with some userParams added. The m/z dimension is centroided, but the ion mobility (IM) dimension is in profile mode, so I marked it as ion mobility profile frame.
  • 1 - After doing feature detection on the frame, I added a non-standard array, feature identifier array, that acts as a secondary index for re-assembling IM-MS features for the frame, but otherwise serializes back into the same 3D format that the original data was in. The feature detection process drops a few data points. This is the use case for ion mobility feature frame. I don't discard the profile information because I might want to know the width of the IM feature for grouping with MS2 features.
  • 2 - After doing feature map deisotoping and charge deconvolution, I now have charge states for the estimated monoisotopic m/z values of solved features, and all other features which did not fit an isotopic pattern over two time points are dropped. In addition to the feature identifier array, I have tracked a bunch of additional data from the deconvolution process, some of which don't follow the m/z array. This is still an ion mobility feature frame.
  • 3 - After centroiding all the deconvolved IM features, I just keep the m/z and IM of the peak apex, plus all the charge and intensity information, leaving the spectrum essentially a centroided peak list in all dimensions. This is the use-case for ion mobility centroid frame.

I don't have an example here for ion mobility point, since that is the non-3D format without an ion mobility array.
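The step from a feature frame (cases 1 and 2) to a centroid frame (case 3) can be sketched like this. It is a minimal illustration, not the code used to produce ims_example.mzML.gz: the function name `centroid_features`, the flat parallel-list layout, and summing intensities per feature are all assumptions made for the example.

```python
from collections import defaultdict


def centroid_features(mz, im, intensity, feature_id):
    """Collapse each IM feature to a single (m/z, IM, intensity) peak.

    The m/z and IM of the most intense point (the apex) are kept, and the
    intensities of all points in the feature are summed (an assumption).
    """
    groups = defaultdict(list)
    for m, k, i, fid in zip(mz, im, intensity, feature_id):
        groups[fid].append((m, k, i))

    peaks = []
    for points in groups.values():
        apex_mz, apex_im, _ = max(points, key=lambda p: p[2])
        total = sum(p[2] for p in points)
        peaks.append((apex_mz, apex_im, total))
    return sorted(peaks)


# Two features in one frame, stored as parallel arrays like the 3D format,
# with the feature identifier array acting as the secondary index.
mz = [500.25, 500.25, 500.25, 620.5, 620.5]
im = [0.90, 0.91, 0.92, 1.10, 1.11]
intensity = [10.0, 30.0, 20.0, 5.0, 15.0]
feature_id = [0, 0, 0, 1, 1]

picked = centroid_features(mz, im, intensity, feature_id)
```

The result has the same array layout as the input frame, only sparser, which is why the representation cannot be told apart without a term or a look at the data.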

edit: fixed flipped term caught by Timo

@mobiusklein
Contributor

@timosachsenberg for the OpenMS use-case you were discussing, is the ion mobility peak picker attempting to centroid in both dimensions or do feature detection in the IM dimension over centroids in the m/z dimension?

@timosachsenberg
Contributor Author

Hi @mobiusklein,
Thank you so much for your input, and my apologies for the delayed response.

I took a quick look, and for 0 you probably mean "ion mobility profile frame"? I think this would make for a good CV term!

Regarding your question: Different components within OpenMS/OpenSWATH handle feature detection and IM differently. For DDA, one focus is adapting existing algorithms to be compatible with IM data. To make this easy, we want to centroid in the m/z and IM dimensions first (pure centroiding without looking at isotopic patterns) and then run or adapt existing algorithms.
Most existing algorithms (e.g., for feature detection) should function ok-ish with such data if the IM dimension is ignored. More importantly, we can typically modify existing algorithms easily to make use of the IM information in the separate float data array. For example, our MS1 metabolite mass trace detection just needs to check, in addition to the m/z tolerances, whether IM values are compatible (within some tolerance) when tracing peaks. Likewise, assembly of mass traces into features can easily consider the IM values or IM ranges of the traces.
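The kind of change described here, adding an IM check next to the m/z check when tracing peaks, might look like the sketch below. It is not OpenMS code: `peaks_compatible`, `extend_trace`, the tuple layout, and the default tolerances are hypothetical, and a sensible IM tolerance depends on the instrument and the mobility unit.

```python
def peaks_compatible(peak_a, peak_b, mz_tol_ppm=10.0, im_tol=0.02):
    """Check two centroided peaks, given as (m/z, IM, intensity) tuples.

    The peaks are compatible when they agree within the m/z tolerance (ppm)
    and within the IM tolerance (same unit as the IM array).
    """
    mz_a, im_a, _ = peak_a
    mz_b, im_b, _ = peak_b
    mz_ok = abs(mz_a - mz_b) <= mz_a * mz_tol_ppm * 1e-6
    im_ok = abs(im_a - im_b) <= im_tol
    return mz_ok and im_ok


def extend_trace(trace, candidates, mz_tol_ppm=10.0, im_tol=0.02):
    """Append the most intense compatible candidate to the trace, if any.

    Returns the appended peak, or None when no candidate from the next
    spectrum matches the last peak of the trace.
    """
    last = trace[-1]
    matches = [
        c for c in candidates
        if peaks_compatible(last, c, mz_tol_ppm, im_tol)
    ]
    if not matches:
        return None
    best = max(matches, key=lambda p: p[2])
    trace.append(best)
    return best


trace = [(500.2500, 0.91, 100.0)]
next_spectrum = [
    (500.2510, 0.92, 80.0),   # close in m/z and IM
    (500.2505, 1.20, 200.0),  # close in m/z, far in IM
    (501.0000, 0.91, 300.0),  # far in m/z
]
picked = extend_trace(trace, next_spectrum)
```

Setting `im_tol` to a very large value reproduces the behaviour of an algorithm that ignores the IM dimension.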

I hope I understood everything correctly, but I think this approach would give rise to slightly different CV terms because we do this data reduction step from raw to centroided m/z and IM at the very beginning (e.g., already during loading).

Regarding "I don't discard the profile information because I might want to know the width of the IM feature for grouping with MS2 features." - this is an interesting idea I also thought about. I just wasn't sure if this information is needed at this point - do you have some experience here? We will keep that in mind.

@mobiusklein
Contributor

Thank you for the feedback. You're right, I wrote profile in the mzML file but centroid in the post, I'll edit the text for case 0.

I may be using different nomenclature, I suspect what I call a feature, you call a mass trace, and what you call a feature is what I treat as a set of features describing an isotopic pattern.

If I understood correctly, you're looking to first port components to produce an ion mobility centroid frame representation from "any" input representation, on which you can then run algorithms that work with centroid spectrum-like data. Later, you'd incrementally upgrade them to make use of the IM dimension, with the end goal being for ion mobility centroid frame to be the representation you build features from when handling IM data.

What would be the representation you would want to write the output in, or would that be featureXML?

RE the last point, I wanted to retain the whole feature profile because I wanted to be able to decouple the "IM trace extraction", "IM isotopic pattern fitting/charge deconvolution" and "RT feature extraction" stages of a program. Later I wanted to build something DIAUmpire-esque for Waters MSe data, where it matters whether ion mobility peaks overlapped and by how much, along with the retention time overlap. Naturally, it was also handy for data exploration and understanding the data shape between each phase. I think this is less of an issue for Bruker DIA-PASEF since you still have isolation windows.
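To illustrate why the feature width matters for this kind of grouping, here is a rough sketch of an overlap test between a precursor feature and a fragment feature. Everything in it is assumed for the example: the names `interval_overlap`, `overlap_fraction`, and `co_eluting`, the dict layout with `"im"` and `"rt"` start/end bounds, and the 0.5 threshold. It is not taken from DIA-Umpire or any other tool.

```python
def interval_overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def overlap_fraction(a, b):
    """Overlap of two (start, end) intervals relative to the narrower one."""
    shared = interval_overlap(a[0], a[1], b[0], b[1])
    narrower = min(a[1] - a[0], b[1] - b[0])
    return shared / narrower if narrower > 0 else 0.0


def co_eluting(precursor, fragment, min_fraction=0.5):
    """Require sufficient overlap in both the IM and the RT dimension."""
    return (
        overlap_fraction(precursor["im"], fragment["im"]) >= min_fraction
        and overlap_fraction(precursor["rt"], fragment["rt"]) >= min_fraction
    )


precursor = {"im": (1.0, 2.0), "rt": (10.0, 20.0)}
fragment = {"im": (1.5, 3.0), "rt": (12.0, 18.0)}
grouped = co_eluting(precursor, fragment)
```

With only apex values (an ion mobility centroid frame) the bounds above are not available, so the comparison would have to fall back to a fixed tolerance around the apex.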

By "needed at this point", do you mean that there's no ecosystem of tools consuming these intermediary representations because those available today just do the whole process end-to-end?

@timosachsenberg
Contributor Author

> I may be using different nomenclature, I suspect what I call a feature, you call a mass trace, and what you call a feature is what I treat as a set of features describing an isotopic pattern.

Yeah, I think we just use different nomenclature.

> If I understood correctly, you're looking to first port components to produce an ion mobility centroid frame representation from "any" input representation, on which you can then run algorithms that work with centroid spectrum-like data. Later, you'd incrementally upgrade them to make use of the IM dimension, with the end goal being for ion mobility centroid frame to be the representation you build features from when handling IM data.

Exactly.

> What would be the representation you would want to write the output in, or would that be featureXML?

This is mainly a question of how developers want to modularize their workflows.
Some would be happy to have a separate tool that creates the ion mobility centroid frame and writes it out to mzML. Others might want to do it end to end (e.g., because they don't want to do the file I/O).
Ultimately, each tool will output a feature format (e.g., featureXML) or something more condensed. We have tools that do granular processing and tools that bundle several processing steps (e.g., that already report protein quantities).

> RE the last point, I wanted to retain the whole feature profile because I wanted to be able to decouple the "IM trace extraction", "IM isotopic pattern fitting/charge deconvolution" and "RT feature extraction" stages of a program. Later I wanted to build something DIAUmpire-esque for Waters MSe data, where it matters whether ion mobility peaks overlapped and by how much, along with the retention time overlap. Naturally, it was also handy for data exploration and understanding the data shape between each phase. I think this is less of an issue for Bruker DIA-PASEF since you still have isolation windows.

Makes sense.

> By "needed at this point", do you mean that there's no ecosystem of tools consuming these intermediary representations because those available today just do the whole process end-to-end?

I was mainly referring to keeping the full IM profile for further processing. We currently don't do this in OpenMS but I see that it makes sense for some data/instruments.

Thanks again for sharing your insights!

@mobiusklein mobiusklein linked a pull request Jan 22, 2025 that will close this issue