Search MS databases
This page has been generated automatically from a package vignette. Please do not edit, since your modifications will later be removed.
Create an instance of the `Biodb` class:
mybiodb <- biodb::Biodb$new()
For this vignette, we will use an in-house database built from a data frame. There are two other mass spectrometry databases in biodb: Peakforest and Massbank. However, using Massbank requires downloading and extracting the whole database first, which takes several minutes, while access to Peakforest is currently restricted to the partners of the MetaboHUB project. We will therefore use a small custom database, built from Massbank Japan with the biodb script peak-extractor, specifically for our examples.
Create the database data frame:
header <- c('accession', 'formula', 'ms.mode', 'ms.level', 'peak.mz', 'peak.intensity', 'peak.relative.intensity', 'peak.formula', 'msprecannot', 'msprecmz', 'peak.attr')
db <- rbind.data.frame(
list('BML80005', 'C12H14N2O2', 'pos', 1, 219.1127765, 373076, 100 , NA_character_, NA_character_, NA_real_, NA_character_),
list('BML80012', 'C15H18O8', 'pos', 1, 327.1074765, 33174, 100 , NA_character_, NA_character_, NA_real_, NA_character_),
list('BML80013', 'C15H18O8', 'neg', 1, 325.0929235, 1595373, 56.15616, NA_character_, NA_character_, NA_real_, NA_character_),
list('BML80013', 'C15H18O8', 'neg', 1, 361.069602, 2841342, 100 , NA_character_, NA_character_, NA_real_, NA_character_),
list('BML80020', 'C17H21NO3', 'pos', 1, 288.1593765, 781807, 100 , NA_character_, NA_character_, NA_real_, NA_character_),
list('BML80020', 'C17H21NO3', 'pos', 1, 310.141318, 11409, 1.501502, NA_character_, NA_character_, NA_real_, NA_character_),
list('AU200951', 'C7H5F3O', 'neg', 2, 161.0238, 38176 , 100 , 'C7H4F3O-', '[M-H]-', 161.022 , '[M-H]-' ),
list('AU200951', 'C7H5F3O', 'neg', 2, 162.0274, 1780 , 4.604605, 'C6[13]CH4F3O-', '[M-H]-', 161.022 , NA_character_),
list('AU200951', 'C7H5F3O', 'neg', 2, 141.0167, 616 , 1.601602, 'C7H3F2O-', '[M-H]-', 161.022 , NA_character_),
list('AU200952', 'C7H5F3O', 'neg', 2, 161.0246, 6180 , 100 , 'C7H4F3O-', '[M-H]-', 161.022 , '[M-H]-' ),
list('AU200952', 'C7H5F3O', 'neg', 2, 141.0184, 1384 , 22.32232, 'C7H3F2O-', '[M-H]-', 161.022 , NA_character_),
list('AU200952', 'C7H5F3O', 'neg', 2, 121.0113, 1180 , 19.01902, 'C7H2FO-', '[M-H]-', 161.022 , NA_character_),
list('AU200952', 'C7H5F3O', 'neg', 2, 162.0282, 388 , 6.206206, 'C6[13]CH4F3O-', '[M-H]-', 161.022 , NA_character_),
list('AU200953', 'C7H5F3O', 'neg', 2, 121.0113, 828 , 100 , 'C7H2FO-', '[M-H]-', 161.022 , NA_character_),
list('AU200953', 'C7H5F3O', 'neg', 2, 141.0174, 300 , 36.13614, 'C7H3F2O-', '[M-H]-', 161.022 , NA_character_),
list('AU325851', 'C10H12N2O3S', 'neg', 2, 239.0502, 4580 , 100 , 'C10H11N2O3S-', '[M-H]-', 239.0496, '[M-H]-' ),
list('AU325851', 'C10H12N2O3S', 'neg', 2, 240.0525, 468 , 10.21021, 'C9[13]CH11N2O3S-', '[M-H]-', 239.0496, NA_character_),
list('AU325851', 'C10H12N2O3S', 'neg', 2, 241.0471, 312 , 6.806807, 'C10H11N2O3[34]S-', '[M-H]-', 239.0496, NA_character_),
list('AU341051', 'C9H10Cl2N2O', 'neg', 2, 231.0102, 30800 , 100 , 'C9H9Cl2N2O- [M-H]-', '[M-H]-', 231.0097, NA_character_),
list('AU341051', 'C9H10Cl2N2O', 'neg', 2, 233.0077, 13532 , 43.84384, 'C9H9Cl[37]ClN2O-', '[M-H]-', 231.0097, NA_character_),
list('AU341051', 'C9H10Cl2N2O', 'neg', 2, 232.0129, 2024 , 6.506507, 'C8[13]CH9Cl2N2O-', '[M-H]-', 231.0097, NA_character_),
list('AU341051', 'C9H10Cl2N2O', 'neg', 2, 185.9529, 1672 , 5.405405, 'C7H2Cl2NO-', '[M-H]-', 231.0097, NA_character_),
list('AU341051', 'C9H10Cl2N2O', 'neg', 2, 187.9496, 868 , 2.802803, 'C7H2Cl[37]ClNO-', '[M-H]-', 231.0097, NA_character_),
list('AU341051', 'C9H10Cl2N2O', 'neg', 2, 159.9737, 844 , 2.702703, 'C6H4Cl2N-', '[M-H]-', 231.0097, NA_character_),
list('AU341051', 'C9H10Cl2N2O', 'neg', 2, 161.9711, 404 , 1.301301, 'C6H4Cl[37]ClN-', '[M-H]-', 231.0097, NA_character_),
list('AU158001', 'C17H19NO3', 'pos', 2, 286.1456, 1073792, 100 , 'C17H20NO3+', '[M+H]+', 286.1438, '[M+H]+' ),
list('AU158001', 'C17H19NO3', 'pos', 2, 287.1488, 157332 , 14.61461, 'C16[13]CH20NO3+', '[M+H]+', 286.1438, NA_character_),
list('AU158001', 'C17H19NO3', 'pos', 2, 288.1514, 15604 , 1.401401, 'C15[13]C2H20NO3+', '[M+H]+', 286.1438, NA_character_),
list('AU158002', 'C17H19NO3', 'pos', 2, 286.1457, 1338896, 100 , 'C17H20NO3+', '[M+H]+', 286.1438, '[M+H]+' ),
list('AU158002', 'C17H19NO3', 'pos', 2, 287.1489, 227244 , 16.91692, 'C16[13]CH20NO3+', '[M+H]+', 286.1438, NA_character_),
list('AU158002', 'C17H19NO3', 'pos', 2, 229.0869, 20980 , 1.501502, 'C14H13O3+', '[M+H]+', 286.1438, NA_character_),
list('AU158002', 'C17H19NO3', 'pos', 2, 288.1513, 19640 , 1.401401, 'C15[13]C2H20NO3+', '[M+H]+', 286.1438, NA_character_),
list('AU158002', 'C17H19NO3', 'pos', 2, 201.0918, 19520 , 1.401401, 'C13H13O2+', '[M+H]+', 286.1438, NA_character_),
list('AU158002', 'C17H19NO3', 'pos', 2, 268.1343, 8808 , 0.600600, 'C17H18NO2+', '[M+H]+', 286.1438, NA_character_),
list('AU158002', 'C17H19NO3', 'pos', 2, 211.076 , 8660 , 0.600600, 'C14H11O2+', '[M+H]+', 286.1438, NA_character_),
list('AU116602', 'C4H6N2S', 'pos', 2, 115.0334, 6556 , 100 , 'C4H7N2S+', '[M+H]+', 115.0324, '[M+H]+' ),
list('AU116606', 'C4H6N2S', 'pos', 2, 115.0334, 39940 , 100 , 'C4H7N2S+', '[M+H]+', 115.0324, '[M+H]+' ),
list('AU116606', 'C4H6N2S', 'pos', 2, 116.0365, 2808 , 7.007007, 'C3[13]CH7N2S+', '[M+H]+', 115.0324, NA_character_),
list('AU116606', 'C4H6N2S', 'pos', 2, 117.0293, 2596 , 6.406406, 'C4H7N2[34]S+', '[M+H]+', 115.0324, NA_character_),
stringsAsFactors = FALSE)
names(db) <- header
Create a connector to the MS database:
conn <- mybiodb$getFactory()$createConn('mass.csv.file')
conn$setDb(db)
conn$setField('peak.mztheo', 'peak.mz')
This feature is mainly useful for biodb testing purposes.
You can request a list of M/Z values from MS databases. Depending on the database, the list of M/Z values will be more or less exhaustive.
Getting a list of M/Z values:
conn$getMzValues(max.results = 10)
You can restrict to a certain MS mode:
conn$getMzValues(max.results = 10, ms.mode = 'pos')
or require the peaks to be precursor peaks:
conn$getMzValues(max.results = 10, precursor = TRUE)
or even ask for an MS level:
conn$getMzValues(max.results = 10, ms.level = 2)
Here is how to search for spectra that contain a certain M/Z value:
conn$searchMzRange(mz.min = 115, mz.max = 115.1, max.results = 5)
Another version is available that uses a tolerance instead of a range:
conn$searchMzTol(mz = 115, mz.tol = 0.1, mz.tol.unit = 'plain', max.results = 5)
You can also set `mz.tol.unit` to `'ppm'`.
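To make the difference between the two tolerance units concrete, here is a minimal base R sketch. The helper `mz_window` is hypothetical and not part of biodb; it assumes the tolerance defines a symmetric window around the M/Z value, and that a ppm tolerance is a fraction of the M/Z value expressed in parts per million.

```r
# Hypothetical helper (not part of biodb): convert an M/Z value and a
# tolerance into the corresponding search range.
# Assumption: the window is symmetric around the M/Z value.
mz_window <- function(mz, mz.tol, mz.tol.unit = c('plain', 'ppm')) {
    mz.tol.unit <- match.arg(mz.tol.unit)
    # A ppm tolerance scales with the M/Z value; a plain one is absolute.
    delta <- if (mz.tol.unit == 'ppm') mz * mz.tol * 1e-6 else mz.tol
    c(mz.min = mz - delta, mz.max = mz + delta)
}

mz_window(115, 0.1, 'plain')  # range 114.9 to 115.1
mz_window(115, 10, 'ppm')     # range 114.99885 to 115.00115
```

Under these assumptions, the peak at 115.0334 in our example database falls inside the plain window but outside the 10 ppm window centered on 115.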
Both methods accept the following options:
Option | Default | Description |
---|---|---|
`ms.mode` | `NA` | Set to `'pos'` or `'neg'` to get only spectra from a certain MS mode. |
`precursor` | `FALSE` | When set to `TRUE`, the searched peak must be a precursor peak. |
`ms.level` | `0` | Set to an integer greater than `0` to get only spectra from this MS level. |
`min.rel.int` | `NA` | The minimum relative intensity required for the peak, as a percentage from `0.0` to `100.0`. |
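As an illustration of what these options mean, the following base R sketch applies equivalent filters to a few rows copied from the example database. The helper `filter_peaks` is hypothetical and only mimics the documented semantics; it is not how biodb implements the search. It assumes that range bounds are inclusive, and it leaves out the `precursor` option, since how biodb recognizes precursor peaks is not described here.

```r
# A few rows copied from the example database above.
peaks <- data.frame(
    accession = c('BML80005', 'AU200951', 'AU200951',
                  'AU116602', 'AU116606', 'AU116606'),
    ms.mode = c('pos', 'neg', 'neg', 'pos', 'pos', 'pos'),
    ms.level = c(1, 2, 2, 2, 2, 2),
    peak.mz = c(219.1127765, 161.0238, 162.0274,
                115.0334, 115.0334, 116.0365),
    peak.relative.intensity = c(100, 100, 4.604605, 100, 100, 7.007007),
    stringsAsFactors = FALSE)

# Hypothetical helper (not part of biodb) mimicking the documented options.
# Assumption: mz.min and mz.max are inclusive bounds.
filter_peaks <- function(x, mz.min, mz.max, ms.mode = NA, ms.level = 0,
                         min.rel.int = NA) {
    keep <- x$peak.mz >= mz.min & x$peak.mz <= mz.max
    # NA means "any MS mode".
    if (!is.na(ms.mode)) keep <- keep & x$ms.mode == ms.mode
    # 0 means "any MS level".
    if (ms.level > 0) keep <- keep & x$ms.level == ms.level
    # NA means "no minimum relative intensity".
    if (!is.na(min.rel.int))
        keep <- keep & x$peak.relative.intensity >= min.rel.int
    # Return the identifiers of the spectra with at least one matching peak.
    unique(x$accession[keep])
}

filter_peaks(peaks, 115, 117, ms.mode = 'pos', ms.level = 2)
filter_peaks(peaks, 116, 117, min.rel.int = 10)
```

The first call returns both `AU116602` and `AU116606`. The second returns nothing, because the only peak in that range has a relative intensity of about 7%.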
You can search for matches of your MSMS spectrum among the MSMS spectra of the database. First, define the spectrum to match:
spectrum <- data.frame(mz = c(286.1456, 287.1488, 288.1514), rel.int = c(999, 158, 18))
Then search for a match:
conn$msmsSearch(spectrum, precursor.mz = 286.1438, mz.tol = 0.1, mz.tol.unit = 'plain', ms.mode = 'pos')
A data frame, ordered from highest score to lowest, is returned. It contains the following columns:
- `id`: Database spectrum identifiers.
- `score`: The matching score.
- N columns `peak.#`: Each column corresponds to a peak of the searched spectrum (from first to last peak). A `-1` means that the peak has not been matched. An integer N greater than `0` means that the peak has been matched with the Nth peak of the database spectrum.
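
The column layout described above can be explored with plain data frame operations. The sketch below counts how many peaks of the searched spectrum were matched for each database spectrum. The data frame `res` is a mock built by hand with placeholder identifiers and made-up values; it only follows the documented layout and is not actual biodb output.

```r
# Mock result with the documented column layout. The identifiers, scores
# and peak indices are made up for illustration; this is not biodb output.
res <- data.frame(id = c('SPEC_A', 'SPEC_B'),
                  score = c(0.9, 0.4),
                  peak.1 = c(1, 1),
                  peak.2 = c(2, -1),
                  peak.3 = c(3, -1),
                  stringsAsFactors = FALSE)

# Select the peak.# columns by name.
peak.cols <- grep('^peak\\.[0-9]+$', names(res), value = TRUE)

# A value greater than 0 means the peak was matched, -1 means it was not.
res$n.matched <- rowSums(res[peak.cols] > 0)

res[, c('id', 'score', 'n.matched')]
```

In this mock, all three peaks were matched for `SPEC_A`, and only the first one for `SPEC_B`.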