Search MS databases

This page has been generated automatically from a package vignette. Please do not edit, since your modifications will later be removed.

Introduction

Create an instance of the Biodb class:

mybiodb <- biodb::Biodb$new()

For this vignette, we will use an in-house database, built from a data frame. There are two other mass databases in biodb: Peakforest and Massbank database. However the Massbank database requires to download and extract the whole database first, which takes several minutes, while the access to Peakforest is currently restricted to the partners of the MetaboHUB project. We will then use a small custom database, built from Massbank Japan using the biodb script peak-extractor, specially for the purpose of our examples.

Create the database data frame:

header <- c('accession', 'formula',      'ms.mode', 'ms.level', 'peak.mz',  'peak.intensity', 'peak.relative.intensity', 'peak.formula',       'msprecannot',  'msprecmz', 'peak.attr')
db <- rbind.data.frame(                                                                                                                      
       list('BML80005',  'C12H14N2O2',   'pos',     1,          219.1127765, 373076,          100     ,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
       list('BML80012',  'C15H18O8',     'pos',     1,          327.1074765, 33174,           100     ,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
       list('BML80013',  'C15H18O8',     'neg',     1,          325.0929235, 1595373,         56.15616,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
       list('BML80013',  'C15H18O8',     'neg',     1,          361.069602,  2841342,         100     ,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
       list('BML80020',  'C17H21NO3',    'pos',     1,          288.1593765, 781807,          100     ,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
       list('BML80020',  'C17H21NO3',    'pos',     1,          310.141318,  11409,           1.501502,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
	   list('AU200951',  'C7H5F3O',      'neg',     2,          161.0238,    38176  ,         100     ,                      'C7H4F3O-',           '[M-H]-',       161.022 ,   '[M-H]-'     ),
	   list('AU200951',  'C7H5F3O',      'neg',     2,          162.0274,    1780   ,         4.604605,                      'C6[13]CH4F3O-',      '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200951',  'C7H5F3O',      'neg',     2,          141.0167,    616    ,         1.601602,                      'C7H3F2O-',           '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200952',  'C7H5F3O',      'neg',     2,          161.0246,    6180   ,         100     ,                      'C7H4F3O-',           '[M-H]-',       161.022 ,   '[M-H]-'     ),
	   list('AU200952',  'C7H5F3O',      'neg',     2,          141.0184,    1384   ,         22.32232,                      'C7H3F2O-',           '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200952',  'C7H5F3O',      'neg',     2,          121.0113,    1180   ,         19.01902,                      'C7H2FO-',            '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200952',  'C7H5F3O',      'neg',     2,          162.0282,    388    ,         6.206206,                      'C6[13]CH4F3O-',      '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200953',  'C7H5F3O',      'neg',     2,          121.0113,    828    ,         100     ,                      'C7H2FO-',            '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200953',  'C7H5F3O',      'neg',     2,          141.0174,    300    ,         36.13614,                      'C7H3F2O-',           '[M-H]-',       161.022 ,   NA_character_),
	   list('AU325851',  'C10H12N2O3S',  'neg',     2,          239.0502,    4580   ,         100     ,                      'C10H11N2O3S-',       '[M-H]-',       239.0496,   '[M-H]-'     ),
	   list('AU325851',  'C10H12N2O3S',  'neg',     2,          240.0525,    468    ,         10.21021,                      'C9[13]CH11N2O3S-',   '[M-H]-',       239.0496,   NA_character_),
	   list('AU325851',  'C10H12N2O3S',  'neg',     2,          241.0471,    312    ,         6.806807,                      'C10H11N2O3[34]S-',   '[M-H]-',       239.0496,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          231.0102,    30800  ,         100     ,                      'C9H9Cl2N2O- [M-H]-', '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          233.0077,    13532  ,         43.84384,                      'C9H9Cl[37]ClN2O-',   '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          232.0129,    2024   ,         6.506507,                      'C8[13]CH9Cl2N2O-',   '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          185.9529,    1672   ,         5.405405,                      'C7H2Cl2NO-',         '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          187.9496,    868    ,         2.802803,                      'C7H2Cl[37]ClNO-',    '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          159.9737,    844    ,         2.702703,                      'C6H4Cl2N-',          '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          161.9711,    404    ,         1.301301,                      'C6H4Cl[37]ClN-',     '[M-H]-',       231.0097,   NA_character_),
	   list('AU158001',  'C17H19NO3',    'pos',     2,          286.1456,    1073792,         100     ,                      'C17H20NO3+',         '[M+H]+',       286.1438,   '[M+H]+'     ),
	   list('AU158001',  'C17H19NO3',    'pos',     2,          287.1488,    157332 ,         14.61461,                      'C16[13]CH20NO3+',    '[M+H]+',       286.1438,   NA_character_),
	   list('AU158001',  'C17H19NO3',    'pos',     2,          288.1514,    15604  ,         1.401401,                      'C15[13]C2H20NO3+',   '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          286.1457,    1338896,         100     ,                      'C17H20NO3+',         '[M+H]+',       286.1438,   '[M+H]+'     ),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          287.1489,    227244 ,         16.91692,                      'C16[13]CH20NO3+',    '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          229.0869,    20980  ,         1.501502,                      'C14H13O3+',          '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          288.1513,    19640  ,         1.401401,                      'C15[13]C2H20NO3+',   '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          201.0918,    19520  ,         1.401401,                      'C13H13O2+',          '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          268.1343,    8808   ,         0.600600,                      'C17H18NO2+',         '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          211.076 ,    8660   ,         0.600600,                      'C14H11O2+',          '[M+H]+',       286.1438,   NA_character_),
	   list('AU116602',  'C4H6N2S',      'pos',     2,          115.0334,    6556   ,         100     ,                      'C4H7N2S+',           '[M+H]+',       115.0324,   '[M+H]+'     ),
	   list('AU116606',  'C4H6N2S',      'pos',     2,          115.0334,    39940  ,         100     ,                      'C4H7N2S+',           '[M+H]+',       115.0324,   '[M+H]+'     ),
	   list('AU116606',  'C4H6N2S',      'pos',     2,          116.0365,    2808   ,         7.007007,                      'C3[13]CH7N2S+',      '[M+H]+',       115.0324,   NA_character_),
	   list('AU116606',  'C4H6N2S',      'pos',     2,          117.0293,    2596   ,         6.406406,                      'C4H7N2[34]S+',       '[M+H]+',       115.0324,   NA_character_),
            stringsAsFactors = FALSE)
names(db) <- header

Create a connector to the MS database:

conn <- mybiodb$getFactory()$createConn('mass.csv.file')
conn$setDb(db)
conn$setField('peak.mztheo', 'peak.mz')

Getting M/Z values from the database

This feature is mainly useful for biodb tests purposes.

You can request a list of M/Z values from MS databases. Depending on the database, the list of M/Z values will be more or less exhaustive.

Getting a list of M/Z values:

conn$getMzValues(max.results = 10)

You can restrict to a certain MS mode:

conn$getMzValues(max.results = 10, ms.mode = 'pos')

or ask for the peaks to be a precursor peaks:

conn$getMzValues(max.results = 10, precursor = TRUE)

or even ask for an MS level:

conn$getMzValues(max.results = 10, ms.level = 2)

Search for spectra containing a peak

Here is how to search for spectra that contain a certain M/Z value:

conn$searchMzRange(mz.min = 115, mz.max = 115.1, max.results = 5)

Another version is available that uses a tolerance instead of a range:

conn$searchMzTol(mz = 115, mz.tol = 0.1, mz.tol.unit = 'plain', max.results = 5)

You can also set mz.tol.unit to 'ppm'.

Both methods accept the following options:

Option	Default	Description
`ms.mode`	`NA`	Set to `'pos'` or `'neg'` to get only spectra from a certain MS mode.
`precursor`	`FALSE`	When set to `TRUE`, the searched peak must be a precursor peak.
`ms.level`	0	Set to an integer greater than 0 to get only spectra from this MS level.
`min.rel.int`	`NA`	The minimum of relative intensity required for the peak, in percentage from `0.0` to `100.0`.

Search for MSMS spectra by spectrum matching

You can search a match of your MSMS spectrum inside the MSMS spectra of the database. First, define the spectrum to match:

spectrum <- data.frame(mz = c(286.1456, 287.1488, 288.1514), rel.int = c(999, 158, 18))

Then search for a match:

conn$msmsSearch(spectrum, precursor.mz = 286.1438, mz.tol = 0.1, mz.tol.unit = 'plain', ms.mode = 'pos')

A data frame, ordered from highest score to lowest, is returned. It contains the following columns:

id: Database spectrum identifiers.
score: The matching score.
N columns peak.#: Each column corresponds to a peak of the searched spectrum (from first to last peak). A -1 means that the peak has not been matched. An integer N greater than 0 means that the peak has been matched with the Nth peak of the database spectrum.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search MS databases

Introduction

Getting M/Z values from the database

Search for spectra containing a peak

Search for MSMS spectra by spectrum matching

Clone this wiki locally