Skip to content
This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

Search MS databases

Pierrick Roger edited this page Mar 6, 2018 · 3 revisions

This page has been generated automatically from a package vignette. Please do not edit, since your modifications will later be removed.

Introduction

Create an instance of the Biodb class:

mybiodb <- biodb::Biodb$new()

For this vignette, we will use an in-house database, built from a data frame. There are two other mass databases in biodb: Peakforest and Massbank database. However the Massbank database requires to download and extract the whole database first, which takes several minutes, while the access to Peakforest is currently restricted to the partners of the MetaboHUB project. We will then use a small custom database, built from Massbank Japan using the biodb script peak-extractor, specially for the purpose of our examples.

Create the database data frame:

header <- c('accession', 'formula',      'ms.mode', 'ms.level', 'peak.mz',  'peak.intensity', 'peak.relative.intensity', 'peak.formula',       'msprecannot',  'msprecmz', 'peak.attr')
db <- rbind.data.frame(                                                                                                                      
       list('BML80005',  'C12H14N2O2',   'pos',     1,          219.1127765, 373076,          100     ,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
       list('BML80012',  'C15H18O8',     'pos',     1,          327.1074765, 33174,           100     ,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
       list('BML80013',  'C15H18O8',     'neg',     1,          325.0929235, 1595373,         56.15616,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
       list('BML80013',  'C15H18O8',     'neg',     1,          361.069602,  2841342,         100     ,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
       list('BML80020',  'C17H21NO3',    'pos',     1,          288.1593765, 781807,          100     ,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
       list('BML80020',  'C17H21NO3',    'pos',     1,          310.141318,  11409,           1.501502,                       NA_character_,        NA_character_, NA_real_,   NA_character_),
	   list('AU200951',  'C7H5F3O',      'neg',     2,          161.0238,    38176  ,         100     ,                      'C7H4F3O-',           '[M-H]-',       161.022 ,   '[M-H]-'     ),
	   list('AU200951',  'C7H5F3O',      'neg',     2,          162.0274,    1780   ,         4.604605,                      'C6[13]CH4F3O-',      '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200951',  'C7H5F3O',      'neg',     2,          141.0167,    616    ,         1.601602,                      'C7H3F2O-',           '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200952',  'C7H5F3O',      'neg',     2,          161.0246,    6180   ,         100     ,                      'C7H4F3O-',           '[M-H]-',       161.022 ,   '[M-H]-'     ),
	   list('AU200952',  'C7H5F3O',      'neg',     2,          141.0184,    1384   ,         22.32232,                      'C7H3F2O-',           '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200952',  'C7H5F3O',      'neg',     2,          121.0113,    1180   ,         19.01902,                      'C7H2FO-',            '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200952',  'C7H5F3O',      'neg',     2,          162.0282,    388    ,         6.206206,                      'C6[13]CH4F3O-',      '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200953',  'C7H5F3O',      'neg',     2,          121.0113,    828    ,         100     ,                      'C7H2FO-',            '[M-H]-',       161.022 ,   NA_character_),
	   list('AU200953',  'C7H5F3O',      'neg',     2,          141.0174,    300    ,         36.13614,                      'C7H3F2O-',           '[M-H]-',       161.022 ,   NA_character_),
	   list('AU325851',  'C10H12N2O3S',  'neg',     2,          239.0502,    4580   ,         100     ,                      'C10H11N2O3S-',       '[M-H]-',       239.0496,   '[M-H]-'     ),
	   list('AU325851',  'C10H12N2O3S',  'neg',     2,          240.0525,    468    ,         10.21021,                      'C9[13]CH11N2O3S-',   '[M-H]-',       239.0496,   NA_character_),
	   list('AU325851',  'C10H12N2O3S',  'neg',     2,          241.0471,    312    ,         6.806807,                      'C10H11N2O3[34]S-',   '[M-H]-',       239.0496,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          231.0102,    30800  ,         100     ,                      'C9H9Cl2N2O- [M-H]-', '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          233.0077,    13532  ,         43.84384,                      'C9H9Cl[37]ClN2O-',   '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          232.0129,    2024   ,         6.506507,                      'C8[13]CH9Cl2N2O-',   '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          185.9529,    1672   ,         5.405405,                      'C7H2Cl2NO-',         '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          187.9496,    868    ,         2.802803,                      'C7H2Cl[37]ClNO-',    '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          159.9737,    844    ,         2.702703,                      'C6H4Cl2N-',          '[M-H]-',       231.0097,   NA_character_),
	   list('AU341051',  'C9H10Cl2N2O',  'neg',     2,          161.9711,    404    ,         1.301301,                      'C6H4Cl[37]ClN-',     '[M-H]-',       231.0097,   NA_character_),
	   list('AU158001',  'C17H19NO3',    'pos',     2,          286.1456,    1073792,         100     ,                      'C17H20NO3+',         '[M+H]+',       286.1438,   '[M+H]+'     ),
	   list('AU158001',  'C17H19NO3',    'pos',     2,          287.1488,    157332 ,         14.61461,                      'C16[13]CH20NO3+',    '[M+H]+',       286.1438,   NA_character_),
	   list('AU158001',  'C17H19NO3',    'pos',     2,          288.1514,    15604  ,         1.401401,                      'C15[13]C2H20NO3+',   '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          286.1457,    1338896,         100     ,                      'C17H20NO3+',         '[M+H]+',       286.1438,   '[M+H]+'     ),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          287.1489,    227244 ,         16.91692,                      'C16[13]CH20NO3+',    '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          229.0869,    20980  ,         1.501502,                      'C14H13O3+',          '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          288.1513,    19640  ,         1.401401,                      'C15[13]C2H20NO3+',   '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          201.0918,    19520  ,         1.401401,                      'C13H13O2+',          '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          268.1343,    8808   ,         0.600600,                      'C17H18NO2+',         '[M+H]+',       286.1438,   NA_character_),
	   list('AU158002',  'C17H19NO3',    'pos',     2,          211.076 ,    8660   ,         0.600600,                      'C14H11O2+',          '[M+H]+',       286.1438,   NA_character_),
	   list('AU116602',  'C4H6N2S',      'pos',     2,          115.0334,    6556   ,         100     ,                      'C4H7N2S+',           '[M+H]+',       115.0324,   '[M+H]+'     ),
	   list('AU116606',  'C4H6N2S',      'pos',     2,          115.0334,    39940  ,         100     ,                      'C4H7N2S+',           '[M+H]+',       115.0324,   '[M+H]+'     ),
	   list('AU116606',  'C4H6N2S',      'pos',     2,          116.0365,    2808   ,         7.007007,                      'C3[13]CH7N2S+',      '[M+H]+',       115.0324,   NA_character_),
	   list('AU116606',  'C4H6N2S',      'pos',     2,          117.0293,    2596   ,         6.406406,                      'C4H7N2[34]S+',       '[M+H]+',       115.0324,   NA_character_),
            stringsAsFactors = FALSE)
names(db) <- header

Create a connector to the MS database:

conn <- mybiodb$getFactory()$createConn('mass.csv.file')
conn$setDb(db)
conn$setField('peak.mztheo', 'peak.mz')

Getting M/Z values from the database

This feature is mainly useful for biodb tests purposes.

You can request a list of M/Z values from MS databases. Depending on the database, the list of M/Z values will be more or less exhaustive.

Getting a list of M/Z values:

conn$getMzValues(max.results = 10)

You can restrict to a certain MS mode:

conn$getMzValues(max.results = 10, ms.mode = 'pos')

or ask for the peaks to be a precursor peaks:

conn$getMzValues(max.results = 10, precursor = TRUE)

or even ask for an MS level:

conn$getMzValues(max.results = 10, ms.level = 2)

Search for spectra containing a peak

Here is how to search for spectra that contain a certain M/Z value:

conn$searchMzRange(mz.min = 115, mz.max = 115.1, max.results = 5)

Another version is available that uses a tolerance instead of a range:

conn$searchMzTol(mz = 115, mz.tol = 0.1, mz.tol.unit = 'plain', max.results = 5)

You can also set mz.tol.unit to 'ppm'.

Both methods accept the following options:

Option Default Description
ms.mode NA Set to 'pos' or 'neg' to get only spectra from a certain MS mode.
precursor FALSE When set to TRUE, the searched peak must be a precursor peak.
ms.level 0 Set to an integer greater than 0 to get only spectra from this MS level.
min.rel.int NA The minimum of relative intensity required for the peak, in percentage from 0.0 to 100.0.

Search for MSMS spectra by spectrum matching

You can search a match of your MSMS spectrum inside the MSMS spectra of the database. First, define the spectrum to match:

spectrum <- data.frame(mz = c(286.1456, 287.1488, 288.1514), rel.int = c(999, 158, 18))

Then search for a match:

conn$msmsSearch(spectrum, precursor.mz = 286.1438, mz.tol = 0.1, mz.tol.unit = 'plain', ms.mode = 'pos')

A data frame, ordered from highest score to lowest, is returned. It contains the following columns:

  • id: Database spectrum identifiers.
  • score: The matching score.
  • N columns peak.#: Each column corresponds to a peak of the searched spectrum (from first to last peak). A -1 means that the peak has not been matched. An integer N greater than 0 means that the peak has been matched with the Nth peak of the database spectrum.