This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

Database extraction

Pierrick Roger edited this page Mar 6, 2018 · 3 revisions

This page has been generated automatically from a package vignette. Please do not edit, since your modifications will later be removed.

Introduction

It is possible to download all or part of a database and export it to a CSV file.

Create an instance of the Biodb class:

mybiodb <- biodb::Biodb()

Retrieving an entire database whose entries contain only atomic values

For this example, we will use the Lipidmaps Structure database.

First, connect to the database:

lipids <- mybiodb$getFactory()$getConn('lipidmaps.structure')

Then get all entry IDs:

entry.ids <- lipids$getEntryIds()

Here we retrieve only the first two entries, since retrieving them all would take too long for this example:

entries <- mybiodb$getFactory()$getEntry('lipidmaps.structure', id = entry.ids[1:2])

Transform all entries into a single data frame:

df <- mybiodb$entriesToDataframe(entries)
print(df)

Export the data frame to a CSV file with the standard R function write.csv:

write.csv(df, file = 'lipidmaps-structure.csv')
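
By default, write.csv also writes the data frame's row names as an unnamed first column. The following minimal sketch shows how row.names = FALSE avoids this and how the file can be read back for checking. It uses a small hand-made data frame in place of the real biodb output; the column names and values are illustrative assumptions, not actual Lipidmaps data.

```r
# Stand-in for the data frame returned by entriesToDataframe();
# column names and values are made up for illustration.
df.demo <- data.frame(accession = c('ID0001', 'ID0002'),
                      name = c('compound A', 'compound B'),
                      stringsAsFactors = FALSE)

# Write to a temporary file, without the row names column.
csv.file <- tempfile(fileext = '.csv')
write.csv(df.demo, file = csv.file, row.names = FALSE)

# Read the file back to check its content.
df.back <- read.csv(csv.file, stringsAsFactors = FALSE)
print(df.back)
```

Whether to keep the row names is a matter of preference; dropping them keeps the CSV file limited to the entry fields.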

Retrieving part of a database whose entries contain non-atomic values

For this example, we will use the Massbank database, since each entry contains a peak list stored in a data frame.

First, connect to the database:

massbank <- mybiodb$getFactory()$getConn('massbank.jp')

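Get some entry IDs by searching on M/Z value. The second line disables the allow.huge.downloads configuration option before the entries are fetched: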

entry.ids <- massbank$searchMzTol(64, mz.tol = 0.3, max.results = 2)
mybiodb$getConfig()$disable('allow.huge.downloads')

Get the entries for these IDs:

entries <- mybiodb$getFactory()$getEntry('massbank.jp', id = entry.ids)

Transform all entries into a single data frame:

df <- mybiodb$entriesToDataframe(entries, only.atomic = FALSE)
print(df)

The option only.atomic controls whether only atomic values are put into the data frame. If set to TRUE, each entry occupies exactly one row of the data frame. If set to FALSE and an entry contains a non-atomic value (a vector or a data frame), the entry occupies several rows instead of one: its atomic values are repeated as many times as there are values inside the non-atomic value.
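
The row expansion described above can be illustrated with base R alone. The following is a minimal sketch, not biodb's actual implementation: the entry is a hypothetical plain list, and its field names (accession, name, peaks) and values are assumptions made for the example.

```r
# Hypothetical entry: two atomic fields and one non-atomic field (a peak table).
entry <- list(accession = 'EX000001',
              name = 'example compound',
              peaks = data.frame(mz = c(64.1, 65.2, 66.0),
                                 intensity = c(10, 250, 35)))

# Atomic fields only: the entry occupies a single row.
atomic <- data.frame(accession = entry$accession,
                     name = entry$name,
                     stringsAsFactors = FALSE)

# Expanded form: repeat the atomic row once per peak, then bind the peak columns.
n.peaks <- nrow(entry$peaks)
expanded <- cbind(atomic[rep(1, n.peaks), , drop = FALSE], entry$peaks)
rownames(expanded) <- NULL
print(expanded)
```

In this sketch, atomic has one row while expanded has three, one per peak, with the accession and name values repeated on each row.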

Export the data frame to a CSV file with the standard R function write.csv:

write.csv(df, file = 'massbank.csv')