Commit

minor changes in wording and spelling (#177)
bienerts authored and GitHub Enterprise committed Jul 1, 2021
1 parent f2800ca commit fe865ca
Showing 7 changed files with 32 additions and 25 deletions.
12 changes: 6 additions & 6 deletions R/connections.R
@@ -1,12 +1,12 @@
-#' Connection to Biomart
+#' Connection to BioMart
#'
#' @description `r lifecycle::badge("experimental")`
#'
#' `connect_biomart()` creates a connection object of class [`ConnectionBiomart`] which contains
#' the `biomaRt` object of class [`biomaRt::Mart`][biomaRt::Mart-class] and the prefix of the object
#' which is used downstream for the query.
#'
-#' @details This connects to the Ensembl data base of Biomart for human genes.
+#' @details This connects to the Ensembl data base of BioMart for human genes.
#'
#' @param prefix (`string`)\cr gene ID prefix.
#'
@@ -37,15 +37,15 @@ connect_biomart <- function(prefix = c("ENSG", "GeneID")) {
slots = c(prefix = "character")
)

-#' Get Annotations from Biomart
+#' Get Annotations from BioMart
#'
#' @description `r lifecycle::badge("experimental")`
#'
#' Helper function to query annotations from `biomaRt`, for cleaned up gene IDs of
#' a specific ID variable and given [`biomaRt::Mart`][biomaRt::Mart-class].
#'
#' @param gene_ids (`character`)\cr gene IDs, e.g. `10329`.
-#' @param id_var (`string`)\cr corresponding gene ID variable name in Biomart,
+#' @param id_var (`string`)\cr corresponding gene ID variable name in BioMart,
#' e.g. `entrezgene_id`.
#' @param mart (`Mart`)\cr given [`biomaRt::Mart`][biomaRt::Mart-class] object.
#'
@@ -119,7 +119,7 @@ h_get_annotation_biomart <- function(gene_ids,
#' is extensible: It is simple to add new connections and corresponding query methods
#' for other data bases, e.g. company internal data bases. Please make sure to
#' follow the required format of the returned value.
-#' - The Biomart queries might not return information for all the genes. This can be
+#' - The BioMart queries might not return information for all the genes. This can be
#' due to different versions being used in the gene IDs and the queried Ensembl data base.
#'
#' @param genes (`character`)\cr gene IDs.
@@ -160,7 +160,7 @@ setGeneric(
#' @export
#'
#' @note This is currently used to strip away the `GeneID` prefix from Entrez gene IDs
-#' so that they can be queried from Biomart.
+#' so that they can be queried from BioMart
#'
#' @examples
#' h_strip_prefix(c("GeneID:11185", "GeneID:10677"), prefix = "GeneID")
2 changes: 1 addition & 1 deletion inst/WORDLIST
@@ -1,7 +1,7 @@
Armen
Benjamini
BioConductor
-Biomart
+BioMart
Chendi
CPM
DESeq
4 changes: 2 additions & 2 deletions man/connect_biomart.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions man/h_get_annotation_biomart.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/h_strip_prefix.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/query.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

31 changes: 19 additions & 12 deletions vignettes/introduction.Rmd
@@ -21,9 +21,10 @@ knitr::opts_chunk$set(

## Acknowledgments

-`hermes` is a successor of the Roche internal `rnaseqTools` R package, and therefore many code ideas have been borrowed from it. We therefore would like to thank the `rnaseqTools` authors for their work. In particular, we would like to acknowledge Chendi Liao and Joe Paulson for their guidance and explanations during the development of `hermes`. We also discussed the class design with Valerie Obenchain, and discussed RNAseq data standards with Armen Karapetyan. We borrowed some ideas from the Roche internal `biokitr` R package and discussed with its maintainer Daniel Marbach.
+`hermes` is a successor of the Roche internal `rnaseqTools` R package, and therefore many code ideas have been borrowed from it. Therefore we would like to thank the `rnaseqTools` authors for their work.
+In particular, we would like to acknowledge Chendi Liao and Joe Paulson for their guidance and explanations during the development of `hermes`. We also discussed the class design with Valerie Obenchain, and discussed RNAseq data standards with Armen Karapetyan. We borrowed some ideas from the Roche internal `biokitr` R package and discussed them with its maintainer Daniel Marbach.

-Finally, as with any NEST product, `hermes` is only possible because of the whole NEST project team's work, and we are grateful for the larger team's support.
+Finally, as with any NEST product, `hermes` is only possible because of the whole NEST project team's work, and we are grateful for the entire team's support.

Thanks a lot to everyone involved!

@@ -39,7 +40,7 @@ vignette(topic = "introduction", package = "hermes")
In this vignette you are going to learn how to:

* Import RNAseq count data into the `hermes` ready format.
-* Annotate gene information automatically from a central database (e.g. Biomart).
+* Annotate gene information automatically from a central database (e.g. BioMart).
* Add quality control (QC) flags to genes and samples.
* Filter the data set.
* Normalize the counts.
@@ -66,7 +67,7 @@ The data for `hermes` needs to be imported into the `HermesData` or `RangedHerme

### Importing a `SummarizedExperiment`

-The simplest import route is from a `SummarizedExperiment` (SE) object. This is because a `HermesData` object
+The simplest way to import data is from a `SummarizedExperiment` (SE) object. This is because a `HermesData` object
is just a special SE, with few additional requirements and slots.

In a nutshell, the object needs to have a `counts` assay, have certain
@@ -92,11 +93,11 @@ For a bit more details we can also call `summary()` on the object.
summary(object)
```

-Note that here the "additional" columns refer to everything not mandatorily included in a `HermesData` object.
+Note that here the "additional" columns refer to everything that is not mandatory to be included in a `HermesData` object.

### Importing an `ExpressionSet`

-If we start from an `ExpressionSet`, we can first convert this to a `RangedSummarizedExperiment` and then import to `RangedHermesData`:
+If we start from an `ExpressionSet`, we can first convert it to a `RangedSummarizedExperiment` and then import it to `RangedHermesData`:

```{r}
se <- makeSummarizedExperimentFromExpressionSet(expression_set)
@@ -134,7 +135,8 @@ allow for future extensions in this or other downstream packages.

### Connection to Database

-The first step is to connect to a database. In `hermes` the only option is currently Biomart.
+The first step is to connect to a database. In `hermes` the only option is currently databases that utilize the
+BioMart software suite.
However due to the generic function design, it is simple to extend `hermes` with other data base
connections.

@@ -162,17 +164,20 @@ Then the second step is to query the gene annotations and save them in the objec
annotation(small_object) <- query(genes(small_object), connection)
```

-Here we are using the `genes()` method to access the gene IDs (row names) of the `HermesData` object. Note that not all genes might be found in the data base and the corresponding rows would then be `NA` in the annotations.
+Here we are using the `genes()` method to access the gene IDs (row names) of the `HermesData` object.
+Note that not all genes might be found in the data base and the corresponding rows would then be `NA` in the annotations.

## Quality Control Flags

`hermes` provides automatic gene and sample flagging, as well as manual sample flagging functionality.

### Automatic Gene and Sample Flagging

-For genes, it is counted how many samples don't pass a minimum expression CPM threshold. If too many, then this gene is flagged as a "low expression" gene.
+For genes, it is counted how many samples don't pass a minimum expression CPM (counts per million reads mapped) threshold.
+If too many, then this gene is flagged as a "low expression" gene.

-For samples, two flags are provided. The "technical failure" flag is based on the average Pearson correlation with other samples. The "low depth" flag is based on the library size, i.e. the total sum of counts for a sample across all genes.
+For samples, two flags are provided. The "technical failure" flag is based on the average Pearson correlation with other
+samples. The "low depth" flag is based on the library size, i.e. the total sum of counts for a sample across all genes.

Thresholds for the above flags can be initialized with `control_quality()`, and the flags are added with `add_quality_flags()`.
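
As an illustration of the flag definitions above, and not the `hermes` implementation, the following base R sketch computes the three quantities on a made-up count matrix. The threshold names and values (`min_cpm`, `max_prop_low`, `min_depth`) are assumptions chosen for this example, as is the choice to compute the correlations on CPM values; see `?control_quality` for the arguments the package actually uses.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns); values are made up.
counts <- matrix(
  c(500, 300, 400,
      0,   1,   0,
    250, 350, 300,
    250, 346, 300),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Library size ("depth"): total sum of counts per sample across all genes.
lib_size <- colSums(counts)

# CPM: counts rescaled to one million reads per sample.
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Hypothetical thresholds, for illustration only.
min_cpm <- 5000      # minimum CPM for a gene to count as expressed in a sample
max_prop_low <- 0.5  # tolerated share of samples below min_cpm
min_depth <- 1000    # minimum library size

# Gene flag: too many samples fall below the CPM threshold.
low_expression <- rowMeans(cpm < min_cpm) > max_prop_low

# Sample quantity behind a "technical failure" flag:
# average Pearson correlation of each sample with all other samples.
cor_mat <- cor(cpm, method = "pearson")
diag(cor_mat) <- NA
avg_cor <- colMeans(cor_mat, na.rm = TRUE)

# Sample flag: library size below the minimum depth.
low_depth <- lib_size < min_depth
```

In this toy example only `gene2` is flagged as low expression and only `sample2` as low depth; a technical failure flag would compare `avg_cor` against a minimum correlation threshold in the same way.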

@@ -271,7 +276,8 @@ A series of simple descriptive plots can be obtained by just calling `autoplot()
autoplot(object)
```

-Note that individual plots from these can be produced with the series of `draw_*()` functions, see `?plot_all` for the detailed list. Then, these can be customized further.
+Note that individual plots from these can be produced with the series of `draw_*()` functions, see `?plot_all` for the
+detailed list. Then, these can be customized further.
For example, we can change the number and color of the bins in the library size histogram:

```{r}
@@ -327,7 +333,8 @@ autoplot(

### Correlation with Sample Variables

-Afterwards it is easy to correlate the obtained principal components with the sample variables. We obtain a matrix of R-squared (R2) values for all combinations, which can again be visualized as a heatmap.
+Subsequently it is easy to correlate the obtained principal components with the sample variables. We obtain a matrix of
+R-squared (R2) values for all combinations, which can again be visualized as a heatmap.
See `?pca_cor_samplevar` for details.

```{r, fig.height=8}
