Added R packaging tutorial #52
base: main
Changes from all commits
c94c99c
6032e92
aea85ac
a5ea81b
3d3d5fa
f78c5c3
b548bbb
26ed033
12c0f03
2745011
1a64405
aea3b8d
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
--- | ||
name: "Aims and workflows" | ||
teaching: 10 | ||
dependsOn: [ | ||
technology_and_tooling.r_packaging.setup | ||
] | ||
tags: [rpackaging] | ||
Review comment: I think the tag should be "R" to indicate this is the R stream of the packaging course. |
||
attribution: | ||
- citation: > | ||
Wickham, Hadley, and Jennifer Bryan. R Packages (2nd edition). O'Reilly Media, Inc., 2023. | ||
url: https://r-pkgs.org/ | ||
license: CC BY-NC-ND 4.0 | ||
--- | ||
|
||
# Basics: the aims and workflows | ||
|
||
## Aims | ||
|
||
We aim to create an R package, "regexcite", which contains a few simple functions for processing strings. In doing so, we will: | ||
|
||
- create the functions themselves | ||
- show how to set up version control | ||
- explain workflows for package development including: | ||
- package documentation | ||
- function documentation | ||
- unit testing | ||
- how to include package dependencies | ||
- how to help users to better understand how to use your package by creating vignettes | ||
|
||
There's a short presentation [here](dependencies/how_to_make_an_R_package.pdf), which explains a little more about R packages and software licenses. | ||
|
||
## Key workflows | ||
Key to the process of our package development will be the following commands, which you will execute repeatedly throughout its development: | ||
|
||
 | ||
|
||
This figure has been reproduced from "R Packages (2nd edition)", by Hadley Wickham and Jennifer Bryan. | ||
|
||
- `load_all()` loads all functions (both those internal to the package and those exposed to the user -- more on this distinction later) so that you can try them out interactively in the RStudio console. During package development, you should **never** load code from your package into the environment by highlighting a function definition and executing it, because `load_all()` simulates installing and loading your package far more accurately than relying on objects defined in your global environment. | ||
- `test()` runs all unit tests for the package. A useful alternative is `test_active_file()` which runs only those unit tests given in a file currently active in RStudio. | ||
- `document()` builds the documentation for your package's functions using `roxygen2`. You can then browse the help page for any of your own functions by executing `?custom_function_name` (where `custom_function_name` is the name of a function you've created). | ||
- `check()` automatically builds and checks a source package by running `R CMD check`. If your package passes this without errors, warnings or notes, it is a good sign that it may be ready to share with others. | ||
|
||
One thing that the above workflow diagram is missing is a check of unit test coverage, which can be achieved via the `test_coverage()` function in `devtools`, which relies on the `covr` package. This reports the percentage of lines in your functions that are covered by unit tests; a line is covered if it is executed by at least one unit test. |
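As a rough sketch, one pass through the cycle described above could be wrapped in a small helper. `dev_cycle()` is a hypothetical name used only for illustration (it is not part of `devtools`), and the sketch assumes `devtools` is installed and that `pkg` points at the root of a package source directory.

```R
# Hypothetical helper: one pass through the development cycle.
# Assumes devtools is installed and `pkg` is the package root
# (the directory that contains the DESCRIPTION file).
dev_cycle <- function(pkg = ".") {
  devtools::load_all(pkg)   # simulate installing and loading the package
  devtools::test(pkg)       # run all unit tests
  devtools::document(pkg)   # rebuild the help files and NAMESPACE
  devtools::check(pkg)      # run R CMD check on the package
}
```

In practice you would usually type the individual commands in the console as you need them; calling `dev_cycle()` from the package root simply runs them in the order shown in the figure.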
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
--- | ||
name: "Creating your package" | ||
teaching: 20 | ||
dependsOn: [ | ||
technology_and_tooling.r_packaging.basics | ||
] | ||
tags: [rpackaging] | ||
attribution: | ||
- citation: > | ||
Wickham, Hadley, and Jennifer Bryan. R Packages (2nd edition). O'Reilly Media, Inc., 2023. | ||
url: https://r-pkgs.org/ | ||
license: CC BY-NC-ND 4.0 | ||
--- | ||
|
||
## Create package outline | ||
|
||
We are now going to create the skeleton of our package within your newly created directory. Fortunately, much of this process is automated, minimising the effort required by you! | ||
Review comment: was there meant to be something before this on creating the directory? Reply: never mind, I found it :) |
||
|
||
Execute `create_package("regexcite")` in the R console which will initialise a package called "regexcite" in your directory: | ||
|
||
- this will likely open up a new RStudio session so you may need to call `library(devtools)` again to load this package | ||
- notice that there now exist various files in the directory: | ||
- `DESCRIPTION` is an editable file containing package metadata, which we will edit later. | ||
- `.Rbuildignore` lists files that we need to have around but that should not be included when building the R package from source. | ||
- `NAMESPACE` declares the functions your package exports for external use and the external functions your package imports from other packages. Generally, this should not be edited by hand. | ||
- the `R/` directory is the “business end” of your package. It will soon contain `.R` files with function definitions. | ||
|
||
Optionally, also make the package a Git repository with `use_git()`. | ||
|
||
## Create basic function | ||
Create `x <- "alfa,bravo,charlie,delta"`. Suppose we want to write a function that splits this into a vector of individual words: | ||
|
||
- this can be done with `unlist(strsplit(x, split=","))` but we'd like to create a function to do so. | ||
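To see why the `unlist()` call is needed, the following base R sketch shows what each step returns: `strsplit()` gives back a list, and `unlist()` flattens it into a plain character vector. The variable names `pieces` and `words` are only illustrative.

```R
x <- "alfa,bravo,charlie,delta"

# strsplit() returns a list with one element per input string
pieces <- strsplit(x, split = ",")
is.list(pieces)

# unlist() flattens that list into a plain character vector
words <- unlist(pieces)
words
length(words)
```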
|
||
We now want to write a function we are going to call `strsplit1` which takes as input a string with a given separator and splits it into a vector of strings: | ||
|
||
- call `use_r("strsplit1")` which should create a `strsplit1.R` file within the `R/` folder. | ||
- write your `strsplit1` function in that file: | ||
|
||
```R | ||
strsplit1 <- function(x, split) { | ||
  unlist(strsplit(x, split = split)) | ||
} | ||
``` | ||
|
||
Test drive your function by calling `load_all()`, which should make the function available for you to play with: | ||
|
||
- when developing an R package, we typically don't define functions manually in the global environment. | ||
- `load_all()` provides a more robust way to test functions as it simulates what a user would experience when loading the package. | ||
|
||
Now check that the package as a whole works by calling `check()`: | ||
- this runs `R CMD check`, which is executed in the shell and is the gold standard for checking that an R package is in full working order | ||
|
||
`check()` should have raised a warning about `Non-standard license specification`; we will address this soon. | ||
|
||
## Add descriptive info for your package | ||
|
||
Open up the `DESCRIPTION` file: | ||
Review comment: is there a link to an official description of this file? e.g. https://r-pkgs.org/description.html |
||
|
||
- make yourself the author; if you don’t have an ORCID, you can omit the `comment = ...` portion | ||
- add some descriptive text to the `Title` and `Description` fields | ||
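The author entry lives in the `Authors@R` field of `DESCRIPTION`, which holds a call to R's `person()` function. The sketch below builds such an entry in the console so you can see what it looks like before copying it into the file; the name and email address are made up for illustration, and the `comment = ...` part is left out, as you would without an ORCID.

```R
# Hypothetical author details, for illustration only
author <- person(
  given = "Jane",
  family = "Doe",
  email = "jane.doe@example.com",
  role = c("aut", "cre")  # "aut" = author, "cre" = maintainer
)

# Printing shows how the entry is rendered
print(author)
```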
|
||
Add an MIT license (see [course x](xx) for an intro to software licenses) via `use_mit_license()` | ||
Review comment: the link here looks like it needs to be filled in. |
||
|
||
- this adds two license files to the folder: | ||
- `LICENSE` which contains the year and copyright owners | ||
- `LICENSE.md` which holds the full license info |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
--- | ||
name: "Package dependencies" | ||
teaching: 15 | ||
dependsOn: [ | ||
technology_and_tooling.r_packaging.testing | ||
] | ||
tags: [rpackaging] | ||
attribution: | ||
- citation: > | ||
Wickham, Hadley, and Jennifer Bryan. R Packages (2nd edition). O'Reilly Media, Inc., 2023. | ||
url: https://r-pkgs.org/ | ||
license: CC BY-NC-ND 4.0 | ||
--- | ||
|
||
## Package dependencies | ||
|
||
Suppose we want to use the `tibble` function from `dplyr` to store our split string in a data frame with a column `order`, which gives the position of each element in the string, and another column `string`, which contains the pieces of the split string. To do this, we will create a function called `string_df(x, split)` that internally calls `tibble`, so we need to make the `dplyr` package a dependency of our package. | ||
|
||
Call `use_package("dplyr")`. Examine the `DESCRIPTION` file to see that `dplyr` has been added under the `Imports` field. | ||
|
||
Call `use_r("string_df")` to create a blank `.R` file. Insert the following into the file: | ||
|
||
```R | ||
string_df <- function(x, split) { | ||
  x <- strsplit1(x, split) | ||
  dplyr::tibble( | ||
    order = seq_along(x), | ||
    string = x | ||
  ) | ||
} | ||
``` | ||
|
||
Note that we have used `dplyr::tibble` rather than `tibble` to indicate that we want a function exported by the `dplyr` package. In practice, when developing an R package, it should be your default behaviour to use `::` to access functions belonging to another package. This adds clarity to your code and helps you (and anyone else developing the package after you) to keep track of exactly which functions are being used from which packages. | ||
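As a quick illustration of the `::` style, the sketch below calls `string_df()` in a plain R session without ever attaching `dplyr` via `library()`. It assumes `dplyr` is installed, and it repeats the two function definitions only so that it can run on its own outside the package.

```R
# Assumes the dplyr package is installed; note there is no library(dplyr) call
strsplit1 <- function(x, split) {
  unlist(strsplit(x, split = split))
}

string_df <- function(x, split) {
  x <- strsplit1(x, split)
  dplyr::tibble(
    order = seq_along(x),
    string = x
  )
}

# One row per word, with its position in the original string
words <- string_df("alfa,bravo,charlie,delta", split = ",")
words
```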
|
||
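Create documentation for the above function in the same way as you did for `strsplit1`: insert a roxygen skeleton, fill it in, and run `document()`. | ||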
Review comment: how? A docstring or the R equivalent? |
||
|
||
Call `use_test()` from within the R file to create a test file, and add a unit test for `string_df`. | ||
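One possible unit test for `string_df` is sketched below. It assumes `testthat` and `dplyr` are installed. The function definitions are repeated only so the sketch runs on its own; inside the package, `test()` makes them available and the test file would hold just the `test_that()` call.

```R
library(testthat)

# Definitions repeated so the sketch is self-contained
strsplit1 <- function(x, split) {
  unlist(strsplit(x, split = split))
}

string_df <- function(x, split) {
  x <- strsplit1(x, split)
  dplyr::tibble(
    order = seq_along(x),
    string = x
  )
}

test_that("string_df() returns one row per split element", {
  out <- string_df("alfa,bravo,charlie", split = ",")
  expect_equal(nrow(out), 3)
  expect_equal(out$order, 1:3)
  expect_equal(out$string, c("alfa", "bravo", "charlie"))
})
```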
|
||
Check that all is ok with the package (and the tests) by running `check()`. | ||
|
||
Try installing your package and running its functionality in a fresh R session. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
--- | ||
name: "Function documentation" | ||
teaching: 15 | ||
dependsOn: [ | ||
technology_and_tooling.r_packaging.creating_your_package | ||
] | ||
tags: [rpackaging] | ||
attribution: | ||
- citation: > | ||
Wickham, Hadley, and Jennifer Bryan. R Packages (2nd edition). O'Reilly Media, Inc., 2023. | ||
url: https://r-pkgs.org/ | ||
license: CC BY-NC-ND 4.0 | ||
--- | ||
|
||
## Creating function documentation | ||
|
||
Function documentation tells future users of your package (yourself included) what inputs a function expects and what output it produces. Thankfully, much of this process is automated, reducing the demands on you as the developer. | ||
|
||
We are now going to create documentation for your recently created `strsplit1` function. | ||
|
||
Open your `R/strsplit1.R` file again and place your cursor within the `strsplit1` function body: | ||
- in your RStudio window do *Code > Insert roxygen skeleton*, which should create boilerplate comments above the function, written in the `roxygen` style | ||
- add a title | ||
- describe the inputs and the return value | ||
- include an example | ||
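A filled-in skeleton might look something like the sketch below. The wording of the title, parameter descriptions and example is only a suggestion; the tags (`@param`, `@return`, `@export`, `@examples`) are the ones that the roxygen skeleton inserts.

```R
#' Split a string
#'
#' Splits a single string into a character vector of its pieces.
#'
#' @param x A character string to split.
#' @param split The separator to split on, passed to `strsplit()`.
#'
#' @return A character vector containing the pieces of `x`.
#' @export
#'
#' @examples
#' x <- "alfa,bravo,charlie,delta"
#' strsplit1(x, split = ",")
strsplit1 <- function(x, split) {
  unlist(strsplit(x, split = split))
}
```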
|
||
Trigger creation of the documentation via `document()`. This will create a folder `man/` which houses automatically built documentation for your package. | ||
|
||
After doing so, test your documentation using `?strsplit1`. Does it appear as you hope it should to a user? If not, adjust the documentation and iteratively execute `document()` and `?strsplit1` until you are happy with it. | ||
|
||
## External and internal functions | ||
|
||
Notice that when you executed *Code > Insert roxygen skeleton* above, an `@export` tag was added near the bottom of the roxygen comment block. This means that your `strsplit1` function will be accessible to users of the package; after users execute `library(regexcite)` they can then use your `strsplit1` function. | ||
|
||
To see that this is the case, examine the `NAMESPACE` file. You will see that `document()` added `export(strsplit1)` there, which is how R packages keep track of which functions should be exposed to users. | ||
|
||
You may also want internal functions in your package that you don't make available to users, for example helper functions that are used by other functions in your package but which a user is unlikely to call themselves. To make a function internal, just remove the `@export` tag from its documentation and run `document()` again. | ||
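The sketch below shows what an internal helper next to an exported function could look like. `check_character()` is a hypothetical helper invented for this illustration. It has no `@export` tag, and the `@noRd` tag (a `roxygen2` tag) additionally asks `document()` not to generate a help page for it.

```R
#' Check that an input is a character vector (internal helper)
#'
#' @param x Object to check.
#' @noRd
check_character <- function(x) {
  # Hypothetical helper: not exported, so users cannot call it
  # after library(regexcite)
  if (!is.character(x)) {
    stop("Input must be a character string")
  }
  invisible(x)
}

#' Split a string
#'
#' @param x A character string to split.
#' @param split The separator to split on.
#' @return A character vector containing the pieces of `x`.
#' @export
strsplit1 <- function(x, split) {
  check_character(x)  # the internal helper is visible inside the package
  unlist(strsplit(x, split = split))
}
```

Because `load_all()` loads internal functions too, you can still try out `check_character()` interactively during development.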
|
||
|
||
## Test driving your package | ||
Double check that your package is all ok by running `check()`. | ||
|
||
Now that we have a package that works, we can install it via `install()`. | ||
|
||
Restart your R session and do `library(regexcite)` to load the package. Then test the package using: | ||
|
||
```R | ||
x <- "alfa,bravo,charlie,delta" | ||
strsplit1(x, split = ",") | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
--- | ||
name: "How to create an R package tutorial" | ||
id: rpackaging | ||
dependsOn: [ | ||
] | ||
files: [ | ||
setup.md, | ||
basics.md, | ||
creating_your_package.md, | ||
function_documentation.md, | ||
testing.md, | ||
dependencies.md, | ||
vignettes.md | ||
] | ||
attribution: | ||
- citation: > | ||
Wickham, Hadley, and Jennifer Bryan. R Packages (2nd edition). O'Reilly Media, Inc., 2023. | ||
url: https://r-pkgs.org/ | ||
license: CC BY-NC-ND 4.0 | ||
summary: | | ||
This tutorial explains how to develop an R package, including how to write basic unit tests and create pedagogical vignettes which explain how to use a package | ||
--- | ||
|
||
This short course describes how to make an R package. It broadly follows Chapter 2 of the [free online version](https://r-pkgs.org/) of "R Packages (2nd edition)", by Hadley Wickham and Jennifer Bryan, but it also expands on it in various ways. Here, we aim only to cover those aspects of package development which a user most typically encounters during this process. The [R Packages](https://r-pkgs.org/) book goes into much more detail and is well worth reading in its entirety (particularly as it is short). | ||
|
||
Creating an R package is important because the fundamental unit of shareable code in R is a package. Packages can be shared directly with collaborators or hosted on online repositories (such as [GitHub](https://github.com/)); if they are accompanied by appropriate software licenses, others are then able to use them. | ||
|
||
Creating an R package is not only useful for sharing code. Doing so also enforces good habits such as creating more modular code which is well-documented -- both of which are helpful for developing robust code which a future self will undoubtedly be thankful for. | ||
|
||
In this tutorial, we use a toy example to explain how to set up an R package. Whenever you are developing code, and especially if that code is intended for use by the community, it is important to take steps to minimise the risks of incorrect code. Unit testing is one way to achieve this, and in this tutorial, we explain how to write basic unit tests for your package. In R, it is also good practice to create one or more vignettes, which are typically pedagogical reference documents that explain to users how to use your package. The final part of this tutorial illustrates how to do this. |
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
@@ -0,0 +1,28 @@ | ||||
--- | ||||
name: "Setup" | ||||
dependsOn: [ | ||||
technology_and_tooling.r_packaging.index | ||||
Review comment: this `dependsOn` should just be empty.
|
||||
] | ||||
tags: [rpackaging] | ||||
attribution: | ||||
- citation: > | ||||
Wickham, Hadley, and Jennifer Bryan. R Packages (2nd edition). O'Reilly Media, Inc., 2023. | ||||
url: https://r-pkgs.org/ | ||||
license: CC BY-NC-ND 4.0 | ||||
|
||||
--- | ||||
|
||||
# Setup | ||||
|
||||
## Requirements | ||||
|
||||
To go through this tutorial, you need R installed on your machine; it can be freely downloaded from the [R project website](https://www.r-project.org/). Whilst it is not a prerequisite, we strongly suggest that you use the [RStudio](https://posit.co/download/rstudio-desktop/) IDE when developing your package since it helps to automate a number of steps (such as generating the documentation skeleton for a function). | ||||
|
||||
You also need the [devtools](https://www.r-project.org/nosvn/pandoc/devtools.html) package installed on your machine, which you can do by typing `install.packages("devtools")` in the R console. Loading `devtools` also loads the `usethis` package, which we will use extensively throughout this tutorial. | ||||
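A minimal sketch of this setup step is shown below. It installs `devtools` only if it is missing (which needs an internet connection) and then confirms that `usethis` has been attached alongside it. The explicit `repos` argument is an assumption added so that the call also works in a non-interactive session where no CRAN mirror has been chosen.

```R
# Install devtools only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  # The mirror below is an assumption; any CRAN mirror will do
  install.packages("devtools", repos = "https://cloud.r-project.org")
}

library(devtools)

# usethis is attached together with devtools
"package:usethis" %in% search()
```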
|
||||
## Before starting the tutorial | ||||
|
||||
1. Create a fresh folder where you intend to house an R package | ||||
2. Open up RStudio and change the working directory to be within the folder | ||||
3. Load `devtools` via `library(devtools)` (which loads `usethis`): | ||||
- this is the interface to a range of tools which greatly facilitate R package development |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
--- | ||
name: "Unit tests" | ||
teaching: 20 | ||
dependsOn: [ | ||
technology_and_tooling.r_packaging.function_documentation | ||
] | ||
tags: [rpackaging] | ||
attribution: | ||
- citation: > | ||
Wickham, Hadley, and Jennifer Bryan. R Packages (2nd edition). O'Reilly Media, Inc., 2023. | ||
url: https://r-pkgs.org/ | ||
license: CC BY-NC-ND 4.0 | ||
--- | ||
|
||
## Unit tests | ||
|
||
We've tested our function `strsplit1` informally, but we would like to do so systematically to ensure that: the function works as intended across more general examples; and the function continues to work when we develop the package. To do so, we are going to create unit tests. | ||
|
||
A first step in this process is to call `use_testthat()`. This sets up the whole unit testing infrastructure used by the popular `testthat` package. You'll notice that by doing so, this creates a new folder in the package directory called `tests`. Within it are two things: a file called `testthat.R` -- generally, this file does not need to be manually changed; and a folder called `testthat` which will contain R files housing your unit tests. | ||
|
||
Call `use_test("strsplit1")` to create a file which contains a dummy example test. Try highlighting this code and running it interactively to check it runs ok. | ||
|
||
When developing a package, you will more typically call `test()` to run all of your unit tests. Try this now. | ||
- Note that tests are also run when `check()` is run. | ||
|
||
Write a `test_that` function which checks that `strsplit1` can split a string with comma separation. The `testthat` library has a whole raft of testing functions but the most commonly used of these are `expect_equal` and `expect_true`. `expect_equal` checks that the output of running a function matches that of a known value; `expect_true` checks that a given statement is true. For example, `expect_true(2 == 2)` will execute without error whereas `expect_true(2 == 1)` will not. | ||
|
||
Write an additional, separate test that checks that `strsplit1` can split a string with hyphen separation. | ||
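One possible version of the two tests described above is sketched here, assuming `testthat` is installed. The function definition is repeated only so that the sketch runs on its own; in the package, the test file would contain just the two `test_that()` calls, and the test descriptions are only suggestions.

```R
library(testthat)

# Repeated here so the sketch is self-contained
strsplit1 <- function(x, split) {
  unlist(strsplit(x, split = split))
}

test_that("strsplit1() splits a comma-separated string", {
  expect_equal(
    strsplit1("alfa,bravo,charlie", split = ","),
    c("alfa", "bravo", "charlie")
  )
})

test_that("strsplit1() splits a hyphen-separated string", {
  expect_equal(
    strsplit1("alfa-bravo-charlie", split = "-"),
    c("alfa", "bravo", "charlie")
  )
})
```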
|
||
Try `test()` again to check that all unit tests pass. | ||
|
||
## Checking for appropriate function arguments | ||
|
||
It is good practice to check that a function's inputs are what is expected and to raise an informative error message if they are not. We are now going to modify the function to check that the input `x` is a character string and, if not, raise an error. Our modified function is shown below: | ||
|
||
```R | ||
strsplit1 <- function(x, split) { | ||
|
||
  if (!is.character(x)) | ||
    stop("Input must be a character string") | ||
|
||
  unlist(strsplit(x, split = split)) | ||
} | ||
``` | ||
|
||
We now want to check that, if an argument is supplied that is not a character string, then the function raises the appropriate error. To do so, write an additional test function in the same file that uses the following error check: | ||
|
||
```R | ||
expect_error(strsplit1(123, ","), "Input must be a character string") | ||
``` | ||
|
||
If calling `strsplit1(123, ",")` raises the error message "Input must be a character string" then the function has behaved as intended. | ||
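Wrapped in a `test_that()` call, the error check could look like the sketch below (assuming `testthat` is installed; the test description is only a suggestion). The second argument of `expect_error()` is treated as a regular expression that the error message must match.

```R
library(testthat)

# Modified function, repeated so the sketch is self-contained
strsplit1 <- function(x, split) {
  if (!is.character(x)) {
    stop("Input must be a character string")
  }
  unlist(strsplit(x, split = split))
}

test_that("strsplit1() rejects non-character input", {
  expect_error(strsplit1(123, ","), "Input must be a character string")
})
```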
|
||
Try `test()` again to check that all unit tests pass. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
--- | ||
name: "Package vignettes" | ||
teaching: 10 | ||
dependsOn: [ | ||
technology_and_tooling.r_packaging.dependencies | ||
] | ||
tags: [rpackaging] | ||
attribution: | ||
- citation: > | ||
Wickham, Hadley, and Jennifer Bryan. R Packages (2nd edition). O'Reilly Media, Inc., 2023. | ||
url: https://r-pkgs.org/ | ||
license: CC BY-NC-ND 4.0 | ||
--- | ||
|
||
## Package vignettes | ||
|
||
Vignettes in R help others to understand the broader purpose of your package and how to use its functionality to solve real(ish) problems. Creating vignettes is straightforward with the `usethis` package, and we will now create a simple one. | ||
|
||
1. Call `use_vignette("playing_with_strings")` | ||
- You'll notice that this creates a folder called `vignettes`, and within this folder there should be an R Markdown file named `playing_with_strings.Rmd` | ||
|
||
2. Edit the vignette, including both code and descriptive text, to show a new user how to use your package. |
Review comment: could we please put this within the "packaging_dependency_management" course as a separate stream on R packaging? We already have C++ and Python packaging in there.