title change typo to R + white space formating
chendaniely committed Dec 7, 2024
1 parent 9152eb2 commit 925c831
Showing 1 changed file with 46 additions and 46 deletions.
92 changes: 46 additions & 46 deletions book/lectures/142-R-testing-example.qmd
@@ -1,54 +1,54 @@
---
-title: Python testing example
+title: R testing example
---

### Example of workflow for writing functions and tests for data science

Let's say we want to write a function
for a task we are repeatedly performing in our data analysis.
For example, summarizing the number of observations in each class.
This is a common task performed for almost every classification problem
to examine how many classes there are, to understand whether we are facing a binary
or multi-class classification problem,
as well as to examine whether there are any class imbalances
that we may need to deal with before tuning our models.

#### 1. Write the function specifications and documentation - but do not implement the function:
The first thing we should do is write the function specifications and documentation. This can be effectively represented by an empty function and `roxygen2`-styled documentation in R, as shown below:

```{r}
#' Count class observations
#'
#' Creates a new data frame with two columns,
#' listing the classes present in the input data frame,
#' and the number of observations for each class.
#'
#' @param data_frame A data frame or data frame extension (e.g. a tibble).
#' @param class_col unquoted column name of column containing class labels
#'
#' @return A data frame with two columns.
#' The first column (named class) lists the classes from the input data frame.
#' The second column (named count) lists the number of observations
#' for each class from the input data frame.
#' It will have one row for each class present in the input data frame.
#'
#' @export
#' @examples
#' count_classes(mtcars, am)
count_classes <- function(data_frame, class_col) {
# returns a data frame with two columns: class and count
}
```
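
As a quick check of what this stub does before any implementation exists, the sketch below calls it on `mtcars`. It relies only on base R behaviour: a function whose body contains nothing but a comment returns `NULL`, and the unquoted `am` is never evaluated. The name `stub_result` is introduced here for illustration and is not part of the original lecture.

```{r}
# Same documented stub as above: no body yet, only a comment
count_classes <- function(data_frame, class_col) {
  # returns a data frame with two columns: class and count
}

# Calling the stub does not fail; it simply returns NULL,
# so any test expecting a data frame will fail until the body is written
stub_result <- count_classes(mtcars, am)
is.null(stub_result)
```

This is why tests written against the specification are expected to fail at first: the interface exists, but the behaviour does not.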

#### 2. Plan the test cases and document them:

Next, we should plan out our test cases and start to document them.
At this point we can sketch out a skeleton for our test cases with code,
but we are not yet ready to write them,
as we first will need to reproducibly create test data
that is useful for assessing whether our function works as expected.
So, considering our function specifications,
some kinds of input we might anticipate our function receiving,
and correspondingly what it should return, are listed below:

##### Simple expected use test case #1
@@ -85,7 +85,7 @@ Dataframe (or tibble)

##### Simple expected use test case #2

- Dataframe with 2 classes, with 2 observations for one class,
and only one observation in the other

*Function input:*
@@ -199,7 +199,7 @@ class_labels
Error

```{r, eval=FALSE}
Error :
`data_frame` should be a dataframe or dataframe extension (e.g. a tibble)
```
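
One way such an error could be raised is with an explicit type check at the top of the function. The sketch below is only an illustration under stated assumptions: the helper name `check_data_frame` and the objects `list_input` and `result` are hypothetical and do not appear in the original lecture, and the check uses base R's `is.data.frame()`, which is also `TRUE` for tibbles.

```{r}
# Hypothetical helper (not from the original lecture): stop early
# when the input is not a data frame or data frame extension
check_data_frame <- function(data_frame) {
  if (!is.data.frame(data_frame)) {
    stop("`data_frame` should be a dataframe or dataframe extension (e.g. a tibble)")
  }
  invisible(data_frame)
}

# A plain list is not a data frame, so the check should raise the error
list_input <- list(class_labels = c("class1", "class2", "class1", "class2"))
result <- tryCatch(
  check_data_frame(list_input),
  error = function(e) conditionMessage(e)
)
result
```

Capturing the message with `tryCatch()` keeps the example runnable end to end; in a `testthat` test the same behaviour would be asserted with `expect_error()`.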

@@ -215,21 +215,21 @@ With `testthat` we create a `test_that` statement for each related group of test
library(testthat)
test_that("`count_classes` should return a data frame, or tibble,
with the number of rows corresponding to the number of unique classes
in the `class_col` from the original dataframe. The new dataframe
will have a `class` column whose values are the unique classes,
and a `count` column, whose values will be the number of observations
for each class", {
# "expected use cases" tests to be added here
})
test_that("`count_classes` should return an empty data frame, or tibble,
if the input to the function is an empty data frame", {
# "edge cases" test to be added here
})
test_that("`count_classes` should throw an error when incorrect types
are passed to the `data_frame` argument", {
# "error" tests to be added here
})
@@ -307,11 +307,11 @@ These are fall-back expectations that you can use when none of the other more sp
```{r}
#| error: true
test_that("`count_classes` should return a data frame, or tibble,
with the number of rows corresponding to the number of unique classes
in the `class_col` from the original dataframe. The new dataframe
will have a `class` column whose values are the unique classes,
and a `count` column, whose values will be the number of observations
for each class", {
expect_s3_class(count_classes(two_classes_2_obs, class_labels),
"data.frame")
@@ -321,15 +321,15 @@ for each class", {
two_classes_2_and_1_obs_output, ignore_attr = TRUE)
})
test_that("`count_classes` should return an empty data frame, or tibble,
if the input to the function is an empty data frame", {
expect_equal(count_classes(one_class_2_obs, class_labels),
one_class_2_obs_output, ignore_attr = TRUE)
expect_equal(count_classes(empty_df, class_labels),
empty_df_output, ignore_attr = TRUE)
})
test_that("`count_classes` should throw an error when incorrect types
are passed to the `data_frame` argument", {
expect_error(count_classes(two_classes_two_obs_as_list, class_labels))
})
@@ -346,14 +346,14 @@ FINALLY!! We can write the function body for our function! And then call our tes
```{r}
#' Count class observations
#'
#' Creates a new data frame with two columns,
#' listing the classes present in the input data frame,
#' and the number of observations for each class.
#'
#' @param data_frame A data frame or data frame extension (e.g. a tibble).
#' @param class_col unquoted column name of column containing class labels
#'
#' @return A data frame with two columns.
#' The first column (named class) lists the classes from the input data frame.
#' The second column (named count) lists the number of observations for each class from the input data frame.
#' It will have one row for each class present in the input data frame.
@@ -380,11 +380,11 @@ count_classes <- function(data_frame, class_col) {
:::

```{r}
test_that("`count_classes` should return a data frame, or tibble,
with the number of rows corresponding to the number of unique classes
in the `class_col` from the original dataframe. The new dataframe
will have a `class` column whose values are the unique classes,
and a `count` column, whose values will be the number of observations
for each class", {
expect_s3_class(count_classes(two_classes_2_obs, class_labels),
"data.frame")
@@ -394,15 +394,15 @@ for each class", {
two_classes_2_and_1_obs_output, ignore_attr = TRUE)
})
test_that("`count_classes` should return an empty data frame, or tibble,
if the input to the function is an empty data frame", {
expect_equal(count_classes(one_class_2_obs, class_labels),
one_class_2_obs_output, ignore_attr = TRUE)
expect_equal(count_classes(empty_df, class_labels),
empty_df_output, ignore_attr = TRUE)
})
test_that("`count_classes` should throw an error when incorrect types
are passed to the `data_frame` argument", {
expect_error(count_classes(two_classes_two_obs_as_list, class_labels))
})