This document explains the project and programming conventions used for `library(syntheval)` and the evaluation of synthetic data. The document is a work in progress and should be updated as conventions are created or changed.

This project is heavily inspired by `library(tidyverse)` and `library(tidymodels)`.
This project aims to follow the four guiding principles outlined in the tidy tools manifesto:
- Reuse existing data structures
- Compose simple functions with the pipe
- Embrace functional programming
- Design for humans
Building smaller packages that handle discrete tasks, instead of large packages that do everything, is clearly a tidy tools principle even though it is not listed above. Our eventual goal is to reflect this design.
`library(tidymodels)` weds the unified modeling interface of `library(caret)` with tidy principles. Conventions for R Modeling Packages is a draft outline of principles for `library(tidymodels)`. Here are a few important principles:

- All results should be reproducible from run to run.
- Retain only the minimally sufficient objects in the model object.
- Every class should have a `print()` method that gives a concise description of the object.
- All utility evaluation scripts should be named `R/util_*.R`, with `*` naming the function or group of functions used.
- All disclosure risk evaluation scripts should be named `R/disc_*.R`, with `*` naming the function or group of functions used.
- All evaluation metrics should accept an `eval_data` object as the first input.
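As a sketch of these conventions, a hypothetical utility metric might look like the following. The function name, the fields of `eval_data`, and the file name `R/util_mean_gap.R` are all illustrative, not the real `library(syntheval)` API:

```r
# Hypothetical utility metric that would live in R/util_mean_gap.R
# It accepts an eval_data object as the first input, per the conventions above
util_mean_gap <- function(eval_data, var) {

  # assumed fields: original_data and synthetic_data are data frames
  gap <- mean(eval_data$synthetic_data[[var]]) - mean(eval_data$original_data[[var]])

  return(gap)

}

eval_data <- list(
  original_data = data.frame(age = c(30, 40, 50)),
  synthetic_data = data.frame(age = c(32, 41, 48))
)

util_mean_gap(eval_data = eval_data, var = "age")
#> 0.3333333
```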
Directories add structure to a project and make it possible to turn syntheval into `library(syntheval)`:

- `R/` contains R functions as `.R` scripts
- `man/` contains `.Rd` documentation files. No manual editing should happen in this directory.
- `tests/` contains unit tests for functions
There are several important places where documentation is captured:
- The README contains information specific to the code base
- `roxygen2` skeletons contain information specific to functions
- Some `.R` scripts contain in-line comments clarifying code
Out-of-date and incorrect documentation can be more damaging than no documentation at all. It is important that documentation is updated when changes are made. Check all of the above places after making changes to code.
- Open a GitHub issue
- Check out a new branch named `iss###` that corresponds to the related issue
- Update the code
- Build necessary tests for new code and update existing tests for code changes
- Run `devtools::document()` to update the package documentation and NAMESPACE
- Build and install the package (with Ctrl-Shift-B if using RStudio)
- Run R CMD check (with Ctrl-Shift-E if using RStudio) and resolve any issues
- Push the code and open a Pull Request to the `version#.#.#` branch. Request at least one reviewer for any changes to code or documentation.
- Delete the remote branch (and possibly the local branch) when all changes are merged into the master branch
- From time to time, new releases will be moved from `version#.#.#` to `main`. The `main` branch should be stable at all times and updated according to a release schedule.
Note: do not use `devtools::load_all()`.

Note: use `git merge master`, not `git rebase master`, if your Pull Request falls behind the master branch of the repository. This preserves the commit history.
- Major changes should be tracked in `NEWS.md`. `library(parsnip)` is a good example.
- Changes on the `version#.#.#` branch should be tracked at the top of `NEWS.md` under the header `syntheval (development version)`.
- We are using semantic versioning (major.minor.patch).
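Under these conventions, the top of `NEWS.md` on a development branch might look like the following sketch (the entries, issue numbers, and version numbers are hypothetical):

```md
# syntheval (development version)

* Added a new utility metric (#12).
* Fixed an edge case in an existing disclosure risk metric (#15).

# syntheval 0.0.1

* Initial release.
```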
The project follows the tidyverse style guide. One major exception is that all functions should include `return()` at the end of the function.
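For example, a trivial function written under this convention:

```r
# Explicit return() at the end of the function, per the project convention
double_values <- function(x) {

  doubled <- 2 * x

  return(doubled)

}

double_values(x = 3)
#> 6
```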
Package namespaces should be directly referenced with `::` in all production code, including R Markdown reports. Argument names should be explicitly included in all calls to functions from `library(syntheval)`. Arguments other than `data` or `x` should be explicitly named in most other function calls.
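A short illustration of both rules using base R (the call itself is illustrative):

```r
# Namespace referenced with :: and argument names written out explicitly
cyl_means <- stats::aggregate(
  x = mtcars["mpg"],
  by = mtcars["cyl"],
  FUN = mean
)

cyl_means
```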
The tidyverse style guide is light on details about vertical spacing. Vertical spacing should be used liberally. For example:

```r
if (x > 3) {
  "apple"
} else {
  "orange"
}
```
This project takes a functional programming approach, and functions should be used heavily. Each function should get its own `.R` script in the `R/` directory.
Functions should be referentially transparent. Values and data should always be explicitly passed to the function through function arguments so that a function always returns the same output for a given set of arguments regardless of the environment.
Hard coding of values should be avoided in functions. When possible, values should be parameterized.
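A sketch of the difference (the function names and the `top_fraction` parameter are illustrative):

```r
# Not referentially transparent: relies on a global x and a hard-coded cutoff
top_values_opaque <- function() {

  return(x[x > stats::quantile(x = x, probs = 0.8)])

}

# Referentially transparent: inputs are passed as arguments and the cutoff is
# parameterized, so the same arguments always produce the same output
top_values <- function(x, top_fraction = 0.1) {

  cutoff <- stats::quantile(x = x, probs = 1 - top_fraction)

  return(x[x > cutoff])

}

top_values(x = c(1, 5, 10, 50, 100), top_fraction = 0.2)
#> [1] 100
```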
The project uses an `.Rproj` file to manage directory paths. `setwd()` and absolute file paths should never be used.
Every function should include a `roxygen2` header.

- The first line of the documentation should be a concise description of the function without a full stop
- Every argument of the function should be documented with `@param`. Text should be in sentence case and end in a full stop.
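A sketch of a header that follows these rules (the function itself is illustrative, not part of `library(syntheval)`):

```r
#' Calculate the proportion of missing values in a vector
#'
#' @param x A vector of values.
#' @param na_strings A character vector of values to treat as missing.
#'
#' @return A proportion between 0 and 1.
#'
prop_missing <- function(x, na_strings = c("", "NA")) {

  missing <- is.na(x) | x %in% na_strings

  return(mean(missing))

}
```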
Assertions, things expected to always be true about the code, should be tested in-line. The healthinequality-code repository offers some good background.
Functions should contain logical tests that catch glaring errors when functions are called. Consider the following example from `visit_sequence()`:

```r
valid_types <- c("default", "correlation", "proportion", "weighted total",
                 "weighted absolute total")

if (!type %in% valid_types) {
  stop(
    "Error: 'type' argument must be one of: ",
    paste0(valid_types, collapse = ", ")
  )
}
```
> Whenever you are tempted to type something into a print statement or a debugger expression, write it as a test instead. — Martin Fowler
Every function should include a corresponding test file in `tests/testthat/`. Use `usethis::use_test()` to create a new test file for `library(syntheval)`. Test files have three layers:
- expectations describe the expected result of a computation
- tests are collections of expectations related to the same functionality
- files are groups of related tests
Consider the following example from Advanced R:

```r
context("String length")
library(stringr)

test_that("str_length is number of characters", {
  expect_equal(str_length("a"), 1)
  expect_equal(str_length("ab"), 2)
  expect_equal(str_length("abc"), 3)
})

test_that("str_length of factor is length of level", {
  expect_equal(str_length(factor("a")), 1)
  expect_equal(str_length(factor("ab")), 2)
  expect_equal(str_length(factor("abc")), 3)
})

test_that("str_length of missing is missing", {
  expect_equal(str_length(NA), NA_integer_)
  expect_equal(str_length(c(NA, 1)), c(NA, 1))
  expect_equal(str_length("NA"), 2)
})
```
Our workflow:

- Every function should have tests. Write tests before writing a new function.
- Develop code. Add tests as functionality changes.
- Always run the tests after building the package with `devtools::test()`.

A few suggestions:

- Always write a test when you discover a bug
- Test each behavior once and only once, if possible
- Test simple code, and spend even more time testing complex or fragile code
Tests will focus on whether correct values are returned by a function, whether the return values are of the right class, and whether error messages are thrown when necessary. The test workflow will also catch warnings and errors from all code called in the code base.
Here are common `expect_*()` functions:

- `expect_equal()`
- `expect_identical()`
- `expect_match()`
- `expect_output()`
- `expect_warning()`
- `expect_error()`
- `expect_is()`
- `expect_true()`
- `expect_false()`
Note: do not use `devtools::load_all()` in test files.
Assertions should be used to catch user errors or unexpected results. Tests should be used to catch design errors and errors in the code base.
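The distinction can be sketched with a hypothetical `scale_to_unit()` function (the function and the test are illustrative):

```r
# Assertions inside the function catch user errors at call time
scale_to_unit <- function(x) {

  stopifnot(is.numeric(x), length(x) > 1)

  return((x - min(x)) / (max(x) - min(x)))

}

# A test in tests/testthat/ catches design errors in the code base:
# test_that("scale_to_unit maps the range of x to [0, 1]", {
#   expect_equal(range(scale_to_unit(c(2, 4, 6))), c(0, 1))
# })
```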