Guide: Repo Management

This document outlines the conventions we currently use for managing git repos. The wiki page Development Priorities describes the context of our development work for small teams working with academic stakeholders.

REPO STRUCTURE

Branch Naming/Tagging Conventions

We use a variation of git flow, adapted for our small team. Our branch structure looks like:

graph LR;
    %% Define nodes
    subgraph Main Branch
        A1[main]
        A2[main]
        A3[main]
    end
    
    subgraph Development Branch
        B1[dev]
        B2[dev]
        B3[dev]
    end

    %% Main branch on top
    A1 --> A2
    A2 --> A3

    %% Development branch below
    B1 --> B2
    B2 --> B3

    %% Feature branches and merges
    A1 -.->|branch| B1
    
    subgraph dev-sri/feature
        C1[commit1]
        C2[commit2]
    end
    
    subgraph dev-bl/feature
        D1[commit1]
        D2[commit2]
    end

    B1 -->|branch| C1
    C1 --> C2
    C2 -->|pull request| B2
    B2 -->|branch| D1
    D1 --> D2
    D2 -->|pull request| B3
    B3 -.->|admin merge| A2

    %% Styles for clarity
    style A1 fill:#f9f,stroke:#333,stroke-width:2px
    style A2 fill:#f9f,stroke:#333,stroke-width:2px
    style A3 fill:#f9f,stroke:#333,stroke-width:2px
    style B1 fill:#ccf,stroke:#333,stroke-width:2px
    style B2 fill:#ccf,stroke:#333,stroke-width:2px
    style B3 fill:#ccf,stroke:#333,stroke-width:2px
    style C1 fill:#eec,stroke-width:0px
    style C2 fill:#eec,stroke-width:0px
    style D1 fill:#eec,stroke-width:0px
    style D2 fill:#eec,stroke-width:0px

We have a main branch that has "the current working public release".
We have a working integration branch called dev.
We use feature branches named dev-[cc]/[feature-name], where [cc] is the initials of the feature lead/owner. For example, my prefix is dev-sri or dev-ds depending on the project. The [feature-name] is a short lowercase, hyphen-separated name that provides some context. There should be at least one - in the name.
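As a rough illustration, the feature-branch convention above could be checked with a small script. This is a minimal sketch, not a tool we ship: the function name parse_feature_branch and the exact regular expression are assumptions, including the 2-3 letter limit on initials and the restriction to lowercase letters and digits.

```python
import re

# Assumed pattern: dev-[cc]/[feature-name], where [cc] is 2-3 lowercase letters
# and [feature-name] is lowercase words joined by at least one hyphen.
FEATURE_BRANCH = re.compile(r"^dev-([a-z]{2,3})/([a-z0-9]+(?:-[a-z0-9]+)+)$")


def parse_feature_branch(name):
    """Return (initials, feature_name), or None if the name does not fit."""
    match = FEATURE_BRANCH.match(name)
    return (match.group(1), match.group(2)) if match else None


print(parse_feature_branch("dev-sri/add-controllers"))  # ('sri', 'add-controllers')
print(parse_feature_branch("dev-sri/controllers"))      # None (no hyphen in the name)
```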

Because we don't use CI/CD or have multiple QA/release/feature teams, this branch structure is usually sufficient for our needs. Our projects are "research prototype quality" and do not have a user base that requires uptime guarantees or service level agreements. That said, we do have some specific needs:

Tags are used to mark releases that are used by research groups for a particular field trial, typically in a classroom over several days. These tagged commits are expected to be "ready for field use".

We sometimes need to run multiple dev branches. For example:

dev-next is used for developing volatile features that could disrupt dev, so it is branched from dev and kept in sync with it. Features developed in dev-next are branched from it, and PRs are merged back into it in the same manner as dev.
Likewise, other development groups that have write access to our repo get their own dev-[groupname] branch that works similarly to dev-next, where [groupname] is a short organization name of 4 to 7 letters. Individual contributors to dev-[groupname] use the prefix convention of dev-[cc]/[feature-name] as described above.
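To make the distinction between the long-lived branches concrete, here is a small sketch that classifies a branch name. The function name integration_kind, the returned labels, and the group name "partner" are hypothetical; the 4 to 7 letter rule is taken from the description above and assumed to mean lowercase letters only.

```python
import re

# Assumed reading of the convention: [groupname] is 4 to 7 lowercase letters.
GROUP_BRANCH = re.compile(r"^dev-([a-z]{4,7})$")


def integration_kind(name):
    """Classify a long-lived branch name, or return None for anything else."""
    if name == "main":
        return "release"
    if name in ("dev", "dev-next"):
        return "integration"
    if GROUP_BRANCH.match(name):
        return "group"
    return None


# "dev-partner" is a made-up group branch used only for illustration.
for branch in ("main", "dev", "dev-next", "dev-partner", "dev-sri/add-controllers"):
    print(branch, "->", integration_kind(branch))
```

Note that dev-next is checked before the group pattern, because "next" would otherwise also pass as a four-letter group name.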

PROCESSES

Updating an Old Codebase

We fork the original repo and add a version to the name. For example, TheRaptLab/meme would be forked as TheRaptLab/meme-2023 to hold the work for the 2023 grant. By making a fork, we retain the link to the old repo but also get a fresh wiki, issue tracker, and pull request history instead of having them cluttered with material from the old codebase.

STEPS

Choose Fork on the original repo.
Choose TheRaptLab as the owner, then set the fork name to meme-2023.
In the repo settings, check Collaborators and teams; these seem to carry over from the original repo.

PULL REQUESTS

As I recall, when you open a pull request from a forked version of meme, you have the option of issuing it against the fork OR the upstream original repo.

README Conventions

The top level of the repo contains a README that points to the WIKI (if it exists) and gives a quick installation method to get going, along with any copyright or credit information related to the research group that is conducting the grant.

Managing Issues

We use GitHub Issues to track anything code-related that may require some kind of action.

For bugs: create a new issue following the Guide: Reporting Bugs guidelines.
For feature requests: create a new issue describing the feature you'd like to see, and indicate how important it is. For a template, you can review the Guide: Scoping Features wiki page.

Managing Feature Branches

Feature branches have a single owner, designated by the branch name prefix dev-[cc]/, where [cc] is the 2- or 3-letter initials of a developer. The full feature branch is called dev-[cc]/[feature-name] as described above.

Tip

Feature Branch Names can make use of these initial prefixes:

fix- or revert- : fixes something that went wrong
hotfix- : time-critical patch for a release branch
feature- : implementation of a new feature
example- or ex- : example of something to share
build- or update- : related specifically to the build system
refactor- or cleanup- : related to non-functional code formatting
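The prefixes in this tip can be read as a simple lookup table. The sketch below is only an illustration: the function name kind_of and the short category labels are assumptions, and a name without one of these prefixes is still a valid feature name.

```python
# Prefixes from the tip above, mapped to the kind of work they signal.
PREFIX_KINDS = {
    "fix-": "fix", "revert-": "fix",
    "hotfix-": "hotfix",
    "feature-": "feature",
    "example-": "example", "ex-": "example",
    "build-": "build", "update-": "build",
    "refactor-": "refactor", "cleanup-": "refactor",
}


def kind_of(feature_name):
    """Return the category implied by the feature name's prefix, if any."""
    for prefix, kind in PREFIX_KINDS.items():
        if feature_name.startswith(prefix):
            return kind
    return None


print(kind_of("hotfix-login-crash"))  # hotfix
print(kind_of("add-controllers"))     # None (no recognized prefix)
```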

Feature branches are created from the latest authoritative dev branch, which is assumed to have the latest stable code in it. Authors work on their feature, committing their progress with commit messages that use the [feature-name]: prefix for each. This helps keep track of what commits are associated with which feature branch in our commit history, as the original branch is deleted once it's merged into its parent dev branch.

When requesting help with one's own feature branch, the helper can branch from the original and duplicate the name, adding their initials after it. For example, Sri might help Ben with his dev-bl/ui-refactor work by branching dev-bl/ui-refactor-sri-review and letting Ben know that her code suggestions are in that branch. Since this is work derived from dev-bl, the prefix is retained even though Sri is creating the branch. The logic behind this is to ensure that the branches sort together; by retaining the same prefix, it's easier to see that Sri's helper branch is related to the one above it. Sri would prefix her commit messages with [feature-name] as with any other feature branch. In this case, it would be something like "ui-refactor-sri-review: alternate way to write functional component for clarity".

Feature branches, being owned by a single person, can be pushed/pulled freely by the owner. For contributed help, it is up to the owner to do the merging into their branch. For example, Ben would test-merge Sri's dev-bl/ui-refactor-sri-review branch and decide whether to keep it or not.
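The helper-branch naming can be sketched as plain string handling. The function names helper_branch and commit_prefix are hypothetical, and the branch used in the example (dev-sri/add-controllers, helped by "bl") is made up for illustration.

```python
def helper_branch(owner_branch, helper_initials, purpose):
    """Build a helper branch name that keeps the owner's prefix."""
    return f"{owner_branch}-{helper_initials}-{purpose}"


def commit_prefix(branch):
    """Commit messages use the part after dev-[cc]/ followed by a colon."""
    return branch.split("/", 1)[1] + ":"


helper = helper_branch("dev-sri/add-controllers", "bl", "review")
print(helper)                 # dev-sri/add-controllers-bl-review
print(commit_prefix(helper))  # add-controllers-bl-review:

# Because the owner's prefix is kept, the helper branch sorts next to its parent.
print(sorted(["dev-sri/zoom-fix", helper, "dev-sri/add-controllers"]))
```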

Branch Updates as You Work

Assuming you are working on some kind of feature branch:

Before working on a task, the developer creates a new feature branch under their name, e.g. dev-sri/add-controllers. This branch will eventually become part of the pull request.
As work is committed to the feature, each commit message has the name of the feature, either dev-sri/add-controllers or just add-controllers (we rely on the commit author to identify the developer).
Continue as long as needed, occasionally pushing your branch upstream to GitHub.
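The commit message convention in these steps could be checked with something like the following. This is a sketch under assumptions: the function name has_feature_prefix is hypothetical, and it assumes the feature name is followed by a colon at the start of the first line, as in the [feature-name]: prefix described earlier.

```python
def has_feature_prefix(message, branch):
    """True if the first line starts with the branch or feature name plus a colon."""
    feature = branch.split("/", 1)[-1]           # "dev-sri/add-controllers" -> "add-controllers"
    subject = message.splitlines()[0] if message else ""
    return subject.startswith((branch + ":", feature + ":"))


branch = "dev-sri/add-controllers"
print(has_feature_prefix("add-controllers: wire up input handling", branch))  # True
print(has_feature_prefix("fixed a typo", branch))                             # False
```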

If you need to start work on another unrelated feature, branch from dev and not from your current feature branch to avoid weird intertwined branches. If you can't, then you should submit your current branch as a PR to close the work, then start a new one to continue. I sometimes use a pattern like feature-2 or feature-stage2 when this case arises.

Pull Request Authoring

Once your feature (or whatever) branch is ready to merge, push it up to GitHub and start authoring the official Pull Request!

For this repo, a PR includes documentation for key decisions made, key features added, and other context helpful for developers to understand what it does and what it impacts in the project. Ideally this is from both a user perspective (what can be seen) and from the developer perspective (underlying implementation).

1. Create the Pull Request

A pull request is issued against dev (e.g. from dev-sri/add-controllers into dev), with the title prefix DRAFT: or WIP: to prevent the PR from being inadvertently merged. Provide a title that is succinct and unambiguous regarding the scope of the PR.

Warning

Make sure you're picking the right base repo (the fork, not the fork's upstream parent; GitHub will sometimes default to the wrong one).
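The DRAFT:/WIP: title convention from step 1 is easy to express in code. The helpers below are a hypothetical sketch (the names is_draft_title and mark_ready are not part of any tool we use), and the case-insensitive match is an assumption.

```python
DRAFT_PREFIXES = ("DRAFT:", "WIP:")


def is_draft_title(title):
    """True if the PR title still carries a DRAFT: or WIP: marker."""
    return title.lstrip().upper().startswith(DRAFT_PREFIXES)


def mark_ready(title):
    """Strip a leading DRAFT:/WIP: marker once the PR is ready for review."""
    stripped = title.lstrip()
    for prefix in DRAFT_PREFIXES:
        if stripped.upper().startswith(prefix):
            return stripped[len(prefix):].lstrip()
    return title


print(is_draft_title("WIP: add controller support"))  # True
print(mark_ready("WIP: add controller support"))      # add controller support
```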

2. Add Pull Request Description

There are several parts that I like to use, using the H2 ## header to delineate each part after the first.

Description Part A: Concise Summary

The idea is to orient people quickly to the context and purpose of the PR. This saves time and headache.

The first two items are written from the point of what the user sees:

First sentence: what the PR accomplishes, referring to any specific issues in the repo by link, or to past branches
Terse list of new things that you can see in the UI or program features

The second two items are for what developers need to know:

Terse list of breaking changes, referring to a later section
Terse list of underlying technological changes that are invisible to users, referring to a later section

Description Part B: Provide Testing Steps

This section provides step-by-step instructions from git pull to running the app so the PR reviewers can SEE the feature and confirm it works.

See closed pull requests for examples.

Description Part C and Onward: Technical Note(s)

Here we can get into the weeds, providing the stuff that developers need to know about. Each of the following can be its own section. Here are three examples:

BREAKING CHANGES - including mitigations
TECHNICAL BACKGROUND INFO - an overview of the technical operation of the system, and how this codebase modifies or enhances it, with examples ideally.
API REFERENCE - a terse functional API that lists all the commands, methods, etc as needed
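Putting Parts A through C together, a description skeleton could be generated along these lines. This is only a sketch of the layout described above: the function name pr_description, its parameters, the "Testing Steps" heading text, and the BREAKING:/INTERNAL: labels are all assumptions, and the sample content is made up.

```python
def pr_description(summary, visible, breaking, internal, testing_steps, notes):
    """Assemble a PR description; `notes` maps section titles to body text."""
    parts = [summary, ""]                                   # Part A: no header
    parts += [f"- {item}" for item in visible]              # what users can see
    parts += [f"- BREAKING: {item}" for item in breaking]   # what developers need
    parts += [f"- INTERNAL: {item}" for item in internal]
    parts += ["", "## Testing Steps", ""]                   # Part B
    parts += [f"{n}. {step}" for n, step in enumerate(testing_steps, start=1)]
    for title, body in notes.items():                       # Part C and onward
        parts += ["", f"## {title}", "", body]
    return "\n".join(parts)


print(pr_description(
    summary="Adds controller support (sample text).",
    visible=["new controller panel"],
    breaking=["config format changed, see BREAKING CHANGES"],
    internal=[],
    testing_steps=["pull the branch", "run the app", "open the controller panel"],
    notes={"BREAKING CHANGES": "Describe the change and its mitigation here."},
))
```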

Tip

For an idea of what Sri does for pull requests, see the closed pull requests page here in the repo. Here are the general steps:

Pull Request (PR) Submission

We use PRs when merging branches into an integration or main branch. In our simplified git flow, these are the main and dev branches. The order of PR merging starts with a feature branch being PR'd into the dev integration branch. When dev has reached the "all features merged" state, it's then PR'd into main and tagged by the release manager (typically an empowered user), who is responsible for updating the installation and config docs, writing tag notes, and ensuring that the repo is tested and running from a clean install.

TIPS

Use the DRAFT: or WIP: prefix in the PR's title. This is to prevent the PR from being accidentally merged before it is ready.
Double-check that you are merging against the correct repo and branch. Typically, you will be submitting it to a dev branch. If the repo you are working from is a fork, make sure that the repo target is your fork, not the upstream parent!!!
Ask another developer to test the PR and see if it works. Provide testing instructions as described above.
If there is a specific developer you want to notify that the PR is ready for review, assign that developer in the pull request so it appears in their queue.

Testing and Accepting the Pull Request

After submitting the PR and removing the DRAFT or WIP prefix, assign a reviewer in GitHub on the PR page. For our protected main and dev branches, we use this procedure:

The reviewer posts their bug reports in the comments of the PR following Guide: Reporting Bugs.
The reviewer also uses GitHub PR Review to leave comments on files and add notes.
The PR author makes fixes until everything seems to pass.
When everything looks good, the reviewer uses GitHub PR Review to approve the PR.

After the approval criterion is met, the pull request can be merged through the GitHub PR interface, and its associated branch will be deleted.

MANAGING NON-CODE DEPENDENCIES

Many of our projects involve media assets or other repos that we don't want to include due to size or for ease of compartmentalization/ownership responsibility. This is a developing standard, but here is what we've done so far:

Binary Assets

This includes images, videos, PDFs, other files. These asset types are often large and do not play nicely with Git's diff operations, and make the repo EXTREMELY LARGE. Here are the options we've explored:

✅ Dedicated Media Folder That Is GitIgnored - We like to use "easy to copy and paste" conventions to make content management more intuitive for our academic end users. These media folders can be archived using ZIP or similar compression and versioned on a cloud storage solution for distribution.
✅ Git Subrepo Cloned Inside Main Repo - While we as developers don't like using Git to store binary files, we are fine if researchers prefer to manage their assets that way to coordinate contributions from all over. Instead of the Media Folder being a zip file, add instructions to git clone the asset repo as a certain name inside our repo.
💙 Asset Server w/Manifest - A future possibility is URSYS utility servers that can apply versioning to "sets" of files. The manifest listing these files can contain URIs pointing to multiple sources as "stacks" of filesets that are combined into a single archive bundle. Not unlike the way Docker Compose works.
🚫 Git Submodules - Unlike the Subrepo approach, Submodules use the git submodule command to add it explicitly. This allows repo operations to be done as part of managing the main repo. However, we find it cumbersome to use. In practice, managing binary assets has a very different workflow, so this is not a suitable approach.
🚫 Git Large File Storage - GitLFS is useful on paper, but because it is often a paid option we avoid it, as we avoid including required paid options or external servers in our projects as outlined in Development Priorities.
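For the first option, archiving a git-ignored media folder for distribution might look like the sketch below. It is an illustration only: the function name archive_media, the folder name "media", and the version label are assumptions, and it uses Python's standard shutil.make_archive rather than any tool we have standardized on.

```python
import shutil
from pathlib import Path


def archive_media(media_dir, version, out_dir="."):
    """Zip media_dir into <out_dir>/<folder-name>-<version>.zip and return its path."""
    media = Path(media_dir)
    base_name = Path(out_dir) / f"{media.name}-{version}"
    # root_dir/base_dir keep the folder name as the top-level entry in the archive,
    # so unzipping inside the repo recreates the media folder in place.
    return shutil.make_archive(str(base_name), "zip",
                               root_dir=str(media.parent), base_dir=media.name)
```

Calling archive_media("media", "2023-field-trial") would write media-2023-field-trial.zip to the current directory, which can then be uploaded to whatever cloud storage the project uses.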

Multiple Packages

In some projects it's useful to have more than one package, providing a split between "server" and "client" code that share a "common library". Called a "monorepo", this kind of repo organization has a central packages directory that includes multiple subprojects. The main advantage we like is the ease of making changes across different subprojects to develop a feature, and this can be done as a single commit. By comparison, using Git Submodules or Subrepos requires multiple git commit operations that are easy to get out-of-sync.

We are currently using the built-in workspaces feature provided by current versions of Node's npm utility. While we're aware of the improvements in yarn regarding workspaces, we choose the most widespread technologies when possible in an attempt to have the greatest available pool of experts and public documentation. In the past, we've used lerna but it is deprecated.
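For orientation, npm workspaces are declared through a "workspaces" field in the root package.json. The sketch below generates a minimal root manifest; the function name root_package_json, the package name, and the "packages/*" glob are assumptions based on the packages directory described above, and marking the root as private is a common practice rather than something this guide requires.

```python
import json


def root_package_json(name, packages_glob="packages/*"):
    """Return a minimal root package.json that declares npm workspaces."""
    manifest = {
        "name": name,
        "private": True,                # assumption: the root itself is not published
        "workspaces": [packages_glob],  # each folder under packages/ is a subproject
    }
    return json.dumps(manifest, indent=2)


print(root_package_json("meme-2023"))
```

With a layout like packages/server, packages/client, and packages/common, a single npm install at the root links the subprojects together.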
