0018-bfd-release-versioning.md

RFC Proposal

BFD needs a top-level, automation-friendly, human-readable, and overall meaningful version string to apply to releases and related artifacts. This will help:

  • release engineers maintain a change log to communicate enhancements and relative risk for deployment operations
  • product engineers reason about the sum of changes to the product over time
  • operations engineers target specific artifacts for deployments to various environments
  • automation engineers make automated processes automatic
  • performance engineers compare changes and relative performance
  • technical writers manage product documentation
  • product managers understand when features are reaching customers
  • customers anticipate new features as they become available
  • API users learn about and take advantage of both new and existing product features

Status

  • Status: Rejected
  • Implementation JIRA Ticket(s):

Motivation

BFD already includes simple versions for certain aspects of the system but their utility is somewhat limited:

  • The FHIR API Server endpoint version is incremented to communicate backward incompatibility (v1 vs v2)
  • Pipeline ETL Manifest schema version also communicates backward incompatibility, but when/how this is incremented is somewhat arbitrary (v9 vs v10)
  • Database schema and migration script versions increment with each new script. Backward compatibility is required between two consecutive versions (V100 and V101) but is not guaranteed across three (V100 and V102)
  • Jenkins build versions serve as a reference point to specific job executions, but are otherwise meaningless (#260 vs #261)

Though helpful, existing resource versions are insufficient for deriving relevant information for the change outside the specific contexts in which the versions are applied; in other words, these version strings are not semantic. However, it's possible to generate a meaningful top-level version string that can serve as a heuristic to facilitate managing, documenting, and communicating changes among BFD contributors and customers alike.

Non-Goals

Because the proposal already states the goals for introducing a meaningful release version string, it's helpful to state some non-goals or elements that are NOT motivating this proposal:

  • Internal Software Compatibility and Dependency Resolution: adopting a release version is for the express purpose of communicating change in its various forms, such as:
    • automated signaling for deployment operations
    • customer/user communications of feature availability
    • internal team communication for operations, planning, troubleshooting
    • product planning
  • General Software Versioning: this is for the byproducts of the software engineering process in a mono-repository to make sense of the changes introduced as part of releases and release artifacts, i.e. generated, deployable artifacts and reference material
  • Back-portability and Multi-Version Support: there are no such scenarios for multiple versions of BFD running simultaneously. This functionality is handled through separately versioned API endpoints.
  • Define a Release Management Process: this is just the first step in pursuing some wider release management process for BFD. Some release management scenarios have been used here to describe what's possible once we have a consistent versioning strategy in place, but this RFC does not seek to define any process beyond versions and component version resolution.

Proposed Solution

BFD uses a version string that shares the format described by semver@2.0.0, with practical deviations from the specification. Each component of the familiar MAJOR.MINOR.PATCH format is described further in the detailed design below, but the modified algorithm for calculating the components can be summarized as follows:

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you add a database schema migration
MINOR version when you add non-database schema migration changes to the application source code
PATCH version when you add non-application source code changes
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

NOTE: "application source code" above refers to the source code directly responsible for the resultant Java Archive (JAR) artifacts

When applied, the version string clarifies release artifacts. As of March 2023, these artifacts should include:

Proposed Solution: Detailed Design

Deviations from a standard must be carefully considered before they are accepted by anyone. What follows are some justifications for the modified semver@2.0.0 component version algorithms that are consistent with the BFD system.

Incrementing MAJOR Component Version

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you add a database schema migration

BFD is a data-driven application. Changes that involve updates to the data's structure represent the most significant (major) and riskiest kinds of change. These changes have far-reaching impacts to the customer experience. Ultimately, what is served to the customer is dictated by what and how data is stored in BFD.

As a result, BFD already treats the schema and the scripts that define mutations of the database from one schema to the next as an intermediate artifact. Schema migration scripts are versioned with the assistance of flyway, helping engineers reason about the incremental changes to the database state over time. Engineers carefully prefix the scripts with simple, strictly incrementing version numbers that define the order in which they must be applied. At deployment time, the necessary scripts are applied to the relevant data stores in order for flyway and the respective database to match schema versions. This is performed by the bfd-db-migrator, which also serves to validate the application's expectations and understanding of the schema (i.e. the application's model) against the schema itself.

The existing, latest available version of well-vetted schema migration scripts is an attractive MAJOR version component. However, the traditional semver@2.0.0 strategy dictates that the MAJOR version component represents backward incompatible or breaking changes; changes to the database's schema, critical as they may be to the application's behavior, cannot introduce backward incompatibility as a MAJOR change from either the contributor's or customer's perspective.

Backward incompatible changes from the customer's perspective are handled elsewhere. BFD's primary, customer-facing component is the BFD FHIR API service, bfd-server. And BFD customers interact with BFD through versioned API endpoints. As of March 2023, bfd-server supported two versioned endpoints: v1 and v2. The v2 endpoint was introduced to minimize disruptions to customers where breaking, backward incompatible changes were implemented. In other words, changes that are both backward-incompatible and customer-facing are only accepted into BFD through the addition of new, versioned API endpoints.

To summarize, under the traditional semver@2.0.0 strategy, no natural, top-level MAJOR version can be adopted. Instead, we look to adjusting the standard. By adjusting the definition of what a MAJOR component version can be (i.e. coupling major versions to database schema versions), it seems obvious that we would want to promote the intermediate schema migration script artifact versions to a global, MAJOR version component. In this way, we can provide a MAJOR component version that's meaningful, relatively stable, and easy to automatically derive.

The following example script could be used to derive the `MAJOR` component version, manually or in CI contexts:
#!/usr/bin/env bash
shopt -s expand_aliases

if [ "$(uname)" = 'Darwin' ]; then
    # compatibility for local development on macOS
    # requirements enforced through the following naive checks
    command -v gfind >/dev/null && alias "find=gfind" || echo "'gfind' not found; try 'brew install findutils'"
    command -v gsort >/dev/null && alias "sort=gsort" || echo "'gsort' not found; try 'brew install coreutils'"
    command -v gawk  >/dev/null && alias "awk=gawk"   || echo "'gawk' not found; try 'brew install gawk'"
    command -v gtail >/dev/null && alias "tail=gtail" || echo "'gtail' not found; try 'brew install coreutils'"
    command -v gsed  >/dev/null && alias "sed=gsed"   || echo "'gsed' not found; try 'brew install gnu-sed'"
fi

REPO_ROOT="$(git rev-parse --show-toplevel)"
RELATIVE_MIGRATIONS_PATH="apps/bfd-model/bfd-model-rif/src/main/resources/db/migration"
MIGRATIONS_PATH="${REPO_ROOT}/${RELATIVE_MIGRATIONS_PATH}"

find "$MIGRATIONS_PATH" -type f -iname 'V*sql' -printf '%f\n' | sort --version-sort | tail -1 | awk -F '__' '{ print $1 }' | sed 's/[^0-9]//g'
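Once a `MAJOR` value has been derived, it still has to be reconciled with the previous release. The following is a minimal sketch of one possible reading of the rule, assuming plain `MAJOR.MINOR.PATCH` strings and integer migration versions. The function name `reconcile_major` and its behavior on a regressing value are illustrative assumptions, not part of this proposal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reconcile the previous release version with a MAJOR
# value derived from the latest migration script.
# Usage: reconcile_major PREVIOUS_VERSION DERIVED_MAJOR
reconcile_major() {
    local previous="$1" derived_major="$2"
    local prev_major="${previous%%.*}"   # text before the first dot

    # 10# forces base-10 so a value such as "08" is not parsed as octal
    if (( 10#$derived_major > 10#$prev_major )); then
        # a new migration landed: adopt the schema version, zero the rest
        printf '%s.0.0\n' "$derived_major"
    elif (( 10#$derived_major == 10#$prev_major )); then
        # no new migration: MAJOR stays, MINOR/PATCH are decided elsewhere
        printf '%s\n' "$previous"
    else
        echo "derived MAJOR ${derived_major} is behind ${previous}" >&2
        return 1
    fi
}

reconcile_major 100.4.2 101
reconcile_major 101.0.3 101
```

Treating a derived value lower than the previous `MAJOR` as an error is a design choice of this sketch; it flags a checkout that is older than the last release rather than silently producing a lower version.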

Incrementing MINOR and PATCH Component Versions

Given a version number MAJOR.MINOR.PATCH, increment the:

MINOR version when you add non-database schema migration changes to the application source code
PATCH version when you add non-application source code changes

NOTE: "application source code" above refers to the source code directly responsible for the resultant Java Archive (JAR) artifacts

semver@2.0.0's traditional definition for the component versions is based on compatibility. Because breaking changes from the customer's perspective are only introduced into BFD through versioned API endpoints, the distinction between MAJOR and other version components no longer exists. This is an opportunity to update the MINOR and PATCH component version definitions to provide more detail. Especially with the mono-repository strategy, formalizing another existing distinction between changes in the code base is helpful: application vs non-application changes.

The MAJOR version based on database schema version is already sufficiently distinct, but for the remaining changes, we split the component versions along the following lines:

  • excluding database migrations, changes that result in distinct Java archive (JAR) artifacts, result in an incremented MINOR version, e.g. features and fixes to bfd-db-migrator, bfd-pipeline, bfd-server
  • any non-application source code changes without direct impacts to the resultant JAR increment the PATCH version, e.g. independent infrastructure, configuration, documentation changes.
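The split above could be approximated by classifying the paths changed in a release. This is only a sketch: it assumes that every path under `apps/` contributes to a JAR, which is a simplification, and it reuses the migration directory from the earlier example script. The function name `classify_changes` and the sample paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read changed paths (one per line) on stdin and print
# the most significant version component they qualify for.
classify_changes() {
    local migrations='apps/bfd-model/bfd-model-rif/src/main/resources/db/migration/'
    local component='patch' path

    while IFS= read -r path; do
        case "$path" in
            # a new versioned migration script always wins
            "$migrations"V*.sql) component='major'; break ;;
            # assumption: anything else under apps/ affects a JAR
            apps/*) component='minor' ;;
        esac
    done
    printf '%s\n' "$component"
}

# Sample input; in practice the list might come from `git diff --name-only`
printf '%s\n' 'README.md' 'apps/bfd-server/pom.xml' | classify_changes
```

Note that an empty change list falls through to `patch` in this sketch; a real implementation would probably want to treat that case separately.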

Why semver@2.0.0 at all?

Given the deviations from the standard in semver@2.0.0, it might be reasonable to ask: "why refer to it at all?" It's a fair question, but aside from the definitions of MAJOR, MINOR, and PATCH, much of the specification still applies:

  • incremental component version significance is still useful such that:
    • only the most significant version component is incremented in a given release, and less significant components are zeroed
    • given qualifying changes for MAJOR, MINOR, and PATCH in the same release, only the MAJOR version is incremented, e.g. 1.0.0 becomes 2.0.0 not 2.1.1
    • given qualifying changes for MINOR and PATCH in the same release, only the MINOR version is incremented, e.g. 2.0.0 becomes 2.1.0 not 2.1.1
  • specification allows for build metadata and pre-release extensions which keeps the format flexible
  • release precedence is codified and familiar

What's more, the widely understood MAJOR.MINOR.PATCH format is well-supported by third party tools and remains the de facto standard.
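The "increment the most significant component and zero the rest" rule can be sketched as a small function. It assumes plain `MAJOR.MINOR.PATCH` input without pre-release or build metadata, and the name `bump_version` is hypothetical. It follows the notional examples above by adding one to `MAJOR`; in practice the `MAJOR` value might instead be taken from the latest migration script version, as described in the detailed design.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bump one component and zero the less significant ones.
# Usage: bump_version VERSION major|minor|patch
bump_version() {
    local version="$1" component="$2"
    local major minor patch

    # split "X.Y.Z" on dots into its three components
    IFS='.' read -r major minor patch <<< "$version"

    case "$component" in
        major) major=$(( 10#$major + 1 )); minor=0; patch=0 ;;
        minor) minor=$(( 10#$minor + 1 )); patch=0 ;;
        patch) patch=$(( 10#$patch + 1 )) ;;
        *) echo "unknown component: ${component}" >&2; return 1 ;;
    esac
    printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}

bump_version 1.0.0 major   # prints 2.0.0
bump_version 2.0.0 minor   # prints 2.1.0
```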

Simple Examples

What can we expect from the following notional scenarios?

Upgrade from 1.0.0 to 2.0.0
  • a deployment of this release is relatively higher risk
  • the release likely includes new functionality, draws more attention from customers
  • there's a database migration involved
  • [email protected] might include new functionality
  • [email protected] might include new functionality
  • [email protected] might include mapping changes
Upgrade from 2.0.0 to 2.1.0
  • this deployment has relatively moderate risk
  • this potentially includes new functionality, draws some attention from customers
  • [email protected] need not be deployed
  • [email protected] is unlikely to include mapping changes
Upgrade from 2.1.0 to 2.1.1
  • deployment of this has relatively low risk
  • no new functionality, draws less attention from customers
  • deployment likely limited to configuration that needs to be applied/re-applied
  • may be limited to updates to secondary services for monitoring and performance testing
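The scenarios above suggest that automation could infer a rough risk signal from two version strings alone. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` input; the function name `upgrade_risk` and the exact wording of its output are illustrative only:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the relative deployment risk implied by the
# most significant component that differs between two versions.
# Usage: upgrade_risk FROM_VERSION TO_VERSION
upgrade_risk() {
    local from="$1" to="$2"
    local from_major from_minor from_patch to_major to_minor to_patch

    IFS='.' read -r from_major from_minor from_patch <<< "$from"
    IFS='.' read -r to_major to_minor to_patch <<< "$to"

    if (( 10#$to_major != 10#$from_major )); then
        echo 'higher risk: database migration involved'
    elif (( 10#$to_minor != 10#$from_minor )); then
        echo 'moderate risk: application changes, no database migration'
    elif (( 10#$to_patch != 10#$from_patch )); then
        echo 'low risk: non-application changes only'
    else
        echo 'no change'
    fi
}

upgrade_risk 1.0.0 2.0.0
upgrade_risk 2.0.0 2.1.0
upgrade_risk 2.1.0 2.1.1
```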

Proposed Solution: Unresolved Questions

N/A

Proposed Solution: Drawbacks

None. There are no drawbacks to providing a pragmatic solution to versioning for this product.

  • Adoption does not require extensive engineering work, though it would be wise to only adopt this when the resolution and storage of versioned releases is available
  • If a truly adherent solution to semver@2.0.0 emerges, that strategy could simply pick up where the existing semver@2.0.0 compliant scheme left off, and begin the incremental releases based off of the traditional criteria
  • If an alternative versioning scheme is deemed superior, it should be trivial to transition away from this versioning scheme, provided that the alternative satisfies similar signaling requirements
  • In the unlikely event that release versions are deemed harmful, backing away from these changes will not be trivial after some amount of automation is constructed around it, but those changes still shouldn't be extensive.

Proposed Solution: Notable Alternatives

Per Artifact Software Versioning

A generalized solution for software versioning was a stated non-goal for this proposal. Nevertheless, if the BFD system were already taking advantage of meaningful versions for the system's components, communicating change could be all the more obvious by looking to the respective version numbers applied there.

Historically, BFD has avoided versioning artifacts like the Java-based JARs altogether. In part, this was due to the complexities and overhead of maintaining consistent inherited versions throughout the project. Fundamentally, BFD is a Maven project, implemented in Java, and structured with nested parent-child relationships. As of March 2023, this includes more than 30 pom.xml-defined modules. Here's one such nested parent-child relationship between bfd-parent, bfd-model, and bfd-model-rif:

apps
├── pom.xml (the "bfd-parent")
└── bfd-model
    ├── pom.xml ("bfd-model" is the child of "bfd-parent")
    └── bfd-model-rif
        └── pom.xml ("bfd-model-rif" is the child of "bfd-model")

For the majority of BFD's history, the version that bfd-parent and its descendants have used is 1.0.0-SNAPSHOT. While there were attempts to increment and fully utilize non-snapshot versioning over the course of this project's history, the last adjustment to bfd-parent's version field was made in August 2019.

Proper, per artifact versioning would be difficult to achieve consistently for any appreciable amount of time with the current tools and strategies; after its adoption, there would no doubt be an occasional version or dependency mismatch within the product, a problem that is all but unknown in today's usage of a static, 1.0.0-SNAPSHOT.

What's more, it's unclear what benefits per-artifact versioning might bring to BFD:

  • It's widely believed that most changes to BFD impact most artifacts, i.e. most artifacts exhibit the same rate of change and regularity of deployments with common module updates, etc.
  • BFD makes only modest use of artifact repositories, generally rebuilding everything from source
  • BFD follows a kind of immutable infrastructure pattern that calls for rebuilding the deployable artifact (AMIs) for each deployment

If engineers supporting BFD later determine that a per-artifact versioning strategy has demonstrable merit, thoroughly investigating and potentially adopting tools like Bazel as a Maven replacement would be a good first step toward versioning these intermediate deployment artifacts.

A More semver@2.0.0 Adherent Solution

It would have been simpler to point to semver@2.0.0 and suggest that BFD adopt it. Instead, we've only been able to identify one set of alternatives that might come closer to the semver@2.0.0 specification. This alternative revolves around adopting the single backward incompatible mechanism supported in BFD today: the FHIR API Server endpoint version.

This seems like an obvious, more adherent, and possibly better solution. However, it has similar issues to that of the proposed solution and other problems that the proposed solution avoids entirely.

The API version is very stable and changes infrequently. While this is not problematic, it's just not as meaningful as the proposed solution, which changes on the truly significant updates to the product inherent to the database schema version. In the end, semantic versions should be semantic (meaningful). If this strategy had already been in use when v2 first appeared, BFD's latest release would probably be something like 2.200.0.

Here are a handful of observations when comparing the proposed solution to this alternative:

  1. v2 has been supported for the past two years. How meaningful is it to remind all parties of the largely static v2 version's availability?
  2. While the feature flag in BFD is enabled for v2 endpoints by default, v2 support is still technically behind a feature flag as of March 2023. When should BFD truly adopt v2 as its major version?
  3. When changes are released that only include bug fixes to the otherwise static v1 endpoints, the PATCH seems almost inappropriate for a 2.x.x release.
  4. If there were changes introduced for initial support to a new v3 API endpoint via feature flags, at what point should the MAJOR version be incremented?

A blended variation of this and the proposed solution could be possible, e.g.

  • MAJOR reflects the latest FHIR API server endpoint
  • MINOR reflects the latest database migration script version
  • PATCH is reserved for everything else

However, this is still less meaningful, fails to adhere to the semver@2.0.0 specification, and fails to address some of the more problematic observations above.

Non-Semantic Versions

There are no hard-and-fast rules to release versioning, especially without an opinionated artifact repository involved. There is little technical limitation to stop engineers from adopting names of animated characters or animals with alliterative adjectives, celestial bodies, tropical islands, capital cities, or other identifiers for product releases. But, if the overall desire is to facilitate and augment communication surrounding releases, this seems more likely to distract than focus our conversations.

Similarly, releases that are based on dates seem like they could be reasonable, but without some kind of semantic component, those release versions are little better than what we have today, which largely revolves around the vague recollection of date-time groups representing the time of deployments or creation time of resources in the path-to-production.

Prior Art

From the Wider Ecosystem

After briefly reviewing the ecosystem for release strategies, it appears that the systems that rely on BFD generally define their own releases using a combination of one or more of the following:

  • simple incremental versions (e.g. build numbers)
  • deployment dates
  • agile sprint/program increment boundaries

Future Possibilities

  • When there exists an acceptable versioning strategy, additional thought is needed to better define a release and change management process
  • Once the versioning strategy is adopted, automation opportunities abound:
  • Using JIRA releases could go further in helping plan releases in the future and illustrate relationships between the work and the releases, e.g.
    • which versions are impacted by a known bug?
    • which release should customers anticipate for a given feature?
    • when has a release been cut and fully deployed?

Addenda

The following addenda are required reading before voting on this proposal: