Schema Evolution

Mark Feit edited this page Mar 27, 2018 · 2 revisions

This page covers test specifications because they're the most involved, but the same techniques apply to everything.

Why Schemas Need to Evolve

Say there's a test specification that starts life looking like this:

{
    "old-thing": "foo"
}

Time passes and one of the tools adds a feature that needs to become part of the test. The parameter for that feature might be called new-thing:

{
    "old-thing": "foo",
    "new-thing": "bar"
}

Older versions of the plugin can't simply ignore the extra parameter. For one, their JSON validators don't recognize new-thing and will treat it as being just as invalid as, say, ham-sandwich. For another, the submitter of a task will expect the plugin(s) to do whatever the new parameter directs. The only way to make sure the parameter is universally accepted would be to upgrade every instance of the plugin on the planet before it is used, which is not practical.

Schema Numbering

The way to make sure a given installation of pScheduler can handle changes to task specifications is to give each revision a number. This number is called a schema number and is, by convention, represented in JSON as a pair named schema with an integer value. The schema begins at 1 and is incremented each time the specification changes. To simplify the creation of JSON for the large number of items in the system that are unlikely to ever change, anything missing a schema is assumed to have a schema of 1.

Note that objects nested within the JSON may have different schemas:

{   
    "schema": 1,
    "test": {
        "type": "mytest",
        "spec": {
            "schema": 2,
            "old-thing": "foo",
            "new-thing": "bar"
        }
    }
}

This usually happens when an outer object, such as the task specification above, has sections such as spec which can contain any valid JSON, and other parts of the system, such as plugins, are used to validate them.
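The defaulting rule and the per-object nature of schemas can be sketched in a few lines of Python. The helper name schema_of is hypothetical and is not part of the pScheduler library; it only illustrates that each object is examined on its own and that a missing schema counts as 1.

```python
def schema_of(obj):
    """Return the schema number of a JSON object, defaulting to 1.

    Hypothetical helper for illustration; not a pScheduler function.
    """
    schema = obj.get("schema", 1)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(schema, bool) or not isinstance(schema, int) or schema < 1:
        raise ValueError(f"Invalid schema value: {schema!r}")
    return schema


task = {
    "schema": 1,
    "test": {
        "type": "mytest",
        "spec": {"schema": 2, "old-thing": "foo", "new-thing": "bar"},
    },
}

task_schema = schema_of(task)                  # outer task object
test_schema = schema_of(task["test"])          # no "schema" pair, so the default applies
spec_schema = schema_of(task["test"]["spec"])  # nested object with its own schema
```

Each call looks only at the object it is given, which is why the outer task and the nested spec can carry different numbers.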

Development Practices for Handling Multiple Schemas

Validation

All of the JSON processed by pScheduler, at the core and in its plugins, is validated using JSON Schema. The validator for the original version of our strawman test specification would look like this:

{
    ...
    "MyTestSpecification": {
        "type": "object",
        "properties": {
            "schema": { "type": "integer", "enum": [ 1 ] },
            "old-thing": { "$ref": "#/pScheduler/String" }
        },
        "required": [ "old-thing" ]
    },
    ...
}

Once the specification undergoes its first revision, it is split into two copies, each given a version suffix, and the two are tied together with an anyOf so that input matching either one is valid:

{
    ...
    "MyTestSpecification_V1": {
        "type": "object",
        "properties": {
            "schema": { "type": "integer", "enum": [ 1 ] },
            "old-thing": { "$ref": "#/pScheduler/String" }
        },
        "required": [ "old-thing" ]
    },
    "MyTestSpecification_V2": {
        "type": "object",
        "properties": {
            "schema": { "type": "integer", "enum": [ 2 ] },
            "old-thing": { "$ref": "#/pScheduler/String" },
            "new-thing": { "$ref": "#/pScheduler/String" }
        },
        "required": [ "schema", "old-thing", "new-thing" ]
    },
    "MyTestSpecification": {
        "anyOf": [
            { "$ref": "#/pScheduler/MyTestSpecification_V1" },
            { "$ref": "#/pScheduler/MyTestSpecification_V2" }
        ]
    },
    ...
}

Note that the _V2 variant of the specification requires a schema and requires that it be 2, whereas the _V1 variant makes it optional because the default is 1.
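The effect of the two variants and the anyOf can be illustrated without a JSON Schema library. The sketch below is plain Python that mimics the rules by hand; it is not how pScheduler validates. The function names are hypothetical, and it assumes unknown properties are rejected, as the discussion of older validators above implies.

```python
V1_KEYS = {"schema", "old-thing"}
V2_KEYS = {"schema", "old-thing", "new-thing"}


def is_int(value):
    # Exclude bool, which Python treats as a kind of int.
    return type(value) is int


def valid_v1(spec):
    """Mimic MyTestSpecification_V1: schema optional, must be 1 if present."""
    return (
        isinstance(spec, dict)
        and set(spec) <= V1_KEYS                  # assumed: no unknown properties
        and is_int(spec.get("schema", 1))
        and spec.get("schema", 1) == 1
        and isinstance(spec.get("old-thing"), str)  # required
    )


def valid_v2(spec):
    """Mimic MyTestSpecification_V2: schema required and must be 2."""
    return (
        isinstance(spec, dict)
        and set(spec) <= V2_KEYS                  # assumed: no unknown properties
        and is_int(spec.get("schema"))
        and spec["schema"] == 2
        and isinstance(spec.get("old-thing"), str)
        and isinstance(spec.get("new-thing"), str)
    )


def valid_spec(spec):
    """Mimic the anyOf: input is valid if either variant accepts it."""
    return valid_v1(spec) or valid_v2(spec)


accepted_v1 = valid_spec({"old-thing": "foo"})
accepted_v2 = valid_spec({"schema": 2, "old-thing": "foo", "new-thing": "bar"})
# new-thing without a schema: V1 rejects the unknown pair, V2 requires schema.
rejected_no_schema = not valid_spec({"old-thing": "foo", "new-thing": "bar"})
```

The last case shows why the _V2 variant insists on an explicit schema: without it, the input would be treated as schema 1, where new-thing does not exist.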

Generation of JSON

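Any code that generates JSON should use the lowest schema value under which its output is valid rather than simply applying the largest value possible. Doing so allows installations that only understand older schemas to accept the result whenever none of the newer features are used.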

The pScheduler library provides a HighInteger class that can be used to keep track of the highest schema required while producing JSON. For example:

spec = { }
schema = pscheduler.HighInteger(1)
if options.old_thing:
    spec["old-thing"] = options.old_thing
    # No need to set the schema here.  It's already 1.
if options.new_thing:
    spec["new-thing"] = options.new_thing
    schema.set(2)
if options.newer_thing:
    spec["newer-thing"] = options.newer_thing
    schema.set(3)

spec["schema"] = schema.value()

This code will produce JSON with a schema that reflects the set of options used. For example, if newer_thing is present, the schema will be 3 because that option was not introduced until version 3 of the schema. Similarly, using new_thing without newer_thing will produce a schema of 2, and using neither will produce 1 because that's the highest schema required.
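For readers without the library at hand, the behavior relied on above can be approximated with a small stand-in. The class below is a sketch and not the pScheduler implementation; it assumes only what the example shows, namely that set() never lowers the stored value and that value() returns the highest value seen. The names HighIntegerSketch and build_spec are hypothetical.

```python
class HighIntegerSketch:
    """Stand-in for pscheduler.HighInteger (assumed behavior, not the real class)."""

    def __init__(self, initial=0):
        self._high = initial

    def set(self, value):
        # Keep whichever is larger, so the order of calls does not matter.
        self._high = max(self._high, value)

    def value(self):
        return self._high


def build_spec(old_thing=None, new_thing=None, newer_thing=None):
    """Build a spec carrying the lowest schema that covers the options used."""
    spec = {}
    schema = HighIntegerSketch(1)
    if old_thing is not None:
        spec["old-thing"] = old_thing
    if new_thing is not None:
        spec["new-thing"] = new_thing
        schema.set(2)
    if newer_thing is not None:
        spec["newer-thing"] = newer_thing
        schema.set(3)
    spec["schema"] = schema.value()
    return spec


only_old = build_spec(old_thing="foo")
with_new = build_spec(old_thing="foo", new_thing="bar")
with_newer = build_spec(newer_thing="baz")
```

Because set() keeps the maximum, the checks can appear in any order and the final schema is still the lowest one that covers every option present.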

Parsing of JSON

All plugin methods use the json_load() function in the pScheduler library to parse JSON. Its max_schema parameter gives the highest top-level schema value the caller can handle; json_load() will throw an exception or exit with an error if the input contains anything higher.

input = pscheduler.json_load(exit_on_error=True, max_schema=2)
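The check that max_schema performs can be pictured with a standalone sketch built on the standard json module. This is not the library's code: load_json_checked and SchemaTooHighError are hypothetical names, and the sketch assumes only the behavior described above, with a missing schema treated as 1.

```python
import json


class SchemaTooHighError(ValueError):
    """Hypothetical error raised when the input's schema exceeds the limit."""


def load_json_checked(text, max_schema=None):
    """Parse JSON text and enforce an upper bound on the top-level schema."""
    obj = json.loads(text)
    if max_schema is not None and isinstance(obj, dict):
        schema = obj.get("schema", 1)  # missing schema is assumed to be 1
        if isinstance(schema, bool) or not isinstance(schema, int):
            raise ValueError(f"Schema must be an integer, not {schema!r}")
        if schema > max_schema:
            raise SchemaTooHighError(
                f"Schema {schema} is not supported (highest supported is {max_schema})"
            )
    return obj


# A plugin that understands schemas 1 and 2 accepts this input...
ok = load_json_checked(
    '{"schema": 2, "old-thing": "foo", "new-thing": "bar"}', max_schema=2
)

# ...but refuses input that declares a newer schema.
try:
    load_json_checked('{"schema": 3, "newer-thing": "baz"}', max_schema=2)
    rejected = False
except SchemaTooHighError:
    rejected = True
```

Only the top-level schema is examined here; nested objects with their own schema values would be checked by whatever part of the system validates them.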