Schema Evolution
This page covers test specifications because they're the most involved, but the same techniques apply to everything.
Say there's a test specification that starts life looking like this:
{
    "old-thing": "foo"
}
Time passes and one of the tools adds a feature that needs to become part of the test. The parameter for that feature might be called new-thing:
{
    "old-thing": "foo",
    "new-thing": "bar"
}
Older versions of the plugin can't simply ignore the extra parameters. For one, their JSON validators don't recognize the new-thing parameter and will treat it as being just as invalid as, say, ham-sandwich. For another, the submitter of a task will expect the plugin(s) to do whatever the new parameter directs. The only way to make sure the parameter is universally accepted is to upgrade every instance of the plugin on the planet before it is used, which is not a practical alternative.
The way to make sure a given installation of pScheduler can handle changes to task specifications is to give each revision a number. This number is called a schema number and is, by convention, represented in JSON as a pair called schema whose value is an integer. The schema begins with 1 and is incremented each time the specification changes. To simplify the creation of JSON for the large number of items in the system that are unlikely to ever change, anything missing a schema is assumed to have a schema of 1.
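The default rule can be sketched in a few lines of Python. The helper name schema_of is hypothetical and not part of the pScheduler library; it only illustrates that a missing schema reads as 1.

```python
def schema_of(obj):
    """Return the schema number of a JSON object, defaulting to 1.

    Hypothetical helper for illustration; not a pScheduler API.
    """
    return obj.get("schema", 1)


# The original specification carries no schema, so it counts as 1.
original = {"old-thing": "foo"}

# The revised specification states its schema explicitly.
revised = {"schema": 2, "old-thing": "foo", "new-thing": "bar"}

print(schema_of(original))
print(schema_of(revised))
```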
Note that objects nested within the JSON may have different schemas:
{
    "schema": 1,
    "test": {
        "type": "mytest",
        "spec": {
            "schema": 2,
            "old-thing": "foo",
            "new-thing": "bar"
        }
    }
}
This is usually done when an outer object, such as the task specification above, has sections such as spec which can contain any valid JSON and are validated by other parts of the system, such as plugins.
All of the JSON processed by pScheduler, at the core and in its plugins, is validated using JSON Schema. The validator for the original version of our strawman test specification would look like this:
{
    ...
    "MyTestSpecification": {
        "type": "object",
        "properties": {
            "schema": { "type": "integer", "enum": [ 1 ] },
            "old-thing": { "$ref": "#/pScheduler/String" }
        },
        "required": [ "old-thing" ]
    },
    ...
}
Once the specification undergoes its first revision, it is split into two copies, each given a version suffix, and tied together by making both of them valid:
{
    ...
    "MyTestSpecification_V1": {
        "type": "object",
        "properties": {
            "schema": { "type": "integer", "enum": [ 1 ] },
            "old-thing": { "$ref": "#/pScheduler/String" }
        },
        "required": [ "old-thing" ]
    },
    "MyTestSpecification_V2": {
        "type": "object",
        "properties": {
            "schema": { "type": "integer", "enum": [ 2 ] },
            "old-thing": { "$ref": "#/pScheduler/String" },
            "new-thing": { "$ref": "#/pScheduler/String" }
        },
        "required": [ "schema", "old-thing", "new-thing" ]
    },
    "MyTestSpecification": {
        "anyOf": [
            { "$ref": "#/pScheduler/MyTestSpecification_V1" },
            { "$ref": "#/pScheduler/MyTestSpecification_V2" }
        ]
    },
    ...
}
Note that the _V2 variant of the specification requires a schema and that it be 2, where the _V1 variant makes it optional because the default is 1.
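The effect of the versioned definitions can be approximated in plain Python. This is a hand-rolled stand-in for the JSON Schema above, not pScheduler's validator: the function names are hypothetical, and rejecting unknown keys is an assumption based on the earlier remark that older validators refuse parameters they don't recognize.

```python
def valid_v1(spec):
    """Version 1: 'old-thing' is required; 'schema' is optional but must be 1."""
    allowed = {"schema", "old-thing"}
    return (
        set(spec) <= allowed  # Assumption: unknown keys are rejected
        and spec.get("schema", 1) == 1
        and isinstance(spec.get("old-thing"), str)
    )


def valid_v2(spec):
    """Version 2: 'schema' (exactly 2), 'old-thing' and 'new-thing' are required."""
    allowed = {"schema", "old-thing", "new-thing"}
    return (
        set(spec) <= allowed  # Assumption: unknown keys are rejected
        and spec.get("schema") == 2
        and isinstance(spec.get("old-thing"), str)
        and isinstance(spec.get("new-thing"), str)
    )


def valid_spec(spec):
    """Counterpart of the anyOf: valid if any versioned variant accepts it."""
    return any(check(spec) for check in (valid_v1, valid_v2))


print(valid_spec({"old-thing": "foo"}))
print(valid_spec({"schema": 2, "old-thing": "foo", "new-thing": "bar"}))
print(valid_spec({"old-thing": "foo", "new-thing": "bar"}))
```

The last call uses new-thing without declaring schema 2, so neither variant accepts it.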
Any code which generates JSON should produce the lowest schema value required for its output to be considered valid rather than simply applying the largest value possible.
The pScheduler library provides a HighInteger class that can be used to keep track of the highest schema required while producing JSON. For example:
import pscheduler

# 'options' is assumed to hold the already-parsed command-line options.
spec = { }
schema = pscheduler.HighInteger(1)

if options.old_thing:
    spec["old-thing"] = options.old_thing
    # No need to set the schema here.  It's already 1.

if options.new_thing:
    spec["new-thing"] = options.new_thing
    schema.set(2)

if options.newer_thing:
    spec["newer-thing"] = options.newer_thing
    schema.set(3)

spec["schema"] = schema.value()
This code will produce JSON with a schema that reflects the set of options used. For example, if only newer_thing is present, the schema will be 3 because that option was not introduced until version 3 of the schema. Similarly, using only new_thing will produce a schema of 2, and using neither will produce 1 because that's the highest required.
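To make the behavior concrete without a pScheduler installation, the following sketch uses a hypothetical stand-in class, HighestSeen, that mimics how HighInteger is used above: it keeps the largest value it has been given. It is not the library's implementation, and build_spec is an illustrative wrapper around the same logic.

```python
class HighestSeen:
    """Hypothetical stand-in for HighInteger: remembers the largest value set."""

    def __init__(self, initial):
        self._value = initial

    def set(self, candidate):
        # Only ever move upward; lower or equal values are ignored.
        if candidate > self._value:
            self._value = candidate

    def value(self):
        return self._value


def build_spec(old_thing=None, new_thing=None, newer_thing=None):
    """Build a spec whose schema is the lowest one that covers what is used."""
    spec = {}
    schema = HighestSeen(1)

    if old_thing is not None:
        spec["old-thing"] = old_thing  # Present since schema 1

    if new_thing is not None:
        spec["new-thing"] = new_thing
        schema.set(2)  # Introduced in schema 2

    if newer_thing is not None:
        spec["newer-thing"] = newer_thing
        schema.set(3)  # Introduced in schema 3

    spec["schema"] = schema.value()
    return spec


print(build_spec(old_thing="foo"))
print(build_spec(old_thing="foo", new_thing="bar"))
print(build_spec(newer_thing="baz"))
```

Because the tracker only moves upward, the order in which options are examined does not affect the result.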
All plugin methods use the json_load() function in the pScheduler library to parse JSON. When the max_schema parameter is given, json_load() checks that the top-level schema value is less than or equal to the number specified and throws an exception or exits with an error if the input contains anything higher.
input = pscheduler.json_load(exit_on_error=True, max_schema=2)
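The check described above can be sketched with the standard json module. The function load_with_max_schema is a hypothetical stand-in, not the library's json_load(); it assumes a missing schema counts as 1 and always raises an exception rather than exiting.

```python
import json


def load_with_max_schema(text, max_schema):
    """Parse JSON text and reject a top-level schema above max_schema.

    Hypothetical stand-in for illustration; not pScheduler's json_load().
    """
    obj = json.loads(text)

    # Anything without a schema is treated as schema 1.
    schema = obj.get("schema", 1) if isinstance(obj, dict) else 1

    if schema > max_schema:
        raise ValueError(
            f"Schema {schema} is not supported (highest supported is {max_schema})"
        )
    return obj


# Accepted: schema 2 does not exceed the maximum of 2.
print(load_with_max_schema('{"schema": 2, "old-thing": "foo", "new-thing": "bar"}', 2))

# Rejected: schema 3 is newer than this code understands.
try:
    load_with_max_schema('{"schema": 3, "newer-thing": "baz"}', 2)
except ValueError as error:
    print(error)
```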