A schema defines information about the type of some value. It may describe an atomic type (“integer”, “boolean”) or types which are aggregates of other types (“record”, “sequence”). The language forms we choose to provide this information define a schema system (essentially itself a “meta schema”). moo supports multiple schema systems but the predominant one is called oschema.
Many schema systems specify a persistent representation (file format) for expressing a schema. For example, XML has XSD and JSON has JSON Schema (itself expressed as JSON). The moo oschema system is instead defined in terms of a transient representation (structured data in memory). Thus, a variety of persistent representations may be used to provide an oschema schema. Users may select file formats they know and love if they wish. Below we will describe two (Jsonnet and Python) for which moo provides direct support.
moo oschema is defined conceptually starting with a fixed set of schema classes that are listed below. A moo oschema type (called in some parts of moo an otype) is considered an instance of exactly one schema class. Dropping down one rung of the semantic ladder, a model (aka “value”) is an instance of a type.
The moo oschema schema classes are summarized:
boolean
- a type which may take value “true” or “false”
number
- a numeric type of given format and size and optional numeric constraints
string
- a character string type possibly matching some pattern or format
sequence
- an array or vector with elements of one type
tuple
- (not yet supported, but you can guess what it will be)
record
- a collection of fields (named type references) directly held or indirectly held by zero or more records named as bases
enum
- a type that may take one value from a predefined, limited set of values
any
- a type that may take a value of any type (eg such as void*, boost::any, nlohmann::json)
anyOf
- a type that may take a value of any type in a predefined, limited set of types
namespace
- a collection of named types (distinct from record to match eg C++/Python semantics)
Every oschema type then provides a set of attributes as determined by its schema class. Some attributes are common across all schema classes and may be required or optional:
name
- (required) type name unique to the type context (see path)
schema
- (required) string identifying the schema class taken from above list
doc
- (optional, default empty) document string briefly describing the type
path
- (required, potentially empty) ordered array of names representing the absolute context of the type (eg as a C++ namespace or Python module path)
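The common attributes can be illustrated with a type held as plain data. The following is a minimal sketch, assuming a type is a Python dict; the helper `check_common` and the constant names are hypothetical and not part of moo.

```python
# Hypothetical helper: checks the common attributes of a type given as plain data.
REQUIRED = ("name", "schema", "path")
SCHEMA_CLASSES = {
    "boolean", "number", "string", "sequence", "tuple", "record",
    "enum", "any", "anyOf", "allOf", "oneOf", "namespace",
}

def check_common(otype):
    """Return a copy of otype with the optional doc defaulted, or raise ValueError."""
    missing = [key for key in REQUIRED if key not in otype]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    if otype["schema"] not in SCHEMA_CLASSES:
        raise ValueError(f"unknown schema class: {otype['schema']!r}")
    if not isinstance(otype["path"], list):
        raise ValueError("path must be an ordered array of names")
    checked = dict(otype)
    checked.setdefault("doc", "")  # optional, default empty
    return checked

count = check_common({"name": "Count", "schema": "number", "path": ["sys"], "dtype": "u4"})
print(count)
```

Attributes specific to a schema class, such as `dtype` here, pass through unchanged.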
The following sections go through this list of schema classes in more detail.
A type in moo schema is considered atomic if it holds no references to other types. Some details of the atomic types are provided in this section.
An instance of the boolean
schema class is a type that may hold a true
or false value.
Instances of the moo number
schema class describe numeric types in
terms of representation size, format and optional constraints.
The size and format are specified as a code which should be a valid
numpy.dtype. For example, i8 is an 8-byte (not bit!) signed integer
type and f4 is a single-precision floating point type. Numpy supports
a wide variety of “spellings” of the format and size codes, however moo
schema is restricted to 2-character dtype codes.
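A 2-character code can be split into a kind letter and a byte size. The sketch below is only an illustration in plain Python: the function `parse_dtype` is hypothetical, it assumes only the kinds i, u and f are of interest, and it does not check that a kind and size combination is one Numpy actually accepts.

```python
# Hypothetical parser for 2-character dtype codes such as "i8" or "f4".
KINDS = {"i": "signed integer", "u": "unsigned integer", "f": "floating point"}

def parse_dtype(code):
    """Return (kind description, size in bytes) for a 2-character code."""
    if len(code) != 2 or code[0] not in KINDS or code[1] not in "1248":
        raise ValueError(f"not a supported 2-character dtype code: {code!r}")
    return KINDS[code[0]], int(code[1])

for code in ("i8", "f4", "u4"):
    kind, size = parse_dtype(code)
    print(code, "->", kind, "in", size, "bytes")
```

A longer Numpy spelling such as "int64" is rejected by this sketch, mirroring the restriction to 2-character codes.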
The dtype
is intended for codegen. For validation, a number
schema
instance may also supply numeric constraints with names matching those
from JSON Schema numeric types. Specifically:
multipleOf
- a valid value must be a multiple of the given number
exclusiveMaximum
- a valid value must be strictly less than the given number
exclusiveMinimum
- a valid value must be strictly more than the given number
maximum
- a valid value must be less than or equal to the given number
minimum
- a valid value must be more than or equal to the given number
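The meaning of the five constraints can be sketched in a few lines of Python. This is an illustration of the rules as listed, not the validator moo uses; the function name `violations` is hypothetical.

```python
# Each check returns True when the value satisfies the constraint.
CHECKS = {
    "multipleOf": lambda v, n: v % n == 0,  # exact for integers, approximate for floats
    "exclusiveMaximum": lambda v, n: v < n,
    "exclusiveMinimum": lambda v, n: v > n,
    "maximum": lambda v, n: v <= n,
    "minimum": lambda v, n: v >= n,
}

def violations(value, constraints):
    """Return the names of the constraints that the value fails."""
    return [
        name for name, limit in constraints.items()
        if name in CHECKS and not CHECKS[name](value, limit)
    ]

print(violations(10, {"multipleOf": 5, "maximum": 10}))
print(violations(11, {"multipleOf": 5, "exclusiveMaximum": 11}))
```

An empty list means the value satisfies every supplied constraint.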
Some types will hold references to one or more other types. These are
aggregate types. A type reference is represented as a fully-qualified
type name (FQTN) which is formed as the dot-separated concatenation of
the elements of path
(if any) and the name
. A FQTN shall not begin
nor end with a period. As the path
is absolute there is no concept of
a “relative” type reference. See the section Namespace below for
related concepts.
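Forming and splitting an FQTN follows directly from this definition. A minimal sketch, with hypothetical helper names:

```python
def fqtn(path, name):
    """Join the path elements and the name with dots."""
    parts = list(path) + [name]
    if not all(parts):
        # An empty element would produce a leading, trailing or doubled period.
        raise ValueError("path elements and name must be non-empty")
    return ".".join(parts)

def split_fqtn(ref):
    """Split an FQTN back into (path, name)."""
    *path, name = ref.split(".")
    return path, name

print(fqtn(["sys"], "Count"))
print(split_fqtn("app.Person"))
```

With an empty path the FQTN is just the name.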
Thus, every type exists at an absolutely determined location in a
name hierarchy, and it carries this location with it as path
+
name
. Types that must reference another type do so by its path
and name
. Of course, to resolve a reference the referred type must
be available in order to match the type reference against possible
path
and name
. For this reason a schema is considered
complete if every type referenced by any of the schema’s types is
also included.
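Completeness can be checked mechanically. The sketch below assumes each type is a dict whose `deps` attribute lists the FQTNs it references; that reading of `deps` is an assumption, and `missing_refs` is a hypothetical helper.

```python
def missing_refs(schema):
    """Return the referenced FQTNs that no type in the schema provides."""
    known = {".".join(t["path"] + [t["name"]]) for t in schema}
    return sorted({dep for t in schema for dep in t.get("deps", []) if dep not in known})

count = {"name": "Count", "schema": "number", "path": ["sys"], "deps": []}
counts = {"name": "Counts", "schema": "sequence", "path": ["app"], "deps": ["sys.Count"]}

print(missing_refs([count, counts]))  # complete
print(missing_refs([counts]))         # incomplete, sys.Count is not included
```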
The moo record
schema class is analogous to Python class
or C++ struct
but with it we may only define data members, which in moo are called
fields
, and not methods.
Each field is itself a small data structure but moo does not
consider it a first class “type”. A field merely associates the
following information in the context of the record:
name
- unique identifier (of native string type)
item
- reference to the type of the value that the field represents
default
- optionally provide a value directly in the schema (see note)
doc
- optionally provide a brief description of the field
optional
- optionally indicate if this field is optional (true value) or required (false)
When optional values are not provided, the attribute will be omitted
from the resulting record
structure.
A record
may also be constructed with an attribute called bases
which
provides an array of zero or more references to other record
types.
The fields
of all record
types referenced in bases
should be
considered held by the record
itself. That is, bases
provides a
simple form of object inheritance.
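One way a consumer might treat bases as contributing fields is to walk them recursively. This is a sketch under assumptions: records are dicts looked up by FQTN, and the type names `app.Base` and `app.Derived` are invented for the illustration.

```python
def all_fields(ref, byref):
    """Collect the fields of a record, fields from its bases first."""
    record = byref[ref]
    fields = []
    for base in record.get("bases", []):
        fields.extend(all_fields(base, byref))
    fields.extend(record.get("fields", []))
    return fields

byref = {
    "app.Base": {"name": "Base", "fields": [{"name": "id", "item": "sys.Count"}]},
    "app.Derived": {
        "name": "Derived",
        "bases": ["app.Base"],
        "fields": [{"name": "email", "item": "app.Email"}],
    },
}
print([f["name"] for f in all_fields("app.Derived", byref)])
```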
Like all aspects of schema, it is up to a consumer of the schema to
determine how to reflect this inheritance information. As examples,
the ostructs.hpp.j2
template, described more below, directly reflects
bases
into a list of base struct
’s in simple C++ inheritance.
Somewhat differently, the onljs.hpp.j2 template, which provides
serialization between an ostructs.hpp.j2 generated struct and nlohmann::json
representations, interprets bases simply as providing additional fields.
Thus it enacts a “duck-typed” interface between the type-free JSON
object and the strongly-typed C++ struct.
The any schema class provides the “type erasure” pattern. That is, an
any type may hold any type. It is like a void* in C or a std::any in
C++. As it may represent any type, it holds no type information other
than name and doc.
With the ostructs.hpp.j2
and onljs.hpp.j2
templates, the any
type is
mapped to a nlohmann::json
type. This can be used to delay
serialization of specific parts of a record
type until the instance
can be passed into some more specifically typed C++ context.
Three similar aggregate schema classes are: anyOf
, allOf
and oneOf
.
Each type is a collection of references to other types for which
anyOf
- any may apply
allOf
- all apply
oneOf
- exactly one may apply
As always, how they reflect depends on the consumers and not all may
have useful meanings in all contexts. Generally, all three have
meaningful reflections in a validation context. Indeed they are in
moo in order to support JSON Schema validation. The oneOf
could be
considered to map to union types such as std::variant
. The allOf
could be considered to represent a component with all of the required
interfaces. Given those two definitions, anyOf
lacks an obvious use
for code generation.
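In a validation context the three classes differ only in how many of the referenced types must accept a value. A minimal sketch, assuming each referenced type is stood in for by a predicate function; the names are hypothetical.

```python
def matches(kind, checks, value):
    """Apply anyOf / allOf / oneOf semantics to a list of predicates."""
    hits = sum(1 for check in checks if check(value))
    if kind == "anyOf":
        return hits >= 1
    if kind == "allOf":
        return hits == len(checks)
    if kind == "oneOf":
        return hits == 1
    raise ValueError(f"unknown schema class: {kind!r}")

def is_int(v):
    return isinstance(v, int) and not isinstance(v, bool)

def is_positive(v):
    return isinstance(v, (int, float)) and v > 0

checks = [is_int, is_positive]
for kind in ("anyOf", "allOf", "oneOf"):
    print(kind, matches(kind, checks, 5), matches(kind, checks, -3))
```

The value 5 satisfies both predicates, so it fails oneOf while passing the other two.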
Every type is defined in a context called a path
. Conceptually, a
namespace
is a path
shared by all types “in” or “under” the namespace.
moo provides Python and Jsonnet helper code which provides a namespace
as a type factory constructing types in a given namespace.
The simplest way to express a set of moo types is as a schema
array. This is simply a Jsonnet or JSON array or Python list holding
moo type structures. Generally, schema arrays must be topologically
sorted so that no type structure references any other type later than
itself in the array. moo provides Python and Jsonnet code to perform
this topological sort and examples are given below.
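The ordering requirement can be illustrated with the standard library alone. This sketch is not moo's implementation: it assumes types are dicts with a `deps` list of FQTNs and uses `graphlib.TopologicalSorter` (Python 3.9 or later); `topo_sorted` is a hypothetical name.

```python
from graphlib import TopologicalSorter

def topo_sorted(schema):
    """Return the types ordered so that referenced types come first."""
    byref = {".".join(t["path"] + [t["name"]]): t for t in schema}
    # Map each type to its predecessors; references outside the array are ignored here.
    graph = {
        ref: [dep for dep in t.get("deps", []) if dep in byref]
        for ref, t in byref.items()
    }
    return [byref[ref] for ref in TopologicalSorter(graph).static_order()]

unsorted = [
    {"name": "Person", "path": ["app"], "deps": ["app.Counts"]},
    {"name": "Counts", "path": ["app"], "deps": ["sys.Count"]},
    {"name": "Count", "path": ["sys"], "deps": []},
]
print([t["name"] for t in topo_sorted(unsorted)])
```

A cycle among the references makes `static_order()` raise `graphlib.CycleError`.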
This section describes how to define a schema using either the Jsonnet or the Python programming language as a persistent representation. It also describes how to convert moo schema (from any format) into other schema systems, in particular JSON Schema.
To illustrate some of the patterns seen in real, large-scale projects the example will factor the schema into two parts:
- sys
- a set of types relevant to some system (eg a framework)
- app
- a set of types relevant to an application based on the system
moo oschema may be described easily in the Jsonnet language as Jsonnet was designed for defining data structures.
We start by simply presenting the sys
schema:
Let’s walk through this short example line by line:
This shows Jsonnet’s “module” system in action. A file is loaded and its
contents are made available via the moo variable. The moo.jsonnet file holds
various Jsonnet functions and data that will help build our oschema.
This call of the schema()
function of the moo.oschema
object
returns another object held by the local variable sys
which will
provide sort of a “schema factory” that operates “in” the type path of
“sys”. We see it in action:
This last line defines the data structure which is the “return” value
of the entire sys.jsonnet
file. That is, this file “compiles” to an
array holding a single oschema type.
We can see how the sys schema expands or compiles to a JSON representation using moo:
moo compile examples/oschema/sys.jsonnet
[
{
"deps": [],
"dtype": "u4",
"name": "Count",
"path": [
"sys"
],
"schema": "number"
}
]
This is then a schema with a single type called Count
which is of
schema class number
that has numeric Numpy-style type code dtype
of an
unsigned integer in four bytes "u4"
and is in a context path
of simply
["sys"]
.
Next, we imagine an application schema with a richer set of types:
We will go through some of these lines of Jsonnet in order to explain some of the forms. Starting with the first few lines:
As with sys
we import the support module moo.jsonnet
. We also import
the sys schema that we made above. Thus, sa
holds a single-element
array.
Next we call the moo.oschema.hier()
method on this array. This
transforms the ordered array of types into an (unordered) object where
each type is available by its name. We’ll see sh
in use next.
Finally for this preamble, we make another “schema factory” which is
this time “in” the path: app
. We’ll use as
many times to build up
elements of our schema.
Next we build our app schema and do so inside a “working object”. This lets us use Jsonnet language features to refer to one of our types in another. Let’s look at the start:
Here we start our object and save it in a local variable hier
and give
it an initial entry. The key name counts
is a temporary convenience
and the value is what will ultimately matter. The type we make is a
sequence that references the sys.Count
type made in the sys schema.
This shows how “cross schema” references can be made.
The next set of types are nothing special but illustrate how instances of some of the different schema classes in Jsonnet are made:
Next, let’s jump to the definition of the Person
type which begins with:
Now we define an instance of the record
schema class:
In addition to showing how to make an instance of the record schema class, this
example shows how to reference types within our “working object”
through a self object. We will skip the rest of our “working object”
other than to say that this Vehicle type is referenced in the Person
type described in the remainder and that this shows an example of
“nested” records.
We end the app schema file by producing a “sorted” array of types:
Like any oschema, this array must include all of the types needed to
satisfy any type references and in an order such that any referenced
types come first. That is, a topological sort must be applied to the
graph built between types and their references. This is not a
trivial operation, but the moo Jsonnet support provides the
required algorithm. As the sys schema is simple and independent from
app by construction, we merely prepend it.
Sparing the long output here, the full schema compiled to JSON can be produced with this command:
moo compile examples/oschema/app.jsonnet
moo provides support for defining oschema in Python which has some similarities to its Jsonnet support. However, Python is a far more expressive language than Jsonnet and thus moo Python support provides more options to the developer.
Like the Jsonnet support, there are layers of representation of schema information and transformations between the layers. These are summarized:
- Object representation
- the moo.oschema module provides a set of Python classes, each associated with one oschema class, and object instances of which represent types.
- POD representation
- the plain-old-data representation corresponds closely to JSON. In fact, one may copy-paste an oschema type in JSON representation into a Python file and it works as POD. POD and Object representations can be inter-transformed.
Like with the Jsonnet “schema factory” object, a moo.oschema.Namespace
may be constructed with a path
and then be used to construct type
objects which are “in” that path.
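The factory idea can be sketched with a toy class. To be clear, `ToyNamespace` below is a hypothetical stand-in written for this illustration and its methods are not the API of moo.oschema.Namespace; it produces POD dicts rather than type objects.

```python
class ToyNamespace:
    """Toy type factory: every type it makes carries the same path."""

    def __init__(self, *path):
        self.path = list(path)

    def make(self, schema, name, **attrs):
        # Copy the path so that types do not share one mutable list.
        return dict(attrs, name=name, schema=schema, path=list(self.path))

    def number(self, name, dtype, **attrs):
        return self.make("number", name, dtype=dtype, **attrs)

    def string(self, name, **attrs):
        return self.make("string", name, **attrs)

sys_ns = ToyNamespace("sys")
count = sys_ns.number("Count", "u4")
print(count)
```

The caller states the path once, when constructing the factory, and never repeats it per type.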
Python is far more expressive than Jsonnet and that leaves the developer many choices for how to work with oschema representations in Python. The example presented here makes some reasonable choices that systems may adopt, but other approaches are certainly possible.
In general we will make files which are analogous to the Jsonnet files
but which may be imported as Python modules. We take the convention that
a .schema module variable will hold the array of types. These types
will be in Object form.
Starting with the simple sys schema we can make something like:
The app schema is structured similarly, if a fair bit longer.
Comparing this to the app.jsonnet
above, one can see it is a near
transliteration of syntax and so we will not dwell on the details.
But, one thing to call out is that like with app.jsonnet
we must
prepend the sys schema array to our result and perform a topological
sort on the types we make here. The sort is provided by
moo.oschema.depsort
and we play a bit of a Python trick to collect all
the oschema type objects made in the module using a filter on
globals()
.
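The collection trick itself is plain Python and can be shown without moo. In this sketch `OType` is a hypothetical stand-in for the moo type classes, and a module object is built in place so the example is self-contained; `vars(module)` is the same mapping that `globals()` returns inside that module.

```python
import types

class OType:
    """Stand-in for an oschema type object."""
    def __init__(self, name):
        self.name = name

def collect_types(namespace, cls=OType):
    """Keep only the values of a module namespace that are type objects."""
    return [value for value in namespace.values() if isinstance(value, cls)]

# Simulate a schema module holding two types and one unrelated variable.
mod = types.ModuleType("toy_schema")
mod.count = OType("Count")
mod.email = OType("Email")
mod.helper = 42

schema = collect_types(vars(mod))
print([t.name for t in schema])
```

The collected list follows definition order, so a dependency sort is still needed afterwards.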
- add an import based Python loader to moo
moo provides support to convert a moo oschema into JSON Schema form. To do this, we must specify the fully-qualified type reference and a moo schema array containing the referenced type and any others it references. An instance of a JSON Schema may also specify a unique identifier, usually in the form of a URL.
On the command line this may look like:
moo -A msa=examples/oschema/app.jsonnet \
-A typeref=app.Person \
compile moo2jschema.jsonnet | jq '."$ref"'
"#/definitions/app/Person"
Here we pipe to jq
just to filter down the rather verbose result and
show the command works.
The main use of moo
is to apply a data structure (“model”) to a
template in order to generate a file (eg, a C++ header file). The
template must have an understanding (“contract”) of the structure of
the model. The “oschema” structure described here is likely not
enough information, or not in a convenient form, for templates to be
easily defined.
We must then have means to transform and possibly augment the
initial data structure into a model expected by the template and
moo
supports several strategies to supply that.
In some cases, transformation and augmentation may be done at the
input data structure level (ie, in Jsonnet). moo “supports” this in
general by not restricting the structure of the input data. Users are
free to come up with their own solutions. Typically this requires
accepting a fluid contract between models and templates as one
iterates both.
In the case of using Jsonnet to describe the input data structure, the
moo CLI supports passing “top level arguments” (TLA) to the
Jsonnet code. This requires the Jsonnet to evaluate to a top level
function.
This simple example shows how TLAs work:
moo -A arg="hi" compile examples/oschema/tla.jsonnet
{
"arg": "hi",
"def": "default"
}
As shown, multiple TLAs may be used and default TLA values may be
given in the Jsonnet and omitted on the CLI. A TLA may also be
specified as a Jsonnet file in which case the contents of that file
will be evaluated and the resulting structure passed to the top level
function. Reusing the above example and the sys
schema file:
moo -A arg=examples/oschema/sys.jsonnet compile examples/oschema/tla.jsonnet
{
"arg": [
{
"deps": [],
"dtype": "u4",
"name": "Count",
"path": [
"sys"
],
"schema": "number"
}
],
"def": "default"
}
Thus, with TLAs one may construct a somewhat general Jsonnet file that transforms and augments some initial data structure to something more specific.
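For readers more at home in Python, a TLA behaves like a keyword argument to a function with defaults. The sketch below is an analogy only: judging from the outputs above, the example file appears to take `arg` and a defaulted `def`, and the Python function `top` is a hypothetical equivalent (`def` is a Python keyword, hence `def_`).

```python
import json

def top(arg, def_="default"):
    """Python analogue of a Jsonnet top level function with one default."""
    return {"arg": arg, "def": def_}

# Like -A arg="hi": a plain value.
print(json.dumps(top(arg="hi"), indent=4))

# Like -A arg=<file>: an already evaluated structure is passed instead.
sys_schema = [{"deps": [], "dtype": "u4", "name": "Count", "path": ["sys"], "schema": "number"}]
print(json.dumps(top(arg=sys_schema), indent=4))
```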
TLAs have one more useful trick. The function to which TLAs are given is just a normal Jsonnet function. This gives us the option to “bake in” some TLAs in a Jsonnet file that calls the original function. Consider:
Thus we may get the same output with a simpler command:
moo compile examples/oschema/tla-sys.jsonnet
{
"arg": [
{
"deps": [],
"dtype": "u4",
"name": "Count",
"path": [
"sys"
],
"schema": "number"
}
],
"def": "default"
}
This pattern of “baking in” of TLAs can be useful, for example, if one
has a codegen system where a package must supply the specific
information but otherwise relies on a common model. By “hiding” the
TLAs to that model in a Jsonnet file, the build system layer can be
made simpler. See build sys document for some pointers on integrating
moo
into popular build systems.
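In Python terms, baking in a TLA resembles partial application. A minimal sketch of the analogy, where `top` is a hypothetical stand-in for the general model function and `functools.partial` fixes one of its arguments:

```python
from functools import partial

def top(arg, def_="default"):
    """Stand-in for a general model function taking top level arguments."""
    return {"arg": arg, "def": def_}

SYS_SCHEMA = [{"deps": [], "dtype": "u4", "name": "Count", "path": ["sys"], "schema": "number"}]

# The package-specific wrapper fixes arg so that callers need not supply it.
baked = partial(top, arg=SYS_SCHEMA)

result = baked()
print(result["arg"][0]["name"], result["def"])
```

Arguments that were not baked in, such as `def_` here, can still be overridden by the caller.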
Jsonnet is a “small” language (one of its charms) and some model
transformations may be complex enough that its simplicity poses a
limitation. moo thus allows transformations to be defined in Python
and this opens up the ability to form the model in a stronger and
more object-oriented manner.
To do this, the user may tell the moo
CLI to apply a Python function
to transform the input data just prior to the application of the
template. The function is specified as a “dot” path naming the
function in its module. We may use the moo dump
command to
illustrate a transformation applied to the app
schema:
moo -M examples/oschema -t moo.oschema.typify dump -f pretty app.jsonnet
[<Any "app.Affiliation">, <Sequence "app.Counts" items:sys.Count>, <String "app.Email">, <Enum "app.MBTI">, <Record "app.Person" fields:{email, counts, affil, mbti}>]
The built-in moo.oschema.typify() function illustrated here converts a
schema type as a “raw” data structure into a corresponding Python
object. In a template, the Python object should be usable everywhere
the “raw” data structure is. The typify() transform is thus only useful
if the extensions that the Python object provides are needed in a
template or if subsequent transforms require objects instead of raw
data structures (an example of which we’ll see next).
We may also pipeline transformations. Here is an example that will
use the output of typify(), form a graph from the type reference
dependencies, and perform a topological sort to produce an array which
is ordered from least dependent to most.
moo -T examples/oschema -M examples/oschema \
-t 'moo.oschema.typify|moo.oschema.graph|moo.oschema.depsort' \
render app.jsonnet ool.txt.j2
Iterate over list of types: <Number "sys.Count"> <Any "app.Affiliation"> <Sequence "app.Counts" items:sys.Count> <String "app.Email"> <Enum "app.MBTI"> <String "app.Make"> <String "app.Model"> <Enum "app.VehicleClass"> <Record "app.Vehicle" fields:{make, model, type}> <Record "app.Person" fields:{email, email2, counts, counts2, affil, mbti, vehicle, vehicle2, vehicle2}>
Note the Person
type comes after the types that it refers to in its
fields due to the topological sort. Also, note that this particular
transform may also be performed in the Jsonnet layer and so is used
here as an illustration of the functionality.
This example also shows that the pipeline of transformations may
become rather complex. At some point, developing a composite
transformation function in Python and referring to it on the moo CLI
may be useful to keep the command argument list small.
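A composite transformation is just function composition. The sketch below shows one way to build it with the standard library; it is not how the moo CLI is implemented, and the helpers `resolve`, `compose` and `from_spec` are hypothetical. Standard library functions stand in for the moo transforms so that the example runs anywhere.

```python
import importlib
from functools import reduce

def resolve(dotted):
    """Look up a function from a dot path naming it in its module."""
    module_name, _, attr = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)

def compose(*funcs):
    """Return a function applying funcs left to right."""
    def pipeline(data):
        return reduce(lambda acc, func: func(acc), funcs, data)
    return pipeline

def from_spec(spec):
    """Build a pipeline from a '|' separated list of dot paths."""
    return compose(*(resolve(part) for part in spec.split("|")))

transform = from_spec("json.loads|builtins.sorted")
print(transform("[3, 1, 2]"))
```

A module could expose such a composed function under a single name, which is then the only thing given on the command line.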
But, let us now move on to codegen.
Some trivial templates were introduced above in order to dump out some
of the information in their models. Here we develop two “real”
templates and apply them to the app
schema to generate code.
ostructs.hpp.j2
- generate a C++ header defining a C++ namespace scope and holding the definition of a C++ struct type for each type instance of the schema class record with a path in that scope.
onljs.hpp.j2
- for each C++ struct defined above, produce functions that will allow the struct to participate in nlohmann::json (nljs) based serialization.
But, before developing the templates we first define a contract or model on which the template development may depend.
The omodel
contract is embodied in this Jsonnet file:
The top-level arguments are described below. What is produced is an object (ie, an instance of the model) with these attributes:
path
- the namespace path scope to focus on as a list/array
nspre
- namespace prefix with trailing dot
types
- array of type data structures which are in scope
byref
- full type information retrieved via a type reference
byscn
- references to types collected by schema class
extref
- list of references to types outside the scope
Top-level arguments
os
- bring in the oschema array
path
- the namespace path with which to select a branch of the full schema tree
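To make the shape of the model concrete, here is a rough Python rendering of the listed attributes. It is a sketch under assumptions and not a translation of the Jsonnet file: types are dicts with a `deps` list of FQTNs, `byref` is keyed over the whole input array, and `build_model` is a hypothetical name.

```python
def build_model(os_array, path):
    """Assemble the model attributes for the types under the given path."""
    def ref(t):
        return ".".join(t["path"] + [t["name"]])

    types = [t for t in os_array if t["path"][:len(path)] == path]
    in_scope = {ref(t) for t in types}
    byscn = {}
    for t in types:
        byscn.setdefault(t["schema"], []).append(ref(t))
    return {
        "path": path,
        "nspre": "".join(part + "." for part in path),
        "types": types,
        "byref": {ref(t): t for t in os_array},
        "byscn": byscn,
        "extref": sorted({d for t in types for d in t.get("deps", []) if d not in in_scope}),
    }

os_array = [
    {"name": "Count", "schema": "number", "path": ["sys"], "deps": []},
    {"name": "Counts", "schema": "sequence", "path": ["app"], "deps": ["sys.Count"]},
]
model = build_model(os_array, ["app"])
print(model["nspre"], model["byscn"], model["extref"])
```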
We can test out some TLAs and test that the model compiles using the moo
CLI:
moo -M examples/oschema \
-A os='app.jsonnet' -A path='app' \
compile omodel.jsonnet
We want to apply a transform to the types attribute, which can be done
with the following command. We will hold off on
showing the output until the next example CLI.
moo -M examples/oschema \
-A os='app.jsonnet' -A path='app' \
-t '/types:moo.oschema.typify|moo.oschema.graph|moo.oschema.depsort' \
dump -f pretty omodel.jsonnet
We are almost ready to turn to the template but one last detail is needed. As we will find, there are some utilities that will simplify developing the template, which are specific to the target language (eg C++) and on which the rest of the model does not depend. We will bring these in as a model “graft”.
moo -g '/lang:ocpp.jsonnet' \
-M examples/oschema \
-A os='app.jsonnet' -A path='app' \
-t '/types:moo.oschema.typify|moo.oschema.graph|moo.oschema.depsort' \
dump -f types omodel.jsonnet
all_types <class 'list'> byref <class 'dict'> byscn <class 'dict'> ctxpath <class 'list'> ctxpre <class 'str'> extrefs <class 'list'> nspre <class 'str'> path <class 'list'> relpath <class 'str'> types <class 'list'> lang <class 'dict'>
If you squint you’ll see the lang
attribute added with others from the
model. Let’s now move to the template.
The ostructs.hpp.j2 template file gets applied to the omodel
to
produce a C++ header file defining a struct
for each record
instance
in the model and any supporting types via a using
type alias. It also
uses the extref
info to #include
any required external headers that
themselves are also generated from other parts of the overall schema.
Take particular note that this #include
pattern bakes in a specific
mapping from a type’s path
array to file locations. For the resulting
C++ code to compile, this pattern must of course actually be honored
in some way. This may be done manually by properly placing the
generated files according to this mapping or, better, be automatically
assured via a build system. Future work may generate this file-system
level assurance itself from schema. For now, we must simply be careful.
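The mapping from a path to a header location and include guard can be written down explicitly. The sketch below reproduces the names that appear in the generated code shown in this document for single-element paths; how multi-element paths are handled is an assumption, and both helper names are hypothetical.

```python
def header_for(path, kind="Structs"):
    """Header location for the types in a path, relative to the include root."""
    return "/".join(path) + f"/{kind}.hpp"

def guard_for(path, kind="Structs"):
    """Include guard macro for the same header."""
    return "_".join(part.upper() for part in path) + f"_{kind.upper()}_HPP"

print('#include "%s"' % header_for(["sys"]))
print(guard_for(["app"]))
print(header_for(["app"], "Nljs"), guard_for(["app"], "Nljs"))
```

A build system rule that places each generated file at `header_for(path)` keeps the `#include` lines resolvable.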
Finally, note that the grafting of ocpp.jsonnet selects a particular
mapping from schema class names to their C++ equivalents. Eg,
nlohmann::json for any. If we wished to generate code using a
different mapping, such as boost::any for any, we would need to modify
or fork this grafted data structure while the rest of the structure
may be left as-is.
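The role of such a language mapping can be sketched as a lookup table that the rest of the generation logic consults. This is an illustration in Python rather than the content of ocpp.jsonnet; the table, the `items` attribute name and the helper names are assumptions.

```python
# Schema class name to C++ type, for classes that map to one fixed type.
CPP_TYPES = {"any": "nlohmann::json", "string": "std::string"}

def cpp_type(ref):
    """Turn a dotted type reference such as sys.Count into sys::Count."""
    return ref.replace(".", "::")

def cpp_alias(otype, lang=CPP_TYPES):
    """Render a using alias for a non-record type."""
    if otype["schema"] == "sequence":
        target = f"std::vector<{cpp_type(otype['items'])}>"
    else:
        target = lang[otype["schema"]]
    return f"using {otype['name']} = {target};"

affil = {"name": "Affiliation", "schema": "any"}
counts = {"name": "Counts", "schema": "sequence", "items": "sys.Count"}
print(cpp_alias(affil))
print(cpp_alias(counts))

# Swapping the mapping leaves the generation logic untouched.
print(cpp_alias(affil, dict(CPP_TYPES, any="boost::any")))
```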
We can finally generate code by changing the above CLI call from dump
to render
and adding the template file name.
moo -g '/lang:ocpp.jsonnet' \
-M examples/oschema \
-A os='app.jsonnet' -A path='app' \
render omodel.jsonnet ostructs.hpp.j2
/*
* This file is 100% generated. Any manual edits will likely be lost.
*
* This contains struct and other type definitions for shema in
* namespace app.
*/
#ifndef APP_STRUCTS_HPP
#define APP_STRUCTS_HPP
#include <cstdint>
#include "sys/Structs.hpp"
#include <nlohmann/json.hpp>
#include <vector>
#include <string>
namespace app {
// @brief An associated object of any type
using Affiliation = nlohmann::json;
// @brief All the counts
using Counts = std::vector<sys::Count>;
// @brief Electronic mail address
using Email = std::string;
// @brief
enum class MBTI: unsigned {
introversion,
extroversion,
sensing,
intuition,
thinking,
feeling,
judging,
perceiving,
};
// @brief
using Make = std::string;
// @brief
using Model = std::string;
// @brief
enum class VehicleClass: unsigned {
boring,
fun,
};
// @brief
struct Vehicle {
// @brief
Make make = "Subaru";
// @brief
Model model = "WRX";
// @brief
VehicleClass type = app::VehicleClass::fun;
};
// @brief Describe everything there is to know about an individual human
struct Person {
// @brief E-mail address
Email email = "";
// @brief E-mail address
Email email2 = "me@example.com";
// @brief Count of some things
Counts counts = {};
// @brief Count of some things
Counts counts2 = {0, 1, 2};
// @brief Some affiliation
Affiliation affil = {};
// @brief Personality
MBTI mbti = app::MBTI::introversion;
// @brief Example of nested record
Vehicle vehicle = {"Subaru", "WRX", app::VehicleClass::fun};
// @brief Example of nested record with default
Vehicle vehicle2 = {"Subaru", "CrossTrek", app::VehicleClass::boring};
// @brief Example of nested record with default
Vehicle vehicle2 = {"Subaru", "BRZ", app::VehicleClass::fun};
};
} // namespace app
#endif // APP_STRUCTS_HPP
And, here are the corresponding nlohmann::json
serialization
functions, produced by applying the onljs.hpp.j2 template to the same
model.
moo -g '/lang:ocpp.jsonnet' \
-M examples/oschema \
-A os='app.jsonnet' -A path='app' \
render omodel.jsonnet onljs.hpp.j2
/*
* This file is 100% generated. Any manual edits will likely be lost.
*
* This contains functions struct and other type definitions for shema in
* namespace app to be serialized via nlohmann::json.
*/
#ifndef APP_NLJS_HPP
#define APP_NLJS_HPP
#include "app/Structs.hpp"
#include "sys/Nljs.hpp"
#include <nlohmann/json.hpp>
namespace app {
using data_t = nlohmann::json;
NLOHMANN_JSON_SERIALIZE_ENUM( MBTI, {
{ app::MBTI::introversion, "introversion" },
{ app::MBTI::extroversion, "extroversion" },
{ app::MBTI::sensing, "sensing" },
{ app::MBTI::intuition, "intuition" },
{ app::MBTI::thinking, "thinking" },
{ app::MBTI::feeling, "feeling" },
{ app::MBTI::judging, "judging" },
{ app::MBTI::perceiving, "perceiving" },
})
NLOHMANN_JSON_SERIALIZE_ENUM( VehicleClass, {
{ app::VehicleClass::boring, "boring" },
{ app::VehicleClass::fun, "fun" },
})
inline void to_json(data_t& j, const Vehicle& obj) {
j["make"] = obj.make;
j["model"] = obj.model;
j["type"] = obj.type;
}
inline void from_json(const data_t& j, Vehicle& obj) {
if (j.contains("make"))
j.at("make").get_to(obj.make);
if (j.contains("model"))
j.at("model").get_to(obj.model);
if (j.contains("type"))
j.at("type").get_to(obj.type);
}
inline void to_json(data_t& j, const Person& obj) {
j["email"] = obj.email;
j["email2"] = obj.email2;
j["counts"] = obj.counts;
j["counts2"] = obj.counts2;
j["affil"] = obj.affil;
j["mbti"] = obj.mbti;
j["vehicle"] = obj.vehicle;
j["vehicle2"] = obj.vehicle2;
j["vehicle2"] = obj.vehicle2;
}
inline void from_json(const data_t& j, Person& obj) {
if (j.contains("email"))
j.at("email").get_to(obj.email);
if (j.contains("email2"))
j.at("email2").get_to(obj.email2);
if (j.contains("counts"))
j.at("counts").get_to(obj.counts);
if (j.contains("counts2"))
j.at("counts2").get_to(obj.counts2);
obj.affil = j.at("affil");
if (j.contains("mbti"))
j.at("mbti").get_to(obj.mbti);
if (j.contains("vehicle"))
j.at("vehicle").get_to(obj.vehicle);
if (j.contains("vehicle2"))
j.at("vehicle2").get_to(obj.vehicle2);
if (j.contains("vehicle2"))
j.at("vehicle2").get_to(obj.vehicle2);
}
// fixme: add support for MessagePack serializers (at least)
} // namespace app
#endif // APP_NLJS_HPP
Besides generating code from a schema, data objects may be constructed
with the help of and validated against schema. For information about
this usage pattern see the otypes document which describes how to use
the moo.otypes
module.