Much of the discussion at the May 2019 PI meeting was focused on defining (or converging toward a shared definition of) the layers of the metamodeling stack, and how we can formally represent and move between these layers.
The general consensus that has emerged is that there are 3 levels: the abstract/domain layer, the structured layer, and the code layer.
Several teams (including us) have conceived of this structured level as a language-agnostic intermediate representation (IR) (e.g., a data structure akin to those used by compilers to represent and manipulate source code).
Up to this point, we have described our IR in terms of its category-theoretic principles, but we have yet to formalize this representation. There are several factors motivating us to do so now, including:
To ensure (and prove) interoperability with other teams, it will be helpful to show that our IR is Turing complete, and can consume inputs/produce outputs to hook into other systems that would (in the absence of a shared IR) be incompatible without some form of wrapper or transpilation step(s).
To formally represent and reason about different metamodeling tasks (e.g., model synthesis, model augmentation, model validation) and workflows (e.g., scientific design of experiments, public policy interventions), and to define the set of transformations under which a class of models is closed.
To this end, @mehalter has proposed that we use typegraphs, with an expression representing each morphism, as our IR. @jpfairbanks and @crherlihy have discussed how we can use this representation when going down the stack from the abstract to the structured layer, and up the stack from the code to the structured layer.
When going from code to structured layer, the typegraph can be constructed via augmented static analysis (e.g., we may need to inject data collection statements to recover some of the metadata we'd like to preserve). This type of movement up the stack has guided most of our efforts to date (e.g., the Epicookbook has been our foundation and source of code + domain knowledge).
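As a concrete illustration of what "augmented static analysis" could look like, here is a minimal Julia sketch (the helper names and the toy `infection` function are hypothetical, not the project's actual tooling) that rewrites a function definition so it records the runtime types of its arguments, metadata that plain static analysis of dynamic Julia code would miss:

```julia
# Hypothetical sketch: rewrite a function definition so that it records the
# runtime types of its arguments before running the original body.

const observed_types = Dict{Symbol,Vector{DataType}}()

function inject_collection(def::Expr)
    @assert def.head == :function
    call, body = def.args                     # call signature and function body
    fname, fargs = call.args[1], call.args[2:end]
    recorders = [:(push!(get!(observed_types, $(QuoteNode(fname)), DataType[]),
                         typeof($a))) for a in fargs]
    Expr(:function, call, Expr(:block, recorders..., body))
end

# Example: instrument a toy infection-rate function, then call it.
def = :(function infection(s, i)
    β * s * i
end)
eval(inject_collection(def))

β = 0.5
infection(0.9, 0.1)       # evaluates as before...
observed_types            # ...and now maps :infection => [Float64, Float64]
```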
When going from domain to structured layer, we'd like to help domain users take advantage of existing (previously defined or mined) concepts (types) and expressions (morphisms). To do this, the process we envision looks something like:
Define/tag objects.
Define/tag morphisms.
Map types to objects.
Map expressions to morphisms; in cases where the exact mathematical expression is not yet known, facilitate representation with either an empty expression `:()` or a verb placeholder such as `:verb` (see the sketch after this list).
In this way, types and morphisms could be persisted across sessions, and/or iteratively modified, with diffs accessible via version control.
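A minimal Julia sketch of what this tagging workflow could produce (the struct names and the SIR-flavored example are ours, purely illustrative):

```julia
# Hypothetical sketch of the domain-to-structured workflow: tag objects and
# morphisms first, fill in expressions later. A morphism whose mathematics is
# not yet known carries a verb Symbol placeholder instead of a concrete Expr.

struct VerbPlaceholder
    verb::Symbol                     # e.g. :vaccinates, standing in for the unknown Expr
end

struct TaggedMorphism
    name::Symbol
    dom::Type
    codom::Type
    body::Union{Expr,VerbPlaceholder}
end

# 1-2. Define/tag objects and morphisms from domain concepts.
objects = Dict(:Susceptible => Float64, :Infected => Float64, :Recovered => Float64)

# 3-4. Map types to objects and expressions to morphisms; here the recovery
#      dynamics are known, while the intervention is still just a verb.
morphisms = [
    TaggedMorphism(:recover, objects[:Infected], objects[:Recovered], :(i -> γ * i)),
    TaggedMorphism(:intervene, objects[:Susceptible], objects[:Susceptible],
                   VerbPlaceholder(:vaccinates)),
]
```

Serializing `objects` and `morphisms` to files under version control is what would make the cross-session persistence and diffs described above possible.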
Formalizing our IR in this way (and doing so in Julia) leaves us with a cartesian closed category where each object is represented by a Julia type, and each morphism is mapped to a Julia expression.
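To make the cartesian closed structure concrete, here is a small sketch (our notation, not a formal proof): products of objects are tuple types, the exponential object is the function type, and currying/uncurrying move between the two hom-sets.

```julia
# Sketch of cartesian closed structure over Julia types (illustrative only):
# the product of objects A and B is the tuple type, and curry/uncurry move
# between Hom(A × B, C) and Hom(A, C^B).

product(A::Type, B::Type) = Tuple{A,B}

curry(f)   = a -> (b -> f((a, b)))
uncurry(g) = ab -> g(ab[1])(ab[2])

# Example: infection pressure as a morphism Float64 × Float64 → Float64.
infection = si -> 0.5 * si[1] * si[2]

curry(infection)(0.9)(0.1) == infection((0.9, 0.1))                 # true
uncurry(curry(infection))((0.9, 0.1)) == infection((0.9, 0.1))      # true
```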
@jpfairbanks, @mehalter, and @crherlihy also discussed how this IR can help us to formally and verifiably reason about model similarity and/or equivalence, and to compute the relative lossiness of different ways of moving up and down the metamodeling stack. Examples of these types of reasoning tasks, and their representation within the IR, include:
A functor between IR representations can serve as a measure of the difference between models.
There are movements between and within layers; taking a model and adding Exprs to the morphisms (including those that were previously abstract placeholders) is an example of a functor that adds detail. The resulting model would require more bits to compress and, as such, is more complex (but also more semantically rich).
Taking a model and dropping the Exprs to retain only the graph structure is an example of a forgetful functor; the resulting model would require fewer bits to compress and, as such, is less complex and less semantically rich (a sketch follows).
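For instance, the forgetful direction could look like the following sketch (struct and function names are ours; a real model would carry far more metadata):

```julia
# Hypothetical sketch of the forgetful functor: keep the objects and the
# dom/codom of every morphism, but drop the Exprs, leaving only the bare
# typegraph structure (fewer bits, less semantics).

struct Morphism
    dom::Type
    codom::Type
    expr::Union{Expr,Nothing}        # nothing once the semantics are forgotten
end

struct TypeGraph
    objects::Set{Type}
    morphisms::Vector{Morphism}
end

forget(tg::TypeGraph) =
    TypeGraph(tg.objects, [Morphism(m.dom, m.codom, nothing) for m in tg.morphisms])

sir = TypeGraph(Set{Type}([Float64]),
                [Morphism(Float64, Float64, :(i -> γ * i)),
                 Morphism(Float64, Float64, :(s -> -β * s * i))])
forget(sir)   # same shape as `sir`, but every expr is nothing
```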
Given two different models, we expect it to be useful to identify the most domain-rich shared (sub)structure that can represent both models, given some specified tolerance for loss of precision/entropy (a rough sketch follows).
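A very rough sketch of that last task, reusing the hypothetical `TypeGraph`/`Morphism` structs from the previous snippet and showing only the zero-tolerance, exact-match case:

```julia
# Keep an object when both models have it, and a morphism when both models
# have an edge with the same dom/codom; retain the Expr only when the two
# models agree on it exactly (the tolerance knob is left out of this sketch).

function shared_structure(a::TypeGraph, b::TypeGraph)
    objs = intersect(a.objects, b.objects)
    ms = Morphism[]
    for ma in a.morphisms, mb in b.morphisms
        if ma.dom == mb.dom && ma.codom == mb.codom
            push!(ms, Morphism(ma.dom, ma.codom,
                               ma.expr == mb.expr ? ma.expr : nothing))
        end
    end
    TypeGraph(objs, ms)
end
```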