Minor changes and clarifications
boggle committed Jul 2, 2017
1 parent ce09cf5 commit 2498907
Showing 1 changed file with 102 additions and 51 deletions.
153 changes: 102 additions & 51 deletions cip/CIP2017-06-18-multiple-graphs.adoc
@@ -15,12 +15,7 @@ This CIP proposes to extend Cypher with support for the construction, transforma
toc::[]

```
TODO:

* Parameter handling
* Graph name syntax
* Precise update semantics
* Entity identity
* Composition Semantics
```

== Motivation
@@ -68,12 +63,24 @@ An entity is considered to be deleted if it is no longer part of any graph.

=== Graph Addressing

Graphs do not expose an identity like nodes or relationships. They may however be made addressable through other means by a conforming implementation (e.g. through exposing the graph under a _Graph URL_).
Graphs do not expose an identity like nodes or relationships do.

Graphs may be made addressable through other means by a conforming implementation (e.g. by exposing the graph under a _graph URL_ through which it can be referenced and loaded).
The details regarding the format and choice of graph URLs are outside the scope of this proposal.

A graph is considered to have been deleted if it is no longer registered under a graph URL and no other reference to it is retained (e.g. from a running query).

=== Entity Identity

The details of such a mechanism are out of scope of this proposal.
In the single property graph model, nodes and relationships are commonly identified by a single integer id.
This model was not originally designed for sharing entities among many different graphs while keeping entity ids unique.

However, a graph is considered to have been deleted if it is no longer registered under a Graph URL and no other reference to it is retained (e.g. from a running query).
In the multiple property graphs model, entities are additionally and implicitly associated with a _graph space_, which makes it possible to distinguish between entities that have the same original id but come from different sources (e.g. different databases or even different snapshots of the same database).

In the multiple property graphs model, no graph may contain two entities from the same graph space that have the same original id.

Graph spaces may be made identifiable by a conforming implementation by assigning a _graph URI_ to them.
The details regarding the format and choice of graph URIs are outside the scope of this proposal.
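The identity rule above can be illustrated with a small Python sketch that keys every entity by the pair of its graph space and its original id. The names `EntityId` and `Graph` and the `graph://db1`-style graph URIs are hypothetical; the CIP deliberately leaves the concrete identity mechanism to implementations.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class EntityId:
    """Hypothetical identity of a node or relationship across graphs."""
    graph_space: str  # e.g. a graph URI naming the source database or snapshot
    original_id: int  # the integer id from the single property graph model


@dataclass
class Graph:
    """A graph modelled only by the set of entity identities it contains."""
    members: Set[EntityId] = field(default_factory=set)

    def add(self, entity: EntityId) -> None:
        # Identity is the (graph space, original id) pair, so adding an entity that
        # is already a member leaves the graph unchanged: a graph can never hold two
        # entities from the same graph space with the same original id.
        self.members.add(entity)

    def contains(self, graph_space: str, original_id: int) -> bool:
        return EntityId(graph_space, original_id) in self.members


g = Graph()
g.add(EntityId("graph://db1", 7))
g.add(EntityId("graph://db2", 7))  # same original id, different graph space
g.add(EntityId("graph://db1", 7))  # already a member; no duplicate is created
print(len(g.members))  # 2
```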

== Background: Single Graph Execution Model

@@ -109,8 +116,10 @@ This CIP proposes to redefine the *execution context* to be
This CIP proposes to redefine the *query context* to be

* a set of named graphs from the *execution context*
* an optional information that indicates which of these named graphs is the current *source graph*
* an optional information that indicates which of these named graphs is the current *target graph*
* a special graph drawn from the execution context that is called the *default source graph*
* a special graph drawn from the execution context that is called the *default target graph*
* optional information indicating which of these named graphs, if any, is the *returned source graph*
* optional information indicating which of these named graphs, if any, is the *returned target graph*
* optional *tabular data*, i.e. a potentially ordered bag of records, each having the same fixed set of fields

These redefinitions constitute the multiple graphs execution model. Under this model, a parameterized Cypher query can _also_ be described as executing within (and operating on) a given execution context and an initial query context, and finally returning the query context produced as output by the top-most `RETURN` clause.
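As a reading aid, the redefined query context can be sketched as a Python data structure. This is a hypothetical model, not a normative definition: graphs are opaque placeholders, the returned source and target graphs are recorded by name, and the fallbacks to the default graphs and to a single empty record follow the rules for partial query contexts referred to elsewhere in this CIP.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PropertyGraph:
    """Placeholder for a graph drawn from the execution context."""
    label: str


@dataclass
class QueryContext:
    """Hypothetical model of the redefined query context."""
    named_graphs: Dict[str, PropertyGraph]
    default_source: PropertyGraph
    default_target: PropertyGraph
    returned_source: Optional[str] = None  # name of a named graph, if any
    returned_target: Optional[str] = None  # name of a named graph, if any
    records: Optional[List[dict]] = None   # optional tabular data

    def __post_init__(self) -> None:
        # The returned source and target graphs must be among the named graphs.
        for name in (self.returned_source, self.returned_target):
            if name is not None and name not in self.named_graphs:
                raise ValueError(f"unknown graph name: {name}")

    def provided_source(self) -> PropertyGraph:
        # Without a returned source graph, the default source graph is provided.
        if self.returned_source is None:
            return self.default_source
        return self.named_graphs[self.returned_source]

    def provided_target(self) -> PropertyGraph:
        # Without a returned target graph, the default target graph is provided.
        if self.returned_target is None:
            return self.default_target
        return self.named_graphs[self.returned_target]

    def provided_records(self) -> List[dict]:
        # Absent tabular data is provided as a single record without any fields.
        return [{}] if self.records is None else self.records


default = PropertyGraph("default")
berlin = PropertyGraph("berlin")
ctx = QueryContext({"berlin": berlin}, default, default, returned_source="berlin")
print(ctx.provided_source().label, ctx.provided_target().label)  # berlin default
```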
@@ -140,7 +149,7 @@ A query `Q1` whose output signature is an acceptable (in terms of provided bindi

This homogeneous query composition is enabled by a uniform query context that is passed between clauses.

Note: The currently drafted subquery CIP proposes a language addition (e.g. `THEN`) for expressing this kind of query composition directly.
Note: The currently drafted subquery CIP proposes a language addition (e.g. `THEN`) for expressing this kind of query composition directly. In terms of this CIP, `THEN` is simply syntactic sugar for `WITH * GRAPHS *`.
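Read this way, `THEN` amounts to composing two queries with a pass-through step in between. The Python sketch below models a query as a function from a heavily simplified query context (a dictionary of named graphs and records) to a new context; `then`, `add_graph`, and `count_graphs` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical, simplified query context: named graphs plus tabular records.
Context = Dict[str, Any]
Query = Callable[[Context], Context]


def with_star_graphs_star(ctx: Context) -> Context:
    """WITH * GRAPHS *: pass on every field and every named graph unchanged."""
    return {
        "graphs": dict(ctx["graphs"]),
        "records": [dict(record) for record in ctx["records"]],
    }


def then(first: Query, second: Query) -> Query:
    """Q1 THEN Q2, read as Q1 followed by WITH * GRAPHS * and then Q2."""
    return lambda ctx: second(with_star_graphs_star(first(ctx)))


def add_graph(ctx: Context) -> Context:
    # Example Q1: binds a new named graph.
    return {"graphs": {**ctx["graphs"], "berlin": "G1"}, "records": ctx["records"]}


def count_graphs(ctx: Context) -> Context:
    # Example Q2: returns one record holding the number of named graphs it sees.
    return {"graphs": ctx["graphs"], "records": [{"n": len(ctx["graphs"])}]}


result = then(add_graph, count_graphs)({"graphs": {}, "records": [{}]})
print(result["records"])  # [{'n': 1}]
```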

=== Query combinators

@@ -188,27 +197,43 @@ This CIP proposes the following kinds of graph specifiers:

* `NEW GRAPH [<new-graph-name>] [AT <graph-url>]`: Reference to a newly created, empty graph that is to be bound as `<new-graph-name>` and may potentially overwrite any pre-existing graph at the provided `<graph-url>`
* `GRAPH [<new-graph-name>] AT <graph-url>`: Reference to the graph at the given `<graph-url>` that is to be bound as `<new-graph-name>`
* `GRAPH <graph-name> [AS <new-graph-name>]`: Reference to an already bound named graph
* `SOURCE GRAPH [AS <new-graph-name>]`: Reference to the currently _provided source graph_, optionally to be bound as `<new-graph-name>`
* `TARGET GRAPH [AS <new-graph-name>]`: Reference to the currently _provided target graph_, optionally to be bound as `<new-graph-name>`
* `[GRAPH] <graph-name> [AS <new-graph-name>]`: Reference to an already bound named graph
* `COPY [GRAPH] <graph-name> [AS <new-graph-name>]`: Reference to a copy of an already bound named graph
* `SOURCE GRAPH [<new-graph-name>]`: Reference to the currently _provided source graph_, optionally to be bound as `<new-graph-name>`
* `TARGET GRAPH [<new-graph-name>]`: Reference to the currently _provided target graph_, optionally to be bound as `<new-graph-name>`

If a graph specifier does not reference an already bound named graph and does not specify a `<new-graph-name>`, it is bound to a fresh, system-generated name.
The details of this are left to implementations.

It is an error to use a `<graph-specifier>` in a context where its introduced `<new-graph-name>` is already bound.
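The binding rules for graph specifiers can be sketched in Python as follows. The `GraphScope` class, its method names, and the `__graphN` scheme for generated names are assumptions made for illustration; the CIP leaves the generation of fresh names to implementations.

```python
import itertools
from typing import Dict, Optional


class GraphScope:
    """Hypothetical model of the named graphs bound at one point in a query."""

    def __init__(self) -> None:
        self.bound: Dict[str, object] = {}
        self._counter = itertools.count(1)

    def _fresh_name(self) -> str:
        # Assumed naming scheme; any name not already bound would do.
        while True:
            name = f"__graph{next(self._counter)}"
            if name not in self.bound:
                return name

    def bind(self, graph: object, new_name: Optional[str] = None) -> str:
        """Bind a graph introduced by a specifier such as NEW GRAPH or GRAPH ... AT."""
        name = new_name if new_name is not None else self._fresh_name()
        if name in self.bound:
            raise ValueError(f"graph name already bound: {name}")
        self.bound[name] = graph
        return name

    def reference(self, name: str, new_name: Optional[str] = None) -> str:
        """[GRAPH] <graph-name> [AS <new-graph-name>]: reuse or alias a bound graph."""
        if name not in self.bound:
            raise KeyError(f"graph name not bound: {name}")
        if new_name is None:
            return name  # already bound, so no fresh name is generated
        return self.bind(self.bound[name], new_name)


scope = GraphScope()
anonymous = scope.bind("empty graph")         # like NEW GRAPH without a name
berlin = scope.bind("empty graph", "berlin")  # like NEW GRAPH berlin
print(anonymous, berlin)  # __graph1 berlin
```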

=== Changing back to the default graph
==== Graph names

Graph names use the same syntax as existing variable names.

It is an error to use the same name for both a regular variable and a graph.

Additionally, this CIP proposes new syntax for changing the source and the target graph of the current query back to the the default graph provided by the outer execution context:
==== Graph URLs

The exact shape and form of graph URLs lie outside the scope of this CIP.

However, this CIP proposes that a `<graph-url>` must always be given as either a string literal or a query parameter.

This allows queries to be parameterized over the graph URLs, and hence the graphs, that they use.
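A minimal Python sketch of this restriction, assuming the parser hands over the source text of a `<graph-url>` and that parameters use the `$name` form; the function name `resolve_graph_url` is hypothetical, and escape sequences inside literals are ignored for brevity.

```python
from typing import Any, Dict


def resolve_graph_url(token: str, parameters: Dict[str, Any]) -> str:
    """Resolve the source text of a <graph-url> to a URL string.

    Accepts either a quoted string literal or a $-prefixed query parameter.
    """
    if len(token) >= 2 and token[0] in "'\"" and token[-1] == token[0]:
        return token[1:-1]
    if token.startswith("$") and len(token) > 1:
        value = parameters[token[1:]]
        if not isinstance(value, str):
            raise TypeError(f"parameter {token} must be a string, got {type(value).__name__}")
        return value
    raise ValueError(f"<graph-url> must be a string literal or a query parameter: {token}")


# The same query text can target different graphs by changing only the parameters.
print(resolve_graph_url("$url", {"url": "graph://social-network/swe"}))  # graph://social-network/swe
print(resolve_graph_url("'./ger'", {}))  # ./ger
```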

=== Changing back to no graph

Additionally, this CIP proposes new syntax for discarding the source and the target graph of the current query:

[source, cypher]
----
FROM -
INTO -
----

`DEFAULT GRAPH` is not a graph specifier; rather this syntax is a special form for discarding the current source and target graph such that the provided source and target graph are again chosen to be the default graph as specified for partial query contexts.

In consequence, both `FROM DEFAULT GRAPH` and `INTO DEFAULT GRAPH` without an explicitly given `<new-graph-name>` will not bind the default graph to a generated fresh name.
`-` is not a graph specifier; rather, this syntax is a special form for discarding the current source and target graphs, such that the provided source and target graphs are again chosen to be the default graphs, as specified for partial query contexts.

Consequently, neither `FROM -` nor `INTO -` binds the default graph to a freshly generated name.
This is different from `<graph-specifier>` semantics that will ensure that referenced graphs are always bound to a name.

=== Returning, aliasing, and selecting graphs
@@ -218,33 +243,35 @@ The newly proposed syntax is:

[source, cypher]
----
WITH [ < return-items > ] [ GRAPHS < graph-return-items > ]
RETURN [ < return-items > ] [ GRAPHS < graph-return-items > ]
WITH [ < return-items > ] [ [ INPUT ] GRAPHS < graph-return-items > ]
RETURN [ < return-items > ] [ [ INPUT ] GRAPHS < graph-return-items > ]
----

This CIP proposes the following kinds of `<graph-return-items>`:

* `<graph-item-list`: A comma separated list of `<graph-return-item>` (defined below) that are to be passed on
* `<graph-specifier-list>`: A comma separated list of `<graph-specifier>` that are to be passed on
* `*`: All named graphs are to be passed on
* `*, <graph-item-list>`: All named graphs are to be passed on together with any additional named graphs that are newly bound in `<graph-item-list>`
* `*, <graph-specifier-list>`: All named graphs are to be passed on together with any additional named graphs that are newly bound in `<graph-specifier-list>`
* `-`: No named graphs are to be passed on

The order of named graphs inherently given by `<graph-return-items` is semantically insignificant.
The order of named graphs inherently given by `<graph-return-items>` is semantically insignificant.
However, it is recommended that conforming implementations preserve this order, at least in programmatic output operations (e.g. a textual display of the list of returned graphs).
This in essence mirrors the semantics for tabular data returned by Cypher.

This CIP proposes the introduction of the following kinds of graph return items that may be included in a `<graph-item-list>`:
Both `WITH ... GRAPHS ...` and `RETURN ... GRAPHS ...` will pass on (or return, respectively) exactly the set of described named graphs.
To simplify passing on available graphs, this CIP proposes that a regular `WITH <return-items>` is taken to be syntactic sugar for `WITH <return-items> GRAPHS -` and that a regular `RETURN <return-items>` is taken to be syntactic sugar for `RETURN <return-items> GRAPHS -`.

* `<graph-specifier>`: Any graph that is described by a `<graph-specifier>` may be passed on under the provided `<new-graph-name>` (unless the given graph is an un-aliased already existing graph, it which case it's passed on with it's existing name)
* `<graph-name> [AS <new-graph-name>], ...`: Syntactic sugar for `GRAPH <graph-name> [AS <new-graph-name>]`
To simplify further, it is additionally proposed that `WITH|RETURN <return-items> INPUT GRAPHS <graph-return-items>` be syntactic sugar for `WITH|RETURN <return-items> GRAPHS <graph-return-items>, SOURCE GRAPH, TARGET GRAPH`.
However, if `<graph-return-items>` already passes on a reference to the `SOURCE GRAPH` (or to the `TARGET GRAPH`), no additional reference to that graph is added.
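The desugaring rules for `WITH` and `RETURN` can be summarized by a small Python function. Graph return items are modelled as plain strings and an empty list stands for `GRAPHS -`; this representation, the name `desugar_graph_items`, and the prefix test used to detect existing `SOURCE GRAPH` and `TARGET GRAPH` references are simplifying assumptions.

```python
from typing import List, Optional


def desugar_graph_items(graph_items: Optional[List[str]], input_graphs: bool = False) -> List[str]:
    """Expand the graph part of WITH/RETURN into an explicit list of graph return items.

    graph_items: the items after GRAPHS, or None if no GRAPHS part was written.
    input_graphs: True if the INPUT keyword preceded GRAPHS.
    An empty result list stands for `GRAPHS -` (no named graphs are passed on).
    """
    if graph_items is None:
        if input_graphs:
            raise ValueError("INPUT may only appear together with GRAPHS")
        return []  # WITH <return-items> is sugar for WITH <return-items> GRAPHS -
    items = list(graph_items)
    if input_graphs:
        # INPUT GRAPHS adds SOURCE GRAPH and TARGET GRAPH unless already referenced.
        for reference in ("SOURCE GRAPH", "TARGET GRAPH"):
            if not any(item.startswith(reference) for item in items):
                items.append(reference)
    return items


print(desugar_graph_items(["berlin", "SOURCE GRAPH src"], input_graphs=True))
# ['berlin', 'SOURCE GRAPH src', 'TARGET GRAPH']
```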

Both `WITH` and `RETURN` will pass on (or return respectively) exactly the set of described named graphs.
If the current named source graph (or the current named target graph) is not passed on, it is discarded and, due to the rules regarding partial query contexts, the provided source graph (or target graph, respectively) is again chosen to be the default graph of the outer execution context.

Note: `WITH <return-items> GRAPHS *` may be used to pass through the initial query context without having to alias source and target graphs explicitly.

=== Discarding available tabular data

It is additionally proposed that both `WITH GRAPHS <graph-return-items>` and `RETURN GRAPHS <graph-return-items>` are
special forms for discarding all tabular data such that the provided tabular input for the following clause (or query respectively) would again be the provided single record without any fields as specified by the rules for partial query contexts.
It is additionally proposed that both `WITH GRAPHS <graph-return-items>` and `RETURN GRAPHS <graph-return-items>` are syntactic sugar for `WITH - GRAPHS <graph-return-items>` (and `RETURN - GRAPHS <graph-return-items>` respectively).
These special forms may be used to discard all tabular data, such that the provided tabular input for the following clause (or query, respectively) is again the single record without any fields, as specified by the rules for partial query contexts.

Note: This syntax may be used to indicate when the gradual construction of a named graph is finished since neither fields nor the cardinality of tabular data is preserved after this point.

@@ -259,35 +286,59 @@ The proposed syntax is:

[source, cypher]
----
FROM < graph-specifier > | DEFAULT GRAPH [AS < new-graph-name >] { < graph-construction-subquery > }
INTO < graph-specifier > | DEFAULT GRAPH [AS < new-graph-name >] { < graph-construction-subquery > }
FROM < graph-specifier > | '-' { < graph-construction-subquery > }
INTO < graph-specifier > | '-' { < graph-construction-subquery > }
----

A `<graph-construction-subquery>` is an updating subquery (i.e. a sequence of clauses, including update clauses) that may or may not end in `RETURN`.
All variables bound before the nested `FROM` and `INTO` subqueries are made visible to the `<graph-construction-subquery>`.
All variables bound at the end of the `<graph-construction-subquery>` are made visible to the remaining outer query.

These forms have the exact same effect as creating aliases for the current source and target graph, then changing the current source and target graph as specified before executing the given `<graph-construction-subquery>`, and finally restoring the original source and target graphs using the aliases followed by discarding those aliases from the current scope.
These forms have the exact same effect as creating fresh aliases for the current source and target graph, then changing the current source and target graph as specified before executing the given `<graph-construction-subquery>`, and finally restoring the original source and target graphs using the aliases followed by discarding those aliases from the current scope.
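The save-and-restore behaviour described above maps naturally onto a Python context manager. In this hypothetical sketch, graphs are referred to by name, `None` stands for having no current graph (as after `FROM -`), and a shared variables dictionary models variables remaining visible across the subquery boundary; `Scope` and `graph_construction_subquery` are illustrative names only.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Optional

_KEEP = object()  # sentinel: leave the current graph unchanged


class Scope:
    """Hypothetical query state: current source/target graph names and variables."""

    def __init__(self, source: Optional[str], target: Optional[str]) -> None:
        self.source = source  # None models "no current graph" (e.g. after FROM -)
        self.target = target
        self.variables: Dict[str, Any] = {}


@contextmanager
def graph_construction_subquery(scope: Scope, source: Any = _KEEP, target: Any = _KEEP) -> Iterator[Scope]:
    # Remember the current graphs (the fresh aliases of the CIP) ...
    saved_source, saved_target = scope.source, scope.target
    if source is not _KEEP:
        scope.source = source
    if target is not _KEEP:
        scope.target = target
    try:
        yield scope
    finally:
        # ... and restore them afterwards; variables bound inside stay visible.
        scope.source, scope.target = saved_source, saved_target


scope = Scope(source="sn_updated", target=None)
with graph_construction_subquery(scope, target="rollup"):
    scope.variables["a_country"] = "Sweden"  # bound inside the subquery
    print(scope.source, scope.target)        # sn_updated rollup
print(scope.source, scope.target)            # sn_updated None
print(scope.variables)                       # {'a_country': 'Sweden'}
```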

=== Updating graphs

This CIP proposes the following update semantics for Cypher with support for multiple graphs.

Entities are always created in and deleted from the currently provided target graph.

Semantically, all effects of an updating clause must be made visible before execution proceeds with the next clause.
In other words, a conforming implementation must ensure that a later clause always sees the complete set of updates of a preceding updating clause.
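One way to satisfy this requirement is sketched below in Python: each clause computes its updates from a snapshot of the graph, and all of them are applied before the next clause runs. Reading from a snapshot within a single clause is a choice made by this sketch rather than something the CIP prescribes, and the graph is reduced to a set of node names; `run_clauses` and the example clauses are hypothetical.

```python
from typing import Callable, FrozenSet, List, Set

# Hypothetical simplification: a graph is a set of node names, and a clause
# reads a snapshot of the graph and returns the nodes it wants to create.
Clause = Callable[[FrozenSet[str]], Set[str]]


def run_clauses(graph: Set[str], clauses: List[Clause]) -> Set[str]:
    for clause in clauses:
        created = clause(frozenset(graph))  # the clause reads a stable snapshot
        graph |= created                    # all its updates land before the next clause
    return graph


def create_a(g: FrozenSet[str]) -> Set[str]:
    return {"a"}


def create_b_if_a(g: FrozenSet[str]) -> Set[str]:
    # Sees "a" only because the previous clause's updates are complete.
    return {"b"} if "a" in g else set()


print(sorted(run_clauses(set(), [create_a, create_b_if_a])))  # ['a', 'b']
```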

A single update clause may perform multiple conflicting updates on the same node or relationship.
In this situation, the outcome is undefined.

Conflicting updates are considered to be out of scope of this CIP.

For now, it is proposed that a conforming implementation must choose either the original value, one of the values written, or `NULL` as the final outcome of a conflicting update.
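The permitted outcomes can be stated as a small Python check. `None` stands in for Cypher `NULL`, and `last_write_wins` is just one hypothetical policy that happens to conform; the CIP does not mandate any particular choice.

```python
from typing import Any, List, Sequence


def allowed_outcomes(original: Any, written: Sequence[Any]) -> List[Any]:
    """Final values the proposal permits for a property hit by conflicting writes."""
    return [original, *written, None]  # None stands in for Cypher NULL


def last_write_wins(original: Any, written: Sequence[Any]) -> Any:
    # One conforming policy among several (an assumption, not mandated by the CIP).
    return written[-1] if written else original


writes = ["Berlin", "Santiago"]
outcome = last_write_wins("Stockholm", writes)
print(outcome, outcome in allowed_outcomes("Stockholm", writes))  # Santiago True
```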

=== Query signature declarations

Finally this CIP proposed using the `WITH` clause as the initial clause in a query for declaring all query input arguments:
Finally, this CIP proposes using the `WITH` clause as the initial clause in a query for declaring all query inputs:

[source, cypher]
----
WITH [ < return-items > ] [ GRAPHS < graph-return-items > ]
WITH < return-items > [ [ INPUT ] GRAPHS < graph-return-items > ]
WITH [ < return-items > ] [ INPUT ] GRAPHS < graph-return-items >
----

It is proposed that using `WITH` as the initial clause here is to be called a *query input declaration* while the use of `RETURN` as the last clause is to be called a *query output declaration* henceforth.
It is proposed that using `WITH` as the initial clause in a query is to be called a *query input declaration* while the use of `RETURN` as the last clause is to be called a *query output declaration*.

Query input declarations are subject to the following limitations:

* All return items are expected to be over an imagined set of input variables from the previous query
* All such referenced variables must be declared or aliased explicitly by another return item
* The use of `WITH *` and `WITH *, ...` causes all undeclared incoming variables to be renamed to fresh system generated variable names
* The use of `GRAPH *` and `GRAPH *, ...` causes all incoming graphs to be renamed to fresh system generated graph names
* All return item expressions are expected to reference an imagined set of input variables from the previous query
* All such referenced variables must be declared or aliased explicitly by another return item unless the query input declaration starts with `WITH *` or `WITH *,`
* If the input query context provides additional, undeclared variables or graphs, those inputs are to be silently discarded by query composition or execution

If the input query context provides additional variables or graphs, those inputs are to be silently discarded by query composition or execution.
A query that does not start with a query input declaration is assumed to start with `WITH - GRAPHS -`, i.e. to run in isolation and to initially read from the default source graph and write to the default target graph.
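The treatment of query inputs can be approximated by the Python sketch below. It only models plain names and `*` (not `WITH *, ...`, aliases, or expressions), and raising an error for a declared but missing input is an assumption; `bind_query_inputs` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def _select(incoming: Dict[str, Any], declared: Optional[Iterable[str]]) -> Dict[str, Any]:
    if declared is None:
        return {}  # nothing declared, so nothing is passed in
    names = list(declared)
    if "*" in names:
        return dict(incoming)  # WITH * or GRAPHS *: keep every incoming binding
    missing = [name for name in names if name not in incoming]
    if missing:
        # Assumption: composition fails if a declared input is not provided.
        raise KeyError(f"declared inputs not provided: {missing}")
    # Anything incoming but undeclared is silently discarded.
    return {name: incoming[name] for name in names}


def bind_query_inputs(
    incoming_variables: Dict[str, Any],
    incoming_graphs: Dict[str, Any],
    declared_variables: Optional[Iterable[str]] = None,
    declared_graphs: Optional[Iterable[str]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Apply a simplified query input declaration to an incoming query context.

    Passing None for both declarations models a query without an input
    declaration, i.e. one that starts with WITH - GRAPHS -.
    """
    return (
        _select(incoming_variables, declared_variables),
        _select(incoming_graphs, declared_graphs),
    )


variables, graphs = bind_query_inputs(
    {"a": 1, "b": 2}, {"berlin": "G1", "tmp": "G2"}, ["a"], ["berlin"]
)
print(variables, graphs)  # {'a': 1} {'berlin': 'G1'}
```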

== Grammar

Proposed syntax changes
[source, ebnf]
----
// TODO
----

== Examples

@@ -327,7 +378,7 @@ INTO NEW GRAPH berlin
CREATE (a)-[:FRIEND]->(b) WHERE c.name = "Berlin"
INTO NEW GRAPH santiago
CREATE (a)-[:FRIEND]->(b) WHERE c.name = "Santiago"
FROM DEFAULT GRAPH
FROM -
RETURN c.name AS city, count(r) AS num_friends GRAPHS berlin, santiago
----

@@ -347,7 +398,7 @@ CREATE (a)-[:POSSIBLE_FRIEND]->(c)
WITH GRAPHS *
// Switch context to named graph.
FROM GRAPH recommendations
FROM recommendations
MATCH (a:Person)-[e:POSSIBLE_FRIEND]->(b:Person)
// Return tabular and graph output
RETURN a.name, b.name, count(e) AS cnt
@@ -374,12 +425,12 @@ SET a.country = cn.name
// ... and finally discard all tabular data and cardinality
WITH GRAPHS *
FROM GRAPH sn_updated
FROM sn_updated
MATCH (a:Person)-[e:KNOWS]->(b:Person)
WITH a.country AS a_country, b.country AS b_country, count(a) AS a_cnt, count(b) AS b_cnt, count(e) AS e_cnt
INTO NEW GRAPH rollup {
MERGE (:Persons {country: a_country, cnt: a_cnt})-[:KNOW {cnt: e_cnt}]->(:Persons {country: b_country, cnt: b_cnt})
}
}
// Return final graph output
RETURN GRAPHS rollup
----
@@ -394,29 +445,29 @@ MATCH (a:Person)-[e]->(b:Person),
(a)-[:LIVES_IN]->()-[:IS_LOCATED_IN]->(c:Country {name: 'Sweden'}),
(b)-[:LIVES_IN]->()-[:IS_LOCATED_IN]->(c)
// Create a persistent graph at 'graph://social-network/swe'
INTO GRAPH sweden_people AT './swe' {
INTO NEW GRAPH sweden_people AT './swe' {
// connecting persons that live in the same city in Sweden.
CREATE (a)-[e]->(b)
}
}
// Finally discard all tabular data and cardinality
WITH GRAPHS *
MATCH (a:Person)-[e]->(b:Person),
(a)-[:LIVES_IN]->()-[:IS_LOCATED_IN]->(c:Country {name: 'Germany'}),
(b)-[:LIVES_IN]->()-[:IS_LOCATED_IN]->(c)
// Create a persistent graph at 'graph://social-network/ger'
INTO GRAPH german_people AT './ger' {
INTO NEW GRAPH german_people AT './ger' {
// connecting persons that live in the same city in Germany.
CREATE (a)-[e]->(b)
}
// Finally discard all tabular data and cardinality
WITH GRAPHS *
// Start query on the 'sweden_people' graph
FROM GRAPH sweden_people
FROM sweden_people
MATCH p=(a)--(b)--(c)--(a) WHERE NOT (a)--(c)
// Create a temporary graph 'swedish_triangles'
INTO GRAPH swedish_triangles {
INTO NEW GRAPH swedish_triangles {
ADD p
}
// and return it together with a count of its content