Document main IR classes #11972
docs/libraries/database-ir.md
Outdated
This also includes `Let` and `Let_Ref` variants which are used to express
let-style bindings using SQL `with` syntax.
The first paragraph says that a column expression is one of `Column`/`Operation`/`Constant`/`Literal`/`Text_Literal`, and only later do we see that it can also be a `Let`.
I'd rephrase it to also include `Let`/`Let_Ref` in the first paragraph, perhaps with a note "which are explained later".
Done
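For readers unfamiliar with the SQL `with` syntax mentioned in this thread, here is a minimal sketch of a let-style binding. It uses Python's standard `sqlite3` module; the table `t`, the bound name `doubled`, and the rows are made up for illustration, and this is not necessarily the SQL that `Let`/`Let_Ref` actually generate.

```python
import sqlite3

# In-memory database with a hypothetical table `t` (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# The WITH clause binds the name `doubled` once; the main query then
# refers to the bound expression `d` by name, twice.
query = """
WITH doubled AS (SELECT a, 2 * a AS d FROM t)
SELECT a, d + d AS quadrupled FROM doubled ORDER BY a
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The point of the binding is that `2 * a` is written once and reused by name, which is the same idea a let-expression captures.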
table built from other tables (`Join`, `Union`), or a constant value (`Query`,
`Literal_Values`).
I'd distinguish `Query` from `Literal_Values` - they differ quite a lot.
Suggested change:
- table built from other tables (`Join`, `Union`), or a constant value (`Query`,
- `Literal_Values`).
+ table built from other tables (`Join`, `Union`), from a raw SQL query passed as text (`Query`) or constructed from constants
+ (`Literal_Values`).
I think it's important that the first paragraph of each section gives a summary with a small set of broad categories, not necessarily comprehensive, to help understanding. Otherwise, this might as well be constructor-level documentation and should be in the source, rather than here. Both `Query` and `Literal_Values` are constants in the sense that they have meaning without additional context, and are not built out of other table expressions, so I think they go together.
But I do think the distinction is important, so I added another section describing both kinds of literal values.
docs/libraries/database-ir.md
Outdated
A `DB_Column` contains its own reference to a `Context`, so it can be read
without relying on the `DB_Table` object that it came from. In fact, `DB_Column`
values can be thought of as not being attached to a particular table. Instead,
This sentence ("can be thought of as not being attached to a particular table") is a bit confusing to me.
The `DB_Column` is standalone and not necessarily directly tied to a `DB_Table`, but from an SQL/DB standpoint it often is tied to some table (or a more complex context).
Maybe,
Suggested change:
- values can be thought of as not being attached to a particular table. Instead,
+ values are standalone and not directly tied to a `DB_Table` instance. Instead,
Done
docs/libraries/database-ir.md
Outdated
they are connected to the `Context` objects they contain, and all `DB_Columns`
from a single table expression must share the same `Context`. This corresponds
to the idea that the column expressions in a `SELECT` clause all refer to the
same table expression in the `FROM` clause.
Perhaps this may be worth mentioning:
We can also 'merge' `DB_Column`s that have the same `Context` into a single `DB_Table`, e.g. via `DB_Table.set`, which allows adding more derived expressions to existing tables. This is verified by the `check_integrity` method.
Done
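The shared-`Context` rule discussed in this thread can be sketched as follows. This is a toy Python model, not the Enso implementation: the classes `Context`, `DBColumn`, and `DBTable`, their fields, and the exact behaviour of `set` and `check_integrity` are assumptions made for illustration, loosely named after the types mentioned above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Context:
    # Hypothetical stand-in: only the FROM part is modelled here.
    from_spec: str


@dataclass(frozen=True)
class DBColumn:
    name: str
    expression: str
    context: Context  # each column carries its own Context reference


@dataclass
class DBTable:
    context: Context
    columns: list = field(default_factory=list)

    def check_integrity(self) -> bool:
        # Every column must share the table's Context.
        return all(c.context == self.context for c in self.columns)

    def set(self, column: DBColumn) -> "DBTable":
        # Merging is only allowed for a column built on the same Context.
        if column.context != self.context:
            raise ValueError("column comes from a different Context")
        return DBTable(self.context, self.columns + [column])


ctx = Context("T")
table = DBTable(ctx, [DBColumn("A", "T.A", ctx)])
# Add a derived expression over the same Context.
table = table.set(DBColumn("A2", "2 * T.A", ctx))
```

In this sketch a column built on another `Context` (for example `Context("U")`) is rejected by `set`, which mirrors the idea that all expressions in one `SELECT` clause refer to the same `FROM` clause.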
docs/libraries/database-ir.md
Outdated
A `Context` serves as a table expression, but really inherits this from the
`From_Spec` that it contains. It also contains `where`, `order by`, `group by`
and `limit` clauses.

A `From_Spec` serves as a table expression, and can be a base value (table name,
constant, etc), join, union, or subquery:
I'd rephrase something about this - saying that both of these 'serve as a table expression' does not bring much information. I think we can try to describe them in a more distinct way.
I'd say that (but it still needs a bit better phrasing): the `Context` is everything that is after the `FROM` clause in SQL - where we are taking the data from (the `From_Spec`) as well as other modifiers - `WHERE`, `ORDER BY` etc. The `From_Spec` is then just the 'shape' that the `FROM` part itself can take. It is not a table expression on its own IMHO - it may just refer to tables or their combinations, but a table expression is more.
Do you remember from the London meetings if we wanted to rename these? I think there were some suggestions, but I don't remember now what they were. We should probably find the sketches we made.
Done, mostly -- I made it clear that `Context` includes the `from` clause and everything after it. I think it's still useful to describe both of these as 'table expressions', since the two main categories here (table and column expressions) are important categories for understanding the whole IR. And both `From_Spec` and `Context` contain enough information to specify a result set (even though they are not used that way, directly).
We did have some ideas about renaming things; after we've agreed on this basic documentation I think that's the next step.
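To make the division of labour in this thread concrete, here is a minimal sketch in Python. The `Context` dataclass, its field names, and the `render` helper are hypothetical simplifications: `from_spec` is reduced to a plain string, whereas the real `From_Spec` can be a table, join, union, or subquery.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Context:
    from_spec: str                   # stand-in for From_Spec (the FROM part)
    where: Optional[str] = None      # the remaining clauses live on the Context
    group_by: Optional[str] = None
    order_by: Optional[str] = None
    limit: Optional[int] = None


def render(columns: List[str], ctx: Context) -> str:
    """Combine column expressions with a Context into one SELECT statement."""
    parts = ["SELECT " + ", ".join(columns), "FROM " + ctx.from_spec]
    if ctx.where is not None:
        parts.append("WHERE " + ctx.where)
    if ctx.group_by is not None:
        parts.append("GROUP BY " + ctx.group_by)
    if ctx.order_by is not None:
        parts.append("ORDER BY " + ctx.order_by)
    if ctx.limit is not None:
        parts.append(f"LIMIT {ctx.limit}")
    return " ".join(parts)


sql = render(["T.A", "T.B"], Context("T", where="T.A > 0", order_by="T.B", limit=10))
print(sql)
```

The column expressions supply the `SELECT` list, while the `Context` supplies the `FROM` clause and everything after it.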
`Sub_Query` is used to nest a query as a subquery, replacing column expressions
with aliases to those same column expressions within the subquery. This is used
I think this is not exactly true. The 'alias' is just one 'use-case'.
What `Sub_Query` does is nest a full subquery into the `From_Spec`, meaning that a whole `Context` plus a set of column expressions (`SQL_Expression`) are nested in it, and then a new set of columns that reference this new context can reference the columns from within that nested subquery.
It allows more than aliases, as the subquery can contain more complicated expressions that from now on can be referenced just by their names.
Well, after a thought, perhaps that's what you mean by 'replacing column expressions with aliases', but it was not immediately clear to me, so I was wondering if we could add some details, and perhaps an example here?
To show that e.g. when we have a query `SELECT 1+2*T.A, T.B FROM T`, the subquery allows us to refer to the 'complex' expression `1+2*T.A` by a simple alias name, e.g. becoming `SELECT SUB.EXPR1, SUB.B FROM (SELECT 1+2*T.A AS EXPR1, T.B AS B FROM T) AS SUB`.
I don't know, perhaps I'm overcomplicating this explanation too much. I just wanted to more clearly show that `Sub_Query` allows 'baking' in some complex expressions and giving them 'simpler' names - in that regard it has some similarity to the `Let` construct as well, although it has different use-cases because it also creates the sub-expression, which (as you noted) makes any `ORDER BY` etc. from the outer query independent from the inner.
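The equivalence of the two queries in the example above can be checked with a small sketch. It uses Python's standard `sqlite3` module and an in-memory table `T` with made-up rows, all of which are assumptions for illustration and not part of the PR. The `ORDER BY` clauses are added only to make the row order deterministic.

```python
import sqlite3

# Hypothetical table T with two illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A INTEGER, B TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?)", [(1, "x"), (2, "y")])

# Flat form: the complex expression appears directly in the SELECT list.
flat = "SELECT 1+2*T.A, T.B FROM T ORDER BY T.A"

# Nested form: the expression is 'baked' into the subquery under the alias
# EXPR1, and the outer query refers to it only by that name.
nested = (
    "SELECT SUB.EXPR1, SUB.B "
    "FROM (SELECT 1+2*T.A AS EXPR1, T.B AS B FROM T) AS SUB "
    "ORDER BY SUB.EXPR1"
)

flat_rows = conn.execute(flat).fetchall()
nested_rows = conn.execute(nested).fetchall()
print(flat_rows == nested_rows)
```

Both forms return the same rows; the nested form simply lets later expressions use `SUB.EXPR1` without repeating `1+2*T.A`.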
Ooops I completely missed that you actually described this very well in a section some lines below 🤦 Sorry.
The explanation below looks perfect. I'd then just add 'see section ... for more explanation of subqueries'.
Done
Thanks for the comments @radeusgd -- I more or less implemented all of them.
Looks like a good step to me - but I will leave it to Radek for final changes.
@@ -0,0 +1,210 @@
---
layout: developer-doc
title: Database IR
Are we still planning on renaming to SQL AST?
Yes, definitely on the list.