Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document main IR classes #11972

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from
Open

Document main IR classes #11972

wants to merge 9 commits into from

Conversation

GregoryTravis
Copy link
Contributor

@GregoryTravis GregoryTravis commented Jan 3, 2025

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.
  • If meaningful changes were made to logic or tests affecting Enso Cloud integration in the libraries,
    or the Snowflake database integration, a run of the Extra Tests has been scheduled.
    • If applicable, it is suggested to paste a link to a successful run of the Extra Tests.

@GregoryTravis GregoryTravis added the CI: No changelog needed Do not require a changelog entry for this PR. label Jan 3, 2025
@GregoryTravis GregoryTravis marked this pull request as ready for review January 3, 2025 17:32
Comment on lines 54 to 55
This also includes `Let` and `Let_Ref` variants which are used to express
let-style bindings using SQL `with` syntax.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the first paragraph it says that column expression is one of Column/Operation/Constant/Literal/Text_Literal, and only later we see it can also be the Let.

I'd rephrase to also include Let/Let_Ref in the first paragraph perhaps with a note "which are explained later".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +60 to +61
table built from other tables (`Join`, `Union`), or a constant value (`Query`,
`Literal_Values`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd distinguish Query from Literal_Values - they differ quite a lot.

Suggested change
table built from other tables (`Join`, `Union`), or a constant value (`Query`,
`Literal_Values`).
table built from other tables (`Join`, `Union`), from a raw SQL query passed as text (`Query`) or constructed from constants
`Literal_Values`).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important that the first paragraph of each section gives a summary with a small set of broad categories, not necessarily comprehensive, to help understanding. Otherwise, this might as well be constructor-level documentation and should be in the source, rather than here. Both Query and Literal_Value are constants in the sense that they have meaning without additional context, and are not built out of other table expressions, so I think they go together.

But I do think the distinction is important so I added another section describing both the literal values.


A `DB_Column` contains its own reference to a `Context`, so it can be read
without relying on the `DB_Table` object that it came from. In fact, `DB_Column`
values can be thought of as not being attached to a particular table. Instead,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence ("can be thought o as not being attached to a particular table") is a bit confusing to me.

The DB_Column is standalone and not necessarily directly tied to a DB_Table, but from SQL/DB standpoint it often is tied to some table (or more complex context).

Maybe,

Suggested change
values can be thought of as not being attached to a particular table. Instead,
values are standalone and not directly tied to `DB_Table` instance. Instead,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

they are connected to the `Context` objects they contain, and all `DB_Columns`
from a single table expression must share the same `Context`. This corresponds
to the idea that the columns expressions in a `SELECT` clause all refer to the
same table expression in the `FROM` clause.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this may be worth mentioning:

And also we can 'merge' DB Columns that have the same Context into a single DB_Table e.g. via DB_Table.set, allowing to add more derived expressions to existing tables. This is verified by the check_integrity method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 104 to 109
A `Context` serves as a table expression, but really inherits this from the
`From_Spec` that it contains. It also contains `where`, `order by`, `group by`
and `limit` clauses.

A `From_Spec` serves as a table expression, and can be a base value (table name,
constant, etc), join, union, or subquery:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rephrase something about this - both of these 'serve as a table expression' it is not bringing that much information. I think we can try to describe them in a more distinct way.

I'd say that (but still needs a bit better phrasing): the Context is everything that is after the FROM clause in SQL - from where we are taking the data (the From_Spec) as well as other modifiers - WHERE, ORDER BY etc. The From_Spec is then just the 'shape' that the FROM part itself can take. It is not a table expression on its own IMHO - it may just refer to tables or their combinations, but a table expression is more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember from the London meetings if we wanted to rename these? Because I think there was some suggestions but I don't remember know what it was. We should probably find the sketches we made.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, mostly -- I made it clear that Context includes the from clause and everything after it. I think it's still useful to describe both of these as 'table expressions', since the two main categories here (table and column expressions) are important categories for understanding the whole IR. And both From_Spec and Context contain enough information to specify a result set (even though they are not used that way, directly).

We did have some ideas about renaming things; after we've agreed on this basic documentation I think that's the next step.

Comment on lines +63 to +64
`Sub_Query` is used to nest a query as a subquery, replacing column expressions
with aliases to those same column expressions within the subquery. This is used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not exactly true. The 'alias' is just one 'use-case'.

What Sub_Query does is it nests a full sub query into the From_Spec, meaning that a whole Context plus a set of column expressions (SQL_Expression) are nested in it and then a new set of columns that reference this new context can reference the columns from within that nested subquery.

It allows more than aliases, as the subquery can contain more complicated expressions that from now on can be referenced just by their names.

Well after a thought perhaps that's what you mean by 'replacing column expressions with aliases', but it was not immediately clear to me so I was wondering if we could add some details, and perhaps an example here?

To show that e.g. when we have a query SELECT 1+2*T.A, T.B FROM T the subquery allows to refer to the 'complex' expression 1+2*T.A by a simple alias name: e.g. becoming SELECT SUB.EXPR1, SUB.B FROM (SELECT 1+2*T.A AS EXPR1, T.B AS B FROM T) AS SUB.


I don't know, perhaps I'm overcomplicating this explanation too much. I just wanted to more clearly show that Sub_Query allows 'baking' in some complex expressions and giving them 'simpler' names - in that regard it has some similarity to the Let construct as well although it has different use-cases because it also creates the sub-expression which (as you noted) makes any ORDER BY etc. from the outer query independent from the inner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops I completely missed that you actually described this very well in a section some lines below 🤦 Sorry.

The explanation below looks perfect. I'd then just add 'see section ... for more explanation of subqueries'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@GregoryTravis
Copy link
Contributor Author

Thanks for the comments @radeusgd -- I more or less implemented all of them.

Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a good step to me - but will leave to Radek for final changes.

@@ -0,0 +1,210 @@
---
layout: developer-doc
title: Database IR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still planning on renaming to SQL AST?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely on the list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants