[RFC] Move PPL Spec & AST into this repository #23
Labels: enhancement (New feature or request)
Comments
This was referenced Nov 27, 2024
Step 6 is just a link to this issue. Can the dashboard use a Jar artifact from Maven, or would it need something else like an NPM module?
@normanj-bitquill
[Catch All Triage - 1, 2, 3, 4]
Is your feature request related to a problem?
The purpose of this RFC is to consolidate the PPL specification (the ANTLR grammar) and the query-to-AST construction, which the different PPL execution engines currently each maintain, into a single repository.
This repository will maintain the most up-to-date vocabulary and documentation and will serve as the reference for any downstream engine.
A single grammar location makes evolving the language simpler and more consistent, and it moves the responsibility for updating a downstream engine onto the engine implementing the spec rather than onto the grammar and language maintainers.
Implemented solution
Our goal is to remove the PPL grammar and AST structure from each of the downstream engines and consolidate them into a single artifact that any existing or future physical execution engine can use.
The single responsibility of an execution engine would then be to translate the PPL AST into that engine's logical plan, or into its physical plan where the engine has no logical layer (as with OpenSearch).
In the Spark PPL use case, for example, we implemented CatalystQueryPlanVisitor, a traverser that walks the PPL AST and transforms the PPL logical plan into a Catalyst logical plan, which is then submitted to Spark to generate the physical plan and execute the query.
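To make the division of labor concrete, here is a minimal Java sketch of the visitor pattern this describes. Everything in it is a hypothetical stand-in: the node classes (`Source`, `Where`, `Fields`), the `PplVisitor` interface, and `EnginePlanVisitor` are illustrative names, not the actual classes of the shared artifact or of CatalystQueryPlanVisitor, and the "plan" is just a string rather than a real Catalyst or OpenSearch plan.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified AST nodes that a shared PPL artifact might expose.
interface PplNode {
    <R> R accept(PplVisitor<R> visitor);
}

final class Source implements PplNode {
    final String index;
    Source(String index) { this.index = index; }
    public <R> R accept(PplVisitor<R> visitor) { return visitor.visitSource(this); }
}

final class Where implements PplNode {
    final PplNode child;
    final String condition;
    Where(PplNode child, String condition) { this.child = child; this.condition = condition; }
    public <R> R accept(PplVisitor<R> visitor) { return visitor.visitWhere(this); }
}

final class Fields implements PplNode {
    final PplNode child;
    final List<String> names;
    Fields(PplNode child, List<String> names) { this.child = child; this.names = names; }
    public <R> R accept(PplVisitor<R> visitor) { return visitor.visitFields(this); }
}

// The visitor contract would live next to the AST in the shared artifact.
interface PplVisitor<R> {
    R visitSource(Source node);
    R visitWhere(Where node);
    R visitFields(Fields node);
}

// Engine-side code: the only piece an engine would have to write.
// A string stands in for the engine's logical (or physical) plan.
final class EnginePlanVisitor implements PplVisitor<String> {
    public String visitSource(Source node) {
        return "Scan(" + node.index + ")";
    }
    public String visitWhere(Where node) {
        return "Filter(" + node.condition + ", " + node.child.accept(this) + ")";
    }
    public String visitFields(Fields node) {
        return "Project([" + String.join(", ", node.names) + "], " + node.child.accept(this) + ")";
    }
}

final class VisitorDemo {
    public static void main(String[] args) {
        // Hand-built AST for: source=logs | where status = 500 | fields host, status
        PplNode ast = new Fields(
            new Where(new Source("logs"), "status = 500"),
            Arrays.asList("host", "status"));
        System.out.println(ast.accept(new EnginePlanVisitor()));
    }
}
```

In this shape, the grammar, the AST nodes, and the visitor interface would ship in the shared artifact, while each engine supplies only its own visitor implementation.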
Advantages of the selected approach:
Today each engine has to maintain its own version of the PPL ANTLR grammar and documentation. These versions tend to diverge from one another because of each engine's specifics.
Once these components are extracted into a single location, the dependency becomes immutable and forces engine implementations to follow the grammar more closely. In the rare cases where divergence is needed, an engine can add a distinct UDF to accommodate the difference.
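One way to read the UDF escape hatch is as a function registry. This assumes the shared grammar parses function calls generically by name and leaves each engine to decide which names it can resolve. The Java sketch below illustrates that assumption only; `UdfRegistry`, its methods, and the function names `upper` and `engine_reverse` are hypothetical and do not come from any existing PPL engine.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: the grammar stays fixed, engines vary only in what they register.
final class UdfRegistry {
    private final Map<String, Function<String, String>> functions = new HashMap<>();

    void register(String name, Function<String, String> implementation) {
        functions.put(name, implementation);
    }

    String call(String name, String argument) {
        Function<String, String> implementation = functions.get(name);
        if (implementation == null) {
            // Unknown names fail at resolution time, not at parse time.
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        return implementation.apply(argument);
    }
}

final class UdfDemo {
    public static void main(String[] args) {
        UdfRegistry registry = new UdfRegistry();
        // A function every engine is assumed to provide.
        registry.register("upper", s -> s.toUpperCase(Locale.ROOT));
        // An engine-specific addition that needs no grammar change.
        registry.register("engine_reverse", s -> new StringBuilder(s).reverse().toString());

        System.out.println(registry.call("upper", "ppl"));
        System.out.println(registry.call("engine_reverse", "ppl"));
    }
}
```

Under this reading, an engine-specific capability costs one extra registration on the engine side and leaves the shared ANTLR grammar untouched.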
Advantages:
The following diagram shows the high-level architecture of the selected implementation:
The logical architecture shows the following artifacts:
Libraries:
Drivers:
Task:
Sub-Tasks
This project will be composed of sub-tasks for an incremental and continuous process:
Do you have any additional context?
ppl on spark
ppl on opensearch
PPL spark ANTLR grammar
PPL OpenSearch ANTLR grammar
ppl spark implementation issue