Add support of GroK command including default patterns #598
Signed-off-by: YANGDB <[email protected]> Signed-off-by: YANGDB <[email protected]>
ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/CatalystQueryPlanVisitor.java
Outdated
@@ -0,0 +1,3 @@
# Forked from https://github.com/elasticsearch/logstash/tree/v1.4.0/patterns
How are the resource files in this folder used in this PR? I cannot find any test that leverages the files in the resource/patterns folder.
`GrokExpression` statically loads the default patterns in the `patterns` folder:
    public static class GrokExpression extends ParseExpression {
        private static final GrokCompiler grokCompiler = GrokCompiler.newInstance();
        static {
            grokCompiler.registerDefaultPatterns();
        }
Afterwards, any pattern reference in a string such as `'.+@%{HOSTNAME:host}'` is matched and replaced with the default patterns defined in that folder.
The `FlintSparkPPLGrokITSuite` uses these patterns for the tests.
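To make the "matched and replaced" step concrete, here is a minimal standalone sketch of how a reference such as `%{HOSTNAME:host}` could be expanded into a plain Java regex with a named group. It is an illustration only and not the PR's `GrokCompiler`: the class name `GrokExpansionSketch`, the helpers `expand` and `extract`, and the three simplified pattern definitions are all assumptions made for this example, not the contents of the `patterns` folder.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the GrokCompiler used in this PR.
class GrokExpansionSketch {
    // Simplified stand-in definitions (assumption); the real default patterns are richer.
    static final Map<String, String> DEFINITIONS = new LinkedHashMap<>();
    static {
        DEFINITIONS.put("HOSTNAME", "[A-Za-z0-9][A-Za-z0-9.-]*");
        DEFINITIONS.put("NUMBER", "[+-]?\\d+(?:\\.\\d+)?");
        DEFINITIONS.put("GREEDYDATA", ".*");
    }

    // Matches %{NAME} or %{NAME:alias}; the alias must be a valid Java group name.
    static final Pattern REFERENCE =
            Pattern.compile("%\\{(\\w+)(?::([A-Za-z][A-Za-z0-9]*))?\\}");

    // Replaces every pattern reference with its regex definition.
    static String expand(String grok) {
        Matcher m = REFERENCE.matcher(grok);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String regex = DEFINITIONS.get(m.group(1));
            if (regex == null) {
                throw new IllegalArgumentException("Unknown pattern: " + m.group(1));
            }
            // An aliased reference becomes a named group, an unaliased one a non-capturing group.
            String replacement = m.group(2) == null
                    ? "(?:" + regex + ")"
                    : "(?<" + m.group(2) + ">" + regex + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the value captured for the given field, or "" when the input does not match.
    static String extract(String grok, String input, String field) {
        Matcher m = Pattern.compile(expand(grok)).matcher(input);
        return m.find() ? m.group(field) : "";
    }

    public static void main(String[] args) {
        // prints .+@(?<host>[A-Za-z0-9][A-Za-z0-9.-]*)
        System.out.println(expand(".+@%{HOSTNAME:host}"));
        // prints pyrami.com
        System.out.println(extract(".+@%{HOSTNAME:host}", "amberduke@pyrami.com", "host"));
    }
}
```

The sample address is made up for the example. Nested references (a definition that itself contains `%{...}`), which real grok pattern files rely on, are deliberately left out to keep the sketch short.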
ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/utils/ParseStrategy.java
Outdated
ppl-spark-integration/src/main/java/org/opensearch/sql/common/grok/GrokCompiler.java
Outdated
integ-test/src/integration/scala/org/opensearch/flint/spark/ppl/FlintSparkPPLGrokITSuite.scala
Outdated
ppl-spark-integration/README.md
Outdated
- `source=accounts | grok email '.+@%{HOSTNAME:host}' | eval eval_result=1 | fields host, eval_result`
- `source=accounts | grok street_address '%{NUMBER} %{GREEDYDATA:address}' | fields address`
- `source=logs | grok message '%{COMMONAPACHELOG}' | fields COMMONAPACHELOG, timestamp, response, bytes`
-
L324 needs to be removed.
BTW, can we add a line explaining the current limitation of Grok? For example:
Limitation: Overriding an existing field is unsupported:
source=accounts | grok address '%{NUMBER} %{GREEDYDATA:address}' | fields address
LGTM for the rest
* Add support of GroK command including default patterns
  Signed-off-by: YANGDB <[email protected]>
* Add support of GroK command including default patterns
  Signed-off-by: YANGDB <[email protected]>
* fix grok parsing on projected fields
  Signed-off-by: YANGDB <[email protected]>
* update named regexp field index selection to the RegExpMatcher
  Signed-off-by: YANGDB <[email protected]>
* update named regexp field index selection to the RegExpMatcher
  Signed-off-by: YANGDB <[email protected]>
* update comments
  Signed-off-by: YANGDB <[email protected]>
* add `scalastyle:off` to ignore a long regexp test seting
  Signed-off-by: YANGDB <[email protected]>
* fix according to comments
  Signed-off-by: YANGDB <[email protected]>
* update spaces format
  Signed-off-by: YANGDB <[email protected]>
* update README.md
  Signed-off-by: YANGDB <[email protected]>
---------
Signed-off-by: YANGDB <[email protected]>
(cherry picked from commit 176e150)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/opensearch-spark/backport-0.5-nexus 0.5-nexus
# Navigate to the new working tree
pushd ../.worktrees/opensearch-spark/backport-0.5-nexus
# Create a new branch
git switch --create backport/backport-598-to-0.5-nexus
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 176e150e0e7b5e94872d2f4ca8b3c2388f7a40f9
# Push it to GitHub
git push --set-upstream origin backport/backport-598-to-0.5-nexus
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/opensearch-spark/backport-0.5-nexus
Then, create a pull request where the
* Add support of GroK command including default patterns
* Add support of GroK command including default patterns
* fix grok parsing on projected fields
* update named regexp field index selection to the RegExpMatcher
* update named regexp field index selection to the RegExpMatcher
* update comments
* add `scalastyle:off` to ignore a long regexp test seting
* fix according to comments
* update spaces format
* update README.md
---------
(cherry picked from commit 176e150)
Signed-off-by: YANGDB <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
- Add the PPL `grok` command described here
- Map `grok` to the Spark SQL `regexp_extract` command
- Related campaign: #408
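For readers unfamiliar with the target function, the following is a rough Java emulation of what Spark SQL `regexp_extract(str, regexp, idx)` does, as I understand its documented behaviour: it finds the first match and returns the requested group, or an empty string when nothing matches. The class `RegexpExtractSketch` and the method `regexpExtract` are names invented for this sketch. The regex is a hand-expanded, simplified form of `'.+@%{HOSTNAME:host}'`, and the group index is hard-coded here, whereas how the PR selects the index for a named field is not shown in this conversation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical emulation for illustration; Spark's own implementation is not reproduced here.
class RegexpExtractSketch {
    // Returns the text of the given group from the first match, or "" when there is none.
    static String regexpExtract(String input, String regex, int groupIndex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "";
        }
        String group = m.group(groupIndex);
        return group == null ? "" : group;
    }

    public static void main(String[] args) {
        // Simplified expansion of '.+@%{HOSTNAME:host}' (assumption); 'host' is group 1.
        String regex = ".+@(?<host>[A-Za-z0-9][A-Za-z0-9.-]*)";
        // prints pyrami.com
        System.out.println(regexpExtract("amberduke@pyrami.com", regex, 1));
        // prints an empty line because the input has no '@'
        System.out.println(regexpExtract("no at sign", regex, 1));
    }
}
```

Under this reading, each named field in a grok pattern would correspond to one `regexp_extract` call with the matching group index.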
Issues Resolved
#451
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.