Ensure cluster string could be quoted #120355
base: main
Currently we accept "remote:index", remote:"index" but not "remote":"index" as a valid index pattern. This change fixes this.
Hi @idegtiarenko, I've created a changelog YAML for you.
Pinging @elastic/es-analytical-engine (Team:Analytics)
@@ -615,6 +615,55 @@ private void clustersAndIndices(String command, String indexString1, String inde
        );
    }

    public void testValidQuotingFromIndexPattern() {
The coverage looks pretty good to me; there is one negative case that I can think of. According to RemoteClusterAware.isRemoteIndexName, : is a valid character in the index pattern, where it separates the cluster name from the index name. It is not a valid character in an index name; I wonder if it is a valid character in a cluster name?
The following queries pass the grammar and the parser and then error out, which looks correct; however, the error messages are not very clear.
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "FROM \"remot:e\":existing_index"
}
'
{
"error" : {
"root_cause" : [
{
"type" : "no_such_remote_cluster_exception",
"reason" : "no such remote cluster: [remot]"
}
],
"type" : "no_such_remote_cluster_exception",
"reason" : "no such remote cluster: [remot]"
},
"status" : 404
}
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "FROM \"remote:\":existing_index"
}
'
{
"error" : {
"root_cause" : [
{
"type" : "invalid_index_name_exception",
"reason" : "Invalid index name [remote::existing_index], Invalid usage of :: separator, [existing_index] is not a recognized selector",
"index_uuid" : "_na_",
"index" : "remote::existing_index"
}
],
"type" : "invalid_index_name_exception",
"reason" : "Invalid index name [remote::existing_index], Invalid usage of :: separator, [existing_index] is not a recognized selector",
"index_uuid" : "_na_",
"index" : "remote::existing_index"
},
"status" : 400
}
> RemoteClusterAware.isRemoteIndexName, : is a valid character in the index pattern used to separate cluster and index name, it is not a valid character for index name, I wonder if it is a valid character for the cluster name?
I believe : cannot be used as a character in a cluster name.
According to
elasticsearch/server/src/main/java/org/elasticsearch/transport/RemoteClusterAware.java
Lines 91 to 114 in c3839e1
    /**
     * Split the index name into remote cluster alias and index name.
     * The index expression is assumed to be individual index (no commas) but can contain `-`, wildcards,
     * datemath, remote cluster name and any other syntax permissible in index expression component.
     * There's no guarantee the components actually represent existing remote cluster or index, only
     * rudimentary checks are done on the syntax.
     */
    public static String[] splitIndexName(String indexExpression) {
        if (indexExpression.isEmpty() || indexExpression.charAt(0) == '<' || indexExpression.startsWith("-<")) {
            // This is date math, but even if it is not, the remote can't start with '<'.
            // Thus, whatever it is, this is definitely not a remote index.
            return new String[] { null, indexExpression };
        }
        int i = indexExpression.indexOf(RemoteClusterService.REMOTE_CLUSTER_INDEX_SEPARATOR);
        if (i == 0) {
            throw new IllegalArgumentException("index name [" + indexExpression + "] is invalid because the remote part is empty");
        }
        if (i < 0 || indexExpression.startsWith(SelectorResolver.SELECTOR_SEPARATOR, i)) {
            // Either no colon present, or the colon was a part of a selector separator (::)
            return new String[] { null, indexExpression };
        } else {
            return new String[] { indexExpression.substring(0, i), indexExpression.substring(i + 1) };
        }
    }
we rely on finding the first : when splitting the cluster name and the index pattern in indexExpression. When multiple : characters are used, this leaves an index pattern that itself contains :, which is not permitted.
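To make the first-colon behaviour concrete, here is a minimal standalone sketch of the quoted method. The class name `SplitIndexNameSketch` is hypothetical, and the two separator constants are assumed to be `:` and `::`; the real values live in RemoteClusterService and SelectorResolver.

```java
import java.util.Arrays;

final class SplitIndexNameSketch {
    // Assumed values, standing in for RemoteClusterService.REMOTE_CLUSTER_INDEX_SEPARATOR
    // and SelectorResolver.SELECTOR_SEPARATOR.
    static final char REMOTE_CLUSTER_INDEX_SEPARATOR = ':';
    static final String SELECTOR_SEPARATOR = "::";

    // Same logic as the quoted splitIndexName: only the FIRST ':' is considered.
    static String[] splitIndexName(String indexExpression) {
        if (indexExpression.isEmpty() || indexExpression.charAt(0) == '<' || indexExpression.startsWith("-<")) {
            return new String[] { null, indexExpression };
        }
        int i = indexExpression.indexOf(REMOTE_CLUSTER_INDEX_SEPARATOR);
        if (i == 0) {
            throw new IllegalArgumentException("index name [" + indexExpression + "] is invalid because the remote part is empty");
        }
        if (i < 0 || indexExpression.startsWith(SELECTOR_SEPARATOR, i)) {
            // No colon, or the first colon starts a selector separator (::)
            return new String[] { null, indexExpression };
        }
        return new String[] { indexExpression.substring(0, i), indexExpression.substring(i + 1) };
    }

    public static void main(String[] args) {
        // The remaining ':' ends up inside the index part.
        System.out.println(Arrays.toString(splitIndexName("remot:e:existing_index")));
        // The first ':' is read as the start of '::', so no remote is detected at all.
        System.out.println(Arrays.toString(splitIndexName("remote::existing_index")));
    }
}
```

Under these assumptions the first input splits into `remot` and `e:existing_index`, and the second is treated as a purely local expression, which is consistent with the two error responses shown earlier in this thread.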
Also, when registering a remote with : in its name, I get the following:
PUT http://localhost:9200/_cluster/settings
Content-Type: application/json
{
"persistent" : {
"cluster.remote.remote:1.seeds" : ["127.0.0.1:9301"]
}
}
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "persistent setting [cluster.remote.remote:1.seeds], not recognized"
}
],
"type": "illegal_argument_exception",
"reason": "persistent setting [cluster.remote.remote:1.seeds], not recognized"
},
"status": 400
}
I believe the above indicates that we do not support : in cluster names.
I added basic cluster string validation in d58db78.
Please note that the index patterns below are not checked:
FROM remote:invalid:index
FROM remote:"invalid:index"
because we skip all validation when a remote cluster is detected:
Lines 83 to 86 in d58db78
    if (clusterString == null) {
        hasSeenStar.set(indexPattern.contains(WILDCARD) || hasSeenStar.get());
        validateIndexPattern(indexPattern, c, hasSeenStar.get());
    } else {
Lines 105 to 107 in d58db78
    if (isRemoteIndexName(index)) { // skip the validation if there is remote cluster
        continue;
    }
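For illustration, the basic cluster string check mentioned above could look roughly like the sketch below. The class and method names are hypothetical and the real change raises a parsing error that also carries the source position (the `line 1:6:` prefix seen in the tests); only the message text is taken from this PR's test expectations.

```java
final class ClusterStringValidationSketch {
    // Hypothetical helper: rejects a cluster string that contains ':'.
    // A null cluster string means "no remote given" and is accepted.
    static void validateClusterString(String clusterString) {
        if (clusterString != null && clusterString.indexOf(':') >= 0) {
            // IllegalArgumentException is a stand-in for the parser's own exception type.
            throw new IllegalArgumentException("cluster string [" + clusterString + "] must not contain ':'");
        }
    }

    public static void main(String[] args) {
        validateClusterString("remote"); // accepted
        try {
            validateClusterString("remote:invalid");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that this only covers the cluster part; the index part of a remote expression is still not validated, as described above.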
LGTM, thank you!
LGTM
clusterString
    : UNQUOTED_SOURCE
    | QUOTED_STRING
    ;
Since they are the same, point clusterString to indexString:
clusterString
: indexString
;
We could fully remove it but it's worth keeping the element in for future changes.
    if (ctx == null) {
        return null;
    } else if (ctx.UNQUOTED_SOURCE() != null) {
        return ctx.UNQUOTED_SOURCE().getText();
    } else {
        return unquote(ctx.QUOTED_STRING().getText());
    }
See my comment in the grammar: this method can then either be removed or delegate to visitIndexString.
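As a rough standalone illustration of the unquoted/quoted branch discussed here, the sketch below takes the two possible token texts directly instead of an ANTLR context. Everything in it is hypothetical: `UnquoteSketch`, `clusterString`, and this simplified `unquote`, which only strips `"` or `"""` delimiters and does not process escape sequences the way the real parser helper may.

```java
final class UnquoteSketch {
    // Simplified assumption: strip one layer of """...""" or "..." delimiters, no escape handling.
    static String unquote(String text) {
        if (text.length() >= 6 && text.startsWith("\"\"\"") && text.endsWith("\"\"\"")) {
            return text.substring(3, text.length() - 3);
        }
        if (text.length() >= 2 && text.startsWith("\"") && text.endsWith("\"")) {
            return text.substring(1, text.length() - 1);
        }
        return text;
    }

    // Mirrors the branch in the diff: null when absent, raw text when unquoted, unquoted text otherwise.
    static String clusterString(String unquotedSource, String quotedString) {
        if (unquotedSource == null && quotedString == null) {
            return null;
        } else if (unquotedSource != null) {
            return unquotedSource;
        } else {
            return unquote(quotedString);
        }
    }

    public static void main(String[] args) {
        System.out.println(clusterString("remote", null));
        System.out.println(clusterString(null, "\"remote\""));
        System.out.println(clusterString(null, "\"\"\"remote\"\"\""));
    }
}
```

Because both branches are identical in shape to what an index string needs, delegating to a single shared routine, as suggested in the review, avoids keeping two copies in sync.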
Heya, I agree this looks good and consistent.
However, do you know of an example where the added syntax enables us to specify a remote-index-pattern combination that was either not possible before, or was less ergonomic to express?
If we can construct one, it'd be great to have this in the PR's description and/or as a test.
Otherwise, I have only some minor remarks - and the added tests have some overlap with existing tests. Maybe it'd be nicer if we could at least move them closer together.
        expectError("FROM \"remote:\":index", "line 1:6: cluster string [remote:] must not contain ':'");
        expectError("FROM \"remote:invalid\":index", "line 1:6: cluster string [remote:invalid] must not contain ':'");
    }

    public void testInvalidQuotingAsFromIndexPattern() {
Maybe we should also add tests for invalid quoting of the remote name itself for good measure.
        return str.charAt(randomInt(str.length() - 1));
    }

    public void testInvalidFromIndexPattern() {
This is related to testInvalidCharacterInIndexPattern in the same file.
@@ -615,6 +615,60 @@ private void clustersAndIndices(String command, String indexString1, String inde
        );
    }

    public void testValidFromIndexPattern() {
This has some overlap with testStringAsIndexPattern in the same file.
I think this test has the following drawbacks:
- Only one kind of quoting (", but there's also still """).
- None of the index patterns actually require quoting. Looking at testStringAsIndexPattern, there's e.g. date math stuff that can make quoting required.
# Conflicts:
#   x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBaseParser.interp
#   x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBaseParser.java
#   x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java
     }

     public String visitIndexPattern(List<EsqlBaseParser.IndexPatternContext> ctx) {
         List<String> patterns = new ArrayList<>(ctx.size());
         Holder<Boolean> hasSeenStar = new Holder<>(false);
         ctx.forEach(c -> {
             String indexPattern = visitIndexString(c.indexString());
-            String clusterString = c.clusterString() != null ? c.clusterString().getText() : null;
+            String clusterString = visitClusterString(c.clusterString());
             // skip validating index on remote cluster, because the behavior of remote cluster is not consistent with local cluster
             // For example, invalid#index is an invalid index name, however FROM *:invalid#index does not return an error
             if (clusterString == null) {
                 hasSeenStar.set(indexPattern.contains(WILDCARD) || hasSeenStar.get());
                 validateIndexPattern(indexPattern, c, hasSeenStar.get());
Hm, that's slightly out of scope but I just realized that the validation for the index pattern is lacking. For instance, you can use FROM "foo:bar:baz" and you'll not even get an error message.
What is a bit more in scope: at least when the cluster string is not null, we should probably validate that the index pattern is not a remote pattern. This applies to cases like FROM "remote":"index:pattern".
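One possible shape for the suggested check is sketched below. It is not part of this PR: the class, the method names, and the error message are all hypothetical, and the `::` handling assumes that a double colon is a selector separator rather than a cluster separator.

```java
final class RemotePatternCheckSketch {
    // True when the first ':' in the pattern is not the start of a "::" selector separator.
    // Like the splitting logic, this only inspects the first ':' and ignores any later ones.
    static boolean looksLikeRemotePattern(String indexPattern) {
        int i = indexPattern.indexOf(':');
        return i > 0 && indexPattern.startsWith("::", i) == false;
    }

    // Hypothetical validation: once a cluster string is given, the index part must not name a remote again.
    static void validate(String clusterString, String indexPattern) {
        if (clusterString != null && looksLikeRemotePattern(indexPattern)) {
            throw new IllegalArgumentException(
                "index pattern [" + indexPattern + "] must not contain ':' when cluster string [" + clusterString + "] is given"
            );
        }
    }

    public static void main(String[] args) {
        validate("remote", "index"); // accepted
        try {
            validate("remote", "index:pattern");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this would not cover the fully quoted case FROM "foo:bar:baz", where the cluster string is null; that would need the separate index pattern validation mentioned as out of scope.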
I just tried locally, with your branch:
$ curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/_query?format=txt" -d '
{
"query": "from cluster_one:\"remote:index\""
}'
<no-fields>
---------------
$ curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/_query?format=txt" -d '
{
"query": "from cluster_one:remote:index"
}'
{"error":{"root_cause":[{"type":"parsing_exception","reason":"line 1:24: mismatched input ':' expecting {<EOF>, '|', ',', 'metadata'}"}],"type":"parsing_exception","reason":"line 1:24: mismatched input ':' expecting {<EOF>, '|', ',', 'metadata'}","caused_by":{"type":"input_mismatch_exception","reason":null}},"status":400}%
$ curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/_query?format=txt" -d '
{
"query": "from \"cluster_one:remote:index\""
}'
<no-fields>
---------------
@idegtiarenko if you feel like it, I think we could add some validation for this as we're just touching this anyway. Otherwise, let's put what we found into an issue because that'll need to be fixed one day, anyway.
# Conflicts:
#   x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBaseParser.interp
#   x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBaseParser.java