Ensure cluster string could be quoted #120355

Open: wants to merge 9 commits into base branch main

idegtiarenko (Contributor)

Currently we accept "remote:index" and remote:"index" as valid index patterns, but not "remote":"index". This change makes "remote":"index" valid as well.
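To illustrate the intended equivalence, here is a minimal sketch that models the three spellings as an optional cluster string plus an index string, unquotes each part, and joins them with ':'. The class and its unquote helper are hypothetical simplifications (escape sequences are ignored), not the ES|QL parser code.

```java
// Hypothetical sketch: the three accepted spellings of a remote index pattern
// should reduce to the same "cluster:index" expression. unquote() is simplified
// and ignores escape sequences; it is not the ES|QL implementation.
final class QuotedIndexPattern {

    /** Strips one level of """...""" or "..." quoting; bare input is returned unchanged. */
    static String unquote(String s) {
        if (s == null) {
            return null;
        }
        if (s.length() >= 6 && s.startsWith("\"\"\"") && s.endsWith("\"\"\"")) {
            return s.substring(3, s.length() - 3);
        }
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    /** Joins an optional cluster string and an index string with ':' after unquoting both. */
    static String toIndexExpression(String clusterString, String indexString) {
        String index = unquote(indexString);
        return clusterString == null ? index : unquote(clusterString) + ":" + index;
    }

    public static void main(String[] args) {
        System.out.println(toIndexExpression(null, "\"remote:index\""));   // prints remote:index
        System.out.println(toIndexExpression("remote", "\"index\""));      // prints remote:index
        System.out.println(toIndexExpression("\"remote\"", "\"index\""));  // prints remote:index
    }
}
```

The three calls correspond to "remote:index", remote:"index" and "remote":"index" respectively.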

idegtiarenko added the >enhancement, Team:Analytics, auto-backport, :Analytics/ES|QL, v9.0.0 and v8.18.0 labels on Jan 17, 2025
elasticsearchmachine (Collaborator)

Hi @idegtiarenko, I've created a changelog YAML for you.

elasticsearchmachine (Collaborator)

Pinging @elastic/es-analytical-engine (Team:Analytics)

astefan requested review from costin and fang-xing-esql and removed the request for alex-spies on January 17, 2025 09:31
@@ -615,6 +615,55 @@ private void clustersAndIndices(String command, String indexString1, String inde
);
}

public void testValidQuotingFromIndexPattern() {
fang-xing-esql (Member) commented on Jan 17, 2025

The coverage looks pretty good to me; there is one negative case I can think of. According to RemoteClusterAware.isRemoteIndexName, ':' is a valid character in an index pattern, where it separates the cluster name from the index name, but it is not a valid character in an index name itself. I wonder whether it is a valid character in a cluster name?

The following queries pass the grammar and parser and then error out, which looks correct; however, the error messages are not very clear.

+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "FROM \"remot:e\":existing_index"
} 
'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "no_such_remote_cluster_exception",
        "reason" : "no such remote cluster: [remot]"
      }
    ],
    "type" : "no_such_remote_cluster_exception",
    "reason" : "no such remote cluster: [remot]"
  },
  "status" : 404
}
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "FROM \"remote:\":existing_index"
} 
'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "invalid_index_name_exception",
        "reason" : "Invalid index name [remote::existing_index], Invalid usage of :: separator, [existing_index] is not a recognized selector",
        "index_uuid" : "_na_",
        "index" : "remote::existing_index"
      }
    ],
    "type" : "invalid_index_name_exception",
    "reason" : "Invalid index name [remote::existing_index], Invalid usage of :: separator, [existing_index] is not a recognized selector",
    "index_uuid" : "_na_",
    "index" : "remote::existing_index"
  },
  "status" : 400
}
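The two errors above are consistent with the parser unquoting the cluster and index strings, joining them with ':', and later reading the cluster alias up to the first ':'. A minimal sketch under that assumption (the class and method names are hypothetical):

```java
// Hypothetical sketch: reproduces the index expressions seen in the two error
// responses, assuming the unquoted cluster and index parts are joined with ':'.
final class JoinedExpressionDemo {

    static String join(String cluster, String index) {
        return cluster + ":" + index;
    }

    /** Cluster alias as read up to the first ':' (null if there is no ':'). */
    static String clusterAliasAtFirstColon(String expression) {
        int i = expression.indexOf(':');
        return i < 0 ? null : expression.substring(0, i);
    }

    public static void main(String[] args) {
        // FROM "remot:e":existing_index
        String first = join("remot:e", "existing_index");
        System.out.println(first + " -> cluster " + clusterAliasAtFirstColon(first));
        // FROM "remote:":existing_index
        String second = join("remote:", "existing_index");
        System.out.println(second + " contains '::': " + second.contains("::"));
    }
}
```

This prints "remot:e:existing_index -> cluster remot" and "remote::existing_index contains '::': true", matching the [remot] alias and the remote::existing_index name reported in the responses.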

idegtiarenko (Contributor Author) commented on Jan 20, 2025

Regarding the question of whether ':' is a valid character in a cluster name:

I believe ':' cannot be used in a cluster name.

According to splitIndexName:

/**
 * Split the index name into remote cluster alias and index name.
 * The index expression is assumed to be individual index (no commas) but can contain `-`, wildcards,
 * datemath, remote cluster name and any other syntax permissible in index expression component.
 * There's no guarantee the components actually represent existing remote cluster or index, only
 * rudimentary checks are done on the syntax.
 */
public static String[] splitIndexName(String indexExpression) {
    if (indexExpression.isEmpty() || indexExpression.charAt(0) == '<' || indexExpression.startsWith("-<")) {
        // This is date math, but even if it is not, the remote can't start with '<'.
        // Thus, whatever it is, this is definitely not a remote index.
        return new String[] { null, indexExpression };
    }
    int i = indexExpression.indexOf(RemoteClusterService.REMOTE_CLUSTER_INDEX_SEPARATOR);
    if (i == 0) {
        throw new IllegalArgumentException("index name [" + indexExpression + "] is invalid because the remote part is empty");
    }
    if (i < 0 || indexExpression.startsWith(SelectorResolver.SELECTOR_SEPARATOR, i)) {
        // Either no colon present, or the colon was a part of a selector separator (::)
        return new String[] { null, indexExpression };
    } else {
        return new String[] { indexExpression.substring(0, i), indexExpression.substring(i + 1) };
    }
}

we split the cluster name from the index pattern at the first ':' in indexExpression. When an expression contains more than one ':', the index part therefore still contains ':', which is not permitted in an index name.
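The first-':' behaviour can be checked in isolation with a standalone copy of the method above. This sketch assumes RemoteClusterService.REMOTE_CLUSTER_INDEX_SEPARATOR is ':' and SelectorResolver.SELECTOR_SEPARATOR is "::", and inlines them as local constants so it runs without Elasticsearch on the classpath.

```java
import java.util.Arrays;

// Standalone sketch of the splitIndexName logic quoted above. The separator
// constants are assumed values, inlined so the class runs on its own.
final class SplitIndexNameDemo {
    static final char REMOTE_CLUSTER_INDEX_SEPARATOR = ':'; // assumed value
    static final String SELECTOR_SEPARATOR = "::";          // assumed value

    static String[] splitIndexName(String indexExpression) {
        if (indexExpression.isEmpty() || indexExpression.charAt(0) == '<' || indexExpression.startsWith("-<")) {
            return new String[] { null, indexExpression }; // empty or date math: never a remote index
        }
        int i = indexExpression.indexOf(REMOTE_CLUSTER_INDEX_SEPARATOR);
        if (i == 0) {
            throw new IllegalArgumentException("index name [" + indexExpression + "] is invalid because the remote part is empty");
        }
        if (i < 0 || indexExpression.startsWith(SELECTOR_SEPARATOR, i)) {
            return new String[] { null, indexExpression }; // no ':' at all, or the first ':' starts a '::' selector
        }
        // Only the FIRST ':' is used, so any later ':' stays in the index part.
        return new String[] { indexExpression.substring(0, i), indexExpression.substring(i + 1) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitIndexName("remote:index")));           // [remote, index]
        System.out.println(Arrays.toString(splitIndexName("remote:invalid:index")));   // [remote, invalid:index]
        System.out.println(Arrays.toString(splitIndexName("remote::existing_index"))); // [null, remote::existing_index]
    }
}
```

Under those assumptions, remote:invalid:index splits into [remote, invalid:index], and remote::existing_index is not treated as remote at all, which would explain the "Invalid usage of :: separator" error for the second query in the earlier comment.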

Also, when registering a remote cluster whose alias contains ':', I get the following:

PUT http://localhost:9200/_cluster/settings
Content-Type: application/json

{
  "persistent" : {
    "cluster.remote.remote:1.seeds" : ["127.0.0.1:9301"]
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "persistent setting [cluster.remote.remote:1.seeds], not recognized"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "persistent setting [cluster.remote.remote:1.seeds], not recognized"
  },
  "status": 400
}

I believe the above indicates that ':' is not supported in cluster names.

Contributor Author

I added basic cluster string validation in d58db78.
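A minimal sketch of what such a basic check could look like, assuming it only rejects ':' in the unquoted cluster string. validateClusterString is a hypothetical name; the message mirrors the one expected by the tests in this PR, which additionally carries a "line 1:6:" location prefix that this sketch omits.

```java
// Hypothetical sketch of a basic cluster-string check; not the code from d58db78.
final class ClusterStringValidation {

    /** Returns the cluster string unchanged, or throws if it contains ':'. */
    static String validateClusterString(String clusterString) {
        if (clusterString.indexOf(':') >= 0) {
            throw new IllegalArgumentException("cluster string [" + clusterString + "] must not contain ':'");
        }
        return clusterString;
    }

    public static void main(String[] args) {
        System.out.println(validateClusterString("remote")); // accepted, prints remote
        try {
            validateClusterString("remote:invalid");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // cluster string [remote:invalid] must not contain ':'
        }
    }
}
```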

Please note that the following index patterns are not checked:

FROM remote:invalid:index
FROM remote:"invalid:index"

because we skip all validation when a remote cluster is detected:

if (clusterString == null) {
    hasSeenStar.set(indexPattern.contains(WILDCARD) || hasSeenStar.get());
    validateIndexPattern(indexPattern, c, hasSeenStar.get());
} else {

if (isRemoteIndexName(index)) { // skip the validation if there is remote cluster
    continue;
}

fang-xing-esql (Member) left a comment

LGTM, thank you!

costin (Member) left a comment

LGTM

Comment on lines 144 to 147
clusterString
: UNQUOTED_SOURCE
| QUOTED_STRING
;
Member

Since they are the same, point clusterString to indexString:

clusterString
  : indexString
;

We could fully remove it but it's worth keeping the element in for future changes.

Comment on lines +57 to +63
if (ctx == null) {
    return null;
} else if (ctx.UNQUOTED_SOURCE() != null) {
    return ctx.UNQUOTED_SOURCE().getText();
} else {
    return unquote(ctx.QUOTED_STRING().getText());
}
Member

See my comment in the grammar: this method can then either be removed or delegate to visitIndexString.
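A minimal sketch of the suggested delegation. IndexStringCtx and ClusterStringCtx are hypothetical, simplified stand-ins for the ANTLR-generated contexts (they are not EsqlBaseParser types), and unquote only strips one pair of surrounding double quotes. With clusterString : indexString ;, the cluster visitor only has to handle the optional, absent case.

```java
// Hypothetical sketch: the cluster-string visitor delegating to the index-string visitor.
// Both context types are simplified stand-ins, not EsqlBaseParser classes.
final class VisitorDelegationSketch {

    /** Mirrors indexString : UNQUOTED_SOURCE | QUOTED_STRING ; exactly one field is non-null. */
    record IndexStringCtx(String unquotedSource, String quotedString) {}

    /** Mirrors clusterString : indexString ; */
    record ClusterStringCtx(IndexStringCtx indexString) {}

    /** Simplified: strips one pair of surrounding double quotes, no escape handling. */
    static String unquote(String quoted) {
        return quoted.substring(1, quoted.length() - 1);
    }

    static String visitIndexString(IndexStringCtx ctx) {
        return ctx.unquotedSource() != null ? ctx.unquotedSource() : unquote(ctx.quotedString());
    }

    /** The cluster string is optional, so only the null case needs its own handling. */
    static String visitClusterString(ClusterStringCtx ctx) {
        return ctx == null ? null : visitIndexString(ctx.indexString());
    }

    public static void main(String[] args) {
        System.out.println(visitClusterString(null));                                                      // null
        System.out.println(visitClusterString(new ClusterStringCtx(new IndexStringCtx("remote", null))));     // remote
        System.out.println(visitClusterString(new ClusterStringCtx(new IndexStringCtx(null, "\"remote\"")))); // remote
    }
}
```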

alex-spies (Contributor) left a comment

Heya, I agree this looks good and consistent.

However, do you know of an example where the added syntax enables us to specify a remote-index-pattern combination that was either not possible before, or was less ergonomic to express?

If we can construct one, it'd be great to have this in the PR's description and/or as a test.

Otherwise, I have only some minor remarks. The added tests overlap somewhat with existing tests; it would be nicer if we could at least move them closer together.

expectError("FROM \"remote:\":index", "line 1:6: cluster string [remote:] must not contain ':'");
expectError("FROM \"remote:invalid\":index", "line 1:6: cluster string [remote:invalid] must not contain ':'");
}

public void testInvalidQuotingAsFromIndexPattern() {
Contributor

Maybe we should also add tests for invalid quoting of the remote name itself for good measure.

return str.charAt(randomInt(str.length() - 1));
}

public void testInvalidFromIndexPattern() {
Contributor

This is related to testInvalidCharacterInIndexPattern in the same file.

@@ -615,6 +615,60 @@ private void clustersAndIndices(String command, String indexString1, String inde
);
}

public void testValidFromIndexPattern() {
Contributor

This has some overlap with testStringAsIndexPattern in the same file.

I think this test has the following drawbacks:

  • Only one kind of quoting (", but there's also still """).
  • None of the index patterns actually requires quoting. In testStringAsIndexPattern, for example, there are date-math patterns that make quoting necessary.
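For the first point, one option is to generate every quoting combination from a single cluster/index pair. remoteVariants is a hypothetical test helper, not something that exists in StatementParserTests; which combinations the grammar should accept is for the tests themselves to assert.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test-input generator: every way of quoting the cluster part and the
// index part of "cluster:index", plus quoting the whole expression at once.
final class QuotingVariants {

    /** The three spellings of one part: bare, "...", and """...""". */
    static List<String> spellings(String part) {
        return List.of(part, "\"" + part + "\"", "\"\"\"" + part + "\"\"\"");
    }

    static List<String> remoteVariants(String cluster, String index) {
        List<String> variants = new ArrayList<>();
        for (String c : spellings(cluster)) {
            for (String i : spellings(index)) {
                variants.add(c + ":" + i);
            }
        }
        // Quote the whole expression; the bare spelling is already covered by the loop above.
        List<String> whole = spellings(cluster + ":" + index);
        variants.addAll(whole.subList(1, whole.size()));
        return variants;
    }

    public static void main(String[] args) {
        remoteVariants("remote", "index").forEach(v -> System.out.println("FROM " + v));
    }
}
```

For patterns that cannot appear unquoted, such as date math, the bare spellings would need to be filtered out.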

 }

 public String visitIndexPattern(List<EsqlBaseParser.IndexPatternContext> ctx) {
     List<String> patterns = new ArrayList<>(ctx.size());
     Holder<Boolean> hasSeenStar = new Holder<>(false);
     ctx.forEach(c -> {
         String indexPattern = visitIndexString(c.indexString());
-        String clusterString = c.clusterString() != null ? c.clusterString().getText() : null;
+        String clusterString = visitClusterString(c.clusterString());
         // skip validating index on remote cluster, because the behavior of remote cluster is not consistent with local cluster
         // For example, invalid#index is an invalid index name, however FROM *:invalid#index does not return an error
         if (clusterString == null) {
             hasSeenStar.set(indexPattern.contains(WILDCARD) || hasSeenStar.get());
             validateIndexPattern(indexPattern, c, hasSeenStar.get());
Contributor

Hm, that's slightly out of scope, but I just realized that the validation of the index pattern is lacking. For instance, you can run FROM "foo:bar:baz" and you won't even get an error message.

What is a bit more in scope: at least when the cluster string is not null, we should probably validate that the index pattern is not a remote pattern. This applies to cases like FROM "remote":"index:pattern".
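A minimal sketch of the in-scope check suggested above: when a cluster string is present, reject an index part that itself looks like a remote pattern. looksRemote loosely follows the splitIndexName rules quoted earlier in this thread (date math is never remote, and a '::' selector separator does not count); the method names and the error message are hypothetical.

```java
// Hypothetical sketch: reject a remote-looking index part when a cluster string is given.
final class RemoteIndexPatternCheck {

    /** Loosely follows splitIndexName: date math is never remote, "::" is a selector, not a cluster separator. */
    static boolean looksRemote(String indexPattern) {
        if (indexPattern.isEmpty() || indexPattern.charAt(0) == '<' || indexPattern.startsWith("-<")) {
            return false;
        }
        int i = indexPattern.indexOf(':');
        return i >= 0 && !indexPattern.startsWith("::", i);
    }

    static void validate(String clusterString, String indexPattern) {
        if (clusterString != null && looksRemote(indexPattern)) {
            throw new IllegalArgumentException(
                "index pattern [" + indexPattern + "] must not reference a cluster, cluster [" + clusterString + "] is already given"
            );
        }
    }

    public static void main(String[] args) {
        validate("cluster_one", "index"); // accepted
        validate(null, "remote:index");   // accepted: no separate cluster string
        try {
            validate("cluster_one", "remote:index"); // e.g. from cluster_one:"remote:index"
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this check, FROM "remote":"index:pattern" would be rejected, while FROM "foo:bar:baz" (no separate cluster string) stays outside its scope.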

Contributor

I just tried this locally with your branch:

$ curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/_query?format=txt" -d '
{
  "query": "from cluster_one:\"remote:index\""
}'
  <no-fields>  
---------------

$ curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/_query?format=txt" -d '
{
  "query": "from cluster_one:remote:index"   
}'
{"error":{"root_cause":[{"type":"parsing_exception","reason":"line 1:24: mismatched input ':' expecting {<EOF>, '|', ',', 'metadata'}"}],"type":"parsing_exception","reason":"line 1:24: mismatched input ':' expecting {<EOF>, '|', ',', 'metadata'}","caused_by":{"type":"input_mismatch_exception","reason":null}},"status":400}

$ curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/_query?format=txt" -d '
{
  "query": "from \"cluster_one:remote:index\""
}'
  <no-fields>  
---------------

Contributor

@idegtiarenko if you feel like it, we could add some validation for this, since we're touching this code anyway. Otherwise, let's put what we found into an issue, because it will need to be fixed eventually.

Labels
:Analytics/ES|QL, auto-backport, >enhancement, Team:Analytics, v8.19.0, v9.1.0
5 participants