[DRAFT]Handling error response when a comma is present in the operand name of tags #107

Open
wants to merge 10 commits into
base: main
16 changes: 16 additions & 0 deletions go/parser.go
@@ -5,10 +5,12 @@ import (
"fmt"
"strings"
"unicode"
"regexp"
)

const OPERAND = "operand"
const OPERATOR = "operator"
var VALID_TOKEN = regexp.MustCompile(`^(?:@[^@]*|and|or|not|\(|\))$`)

type Evaluatable interface {
Evaluate(variables []string) bool
@@ -116,6 +118,10 @@ func tokenize(expr string) ([]string, error) {
escaped = true
} else if c == '(' || c == ')' || unicode.IsSpace(c) {
if token.Len() > 0 {
err := isTokenValid(token.String(), expr)
if err != nil {
return nil, err
}
tokens = append(tokens, token.String())
token.Reset()
}
@@ -127,6 +133,10 @@ func tokenize(expr string) ([]string, error) {
}
}
if token.Len() > 0 {
err := isTokenValid(token.String(), expr)
if err != nil {
return nil, err
}
tokens = append(tokens, token.String())
}

@@ -152,6 +162,12 @@ func check(infix, expectedTokenType, tokenType string) error {
}
return nil
}
func isTokenValid(token string, expr string) error {
	if !VALID_TOKEN.MatchString(token) {
		return fmt.Errorf("Tag expression \"%s\" could not be parsed because of syntax error: Please adhere to the Gherkin tag naming convention, using tags like \"@tag1\" and avoiding more than one \"@\" in the tag name.", expr)
	}
	return nil
}

func pushExpr(token string, stack *EvaluatableStack) {
if token == "and" {
@@ -10,6 +10,8 @@
import java.util.regex.Pattern;

public final class TagExpressionParser {
// Regex every token must match: an operator, a parenthesis, or a tag with a single leading '@'.
// This rejects inputs such as "@a,@b"; the pattern can be customized further later.
private static final String VALID_TOKEN = "^(?:@[^@]*|and|or|not|\\(|\\))$";
Contributor

@mpkorstanje mpkorstanje Jul 10, 2023

By trying to validate all tokens rather than only the literals, you are making this more complex than needed. Since this implementation will have to be repeated for several languages, a review midway, of just the Java implementation, could save some duplicated effort.

Author

Thanks for your reply, @mpkorstanje.
I assume that a tag expression consists of operands and operators. When the expression is split, each token is one of two things. It is either an operator, which includes "(", ")", "and", "or", and "not", or an operand, which should follow the Gherkin convention of starting with "@" and not containing "@" anywhere else.

To handle this, I used a regex pattern so that only valid tokens are evaluated. Because the pattern covers both operands and operators, no additional checks are needed to determine the token type.

I believe this approach reduces complexity and improves the readability of the code, but maybe I might be missing something, I would appreciate your opinion on this.
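
The single-pattern idea described in this comment can be sketched in Python roughly as follows. This is only an illustration, not the PR's code: `classify_token` and `OPERATORS` are hypothetical names, while the pattern is the one that appears in this diff.

```python
import re

# Pattern copied from this PR: an operand starts with "@" and contains no
# further "@"; every other token must be an operator or a parenthesis.
VALID_TOKEN = re.compile(r"^(?:@[^@]*|and|or|not|\(|\))$")

# Hypothetical helper set, used only to label tokens in this sketch.
OPERATORS = {"and", "or", "not", "(", ")"}


def classify_token(token):
    """Return "operator" or "operand"; raise ValueError for an invalid token."""
    if not VALID_TOKEN.match(token):
        raise ValueError("invalid token: %r" % token)
    return "operator" if token in OPERATORS else "operand"


if __name__ == "__main__":
    for tok in ["@smoke", "and", "(", "@a,@b", "@a,b", "smoke"]:
        try:
            print(tok, "->", classify_token(tok))
        except ValueError as err:
            print(tok, "->", err)
```

Two consequences of this pattern are worth noting: a token such as "@a,b" still passes, because the pattern only forbids a second "@", and a literal without a leading "@" such as "smoke" is rejected.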

Contributor

Well, actually the situation is slightly different (as you can see from the tests / testdata):

  • Tags in a tag-expression may either be @foo (with-leading-atsign) or foo (without-leading-atsign)
  • @ (atsign) should not occur in a tag (after the first char) in a tag-expression
  • Gherkin parser removes the @ (atsign) from tags for Features/Rules/Scenarios/… . Therefore, @foo becomes "foo" internally
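
The first two rules in the list above could be checked with something like the following sketch. The function name `is_valid_tag_literal` is hypothetical, and requiring a non-empty name after the optional "@" is an assumption that the comment does not state.

```python
def is_valid_tag_literal(text):
    """Accept "@foo" or "foo"; reject "@" anywhere after the first character."""
    # Drop one optional leading "@" and inspect the rest of the name.
    name = text[1:] if text.startswith("@") else text
    # Assumption: an empty name (for example a bare "@") is not a valid tag.
    return name != "" and "@" not in name


if __name__ == "__main__":
    for candidate in ["@foo", "foo", "@foo,@bar", "foo@bar", "@"]:
        print(candidate, "->", is_valid_tag_literal(candidate))
```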

SEE ALSO:

Contributor

Gherkin parser removes the @ (atsign)

@jenisys are you sure about that? Or is that true only for some Gherkin parsers?

Contributor

but maybe I might be missing something, I would appreciate your opinion on this.

@shivam-sehgal, that will be a bit difficult without repeating myself.

But do consider validating only the literals. In essence, do not put the validation in the middle of the tokenizer, but rather after tokenization, just prior to literal creation.
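
A rough Python sketch of this suggestion is shown here: the tokenizer stays free of validation, and the check runs only where a token becomes a literal. All names (`tokenize`, `make_literal`, `to_nodes`) are hypothetical, the "no second @" rule is just a placeholder check, and escape handling from the real tokenizers is left out for brevity.

```python
OPERATORS = {"and", "or", "not", "(", ")"}


def tokenize(expr):
    """Split on whitespace and parentheses only; no validation happens here."""
    tokens, current = [], []
    for ch in expr:
        if ch in "()" or ch.isspace():
            if current:
                tokens.append("".join(current))
                current = []
            if ch in "()":
                tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def make_literal(token, expr):
    """Validate a token at the point where it is turned into a literal."""
    if "@" in token[1:]:
        raise ValueError('Tag expression "%s": invalid tag "%s"' % (expr, token))
    return ("literal", token)


def to_nodes(expr):
    """Keep operators as they are and build validated literals for the rest."""
    return [tok if tok in OPERATORS else make_literal(tok, expr)
            for tok in tokenize(expr)]
```

With this layout, operators never pass through the tag check, so the rule only has to describe what a valid literal looks like.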

Author

@shivam-sehgal shivam-sehgal Jul 23, 2023

@mpkorstanje cucumber-jvm calls this library both to parse the tag expression and to check whether it is valid. The flow is as follows: the user provides the tag expression in the CucumberOptions annotation, as shown in the code below.

# — FILE: SomeClass.java
@CucumberOptions(features = "src/it/resources/features",
        glue = {"com.adobe.aep.devex.exim.integrationtests.stepDefinitions",
                "com.adobe.aep.devex.exim.integrationtests.config",
                "com.adobe.aep.devex.exim.integrationtests.hooks"
        },
        plugin = {"pretty", "html:target/cucumber-html-report.html",
                "json:target/cucumber.json",
                "rerun:target/cucumber-api-rerun.txt"
        },
    tags = "not @ignore and @hello" // Exclude the feature with the "@ignore" tag
)
…

This annotation is parsed by CucumberOptionsAnnotationParser.class. Its addTags method is called, which in turn calls this repo to validate the tag expression.

The intent of this PR is the following: users mistakenly put tag expressions like @tag1,@tag2 in the tags section of CucumberOptions, and they don't get any error message. To avoid that, we need to reject tag expressions containing tags or literals like @tag,@againInTag, and give users a proper error response so they understand that they need to use a logical operator instead of commas.

I could put this validation in the addTags method of the CucumberOptionsAnnotationParser class, i.e. in cucumber-jvm, but I think it is better to keep the validation logic inside this tag expression repo, following the single responsibility principle of OOP.

To address this concern with the least impact, for now I only check that a tag starting with @ does not contain another @ after the first character. This prevents the confusion with a proper error message that lets users understand that they are making a mistake and that Cucumber is reading their expression as a single tag. It also won't affect any case where we allow literals that do not start with @.

In short, cucumber-jvm calls this repo to validate the tag expression passed by the user and uses the message from TagExpressionException to tell the user what is wrong with the expression. That message should come from here, i.e. from this repo's java directory; otherwise we have to handle this responsibility separately in cucumber-jvm.
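
The silent failure described in this comment can be illustrated with a small Python sketch. The helpers `literals_in` and `any_literal_matches` are hypothetical stand-ins and not part of any Cucumber library; `any_literal_matches` is only meaningful for a single literal or for literals joined by "or".

```python
OPERATORS = {"and", "or", "not"}


def literals_in(expr):
    """Return the whitespace-separated words that are not operators."""
    return [word for word in expr.split() if word not in OPERATORS]


def any_literal_matches(expr, scenario_tags):
    """Rough stand-in for evaluating a single literal or an "or" of literals."""
    return any(literal in scenario_tags for literal in literals_in(expr))


if __name__ == "__main__":
    tags = ["@tag1"]
    # The comma form is read as ONE literal named "@tag1,@tag2".
    print(literals_in("@tag1,@tag2"), any_literal_matches("@tag1,@tag2", tags))
    # The intended form yields two literals joined by "or".
    print(literals_in("@tag1 or @tag2"), any_literal_matches("@tag1 or @tag2", tags))
```

Because no scenario carries a tag literally named "@tag1,@tag2", the comma form simply matches nothing, and no error is reported.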

Contributor

One solution approach could be:

  • If CucumberOptions had an optional parameter, like tag_expression_validator, tag_expression_parser or a tag-expression preprocess hook, you could perform a pre-process check
  • The pre-process check would just split the tag-expression string value into words and check whether any word contains a comma
  • If a comma is contained, you issue a log warning or raise an exception (whatever the "right" consequent action is), …

This approach would have the following advantages:

  • Strict checking is optional
  • If you want/need strict checking of tag-expressions, you can plug-in the functionality

HINT:
Judging from the current implementation of the CucumberOptions class, this parameter would need to be added to support this kind of solution.
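
The pre-process check proposed in this comment might look like the following Python sketch. The names `find_words_with_commas` and `preprocess_tag_expression` and the `strict` parameter are hypothetical; how such a hook would be wired into CucumberOptions is not shown, since that parameter does not exist yet.

```python
import warnings


def find_words_with_commas(expr):
    """Split the expression into words and return those containing a comma."""
    return [word for word in expr.split() if "," in word]


def preprocess_tag_expression(expr, strict=False):
    """Optional check: warn about comma-separated tags, or raise if strict."""
    offending = find_words_with_commas(expr)
    if offending:
        message = ('Tag expression "%s" contains comma(s) in %s; '
                   'use "and"/"or" to combine tags.' % (expr, offending))
        if strict:
            raise ValueError(message)
        warnings.warn(message)
    # The expression itself is passed through unchanged.
    return expr
```

Keeping the check outside the parser means projects that rely on commas inside tag names are not affected unless they opt in.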

Author

@jenisys, I find the approach of using an optional flag for literal validation quite appealing. However, I have some doubts about the term "strict checking." If libraries using this repository are enforcing Gherkin's rules, then it seems reasonable for this library to do the same. However, if we would like to continue supporting users who define literals not starting with @ in new versions, then the optional parameter approach makes sense.

One potential concern raised by @mpkorstanje is that checking for commas alone may cause issues. To differentiate between v1 and v2, perhaps we should consider checking for ^@[^@]+, which he mentioned, if the optional strict validation flag is passed in CucumberOptions.

I would appreciate hearing both of your opinions on this matter. Let's collaborate and I can begin working on it.

cc: @mpkorstanje
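
As a rough Python sketch of the opt-in variant discussed here: literals are only checked when a strict flag is set. `check_literal` and its `strict` parameter are hypothetical names. The pattern is the `^@[^@]+` mentioned above, applied with `fullmatch` so that the whole literal has to match; that anchoring is an assumption, since the comment does not specify an end anchor.

```python
import re

# "@" followed by one or more non-"@" characters; fullmatch() anchors both ends.
STRICT_TAG = re.compile(r"@[^@]+")


def check_literal(text, strict=False):
    """Return text unchanged; in strict mode, raise if it is not a strict tag."""
    if strict and not STRICT_TAG.fullmatch(text):
        raise ValueError('"%s" is not a valid tag in strict mode' % text)
    return text
```

Without the flag, literals such as `foo` keep working as before; with it, both `foo` and `@tag1,@tag2` are rejected.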

Contributor

I think the cost of the complexity is starting to exceed the value of the solution.

Perhaps a simpler solution such as improving the documentation in Cucumber JVM might help?

Author

@shivam-sehgal shivam-sehgal Aug 12, 2023

Thanks @mpkorstanje for your advice. I have opened a pull request that updates the docs slightly, adding information so that users avoid the confusion of separating tags with ",". I am attaching a screenshot of the doc changes below.
[image: screenshot of the doc changes]
I would like to request your review of that PR.
Thanks in advance

private static final Map<String, Assoc> ASSOC = new HashMap<String, Assoc>() {{
put("or", Assoc.LEFT);
put("and", Assoc.LEFT);
@@ -106,6 +108,7 @@ private static List<String> tokenize(String expr) {
isEscaped = true;
} else if (c == '(' || c == ')' || Character.isWhitespace(c)) {
if (token.length() > 0) {
isTokenValid(token,expr);
tokens.add(token.toString());
token = new StringBuilder();
}
@@ -116,12 +119,28 @@ private static List<String> tokenize(String expr) {
token.append(c);
}
}
- if (token.length() > 0) {
+ if (token.length() > 0) {
isTokenValid(token,expr);
tokens.add(token.toString());
}
return tokens;
}

    /**
     * Checks that the token complies with the required regex
     * and throws an exception if it does not.
     *
     * @param token supposed tag or operator of the expression
     * @param expr  the entire expression
     */
    private static void isTokenValid(StringBuilder token, String expr) {
        if (token.length() > 0 && !token.toString().matches(VALID_TOKEN)) {
            throw new TagExpressionException("Tag expression \"%s\" could not be parsed because of syntax error: Please adhere to the Gherkin tag naming convention, using tags like \"@tag1\" and avoiding more than one \"@\" in the tag name.",
                    expr);
        }
    }

private void check(TokenType expectedTokenType, TokenType tokenType) {
if (expectedTokenType != tokenType) {
throw new TagExpressionException("Tag expression \"%s\" could not be parsed because of syntax error: Expected %s.", infix, expectedTokenType.toString().toLowerCase());
11 changes: 11 additions & 0 deletions javascript/src/index.ts
@@ -12,6 +12,7 @@ const ASSOC: { [key: string]: string } = {
and: 'left',
not: 'right',
}
const VALID_TOKEN = /^(?:@[^@]*|and|or|not|\(|\))$/;

/**
* Parses infix boolean expression (using Dijkstra's Shunting Yard algorithm)
@@ -109,6 +110,7 @@ function tokenize(expr: string): string[] {
isEscaped = true
} else if (c === '(' || c === ')' || /\s/.test(c)) {
if (token.length > 0) {
isTokenValid(token.join(''),expr);
tokens.push(token.join(''))
token = []
}
@@ -120,11 +122,20 @@ function tokenize(expr: string): string[] {
}
}
if (token.length > 0) {
isTokenValid(token.join(''),expr);
tokens.push(token.join(''))
}
return tokens
}

function isTokenValid(token: string, expr: string): void {
  if (!token.match(VALID_TOKEN)) {
    throw new Error(
      `Tag expression "${expr}" could not be parsed because of syntax error: Please adhere to the Gherkin tag naming convention, using tags like "@tag1" and avoiding more than one "@" in the tag name.`
    );
  }
}

function isUnary(token: string) {
return 'not' === token
}
22 changes: 19 additions & 3 deletions perl/lib/Cucumber/TagExpressions.pm
@@ -30,7 +30,7 @@ use strict;
use warnings;

use Cucumber::TagExpressions::Node;

our $VALID_TOKEN = qr/^(?:@[^@]*|and|or|not|\(|\))$/;
sub _expect_token {
my ( $state, $token ) = @_;

@@ -57,11 +57,19 @@ sub _get_token {
my $token = '';
while (1) {
my $char = _consume_char( $state, 1 );
- return ($token ? $token : undef)
- if not defined $char;
+ if (!defined $char) {
+ if ($token){
+ _is_token_valid( $state, $token);
+ return $token;
+ }
+ else{
+ return undef;
+ }
+ }

if ( $char =~ m/\s/ ) {
if ( $token ) {
_is_token_valid( $state, $token);
return $token;
}
else {
@@ -70,6 +78,7 @@ sub _get_token {
}
elsif ( $char eq '(' or $char eq ')' ) {
if ( $token ) {
_is_token_valid( $state, $token);
_save_token( $state, $char );
return $token;
}
@@ -93,6 +102,13 @@ sub _get_token {
}
}

sub _is_token_valid {
    my ( $state, $token ) = @_;
    if ( $token !~ $VALID_TOKEN ) {
        die qq{Tag expression "$state->{text}" could not be parsed because of syntax error: Please adhere to the Gherkin tag naming convention, using tags like "\@tag1" and avoiding more than one "\@" in the tag name.};
    }
}

sub _save_token {
my ( $state, $token ) = @_;

13 changes: 13 additions & 0 deletions python/cucumber_tag_expressions/parser.py
@@ -17,6 +17,8 @@
from __future__ import absolute_import
from enum import Enum
from cucumber_tag_expressions.model import Literal, And, Or, Not, True_
import re



# -----------------------------------------------------------------------------
@@ -156,6 +158,9 @@ class TagExpressionParser(object):
TOKEN_MAP = {token.keyword: token
for token in Token.__members__.values()}

valid_token_pattern = r'^(?:@[^@]*|and|or|not|\(|\))$'
valid_token_regex = re.compile(valid_token_pattern)

@classmethod
def select_token(cls, text):
"""Select the token that matches the text or return None.
@@ -165,6 +170,13 @@ def select_token(cls, text):
"""
return cls.TOKEN_MAP.get(text, None)

    @classmethod
    def check_valid_token(cls, part, expr):
        """Raise TagExpressionError if part does not match the valid-token regex."""
        if not cls.valid_token_regex.match(part):
            message = 'Tag expression "%s" could not be parsed because of syntax error: Please adhere to the Gherkin tag naming convention, using tags like "@tag1" and avoiding more than one "@" in the tag name.'
            raise TagExpressionError(message % (expr,))


@classmethod
def make_operand(cls, text):
"""Creates operand-object from parsed text."""
@@ -201,6 +213,7 @@ def ensure_expected_token_type(token_type):

for index, part in enumerate(parts):
token = cls.select_token(part)
cls.check_valid_token(part,text)
if token is None:
# -- CASE OPERAND: Literal or ...
ensure_expected_token_type(TokenType.OPERAND)
72 changes: 36 additions & 36 deletions python/tests/functional/test_tag_expression.py
@@ -34,19 +34,19 @@ class TestTagExpression(object):

@pytest.mark.parametrize("tag_expression_text, expected, tags, case", [
("", True, [], "no_tags"),
- ("", True, ["a"], "one tag: a"),
- ("", True, ["other"], "one tag: other"),
+ ("", True, ["@a"], "one tag: @a"),
+ ("", True, ["@other"], "one tag: @other"),
])
def test_empty_expression_is_true(self, tag_expression_text, expected, tags, case):
tag_expression = TagExpressionParser.parse(tag_expression_text)
assert expected == tag_expression.evaluate(tags)


@pytest.mark.parametrize("tag_expression_text, expected, tags, case", [
- ("not a", False, ["a", "other"], "two tags: a, other"),
- ("not a", False, ["a"], "one tag: a"),
- ("not a", True, ["other"], "one tag: other"),
- ("not a", True, [], "no_tags"),
+ ("not @a", False, ["@a", "@other"], "two tags: @a, @other"),
+ ("not @a", False, ["@a"], "one tag: @a"),
+ ("not @a", True, ["@other"], "one tag: @other"),
+ ("not @a", True, [], "no_tags"),
])
def test_not_operation(self, tag_expression_text, expected, tags, case):
tag_expression = TagExpressionParser.parse(tag_expression_text)
@@ -75,55 +75,55 @@ def test_fails_when_only_operators_are_used(self, tag_part):


@pytest.mark.parametrize("tag_expression_text, expected, tags, case", [
- ("a and b", True, ["a", "b"], "both tags"),
- ("a and b", True, ["a", "b", "other"], "both tags and more"),
- ("a and b", False, ["a"], "one tag: a"),
- ("a and b", False, ["b"], "one tag: b"),
- ("a and b", False, ["other"], "one tag: other"),
- ("a and b", False, [], "no_tags"),
+ ("@a and @b", True, ["@a", "@b"], "both tags"),
+ ("@a and @b", True, ["@a", "@b", "@other"], "both tags and more"),
+ ("@a and @b", False, ["@a"], "one tag: @a"),
+ ("@a and @b", False, ["@b"], "one tag: @b"),
+ ("@a and @b", False, ["@other"], "one tag: @other"),
+ ("@a and @b", False, [], "no_tags"),
])
def test_and_operation(self, tag_expression_text, expected, tags, case):
tag_expression = TagExpressionParser.parse(tag_expression_text)
assert expected == tag_expression.evaluate(tags)

@pytest.mark.parametrize("tag_expression_text, expected, tags, case", [
- ("a or b", True, ["a", "b"], "both tags"),
- ("a or b", True, ["a", "b", "other"], "both tags and more"),
- ("a or b", True, ["a"], "one tag: a"),
- ("a or b", True, ["b"], "one tag: b"),
- ("a or b", False, ["other"], "one tag: other"),
- ("a or b", False, [], "no_tags"),
+ ("@a or @b", True, ["@a", "@b"], "both tags"),
+ ("@a or @b", True, ["@a", "@b", "@other"], "both tags and more"),
+ ("@a or @b", True, ["@a"], "one tag: @a"),
+ ("@a or @b", True, ["@b"], "one tag: @b"),
+ ("@a or @b", False, ["@other"], "one tag: @other"),
+ ("@a or @b", False, [], "no_tags"),
])
def test_or_operation(self, tag_expression_text, expected, tags, case):
tag_expression = TagExpressionParser.parse(tag_expression_text)
assert expected == tag_expression.evaluate(tags)

@pytest.mark.parametrize("tag_expression_text, expected, tags, case", [
- ("a", True, ["a", "other"], "two tags: a, other"),
- ("a", True, ["a"], "one tag: a"),
- ("a", False, ["other"], "one tag: other"),
- ("a", False, [], "no_tags"),
+ ("@a", True, ["@a", "@other"], "two tags: @a, @other"),
+ ("@a", True, ["@a"], "one tag: @a"),
+ ("@a", False, ["@other"], "one tag: @other"),
+ ("@a", False, [], "no_tags"),
])
def test_literal(self, tag_expression_text, expected, tags, case):
tag_expression = TagExpressionParser.parse(tag_expression_text)
assert expected == tag_expression.evaluate(tags)

# NOTE: CANDIDATE for property-based testing
@pytest.mark.parametrize("tag_expression_text, expected, tags, case", [
- ("a and b", True, ["a", "b"], "two tags: a, b"),
- ("a and b", False, ["a"], "one tag: a"),
- ("a and b", False, [], "no_tags"),
- ("a or b", True, ["a", "b"], "two tags: a, b"),
- ("a or b", True, ["b"], "one tag: b"),
- ("a or b", False, [], "no_tags"),
- ("a and b or c", True, ["a", "b", "c"], "three tags: a, b, c"),
- ("a and b or c", True, ["a", "other", "c"], "three tags: a, other, c"),
- ("a and b or c", True, ["a", "b", "other"], "three tags: a, b, other"),
- ("a and b or c", True, ["a", "b"], "two tags: a, b"),
- ("a and b or c", True, ["a", "c"], "two tags: a, c"),
- ("a and b or c", False, ["a"], "one tag: a"),
- ("a and b or c", True, ["c"], "one tag: c"),
- ("a and b or c", False, [], "not tags"),
+ ("@a and @b", True, ["@a", "@b"], "two tags: @a, @b"),
+ ("@a and @b", False, ["@a"], "one tag: @a"),
+ ("@a and @b", False, [], "no_tags"),
+ ("@a or @b", True, ["@a", "@b"], "two tags: @a, @b"),
+ ("@a or @b", True, ["@b"], "one tag: @b"),
+ ("@a or @b", False, [], "no_tags"),
+ ("@a and @b or @c", True, ["@a", "@b", "@c"], "three tags: @a, @b, @c"),
+ ("@a and @b or @c", True, ["@a", "@other", "@c"], "three tags: @a, @other, @c"),
+ ("@a and @b or @c", True, ["@a", "@b", "@other"], "three tags: @a, @b, @other"),
+ ("@a and @b or @c", True, ["@a", "@b"], "two tags: @a, @b"),
+ ("@a and @b or @c", True, ["@a", "@c"], "two tags: @a, @c"),
+ ("@a and @b or @c", False, ["@a"], "one tag: @a"),
+ ("@a and @b or @c", True, ["@c"], "one tag: @c"),
+ ("@a and @b or @c", False, [], "not tags"),
])
def test_not_not_expression_sameas_expression(self, tag_expression_text, expected, tags, case):
not2_tag_expression_text = "not not "+ tag_expression_text