Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TypedString contain Value instead of String to support and preserve other quote styles #1679

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

graup
Copy link

@graup graup commented Jan 23, 2025

Fixes #1673. This is a breaking change.

TypedString contained a String without any knowledge of the used quote style. The parser used parse_literal_string to construct this, which doesn't support any quote styles other than single or double quotes. Namely, it doesn't support triple quotes from BigQuery, causing the issue reported in #1673. Additionally, it doesn't round-trip properly, always formatting its string using single quotes.

I think the most proper fix is to have TypedString contain a Value instead, similar to IntroducedString and others. This gives us immediate support for other quote styles and fixes the formatting to make it roundtrippable.

This is a breaking change but should be an easy fix in users' codebases, just (un)wrapping the value. Migration path:

  1. When constructing an AST node
Expr::TypedString {
    data_type: DataType::JSON,
--  value: r#"{"class" : {"students" : [{"name" : "Jane"}]}}"#.to_string()
++  value: Value::SingleQuotedString(
++      r#"{"class" : {"students" : [{"name" : "Jane"}]}}"#.to_string()
++  )
},],
  1. When using AST parser results
if let Expr::TypedString { data_type, value } = expr {
--  let string_value: String = value;
++  let string_value: String = value.into();
    };
}

(This works because I also added impl Into<String> for Value. Not totally sure about this being 100% correct.)

While fixing this I noticed another issue: Value uses escape_quoted_string to format single and double quoted strings. This doesn't take into account the dialect, producing invalid output for BigQuery like 'foo''bar' (should be 'foo\'bar'). I think that's a separate issue better addressed by a separate PR.

Value::DoubleQuotedString(s) => s,
Value::Placeholder(s) => s,
Value::DollarQuotedString(s) => s.value,
_ => panic!("not a string value"),
Copy link
Author

@graup graup Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this one. This panics for Number, Boolean, Null. Should be safe if users only access String values where they are supposed to be? But there may be a better way do achieve an ergonomic value unwrap, like maybe just a unwrap function that returns Option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it wasn't clear to me in the PR why the Into<String> for Value was needed, did we need it for something in this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking no, but without it, users that used the string value before have to put code like this in their code. Thought it would be nice offering a more easy migration by just adding .into() to the existing usage.

@@ -48,34 +48,34 @@ fn parse_literal_string() {
let select = dialect.verified_only_select(sql);
assert_eq!(10, select.projection.len());
assert_eq!(
&Expr::Value(Value::SingleQuotedString("single".to_string())),
&Expr::Value(Value::SingleQuotedString("single".into())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I followed the intent behind the changes from into() to to_string(), I figure they should be equivalent?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah unrelated, was just tidying up to make the usage all consistent in the same file and not jump back and forth between into() and to_string(). Can revert this to make the PR simpler.

Value::DoubleQuotedString(s) => s,
Value::Placeholder(s) => s,
Value::DollarQuotedString(s) => s.value,
_ => panic!("not a string value"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it wasn't clear to me in the PR why the Into<String> for Value was needed, did we need it for something in this PR?

@@ -39,7 +39,7 @@ fn parse_literal_string() {
r#"'''triple-single'unescaped''', "#,
r#""double\"escaped", "#,
r#""""triple-double\"escaped""", "#,
r#""""triple-double"unescaped""""#,
r#""""triple-double"un'escaped""""#,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this change intentional?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because currently there is no test for a single quote inside a triple quoted string. Thought it's a good idea to test this formatting behavior. (Considering the other bug I found in the formatting.)

Comment on lines +2235 to +2238
// SingleQuotedString and DoubleQuotedString are currently not correctly formatted by `fmt::Display for Value`.
// BigQuery does not support double escaping, should be \' or \" instead.
//bigquery().verified_expr(r#"JSON '{"foo":"bar\'s"}'"#);
//bigquery().verified_expr(r#"JSON "{\"foo\":\"bar's\"}""#);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this part a TODO for this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, like I mentioned in the description, this would be better for another PR. I can remove those lines if you think that's better but I thought it's good to document this unintended behavior. (The test should be there, it's just failing at the moment.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fails to parse TypedString expressions in BigQuery with triple quoted strings
2 participants