Make fill_nothing take an empty string #8643
1daf708 to a5a2777
I think an important edge case is unhandled, and I suspect the operation will crash if filling a fixed-length string column with "".
Please add a test for this case to ensure it works correctly.
std-bits/table/src/main/java/org/enso/table/data/column/storage/StringStorage.java
I'm wondering if fill_nothing shouldn't be changing types at all.
I assume the alternative is to ensure the i.e. for a column
Yes and to error if it doesn't
Yes, but fixed-length strings can still hold strings shorter than their size, right? At least that's how they work in the DB world. The fixed/variable size is to do with how they are stored, not the data in them.
At least currently in Enso, fixed length means that all string values within that column have this fixed length. Currently, regardless of the type, all strings are stored in the same way (I guess that could change with follow-ups of #8512). In DBs, shorter strings (in a fixed-length column) are usually padded, right?
I think from the user/functionality point of view, no, they are not. From https://www.postgresql.org/docs/current/datatype-character.html:
`CREATE TABLE test1 (a character(4));` with result columns `a | char_length`
`CREATE TABLE test2 (b varchar(5));` with result columns `b | char_length`
But if we are talking about how they are stored on disk, then yes, they are likely padded. Again from https://www.postgresql.org/docs/current/datatype-character.html
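To make the distinction in this comment concrete, here is a minimal Java sketch of the two views of a fixed-width value: the padded stored form and the length the user observes. The class `FixedCharSketch` and both method names are hypothetical and only illustrate the idea; this is not PostgreSQL or Enso code, and it counts Java `char`s rather than code points for brevity.

```java
// Hypothetical illustration of CHAR(n)-style semantics; not taken from any library.
final class FixedCharSketch {
    /** Pads a value with trailing spaces up to the declared width, as a stored CHAR(n) value would be. */
    static String store(String value, int width) {
        if (value.length() > width) {
            throw new IllegalArgumentException("value too long for CHAR(" + width + ")");
        }
        return value + " ".repeat(width - value.length());
    }

    /** Length as the user sees it: trailing padding spaces are not counted. */
    static int charLength(String stored) {
        int end = stored.length();
        while (end > 0 && stored.charAt(end - 1) == ' ') {
            end--;
        }
        return end;
    }

    public static void main(String[] args) {
        String stored = store("ok", 4);
        System.out.println("[" + stored + "] has user-visible length " + charLength(stored));
    }
}
```

Under this model a fixed-width column can hold a shorter value, including the empty string, without the user ever seeing the padding.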
I think this will cause us problems if we try to round trip fixed width fields from and to a database. I think we should change this behaviour but not as part of this MR. For now I'm going to add the tests you suggested and see what happens for this fix.
ab52da3 to f8ceab4
std-bits/table/src/main/java/org/enso/table/data/column/storage/StringStorage.java
I've changed my mind. I like the type expanding where needed.
std-bits/table/src/main/java/org/enso/table/data/column/storage/StringStorage.java
if setup.is_database.not then
    actual.at "col0" . value_type . should_equal (Value_Type.Char size=5 variable_length=True)
Do you think we could check this invariant for Postgres?
I wasn't sure how to specify "don't test this on SQLite". Do we have anywhere else we do this?
- One way is to check the prefix, which contains the name of the DB: https://github.com/enso-org/enso/blob/develop/test/Table_Tests/src/Common_Table_Operations/Filter_Spec.enso#L85 But I think that's not quite perfect.
- The preferred way is to use 'feature flags' from test selection, and I think `setup.test_selection.fixed_length_text_columns` is exactly what is needed here: https://github.com/enso-org/enso/blob/develop/test/Table_Tests/src/Common_Table_Operations/Conversion_Spec.enso#L76
Thank you
t = table_builder [["col0", [Nothing, "200", Nothing, "400", "500", Nothing]]] . cast "col0" (Value_Type.Char size=3 variable_length=False)
actual = t.fill_nothing ["col0"] ""
actual.at "col0" . to_vector . should_equal ["", "200", "", "400", "500", ""]
actual.at "col0" . value_type . should_equal (Value_Type.Char variable_length=True)
IMHO while this type becomes variable length, it could retain a known max-size:
- actual.at "col0" . value_type . should_equal (Value_Type.Char variable_length=True)
+ actual.at "col0" . value_type . should_equal (Value_Type.Char size=3 variable_length=True)
What do you think?
Yes, that would be better, wouldn't it? I'll look at what that code change would be.
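For reference, the suggestion in this thread (widen to variable length but keep a known maximum size) could be sketched roughly as below. This is only an illustration under assumptions: `CharType` and `resultType` are hypothetical names, not the actual types in StringStorage.java, and a negative `maxLength` is assumed here to mean "unbounded".

```java
// Hypothetical stand-in for a text value type; maxLength < 0 is assumed to mean unbounded.
record CharType(long maxLength, boolean fixedLength) {}

final class FillNothingTypeSketch {
    /** Computes the column type after filling missing values with the given fill string. */
    static CharType resultType(CharType column, String fill) {
        long fillLength = fill.codePointCount(0, fill.length());
        if (column.maxLength() < 0) {
            return column; // an unbounded column already fits any fill value
        }
        if (column.fixedLength() && fillLength == column.maxLength()) {
            return column; // the fill value has exactly the fixed length, so nothing changes
        }
        // Otherwise become variable length, retaining the largest size seen so far.
        return new CharType(Math.max(column.maxLength(), fillLength), false);
    }

    public static void main(String[] args) {
        System.out.println(resultType(new CharType(3, true), ""));
    }
}
```

With this rule, filling a fixed-length size-3 column with "" would yield a variable-length type that still reports a maximum size of 3, which is what the suggested assertion expects.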
Two small test suggestions that I'm not 100% convinced are needed.
Overall it looks OK.
I'm going to merge this one and resolve the 2 suggestions in a second PR as they have some bigger implications that we should discuss.
This is the follow-up PR addressing the last couple of points from #8643 around the return type of fill_nothing.
# Important Notes
The biggest change is changing what size we need for an empty string. This change says it is a variable-length string of length 1, and does so at a low enough level that it will affect the whole language. But I think that is correct.
Pull Request Description
I think fill_nothing should work with an empty string. Currently TextType.preciseTypeForValue(arg.asString()) throws if you give it an empty string, so this fix avoids that.
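A rough sketch of the kind of guard described here, not the actual change: `TextTypeSketch`, `strictPreciseType` and `lenientPreciseType` are hypothetical names. It assumes the failure comes from rejecting a fixed-length type of size 0, and it maps the empty string to a variable-length type of size 1 as the follow-up note describes.

```java
// Hypothetical stand-in for a text value type; not the real TextType class.
record TextTypeSketch(long maxLength, boolean fixedLength) {}

final class PreciseTypeSketch {
    /** Assumed pre-fix behaviour: the precise type is fixed length, and size 0 is rejected. */
    static TextTypeSketch strictPreciseType(String value) {
        long length = value.codePointCount(0, value.length());
        if (length == 0) {
            throw new IllegalArgumentException("fixed-length text type must have a positive size");
        }
        return new TextTypeSketch(length, true);
    }

    /** Guarded variant: the empty string gets a variable-length type of size 1 instead of throwing. */
    static TextTypeSketch lenientPreciseType(String value) {
        if (value.isEmpty()) {
            return new TextTypeSketch(1, false);
        }
        return strictPreciseType(value);
    }

    public static void main(String[] args) {
        System.out.println(lenientPreciseType(""));
        System.out.println(lenientPreciseType("200"));
    }
}
```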
Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
./run ide build.