Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make fill_nothing take an empty string #8643

Merged
merged 7 commits into from
Jan 10, 2024

Conversation

AdRiley
Copy link
Member

@AdRiley AdRiley commented Dec 28, 2023

Pull Request Description

I think fill_nothing should work with an empty string. Currently TextType.preciseTypeForValue(arg.asString()) throws if you give it an empty string so this fix avoid that.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@AdRiley AdRiley added the CI: No changelog needed Do not require a changelog entry for this PR. label Dec 28, 2023
@AdRiley AdRiley force-pushed the wip/adr/make-fill-nothing-take-empty-string branch from 1daf708 to a5a2777 Compare January 3, 2024 09:56
@AdRiley AdRiley marked this pull request as ready for review January 3, 2024 10:00
@AdRiley AdRiley changed the title Wip/adr/make fill nothing take empty string Make fill_nothing take an empty string Jan 3, 2024
Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an important edge case is unhandled, and I suspect the operation will crash if filling a fixed-length string column with "".

Please add a test for this case to ensure it works correctly.

@AdRiley
Copy link
Member Author

AdRiley commented Jan 4, 2024

I'm wondering if fill_nothing shouldn't be changing types at all.

@AdRiley AdRiley marked this pull request as draft January 4, 2024 13:17
@radeusgd
Copy link
Member

radeusgd commented Jan 8, 2024

I'm wondering if fill_nothing shouldn't be changing types at all.

I assume the alternative is to ensure the default value fits the target type, right?

i.e. for a column (Value_Type.Char size=3 variable_length=False), a value "a" or "" would by disallowed, only strings of exactly length 3 would be allowed? And we could return an error message that suggests that if you want to have a different size, you should cast to variable_length=True first?

@AdRiley
Copy link
Member Author

AdRiley commented Jan 8, 2024

I assume the alternative is to ensure the default value fits the target type, right?

Yes and to error if it doesn't

i.e. for a column (Value_Type.Char size=3 variable_length=False), a value "a" or "" would by disallowed, only strings of exactly length 3 would be allowed? And we could return an error message that suggests that if you want to have a different size, you should cast to variable_length=True first?

Yes but fixed length strings can still hold strings smaller than their size. Right? At least that's how they work in the db world. The fixed/variable size is to do with how they are stored not the data in them.

@radeusgd
Copy link
Member

radeusgd commented Jan 8, 2024

i.e. for a column (Value_Type.Char size=3 variable_length=False), a value "a" or "" would by disallowed, only strings of exactly length 3 would be allowed? And we could return an error message that suggests that if you want to have a different size, you should cast to variable_length=True first?

Yes but fixed length strings can still hold strings smaller than their size. Right? At least that's how they work in the db world. The fixed/variable size is to do with how they are stored not the data in them.

At least currently in Enso, fixed length means that all string values within that column have this fixed length. Currently, regardless of the type, all strings are stored in the same way (I guess that could change with followups of #8512).

In DBs, shorter strings (in fixed length column) are usually padded, right?

@AdRiley
Copy link
Member Author

AdRiley commented Jan 9, 2024

In DBs, shorter strings (in fixed length column) are usually padded, right?

I think from the users/functionality point of view then no they are not.

https://www.postgresql.org/docs/current/datatype-character.html

`CREATE TABLE test1 (a character(4));
INSERT INTO test1 VALUES ('ok');
SELECT a, char_length(a) FROM test1; -- (1)

a | char_length
------+-------------
ok | 2

CREATE TABLE test2 (b varchar(5));
INSERT INTO test2 VALUES ('ok');
INSERT INTO test2 VALUES ('good ');
INSERT INTO test2 VALUES ('too long');
ERROR: value too long for type character varying(5)
INSERT INTO test2 VALUES ('too long'::varchar(5)); -- explicit truncation
SELECT b, char_length(b) FROM test2;

b | char_length
-------+-------------
ok | 2
good | 5
too l | 5`

But if we are talking about how they are stored on disk then yes they are likely padded.

Again from https://www.postgresql.org/docs/current/datatype-character.html
"Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character."

At least currently in Enso, fixed length means that all string values within that column have this fixed length. Currently, regardless of the type, all strings are stored in the same way (I guess that could change with followups of #8512).

I think this will cause us problems if we try to round trip fixed width fields from and to a database. I think we should change this behaviour but not as part of this MR. For now I'm going to add the tests you suggested and see what happens for this fix.

PostgreSQL Documentation
8.3. Character Types # Table 8.4. Character Types Name Description character varying(n), varchar(n) variable-length with limit character(n), char(n), bpchar(n) fixed-length, blank-padded bpchar variable …

@AdRiley AdRiley force-pushed the wip/adr/make-fill-nothing-take-empty-string branch from ab52da3 to f8ceab4 Compare January 9, 2024 10:22
@AdRiley
Copy link
Member Author

AdRiley commented Jan 9, 2024

I'm wondering if fill_nothing shouldn't be changing types at all.

I've changed my mind. I like the type expanding where needed.

@AdRiley AdRiley marked this pull request as ready for review January 9, 2024 12:29
@AdRiley AdRiley requested a review from radeusgd January 10, 2024 08:47
Comment on lines +334 to +335
if setup.is_database.not then
actual.at "col0" . value_type . should_equal (Value_Type.Char size=5 variable_length=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could check this invariant for Postgres?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure how to specify "don't test this on SQLite". Do we have anywhere else we do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. One way is to check the prefix which contains the name of the DB: https://github.com/enso-org/enso/blob/develop/test/Table_Tests/src/Common_Table_Operations/Filter_Spec.enso#L85 But I think that's not quite perfect.

  2. The preferred way is to use 'feature flags' from test selection, and I think setup.test_selection.fixed_length_text_columns is exactly what is needed here:
    https://github.com/enso-org/enso/blob/develop/test/Table_Tests/src/Common_Table_Operations/Conversion_Spec.enso#L76

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

t = table_builder [["col0", [Nothing, "200", Nothing, "400", "500", Nothing]]] . cast "col0" (Value_Type.Char size=3 variable_length=False)
actual = t.fill_nothing ["col0"] ""
actual.at "col0" . to_vector . should_equal ["", "200", "", "400", "500", ""]
actual.at "col0" . value_type . should_equal (Value_Type.Char variable_length=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO while this type becomes variable length, it could retain a known max-size:

Suggested change
actual.at "col0" . value_type . should_equal (Value_Type.Char variable_length=True)
actual.at "col0" . value_type . should_equal (Value_Type.Char size=3 variable_length=True)

what do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that would be better wouldn't it? I'll look what that code change would be.

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small test suggestions that I'm not 100% convinced are needed.
Overall it looks OK.

@AdRiley
Copy link
Member Author

AdRiley commented Jan 10, 2024

I'm going to merge this one and resolve the 2 suggestions in a second PR as they have some bigger implications that we should discuss.

@AdRiley AdRiley merged commit f31ecc7 into develop Jan 10, 2024
32 of 34 checks passed
@AdRiley AdRiley deleted the wip/adr/make-fill-nothing-take-empty-string branch January 10, 2024 11:59
mergify bot pushed a commit that referenced this pull request Jan 12, 2024
This is the follow up PR addressing the last couple of points from #8643 around what the return type from fill_nothing.

# Important Notes
The biggest change is changing what we size we need for an empty string. This change says a variable length string of length 1 and does it at a low enough level that it will effect the whole language. But I think that is correct.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants