Adds case insensitive decorators. #59

Merged: 3 commits, Oct 23, 2021

@cheqianh cheqianh commented Oct 15, 2021

Description

This PR fixes issue #51. Ion is case sensitive, and the path extractor's case-insensitive flag applies only to the field names matched by the search path (e.g. it does not affect fields within a nested struct).

After discussion with jobarr-amzn, one of the most scalable ways to solve the issue is to implement a case-insensitive decorator that wraps an Ion container. For read methods such as containsKey, get, and iterator, the decorator returns Ion values that are themselves wrapped in a case-insensitive decorator, without changing the underlying Ion struct, so that all fields within a nested container remain case insensitive.
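To illustrate the idea only, here is a minimal sketch of such a decorator. It is not the code in this PR: it assumes a plain java.util.Map as a stand-in for an Ion struct so it runs without ion-java, and the class name CaseInsensitiveStructView and its helpers are hypothetical.

```java
import java.util.Map;

// Hypothetical stand-in for the decorator described above. It wraps a plain Map
// instead of an IonStruct and never modifies the wrapped data.
final class CaseInsensitiveStructView {
    private final Map<String, Object> delegate;

    CaseInsensitiveStructView(Map<String, Object> delegate) {
        this.delegate = delegate;
    }

    // True if any field name matches, ignoring case.
    boolean containsKey(String fieldName) {
        return findKey(fieldName) != null;
    }

    // Looks up a field ignoring case. Nested maps are returned wrapped, so
    // lookups further down stay case insensitive as well.
    Object get(String fieldName) {
        String key = findKey(fieldName);
        return key == null ? null : wrap(delegate.get(key));
    }

    // Prefers an exact match, then falls back to the first case-insensitive match.
    private String findKey(String fieldName) {
        if (delegate.containsKey(fieldName)) {
            return fieldName;
        }
        for (String key : delegate.keySet()) {
            if (key != null && key.equalsIgnoreCase(fieldName)) {
                return key;
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private static Object wrap(Object value) {
        if (value instanceof Map) {
            return new CaseInsensitiveStructView((Map<String, Object>) value);
        }
        return value; // scalars are returned as-is
    }
}
```

With data shaped like {foo: {Bar: 1}}, calling get("foo") on the view returns another view, and get("bar") on that nested view finds the Bar field even though the underlying map is unchanged.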

Reproduce the issue (solved in this PR):

Ion text: {foo: {Bar: 1}}
Hive QL:

CREATE EXTERNAL TABLE case( foo struct<bar:int> )
ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe' 
WITH SERDEPROPERTIES ("ion.path_extractor.case_sensitive"="false") 
STORED AS INPUTFORMAT 'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 'com.amazon.ionhiveserde.formats.IonOutputFormat' 
LOCATION '/user/data'; 

Reading from Hive (Before):

hive > Select * from case                                              
OK                                                                     
null     // can't find bar!

Reading from Hive (Now):

hive > Select * from case                                              
OK                                                                     
{bar: 1} 

Root cause:

When ion-java-path-extractor matches the desired search path, it calls the callback method directly and returns the IonStruct without stepping into it (there was no need to spend extra time stepping into the struct). As a result, the case of field names within the struct is never changed (e.g. they are still uppercase).

Test:

Unit tests and integration tests pass. In addition, the example from #51 works correctly in the EMR environment.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@cheqianh cheqianh requested a review from jobarr-amzn October 15, 2021 00:40
@jobarr-amzn jobarr-amzn left a comment

Looks good! It's too bad that it's a pain to upgrade to Kotlin 1.5 right now, we're missing those assertions.

I also wonder whether it would be possible to somehow use Kotlin in our implementation and not just in test - it would make forwarding/decorator classes so much easier to implement.

@jobarr-amzn jobarr-amzn merged commit c37e299 into amazon-ion:master Oct 23, 2021