[java] Fix avro logical-types conversions for BQ storage #33422
base: master
Conversation
@@ -115,34 +120,90 @@ static String convertUUID(Object value) {
  }

  static Long convertTimestamp(Object value, boolean micros) {
    if (value instanceof ReadableInstant) {
      return ((ReadableInstant) value).getMillis() * (micros ? 1000 : 1);
This was wrong. BQ always expects epoch microseconds. The conversion should be applied to the raw type, depending on whether it represents millis or micros.
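The point can be sketched in isolation (illustrative helper, not the PR's actual code; `toEpochMicros` and `rawIsMicros` are made-up names): the scaling factor must come from the source logical type, not from the target column.

```java
// Illustrative sketch, not the PR's code: BigQuery TIMESTAMP columns always
// take epoch MICROseconds, so the multiplier depends on whether the raw value
// is timestamp-millis or timestamp-micros.
class TimestampSketch {
    static long toEpochMicros(long rawValue, boolean rawIsMicros) {
        // timestamp-micros values pass through; timestamp-millis scale up by 1000.
        return rawIsMicros ? rawValue : rawValue * 1000L;
    }
}
```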
  .setScale(type.getScale(), RoundingMode.DOWN)
  .round(new MathContext(type.getPrecision(), RoundingMode.DOWN));
Does it seem legitimate to round? We might also fail if the BigDecimal precision and scale do not match the expected logical type.
I understand that previously this only accepted a ByteBuffer, and now it adds support for java.math.BigDecimal? If it previously would have failed, or it doesn't lose precision compared to the existing behavior, I think it's fine.
All other conversions accept the logical type as well as the underlying type.
This is mandatory to support both: even if the logical type is present on the schema, it can be discarded when the GenericData used for serialization does not have the feature enabled.
Concerning the rounding, the doc states to use BigDecimalByteStringEncoder, and BeamRowToStorageApiProto is a copy of that.
It does not, however, support parameterized NUMERIC and BIGNUMERIC types.
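The rounding from the diff can be sketched on its own (hypothetical `fitToLogicalType` helper; `precision` and `scale` stand in for the values carried by the Avro decimal logical type):

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Sketch of the rounding under discussion: truncate a BigDecimal to the
// scale and precision declared by the decimal logical type (RoundingMode.DOWN)
// instead of failing when extra digits are present.
class DecimalSketch {
    static BigDecimal fitToLogicalType(BigDecimal value, int precision, int scale) {
        return value
            .setScale(scale, RoundingMode.DOWN)
            .round(new MathContext(precision, RoundingMode.DOWN));
    }
}
```

For example, fitting `123.456` into precision 5 / scale 2 truncates the last digit rather than throwing.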
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`.
Assigning reviewers. If you would like to opt out of this review, comment R: @kennknowles for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments).
Reminder, please take a look at this pr: @kennknowles @chamikaramj
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @Abacn for label java. Available commands:
Reminder, please take a look at this pr: @Abacn @johnjcasey
Most of the avro logical-type to BQ conversions are broken. Add support for both joda and java time to ensure compatibility with older avro versions.
force-pushed from 30d0a2a to f8d575b
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @damondouglas for label java. Available commands:
Thanks for the change! Had a few questions.
  .put(LogicalTypes.timestampMicros().getName(), TableFieldSchema.Type.TIMESTAMP)
  .put(LogicalTypes.timestampMillis().getName(), TableFieldSchema.Type.TIMESTAMP)
  .put(LogicalTypes.uuid().getName(), TableFieldSchema.Type.STRING)
  .put("date", TableFieldSchema.Type.DATE)
What's the reason to change them to hard-coded names here? I understand they should be equivalent? Or keep the getName() calls and note the resolved names in comments?
They are equivalent. I changed them because the type name for decimal is not accessible and requires the creation of a 'fake' logical type: LogicalTypes.decimal(1).getName().
I can revert to the old style if that's preferred.
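The equivalence can be sketched with plain strings (hypothetical stand-ins; the real code maps to TableFieldSchema.Type constants, and "timestamp-micros" etc. are the Avro logical-type names that getName() resolves to):

```java
import java.util.Map;

// Sketch of the hard-coded-name style from the diff. "decimal" has no
// parameterless factory in Avro's LogicalTypes, so getName() would need a
// throwaway instance such as LogicalTypes.decimal(1); a string literal avoids that.
class LogicalTypeNames {
    static final Map<String, String> AVRO_TO_BQ = Map.of(
        "timestamp-micros", "TIMESTAMP",
        "timestamp-millis", "TIMESTAMP",
        "uuid", "STRING",
        "date", "DATE",
        "decimal", "NUMERIC");
}
```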
@@ -115,34 +120,97 @@ static String convertUUID(Object value) {
  }

  static Long convertTimestamp(Object value, boolean micros) {
    if (value instanceof ReadableInstant) {
      return ((ReadableInstant) value).getMillis() * (micros ? 1000 : 1);
    if (value instanceof org.joda.time.ReadableInstant) {
May be worth adding a comment noting that the following helper functions support both joda time and java.time, and referring to #19215 as future clean-up.
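The dual-library dispatch being discussed could look roughly like this (a sketch with a made-up method name; only the java.time branch is runnable here, since org.joda.time is a third-party dependency, so the Joda branch is indicated in a comment):

```java
import java.time.Instant;

// Sketch: one helper accepting both java.time values and (in Beam's real code)
// Joda values, normalizing everything to epoch microseconds for BigQuery.
class TimeDispatchSketch {
    static long timestampToMicros(Object value) {
        if (value instanceof Instant) {
            Instant i = (Instant) value;
            return i.getEpochSecond() * 1_000_000L + i.getNano() / 1_000L;
        }
        // The real helper would also branch on org.joda.time.ReadableInstant,
        // which carries millisecond precision: instant.getMillis() * 1000L.
        throw new IllegalArgumentException("Unsupported timestamp type: " + value);
    }
}
```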
  .setScale(type.getScale(), RoundingMode.DOWN)
  .round(new MathContext(type.getPrecision(), RoundingMode.DOWN));
I understand that previously this only accepted a ByteBuffer, and now it adds support for java.math.BigDecimal? If it previously would have failed, or it doesn't lose precision compared to the existing behavior, I think it's fine.
...rm/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AvroGenericRecordToStorageApiProto.java
    byteBuffer.duplicate(),
    Schema.create(Schema.Type.NULL), // dummy schema, not used
    logicalType);
return BeamRowToStorageApiProto.serializeBigDecimalToNumeric(bigDecimal);
This is considering the target column to be NUMERIC (which did not even match the inferred BIGNUMERIC type).
Avro decimal is parameterized, and so are NUMERIC and BIGNUMERIC. The proto bytes sent for this column must respect those parameters.
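The constraint can be sketched with a hypothetical helper, using BigQuery's documented parameter limits (NUMERIC allows scale up to 9 and up to 29 integer digits; anything wider needs BIGNUMERIC):

```java
// Sketch: route a parameterized Avro decimal to the narrowest BigQuery type
// that can hold it. NUMERIC = max precision 38 / max scale 9 (29 integer
// digits); larger decimals must go to BIGNUMERIC.
class DecimalTypeSketch {
    static String inferBqDecimalType(int precision, int scale) {
        boolean fitsNumeric = scale <= 9 && (precision - scale) <= 29;
        return fitsNumeric ? "NUMERIC" : "BIGNUMERIC";
    }
}
```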