-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-6830] AVG() returns double precision by default when its argument type is INT_TYPES #4193
base: main
Are you sure you want to change the base?
Conversation
DOUBLE is almost never the right type in SQL, since computations on double values are in general non-deterministic. If any type is right, a variant of DECIMAL should be used for AVG. Can you propose an algorithm to choose the precision and scale for the result? Ideally Calcite would follow what other databases do - at least databases that have proper DECIMAL types; the Postgres DECIMAL type is not a standard one, for example. For functions like STDDEV it's a bit more complicated, since they involve a square root at the last step. Moreover, this change is backwards-incompatible, and it may cause problems for all projects that have relied on this behavior. At the very least this change should be documented in history.md as such. |
The
Not sure what precision algorithm would be the best, but definitely not return integer for |
|
…ument type is INT_TYPES
dc5a42b
to
e481c75
Compare
In general, the result of double addition aggregation depends on the order in which the numbers are processed, whereas the addition of decimals gives always the same result. It's true that this can only happen for very large integers, since double has 53 bits of mantissa. So any integer value which requires less than 53 bits will be represented exactly, and if all intermediate addition results are less than 53 bits the result will be deterministic and exact; the only imprecision will be produced by the final division. The result of division in FP and DECIMAL can also be slightly different. In general for financial-type computations you should always use DECIMAL. Your formula depends on the number of input values, which is not known when you compile the SQL program. Calcite is statically-typed, so it has to choose the type at compile-time. |
@@ -1619,9 +1619,9 @@ LogicalAggregate(group=[{0}], EXPR$1=[STDDEV_POP($1)], EXPR$2=[AVG($1)], EXPR$3= | |||
</Resource> | |||
<Resource name="planAfter"> | |||
<![CDATA[ | |||
LogicalProject(NAME=[$0], EXPR$1=[CAST(POWER(/(-($1, /(*($2, $2), $3)), $3), 0.5:DECIMAL(2, 1))):INTEGER NOT NULL], EXPR$2=[CAST(/($2, $3)):INTEGER NOT NULL], EXPR$3=[CAST(POWER(/(-($1, /(*($2, $2), $3)), CASE(=($3, 1), null:BIGINT, -($3, 1))), 0.5:DECIMAL(2, 1))):INTEGER], EXPR$4=[CAST(/(-($1, /(*($2, $2), $3)), $3)):INTEGER NOT NULL], EXPR$5=[CAST(/(-($1, /(*($2, $2), $3)), CASE(=($3, 1), null:BIGINT, -($3, 1)))):INTEGER NOT NULL]) | |||
LogicalProject(NAME=[$0], EXPR$1=[POWER(/(-($1, /(*(CAST($2):DOUBLE NOT NULL, CAST($2):DOUBLE NOT NULL), $3)), $3), 0.5:DECIMAL(2, 1))], EXPR$2=[/(CAST($2):DOUBLE NOT NULL, $3)], EXPR$3=[POWER(/(-($1, /(*(CAST($2):DOUBLE NOT NULL, CAST($2):DOUBLE NOT NULL), $3)), CASE(=($3, 1), null:BIGINT, -($3, 1))), 0.5:DECIMAL(2, 1))], EXPR$4=[/(-($1, /(*(CAST($2):DOUBLE NOT NULL, CAST($2):DOUBLE NOT NULL), $3)), $3)], EXPR$5=[CAST(/(-($1, /(*(CAST($2):DOUBLE NOT NULL, CAST($2):DOUBLE NOT NULL), $3)), CASE(=($3, 1), null:BIGINT, -($3, 1)))):DOUBLE NOT NULL]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you see, the type inferred for the result will cascade through the query plan, influencing the type all values that depend on the aggregation result.
@@ -363,7 +363,12 @@ && getDefaultPrecision(typeName) != RelDataType.PRECISION_NOT_SPECIFIED) { | |||
|
|||
@Override public RelDataType deriveAvgAggType(RelDataTypeFactory typeFactory, | |||
RelDataType argumentType) { | |||
return argumentType; | |||
if (SqlTypeName.INT_TYPES.contains(argumentType.getSqlTypeName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the type system is actually pluggable. See feldera/feldera#3588 for the right way to do this. We should not change RelDataTypeSystemImpl.
To reproduce (agg.iq)
Should return 22.1429, but returns 22.
The current default logic of Calcite setting the return type to be same as the input column. For example, if the input column is INT, the return type for AVG() is INT.
This behavior is different from other databases. For example:
Postgres
MySQL
Presto
Besides, since the
AVG
agg with integer argument returns integer type, the behaviors ofvar_pop
,var_samp
,stddev_pop
,stddev_samp
andstddev
are same asavg
.This PR targets to change the default return type of
avg
to double precision when its argument type areTINYINT
,SMALLINT
,INTEGER
orBIGINT
.