
Inconsistent conversion of time dimensions to DATETIME bigquery type #9245

Open
grzaks opened this issue Feb 19, 2025 · 1 comment

grzaks commented Feb 19, 2025

Describe the bug
Dimensions of type time are automatically cast to DATETIME(sql_expression, 'UTC') when used as a dimension, but not when the same dimension is referenced in a measure definition.

This causes a lot of type inconsistency in complicated schemas and is hard to maintain: the same time dimension, based on a single TIMESTAMP column, is interpreted as TIMESTAMP in some places and as DATETIME in others.

To Reproduce
Test schema:

cube('bugcase01', {
    public: true,
    sql: `WITH test_data AS (
  SELECT TIMESTAMP '2025-01-01' as timeCol, 45.67 as value UNION ALL
  SELECT TIMESTAMP '2025-01-02', 78.90 UNION ALL
  SELECT TIMESTAMP '2025-01-03', 23.45 UNION ALL
  SELECT TIMESTAMP '2025-01-04', 89.12 UNION ALL
  SELECT TIMESTAMP '2025-01-05', 34.56 UNION ALL
  SELECT TIMESTAMP '2025-01-06', 67.89 UNION ALL
  SELECT TIMESTAMP '2025-01-07', 23.55
) SELECT * FROM test_data
`,
    dimensions: {
        time_dim: {
            sql: `${CUBE}.timeCol`,
            type: 'time'
        },
    },
    measures: {
        test_measure: {
            type: 'max',
            sql: `(DATETIME_DIFF(CURRENT_DATETIME(), ${time_dim}, SECOND) / 60.0)`
        }
    }
})

Test query:

SELECT
  DATE_TRUNC('day', bugcase01.time_dim),
  MEASURE(bugcase01.test_measure)
FROM
  bugcase01
GROUP BY
  1
LIMIT
  10000;

The above query is translated to the following BigQuery query ...

SELECT
  DATETIME_TRUNC(DATETIME(`bugcase01`.timeCol, 'UTC'), DAY) `bugcase01__time_dim_day`,
  max(
    (
      DATETIME_DIFF(CURRENT_DATETIME(), `bugcase01`.timeCol, SECOND) / 60.0
    )
  ) `bugcase01__test_measure`
FROM
  (
    WITH test_data AS (
      SELECT TIMESTAMP '2025-01-01' as timeCol, 45.67 as value UNION ALL
      SELECT TIMESTAMP '2025-01-02', 78.90 UNION ALL
      SELECT TIMESTAMP '2025-01-03', 23.45 UNION ALL
      SELECT TIMESTAMP '2025-01-04', 89.12 UNION ALL
      SELECT TIMESTAMP '2025-01-05', 34.56 UNION ALL
      SELECT TIMESTAMP '2025-01-06', 67.89 UNION ALL
      SELECT TIMESTAMP '2025-01-07', 23.55
    )
    SELECT * FROM test_data
  ) AS `bugcase01`
WHERE
  (
    `bugcase01`.timeCol >= TIMESTAMP(?)
    AND `bugcase01`.timeCol <= TIMESTAMP(?)
  )
GROUP BY
  1
ORDER BY
  1 ASC
LIMIT
  10000

... which causes the following error, because DATETIME_DIFF expects either two TIMESTAMP or two DATETIME arguments but receives one DATETIME and one TIMESTAMP. Please notice that bugcase01.timeCol is not cast to DATETIME(..., 'UTC') for bugcase01__test_measure but is cast for bugcase01__time_dim_day.

Please also notice that the WHERE ... conditions built by Cube compare against TIMESTAMP(?) query parameters, and timeCol is not cast there either. I cannot cast it to anything other than TIMESTAMP in my schema definition myself, because the time-based WHERE conditions would then stop working.

BigQuery error:

No matching signature for function DATETIME_DIFF Argument types: DATETIME, TIMESTAMP, DATE_TIME_PART Signature: DATETIME_DIFF(DATETIME, DATETIME, DATE_TIME_PART) Argument 2: Unable to coerce type TIMESTAMP to expected type DATETIME Signature: DATETIME_DIFF(TIMESTAMP, TIMESTAMP, DATE_TIME_PART) Argument 1: Unable to coerce type DATETIME to expected type TIMESTAMP at [2:97]
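
As a side note, a per-measure workaround for this PoC (only a sketch; it does not address the underlying inconsistency and assumes the timestamps should be interpreted as UTC) is to repeat the cast that Cube applies to the dimension inside the measure SQL, so that DATETIME_DIFF receives two DATETIME arguments:

    measures: {
        test_measure: {
            type: 'max',
            // Sketch: mirror the DATETIME(..., 'UTC') cast that Cube applies to the
            // dimension, so both DATETIME_DIFF arguments end up as DATETIME in BigQuery
            sql: `(DATETIME_DIFF(CURRENT_DATETIME(), DATETIME(${time_dim}, 'UTC'), SECOND) / 60.0)`
        }
    }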

Expected behavior
The compiled BigQuery query has timeCol consistently cast to the same type everywhere (either DATETIME or TIMESTAMP).

Version:
1.2.4

Additional context
We recently upgraded from 1.1.2, and I believe it did not behave this way before.


grzaks commented Feb 23, 2025

I spent some more time evaluating this issue, and it looks like the PR merged into v1.0.6 as eab14fe changed this behaviour of Cube (a breaking change?).

I confirmed that by downgrading my staging environment to v1.0.5, which made my complex schema work again. Upgrading to v1.0.6 broke it, but explicitly setting the CUBESQL_SQL_PUSH_DOWN env variable to false made it work again. I've upgraded staging back to v1.2.9 with CUBESQL_SQL_PUSH_DOWN=false and it works as before.
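
For reference, that flag is just an environment variable on the Cube API instance; where it is set depends on the deployment, e.g. as an env-file entry:

    # disable SQL push down (entry in the deployment's environment / .env file)
    CUBESQL_SQL_PUSH_DOWN=false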

For some reason, the PoC schema and query in the original post above work on all versions, regardless of the query push down env variable. I'll spend some more time and prepare the simplest possible PoC that works on 1.0.5 but not on 1.0.6.

I'm not that familiar with the Cube source code, but it looks like the convertTzForRawTimeDimension function should be called somewhere and is not?

@igorlukanin igorlukanin self-assigned this Feb 28, 2025