
Reimplement datediff and dateadd in C to improve performance #2035


Jakeowen1 (Contributor)

Description

This change reimplements the datediff, datediff_big, and dateadd functions in C to improve performance. In the performance tests below, query execution time drops by about 65% (Query 2) to 75% (Query 1) compared with the original implementation.
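To illustrate the semantics a C reimplementation has to reproduce, here is a minimal standalone sketch. It is not the code from this PR. T-SQL's datediff counts datepart boundaries crossed rather than elapsed intervals, and dateadd(month, ...) clamps the day to the last day of the target month. The `civil_ts` struct and all function names are hypothetical. `days_from_civil` follows Howard Hinnant's published days-from-civil algorithm for the proleptic Gregorian calendar.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical broken-down timestamp (proleptic Gregorian calendar).
 * Fractional seconds are omitted: they never affect day or hour boundaries. */
typedef struct {
    int64_t year;
    int month, day, hour, minute, second;
} civil_ts;

/* Days since 1970-01-01 (Howard Hinnant's days_from_civil algorithm). */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    if (m <= 2)
        y -= 1;                                                           /* year starts in March */
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const int64_t yoe = y - era * 400;                                    /* [0, 399] */
    const int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  /* [0, 365] */
    const int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            /* [0, 146096] */
    return era * 146097 + doe - 719468;
}

/* datediff(day, a, b): midnight boundaries crossed; the time of day is ignored. */
static int64_t datediff_day(civil_ts a, civil_ts b)
{
    return days_from_civil(b.year, b.month, b.day) - days_from_civil(a.year, a.month, a.day);
}

/* datediff(hour, a, b): hour boundaries crossed; minutes and seconds are ignored. */
static int64_t datediff_hour(civil_ts a, civil_ts b)
{
    return datediff_day(a, b) * 24 + (b.hour - a.hour);
}

static int is_leap(int64_t y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

static int days_in_month(int64_t y, int m)
{
    static const int dim[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (m == 2 && is_leap(y)) ? 29 : dim[m - 1];
}

/* dateadd(month, n, t): shift the month, then clamp the day to the target month's length. */
static civil_ts dateadd_month(civil_ts t, int64_t n)
{
    int64_t total = t.year * 12 + (t.month - 1) + n;  /* months since year 0 */
    int64_t y = total / 12, rem = total % 12;
    if (rem < 0) {                                     /* floor division for negative totals */
        rem += 12;
        y -= 1;
    }
    t.year = y;
    t.month = (int)rem + 1;
    if (t.day > days_in_month(t.year, t.month))
        t.day = days_in_month(t.year, t.month);        /* e.g. Jan 31 + 1 month -> Feb 28/29 */
    return t;
}
```

For example, `datediff_hour` on 2023-01-01 01:01:20 and 2024-01-01 01:01:20 returns 8760, matching the use-case test in this PR, while 00:59:59 to 01:00:00 on the same day returns 1 because one hour boundary is crossed. Working on integer day and hour numbers like this avoids any string formatting or parsing per row.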

Issues Resolved

Task: BABEL-4496

Test Scenarios Covered

  • Use case based -
1> select datediff(hour, cast("2023-01-01 01:01:20.99" as datetime), cast("2024-01-01 01:01:20.99" as datetime))
2> go
datediff   
-----------
       8760

(1 rows affected)
  • Boundary conditions -
1> select datediff(week, cast("2023-01-01 01:01:20.99" as datetime), cast("2023-01-05 01:01:20.99" as datetime))
2> go
datediff   
-----------
          1

(1 rows affected)
  • Arbitrary inputs -
1> select dateadd(dayofyear, 5, cast("text" as datetimeoffset));
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
invalid input syntax for type timestamp with time zone: "text"
1> select dateadd(dayofyear, 5, null)
2> go
dateadd                
-----------------------
                   NULL

(1 rows affected)
1> select datediff(y, NULL, cast("1900-01-02" as datetime))
2> go
datediff   
-----------
       NULL

(1 rows affected)
  • Negative test cases -
1> select datediff(nanosecond, cast("2023-02-15" as datetime), cast("1950-02-20" as datetime))
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart
1> 
1> select dateadd(day, 2, cast("01:01:21" as time));
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
the datepart "day" is not supported by function dateadd for datatype time
  • Minor version upgrade tests -

  • Major version upgrade tests -

  • Performance tests -

Join of two tables with about 60,000 rows each (60,001 rows according to the query plans)

Query 1:

select d.D
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
GO

Performance with this commit:
~~START~~
text
Query Text: select d.D
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
Hash Join  (cost=1816.75..2978.79 rows=332 width=8) (actual time=91.801..677.766 rows=60001 loops=1)
  Hash Cond: (d.id = md.id)
  Join Filter: (datediff('day'::text COLLATE "default", d.d, md.d) = 0)
  ->  Seq Scan on dates d  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.046..56.757 rows=60001 loops=1)
  ->  Hash  (cost=988.00..988.00 rows=66300 width=12) (actual time=89.527..89.529 rows=60001 loops=1)
        Buckets: 131072  Batches: 1  Memory Usage: 3837kB
        ->  Seq Scan on more_dates md  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.040..43.681 rows=60001 loops=1)
Planning Time: 0.020 ms
Execution Time: 747.474 ms
~~END~~

Performance without this commit:
~~START~~
text
Query Text: select d.D
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
Hash Join  (cost=1816.75..2978.79 rows=332 width=8) (actual time=94.734..2920.237 rows=60001 loops=1)
  Hash Cond: (d.id = md.id)
  Join Filter: (datediff('day'::text COLLATE "default", d.d, md.d) = 0)
  ->  Seq Scan on dates d  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.049..59.248 rows=60001 loops=1)
  ->  Hash  (cost=988.00..988.00 rows=66300 width=12) (actual time=89.456..89.458 rows=60001 loops=1)
        Buckets: 131072  Batches: 1  Memory Usage: 3837kB
        ->  Seq Scan on more_dates md  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.042..43.743 rows=60001 loops=1)
Planning Time: 0.014 ms
Execution Time: 3002.463 ms
~~END~~


Query 2: 
select dateadd(day, 1, d.D)
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
GO

Performance with this commit:
~~START~~
text
Query Text: select dateadd(day, 1, d.D)
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
Hash Join  (cost=1816.75..3061.79 rows=332 width=8) (actual time=80.382..1347.963 rows=60001 loops=1)
  Hash Cond: (d.id = md.id)
  Join Filter: (datediff('day'::text COLLATE "default", d.d, md.d) = 0)
  ->  Seq Scan on dates d  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.015..42.137 rows=60001 loops=1)
  ->  Hash  (cost=988.00..988.00 rows=66300 width=12) (actual time=77.801..77.803 rows=60001 loops=1)
        Buckets: 131072  Batches: 1  Memory Usage: 3837kB
        ->  Seq Scan on more_dates md  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.011..33.449 rows=60001 loops=1)
Planning Time: 0.013 ms
Execution Time: 1411.302 ms
~~END~~

Performance without this commit:
~~START~~
text
Query Text: select dateadd(day, 1, d.D)
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
Hash Join  (cost=1816.75..3061.79 rows=332 width=8) (actual time=81.898..4002.259 rows=60001 loops=1)
  Hash Cond: (d.id = md.id)
  Join Filter: (datediff('day'::text COLLATE "default", d.d, md.d) = 0)
  ->  Seq Scan on dates d  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.016..48.974 rows=60001 loops=1)
  ->  Hash  (cost=988.00..988.00 rows=66300 width=12) (actual time=78.335..78.337 rows=60001 loops=1)
        Buckets: 131072  Batches: 1  Memory Usage: 3837kB
        ->  Seq Scan on more_dates md  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.010..33.678 rows=60001 loops=1)
Planning Time: 0.071 ms
Execution Time: 4084.733 ms
~~END~~

  • Tooling impact -
    None

  • Client tests -
    None
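The headline figure can be checked against the Execution Time lines in the two performance tests. The small C helpers below (illustrative names, not part of the PR) compute the percentage reduction and the speedup factor for each query. Each input is the single run shown for that configuration, so the results are point estimates rather than averaged measurements.

```c
#include <assert.h>

/* Percentage reduction in execution time: 100 * (before - after) / before. */
static double pct_reduction(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return 100.0 * (before_ms - after_ms) / before_ms;
}

/* Speedup factor: how many times faster the new run is. */
static double speedup(double before_ms, double after_ms)
{
    assert(after_ms > 0.0);
    return before_ms / after_ms;
}

/* Execution Time values (ms) copied from the EXPLAIN ANALYZE output above. */
static const double q1_before = 3002.463, q1_after = 747.474;  /* Query 1: datediff only */
static const double q2_before = 4084.733, q2_after = 1411.302; /* Query 2: dateadd + datediff */
```

With these values, Query 2 shows roughly a 65.4% reduction (about 2.9x faster) and Query 1 roughly a 75.1% reduction (about 4.0x faster). The 65% in the description therefore corresponds to the more conservative of the two queries.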

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…sh-for-postgresql#1998)

This change reimplements the datediff, datediff_big, and dateadd functions in C to improve performance by 65% compared to the original implementation.

Task: BABEL-4496
Signed-off-by: Jake Owen <[email protected]>
Jakeowen1 changed the title Reimplement datediff and dateadd in c to improve performance (#1998) to Reimplement datediff and dateadd in c to improve performance on Nov 17, 2023
forestkeeper merged commit 67d1dab into babelfish-for-postgresql:BABEL_3_4_STABLE on Nov 18, 2023
28 checks passed
@@ -179,4 +179,7 @@ XX000 ERRCODE_INTERNAL_ERROR "The table-valued parameter \"%s\" must be declared
0A000 ERRCODE_FEATURE_NOT_SUPPORTED "Column name or number of supplied values does not match table definition." SQL_ERROR_213 16
42501 ERRCODE_INSUFFICIENT_PRIVILEGE "Only members of the sysadmin role can execute this stored procedure." SQL_ERROR_15003 16
42809 ERRCODE_WRONG_OBJECT_TYPE "The target \"%s\" of the OUTPUT INTO clause cannot be a view or common table expression." SQL_ERROR_330 16

22008 ERRCODE_DATETIME_VALUE_OUT_OF_RANGE "The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart" SQL_ERROR_535 16
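The new SQL_ERROR_535 mapping exists because T-SQL's datediff returns int, while datediff_big returns bigint. The sketch below shows a two-step check under stated assumptions. Function names are hypothetical, and each instant is represented as a day number plus nanoseconds since midnight, which is an illustrative choice rather than Babelfish's internal format. The sketch computes the difference in 64-bit arithmetic with explicit overflow guards, then rejects results outside the int32 range. The nanosecond example from the negative tests (2023-02-15 to 1950-02-20, i.e. -26658 days) is about -2.3e18 ns: it fits in bigint but not in int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_DAY INT64_C(86400000000000) /* 86,400 s/day * 1,000,000,000 ns/s */

/* Hypothetical helper: nanosecond difference between two instants, each given as
 * (day number, nanoseconds since midnight). Returns false when the result does not
 * fit in int64, i.e. when even a bigint (datediff_big-style) result would overflow. */
static bool ns_diff_int64(int64_t start_day, int64_t start_ns,
                          int64_t end_day, int64_t end_ns, int64_t *out)
{
    assert(start_ns >= 0 && start_ns < NS_PER_DAY);
    assert(end_ns >= 0 && end_ns < NS_PER_DAY);
    int64_t days = end_day - start_day; /* assumes realistic day numbers */
    if (days > INT64_MAX / NS_PER_DAY || days < INT64_MIN / NS_PER_DAY)
        return false;
    int64_t base = days * NS_PER_DAY;
    int64_t delta = end_ns - start_ns; /* strictly between -NS_PER_DAY and NS_PER_DAY */
    if ((delta > 0 && base > INT64_MAX - delta) || (delta < 0 && base < INT64_MIN - delta))
        return false;
    *out = base + delta;
    return true;
}

/* Hypothetical int-returning variant: a result outside the int32 range must be
 * reported as the datediff overflow error instead of being truncated. */
static bool datediff_ns_int32(int64_t start_day, int64_t start_ns,
                              int64_t end_day, int64_t end_ns, int32_t *out)
{
    int64_t wide;
    if (!ns_diff_int64(start_day, start_ns, end_day, end_ns, &wide))
        return false;
    if (wide > INT32_MAX || wide < INT32_MIN)
        return false; /* caller would raise the overflow error here */
    *out = (int32_t)wide;
    return true;
}
```

In this sketch, the failing nanosecond example passes the 64-bit step and fails only the int32 range check, which is the distinction between datediff and datediff_big.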
Contributor

Have we validated the transactional/control behavior for the new error mappings? We need to make sure they are compatible with TSQL. There are automated scripts to validate an error's behavior; please talk to Dipesh for more details.

Contributor Author

Zhibai and I validated the transactional/control behavior for the new error mappings, and it is compatible with TSQL.

Contributor

Yes, I asked about this on purpose, and the PR has test cases covering the transaction behavior.

3 participants