Release DataFusion 45.0.0
#14008
Comments
@andygrove would you like to coordinate this release or would you like me to? (or does anyone else want to do so?) |
I also added some issues to the description above that I think would be worth fixing |
I don't have a preference. I will be traveling around this time though, so perhaps it would make sense for someone else to be release manager for this one. |
I am happy to do it again for 45 if no one else would like the opportunity (see what I did there 😆 ) |
Thanks, alamb, I booked 46 in advance! |
Awesome -- I filed #14123 to track 46 |
I plan to start assembling and testing the release candidate the week of Jan 27 (in about 2 weeks' time) |
As promised, Sail is working on porting relevant tests into DataFusion. A good starting point is a regression our tests caught in DataFusion 43, which still seems to persist in DataFusion 44. A regression was introduced in DataFusion 43.0.0 related to casting to UTF8 in various places. Upgrading to DataFusion 43.0.0 required adding explicit casting in several areas as a workaround. This PR (lakehq/sail#355) comments out those changes to expose the regression through the 12 additional failed tests compared to the main branch. Once I’ve pinpointed the root cause(s) of the regression, I’ll create an issue in DataFusion to track the work. I want to ensure the issue accurately reflects the problem before filing it. I’m happy to address these regressions and port over the tests that cover them in the same PR. Hopefully, we can get this resolved in time for the DataFusion 45 release! |
Thank you very much @shehabgamin 🙏 I strongly suspect this is related to switching to Utf8View by default in Parquet; You can validate this theory by disabling the following config setting: https://datafusion.apache.org/user-guide/configs.html
I think we are pretty close to closing out the Utf8View epic (now that we have upgraded to the latest arrow): I'll add that to the list for 45 too |
I plan to start preparing / testing / pushing this release the week of Jan 27, aiming to get a release candidate early the next week |
Thanks for the pointer @alamb! I tried setting I'll take a deeper look into the issue after the weekend. Hope you have a great rest of your weekend! |
Most of the regressions are related to this issue: #14230. I should be able to resolve them well before the While testing my local Sail code with the latest commit on DataFusion's main branch, I encountered several breaking changes that may make DataFusion 45 a jarring upgrade for some users. Given the previous discussion about wanting to make releases less jarring (#13334 (comment)), I wanted to bring this to your attention, @alamb. Aside from that, there is one remaining regression I haven't investigated yet, which seems to be related to Parquet. |
Thanks @shehabgamin -- Can you enumerate these changes (or point me at a PR) so we can see if there is some way to make it less jarring? |
Yeah I'll work on that right now! |
My apologies @alamb, the DataFusion upgrade from the latest main branch commit is smoother than I initially thought. After investigating the flood of errors, I discovered that many were resolved by simply updating Sail's PyO3 DataFusion If you'd like to see these changes, they're in my PR that's testing the regression fixes: lakehq/sail#355 |
To replace |
Some people currently use |
I see. |
I created an issue to track our progress with upgrading Comet to use DataFusion 45 and linked to it from the PR description: #14274 |
It turns out that type coercion for UDF arguments ( IMO this should go on the "must fix" list too. I'll make sure to have the PR ready by the end of the weekend. |
@alamb @jayzhan211 #14268 is ready for review! |
I am starting to do some ticket triage and prepare for the release
Done |
@shehabgamin --makes sense. I made this PR to try and clarify: Thank you again for the testing |
Is your feature request related to a problem or challenge?
Tracking ticket for next release, also a place to track desired inclusions
Last release was https://crates.io/crates/datafusion/44.0.0 (December 31, 2024), so the next major release would be around Feb 1, 2025
Steps:
46.0.0 #14123
Pre-release testing
Prior release tickets:
44.0.0 #13334
Please let me know if you would like to add any items on this list or move the categorization
Items to fix before release
54.0.0 #14114
DataFrame::schema returns incorrect schema for NATURAL JOIN #14058
Invalid comparison operation: Utf8 == Utf8View error during LEFT ANTI JOIN #13510
encode(..., "hex") errors on non-UTF-8 binaries since Datafusion v43 #14055
Items maybe to complete (not sure if they are blockers)
CREATE TABLE AS SELECT ... inserting VALUES #13124
EnforceDistribution generates invalid plan #14150
Nice to Have (but non blockers -- e.g. bugs but not regressions)
UNION and ORDER BY queries #13748
median by implementing special GroupsAccumulator #13681
FULL OUTER JOIN and LIMIT produces wrong results #14335