Regression: DataFrame::schema returns incorrect schema for NATURAL JOIN
#14058
Labels: bug, help wanted, regression
Describe the bug
Affected Version: 42.x, 43.x, 44.x (regression since 41.x)
The DataFrame::schema (=> LogicalPlan::schema) method returns a schema that includes all columns from the joined sources (using NATURAL JOIN), including columns not present in the final output. This behavior is incorrect and inconsistent with the documented behavior.
To Reproduce
Simple MRE here:
Deps:
Expected behavior
The schema returned by DataFrame::schema should match the structure of the output produced by collect, collect_partitioned, and similar methods.
Or, if the new behavior is intended, the documentation should be updated to say so and make clear how to access the output schema.
However, I find the previous behavior correct and useful (e.g. to get the schema before calling methods like write_parquet, write_csv, or write_json).
Additional context
This is a regression, as the method previously worked correctly in version 41.x.x and earlier.
Also, it probably points to missing test coverage for particular code paths. In a sense, it's not enough to compare SQL execution results.
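To make the coverage point concrete, here is a minimal, library-free sketch of the invariant such a test could assert. It does not call DataFusion; the function names (natural_join_columns, concatenated_columns) and the example column names are hypothetical. It models standard SQL NATURAL JOIN column semantics, where shared columns appear once, and contrasts them with the plain concatenation of both inputs that this report describes.

```rust
/// Column names of a NATURAL JOIN output under standard SQL semantics:
/// shared columns appear once (in left-side order), followed by the
/// remaining left columns, then the remaining right columns.
/// Hypothetical helper for illustration; not a DataFusion API.
fn natural_join_columns(left: &[&str], right: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    // Columns present on both sides are emitted once.
    for &col in left {
        if right.contains(&col) {
            out.push(col.to_string());
        }
    }
    // Columns that exist only on the left side.
    for &col in left {
        if !right.contains(&col) {
            out.push(col.to_string());
        }
    }
    // Columns that exist only on the right side.
    for &col in right {
        if !left.contains(&col) {
            out.push(col.to_string());
        }
    }
    out
}

/// The shape described in this report: every column from both inputs,
/// with the shared columns duplicated. Hypothetical helper for illustration.
fn concatenated_columns(left: &[&str], right: &[&str]) -> Vec<String> {
    left.iter().chain(right.iter()).map(|c| c.to_string()).collect()
}

fn main() {
    // Example tables (assumed): both share the `id` column.
    let left = ["id", "name"];
    let right = ["id", "amount"];

    let expected = natural_join_columns(&left, &right);
    let reported = concatenated_columns(&left, &right);

    // The output of the join carries the shared column once.
    assert_eq!(expected, ["id", "name", "amount"]);
    // A schema built by concatenating both inputs does not match it.
    assert_ne!(expected, reported);

    println!("expected: {:?}", expected);
    println!("reported: {:?}", reported);
}
```

A real regression test would presumably compare the fields of DataFrame::schema against the schema of the batches returned by collect for the same query, instead of only comparing result rows.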