fix(server/pypika): count null values as well in count distinct aggregation

Signed-off-by: Luka Peschke <[email protected]>
lukapeschke committed Jan 9, 2025
1 parent d6434a7 commit 7a60f2c
Showing 3 changed files with 50 additions and 1 deletion.
4 changes: 4 additions & 0 deletions server/CHANGELOG.md
@@ -2,6 +2,10 @@

 ## Unreleased
 
+### Fixed
+
+- Pandas & Pypika: the `count` aggregation of the aggregate step now properly counts nulls
+
 ## [0.48.6] - 2024-12-11
 
 ### Fixed
@@ -565,7 +565,11 @@ def _build_window_subquery() -> Any:
 for agg_column_name, new_column_name in zip(aggregation.columns, aggregation.new_columns, strict=True):
     if new_column_name not in agg_col_names:
         column_field = prev_step_table[agg_column_name]
-        new_agg_col = agg_fn(column_field).as_(new_column_name)
+        # Count("column") ignores NULL values, whereas COUNT(*) takes them into account
+        if agg_fn is functions.Count:
+            new_agg_col = agg_fn("*").as_(new_column_name)
+        else:
+            new_agg_col = agg_fn(column_field).as_(new_column_name)
         agg_selected.append(new_agg_col)
         agg_col_names.append(new_column_name)

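The comment in the hunk above relies on standard SQL semantics: `COUNT(column)` skips rows where that column is NULL, while `COUNT(*)` counts every row of the group. A minimal sketch of the difference using Python's built-in `sqlite3` module, assuming a hypothetical in-memory `beers` table with a single `nullable_name` column (the table name and rows are illustrative, not the test dataset used by the fixture):

```python
import sqlite3


def count_per_group(conn: sqlite3.Connection, count_expr: str) -> dict:
    """Group the hypothetical `beers` table by nullable_name, counting with `count_expr`."""
    rows = conn.execute(
        f"SELECT nullable_name, {count_expr} FROM beers GROUP BY nullable_name"
    ).fetchall()
    # Each row is a (name, count) tuple; the NULL group shows up under the key None.
    return dict(rows)


# Hypothetical data: two named rows and three rows whose name is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE beers (nullable_name TEXT)")
conn.executemany(
    "INSERT INTO beers VALUES (?)",
    [("Pauwel Kwak",), ("Ardwen Blonde",), (None,), (None,), (None,)],
)

# COUNT(column) skips NULL values, so the NULL group reports 0.
by_column = count_per_group(conn, "COUNT(nullable_name)")
# COUNT(*) counts rows, so the NULL group reports 3.
by_star = count_per_group(conn, "COUNT(*)")

print(by_column[None], by_star[None])  # 0 3
```

Both queries put all NULL names into a single group; only the aggregate differs. With `COUNT(column)`, the `null` group in a fixture like the one below would be reported as 0 instead of the number of rows it contains.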
41 changes: 41 additions & 0 deletions server/tests/backends/fixtures/aggregate/count_nulls_pypika.yaml
@@ -0,0 +1,41 @@
exclude:
- mongo
- pandas
- snowflake
step:
  pipeline:
  - aggregations:
    - aggfunction: count
      columns:
      - nullable_name
      newcolumns:
      - nullable_name_count
    keepOriginalGranularity: false
    name: aggregate
    'on':
    - nullable_name
expected:
  data:
  - nullable_name: Ardwen Blonde
    nullable_name_count: 1
  - nullable_name: Bellfield Lawless Village IPA
    nullable_name_count: 1
  - nullable_name: Brewdog Nanny State Alcoholvrij
    nullable_name_count: 1
  - nullable_name: Brugse Zot blonde
    nullable_name_count: 1
  - nullable_name: Ninkasi Ploploplop
    nullable_name_count: 1
  - nullable_name: Pauwel Kwak
    nullable_name_count: 1
  - nullable_name: Weihenstephan Hefe Weizen Alcoholarm
    nullable_name_count: 1
  - nullable_name: null
    nullable_name_count: 3
  schema:
    fields:
    - name: nullable_name
      type: string
    - name: nullable_name_count
      type: integer
    pandas_version: 1.4.0
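The fixture's `expected.data` can be read as a group-by that keeps `null` as a group of its own and counts rows rather than non-null values. A minimal pure-Python sketch of that contract (not the Pandas or PyPika implementation), assuming a hypothetical list of input names since the real test dataset is not part of this diff, and ordering non-null names alphabetically with the `null` group last, as the fixture happens to list them:

```python
from collections import Counter
from typing import Optional


def count_rows_per_name(names: list[Optional[str]]) -> list[dict]:
    """Count rows per value, keeping None as its own group (like COUNT(*))."""
    counts = Counter(names)  # Counter tallies None like any other key
    # Sort non-null names alphabetically and put the None group last.
    keys = sorted(counts, key=lambda name: (name is None, name or ""))
    return [
        {"nullable_name": name, "nullable_name_count": counts[name]}
        for name in keys
    ]


# Hypothetical input: one entry per row, three rows without a name.
names = ["Pauwel Kwak", None, "Ardwen Blonde", None, "Brugse Zot blonde", None]
result = count_rows_per_name(names)
print(result[-1])  # {'nullable_name': None, 'nullable_name_count': 3}
```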
