Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clickhouse string ordering and string filtering by UTF8 instead of bytes #6143

Open
wants to merge 35 commits into
base: master
Choose a base branch
from

Conversation

casab
Copy link

@casab casab commented Feb 9, 2023

Clickhouse defaults to using bytes to order by and string manipulation functions such as lower, upper uses ascii. To overcome this limitation they have COLLATE keyword, and lowerUTF8, upperUTF8 functions.

Check List

  • Tests has been run in packages where changes made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Description of Changes Made (if issue reference is not provided)

  • Replaced CONCAT SQL function with js template literal to prevent unnecessary DB function call
  • Used lowerUTF8 instead of lower to support utf8 compatible search
  • Added COLLATE ‘en’ when ordering by strings to order incasesensitive. Clickhouse orders by bytes on default.

@casab casab requested a review from a team as a code owner February 9, 2023 14:32
@github-actions github-actions bot added the pr:community Contribution from Cube.js community members. label Feb 9, 2023
@paveltiunov
Copy link
Member

Hey @casab ! Thanks for contributing! Could you please add an integration test for it?

@paveltiunov paveltiunov self-assigned this Mar 4, 2023
Copy link

vercel bot commented Jun 7, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

8 Skipped Deployments
Name Status Preview Comments Updated (UTC)
examples-angular-dashboard ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-d3 ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-dashboard ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-data-table ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-highcharts ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-material-ui ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-pivot-table ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-vue-query-builder ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm

@casab casab force-pushed the feature/clickhouse_utf8_filter_order branch from 645512c to 63b935c Compare June 7, 2024 14:56
Copy link

vercel bot commented Jun 7, 2024

@casab is attempting to deploy a commit to the Cube Dev Team on Vercel.

A member of the Team first needs to authorize it.

@igorlukanin
Copy link
Member

@casab May I kindly ask you to test with the latest release and rebase your changes on top of it? We have migrated to a new ClickHouse client library recently. Thanks in advance!

@casab
Copy link
Author

casab commented Nov 22, 2024

@igorlukanin Can you approve running the workflows?

@KSDaemon
Copy link
Member

KSDaemon commented Jan 17, 2025

Hi @casab Could you please sync with the latest master, resolve conflicts and fix warnings/errors? Ping me whenever you need to approve running workflows!
Btw, for a quick linter fixes you can run yarn run lint:fix

@casab
Copy link
Author

casab commented Jan 24, 2025

@KSDaemon of course, done it.

@KSDaemon
Copy link
Member

KSDaemon commented Jan 24, 2025

@casab Hey! Thnx for the updates! Unfortunately, some lint errors still exists:

  /home/runner/work/cube/cube/packages/cubejs-schema-compiler/src/adapter/ClickHouseQuery.ts
    178:1   error    Expected indentation of 6 spaces but found 8  indent
    178:16  warning  Strings must use singlequote                  quotes
    182:1   error    Expected indentation of 6 spaces but found 8  indent
    182:26  error    Unexpected string concatenation               prefer-template
    182:58  warning  Strings must use singlequote                  quotes
    183:1   error    Expected indentation of 6 spaces but found 8  indent
    184:1   error    Expected indentation of 6 spaces but found 8  indent
    184:16  warning  Strings must use singlequote                  quotes
    188:1   error    Expected indentation of 6 spaces but found 8  indent
    188:16  warning  Strings must use singlequote                  quotes
    192:1   error    Expected indentation of 2 spaces but found 0  indent

Maybe thats because of merge... But anyway...

Can you fix them, please?

casab added 2 commits February 6, 2025 04:21
…add data_type to AliasedColumn and TemplateColumn to be able to use the data type in the templates
@casab casab requested a review from a team as a code owner February 6, 2025 01:24
@casab casab requested a review from mcheshkov February 6, 2025 06:45
Copy link

codecov bot commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.73%. Comparing base (1203291) to head (79373d6).

Files with missing lines Patch % Lines
...t/cubesql/cubesql/src/compile/engine/df/wrapper.rs 96.15% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6143      +/-   ##
==========================================
- Coverage   83.57%   80.73%   -2.84%     
==========================================
  Files         227      227              
  Lines       81618    81646      +28     
==========================================
- Hits        68210    65916    -2294     
- Misses      13408    15730    +2322     
Flag Coverage Δ
cubesql 80.73% <96.55%> (-2.84%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@casab casab requested a review from KSDaemon February 11, 2025 18:58
@casab
Copy link
Author

casab commented Feb 11, 2025

@KSDaemon @mcheshkov Hey, I have fixed all the errors. And implemented all the required parts. I would appreciate it if you can review it. Especially the rust part.

@@ -123,7 +125,7 @@ export class ClickHouseQuery extends BaseQuery {
.join(' AND ');
}

public getFieldAlias(id) {
public getField(id) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make it public? Maybe private would be better?

@@ -168,6 +188,43 @@ export class ClickHouseQuery extends BaseQuery {
return `${fieldAlias} ${direction}`;
}

public getCollation() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the same question here: should it be public?

Comment on lines +203 to +219
if (R.isEmpty(this.order)) {
return '';
}

const collation = this.getCollation();

const orderByString = R.pipe(
R.map((order) => {
let orderString = this.orderHashToString(order);
if (collation && this.getFieldType(order) === 'string') {
orderString = `${orderString} COLLATE '${collation}'`;
}
return orderString;
}),
R.reject(R.isNil),
R.join(', ')
)(this.order);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you mind rewriting this in plain JS instead of using Ramda?
Smth like this:

Suggested change
if (R.isEmpty(this.order)) {
return '';
}
const collation = this.getCollation();
const orderByString = R.pipe(
R.map((order) => {
let orderString = this.orderHashToString(order);
if (collation && this.getFieldType(order) === 'string') {
orderString = `${orderString} COLLATE '${collation}'`;
}
return orderString;
}),
R.reject(R.isNil),
R.join(', ')
)(this.order);
if (this.order.length === 0) {
return '';
}
const collation = this.getCollation();
const orderByString = this.order
.map((order) => {
let orderString = this.orderHashToString(order);
if (collation && this.getFieldType(order) === 'string') {
orderString = `${orderString} COLLATE '${collation}'`;
}
return orderString;
})
.filter(Boolean) // Analogue `R.reject(R.isNil)`
.join(', ');

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr:community Contribution from Cube.js community members.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants