Feat: add pg_trgm to project search #1309

Merged
merged 13 commits into from
Mar 6, 2024

Conversation

marthendalnunes
Contributor

@marthendalnunes marthendalnunes commented Feb 12, 2024

This PR updates the project search query to use Postgres' pg_trgm extension, replacing ILIKE with SIMILARITY to improve search quality.

@jainkrati
Collaborator

@aminlatifi pls review

Member

@aminlatifi aminlatifi left a comment

Thanks @sembrestels for the PR
Some tests are failing, please take a look and fix them if they are not in conflict with the new expected output.
Also, please add unit tests to prove the correctness of the new query.

@@ -333,21 +333,38 @@ export class ProjectResolver {
searchTerm?: string,
) {
if (!searchTerm) return query;
const SIMILARITY_THRESHOLD = 0.4;
Member

A default value for the threshold is fine, but please allow overriding it through an entry in the configuration.

})
.orWhere('project.impactLocation ILIKE :searchTerm', {
qb.where(
'SIMILARITY(project.title, :searchTerm) > :similarityThreshold',
Member

For efficient searching, the target fields need to be indexed appropriately. Hopefully, these target fields are not updated frequently, so indexing them won't put a substantial load on the DB.
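
As a rough illustration of the indexing point above, here is a minimal sketch of the SQL a migration could run to add a trigram index. The helper name `trigramIndexSql` and the generated index name are hypothetical and not part of this PR; the `project` table and `title` column are taken from the query in the diff.

```typescript
// Hypothetical helper (not from the PR): builds the SQL statements a
// migration could execute to enable pg_trgm and index one text column.
function trigramIndexSql(table: string, column: string): string[] {
  // Assumed naming convention for the index; adjust to the project's own.
  const indexName = `idx_${table}_${column}_trgm`;
  return [
    // The extension must exist before gin_trgm_ops can be used.
    'CREATE EXTENSION IF NOT EXISTS pg_trgm;',
    // A GIN index with the trigram operator class on the searched column.
    `CREATE INDEX IF NOT EXISTS "${indexName}" ON "${table}" USING gin ("${column}" gin_trgm_ops);`,
  ];
}

// Example: statements for the project title column.
const titleIndexStatements = trigramIndexSql('project', 'title');
```

One caveat worth checking against the pg_trgm documentation: trigram indexes are used for the operator forms (such as `%`) rather than for a `SIMILARITY(...) > x` function comparison, so the query may need the operator form to benefit from the index.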

Member

Given this explanation:

  • If you're looking for approximate matches between strings and don't necessarily care about word boundaries, similarity might be sufficient.
  • If you want to consider individual words within the strings and prefer a more granular similarity calculation, word_similarity could be a better option.
  • If you want to ensure that matches occur within complete word boundaries and avoid partial word matches, strict_word_similarity might be more appropriate.

It seems the word_similarity function would be a better fit for our use case, @sembrestels wdyt?
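
To make the difference concrete, here is a simplified sketch of how whole-string trigram similarity is computed. It is an approximation for illustration only, assuming ASCII alphanumeric words, and is not the extension's actual implementation; the function names `trigrams` and `similarity` are hypothetical.

```typescript
// Simplified approximation of pg_trgm trigram extraction: lowercase the
// text, split it into alphanumeric words, pad each word with two leading
// spaces and one trailing space, then collect every 3-character window.
function trigrams(text: string): Set<string> {
  const result = new Set<string>();
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
  for (const word of words) {
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) {
      result.add(padded.slice(i, i + 3));
    }
  }
  return result;
}

// Whole-string similarity: shared trigrams divided by the size of the union.
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

// 'word' shares 4 trigrams with 'two words', but the union has 11 entries,
// so the score is 4/11 (about 0.36) and falls below a 0.4 threshold.
const score = similarity('word', 'two words');
```

Every extra word in a project title enlarges the union and lowers the score, which is why a short search term can miss a long title under `similarity`. `word_similarity` instead scores the search term against the best-matching part of the target, so it is less sensitive to title length.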

@aminlatifi
Member

The PR has a conflict with the staging branch, please resolve it.
Also, one of the tests didn't pass on my machine.

@@ -331,23 +331,40 @@ export class ProjectResolver {
static addSearchQuery(
query: SelectQueryBuilder<Project>,
searchTerm?: string,
similarityThreshold = 0.4,
Member

It's still hardcoded.

Member

I meant something like this:

Number(config.get('NUMBER_OF_UPDATE_RECURRING_DONATION_CONCURRENT_JOB')) || 1;

Member

I didn't want to pass it as a function param, so if you want to get it from config as above, it would be better to resolve it once, globally, somewhere at the top of the file.
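
A minimal sketch of what resolving the threshold once at module level could look like. The config key `PROJECT_SEARCH_SIMILARITY_THRESHOLD`, the helper `parseSimilarityThreshold`, and the `fakeConfig` object are all hypothetical; in the real code the raw value would presumably come from the project's `config.get(...)` as in the snippet above.

```typescript
// Default taken from the diff in this PR.
const DEFAULT_SIMILARITY_THRESHOLD = 0.4;

// Hypothetical helper: converts a raw config value to a usable threshold,
// falling back to the default when the value is missing or out of range.
function parseSimilarityThreshold(
  raw: unknown,
  fallback: number = DEFAULT_SIMILARITY_THRESHOLD,
): number {
  if (raw === undefined || raw === null || raw === '') return fallback;
  const parsed = Number(raw);
  // Similarity scores lie between 0 and 1, so reject anything outside (0, 1].
  return isFinite(parsed) && parsed > 0 && parsed <= 1 ? parsed : fallback;
}

// Stand-in for the project's config object, used here only so the sketch
// is self-contained. The key name is an assumption.
const fakeConfig: Record<string, string | undefined> = {
  PROJECT_SEARCH_SIMILARITY_THRESHOLD: '0.3',
};

// Resolved once at module load and reused by every search query.
const SIMILARITY_THRESHOLD = parseSimilarityThreshold(
  fakeConfig['PROJECT_SEARCH_SIMILARITY_THRESHOLD'],
);
```

Compared with the plain `Number(...) || default` pattern, the explicit range check also rejects values such as `2` or `-1` that would make the search condition meaningless.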

@aminlatifi aminlatifi merged commit a566f2c into staging Mar 6, 2024
3 checks passed
@aminlatifi aminlatifi deleted the feat/project-search-improvement branch March 6, 2024 09:39
3 participants