make the use of text index
yindaheng98 committed May 16, 2024
1 parent 6b14ba2 commit c2fc27e
Showing 3 changed files with 3 additions and 2 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -165,6 +165,7 @@ python -m dblp_crawler -k video -k edge -p 27d5dc70280c8628f181a7f8881912025f808
 Without index, NEO4J query will be very very slow. So before you start, you should add some index:

 ```cql
+CREATE TEXT INDEX publication_title_hash_text_index FOR (p:Publication) ON (p.title_hash);
 CREATE INDEX publication_title_hash_index FOR (p:Publication) ON (p.title_hash);
 CREATE INDEX publication_dblp_key_index FOR (p:Publication) ON (p.dblp_key);
 CREATE INDEX publication_paper_id_index FOR (p:Publication) ON (p.paperId);
2 changes: 1 addition & 1 deletion citation_crawler/init/neo4j.py
@@ -47,7 +47,7 @@ def match_papers_keywords(tx, year, *arg_keywords):
             if not k:
                 continue
             ki += 1
-            k_and.append(f"toLower(p.title) CONTAINS $keyword{ki}")
+            k_and.append(f"p.title_hash CONTAINS $keyword{ki}")
             v_and[f"keyword{ki}"] = k
         k_or.append(f"({' and '.join(k_and)})")
         v_or = {**v_or, **v_and}
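
The hunk above shows only the middle of `match_papers_keywords`, so the following is a minimal, standalone sketch of how such a keyword clause might be assembled after this change. The function name `build_keyword_clause`, its argument shape (one iterable of keywords per group), and the single running counter `ki` are assumptions made for illustration, not the repository's actual code. Keywords within a group are joined with `and`, groups are joined with `or`, and every value is passed as a query parameter. The predicate now targets `p.title_hash`, the property covered by the new text index, instead of `toLower(p.title)`. How `title_hash` is normalized is not visible in this commit, so keywords presumably have to be normalized the same way before they are passed in.

```python
def build_keyword_clause(*keyword_groups):
    """Build a Cypher WHERE fragment and its parameters.

    Hypothetical helper: each positional argument is one group of keywords.
    All keywords in a group must match (and); any group may match (or).
    """
    k_or = []   # one parenthesized "and" expression per group
    v_or = {}   # all query parameters, keyed by placeholder name
    ki = 0      # running counter so placeholder names stay unique
    for group in keyword_groups:
        k_and, v_and = [], {}
        for k in group:
            if not k:
                continue  # skip empty keywords, as in the original hunk
            ki += 1
            k_and.append(f"p.title_hash CONTAINS $keyword{ki}")
            v_and[f"keyword{ki}"] = k
        if not k_and:
            continue  # assumption: a group with no usable keywords is dropped
        k_or.append(f"({' and '.join(k_and)})")
        v_or = {**v_or, **v_and}
    return " or ".join(k_or), v_or


if __name__ == "__main__":
    clause, params = build_keyword_clause(["video", "edge"], ["", "stream"])
    print(clause)
    print(params)
```

The returned fragment could then be embedded in a query such as `MATCH (p:Publication) WHERE <clause> RETURN p` and run with the parameter dictionary, which keeps keyword values out of the query string itself.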
2 changes: 1 addition & 1 deletion setup.py
@@ -15,7 +15,7 @@

 setup(
     name='citation_crawler',
-    version='2.10',
+    version='2.10.1',
     author='yindaheng98',
     author_email='[email protected]',
     url='https://github.com/yindaheng98/citation-crawler',