Queries to match :GeneSymbol - :Fragment

name of fulltext index: fragmentGeneSymbol

create index with custom analyzer on :Fragment nodes

custom Lucene analyzer from Stefan Armbruster: https://github.com/covidgraph/neo4j-additional-analyzers

CALL db.index.fulltext.createNodeIndex("fragmentGeneSymbol", ["Fragment"], ["text"], {analyzer: "synonym"});

skip some GeneSymbols (with whitespace, slash, star, colon, parentheses or brackets)

skip gene symbols with special characters in search
set an additional label to filter them

MATCH (gs:GeneSymbol)
WHERE gs.sid CONTAINS '('
OR gs.sid CONTAINS ')'
OR gs.sid CONTAINS '/'
OR gs.sid CONTAINS '*'
OR gs.sid CONTAINS ' '
OR gs.sid CONTAINS '['
OR gs.sid CONTAINS ']'
OR gs.sid CONTAINS ':'
SET gs:OmitSpecialChar
RETURN count(DISTINCT gs)
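
Several of these characters (parentheses, brackets, `/`, `*`, `:`) are operators in the Lucene query syntax that `db.index.fulltext.queryNodes` parses, and whitespace splits a query into separate terms, which is presumably why such symbols are excluded before matching. The following is a minimal Python sketch of the same check, useful for previewing which identifiers would receive the `OmitSpecialChar` label. The function name and the sample identifiers are hypothetical and not part of this repository.

```python
# Characters that mark a gene symbol for exclusion, mirroring the
# CONTAINS conditions in the Cypher query above (includes the space).
SPECIAL_CHARS = set("()/* []:")


def has_special_char(sid: str) -> bool:
    """Return True if the symbol contains any of the excluded characters."""
    return any(ch in SPECIAL_CHARS for ch in sid)


# Hypothetical sample identifiers, for illustration only.
samples = ["ACE2", "HLA-A*01", "A/B", "TMPRSS2", "IL 6"]
omitted = [s for s in samples if has_special_char(s)]
print(omitted)
```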

skip gene symbols of length 1

MATCH (gs:GeneSymbol)
WHERE size(gs.sid) = 1
SET gs:OmitLength

skip gene symbols that are English words

match gene symbols against word list to exclude symbols that are common words
set an additional label to filter them

MATCH (gs:GeneSymbol), (w:Word)
WHERE toLower(gs.sid) = toLower(w.value)
AND w.match11 = True
SET gs:OmitWord
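
The comparison above is case-insensitive: both sides are lowercased before they are compared. A minimal Python sketch of that matching step follows, assuming plain lists of symbols and words. The function name and the sample data are hypothetical, and the `w.match11` flag is not modelled because its meaning is not described here.

```python
from typing import Iterable, List


def symbols_matching_words(symbols: Iterable[str], words: Iterable[str]) -> List[str]:
    """Return the symbols that equal some word when case is ignored."""
    lowered_words = {w.lower() for w in words}
    return [s for s in symbols if s.lower() in lowered_words]


# Hypothetical sample data, for illustration only.
symbols = ["CAT", "ACE2", "Rest", "TNF"]
words = ["cat", "rest", "table"]
matched = symbols_matching_words(symbols, words)
print(matched)
```

Lowercasing the word list once into a set avoids comparing every symbol with every word, which is what the pairwise `MATCH (gs:GeneSymbol), (w:Word)` pattern does.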

run the text match

match gene symbols against :Fragment fulltext index
use MERGE so that rerunning the query does not create duplicate MENTIONS relationships

CALL apoc.periodic.iterate(
    "MATCH (gs:GeneSymbol) WHERE NOT gs:OmitWord AND NOT gs:OmitSpecialChar AND NOT gs:OmitLength RETURN gs",
    "CALL db.index.fulltext.queryNodes('fragmentGeneSymbol', gs.sid) YIELD node, score
    MERGE (gs)<-[r:MENTIONS]-(node) SET r.score = score",
    {batchSize: 10, parallel: false, iterateList: true});
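
The first statement selects only gene symbols that carry none of the three `Omit*` labels, and `batchSize: 10` groups the selected symbols so that the second statement runs for ten of them per transaction. A rough Python sketch of that selection and batching is shown below. It models labels as sets of strings, and the function names are hypothetical.

```python
from typing import Dict, Iterator, List, Set

# Labels set by the three filter queries above.
OMIT_LABELS = {"OmitWord", "OmitSpecialChar", "OmitLength"}


def eligible(symbols: Dict[str, Set[str]]) -> List[str]:
    """Return the symbol ids that carry none of the Omit* labels."""
    return [sid for sid, labels in symbols.items() if not labels & OMIT_LABELS]


def batches(items: List[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```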

count number of gene symbols with MENTIONS relationship

MATCH (gs:GeneSymbol)<-[r:MENTIONS]-(:Fragment)
RETURN count(DISTINCT gs) AS geneSymbols, count(r) AS mentions

covidgraph/graph-processing_text_gene_match