Searching with Postgres (#11803)
* Convert to record

* Add embedded-postgres dependency

* Set up embedded Postgres server

* Indexing entries with Postgres

* Remove Lucene bib fields indexer

* Use PreparedStatement to fix escaping characters

* Rename LuceneManager to IndexManager

* Begin to implement "new" search syntax

Co-authored-by: Loay Ghreeb
Co-authored-by: Carl Christian Snethlage

* Some new operators

* Change log level of EmbeddedPostgres

* Handle search flags

* Fix default field

* Fix handling of anyfield (and add "any" as alias)

* Openrewrite...

* More test cases

* Remove non-covered libraries

* Use LIKE syntax as default instead of regex

Disabled exact match

* Update module-info.java

* Create "query" package

* Postgres searcher

* Return back the exact match operator

* checkstyle

* Add link

* WIP

* Fix compilation

* WIP

* Intermediate result

Co-authored-by: Loay Ghreeb

* Query should be OK

Co-authored-by: Loay Ghreeb

* Indexing of split values

Co-authored-by: Loay Ghreeb

* Fix tests compile

* Use "First Last" name order for authors

* Refactor SQL query visitor

* Adapt tests

* Use join with EXACT_MATCH only

* Update to Postgres 17

* Attempt to use sub-queries with CTEs

* Fix CTEs sub-queries and grouping

TODO: make EXACT_MATCH search in the split table

* group matches by entry_id

* Use NOT IN for negation queries

* Fix unary NOT operator

* Use split values table for EXACT_MATCH queries

* Prepare for linked files index (full-text)

* Prepare linked files tables

* Fix searching

* Use multi column index

* fix merged module issues

* Fix update event

* Remove postgres linked files indexer

* Remove and insert field on update event

* Remove search score column

* Update search groups matches

* Remove search_score from table preferences

* Migrate search groups flags to new syntax

* Localization

* Fix dialog message

* Ignores groups field from default searches

Fixes #7996

* Use TYPE_HEADER field for entrytype

* Search to Lucene query for linked files searching

* Merge linked files and bib fields results

* Searching in background task

* Fix search to SQL tests

* Localization test

* Fix DatabaseSearcherTest

* Fix DatabaseSearcherWithBibFilesTest

* Fix exportMatches test

* Update src/main/java/org/jabref/model/entry/BibEntry.java

Co-authored-by: Oliver Kopp

* Add SINGLE_ENTRY_LINK to latex field

* Remove changelog entries

* Remove groups migration from localization

* Extract search terms from query (ignore negated terms)

* Fix architecture test

* Highlight Preview viewer with Postgres regexp_replace

* OpenRewrite

* Remove onRunning

* Set search query listener in the constructor

* Fix preview tab scrolling

* Use prepared statement to fix escaping

* Use prepared statement for sql query

* Store the start and end positions for every field

* WIP highlight source tab

* Fix source tab highlighting

* Return regex, case-sensitive flags back to the search bar

* Use search bar flags for unfielded terms

* Skip migrations for unfielded terms

* Return regex, case-sensitive CheckBox to search groups dialog

* Apply suggestions from code review

Co-authored-by: Oliver Kopp

* Update JabRef_en.properties

* Fix search grammar to support special chars

* Create SearchQueryTest.java

* Adapt SQL visitor with new grammar

* Allow to use quotes without escaping

* escape SQL wildcard chars

* Reorder methods

* Adapt SearchFlagsToExpressionVisitor

* Adapt SearchToLuceneVisitor

* Adapt SearchQueryExtractorVisitor

* Fix tests

* Fix DatabaseSearcherTest

* Fix search terms pattern for highlighting

* Highlight source tab field by field according to the search query

* Apply suggestions from code review

* Update src/main/java/org/jabref/gui/importer/actions/SearchGroupsMigrationAction.java

Co-authored-by: Oliver Kopp

* Set default operator to AND

* remove debug

---------

Co-authored-by: Oliver Kopp
Co-authored-by: Carl Christian Snethlage
Co-authored-by: Siedlerchr
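Two of the commit-message items ("Fix search grammar to support special chars" and "Allow to use quotes without escaping") relate to how a raw value has to be written for the new `Search.g4` lexer: a bare `TERM` may not contain whitespace and needs a backslash before any of `=`, `!`, `~`, `(` and `)`, while a `STRING_LITERAL` is wrapped in double quotes and only escapes `"`. A minimal sketch of a helper that applies those two rules follows; the class `SearchValueEscaper` is hypothetical and not part of JabRef, and it deliberately ignores corner cases such as a value that ends in a backslash.

```java
/**
 * Hypothetical helper (not a JabRef class): formats a raw value so that the
 * new Search.g4 lexer should read it as one TERM or one STRING_LITERAL.
 */
final class SearchValueEscaper {

    // Characters that the TERM rule accepts only when preceded by a backslash.
    private static final String TERM_SPECIAL_CHARS = "=!~()";

    private SearchValueEscaper() {
    }

    static String toSearchValue(String raw) {
        if (raw.isEmpty() || raw.chars().anyMatch(Character::isWhitespace)) {
            // A TERM cannot be empty or contain whitespace, so use a quoted
            // STRING_LITERAL; inside it only the double quote is escaped.
            return "\"" + raw.replace("\"", "\\\"") + "\"";
        }
        StringBuilder escaped = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            if (TERM_SPECIAL_CHARS.indexOf(c) >= 0) {
                escaped.append('\\'); // escape =, !, ~, ( and ) for the TERM rule
            }
            escaped.append(c);
        }
        return escaped.toString();
    }
}
```

For example, `"title = " + SearchValueEscaper.toSearchValue("f(x)")` yields `title = f\(x\)`, while a value containing spaces such as `van der Berg` is emitted as `"van der Berg"`.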
4 people authored Oct 23, 2024
1 parent 3ad5812 commit bd7219f
Showing 111 changed files with 3,886 additions and 1,602 deletions.
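The commit message lists several decisions about how a search is translated to SQL: values are bound through a `PreparedStatement`, `LIKE` is the default instead of regex, SQL wildcard characters are escaped, and negation uses `NOT IN`. The sketch below combines those points for a single contains-comparison. It is an illustration under assumptions, not JabRef's SQL visitor: the table `bib_fields`, its columns `entry_id`, `field_name` and `field_value`, and the class `SqlPredicateSketch` are all hypothetical. It relies on PostgreSQL's `ILIKE` for case-insensitive matching and on the backslash being PostgreSQL's default `LIKE` escape character.

```java
import java.util.List;

/**
 * Hypothetical sketch: one field comparison as a parameterized PostgreSQL predicate.
 * The table bib_fields(entry_id, field_name, field_value) is invented for illustration.
 */
final class SqlPredicateSketch {

    /** SQL text with '?' placeholders plus the values to bind, in order. */
    record Predicate(String sql, List<String> parameters) {
    }

    private SqlPredicateSketch() {
    }

    /** Escapes LIKE wildcards so that user input is matched literally. */
    static String escapeLikeWildcards(String term) {
        return term.replace("\\", "\\\\") // escape the escape character first
                   .replace("%", "\\%")
                   .replace("_", "\\_");
    }

    /** Builds "field contains term", optionally case sensitive and/or negated. */
    static Predicate contains(String field, String term, boolean caseSensitive, boolean negated) {
        String likeOperator = caseSensitive ? "LIKE" : "ILIKE";
        String matchingEntries = "SELECT entry_id FROM bib_fields WHERE field_name = ? AND field_value "
                + likeOperator + " ?";
        // Negation as NOT IN over the matching ids, so entries lacking the field also qualify.
        String sql = "entry_id " + (negated ? "NOT IN" : "IN") + " (" + matchingEntries + ")";
        String pattern = "%" + escapeLikeWildcards(term) + "%";
        return new Predicate(sql, List.of(field, pattern));
    }
}
```

The returned parameters would be bound in order with `PreparedStatement.setString(i + 1, parameters.get(i))`, so user input never becomes part of the SQL text. In this sketch, an entry that has no `author` field at all satisfies the negated predicate for `author`, because it never appears in the subquery.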
4 changes: 0 additions & 4 deletions CHANGELOG.md
@@ -12,7 +12,6 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv
### Added

- We added a "view as BibTeX" option before importing an entry from the citation relation tab. [#11826](https://github.com/JabRef/jabref/issues/11826)
- We added probable search hits instead of exact matches. Sorting by hit score can be done by the new score table column. [#11542](https://github.com/JabRef/jabref/pull/11542)
- We added support finding LaTeX-encoded special characters based on plain Unicode and vice versa. [#11542](https://github.com/JabRef/jabref/pull/11542)
- When a search hits a file, the file icon of that entry is changed accordingly. [#11542](https://github.com/JabRef/jabref/pull/11542)
- We added an AI-based chat for entries with linked PDF files. [#11430](https://github.com/JabRef/jabref/pull/11430)
@@ -42,8 +41,6 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv

### Changed

- The search syntax is changed to [Apache Lucene syntax](https://lucene.apache.org/core/9_11_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Overview) (also to be similar to the [online search syntax](https://docs.jabref.org/collect/import-using-online-bibliographic-database#search-syntax)). [#11542](https://github.com/JabRef/jabref/pull/11542/)
- When searching using a regular expression, one needs to enclose the search string in `/`. [#11542](https://github.com/JabRef/jabref/pull/11542/)
- A search in "any" fields ignores the [groups](https://docs.jabref.org/finding-sorting-and-cleaning-entries/groups). [#7996](https://github.com/JabRef/jabref/issues/7996)
- When a communication error with an [online service](https://docs.jabref.org/collect/import-using-online-bibliographic-database) occurs, JabRef displays the HTTP error. [#11223](https://github.com/JabRef/jabref/issues/11223)
- The Pubmed/Medline Plain importer now imports the PMID field as well [#11488](https://github.com/JabRef/jabref/issues/11488)
@@ -103,7 +100,6 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv

### Removed

- We removed support for case-sensitive and exact search. [#11542](https://github.com/JabRef/jabref/pull/11542)
- We removed the description of search strings. [#11542](https://github.com/JabRef/jabref/pull/11542)
- We removed support for importing using the SilverPlatterImporter (`Record INSPEC`). [#11576](https://github.com/JabRef/jabref/pull/11576)
- We removed support for automatically generating file links using the CLI (`--automaticallySetFileLinks`).
8 changes: 8 additions & 0 deletions build.gradle
@@ -370,6 +370,9 @@ dependencies {
// Even if "compileOnly" is used, IntelliJ always adds to module-info.java. To avoid issues during committing, we use "implementation" instead of "compileOnly"
implementation 'io.github.adr:e-adr:2.0.0-SNAPSHOT'

implementation 'io.zonky.test:embedded-postgres:2.0.7'
implementation enforcedPlatform('io.zonky.test.postgres:embedded-postgres-binaries-bom:17.0.0')

testImplementation 'io.github.classgraph:classgraph:4.8.177'
testImplementation 'org.junit.jupiter:junit-jupiter:5.11.0'
testImplementation 'org.junit.platform:junit-platform-launcher:1.10.3'
@@ -793,9 +796,13 @@ jlink {
requires 'org.apache.commons.lang3'
requires 'org.apache.commons.logging'
requires 'org.apache.commons.text'
requires 'org.apache.commons.codec'
requires 'org.apache.commons.io'
requires 'org.apache.commons.compress'
requires 'org.freedesktop.dbus'
requires 'org.jsoup'
requires 'org.slf4j'
requires 'org.tukaani.xz';
uses 'ai.djl.engine.EngineProvider'
uses 'ai.djl.repository.RepositoryFactory'
uses 'ai.djl.repository.zoo.ZooProvider'
@@ -807,6 +814,7 @@ jlink {
uses 'org.mariadb.jdbc.authentication.AuthenticationPlugin'
uses 'org.mariadb.jdbc.credential.CredentialPlugin'
uses 'org.mariadb.jdbc.tls.TlsSocketPlugin'
uses 'org.postgresql.shaded.com.ongres.stringprep.Profile'

provides 'org.mariadb.jdbc.tls.TlsSocketPlugin' with 'org.mariadb.jdbc.internal.protocol.tls.DefaultTlsSocketPlugin'
provides 'java.sql.Driver' with 'org.postgresql.Driver'
18 changes: 14 additions & 4 deletions external-libraries.md
@@ -350,10 +350,10 @@ License: MIT
```
```yaml
Id:io.github.adr:e-adr
Project:EmbeddedArchitecturalDecisionRecords
URL:https://github.com/adr/e-adr/
License:EPL-2.0
Id: io.github.adr:e-adr
Project: EmbeddedArchitecturalDecisionRecords
URL: https://github.com/adr/e-adr/
License: EPL-2.0
```
```yaml
@@ -363,6 +363,13 @@ URL: https://github.com/java-diff-utils/java-diff-utils
License: Apache-2.0
```
```yaml
Id: io.zonky.test:embedded-postgres
Project: embedded-postgres
URL: https://github.com/zonkyio/embedded-postgres
License: Apache-2.0
```
```yaml
Id: jakarta.annotation:jakarata.annotation-api
Project: Jakarta Annotations
@@ -798,6 +805,7 @@ de.undercouch:citeproc-java:3.1.0
eu.lestard:doc-annotations:0.2
info.debatty:java-string-similarity:2.0.0
io.github.java-diff-utils:java-diff-utils:4.12
io.zonky.test:embedded-postgres:2.0.7
jakarta.activation:jakarta.activation-api:2.1.3
jakarta.annotation:jakarta.annotation-api:2.1.1
jakarta.inject:jakarta.inject-api:2.0.1
@@ -813,6 +821,7 @@ net.jodah:typetools:0.6.1
net.synedra:validatorfx:0.5.0
one.jpro.jproutils:tree-showing:0.2.2
org.antlr:antlr4-runtime:4.13.2
org.apache.commons:commons-compress:1.27.1
org.apache.commons:commons-csv:1.11.0
org.apache.commons:commons-lang3:3.17.0
org.apache.commons:commons-text:1.12.0
@@ -889,6 +898,7 @@ org.slf4j:slf4j-api:2.0.16
org.tinylog:slf4j-tinylog:2.7.0
org.tinylog:tinylog-api:2.7.0
org.tinylog:tinylog-impl:2.7.0
org.tukaani:xz:1.9
org.yaml:snakeyaml:2.3
pt.davidafsilva.apple:jkeychain:1.1.0
tech.units:indriya:2.2
97 changes: 65 additions & 32 deletions src/main/antlr4/org/jabref/search/Search.g4
@@ -4,49 +4,82 @@
* These search expressions are used for searching the bibtex library. They are heavily used for search groups.
*/
grammar Search;
options { caseInsensitive = true; }

WS: [ \t] -> skip; // whitespace is ignored/skipped
WS: [ \t\n\r]+ -> skip; // whitespace is ignored/skipped

LPAREN:'(';
RPAREN:')';
LPAREN: '(';
RPAREN: ')';

EQUAL:'='; // semantically the same as CONTAINS
EEQUAL:'=='; // semantically the same as MATCHES
NEQUAL:'!=';
EQUAL: '='; // case insensitive contains, semantically the same as CONTAINS
CEQUAL: '=!'; // case sensitive contains

AND:[aA][nN][dD]; // 'and' case insensitive
OR:[oO][rR]; // 'or' case insensitive
CONTAINS:[cC][oO][nN][tT][aA][iI][nN][sS]; // 'contains' case insensitive
MATCHES:[mM][aA][tT][cC][hH][eE][sS]; // 'matches' case insensitive
NOT:[nN][oO][tT]; // 'not' case insensitive
EEQUAL: '=='; // exact match case insensitive, semantically the same as MATCHES
CEEQUAL: '==!'; // exact match case sensitive

STRING:QUOTE (~'"')* QUOTE;
QUOTE:'"';
REQUAL: '=~'; // regex check case insensitive
CREEQUAL: '=~!'; // regex check case sensitive

FIELDTYPE:LETTER+;
// fragments are not accessible from the code, they are only for describing the grammar better
fragment LETTER : ~[ \t"()=!];
NEQUAL: '!='; // negated case insensitive contains
NCEQUAL: '!=!'; // negated case sensitive contains

NEEQUAL: '!=='; // negated case insensitive exact match
NCEEQUAL: '!==!'; // negated case sensitive exact match

start:
expression EOF;
NREQUAL: '!=~'; // negated regex check case insensitive
NCREEQUAL: '!=~!'; // negated regex check case sensitive

// labels are used to refer to parts of the rules in the generated code later on
// label=actualThingy
expression:
LPAREN expression RPAREN #parenExpression // example: (author=miller)
| NOT expression #unaryExpression // example: not author = miller
| left=expression operator=AND right=expression #binaryExpression // example: author = miller and title = test
| left=expression operator=OR right=expression #binaryExpression // example: author = miller or title = test
| comparison #atomExpression
AND: 'AND';
OR: 'OR';
CONTAINS: 'CONTAINS';
MATCHES: 'MATCHES';
NOT: 'NOT';

FIELD: [A-Z]+;
STRING_LITERAL: '"' ('\\"' | ~["])* '"'; // " should be escaped with a backslash
TERM: ('\\' [=!~()] | ~[ \t\n\r=!~()])+; // =!~() should be escaped with a backslash
start
: EOF
| andExpression EOF
;
andExpression
: expression+ #implicitAndExpression // example: author = miller year = 2010 --> equivalent to: author = miller AND year = 2010
;
expression
: LPAREN andExpression RPAREN #parenExpression // example: (author = miller)
| NOT expression #negatedExpression // example: NOT author = miller
| left = expression bin_op = AND right = expression #binaryExpression // example: author = miller AND year = 2010
| left = expression bin_op = OR right = expression #binaryExpression // example: author = miller OR year = 2010
| comparison #comparisonExpression // example: miller OR author = miller
;
comparison
: FIELD operator searchValue // example: author = miller
| searchValue // example: miller
;
comparison:
left=name operator=(CONTAINS | MATCHES | EQUAL | EEQUAL | NEQUAL) right=name // example: author != miller
| right=name // example: miller (search all fields)
operator
: EQUAL
| CEQUAL
| EEQUAL
| CEEQUAL
| REQUAL
| CREEQUAL
| NEQUAL
| NCEQUAL
| NEEQUAL
| NCEEQUAL
| NREQUAL
| NCREEQUAL
| CONTAINS
| MATCHES
;
name:
STRING // example: "miller"
| FIELDTYPE // example: author
searchValue
: STRING_LITERAL
| FIELD
| TERM
;
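The operator tokens of the new grammar follow a visible pattern: a leading `!` negates, a trailing `!` makes the comparison case sensitive, and the core `=`, `==` or `=~` selects contains, exact match or regex. `CONTAINS` and `MATCHES` are keyword spellings of the case-insensitive `=` and `==`. Below is a minimal sketch that decodes an operator string into those three properties, assuming the semantics stated in the grammar comments; `SearchOperators`, `MatchKind` and `OperatorFlags` are hypothetical names, not JabRef classes.

```java
import java.util.Locale;

/** Hypothetical decoder for the comparison operators declared in Search.g4. */
final class SearchOperators {

    enum MatchKind { CONTAINS, EXACT, REGEX }

    record OperatorFlags(MatchKind kind, boolean caseSensitive, boolean negated) {
    }

    private SearchOperators() {
    }

    static OperatorFlags decode(String operator) {
        // Keyword forms; the grammar sets caseInsensitive = true, so "contains" is accepted too.
        String keyword = operator.toUpperCase(Locale.ROOT);
        if (keyword.equals("CONTAINS")) {
            return new OperatorFlags(MatchKind.CONTAINS, false, false);
        }
        if (keyword.equals("MATCHES")) {
            return new OperatorFlags(MatchKind.EXACT, false, false);
        }

        // Symbolic forms: an optional leading '!' means negation ...
        boolean negated = operator.startsWith("!");
        String rest = negated ? operator.substring(1) : operator;

        // ... and an optional trailing '!' after the core operator means case sensitive.
        boolean caseSensitive = rest.length() > 1 && rest.endsWith("!");
        String core = caseSensitive ? rest.substring(0, rest.length() - 1) : rest;

        MatchKind kind = switch (core) {
            case "=" -> MatchKind.CONTAINS;
            case "==" -> MatchKind.EXACT;
            case "=~" -> MatchKind.REGEX;
            default -> throw new IllegalArgumentException("Unknown operator: " + operator);
        };
        return new OperatorFlags(kind, caseSensitive, negated);
    }
}
```

For instance, `decode("!==!")` yields a negated, case-sensitive exact match, which corresponds to the `NCEEQUAL` token. Keeping the three properties independent means a downstream visitor can choose between `LIKE`/`ILIKE` or `~`/`~*` from one flag instead of handling twelve token types separately.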
7 changes: 5 additions & 2 deletions src/main/java/module-info.java
@@ -97,6 +97,8 @@
// endregion

// region: SQL databases
requires embedded.postgres;
requires org.tukaani.xz;
requires ojdbc10;
requires org.postgresql.jdbc;
requires org.mariadb.jdbc;
@@ -108,6 +110,7 @@
requires io.github.javadiffutils;
requires java.string.similarity;
requires org.apache.commons.cli;
requires org.apache.commons.compress;
requires org.apache.commons.csv;
requires org.apache.commons.io;
requires org.apache.commons.lang3;
@@ -160,8 +163,8 @@
// endregion

// region: Lucene
/**
* In case the version is updated, please also increment {@link org.jabref.model.search.SearchFieldConstants#VERSION} to trigger reindexing.
/*
* In case the version is updated, please also increment {@link org.jabref.model.search.LinkedFilesConstants.VERSION} to trigger reindexing.
*/
uses org.apache.lucene.codecs.lucene100.Lucene100Codec;
requires org.apache.lucene.analysis.common;
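The comment in `module-info.java` asks to increment `LinkedFilesConstants.VERSION` whenever the Lucene version changes, so that existing linked-file indexes get rebuilt. One common way such a check can work is to persist the version next to the index and compare it at startup. The sketch assumes exactly that; the marker file `index.version` and the class `IndexVersionCheck` are hypothetical, and JabRef may store or compare the version differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical startup check: rebuild the index when its stored version is outdated. */
final class IndexVersionCheck {

    // Hypothetical marker file written next to the index after a successful (re)build.
    private static final String VERSION_FILE = "index.version";

    private IndexVersionCheck() {
    }

    /** True if no version was recorded or the recorded one differs from the current constant. */
    static boolean needsReindex(Path indexDir, String currentVersion) throws IOException {
        Path versionFile = indexDir.resolve(VERSION_FILE);
        if (!Files.exists(versionFile)) {
            return true;
        }
        String stored = Files.readString(versionFile, StandardCharsets.UTF_8).trim();
        return !stored.equals(currentVersion);
    }

    /** Records the version once indexing has finished. */
    static void markIndexed(Path indexDir, String currentVersion) throws IOException {
        Files.createDirectories(indexDir);
        Files.writeString(indexDir.resolve(VERSION_FILE), currentVersion, StandardCharsets.UTF_8);
    }
}
```

Writing the marker only after the rebuild completes means an interrupted rebuild is simply retried on the next start.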
4 changes: 4 additions & 0 deletions src/main/java/org/jabref/Launcher.java
@@ -9,6 +9,7 @@
import org.jabref.gui.util.DefaultFileUpdateMonitor;
import org.jabref.logic.UiCommand;
import org.jabref.logic.preferences.CliPreferences;
import org.jabref.logic.search.PostgreServer;
import org.jabref.logic.util.HeadlessExecutorService;
import org.jabref.migrations.PreferencesMigrations;

@@ -38,6 +39,9 @@ public static void main(String[] args) {

PreferencesMigrations.runMigrations(preferences);

PostgreServer postgreServer = new PostgreServer();
Injector.setModelOrService(PostgreServer.class, postgreServer);

JabRefGUI.setup(uiCommands, preferences, fileUpdateMonitor);
JabRefGUI.launch(JabRefGUI.class, args);
}
6 changes: 3 additions & 3 deletions src/main/java/org/jabref/cli/ArgumentProcessor.java
@@ -53,7 +53,7 @@
import org.jabref.model.database.BibDatabaseMode;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.BibEntryTypesManager;
import org.jabref.model.search.SearchQuery;
import org.jabref.model.search.query.SearchQuery;
import org.jabref.model.strings.StringUtil;
import org.jabref.model.util.DummyFileUpdateMonitor;
import org.jabref.model.util.FileUpdateMonitor;
@@ -458,8 +458,8 @@ private boolean exportMatches(List<ParserResult> loaded) {

List<BibEntry> matches;
try {
// extract current thread task executor from luceneManager
matches = new DatabaseSearcher(query, databaseContext, new CurrentThreadTaskExecutor(), cliPreferences.getFilePreferences()).getMatches();
// extract current thread task executor from indexManager
matches = new DatabaseSearcher(query, databaseContext, new CurrentThreadTaskExecutor(), cliPreferences).getMatches();
} catch (IOException e) {
LOGGER.error("Error occurred when searching", e);
return false;
4 changes: 4 additions & 0 deletions src/main/java/org/jabref/gui/JabRefGUI.java
@@ -32,6 +32,7 @@
import org.jabref.logic.net.ProxyRegisterer;
import org.jabref.logic.remote.RemotePreferences;
import org.jabref.logic.remote.server.RemoteListenerServerManager;
import org.jabref.logic.search.PostgreServer;
import org.jabref.logic.util.BuildInfo;
import org.jabref.logic.util.FallbackExceptionHandler;
import org.jabref.logic.util.HeadlessExecutorService;
@@ -395,6 +396,9 @@ public static void shutdownThreadPools() {
LOGGER.trace("Shutting down directoryMonitor");
DirectoryMonitor directoryMonitor = Injector.instantiateModelOrService(DirectoryMonitor.class);
directoryMonitor.shutdown();
LOGGER.trace("Shutting down postgreServer");
PostgreServer postgreServer = Injector.instantiateModelOrService(PostgreServer.class);
postgreServer.shutdown();
LOGGER.trace("Shutting down HeadlessExecutorService");
HeadlessExecutorService.INSTANCE.shutdownEverything();
LOGGER.trace("Finished shutdownThreadPools");
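Together, the `Launcher` and `JabRefGUI` hunks give the embedded server a simple lifecycle: one `PostgreServer` is created at startup and registered with the `Injector`, and `shutdownThreadPools()` looks the same instance up again and calls `shutdown()`. The pattern can be illustrated without the real injector or database. In this sketch, `ServiceRegistry` is a hypothetical stand-in for `Injector.setModelOrService`/`Injector.instantiateModelOrService`, and `FakeEmbeddedServer` is a hypothetical stand-in for `PostgreServer`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the injector: one shared instance per type. */
final class ServiceRegistry {

    private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

    <T> void register(Class<T> type, T instance) {
        services.put(type, instance);
    }

    <T> T lookup(Class<T> type) {
        Object service = services.get(type);
        if (service == null) {
            throw new IllegalStateException("No service registered for " + type.getName());
        }
        return type.cast(service);
    }
}

/** Hypothetical stand-in for PostgreServer: running after construction, stopped by shutdown(). */
final class FakeEmbeddedServer {

    private volatile boolean running = true;

    void shutdown() {
        // A real embedded server would stop the database here.
        running = false;
    }

    boolean isRunning() {
        return running;
    }
}

final class LifecycleDemo {

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();

        // Startup (cf. Launcher.main): create once and register.
        registry.register(FakeEmbeddedServer.class, new FakeEmbeddedServer());

        // Exit (cf. JabRefGUI.shutdownThreadPools): look up the same instance and stop it.
        FakeEmbeddedServer server = registry.lookup(FakeEmbeddedServer.class);
        server.shutdown();
        System.out.println("running after shutdown: " + server.isRunning());
    }
}
```

Because startup and shutdown share one registered instance, the code that stops the server does not need a reference passed through from `main`.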