* Mega changes
* Make PdfBibExtractor be Importer
* Add merge dialog to import
* Renaming
* Revert to old name
* Rename to old names
* Revert to old code
* Fix style
* Add documentation
* Fix ids
* Update checkstyle to 10.21.0
# Conflicts:
# build.gradle
* Rename method (and add JavaDoc)
* Use Java23 _
* Add Java comment
* Fix name
* Refine adr-template.md
* Fix ADR0043
* Refine comment
* Point to ADR-0043
* Revert to stream-based forEach (enabled by adding "final")
* Fix ADR
* Update a bit ADR
* Fix tests
* Fix tests
* Fix checkers

---------

Co-authored-by: Oliver Kopp <[email protected]>
Showing 49 changed files with 616 additions and 597 deletions.
64 changes: 64 additions & 0 deletions
docs/decisions/0043-show-merge-dialog-when-importing-a-single-pdf.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
--- | ||
nav_order: 43 | ||
parent: Decision Records | ||
--- | ||
# Show merge dialog when importing a single PDF | ||
|
||
## Context and Problem Statement | ||
|
||
PDF is one of the main formats for exchanging documents, especially scientific papers. However, a PDF by itself is | ||
much like a picture: it contains commands for displaying human-readable text, but it might not contain | ||
computer-readable metadata. | ||
|
||
To work around this, various heuristics and AI models are used to "convert" a PDF into a BibTeX entry. However, this | ||
introduces problems of its own, because heuristics are imperfect: for some files they work perfectly, for others they | ||
generate essentially random output. | ||
|
||
PDF importing in JabRef is done via the abstract class `PdfImporter` and its descendants, and via `PdfMergeMetadataImporter`. | ||
A `PdfImporter` typically implements a single heuristic or method for extracting a `BibEntry` from a PDF. `PdfMergeMetadataImporter` | ||
collects `BibEntry` candidates from all `PdfImporter`s and merges them automatically into a single `BibEntry`. | ||
|
||
The specific question for JabRef: should it apply all heuristics automatically (that is, automatically merge the `BibEntry` candidates from | ||
several `PdfImporter`s) when importing PDF files, or should users review every file themselves? | ||
|
||
## Decision Drivers | ||
|
||
* The option should provide good-enough quality. | ||
* Power users want fine-grained control over PDF importing. | ||
|
||
## Considered Options | ||
|
||
* Automatically merge all `BibEntry` candidates from `PdfImporters`. | ||
* Open a merge dialog with all candidates. | ||
* Open a merge dialog with all candidates if a single PDF is imported. | ||
|
||
## Decision Outcome | ||
|
||
Chosen option: "Open a merge dialog with all candidates if a single PDF is imported", because it comes out best (see below). | ||
|
||
## Pros and Cons of the Options | ||
|
||
### Automatically merge all `BibEntry` candidates from `PdfImporters` | ||
|
||
* Good, because it requires minimal user interaction and does not disrupt the workflow. It also allows batch processing. | ||
* Bad, because heuristics are not ideal, and it is even harder to develop a "smarter" merging algorithm. | ||
|
||
### Open a merge dialog with all candidates | ||
|
||
* Good, because it allows for fine-grained import. With automatic merging, a correct field may be overridden by a wrong field from another importer, | ||
which is undesirable for power users. | ||
* Bad, because it interrupts the user with a dialog. If many PDFs are imported, there will be many dialogs, which might be | ||
too daunting to process manually. | ||
|
||
### Open a merge dialog with all candidates if a single PDF is imported | ||
|
||
Explanation: | ||
|
||
- If a single PDF is imported, then open a merge dialog. | ||
- If several PDFs are imported, merge candidates for each PDF automatically. | ||
|
||
Outcomes: | ||
|
||
* Good, because it combines the best of the other two options: it allows both PDF batch processing and fine-grained control. | ||
|
||
<!-- markdownlint-disable-file MD004 --> |
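
The rule chosen in the ADR above (a merge dialog for a single PDF, automatic merging for several) can be sketched as a small decision function. This is only an illustration under stated assumptions: `ImportModeSketch`, `ImportMode`, and `chooseMode` are hypothetical names and are not taken from the JabRef code base, and the handling of an empty selection is an assumption the ADR does not cover.

```java
import java.nio.file.Path;
import java.util.List;

public class ImportModeSketch {

    /** Hypothetical enum naming the outcomes of the ADR's rule; not a JabRef type. */
    enum ImportMode { NOTHING_TO_IMPORT, SHOW_MERGE_DIALOG, MERGE_AUTOMATICALLY }

    /** Picks the import mode from the number of PDF files selected by the user. */
    static ImportMode chooseMode(List<Path> pdfFiles) {
        if (pdfFiles.isEmpty()) {
            // Assumption: an empty selection is a no-op.
            return ImportMode.NOTHING_TO_IMPORT;
        }
        if (pdfFiles.size() == 1) {
            // A single PDF: let the user pick field values in a merge dialog.
            return ImportMode.SHOW_MERGE_DIALOG;
        }
        // Several PDFs: merge the candidates of each PDF without user interaction.
        return ImportMode.MERGE_AUTOMATICALLY;
    }

    public static void main(String[] args) {
        System.out.println(chooseMode(List.of(Path.of("paper.pdf"))));
        System.out.println(chooseMode(List.of(Path.of("a.pdf"), Path.of("b.pdf"))));
    }
}
```

Keeping the decision in one pure function makes the batch and single-file paths easy to test separately from any dialog code.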
68 changes: 68 additions & 0 deletions
src/main/java/org/jabref/gui/externalfiles/PdfMergeDialog.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
package org.jabref.gui.externalfiles; | ||
|
||
import java.io.IOException; | ||
import java.nio.file.Path; | ||
import java.util.function.Supplier; | ||
|
||
import org.jabref.gui.mergeentries.MultiMergeEntriesView; | ||
import org.jabref.gui.preferences.GuiPreferences; | ||
import org.jabref.logic.importer.Importer; | ||
import org.jabref.logic.importer.ParserResult; | ||
import org.jabref.logic.importer.fileformat.pdf.PdfContentImporter; | ||
import org.jabref.logic.importer.fileformat.pdf.PdfEmbeddedBibFileImporter; | ||
import org.jabref.logic.importer.fileformat.pdf.PdfGrobidImporter; | ||
import org.jabref.logic.importer.fileformat.pdf.PdfImporter; | ||
import org.jabref.logic.importer.fileformat.pdf.PdfVerbatimBibtexImporter; | ||
import org.jabref.logic.importer.fileformat.pdf.PdfXmpImporter; | ||
import org.jabref.logic.l10n.Localization; | ||
import org.jabref.logic.util.TaskExecutor; | ||
import org.jabref.model.entry.BibEntry; | ||
|
||
public class PdfMergeDialog { | ||
|
||
/** | ||
* Constructs a merge dialog for a PDF file. This dialog calls various {@link PdfImporter}s, collects the results, and lets the user choose between them. | ||
* <p> | ||
* {@link PdfImporter}s try to extract a {@link BibEntry} out of a PDF file, | ||
* but they are not 100% reliable: each is only a set of heuristics that might work in some cases and fail in others. | ||
* Thus, JabRef provides this merge dialog that collects the results of all {@link PdfImporter}s | ||
* and gives the user a choice between field values. | ||
* | ||
* @param entry the entry to merge with | ||
* @param filePath the path to the PDF file. This PDF is used as the source for the {@link PdfImporter}s. | ||
* @param preferences the preferences to use. The full preferences object is required because of the current implementation of {@link MultiMergeEntriesView}. | ||
* @param taskExecutor the task executor to use when the multi merge dialog executes the importers. | ||
*/ | ||
public static MultiMergeEntriesView createMergeDialog(BibEntry entry, Path filePath, GuiPreferences preferences, TaskExecutor taskExecutor) { | ||
MultiMergeEntriesView dialog = new MultiMergeEntriesView(preferences, taskExecutor); | ||
|
||
dialog.setTitle(Localization.lang("Merge PDF metadata")); | ||
|
||
dialog.addSource(Localization.lang("Entry"), entry); | ||
dialog.addSource(Localization.lang("Verbatim"), wrapImporterToSupplier(new PdfVerbatimBibtexImporter(preferences.getImportFormatPreferences()), filePath)); | ||
dialog.addSource(Localization.lang("Embedded"), wrapImporterToSupplier(new PdfEmbeddedBibFileImporter(preferences.getImportFormatPreferences()), filePath)); | ||
|
||
if (preferences.getGrobidPreferences().isGrobidEnabled()) { | ||
dialog.addSource("Grobid", wrapImporterToSupplier(new PdfGrobidImporter(preferences.getImportFormatPreferences()), filePath)); | ||
} | ||
|
||
dialog.addSource(Localization.lang("XMP metadata"), wrapImporterToSupplier(new PdfXmpImporter(preferences.getXmpPreferences()), filePath)); | ||
dialog.addSource(Localization.lang("Content"), wrapImporterToSupplier(new PdfContentImporter(), filePath)); | ||
|
||
return dialog; | ||
} | ||
|
||
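// Wraps an importer into a lazy supplier that yields the first imported entry, or null if the import fails or finds no entry. | ||
private static Supplier<BibEntry> wrapImporterToSupplier(Importer importer, Path filePath) { | ||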
return () -> { | ||
try { | ||
ParserResult parserResult = importer.importDatabase(filePath); | ||
if (parserResult.isInvalid() || parserResult.isEmpty() || !parserResult.getDatabase().hasEntries()) { | ||
return null; | ||
} | ||
return parserResult.getDatabase().getEntries().getFirst(); | ||
} catch (IOException e) { | ||
return null; | ||
} | ||
}; | ||
} | ||
} |
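
The wrapping pattern used by `wrapImporterToSupplier` above, which defers an extraction until the dialog asks for it and maps failures to "no result", can be tried in isolation with the following sketch. It is a simplified stand-in, not JabRef code: `SupplierWrapSketch`, `Extractor`, and `wrap` are hypothetical names, entries are plain strings, and `Optional.empty()` is used here in place of the `null` that the original method returns.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

public class SupplierWrapSketch {

    /** Hypothetical stand-in for an importer: returns the entries found in a file. */
    interface Extractor {
        List<String> extract(Path file) throws IOException;
    }

    /**
     * Defers the extraction until get() is called.
     * An I/O failure or an empty result both become Optional.empty().
     */
    static Supplier<Optional<String>> wrap(Extractor extractor, Path file) {
        return () -> {
            try {
                // Only the first extracted entry is used, as in the original method.
                return extractor.extract(file).stream().findFirst();
            } catch (IOException e) {
                return Optional.empty();
            }
        };
    }

    public static void main(String[] args) {
        Path pdf = Path.of("paper.pdf");
        Supplier<Optional<String>> working = wrap(file -> List.of("entry from " + file.getFileName()), pdf);
        Supplier<Optional<String>> failing = wrap(file -> {
            throw new IOException("unreadable");
        }, pdf);
        System.out.println(working.get());
        System.out.println(failing.get());
    }
}
```

Because nothing runs until `get()` is called, a consumer such as a dialog can schedule each source on a background executor and show results as they arrive.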