Refactor PDF import #12310

Merged
merged 28 commits into from
Jan 25, 2025

@InAnYan InAnYan commented Dec 18, 2024

Updates:

  • Moved PDF-related importers into the pdf package.
  • Added explanatory comments about how PDF importing works.
  • Cleaned up methods: large methods are split into smaller ones, and duplicated sections are deduplicated (the common code is extracted into a method).
  • When the user imports a single PDF file, a merge dialog appears.
  • Methods that were duplicated across importers (such as importDatabase and recognizedFormat) are now implemented once in PdfImporter.
  • One important change: a PDDocument is passed to the importDatabase function of the specific PDF importers, as they often use this type. This was added to reduce duplicate work.
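
The shared base class and the "parse once, pass the document along" idea from the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `ParsedPdf` is a hypothetical stand-in for PDFBox's `PDDocument` (so the sketch runs without dependencies), and the class names `PdfImporterSketch`, `TitleImporter`, `PageCountImporter`, and `MergingImporter`, as well as all method signatures, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for PDFBox's PDDocument. */
record ParsedPdf(String title, int pageCount) {
    static int loadCount = 0; // counts how often a file is "parsed"

    static ParsedPdf load(String path) {
        loadCount++;
        return new ParsedPdf("Title of " + path, 3);
    }
}

/** Assumed base class holding the code every PDF importer would otherwise duplicate. */
abstract class PdfImporterSketch {
    /** Shared entry point: loads the document once, then delegates. */
    public final List<String> importDatabase(String path) {
        return importDatabase(path, ParsedPdf.load(path));
    }

    /** Importer-specific extraction on an already loaded document. */
    public abstract List<String> importDatabase(String path, ParsedPdf document);

    public boolean isRecognizedFormat(String path) {
        return path.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }
}

class TitleImporter extends PdfImporterSketch {
    @Override
    public List<String> importDatabase(String path, ParsedPdf document) {
        return List.of("title=" + document.title());
    }
}

class PageCountImporter extends PdfImporterSketch {
    @Override
    public List<String> importDatabase(String path, ParsedPdf document) {
        return List.of("pages=" + document.pageCount());
    }
}

/** Runs several importers on the same loaded document and collects their results. */
class MergingImporter extends PdfImporterSketch {
    private final List<PdfImporterSketch> delegates;

    MergingImporter(List<PdfImporterSketch> delegates) {
        this.delegates = delegates;
    }

    @Override
    public List<String> importDatabase(String path, ParsedPdf document) {
        List<String> fields = new ArrayList<>();
        for (PdfImporterSketch delegate : delegates) {
            // Reuse the already loaded document instead of loading it again.
            fields.addAll(delegate.importDatabase(path, document));
        }
        return fields;
    }
}

class PdfImportDemo {
    public static void main(String[] args) {
        PdfImporterSketch importer =
                new MergingImporter(List.of(new TitleImporter(), new PageCountImporter()));
        System.out.println(importer.importDatabase("paper.pdf"));
        System.out.println("documents loaded: " + ParsedPdf.loadCount);
    }
}
```

In this sketch the file is loaded a single time no matter how many importers take part in the merge, which is the kind of duplicate work the last bullet refers to.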

I think people drag and drop PDF files more often than they click "Extract metadata from PDF", so I think this new merge dialog is useful (it's not new, it's just opened in a new place 😄).

Mandatory checks

  • I own the copyright of the code submitted and I licence it under the MIT license
    - [ ] Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
    - [ ] Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

InAnYan commented Dec 18, 2024

WIP

Currently tests don't work

@github-actions github-actions bot left a comment

Your code currently does not meet JabRef's code guidelines.
We use Checkstyle to identify issues.
Please carefully follow the setup guide for the codestyle.
Afterwards, please run checkstyle locally and fix the issues.

In case of issues with the import order, double check that you activated Auto Import.
You can trigger fixing imports by pressing Ctrl+Alt+O to trigger Optimize Imports.

@github-actions github-actions bot left a comment

JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.

You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.

InAnYan commented Dec 23, 2024

A couple of questions:

  1. Should importers override getId()? In the Importer class, getId() is constructed from getName().
  2. Should getName() of importers be localized?

I'm asking because there are small inconsistencies among the PDF importers.

@InAnYan InAnYan marked this pull request as ready for review December 23, 2024 13:44

@Siedlerchr
Member

The ID is for command-line usage and is overridden in all typical importers, e.g., MedlinePlainImporter.
The name is the name of the importer and is often the file format or the source.
However, as it is the name of the importer format (e.g., Medline or Biblioscape), which is a proper noun (German: Eigenname), it does not make sense to translate it.

koppor commented Dec 23, 2024

> The ID is for command-line usage and is overridden in all typical importers, e.g., MedlinePlainImporter. The name is the name of the importer and is often the file format or the source. However, as it is the name of the importer format (e.g., Medline or Biblioscape), which is a proper noun (German: Eigenname), it does not make sense to translate it.

@InAnYan Please add this (slightly updated) as JavaDoc!
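
A minimal sketch of how that explanation might read as JavaDoc. The class `ImporterSketch` is a simplified, hypothetical stand-in for JabRef's `Importer`, not its real code; the way the default ID is derived from the name (lowercasing and dropping non-alphanumeric characters) is an assumption for illustration, as are the two example importers and their return values.

```java
import java.util.Locale;

abstract class ImporterSketch {
    /**
     * Returns the name of this importer, which is often the file format or
     * the source (for example "Medline"). The name is a proper noun and is
     * therefore not localized.
     */
    public abstract String getName();

    /**
     * Returns the ID used to select this importer on the command line.
     * By default it is derived from {@link #getName()}; typical importers
     * override it with a fixed value.
     */
    public String getId() {
        // Assumed derivation: lowercase, keep only letters and digits.
        return getName().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }
}

/** Example importer that overrides the ID explicitly (values are illustrative). */
class MedlinePlainSketch extends ImporterSketch {
    @Override
    public String getName() {
        return "Medline/PubMed Plain";
    }

    @Override
    public String getId() {
        return "medlineplain";
    }
}

/** Example importer that relies on the default ID derived from the name. */
class BiblioscapeSketch extends ImporterSketch {
    @Override
    public String getName() {
        return "Biblioscape";
    }
}

class ImporterIdDemo {
    public static void main(String[] args) {
        System.out.println(new MedlinePlainSketch().getId());
        System.out.println(new BiblioscapeSketch().getId());
    }
}
```

Overriding `getId()` keeps the command-line identifier stable even if the display name later changes, which may be one reason typical importers do so.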

# Conflicts:
#	src/main/java/org/jabref/logic/importer/fileformat/pdf/PdfContentImporter.java

@github-actions github-actions bot left a comment

You modified Markdown (*.md) files and did not meet JabRef's rules for consistently formatted Markdown files.
To ensure consistent styling, we have markdown-lint in place.
Markdown lint's rules help to keep our Markdown files consistent within this repository and consistent with the Markdown files outside here.

You can check the detailed error output by navigating to your pull request, selecting the tab "Checks", section "Tests" (on the left), subsection "Markdown".

koppor commented Dec 23, 2024

Another nice showcase for RefactoringMiner:

image

# Conflicts:
#	build.gradle

koppor commented Jan 20, 2025

DevCall decision: Fix conflicts and then this is good to go! 🎉

# Conflicts:
#	build.gradle
#	src/main/java/org/jabref/logic/importer/fileformat/PdfMergeMetadataImporter.java
#	src/main/java/org/jabref/logic/importer/fileformat/pdf/PdfContentImporter.java

@github-actions github-actions bot left a comment

Your code currently does not meet JabRef's code guidelines.
We use OpenRewrite to ensure "modern" Java coding practices.
The issues found can be automatically fixed.
Please execute the gradle task rewriteRun, check the results, commit, and push.

You can check the detailed error output by navigating to your pull request, selecting the tab "Checks", section "Tests" (on the left), subsection "OpenRewrite".

@Siedlerchr Siedlerchr added this pull request to the merge queue Jan 25, 2025
Merged via the queue into JabRef:main with commit df438b8 Jan 25, 2025
25 checks passed
@Siedlerchr Siedlerchr deleted the refactor/pdf-import branch January 25, 2025 12:54