
This repository stores simple scripts to explore scientific publication PDFs using the ChatGPT API. It was created with the simple intention of sharing useful code to look across the literature. THIS CODE IS EXPERIMENTAL. We share it so that others can try it; if it proves useful, please let us know.

Running

  1. At the moment, there is one simple script. To run it, first create a conda environment:

conda create --name <your_env_name> python=3.10

  2. Then activate it:

conda activate <your_env_name>

  3. Install the package:

pip install .

  4. Go to the scripts folder:

cd scripts

  5. Copy your OpenAI API key into the .env file. You can find your key here: https://platform.openai.com/account/api-keys

  6. Run the script:

python pdf_summary.py --path_pdf <path_to_your_pdf> --save_summary True

This saves a small text file next to your PDF, with the same filename but a .txt extension.
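To make the naming rule concrete, here is a minimal sketch of how the summary path could be derived from the PDF path. The helper name `summary_path_for` is hypothetical and is not taken from pdf_summary.py; the actual script may build the path differently.

```python
from pathlib import Path


def summary_path_for(pdf_path: str) -> Path:
    """Return the path of the .txt file that sits next to the given PDF.

    Hypothetical helper for illustration only: it swaps the final
    extension (.pdf) for .txt and keeps the folder and base name.
    """
    return Path(pdf_path).with_suffix(".txt")


# Only the last suffix is replaced, so dots inside the name are preserved.
print(summary_path_for("2020.12.15.422967v4.full.pdf").name)
```

Because only the final suffix is replaced, a file name that contains several dots keeps everything before the .pdf extension.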

Parameters

--path_pdf: Path to a PDF file that you want to summarize.

Type: string

Default: os.path.join(script_path, '../example/2020.12.15.422967v4.full.pdf')

--save_summary: Save the generated summary in a txt file alongside the PDF file.

Type: boolean

Default: True

--save_raw_text: Save the raw extracted text in a txt file alongside the PDF file.

Type: boolean

Default: False

--cut_bibliography: Try not to summarize the bibliography at the end of the PDF file.

Type: boolean

Default: True

--chunk_length: Determines the final length of the summary by summarizing the document in chunks. More chunks result in a longer summary but may lead to inconsistency across sections. Typically, 1 is a good value for an abstract, and 2 or 3 for more detailed summaries.

Type: integer

Default: 1
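The parameters above could be declared with argparse roughly as follows. This is a sketch under assumptions, not the script's actual parser: `str_to_bool` and `build_parser` are hypothetical names, and the explicit boolean conversion is assumed because the run command passes the literal value True on the command line. The real default for --path_pdf points at the example PDF; None is used here as a placeholder.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert command-line text such as 'True' or 'false' to a bool.

    Hypothetical helper: plain type=bool would treat any non-empty
    string, including 'False', as True.
    """
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented parameters and defaults."""
    parser = argparse.ArgumentParser(description="Summarize a PDF (illustrative sketch).")
    # Placeholder default; the documented default is the example PDF.
    parser.add_argument("--path_pdf", type=str, default=None)
    parser.add_argument("--save_summary", type=str_to_bool, default=True)
    parser.add_argument("--save_raw_text", type=str_to_bool, default=False)
    parser.add_argument("--cut_bibliography", type=str_to_bool, default=True)
    parser.add_argument("--chunk_length", type=int, default=1)
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--path_pdf", "paper.pdf", "--save_raw_text", "True", "--chunk_length", "2"]
)
print(args)
```

Parameters that are not passed keep the defaults listed above, so this call leaves save_summary and cut_bibliography set to True.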

Example

See the example/ folder for example runs. It contains a short summary (about the length of a typical abstract) and a long summary (using chunk_length 4).
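The following sketch illustrates why a larger chunk_length yields a longer summary. It assumes that chunk_length is the number of pieces the document is split into and that each piece is summarized separately; the actual script may chunk differently. All function names are hypothetical, and `first_words` is a stand-in for the call to the ChatGPT API.

```python
from typing import Callable, List


def split_into_chunks(text: str, n_chunks: int) -> List[str]:
    """Split text into n_chunks pieces of roughly equal word count."""
    words = text.split()
    # Never ask for more chunks than there are words, and always at least one.
    n_chunks = max(1, min(n_chunks, len(words) or 1))
    base, extra = divmod(len(words), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional word each.
        end = start + base + (1 if i < extra else 0)
        chunks.append(" ".join(words[start:end]))
        start = end
    return chunks


def summarize_in_chunks(text: str, n_chunks: int, summarize: Callable[[str], str]) -> str:
    """Summarize each chunk on its own and join the partial summaries."""
    return "\n\n".join(summarize(chunk) for chunk in split_into_chunks(text, n_chunks))


def first_words(chunk: str, limit: int = 5) -> str:
    """Placeholder summarizer: keep the first few words of a chunk."""
    return " ".join(chunk.split()[:limit])


document = "one two three four five six seven eight nine ten eleven twelve"
print(summarize_in_chunks(document, 1, first_words))
print(summarize_in_chunks(document, 3, first_words))
```

With one chunk the output is a single partial summary; with three chunks it is three partial summaries joined together, which is longer but stitched from independently summarized sections.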

How to contribute?

  1. First, go to tests/ and read the README.md there.

  2. Open a PR against main. This triggers CI through GitHub Actions.

Credits

This repository was started by Jerome Lecoq on April 12th 2023. Please reach out to [email protected] with any questions. If this is useful to you, 👋 are welcome!