Chi_Scraper

NOTE: This program is in alpha. Key quality-of-life improvements are coming soon. Keep an eye on the GitHub repository or on my blog for new releases. For bugs, raise a GitHub issue or shoot me an email.

This application searches for academic articles in the arXiv repository, ranks them by relevance, and displays the results in a web browser. It uses LLMs, running either locally or in the cloud, to enhance the search and ranking steps.

Key Scripts

search_arxiv.py: This script pulls articles from arXiv using a keyword matching algorithm. It reads config.yaml to determine the search criteria and output settings.
rank_articles.py: This script assigns relevance scores to the articles retrieved by search_arxiv.py. It uses the AI model specified in the configuration to evaluate the relevance of each article.
app.py: This script displays the articles in a web browser, providing an interactive interface for users to explore the search results.
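
To make the keyword-matching step concrete, here is a minimal sketch that filters article records by whether any keyword appears in the title or abstract. It is not the project's actual implementation: the `Article` record, the `matches_keywords` and `filter_articles` helpers, and the sample data are all hypothetical, and the real matching rules are described in helpFiles/searchConfig.md.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record for one article (not the project's real schema)."""
    title: str
    abstract: str


def matches_keywords(article: Article, keywords: list[str]) -> bool:
    """Return True if any keyword occurs in the title or abstract (case-insensitive)."""
    text = f"{article.title} {article.abstract}".lower()
    return any(keyword.lower() in text for keyword in keywords)


def filter_articles(articles: list[Article], keywords: list[str]) -> list[Article]:
    """Keep only the articles that match at least one keyword, preserving order."""
    return [a for a in articles if matches_keywords(a, keywords)]


# Made-up sample data for illustration only.
sample = [
    Article("Squeezed light in optical cavities", "We study quantum noise reduction."),
    Article("A survey of graph algorithms", "Shortest paths and spanning trees."),
]
selected = filter_articles(sample, ["quantum", "photonic"])
print([a.title for a in selected])
```

Plain substring matching is the simplest possible rule; it will also match keywords embedded in longer words, so a real filter may want word boundaries or per-keyword weights.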

Configuration

The application is configured using the config.yaml file. Key settings include:

ai_model: Specifies the AI model to use for ranking.
host: Indicates whether to use OpenAI's servers or run the AI locally.
config_directory: Directory where configuration files are stored.
output_directory: Directory where markdown files for each article are saved.
config_modelname: Which keyword matching file to use.
lookback_date: How many days back to search arXiv.
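
As a rough illustration of how these settings might be checked before a run, the sketch below verifies that a loaded configuration contains every key listed above and turns the lookback window into a start date. The helper names (`missing_keys`, `earliest_date`) and all example values are hypothetical, and treating lookback_date as a whole number of days is an assumption based on the description above.

```python
from datetime import date, timedelta

# Keys documented in the Configuration section above.
REQUIRED_KEYS = [
    "ai_model",
    "host",
    "config_directory",
    "output_directory",
    "config_modelname",
    "lookback_date",
]


def missing_keys(config: dict) -> list[str]:
    """Return the documented keys that are absent from the loaded config."""
    return [key for key in REQUIRED_KEYS if key not in config]


def earliest_date(lookback_days: int, today: date) -> date:
    """Assumption: lookback_date is a whole number of days counted back from today."""
    return today - timedelta(days=lookback_days)


# Hypothetical values for illustration; check config.yaml for the real ones.
example_config = {
    "ai_model": "example-model",
    "host": "local",
    "config_directory": "path/to/configs",
    "output_directory": "path/to/output",
    "config_modelname": "example-keywords",
    "lookback_date": 7,
}

print(missing_keys(example_config))  # prints []
print(earliest_date(example_config["lookback_date"], date(2024, 1, 10)))  # prints 2024-01-03
```

In practice the dictionary would come from parsing config.yaml, for example with PyYAML's `yaml.safe_load`, assuming PyYAML is among the installed dependencies.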

Getting Started

Install Dependencies: Ensure all necessary Python packages and external tools (like Ollama for local AI hosting) are installed.

ChiScraper Dependencies

python -m venv .venv
# For Mac/Linux
source ./.venv/bin/activate
# For Windows
.venv/Scripts/Activate.ps1

pip install -r ./requirements.txt

If you want to run the LLM locally, follow the instructions in helpFiles/localAIEval.md.

Configure the Application: Edit config.yaml to set your preferences for search criteria, AI model, and output settings.
Run the Scripts:
- Execute search_arxiv.py to fetch articles.
- Run rank_articles.py to rank the articles based on relevance.
- Launch app.py to view the articles in your web browser.
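
If you prefer to run the three steps with one command, a small wrapper along these lines could work. This is a sketch, not part of the project: `build_commands` and `run_pipeline` are hypothetical helpers, and the fetch script is assumed to be named search_arxiv.py, so substitute whatever file name your checkout uses.

```python
import subprocess
import sys
from collections.abc import Sequence

# Assumed script names; replace them with the names used in your checkout.
DEFAULT_SCRIPTS = ("search_arxiv.py", "rank_articles.py", "app.py")


def build_commands(scripts: Sequence[str], python: str = sys.executable) -> list[list[str]]:
    """Build one command per script, using the given Python interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts: Sequence[str] = DEFAULT_SCRIPTS) -> None:
    """Run each script in order and stop at the first one that fails."""
    for command in build_commands(scripts):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the repository root, with the virtual environment active, runs the steps in sequence. Using `sys.executable` keeps every step on the same interpreter, so the packages installed from requirements.txt are visible to each script. The last step will likely keep running for as long as the web interface is being served.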

Additional Resources

AI Ranking: For more details on running the AI ranking algorithm, refer to helpFiles/aiRanking.md.
Local AI Evaluation: Instructions for setting up local AI evaluation are available in helpFiles/localAIEval.md.
Search Configuration: Learn about configuring search criteria in helpFiles/searchConfig.md.

By following these steps and utilizing the provided scripts, you can efficiently search, rank, and explore academic articles tailored to your research needs.

CJones-Optics/arXivScraper
