dsresumatch is a Python library for analyzing, evaluating, and scoring resumes in PDF format. With tools to extract, clean, and analyze text, it lets users identify missing sections, count word frequencies, evaluate keyword matches, and generate resume scores based on predefined criteria. This package is especially useful for recruiters, hiring managers, and job seekers aiming to optimize resumes for keyword-based Applicant Tracking Systems (ATS).
- Text Processing
  - read_pdf(file_path): Extracts raw text from a PDF file.
  - clean_text(raw_text): Cleans and preprocesses extracted text by removing punctuation and stop words and converting it to lowercase.
  - count_words_in_pdf(file_path): Counts word frequencies in a PDF file.
- Section Evaluation
  - missing_section(clean_text, add_benchmark_sections=None): Identifies sections missing from the resume compared to benchmark sections.
- Keyword Analysis
  - evaluate_keywords(cleaned_text, keywords=None, use_only_supplied_keywords=False): Matches keywords against the resume text and evaluates the coverage.
- Resume Scoring
  - resume_score(cleaned_text, keywords=None, use_only_supplied_keywords=False, add_benchmark_sections=[], feedback=True): Scores resumes based on keyword matching and benchmark sections, and provides detailed feedback on missing or extra keywords and sections.
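To make the ideas behind these functions concrete, here is a minimal standard-library sketch of cleaning text, counting words, finding missing sections, checking keyword coverage, and combining the results into a score. It is not the dsresumatch implementation: every name in it (sketch_clean_text, contains_phrase, sketch_missing_sections, sketch_keyword_coverage, sketch_score), the tiny stop-word list, the default sections, and the equal weighting of sections and keywords are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "with", "on"}

# Hypothetical benchmark sections; the library's own defaults may differ.
DEFAULT_SECTIONS = ["Education", "Skills", "Work Experience"]


def sketch_clean_text(raw_text):
    """Lowercase, replace punctuation with spaces, and drop stop words."""
    no_punct = re.sub(r"[^\w\s]", " ", raw_text.lower())
    return " ".join(w for w in no_punct.split() if w not in STOP_WORDS)


def contains_phrase(cleaned, phrase):
    """Whole-word match of a phrase, cleaned the same way as the resume text."""
    target = sketch_clean_text(phrase)
    if not target:
        return False
    return re.search(r"\b" + re.escape(target) + r"\b", cleaned) is not None


def sketch_missing_sections(cleaned, extra_sections=None):
    """Return benchmark sections whose names never appear in the text."""
    sections = DEFAULT_SECTIONS + list(extra_sections or [])
    return [s for s in sections if not contains_phrase(cleaned, s)]


def sketch_keyword_coverage(cleaned, keywords):
    """Split the keywords into those found in the text and those not found."""
    matched = [k for k in keywords if contains_phrase(cleaned, k)]
    missing = [k for k in keywords if k not in matched]
    return matched, missing


def sketch_score(cleaned, keywords, extra_sections=None):
    """Average of section coverage and keyword coverage, on a 0-100 scale."""
    sections = DEFAULT_SECTIONS + list(extra_sections or [])
    missing = sketch_missing_sections(cleaned, extra_sections)
    matched, _ = sketch_keyword_coverage(cleaned, keywords)
    section_part = 1 - len(missing) / len(sections)
    keyword_part = len(matched) / len(keywords) if keywords else 0.0
    return round(100 * (section_part + keyword_part) / 2, 1)


raw = "Education: MSc in Data Science. Skills: Python, SQL, and Machine Learning!"
keywords = ["Python", "Machine Learning", "Tableau"]

cleaned = sketch_clean_text(raw)
word_counts = Counter(cleaned.split())

print(cleaned)  # education msc data science skills python sql machine learning
print(sketch_missing_sections(cleaned))  # ['Work Experience']
print(sketch_keyword_coverage(cleaned, keywords))  # (['Python', 'Machine Learning'], ['Tableau'])
print(sketch_score(cleaned, keywords))  # 66.7
```

Whole-word matching is used here so that a short keyword such as "R" does not match inside "resume". A side effect of this simple approach is that a section name mentioned anywhere in the text counts as present, even if it is not an actual heading.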
dsresumatch addresses a niche in the Python ecosystem by focusing on resume analysis and scoring, particularly for optimizing data science resumes for Applicant Tracking Systems (ATS). General-purpose text analysis libraries exist, such as:
- NLTK: For advanced natural language processing tasks.
- spaCy: For large-scale NLP and text processing.
By contrast, there appear to be no Python packages that are consistently maintained for resume matching (e.g., resume-matcher was last updated in February 2024). There are some standalone Python programs available, such as resume-job-matcher and Resume Compatibility.
If you are looking for general PDF text extraction, libraries like PyPDF2 and pdfplumber might suit your needs. However, dsresumatch builds on this functionality to provide domain-specific tools tailored to resume evaluation.
$ pip install dsresumatch
dsresumatch can be used to extract text from a PDF, count words in a PDF, and evaluate and score a resume, as follows:
# Import required functions
from dsresumatch.pdf_cv_processing import read_pdf, count_words_in_pdf, clean_text
from dsresumatch.sections_check import missing_section
from dsresumatch.evaluate_keywords import evaluate_keywords
from dsresumatch.resume_scoring import resume_score
# Additional imports will be determined later
file_path = "~/Desktop/my_example_cv.pdf" # Specify the file path
raw_text = read_pdf(file_path) # Read text from the PDF
cleaned_text = clean_text(raw_text) # Clean and preprocess the text
word_counts = count_words_in_pdf(file_path) # Count words in the PDF
benchmark_sections = ["Work Experience", "Education", "Skills", "Projects", "Certifications"] # (Optional) additional benchmark sections to check
missing = missing_section(cleaned_text, benchmark_sections) # Identify missing sections
keywords = ["Python", "Data Analysis", "Machine Learning"] # Keywords to look for
keyword_evaluation = evaluate_keywords(cleaned_text, keywords) # Evaluate keywords
resume_summary = resume_score(
    cleaned_text,
    keywords=keywords,
    add_benchmark_sections=benchmark_sections,
    feedback=True,
) # Score the resume
print("Word Counts:", word_counts)
print("Missing Sections:", missing)
print("Keyword Evaluation:", keyword_evaluation)
print("Resume Summary:", resume_summary)
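The example path above begins with ~, which Python's built-in file functions do not expand on their own. Whether read_pdf expands it is not stated here, so a cautious sketch is to expand the path with pathlib before passing it on. The file name is the one from the example; the existence check is an optional extra.

```python
from pathlib import Path

# Expand "~" to the current user's home directory before handing the path on.
file_path = Path("~/Desktop/my_example_cv.pdf").expanduser()

# Checking up front gives a clearer message than a failure inside a PDF reader.
if not file_path.is_file():
    print(f"No PDF found at {file_path}")
```

The resulting path can be converted with str(file_path) if a function expects a plain string.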
Nelli Hovhannisyan, Ashita Diwan, Timothy Singh, Jia Quan Lim
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
dsresumatch was created by Nelli Hovhannisyan, Ashita Diwan, Timothy Singh, and Jia Quan Lim. It is licensed under the terms of the MIT license.
dsresumatch was created with cookiecutter and the py-pkgs-cookiecutter template.