Goal of Project: Categorize emails received in the Finance AP (Accounts Payable) Team's shared mailboxes into a set of issue categories.
Background: Every day, the Finance AP Team receives a large volume of emails from customers & vendors about different issues & queries, and team members are responsible for resolving & responding to each of them.
Solution Approach: Extract the emails from the shared mailboxes using automation, then apply Text Analytics & Natural Language Processing (NLP) to each email body to identify the category of issue or query the vendor/customer is raising with the Finance AP Team.
Benefits: Running Text Analytics over large volumes of emails on a yearly, half-yearly, quarterly or monthly basis helps leaders identify areas for improvement, and helps the operations team identify each issue automatically & allocate/redirect it to the responsible team member.
Dataset: 6 months of emails from 10 shared mailboxes of the Finance Accounts Payable Team (55,000 emails).
Tools & Technologies: Python, Excel VBA, Naive Bayes, NLTK, scikit-learn, pandas, NumPy, Matplotlib, seaborn, WordCloud
Project Steps:
- Extracting Unstructured Data
- Labelling the dataset (categorizing emails into a set of standard queries using key phrases & validating the labels with domain experts from the AP Team)
- Exploratory Data Analysis (EDA)
- Pre-processing & Feature Extraction
- Model selection (for multiclass text classification)
- Model training (Multinomial Naïve Bayes)
- Model evaluation
- Hyperparameter tuning
- Saving the model & reusing it on unseen data
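The labelling step above could look roughly like the sketch below: a first pass that assigns each email a category from key phrases, followed by a category count as a starting point for EDA. The category names, the phrases in `CATEGORY_KEY_PHRASES` and the `label_email` helper are hypothetical placeholders; the project's actual categories and key phrases were agreed with the AP Team and are not listed here.

```python
import pandas as pd

# Hypothetical categories and key phrases; the real list was agreed with the AP Team.
# Dict order matters: the first category with a matching phrase wins.
CATEGORY_KEY_PHRASES = {
    "Payment Status": ["payment status", "when will we be paid", "remittance"],
    "Invoice Submission": ["please find invoice", "attached invoice", "new invoice"],
    "Vendor Master Update": ["bank details", "change of address", "vendor registration"],
}


def label_email(body: str) -> str:
    """Return the first category whose key phrase occurs in the body, else 'Unlabelled'."""
    text = body.lower()
    for category, phrases in CATEGORY_KEY_PHRASES.items():
        if any(phrase in text for phrase in phrases):
            return category
    return "Unlabelled"


emails = pd.DataFrame({
    "body": [
        "Hi, could you share the payment status of invoice 123?",
        "Please find invoice 456 attached for processing.",
        "Our bank details have changed, please update them.",
        "Thanks for your help.",
    ]
})
emails["category"] = emails["body"].apply(label_email)

# First EDA check: how many emails fall into each category
print(emails["category"].value_counts())
```

Emails that come back as `Unlabelled`, or that contain phrases from more than one category (where dict order silently decides), are natural candidates for review by the domain experts before training.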
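For the pre-processing & feature extraction step, a minimal sketch is shown below, assuming TF-IDF features over unigrams and bigrams. The cleanup rules in the hypothetical `clean_email` function (cutting quoted reply history at a `From:` line, removing URLs, email addresses, digits and punctuation) are illustrative assumptions. The project lists NLTK; to avoid downloading NLTK corpora, this sketch swaps in scikit-learn's built-in `ENGLISH_STOP_WORDS` list instead.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer


def clean_email(body: str) -> str:
    """Normalise an email body before vectorisation (illustrative rules only)."""
    text = body.lower()
    # Keep only the latest message: cut the quoted thread at the first "From:" line
    text = re.split(r"\n\s*from:", text, maxsplit=1)[0]
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"\S+@\S+", " ", text)       # email addresses
    text = re.sub(r"[^a-z\s]", " ", text)      # digits and punctuation
    # Stop-word removal with scikit-learn's generic English list (swapped in for NLTK)
    tokens = [t for t in text.split() if t not in ENGLISH_STOP_WORDS and len(t) > 1]
    return " ".join(tokens)


corpus = [
    "Hi team,\nWhat is the payment status for invoice INV-1001?\n"
    "Contact me at jo@vendor.com\n\nFrom: AP Team\nSent: Monday",
    "Please find attached our invoice for March. Details: https://example.com/inv",
]

# TF-IDF over unigrams and bigrams of the cleaned text
vectorizer = TfidfVectorizer(preprocessor=clean_email, ngram_range=(1, 2))
X = vectorizer.fit_transform(corpus)
print(X.shape)
print(clean_email(corpus[0]))
```

scikit-learn's stop-word list is generic and includes words such as "bill" and "amount", which may carry signal in AP emails, so a custom list could be worth checking against the categories.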
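Model training, evaluation and hyperparameter tuning can be chained with a scikit-learn `Pipeline` and `GridSearchCV`, as in the sketch below. The twelve example texts and three categories are made up to keep the block runnable; the searched values for `alpha` and `ngram_range`, and the choice of macro-averaged F1 (which weights every category equally regardless of its size), are assumptions rather than the project's documented settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up labelled emails (4 per category) standing in for the real dataset
texts = [
    "payment status of invoice please", "when will the invoice be paid",
    "remittance advice for last payment", "has the payment been released",
    "please find attached invoice", "new invoice submitted for processing",
    "invoice copy attached for your records", "sending invoice for march services",
    "please update our bank details", "change of vendor address",
    "new bank account for vendor", "update vendor remit to address",
]
labels = ["Payment Status"] * 4 + ["Invoice Submission"] * 4 + ["Vendor Master Update"] * 4

# Stratified hold-out split so every category appears in the test set
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])

# Assumed search space: smoothing strength and n-gram range
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "nb__alpha": [0.1, 0.5, 1.0],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_macro")
search.fit(X_train, y_train)

# Evaluate the best pipeline (refit on the full training split) on held-out emails
y_pred = search.predict(X_test)
print("Best parameters:", search.best_params_)
print(classification_report(y_test, y_pred, zero_division=0))
```

On a toy set this small the reported scores say nothing about real performance; the block only shows the mechanics. Tuning inside the pipeline keeps the vectorizer from seeing the validation folds during cross-validation.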
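Saving the model & reusing it on unseen data could be done with `joblib`, the approach scikit-learn's documentation describes for persisting estimators. Persisting the whole pipeline (vectorizer plus classifier) means new emails go through exactly the same feature extraction as the training data. The file name, alpha value and toy training data below are hypothetical.

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical); in practice this would be the tuned pipeline
texts = [
    "payment status of invoice please", "when will the invoice be paid",
    "please find attached invoice", "new invoice submitted for processing",
    "please update our bank details", "change of vendor address",
]
labels = ["Payment Status"] * 2 + ["Invoice Submission"] * 2 + ["Vendor Master Update"] * 2

model = make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=0.5))
model.fit(texts, labels)

# Persist vectorizer + classifier together so unseen emails get identical features
MODEL_PATH = "ap_email_classifier.joblib"  # hypothetical file name
joblib.dump(model, MODEL_PATH)

# Later, e.g. in a scheduled job: load the model and categorise new emails
loaded_model = joblib.load(MODEL_PATH)
new_emails = ["Could you confirm the payment status of invoice 789?"]
print(loaded_model.predict(new_emails))
```

Loading a joblib file can execute arbitrary code, so only load files from trusted sources, and load them with the same scikit-learn version that saved them.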