The objective of this project is to develop a Walmart sales forecasting model using the best machine learning algorithms. The project involves understanding and cleaning up the dataset, building regression models to predict sales based on single and multiple features, evaluating model performance using metrics like RMSE, and performing data exploration, manipulation, feature selection, and predictive modeling. The project aims to provide insights into the dataset and conclude with reliable sales forecasts for Walmart.
One of the leading retail stores in the US, Walmart, would like to predict the sales and demand accurately. There are certain events and holidays which impact sales on each day. There are sales data available for 45 stores of Walmart. The business is facing a challenge due to unforeseen demands and runs out of stock some times, due to the inappropriate machine learning algorithm. An ideal ML algorithm will predict demand accurately and ingest factors like economic conditions including CPI, Unemployment Index, etc.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of all, which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.
This is the historical data that covers sales from 2010-02-05 to 2012-11-01, in the files 'stores' and 'features'. Within this file you will find the following fields :-
- Store - the store number
- Date - the week of sales
- Weekly_Sales - sales for the given store
- IsHoliday - whether the week is a special holiday week 1 – Holiday week 0 – Non-holiday week
- Temperature - Temperature on the day of sale
- Fuel_Price - Cost of fuel in the region
- CPI – Prevailing consumer price index
- Unemployment - Prevailing unemployment rate
We aim to solve the problem statement by creating a plan of action, Here are some of the necessary steps:
- Importing Libraries
- Loading Dataset
- Data Exploration
- Data Preprocessing
- Feature Selection/Extraction
- Predictive Modelling
- Project Outcomes & Conclusion