- Project Overview
- Project Structure
- Key Features
- Installation
- Usage
- Results
- Challenges & Solutions
- Contributing
- License
- Contact
## Project Overview

Objective: Build a predictive model to understand how car prices vary with design and engineering features, enabling strategic adjustments in automotive design and pricing.
Dataset: 1985 Auto Imports Database with 205 instances and 26 attributes.
Key Tasks:
- Data Analysis: Clean, explore, and preprocess data.
- Predictive Modeling: Train and compare regression models.
- Insight Generation: Translate model results into actionable business strategies.
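A quick sanity check on the raw file confirms the dataset dimensions before any cleaning. A minimal sketch, assuming the CSV sits under `data/raw/` and uses `?` as its missing-value marker (both assumptions, consistent with the standard UCI distribution):

```python
import pandas as pd

# Path and missing-value marker are assumptions; adjust to the repository's raw file.
df = pd.read_csv("data/raw/auto_imports.csv", na_values="?")

print(df.shape)                                              # expected: (205, 26)
print(df.isna().mean().sort_values(ascending=False).head())  # columns with the most missing values
```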
## Project Structure

```
├── data
│   ├── raw              # Raw datasets (auto_imports.csv)
│   └── processed        # Processed training/testing splits
├── docs                 # Problem statement and documentation
├── notebooks            # Jupyter notebooks for analysis and modeling
├── report               # Final Report.md and supporting files
├── results
│   ├── pre-analysis     # EDA outputs (statistics, visualizations)
│   ├── figures          # Saved plots (boxplots, histograms)
│   └── models           # Serialized models (final_lasso_model.joblib)
├── scripts              # Utility scripts (AutoPrice.py, utility.py)
├── LICENSE
├── README.md
└── requirements.txt     # Python dependencies
```
## Key Features

- **Data Preprocessing** (see the sketch after this list):
  - Missing-value imputation (median/mode).
  - Outlier capping using the IQR rule and domain knowledge.
  - PCA for dimensionality reduction (95% of variance retained).
- **Model Development**:
  - Compared 7 models, including Lasso, XGBoost, and Gradient Boosting.
  - Hyperparameter tuning with cross-validation.
- **Deployment-Ready**:
  - Best model: Lasso Regression (R² = 0.917, RMSE = 1,987).
  - Interpretable coefficients for business strategy (e.g., the BMW make adds about $7,347 to predicted price).
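A minimal sketch of these preprocessing steps, assuming a headered CSV with a `price` column; the thresholds and column handling here are illustrative, and the project's notebook remains the authoritative pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Path, header layout, and column names are assumptions; adapt to the repo's files.
df = pd.read_csv("data/raw/auto_imports.csv", na_values="?")

num_cols = df.select_dtypes(include=np.number).columns
cat_cols = df.select_dtypes(exclude=np.number).columns

# 1. Missing-value imputation: median for numeric, mode for categorical columns.
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])

# 2. Outlier capping with the 1.5 * IQR rule (domain-specific caps can override this).
for col in num_cols:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 3. PCA on the scaled numeric predictors, keeping enough components for 95% variance.
X = df[num_cols].drop(columns=["price"])   # "price" as target is an assumed column name
y = df["price"]
X_pca = PCA(n_components=0.95).fit_transform(StandardScaler().fit_transform(X))
print(X_pca.shape)
```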
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/dhaneshbb/AutoPricePred.git
  cd AutoPricePred
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Launch Jupyter Notebook:

  ```bash
  jupyter notebook
  ```
## Usage

**Data Analysis:** Run `AutoPricePred_Analysis.ipynb` to explore data cleaning, EDA, and visualizations.
**Model Training:** The notebook also includes code for model comparison, hyperparameter tuning, and saving the best model. A hedged tuning sketch is shown below.
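This sketch reuses `X_pca` and `y` from the preprocessing sketch above; the α grid, CV folds, and output path are illustrative rather than the project's exact settings:

```python
import joblib
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

# Hold out a test split before tuning to avoid leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.2, random_state=42
)

# Cross-validated search over the regularization strength (grid values are illustrative).
search = GridSearchCV(
    estimator=Lasso(max_iter=10_000),
    param_grid={"alpha": [0.1, 1.0, 10.0, 100.0]},
    scoring="r2",
    cv=5,
)
search.fit(X_train, y_train)

print("Best alpha:", search.best_params_["alpha"])
print("CV R²:", round(search.best_score_, 3))
print("Test R²:", round(search.score(X_test, y_test), 3))

# Persist the tuned estimator (assumes the results/models directory exists).
joblib.dump(search.best_estimator_, "results/models/final_lasso_model.joblib")
```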
**Inference:**

```python
import joblib

model = joblib.load('models/final_lasso_model.joblib')  # adjust the path if the model lives under results/models/

# `input_features`: one row of feature values, preprocessed the same way as the training data.
prediction = model.predict([input_features])
```
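For a batch check against the held-out split, something like the following can be used; the file names under `data/processed` and the model path are assumptions:

```python
import joblib
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

# File names are assumptions; point these at the actual processed split files.
X_test = pd.read_csv("data/processed/X_test.csv")
y_test = pd.read_csv("data/processed/y_test.csv").squeeze()

model = joblib.load("results/models/final_lasso_model.joblib")
preds = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, preds))
print(f"Hold-out RMSE: {rmse:,.0f}")  # the reported value is ~1,987
```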
## Results

| Model | Test RMSE | Test R² | Training Time (s) | Overfit (Δ R²) |
|---|---|---|---|---|
| Lasso (α=10) | 1,987 | 0.917 | 0.019 | 0.033 |
| XGBoost | 1,663 | 0.942 | 0.143 | 0.052 |
| Gradient Boosting | 1,842 | 0.928 | 0.118 | 0.051 |
**Selected model: Lasso Regression (α = 10)**

- Rationale: Balances interpretability, speed, and generalizability (chosen over XGBoost despite XGBoost's lower test RMSE).
- Key Drivers: Luxury brands (`make_bmw`, `make_mercedes-benz`), engine location, and PCA components.
- Performance:

| Metric | Value |
|---|---|
| Test R² | 0.917 |
| Test RMSE | 1,987 |
| Cross-Validation R² | 0.899 ± 0.027 |
Key Insights:
- Luxury brands (BMW, Mercedes) command significant price premiums.
- Rear-engine vehicles are associated with higher prices.
- Vehicle size/power (PCA_1) is a critical pricing factor.
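These drivers can be read directly off the Lasso coefficients. A hedged sketch, assuming the saved estimator's `coef_` lines up with the columns of a processed training file (the file name is an assumption):

```python
import joblib
import pandas as pd

# Recover feature names from the processed training split (file name is an assumption);
# the saved estimator's coef_ must align with these columns.
feature_names = pd.read_csv("data/processed/X_train.csv", nrows=0).columns
model = joblib.load("results/models/final_lasso_model.joblib")

coefs = pd.Series(model.coef_, index=feature_names)
print(coefs.sort_values(key=abs, ascending=False).head(10))
# Expect large positive weights on make_bmw, make_mercedes-benz, and engine-location terms.
```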
## Challenges & Solutions

| Challenge | Solution |
|---|---|
| Missing values (18%) with data leakage | Median/mode imputation + column drop. |
| Multicollinearity | PCA and VIF-based feature removal. |
| High cardinality | Regularization (Lasso) for sparsity. |
| Model overfitting | Cross-validation and hyperparameter tuning. |
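The multicollinearity check is reproducible with a variance-inflation-factor (VIF) pass over the numeric features before PCA; a minimal sketch, with the file path as an assumption:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Numeric predictors before PCA (path is an assumption; any numeric frame works).
X = (
    pd.read_csv("data/raw/auto_imports.csv", na_values="?")
    .select_dtypes(include=np.number)
    .dropna()
)

vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
    name="VIF",
)
print(vif.sort_values(ascending=False))  # VIF > 10 is a common multicollinearity flag
```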
## Contributing

Contributions are welcome!

- Fork the repository.
- Create a feature branch: `git checkout -b feature/new-feature`.
- Commit changes: `git commit -m 'Add new feature'`.
- Push to the branch: `git push origin feature/new-feature`.
- Submit a pull request.
## License

This project is licensed under the terms of the MIT License. See LICENSE for details.
Made with ❤️ using insightfulpy.