The DataSciSalary Predictor project aims to predict the salary of data science employees based on several variables. This project provides insights into the factors influencing salaries and helps in forecasting salary ranges for future job roles in the field of data science.
How can we accurately predict the salary of data science employees using various factors derived from a dataset?
- Original Dataset: ds_salaries.csv
- Preprocessed Dataset: preprocessed_data-science.csv
The dataset includes various attributes that influence the salary of data science professionals. These variables were analyzed, cleaned, and visualized during the project to extract meaningful insights.
- Data Extraction: Loaded the dataset ds_salaries.csv into the environment.
- Data Cleaning: Handled missing values, duplicates, and inconsistent data, then saved the cleaned data to preprocessed_data-science.csv.
- Data Analysis and Visualization: Explored relationships between variables and their impact on salary, and created visualizations to highlight trends and patterns.
- Modeling (Optional Future Step): The project can be extended with machine learning models that predict salary from the preprocessed data.
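The extraction and cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the code in d.ipynb: the helper names `clean_salaries` and `preprocess_file` are hypothetical, and the specific rules (trimming whitespace in text values, dropping exact duplicate rows, dropping rows with any missing value) are assumptions, since the notebook's actual cleaning rules are not described here.

```python
import pandas as pd


def _strip_if_str(value):
    # Trim surrounding whitespace on text values; leave everything else untouched.
    return value.strip() if isinstance(value, str) else value


def clean_salaries(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df (hypothetical helper, not taken from d.ipynb)."""
    cleaned = df.copy()

    # Assumed notion of "inconsistent data": stray whitespace in text values.
    for col in cleaned.columns:
        cleaned[col] = cleaned[col].map(_strip_if_str)

    # Drop exact duplicate rows, then rows that still contain missing values.
    cleaned = cleaned.drop_duplicates().dropna()

    return cleaned.reset_index(drop=True)


def preprocess_file(src_path: str, dst_path: str) -> pd.DataFrame:
    """Load the raw CSV, clean it, and write the result without the index column."""
    cleaned = clean_salaries(pd.read_csv(src_path))
    cleaned.to_csv(dst_path, index=False)
    return cleaned
```

Under these assumptions, calling `preprocess_file("ds_salaries.csv", "preprocessed_data-science.csv")` would reproduce the two dataset files named in this README. Dropping rows with missing values is the simplest policy; imputing them instead may be preferable if many rows are affected.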
- ds_salaries.csv: Original dataset used for analysis.
- preprocessed_data-science.csv: Cleaned dataset after preprocessing.
- d.ipynb: Jupyter Notebook containing code for data extraction, cleaning, and visualization.
- P0.pptx: PowerPoint presentation summarizing the project and its future applications.
- requirements.txt: List of Python dependencies required to run the project.
The project uses the following Python libraries and tools:
- pandas
- numpy
- matplotlib
- seaborn
To install the dependencies, run:
pip install -r requirements.txt
- Clone the repository or download the project files.
- Install the required dependencies using the requirements.txt file.
- Open the Jupyter Notebook d.ipynb to review the data cleaning and visualization steps.
- See P0.pptx for the project objectives, methodology, and future applications.
- Machine Learning Models: Train models to predict salaries based on the cleaned data.
- Dashboard Integration: Develop a dashboard to interactively visualize salary trends.
- Industry Insights: Use the analysis to guide HR and recruitment strategies for data science roles.
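As a hedged illustration of the machine learning direction listed above, a simple linear baseline can be fit with only pandas and numpy from the dependency list. Nothing here is part of the current project: the function names `fit_linear_baseline` and `predict` are hypothetical, ordinary least squares is just one possible starting model, and the caller must supply the name of the salary column, which this README does not specify.

```python
import numpy as np
import pandas as pd


def fit_linear_baseline(df: pd.DataFrame, target: str):
    """Fit ordinary least squares on one-hot encoded features (illustrative only)."""
    # One-hot encode categorical columns; numeric columns pass through unchanged.
    features = pd.get_dummies(df.drop(columns=[target]), dtype=float)
    # Prepend an intercept column of ones.
    X = np.column_stack([np.ones(len(features)), features.to_numpy(dtype=float)])
    y = df[target].to_numpy(dtype=float)
    # lstsq returns a minimum-norm solution, so collinear dummy columns are tolerated.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, list(features.columns)


def predict(df: pd.DataFrame, coef: np.ndarray, columns: list) -> np.ndarray:
    """Predict the target for new rows using coefficients from fit_linear_baseline."""
    # Align new rows to the training columns; categories absent here become 0.
    features = pd.get_dummies(df, dtype=float).reindex(columns=columns, fill_value=0.0)
    X = np.column_stack([np.ones(len(features)), features.to_numpy(dtype=float)])
    return X @ coef
```

A baseline like this mainly gives later models something to beat. Evaluating it on held-out rows, rather than on the rows used for fitting, would be needed before treating its predictions as meaningful.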