DataSciSalary Predictor

Project Overview

The DataSciSalary Predictor project aims to predict the salaries of data science employees from several variables in a salary dataset. The current work focuses on exploring which factors influence salaries, which lays the groundwork for forecasting salary ranges for future job roles in data science.

Problem Statement

How can we accurately predict the salary of data science employees using various factors derived from a dataset?

Dataset Information

Original Dataset: ds_salaries.csv
Preprocessed Dataset: preprocessed_data-science.csv

Dataset Description

The dataset includes attributes that may influence the salaries of data science professionals. These variables were analyzed, cleaned, and visualized during the project to extract insights into salary trends.

Project Workflow

Data Extraction

Loaded the raw dataset ds_salaries.csv into the Jupyter Notebook (d.ipynb).
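The extraction step can be sketched with pandas as follows. This is a minimal sketch, not the notebook's actual code: the helper name load_salaries is hypothetical, and the only assumption about the file is that it is a CSV with a header row.

```python
import pandas as pd


def load_salaries(source="ds_salaries.csv") -> pd.DataFrame:
    """Read the raw salary CSV (hypothetical helper) and print a quick structural summary."""
    df = pd.read_csv(source)
    # Overall size of the table
    print(f"{df.shape[0]} rows x {df.shape[1]} columns")
    # Column types reveal numeric fields that were parsed as text
    print(df.dtypes)
    # Missing values per column guide the cleaning step
    print(df.isna().sum())
    return df
```

In the notebook this would be called as `df = load_salaries("ds_salaries.csv")`; `pd.read_csv` also accepts a file-like object, which makes the helper easy to try on a small sample.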

Data Cleaning

Handled missing values, duplicates, and inconsistent data.

Cleaned data was saved to preprocessed_data-science.csv.
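A minimal sketch of the cleaning step, assuming the salary column is named salary_in_usd (a hypothetical name; substitute the actual column from ds_salaries.csv). The helper clean_salaries is illustrative and not taken from d.ipynb. It trims stray whitespace in text cells, drops duplicate rows, and removes rows whose salary is missing or non-numeric.

```python
import pandas as pd


def clean_salaries(df: pd.DataFrame, target: str = "salary_in_usd") -> pd.DataFrame:
    """Hypothetical cleaning helper: trim text, drop duplicates, drop unusable targets."""
    out = df.copy()
    # Trim whitespace in text cells so " Data Scientist" and "Data Scientist" match
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    # Drop exact duplicate rows (after trimming, so whitespace variants are caught too)
    out = out.drop_duplicates()
    # Coerce the target to numbers; unparseable entries become NaN
    out[target] = pd.to_numeric(out[target], errors="coerce")
    # Remove rows with a missing or unparseable salary
    out = out.dropna(subset=[target])
    return out.reset_index(drop=True)
```

Writing the result with `clean_salaries(df).to_csv("preprocessed_data-science.csv", index=False)` produces a file with the same name as the preprocessed dataset; `index=False` keeps the DataFrame index out of the CSV.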

Data Analysis and Visualization

Explored relationships between variables and their impact on salary.

Visualizations were created to highlight trends and patterns.
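One way to visualize how salary varies with a categorical attribute is to plot the median salary per category. This sketch uses matplotlib only; the column names experience_level and salary_in_usd and the helper plot_median_salary are assumptions for illustration, not names confirmed by the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs as a script; omit this in Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def plot_median_salary(df, category, target="salary_in_usd"):
    """Hypothetical helper: bar chart of median salary per category, lowest to highest."""
    medians = df.groupby(category)[target].median().sort_values()
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(medians.index.astype(str), medians.to_numpy())
    ax.set_xlabel(category)
    ax.set_ylabel(f"median {target}")
    ax.set_title(f"Median {target} by {category}")
    fig.tight_layout()
    return fig, ax, medians
```

Medians are used rather than means because salary distributions are typically skewed by a few very high values, and sorting the bars makes the ordering of categories easy to read.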

Modeling (Optional Future Step)

The project can be extended to include machine learning models for salary prediction based on the preprocessed data.
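As a possible starting point for this extension, an ordinary least-squares baseline needs only numpy and pandas, which are already listed under Requirements. The sketch below one-hot encodes categorical features with `pandas.get_dummies` and solves for the coefficients with `numpy.linalg.lstsq`. The function names and the salary_in_usd and experience_level columns are hypothetical, and a real model would also need a train/test split and evaluation.

```python
import numpy as np
import pandas as pd


def fit_linear_baseline(df, features, target="salary_in_usd"):
    """Hypothetical OLS baseline on one-hot encoded features; returns coefficients by column."""
    # drop_first removes one category per feature to avoid perfect collinearity with the intercept
    X = pd.get_dummies(df[features], drop_first=True, dtype=float)
    X.insert(0, "intercept", 1.0)
    y = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X.to_numpy(), y, rcond=None)
    return pd.Series(coef, index=X.columns)


def predict_salaries(coef, df, features):
    """Encode new rows and align them to the training columns before predicting."""
    X = pd.get_dummies(df[features], dtype=float)
    # Keep only training columns; categories absent from training fall back to the baseline
    X = X.reindex(columns=coef.index.drop("intercept"), fill_value=0.0)
    X.insert(0, "intercept", 1.0)
    return X.to_numpy() @ coef.to_numpy()
```

Prediction re-encodes new rows without `drop_first` and then reindexes to the training columns, so the baseline category is handled consistently even when a new batch contains only one category.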

Files and Directories

ds_salaries.csv: Original dataset used for analysis.
preprocessed_data-science.csv: Cleaned dataset after preprocessing.
d.ipynb: Jupyter Notebook containing code for data extraction, cleaning, and visualization.
PO.pptx: PowerPoint presentation summarizing the project and its future applications.
requirements.txt: List of Python dependencies required to run the project.

Requirements

The project uses the following Python libraries and tools:

pandas
numpy
matplotlib
seaborn

To install the dependencies, run:

pip install -r requirements.txt

How to Run the Project

Clone the repository or download the project files.
Install the required dependencies using the requirements.txt file.
Open the Jupyter Notebook d.ipynb to review the data cleaning and visualization steps.
Refer to PO.pptx for an overview of the project objectives, methodology, and future applications.
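Before opening d.ipynb, a short script can confirm that the libraries listed under Requirements are importable. This is an optional sketch: the helper name missing_packages is hypothetical, and it checks only the four libraries named in this README, not Jupyter itself or any versions pinned in requirements.txt.

```python
import importlib.util

# Libraries listed under Requirements in this README
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn"]


def missing_packages(names=REQUIRED):
    """Return the package names that Python cannot locate (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Run: pip install -r requirements.txt")
    else:
        print("All required packages are installed.")
```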

Future Applications

Machine Learning Models: Train models to predict salaries based on the cleaned data.
Dashboard Integration: Develop a dashboard to interactively visualize salary trends.
Industry Insights: Use the analysis to guide HR and recruitment strategies for data science roles.

Repository: Josephvarghes/DataSciSalary_Predictor