Josephvarghes/DataSciSalary_Predictor

DataSciSalary Predictor

Project Overview

The DataSciSalary Predictor project analyzes the factors that influence the salaries of data science employees, with the goal of predicting salary from those factors. The repository currently covers data cleaning, analysis, and visualization; predictive modeling is a planned extension that would support forecasting salary ranges for future data science roles.

Problem Statement

How can we accurately predict the salary of data science employees from the attributes recorded in the dataset?

Dataset Information

  • Original Dataset: ds_salaries.csv

  • Preprocessed Dataset: preprocessed_data-science.csv

Dataset Description

The dataset records attributes of data science professionals that may influence their salary. During the project these attributes were cleaned, analyzed, and visualized to identify which of them relate most clearly to salary.
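A first look at a dataset like this usually means counting missing values and comparing salary across groups. The sketch below shows one way to do that with pandas. The column names `experience_level` and `salary_in_usd` and the toy rows are assumptions made for illustration; they are not taken from ds_salaries.csv or from d.ipynb.

```python
import pandas as pd


def summarize(df: pd.DataFrame, group_col: str, target_col: str):
    """Return (missing-value counts per column, median target per group)."""
    missing = df.isna().sum()
    # median() skips missing values within each group by default.
    median_by_group = df.groupby(group_col)[target_col].median()
    return missing, median_by_group


# Toy rows for illustration only; hypothetical columns, not real data.
toy = pd.DataFrame(
    {
        "experience_level": ["junior", "senior", "senior", "junior"],
        "salary_in_usd": [55000, 120000, 140000, None],
    }
)

missing, median_by_group = summarize(toy, "experience_level", "salary_in_usd")
print(missing)
print(median_by_group)
```

To apply the same idea to the real file, the toy frame could be replaced with `pd.read_csv("ds_salaries.csv")` and the column arguments adjusted to the columns actually present.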

Project Workflow

  • Data Extraction: Loaded the dataset ds_salaries.csv into the notebook environment.

  • Data Cleaning: Handled missing values, duplicates, and inconsistent data, then saved the cleaned data to preprocessed_data-science.csv.

  • Data Analysis and Visualization: Explored relationships between variables and their impact on salary, and created visualizations to highlight trends and patterns.

  • Modeling (Optional Future Step): The project can be extended with machine learning models that predict salary from the preprocessed data.
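The cleaning step above can be sketched roughly as follows. This is a minimal illustration, not the code from d.ipynb: the function name `clean_salaries`, the `text_cols` parameter, the column names, and the toy rows are all assumptions, and dropping incomplete rows is only one of several ways to handle missing values.

```python
import pandas as pd


def clean_salaries(df: pd.DataFrame, text_cols: list[str]) -> pd.DataFrame:
    """Return a cleaned copy: normalized text, no incomplete rows, no duplicates."""
    out = df.copy()
    # Normalize text so that "Senior " and "senior" count as the same value.
    for col in text_cols:
        out[col] = out[col].str.strip().str.lower()
    # Drop rows with any missing value, then drop exact duplicate rows.
    out = out.dropna().drop_duplicates()
    return out.reset_index(drop=True)


# Toy rows for illustration only; hypothetical columns, not real data.
raw = pd.DataFrame(
    {
        "experience_level": ["Senior ", "senior", "junior", None],
        "salary_in_usd": [120000, 120000, 60000, 50000],
    }
)

cleaned = clean_salaries(raw, text_cols=["experience_level"])
print(cleaned)

# With the real files, the same function could be wrapped like this:
# raw = pd.read_csv("ds_salaries.csv")
# clean_salaries(raw, text_cols=[...]).to_csv("preprocessed_data-science.csv", index=False)
```

Normalizing text before removing duplicates matters here: rows that differ only in whitespace or letter case would otherwise survive as separate records.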

Files and Directories

  • ds_salaries.csv: Original dataset used for analysis.

  • preprocessed_data-science.csv: Cleaned dataset after preprocessing.

  • d.ipynb: Jupyter Notebook containing code for data extraction, cleaning, and visualization.

  • P0.pptx: PowerPoint presentation summarizing the project and its future applications.

  • requirements.txt: List of Python dependencies required to run the project.

Requirements

The project uses the following Python libraries:

  • pandas

  • numpy

  • matplotlib

  • seaborn

To install the dependencies, run:

pip install -r requirements.txt

How to Run the Project

  • Clone the repository or download the project files.

  • Install the required dependencies using the requirements.txt file.

  • Open the Jupyter Notebook d.ipynb to review the data cleaning and visualization steps.

  • Open P0.pptx for an overview of the project objectives, methodology, and future applications.
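Before opening the notebook, it can help to confirm that the libraries named under Requirements are importable. The sketch below uses only the standard library; the helper name `missing_packages` is hypothetical, and the package list is copied from the Requirements section rather than read from requirements.txt, whose exact contents are not shown here.

```python
from importlib.util import find_spec

# Libraries named in the Requirements section of this README.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) or "none")
```

If any names are reported as missing, rerunning the pip command from the Requirements section in the same environment as Jupyter is the usual fix.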

Future Applications

  • Machine Learning Models: Train models to predict salaries based on the cleaned data.

  • Dashboard Integration: Develop a dashboard to interactively visualize salary trends.

  • Industry Insights: Use the analysis to guide HR and recruitment strategies for data science roles.
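As a starting point for the machine learning extension, a baseline could be as simple as ordinary least squares on one-hot encoded categories. The sketch below stays within pandas and numpy, since those are the listed requirements; a dedicated library such as scikit-learn would be a natural next step but is not part of this project. The function names, column names, and toy rows are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def fit_linear_baseline(df, feature_cols, target_col):
    """Fit least squares on one-hot encoded features; return (weights, columns)."""
    X = pd.get_dummies(df[feature_cols], dtype=float)
    X.insert(0, "intercept", 1.0)
    y = df[target_col].to_numpy(dtype=float)
    # lstsq also handles the rank-deficient case (intercept plus full one-hot).
    weights, *_ = np.linalg.lstsq(X.to_numpy(dtype=float), y, rcond=None)
    return weights, list(X.columns)


def predict(df, feature_cols, weights, columns):
    """Predict with weights from fit_linear_baseline."""
    X = pd.get_dummies(df[feature_cols], dtype=float)
    X.insert(0, "intercept", 1.0)
    # Align with the training columns: categories not present in this frame
    # are filled with 0, and categories unseen in training are dropped.
    X = X.reindex(columns=columns, fill_value=0.0)
    return X.to_numpy(dtype=float) @ weights


# Toy rows for illustration only; hypothetical columns, not real data.
train = pd.DataFrame(
    {
        "experience_level": ["junior", "junior", "senior", "senior"],
        "salary_in_usd": [50000, 70000, 110000, 130000],
    }
)

weights, columns = fit_linear_baseline(train, ["experience_level"], "salary_in_usd")
preds = predict(train, ["experience_level"], weights, columns)
print(preds)
```

With a single categorical feature, this baseline simply predicts the mean salary of each category. Its value is as a reference point: any more elaborate model trained on the cleaned data should be expected to beat it on held-out rows.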
