Variable Name | Description |
---|---|
a | semi-major axis [au]
e | eccentricity |
i | inclination with respect to the x-y ecliptic plane [deg]
om | longitude of the ascending node [deg]
w | argument of perihelion [deg]
q | perihelion distance [au] |
ad | aphelion distance [au] |
per_y | orbital period [years] |
data_arc | data arc-span [d] |
condition_code | orbit condition code |
n_obs_use | number of observations used |
H | absolute magnitude parameter |
diameter | diameter of asteroid [km] |
extent | object bi- or tri-axial ellipsoid dimensions [km]
albedo | geometric albedo |
rot_per | rotation period [h]
bv | color index B-V magnitude difference |
ub | color index U-B magnitude difference |
IR | color index I-R magnitude difference |
spec_B | spectral taxonomic type (SMASSII) |
spec_T | spectral taxonomic type (Tholen) |
neo | near earth object |
pha | potentially hazardous asteroid
moid | earth minimum orbit intersection distance [au] |
Imported the necessary libraries, imported the dataset, checked the shape of the dataset, inspected the datatypes of the features, and split the features according to their datatype.
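A minimal sketch of this setup step; the filename 'Asteroid.csv' is an assumption, so adjust it to the actual raw file:

```python
import pandas as pd

# Load the raw dataset (filename is assumed; adjust to the actual file).
df = pd.read_csv('Asteroid.csv')

# Shape of the dataset: (rows, columns).
print(df.shape)

# Datatypes of the features.
print(df.dtypes)

# Split the features according to their datatype.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
categorical_cols = df.select_dtypes(include='object').columns.tolist()
```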
1. Univariate Analysis
a) On Numeric Data
Computed the mean, standard deviation, 25th percentile, 75th percentile, etc.
b) On Categorical Data
Plotted a frequency graph for each variable and summarized its outcome.
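A minimal sketch of these univariate steps, continuing with `df`, `numeric_cols`, and `categorical_cols` from the setup sketch above:

```python
import matplotlib.pyplot as plt

# Summary statistics for numeric features: mean, std, quartiles, etc.
print(df[numeric_cols].describe())

# Frequency graph for each categorical feature.
for col in categorical_cols:
    df[col].value_counts().plot(kind='bar', title=col)
    plt.show()
```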
2. Bivariate Analysis
a) On Numeric Data
Found correlations between features using a heatmap and correlation functions.
Outcome: 4 features were found to be heavily correlated, with Pearson correlation coefficients greater than 0.97 or less than -0.97.
b) On Categorical Data
Performed a chi-square test on crosstabs to find relationships between the features.
Outcome: many features were found to be related, with p-values less than 0.05 (95% confidence).
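A sketch of both bivariate checks, again continuing from the setup sketch; the |r| > 0.97 and p < 0.05 thresholds are the ones stated above:

```python
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import chi2_contingency

# Pearson correlation heatmap for numeric features.
corr = df[numeric_cols].corr(method='pearson')
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()

# Report heavily correlated pairs (|r| > 0.97).
for i, c1 in enumerate(numeric_cols):
    for c2 in numeric_cols[i + 1:]:
        if abs(corr.loc[c1, c2]) > 0.97:
            print(f'{c1} ~ {c2}: r = {corr.loc[c1, c2]:.3f}')

# Chi-square test of independence on crosstabs of categorical features.
for i, c1 in enumerate(categorical_cols):
    for c2 in categorical_cols[i + 1:]:
        chi2, p, dof, expected = chi2_contingency(pd.crosstab(df[c1], df[c2]))
        if p < 0.05:  # related at 95% confidence
            print(f'{c1} ~ {c2}: p = {p:.4f}')
```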
3. Handling Missing Values
a) On Categorical Data
Found the percentage of NaN values for each feature. Dropped the features whose NaN values outnumber their non-NaN values and filled the rest with the mode of the data.
b) On Numeric Data
Found the percentage of NaN values for each feature. Dropped the features whose NaN values outnumber their non-NaN values. Two features had certain values repeated many times, so their NaNs were filled with the mean of the 10 most frequently occurring values. The missing values of the remaining features were filled with the mean of the entire feature. Since 'diameter' is the target variable, rows with missing 'diameter' were dropped, and 'diameter' was then converted to a float type.
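A sketch of the NaN handling, continuing with the same `df`; the two repeated-value features are not named in the write-up, so that step is shown as a commented hypothetical:

```python
# Target first: drop rows with missing 'diameter' and cast it to float.
df = df.dropna(subset=['diameter'])
df['diameter'] = df['diameter'].astype(float)

# Percentage of NaN values per feature.
nan_pct = df.isna().mean() * 100
print(nan_pct.sort_values(ascending=False))

# Drop features with more NaN than non-NaN values (i.e. > 50% missing).
df = df.drop(columns=nan_pct[nan_pct > 50].index)

# Categorical features: fill remaining NaNs with the mode.
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# Hypothetical: a feature dominated by a few repeated values, filled with
# the mean of its 10 most frequent values ('some_feature' is a placeholder).
# top10 = df['some_feature'].value_counts().head(10).index
# df['some_feature'] = df['some_feature'].fillna(pd.Series(top10).mean())

# Remaining numeric features: fill with the feature mean.
num = df.select_dtypes(include='number').columns
df[num] = df[num].fillna(df[num].mean())
```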
4. Handling Outliers
Roughly estimated the number of outliers using box plot visualizations. Then found the actual number of outliers beyond 4 standard deviations using the Z-score method. Finally, created an upper limit and a lower limit using the interquartile range (IQR) and treated the outliers with the capping method (i.e., replacing values greater than the upper limit with the upper limit and values lower than the lower limit with the lower limit).
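A sketch of the outlier treatment on the same `df`; the conventional 1.5 × IQR fences are an assumption, since the write-up does not state the multiplier:

```python
num = df.select_dtypes(include='number').columns

# Z-score method: count values more than 4 standard deviations from the mean.
z = (df[num] - df[num].mean()) / df[num].std()
print((z.abs() > 4).sum())

# IQR-based capping: clip each feature to [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df[num].quantile(0.25), df[num].quantile(0.75)
iqr = q3 - q1
df[num] = df[num].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)
```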
Saved the clean dataset as 'Clean_Dataset.csv'.
- Imported 'Clean_Dataset.csv' for visualization.
- Designed a heatmap of correlations with annotations.
- Designed a scatter plot of 'a' vs 'n'.
- Designed a scatter plot of 'n' vs 'per_y'.
- Designed a contour plot for the categorical feature 'class'.
- Designed a categorical scatter plot of 'class' vs 'data_arc'.
- Designed a categorical scatter plot of 'ma' vs 'class'.
- Designed a multi-categorical scatter plot of 'class' vs 'per' with hue as 'pha'.
- Designed a box plot of 'class' vs 'diameter'.
- Designed a point plot of 'pha' vs 'diameter'.
- Designed a point plot of 'condition_code' vs 'diameter'.
- Designed a point plot of 'condition_code' vs 'a'.
- Designed a KDE plot of 'diameter' vs its density.
- Designed a KDE plot of 'diameter' vs its density with hue as 'class'.
- Designed a distribution histogram for feature 'a'.
- Designed a distribution histogram for feature 'i'.
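A representative sketch of a few of these plots with seaborn; the remaining plots follow the same pattern:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('Clean_Dataset.csv')

# Annotated correlation heatmap.
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt='.2f')
plt.show()

# Scatter plot: 'a' vs 'n'.
sns.scatterplot(data=df, x='a', y='n')
plt.show()

# Categorical scatter plot: 'class' vs 'data_arc'.
sns.stripplot(data=df, x='class', y='data_arc')
plt.show()

# Box plot of 'diameter' by 'class'.
sns.boxplot(data=df, x='class', y='diameter')
plt.show()

# KDE (density) plot of 'diameter', with hue by 'class'.
sns.kdeplot(data=df, x='diameter', hue='class')
plt.show()

# Distribution histogram of 'a'.
sns.histplot(data=df, x='a')
plt.show()
```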
- Imported 'Clean_Dataset.csv' for model formation.
- Converted categorical data into numeric form using one-hot encoding, dropping one dummy column per feature to avoid the dummy variable trap (that column can be predicted from the others).
- Split the data into training and testing sets (80% of the data is used for training, and testing is performed on the remaining 20%).
- Standardized the data to zero mean and unit variance using StandardScaler.
- Designed an evaluation function that returns the Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, and R-squared value of the model.
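A sketch of this preparation with scikit-learn; the random_state is an assumed value:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

df = pd.read_csv('Clean_Dataset.csv')

# One-hot encode categoricals; drop_first avoids the dummy variable trap.
df = pd.get_dummies(df, drop_first=True)

X = df.drop(columns=['diameter'])
y = df['diameter']

# 80/20 train/test split (random_state is assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardize features to zero mean and unit variance.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

def evaluate(model, X_test, y_test):
    """Return MAE, MSE, RMSE, and R-squared for a fitted model."""
    pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    mse = mean_squared_error(y_test, pred)
    return mae, mse, np.sqrt(mse), r2_score(y_test, pred)
```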
Algorithms and their R-squared values:
Several regression algorithms were tested; their R-squared values are:
A) Random Forest :-> 0.9696
B) K Nearest Neighbour :-> 0.9116
C) Linear Regression :-> 0.8812
D) Decision Tree :-> 0.9402
E) XG Boost :-> 0.9703
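A sketch of how these five models can be fitted and scored with the `evaluate` function and split from the sketch above; default hyperparameters are an assumption:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor  # requires the xgboost package

models = {
    'Random Forest': RandomForestRegressor(),
    'K Nearest Neighbour': KNeighborsRegressor(),
    'Linear Regression': LinearRegression(),
    'Decision Tree': DecisionTreeRegressor(),
    'XG Boost': XGBRegressor(),
}

# Fit each model and report its R-squared on the test set.
for name, model in models.items():
    model.fit(X_train, y_train)
    mae, mse, rmse, r2 = evaluate(model, X_test, y_test)
    print(f'{name} :-> R-squared = {r2:.4f}')
```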
To increase the accuracy of the model, I performed the following additional updates:
- Checked for multicollinearity in the data using the variance inflation factor (VIF) and eliminated features with very high values (see the VIF sketch after the results below).
- Used 85% of the data for training and performed testing on the remaining 15%.
- Applied additional algorithms along with the previous ones; the improved R-squared values were as follows:
A) Random Forest :-> 0.9696
B) K Nearest Neighbour :-> 0.9116
C) Linear Regression :-> 0.8812
D) Decision Tree :-> 0.9402
E) XG Boost :-> 0.9703
F) Optimized XG Boost :-> 0.9709
G) Artificial Neural Network :-> 0.9670
🌟 Conclusion After Update: Best Performing Model is Optimized XG Boost Regressor. 🌟
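A sketch of the VIF check and the 85/15 re-split, reusing `X` and `y` from the preparation sketch above; the drop threshold of 10 is an assumption, as the write-up only says "very high":

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Compute the VIF of every feature (cast to float so dummies are numeric).
X_vals = X.astype(float).values
vif = pd.DataFrame({
    'feature': X.columns,
    'VIF': [variance_inflation_factor(X_vals, i) for i in range(X.shape[1])],
})
print(vif.sort_values('VIF', ascending=False))

# Drop features with very high VIF (threshold of 10 is assumed).
X = X.drop(columns=vif.loc[vif['VIF'] > 10, 'feature'])

# Re-split with 85% training / 15% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)
```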
To further increase the accuracy of the model, I performed the following additional updates (a sketch follows after the results below):
- Used MinMaxScaler instead of StandardScaler.
- Increased the number of iterations on the CatBoost Regressor.
The improved R-squared values were as follows:
A) Random Forest :-> 0.9696
B) K Nearest Neighbour :-> 0.9116
C) Linear Regression :-> 0.8812
D) Decision Tree :-> 0.9402
E) XG Boost :-> 0.9703
F) Optimized XG Boost :-> 0.9709
G) Artificial Neural Network :-> 0.9670
H) Cat Boost Regressor :-> 0.9727
🌟 Conclusion: Best Performing Model is Cat Boost Regressor. 🌟
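A sketch of this final configuration, continuing from the re-split above; the iteration count of 2000 is an assumed value, since the write-up does not state it:

```python
from sklearn.preprocessing import MinMaxScaler
from catboost import CatBoostRegressor  # requires the catboost package

# MinMaxScaler instead of StandardScaler: scales each feature to [0, 1].
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# CatBoost with an increased iteration count (2000 is assumed).
cat = CatBoostRegressor(iterations=2000, verbose=0)
cat.fit(X_train_s, y_train)
mae, mse, rmse, r2 = evaluate(cat, X_test_s, y_test)
print(f'Cat Boost Regressor :-> R-squared = {r2:.4f}')
```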
Note: You can also try out other algorithms.