Relationship between Suicide Rate and Happiness Report Factors/indicators

Project Steps:

Define the purpose/problem
Search for the data
Download the data
Explore the data
Clean & prepare the data for analysis
Analyze & interpret the data
Visualize the data

Project Purpose: Identify the relationship between the crude suicide rate & the following seven happiness measures: happiness score, economic production (GDP per capita), social support, life expectancy, freedom, absence of corruption, and generosity of a country. These seven factors are taken from the World Happiness Report.

Data Source:

GDP per Capita and Suicide rates: https://www.kaggle.com/harshav05/gdp-per-capita-and-suicide-rates
World Happiness Report: https://www.kaggle.com/unsdsn/world-happiness?select=2016.csv

Suicide Rate Dataset

Importing & Exploring:

import suicide rates dataset
explore the dataset

o see the first 5 rows (head())

o see the number of columns & rows (shape)

o see details about the columns (info())

o check for missing values (isnull().sum())
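
The exploration steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame stands in for the downloaded CSV, and the column names (country, year, suicide_rate) are assumptions.

```python
import pandas as pd

# Toy stand-in for the downloaded CSV. The real file would be loaded with
# pd.read_csv(...); the column names used here are assumptions.
suicide = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country D"],
    "year": [2015, 2015, 2016, 2016],
    "suicide_rate": [12.5, None, 8.1, 20.3],
})

first_rows = suicide.head()        # first 5 rows (here: all 4)
n_rows, n_cols = suicide.shape     # (number of rows, number of columns)
suicide.info()                     # prints dtypes and non-null counts
missing = suicide.isnull().sum()   # missing values per column

print(first_rows)
print(n_rows, n_cols)
print(missing)
```

The same four calls apply unchanged to the World Happiness Report files.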

Data Cleaning:

remove the unnecessary columns
make the country name column the index so the datasets can be joined later
split this dataset into two datasets, one for 2015 and the other for 2016
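
A possible shape for these cleaning steps, assuming a long-format table with one row per country and year. All column names, including the unneeded "notes" column, are hypothetical.

```python
import pandas as pd

# Toy data; every column name below is an assumption for illustration.
suicide = pd.DataFrame({
    "country": ["Country A", "Country B", "Country A", "Country B"],
    "year": [2015, 2015, 2016, 2016],
    "suicide_rate": [12.5, 6.0, 11.9, 6.4],
    "notes": ["", "", "", ""],   # stands in for an unnecessary column
})

# 1. Drop columns that are not needed for the analysis.
suicide = suicide.drop(columns=["notes"])

# 2. Use the country name as the index so later joins align on it.
suicide = suicide.set_index("country")

# 3. One DataFrame per year, dropping the now-constant year column.
suicide_2015 = suicide[suicide["year"] == 2015].drop(columns=["year"])
suicide_2016 = suicide[suicide["year"] == 2016].drop(columns=["year"])

print(suicide_2015)
print(suicide_2016)
```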

Descriptive Analysis:

apply descriptive analysis to the suicide rates for 2015 & 2016 to get a broader picture of the data
draw a boxplot for each dataset to explore its min, max, median, & outliers
find the top 5 countries for each year
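
The sketch below shows one way to obtain the same information numerically. It assumes the usual boxplot convention (the default in matplotlib and seaborn) that points more than 1.5 * IQR beyond the quartiles are drawn as outliers. The rates are toy values, not figures from the dataset.

```python
import pandas as pd

# Toy rates for illustration only; not values from the real dataset.
rates = pd.Series(
    [4.0, 5.0, 6.0, 6.5, 7.0, 7.5, 8.0, 9.0, 10.0, 30.0],
    index=[f"Country {c}" for c in "ABCDEFGHIJ"],
    name="suicide_rate",
)

summary = rates.describe()   # count, mean, std, min, quartiles, max

# Boxplot rule of thumb: anything further than 1.5 * IQR from the
# quartiles is flagged as an outlier.
q1, q3 = rates.quantile(0.25), rates.quantile(0.75)
iqr = q3 - q1
outliers = rates[(rates < q1 - 1.5 * iqr) | (rates > q3 + 1.5 * iqr)]

top5 = rates.nlargest(5)     # the five highest rates

print(summary)
print(outliers)
print(top5)
```

Comparing the outliers with the top 5 shows directly whether the highest-ranked countries fall outside the whiskers.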

The Results of the descriptive analysis:

2015:

the boxplot shows that the top 5 countries are outliers, which means that their suicide rates differ markedly from the rest of the world
interestingly, 4 of the top 5 countries are in Europe (Kazakhstan is located in Eurasia)

2016:

there was a slight decline in the number of outliers (from 5 to 4) and in the max rate (almost 35 in 2015 versus almost 32 in 2016)
Suriname took 5th place, so the top 5 consisted of 3 countries from Europe and 2 from South America

World Happiness Report Dataset

Importing & Exploring:

import world happiness report 2015 dataset
explore the dataset

o see the first 5 rows (head())

o see the number of columns & rows (shape)

o see details about the columns (info())

o check for missing values (isnull().sum())

*This process is run twice: first to import & explore the world happiness report dataset for 2015, and then to import & explore the world happiness report dataset for 2016

Data Cleaning:

remove the unnecessary columns
make the country name column the index so the datasets can be joined later

*This process is run twice: first to clean the world happiness report dataset for 2015, and then to clean the world happiness report dataset for 2016

merge the world happiness report for 2015 dataset with suicide rate for 2015 dataset in one dataset called “suiciderate_happiness_report_2015”
merge the world happiness report for 2016 dataset with suicide rate for 2016 dataset in one dataset called “suiciderate_happiness_report_2016”
there are now two datasets prepared for analysis: the suicide rates & happiness report dataset for 2015 and the suicide rates & happiness report dataset for 2016
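
A minimal sketch of the merge step, assuming both frames are already indexed by country name. Whether the notebook uses an inner join is not stated here, so that choice is an assumption. Column names and values are illustrative only.

```python
import pandas as pd

# Toy frames indexed by country; names and values are illustrative only.
happiness_2015 = pd.DataFrame(
    {"Happiness Score": [7.1, 5.4, 4.2]},
    index=pd.Index(["Country A", "Country B", "Country C"], name="country"),
)
suicide_2015 = pd.DataFrame(
    {"suicide_rate": [12.5, 6.0, 9.3]},
    index=pd.Index(["Country A", "Country B", "Country D"], name="country"),
)

# An inner join on the shared index keeps only countries found in both sources.
suiciderate_happiness_report_2015 = happiness_2015.join(suicide_2015, how="inner")

# Countries present in only one dataset are worth inspecting: a spelling
# difference in a country name would silently drop that row.
unmatched = happiness_2015.index.symmetric_difference(suicide_2015.index)

print(suiciderate_happiness_report_2015)
print(list(unmatched))
```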

Data Analysis & Visualization:

see the correlation between the variables in the suicide rate & happiness report for 2015 dataset using corr() function
create a correlation matrix using heat map
create a function that draws a scatter plot with a regression line, prints the Pearson Correlation Coefficient & the p-value, and reports whether there is a correlation, whether it is negative or positive, whether it is weak, moderate, strong, or very strong, and whether it is significant

The function:

import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

def correlation(x, y, t):
    # Scatter plot with a fitted regression line
    sns.regplot(x=x, y=y)
    plt.title('Correlation between suicide rate & ' + t)

    pearson_coef, p_value = stats.pearsonr(x, y)
    print('The Pearson Correlation Coefficient is', pearson_coef, 'with a P-value of P =', p_value)

    # Classify by absolute value so that no coefficient falls between two bands
    strength = abs(pearson_coef)
    direction = 'positive' if pearson_coef > 0 else 'negative'

    if strength < 0.20:
        print('There is NO correlation')
    elif strength < 0.40:
        print('Weak ' + direction + ' correlation')
    elif strength < 0.60:
        print('Moderate ' + direction + ' correlation')
    elif strength < 0.80:
        print('Strong ' + direction + ' correlation')
    else:
        print('Very strong ' + direction + ' correlation')

    # Significance is reported only when a correlation was found (|r| >= 0.20)
    if strength >= 0.20:
        if p_value <= 0.05:
            print('This ' + direction + ' correlation is significant')
        else:
            print('This ' + direction + ' correlation is NOT significant')
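
For reference, the coefficient that stats.pearsonr reports can be reproduced by hand. The helper below is a hypothetical standard-library sketch of the Pearson formula (the covariance of the two variables divided by the product of their standard deviations). It is not part of the notebook.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared
    # deviations (denominator); the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear, increasing
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly linear, decreasing
print(pearson_r([1, 2, 3], [1, 3, 2]))         # partial positive association
```

The coefficient always lies between -1 and 1, which is why the function above can sort it into fixed strength bands.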

create y2015 variable that holds the suicide rate for 2015
find the correlation between the suicide rate for 2015 & each indicator of the happiness report for 2015 (7 factors/indicators), e.g. the correlation between suicide rate & Happiness Score 2015, using the correlation function we’ve created
see the correlation between the variables in the suicide rate & happiness report for 2016 dataset using corr() function
create a correlation matrix using heat map
create y2016 variable that holds the suicide rate for 2016
find the correlation between the suicide rate for 2016 & each indicator of the happiness report for 2016 (7 factors/indicators), e.g. the correlation between suicide rate & Happiness Score 2016, using the correlation function
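
As a rough sketch of the matrix step: corr() returns the pairwise Pearson correlations, which is the table a heat map displays, and corrwith() gives the correlation of each indicator with the suicide rate in one call. The merged frame below is a toy example, with column names assumed and values made up for illustration.

```python
import pandas as pd

# Toy merged frame; column names are assumptions and the values are
# made up for illustration.
df = pd.DataFrame({
    "suicide_rate":    [4.0, 6.0, 9.0, 12.0, 15.0],
    "Happiness Score": [4.5, 5.0, 6.2, 6.8, 7.4],
    "Generosity":      [0.30, 0.10, 0.25, 0.15, 0.20],
})

corr_matrix = df.corr()   # pairwise Pearson correlations (the default method)

# Correlation of every indicator with the target, strongest first.
indicators = df.drop(columns=["suicide_rate"])
with_target = indicators.corrwith(df["suicide_rate"])
order = with_target.abs().sort_values(ascending=False).index
ranked = with_target.reindex(order)

print(corr_matrix.round(2))
print(ranked.round(2))
```

Ranking by absolute value keeps strong negative correlations near the top instead of sorting them last.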

Results:

2015:

There are weak positive correlations between the suicide rate (dependent variable/target) and Happiness Score, Economy (GDP per Capita), Family, and Health (Life Expectancy) for 2015 (independent variables/predictors), and these correlations are significant

2016:

as in the previous analysis, there are weak positive correlations between the suicide rate (dependent variable/target) and Happiness Score, Economy (GDP per Capita), Family, and Health (Life Expectancy) for 2016 (independent variables/predictors), and these correlations are significant

Main Result:

According to these analyses, a country with a higher happiness score, higher GDP per capita, higher Family (social support) score, or higher Health (Life Expectancy) score tends to have a higher suicide rate.
This suggests that developed countries tend to have higher suicide rates.
