-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathStatistical_significance
42 lines (29 loc) · 1.39 KB
/
Statistical_significance
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
###Working with gender and wages
### data here https://www.census.gov/data/tables/time-series/demo/income-poverty/historical-income-people.html
import pandas as pd
import matplotlib.pyplot as plt
df_1=pd.read_excel('p06a.xlsx', skiprows=7)
df_2=df_1.iloc[0:20]
df_3 = df_2.rename(columns={"Unnamed: 0": "Year", "Unnamed: 1": "Men_with_income", "Current\ndollars": 'Mean_current_dollars', '2019\ndollars': "Mean_2019_dollars", "Unnamed: 4": "Women_with_income", "Current\ndollars.1": "women_mean_current_dollars", "2019\ndollars.1": "Mean_w_2019_dollars"})
df_3['difference']=df_3['Mean_2019_dollars']-df_3['Mean_w_2019_dollars']
df_3['difference_pcnt']=df_3['Mean_2019_dollars']/df_3['Mean_w_2019_dollars']
nums=df_3['difference_pcnt']
years=df_3['Years']
years=df_3['Year'].astype(str).str[0:4]
years=years.values[::-1]
nums=nums.values[::-1]
plt.figure(figsize=(9, 3))
plt.plot(years, nums)
### Standard deviation of a proportion
import math
sample_standard_deviation= math.sqrt(p*(1-p))
Z test
z_value=(value-population_mean)/standard_deviation
T test =(sample_mean-population_mean)/standard_error
###Student, "The Probable Error of a Mean." Biometrika (Mar, 1908): 1-25.
standard_error=standard_deviation/root_samples_size
###What is wrong with this data? does it make sense?
###What is a log t
### Eample with SPSS http://core.ecu.edu/psyc/wuenschk/docs30/Sex-Salary.pdf
### Log transformation
### Kurtosis