-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathplayWithDistributions.Rmd
138 lines (109 loc) · 3.75 KB
/
playWithDistributions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
---
title: "Play with distributions"
output:
html_document: default
html_notebook: default
---
```{r setUp}
startTime <- Sys.time() # always good to time your code (see end)
```
```{r testNormal}
# Testing Normal with a mean of 500 and standard deviation of 100 ----
normTest <- rnorm(1000, mean = 500, sd = 100)
summary(normTest)
# if you want to force the x axis to have a specific range a - b, use the option xlim = c(a,b)
hist(normTest)
# use a qnormal plot to test for normality
# R lets us add a theoretical line to the plot for comparison
qqnorm(normTest, main = "Q-Q plot for normal distribution")
qqline(normTest, col = 2)
```
```{r shapiroW}
# now test for normality using the Shapiro-Wilks test
shapiro.test(normTest)
# a p > 0.05 indicates the distribution is normal
```
```{r testSkewPos}
# Create a +ve skewed version by force ----
normTestPSkew <- normTest[normTest > mean(normTest)] # values > than the mean
summary(normTestPSkew)
hist(normTestPSkew, main = "Example of positive skew")
# use a qnormal plot to test for normality
qqnorm(normTestPSkew, main = "Q-Q plot for positive skew")
qqline(normTestPSkew, col = 2)
# now test for normality using the Shapiro-Wilks test
shapiro.test(normTestPSkew)
# is the p value > 0.05 now?
# Test skew
library(e1071) # install first if needed
# or library(moments) # install first if needed
skewness(normTestPSkew)
```
```{r testSkewNeg}
# Create a -ve skewed version by force ----
normTestNSkew <- normTest[normTest < mean(normTest)] # values < than the mean
summary(normTestNSkew)
hist(normTestNSkew, main = "Example of negative skew")
# use a qnormal plot to test for normality
qqnorm(normTestNSkew, main = "Q-Q plot for negative skew")
qqline(normTestNSkew, col = 2)
skewness(normTestNSkew)
```
```{r testPoisson}
# Testing Poisson, lambda = average number of events per unit time ----
poisTest <- rpois(1000, lambda = 3)
summary(poisTest)
hist(poisTest)
```
```{r testBinomial}
# Testing Binomial ----
binomTest <- rbinom(1000, 20, 0.2)
hist(binomTest)
summary(binomTest)
```
```{r testExp}
# Testing Exponential ----
expTest <- rexp(1000, 1)
hist(expTest)
summary(expTest)
```
```{r testStdNorm}
# Testing standard normal - R defaults to mean = 0 & sd = 1 if you don't specifiy ----
stNormTest <- rnorm(1000)
summary(stNormTest)
hist(stNormTest)
# add lower & upper 95% lines
abline(v = c(-1.96,1.96), col="red")
# add text labels
text(-1.96,100, "-1.96", col="red")
text(1.96,100, "1.96", col="red")
text(-1.96, 180, pos = "4", "<----------", col = "red")
text(1.96, 180, pos = "2", "---------->", col = "red")
text(0, 180, "XX% of the distribution is here", col = "red", pos = 3) # how to make background a different colour?
```
```{r calcCI}
# Calculating confidence intervals using the initial normal distribution ----
m <- mean(normTest)
s <- sd(normTest)
n <- length(normTest)
se <- s/sqrt(n)
error <- qnorm(0.975)*se
normTestCIu <- m + error
normTestCIl <- m - error
hist(normTest, sub = "Red line = mean, green lines = 95% CI of the mean - not central 95% of the distribution")
# add the mean
abline(v = m, col="red")
# add lower & upper 95% lines
abline(v = c(normTestCIl,normTestCIu), col="green")
```
Tell the user the answer rounding the answer to 3 decimal places:
* normTest upper 95% CI: `r round(normTestCIu, 3)`
* normTest mean: `r round(m, 3)`
* normTest lower 95% CI: `r round(normTestCIl, 3)`
Now go back to the top and change the value of 1000 to 10,000 in:
* normTest <- rnorm(1000, mean = 500, sd = 100)
Then re-run the whole script - what do you notice about the new 95% CIs?
# About
Code last run at: **`r Sys.time()`**
Results saved to: `r getwd()`/
Analysis completed in: `r round(Sys.time() - startTime,2)` seconds using [knitr](https://cran.r-project.org/package=knitr) in [RStudio](http://www.rstudio.com).