playWithDistributions.Rmd

---
title: "Play with distributions"

output:
  html_document: default
  html_notebook: default
---

```{r setUp}
startTime <- Sys.time() # always good to time your code (see end)
```

```{r testNormal}
# Testing Normal with a mean of 500 and standard deviation of 100 ----
normTest <- rnorm(1000, mean = 500, sd = 100)
summary(normTest)
# if you want to force the x axis to have a specific range a - b, use the option xlim = c(a,b)
hist(normTest)

# use a qnormal plot to test for normality
# R lets us add a theoretical line to the plot for comparison
qqnorm(normTest, main = "Q-Q plot for normal distribution")
  qqline(normTest, col = 2)

```

```{r shapiroW}
# now test for normality using the Shapiro-Wilks test
shapiro.test(normTest)
# a p > 0.05 indicates the distribution is normal
```

```{r testSkewPos}
# Create a +ve skewed version by force ----
normTestPSkew <- normTest[normTest > mean(normTest)] # values > than the mean
summary(normTestPSkew)
hist(normTestPSkew, main = "Example of positive skew")

# use a qnormal plot to test for normality
qqnorm(normTestPSkew, main = "Q-Q plot for positive skew")
  qqline(normTestPSkew, col = 2)

# now test for normality using the Shapiro-Wilks test
shapiro.test(normTestPSkew)
# is the p value > 0.05 now?

# Test skew
library(e1071) # install first if needed
# or library(moments) # install first if needed
skewness(normTestPSkew)
```

```{r testSkewNeg}
# Create a -ve skewed version by force ----
normTestNSkew <- normTest[normTest < mean(normTest)] # values < than the mean
summary(normTestNSkew)
hist(normTestNSkew, main = "Example of negative skew")

# use a qnormal plot to test for normality
qqnorm(normTestNSkew, main = "Q-Q plot for negative skew")
  qqline(normTestNSkew, col = 2)

skewness(normTestNSkew)
```
```{r testPoisson}
# Testing Poisson, lambda = average number of events per unit time ----
poisTest <- rpois(1000, lambda = 3)
summary(poisTest)
hist(poisTest)
```

```{r testBinomial}
# Testing Binomial ----
binomTest <- rbinom(1000, 20, 0.2)
hist(binomTest)
summary(binomTest)
```

```{r testExp}
# Testing Exponential ----
expTest <- rexp(1000, 1)
hist(expTest)
summary(expTest)
```

```{r testStdNorm}
# Testing standard normal - R defaults to mean = 0 & sd = 1 if you don't specifiy ----
stNormTest <- rnorm(1000)
summary(stNormTest)
hist(stNormTest)
# add lower & upper 95% lines
abline(v = c(-1.96,1.96), col="red")
# add text labels
text(-1.96,100, "-1.96", col="red")
text(1.96,100, "1.96", col="red")
text(-1.96, 180, pos = "4", "<----------", col = "red")
text(1.96, 180, pos = "2", "---------->", col = "red")
text(0, 180, "XX% of the distribution is here", col = "red", pos = 3) # how to make background a different colour?
```

```{r calcCI}
# Calculating confidence intervals using the initial normal distribution ----
m <- mean(normTest)
s <- sd(normTest)
n <- length(normTest)
se <- s/sqrt(n)

error <- qnorm(0.975)*se

normTestCIu <- m + error
normTestCIl <- m - error

hist(normTest, sub = "Red line = mean, green lines = 95% CI of the mean - not central 95% of the distribution")
# add the mean
abline(v = m, col="red")
# add lower & upper 95% lines
abline(v = c(normTestCIl,normTestCIu), col="green")
```

Tell the user the answer rounding the answer to 3 decimal places:

 * normTest upper 95% CI: `r round(normTestCIu, 3)`
 * normTest mean: `r round(m, 3)`
 * normTest lower 95% CI: `r round(normTestCIl, 3)`

Now go back to the top and change the value of 1000 to 10,000 in:

 * normTest <- rnorm(1000, mean = 500, sd = 100)

Then re-run the whole script - what do you notice about the new 95% CIs?

# About

Code last run at: **`r Sys.time()`** 

Results saved to: `r getwd()`/

Analysis completed in: `r round(Sys.time() - startTime,2)` seconds using [knitr](https://cran.r-project.org/package=knitr) in [RStudio](http://www.rstudio.com).