-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathassignment3_Perreault.rmd
197 lines (146 loc) · 4.87 KB
/
assignment3_Perreault.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
---
title: 'Bios 6301: Assignment 3'
author: 'Andrea Perreault'
output: pdf_document
---
*Due Tuesday, 11 October, 1:00 PM*
50 points total.
$5^{n=day}$ points taken off for each day late.
**QUESTION 1**
**10 points**
1. Use GitHub to turn in the first three homework assignments. Make sure the teacher (couthcommander) and TA (chipmanj) are collaborators. (5 points)
2. Commit each assignment individually. This means your repository should have at least three commits. (5 points)
**QUESTION 2**
**15 points**
Write a simulation to calculate the power for the following study
design. The study has two variables, treatment group and outcome.
There are two treatment groups (0, 1) and they should be assigned
randomly with equal probability. The outcome should be a random normal
variable with a mean of 60 and standard deviation of 20. If a patient
is in the treatment group, add 5 to the outcome. 5 is the true
treatment effect. Create a linear model for the outcome by the
treatment group, and extract the p-value (hint: see assigment1).
Test if the p-value is less than or equal to the alpha level, which
should be set to 0.05.
Repeat this procedure 1000 times. The power is calculated by finding
the percentage of times the p-value is less than or equal to the alpha
level. Use the `set.seed` command so that the professor can reproduce
your results.
1. Find the power when the sample size is 100 patients. (10 points)
```{r}
patients1 = 100
groups = c(0,1)
mean = 60
sd = 20
alpha = 0.05
p_vals = numeric(1000)
significant1 = 0
set.seed(100)
for (i in seq(p_vals)) {
treatment <- sample(groups, patients1, replace = TRUE)
outcome <- rnorm(100, mean, sd)
data <- data.frame(cbind(treatment, outcome))
data$outcome[data$treatment == 1] <- data$outcome[data$treatment == 1] + 5
model <- lm(outcome ~ treatment, data = data)
pval <- summary(model)$coefficients[2,4]
p_vals[i] <- pval
if (pval < alpha) {
significant1 = significant1 + 1
#print("TRUE")
} else {
#print("FALSE")
}
}
significant1
power = significant1 / length(p_vals)
power
```
2. Find the power when the sample size is 1000 patients. (5 points)
```{r}
patients2 = 1000
significant2 = 0
for (i in seq(p_vals)) {
treatment <- sample(groups, patients2, replace = TRUE)
outcome <- rnorm(1000, mean, sd)
data <- data.frame(cbind(treatment, outcome))
data$outcome[data$treatment == 1] <- data$outcome[data$treatment == 1] + 5
model <- lm(outcome ~ treatment, data = data)
pval <- summary(model)$coefficients[2,4]
p_vals[i] <- pval
if (pval < alpha) {
significant2 = significant2 + 1
#print("TRUE")
} else {
#print("FALSE")
}
}
significant2
power = significant2 / length(p_vals)
power
```
**QUESTION 3**
**15 points**
Obtain a copy of the [football-values lecture](https://github.com/couthcommander/football-values).
Save the `2016/proj_wr16.csv` file in your working directory. Read
in the data set and remove the first two columns.
```{r}
wr <- read.csv("proj_wr16.csv", header = TRUE, sep = ",")
head(wr)
wr[,1] <- NULL
wr[,1] <- NULL
head(wr)
```
1. Show the correlation matrix of this data set. (3 points)
```{r}
cor.wr <- cor(wr)
cor.wr
```
2. Generate a data set with 30 rows that has a similar correlation
structure. Repeat the procedure 10,000 times and return the mean
correlation matrix. (10 points)
```{r}
cor.wr <- cor(wr)
cov.wr <- var(wr)
means.wr <- colMeans(wr)
library(MASS)
wr.sim1 <- mvrnorm(30, mu = means.wr, Sigma = cov.wr, empirical = FALSE)
cor.sim1 <- cor(wr.sim1)
cor.sim1; cor.wr
matrix.wr <- 0
sims <- 10000
for (i in seq(sims)) {
wr.sim1 <- mvrnorm(30, mu = means.wr, Sigma = cov.wr, empirical = FALSE)
matrix.wr <- matrix.wr + cor(wr.sim1)
}
matrix.mean <- matrix.wr/sims
matrix.mean
```
3. Generate a data set with 30 rows that has the exact correlation
structure as the original data set. (2 points)
```{r}
wr.sim2 <- mvrnorm(30, mu = means.wr, Sigma = cov.wr, empirical = TRUE)
cor.sim2 <- cor(wr.sim2)
cor.sim2; cor.wr
```
**QUESTION 4**
**10 points**
Use \LaTeX to create the following expressions.
1. Equation 1 (4 points)
![equation1](eq1.png)
\begin{equation}
P(B) = \sum_{j} P(B | A_{j}) P(A_{j}),
\\
\Rightarrow P(A_{i} | B) = \frac{P(B | A_{i}) P(A_{i})}{\sum_{j} P(B | A_{j}) P(A_{j})}
\end{equation}
2. Equation 2 (3 points)
![equation2](eq2.png)
\begin{equation}
\hat{f} (\zeta) = \int_\infty^\infty f(x) e^{-2 \pi i x \zeta} \, dx
\end{equation}
3. Equation 3 (3 points)
![equation3](eq3.png)
\begin{equation}
\textbf{J} = \frac{d\textbf{f}}{d\textbf{x}}
= \begin{bmatrix} \frac{\partial \textbf{f}}{\partial x_{1}} & \cdots & \frac{\partial \textbf{f}}{\partial x_{n}} \end{bmatrix}
= \begin{bmatrix} \frac{\partial f}{\partial x_{1}} & \cdots & \frac{\partial f}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial fm}{\partial x_{1}} & \cdots & \frac{\partial fm}{\partial x_{n}} \end{bmatrix}
\end{equation}