-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathassignment4_Perreault-graded.Rmd
396 lines (316 loc) · 13.1 KB
/
assignment4_Perreault-graded.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
---
title: 'Bios 6301: Assignment 4'
author: 'Andrea Perreault'
output: pdf_document
---
**Grade 46/50**
*Due Tuesday, 01 November, 1:00 PM*
$5^{n=day}$ points taken off for each day late.
50 points total.
Submit a single knitr file (named `homework4.rmd`), along with a valid PDF output file. Inside the file, clearly indicate which parts of your responses go with which problems (you may use the original homework document as a template). Add your name as `author` to the file's metadata section. Raw R code/output or word processor files are not acceptable.
Failure to name file `homework4.rmd` or include author name may result in 5 points taken off.
### Question 1 ###
**15 points**
A problem with the Newton-Raphson algorithm is that it needs the derivative $f'$. If the derivative is hard to compute or does not exist, then we can use the *secant method*, which only requires that the function $f$ is continuous.
Like the Newton-Raphson method, the **secant method** is based on a linear approximation to the function $f$. Suppose that $f$ has a root at $a$. For this method we assume that we have *two* current guesses, $x_0$ and $x_1$, for the value of $a$. We will think of $x_0$ as an older guess and we want to replace the pair $x_0$, $x_1$ by the pair $x_1$, $x_2$, where $x_2$ is a new guess.
To find a good new guess $x_2$ we first draw the straight line from $(x_0,f(x_0))$ to $(x_1,f(x_1))$, which is called a secant of the curve $y = f(x)$. Like the tangent, the secant is a linear approximation of the behavior of $y = f(x)$, in the region of the points $x_0$ and $x_1$. As the new guess we will use the x-coordinate $x_2$ of the point at which the secant crosses the x-axis.
The general form of the recurrence equation for the secant method is:
$$x_{i+1} = x_i - f(x_i)\frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}$$
Notice that we no longer need to know $f'$ but in return we have to provide *two* initial points, $x_0$ and $x_1$.
**Write a function that implements the secant algorithm.** Validate your program by finding the root of the function $f(x) = \cos(x) - x$. Compare its performance with the Newton-Raphson method -- which is faster, and by how much? For this example $f'(x) = -\sin(x) - 1$.
```{r}
# secant method
secant <- function(g1, g2, f, tol=10e-7, maxiter=1000) {
i <- 0
while(abs(g1-g2) > tol && i < maxiter) {
x <- g2 - f(g2)*(g2-g1)/(f(g2)-f(g1))
g2 <- g1
x <- g2
i <- i + 1
}
if(i==maxiter) {
print('failed to converge')
return(NULL)
} else {
print(sprintf('converges at %s after %s iterations', x, i))
}
}
f <- function(x) cos(x) - x
# compare to Newton-Raphson method
newton <- function(guess, f, fp, tol=10e-7, maxiter=1000) {
i <- 1 # counter
while(abs(f(guess)) > tol && i < maxiter) {
guess <- guess - f(guess)/fp(guess)
i <- i + 1
}
if(i==maxiter) {
print('failed to converge')
return(NULL)
} else {
print(sprintf('converges at %s after %s iterations', guess, i))
}
}
f <- function(x) cos(x) - x
fp <- function(x) (-sin(x))-1
```
```{r eval=FALSE}
# test 1
(system.time(replicate(1000, secant(0.5, 0.75, f))))
(system.time(replicate(1000, newton(0.5, f, fp))))
# test 2
(system.time(replicate(1000, secant(0, 0.2, f))))
(system.time(replicate(1000, newton(0.25, f, fp))))
```
```
The Newton-Raphson method is faster, taking 0.155 seconds to complete 1000 replicates.
This is compared to 0.189 seconds for the secant method. However, the amount of time it
takes to find the root for these methods varies depending on the guesses input by the
user. For example, it takes 0.175 seconds for the secant method to find the root if the
user gives 0 and 0.2 as initial guesses, compared to 0.210 with the Newton-Raphson method.
```
**JC Grading -1**
**Comment: ** Please also provide output showing that the secant and newton rhapson methods converge on the same answer.
### Question 2 ###
**18 points**
The game of craps is played as follows. First, you roll two six-sided dice; let x be the sum of the dice on the first roll. If x = 7 or 11 you win, otherwise you keep rolling until either you get x again, in which case you also win, or until you get a 7 or 11, in which case you lose.
Write a program to simulate a game of craps. You can use the following snippet of code to simulate the roll of two (fair) dice:
```{r}
x <- sum(ceiling(6*runif(2)))
```
1. The instructor should be able to easily import and run your program (function), and obtain output that clearly shows how the game progressed. Set the RNG seed with `set.seed(100)` and show the output of three games. (lucky 13 points)
```{r}
set.seed(100)
craps <- function(x, numGames) {
results <- matrix(0, nrow = numGames, ncol = 2, dimnames = list(NULL, c("Win", "Lose")))
for(i in 1:numGames) {
win <- FALSE
lose <- FALSE
if(x==7||x==11){
win <- TRUE
print('Yay! You win :)')
} else {
x1 <- 0
while(x1!=x && x1!=7 && x1!=11){
print('Cross your fingers and roll again.')
x1 <- sum(ceiling(6*runif(2)))
print(x1)
}
if(x1==7||x1==11){
lose <- TRUE
print("Sorry, you lose :(")
} else {
win <- TRUE
print(sprintf('You rolled %s again, you win!', x))
}
}
if(win) {
results[i, 1] <- 1
} else {
results[i, 2] <- 1
}
}
return(results)
}
x=sum(ceiling(6*runif(2)))
results <- craps(x, 3)
results
```
2. Find a seed that will win ten straight games. Consider adding an argument to your function that disables output. Show the output of the ten games. (5 points)
```{r}
set.seed(500)
craps <- function(x, numGames) {
results <- matrix(0, nrow = numGames, ncol = 2, dimnames = list(NULL, c("Win", "Lose")))
for(i in 1:numGames) {
win <- FALSE
lose <- FALSE
if(x==7||x==11){
win <- TRUE
} else {
x1 <- 0
while(x1!=x && x1!=7 && x1!=11){
x1 <- sum(ceiling(6*runif(2)))
}
if(x1==7||x1==11){
lose <- TRUE
} else {
win <- TRUE
}
}
if(win) {
results[i, 1] <- 1
} else {
results[i, 2] <- 1
}
}
return(results)
}
x=sum(ceiling(6*runif(2)))
results <- craps(x, 10)
results
```
### Question 3 ###
**12 points**
Obtain a copy of the [football-values lecture](https://github.com/couthcommander/football-values). Save the five 2016 CSV files in your working directory.
Modify the code to create a function. This function will create dollar values given information (as arguments) about a league setup. It will return a data.frame and write this data.frame to a CSV file. The final data.frame should contain the columns 'PlayerName', 'pos', 'points', 'value' and be orderd by value descendingly. Do not round dollar values.
Note that the returned data.frame should have `sum(posReq)*nTeams` rows.
Define the function as such (6 points):
```{r}
# path: directory path to input files
# file: name of the output file; it should be written to path
# nTeams: number of teams in league
# cap: money available to each team
# posReq: number of starters for each position
# points: point allocation for each category
ffvalues <- function(path, file='outfile.csv', nTeams=12, cap=200, posReq=c(qb=1, rb=2, wr=3, te=1, k=1),
points=c(fg=4, xpt=1, pass_yds=1/25, pass_tds=4, pass_ints=-2, rush_yds=1/10, rush_tds=6, fumbles=-2, rec_yds=1/20, rec_tds=6)) {
# read in files
k <- read.csv('proj_k16.csv', header=TRUE, stringsAsFactors=FALSE)
qb <- read.csv('proj_qb16.csv', header=TRUE, stringsAsFactors=FALSE)
rb <- read.csv('proj_rb16.csv', header=TRUE, stringsAsFactors=FALSE)
te <- read.csv('proj_te16.csv', header=TRUE, stringsAsFactors=FALSE)
wr <- read.csv('proj_wr16.csv', header=TRUE, stringsAsFactors=FALSE)
# generate unique list of column names
cols <- unique(c(names(k), names(qb), names(rb), names(te), names(wr)))
k[,'pos'] <- 'k'
qb[,'pos'] <- 'qb'
rb[,'pos'] <- 'rb'
te[,'pos'] <- 'te'
wr[,'pos'] <- 'wr'
# append 'pos' to unique column list
cols <- c(cols, 'pos')
# create common columns in each data.frame, initialize values to zero
k[,setdiff(cols, names(k))] <- 0
qb[,setdiff(cols, names(qb))] <- 0
rb[,setdiff(cols, names(rb))] <- 0
te[,setdiff(cols, names(te))] <- 0
wr[,setdiff(cols, names(wr))] <- 0
# combine data.frames by row, using consistent column order
x <- rbind(k[,cols], qb[,cols], rb[,cols], te[,cols], wr[,cols])
# convert NFL stat to fantasy points
x[,'p_fg'] <- x[,'fg']*points[[1]]
x[,'p_xpt'] <- x[,'xpt']*points[[2]]
x[,'p_pass_yds'] <- x[,'pass_yds']*points[[3]]
x[,'p_pass_tds'] <- x[,'pass_tds']*points[[4]]
x[,'p_pass_ints'] <- x[,'pass_ints']*points[[5]]
x[,'p_rush_yds'] <- x[,'rush_yds']*points[[6]]
x[,'p_rush_tds'] <- x[,'rush_tds']*points[[7]]
x[,'p_fumbles'] <- x[,'fumbles']*points[[8]]
x[,'p_rec_yds'] <- x[,'rec_yds']*points[[9]]
x[,'p_rec_tds'] <- x[,'rec_tds']*points[[10]]
# sum selected column values for every row
x[,'points'] <- rowSums(x[,grep("^p_", names(x))])
# create new data.frame ordered by points descendingly
x2 <- x[order(x[,'points'], decreasing=TRUE),]
# determine the row indeces for each position
k.idx <- which(x2[,'pos']=='k')
qb.idx <- which(x2[,'pos']=='qb')
rb.idx <- which(x2[,'pos']=='rb')
te.idx <- which(x2[,'pos']=='te')
wr.idx <- which(x2[,'pos']=='wr')
# calculate marginal points by subtracting "baseline" player's points
x2[k.idx, 'marg'] <- x2[k.idx,'points'] - x2[k.idx[12],'points']
x2[qb.idx, 'marg'] <- x2[qb.idx,'points'] - x2[qb.idx[12],'points']
x2[rb.idx, 'marg'] <- x2[rb.idx,'points'] - x2[rb.idx[24],'points']
x2[te.idx, 'marg'] <- x2[te.idx,'points'] - x2[te.idx[12],'points']
x2[wr.idx, 'marg'] <- x2[wr.idx,'points'] - x2[wr.idx[36],'points']
# create a new data.frame subset by non-negative marginal points
x3 <- x2[x2[,'marg'] >= 0,]
# re-order by marginal points
x3 <- x3[order(x3[,'marg'], decreasing=TRUE),]
# reset the row names
rownames(x3) <- NULL
# calculation for player value
x3[,'value'] <- x3[,'marg']*(nTeams*cap-nrow(x3))/sum(x3[,'marg']) + 1
# save dollar values to a csv file
x4 <- x3[,c('PlayerName','pos','points','value')]
write.csv(x4, file)
# return data.frame with dollar values
as.data.frame(x4, row.names=NULL)
}
```
1. Call `x1 <- ffvalues('.')`
a. How many players are worth more than $20? (1 point)
```{r}
x1 <- ffvalues('.')
which(x1[,'value']>20)
```
```
There are 46 players that are worth more than $20.
```
b. Who is 15th most valuable running back (rb)? (1 point)
```{r}
which(x1[,'pos']=='rb')[15]
```
```
The 15th most valuable running back is at position 47, which is Carlos Hyde.
```
2. Call `x2 <- ffvalues(getwd(), '16team.csv', nTeams=16, cap=150)`
a. How many players are worth more than $20? (1 point)
```{r}
x2 <- ffvalues(getwd(), '16team.csv', nTeams=16, cap=150)
which(x2[,'value']>20)
```
```
There are 46 players that are worth more than $20.
```
b. How many wide receivers (wr) are in the top 40? (1 point)
```{r}
which(x2[,'pos']=='wr')[1:40]
```
```
There are 36 wide receivers in the top 40.
```
**JC Grading -1**
**Comment: ** which tells you the position of the wide receivers. So, count how many have the position <= 40. There are 18.
3. Call:
```{r eval=FALSE}
x3 <- ffvalues('.', 'qbheavy.csv', posReq=c(qb=2, rb=2, wr=3, te=1, k=0),
points=c(fg=0, xpt=0, pass_yds=1/25, pass_tds=6, pass_ints=-2,
rush_yds=1/10, rush_tds=6, fumbles=-2, rec_yds=1/20, rec_tds=6))
```
a. How many players are worth more than $20? (1 point)
```{r}
x3 <- ffvalues('.', 'qbheavy.csv', posReq=c(qb=2, rb=2, wr=3, te=1, k=0), points=c(fg=0, xpt=0,
pass_yds=1/25, pass_tds=6, pass_ints=-2, rush_yds=1/10, rush_tds=6, fumbles=-2, rec_yds=1/20, rec_tds=6))
which(x3[,'value']>20)
```
```
There are 46 players that are worth more than $20.
```
b. How many quarterbacks (qb) are in the top 30? (1 point)
```{r}
which(x3[,'pos']=='qb')[1:30]
```
```
There are 12 quarterbacks in the top 30.
```
**JC Grading -2**
**Comment: ** The answers to the above to questions are 50 and 10, respectively. Look to the commands under comment: # calculate marginal points by subtracting "baseline" player's points. You've hard coded the number of players in the position rather than using the function's input posReq.
### Question 4 ###
**5 points**
This code makes a list of all functions in the base package:
```{r}
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)
```
Using this list, write code to answer these questions.
1. Which function has the most arguments? (3 points)
```{r}
numArgs <- matrix(0, nrow = length(funs), ncol = 2, dimnames = list(NULL, c("Function", "Arguments")))
numArgs <- as.data.frame(numArgs)
x1 <- lapply(funs, function(x) length(formals(x)))
numArgs[,1] <- names(x1)
numArgs[,2] <- as.integer(x1)
numArgs <- numArgs[order(numArgs[,'Arguments'], decreasing=TRUE),]
head(numArgs)
```
```
The scan function has 22 arguments.
```
2. How many functions have no arguments? (2 points)
```{r}
which(numArgs[,2]==0)
```
```
There are 225 functions with no arguments.
```