-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathjoyPlots.Rmd
104 lines (70 loc) · 2.33 KB
/
joyPlots.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
---
title: "Testing the survey package"
author: Ben Anderson ([email protected], `@dataknut`)
date: 'Last run at: `r Sys.time()`'
output:
html_document:
fig_caption: yes
keep_md: yes
number_sections: yes
toc: yes
bibliography: ~/bibliography.bib
---
# Libraries used
* survey [@survey] - for survey data (weighting etc)
* ggplot [@ggplot2] - for graphs
* data.table [@data.table] - for fast big tables
* skimr [@skimr] - for summaries
* knitr [@knitr] - for knited tables
```{r setup}
knitr::opts_chunk$set(echo = TRUE)
library(survey)
library(ggplot2)
library(data.table)
library(skimr)
library(knitr)
```
Load some data that looks a bit like survey data (ggplot's diamonds data) and turn it into a data.table. Then create a random value between 0 and 1 which we will use as a pretend survey non-response weight.
```{r set data}
diamondsDT <- data.table(diamonds)
skim(diamonds)
diamondsDT <- diamondsDT[, wt := runif(nrow(diamondsDT))]
t <- diamondsDT[, .(Mean = mean(wt), Min = min(wt), Max = max(wt), N = .N)]
kable(caption = "Distribution of wt", t)
# create a fake unique 'id' for each row
diamondsDT <- diamondsDT[, id := seq(1:nrow(diamondsDT))]
# create 1/0 code for cut = Fair
diamondsDT <- diamondsDT[, isFair := ifelse(cut == "Fair", 1, 0)]
# check
with(diamondsDT, table(isFair, cut, useNA = "always"))
```
Now we use the wt to pretend to create a large survey design object which has a non-response weight = wt.
```{r Set weight}
diamondsSVY <- svydesign(ids = ~ id,
weights = ~wt,
data = diamondsDT
)
```
Now compare a simple and a weighted table.
```{r Test simple tables}
uwt <- table(diamondsDT$cut, diamondsDT$color)
kable(caption = "Cut vs colour, unweighted", uwt)
wt <- svytable(~ cut + color, diamondsSVY)
kable(caption = "Cut vs colour, unweighted", round(wt,2))
```
Something of a difference...
Now test what happens when we set a category to 1/0 and take the mean and proportion.
```{r Test survey mean and proportions}
meanT <- svymean(isFair ~ color, diamondsSVY)
print("Using svymean")
meanT
confint(meanT)
propT <- svyciprop(~ isFair, diamondsSVY)
print("Using svyciprop")
propT
tableT <- svytable(~ color + isFair, diamondsSVY)
print("Using svytable")
tableT
tableDT <- as.data.table(tableT)
```
# References