-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
231 lines (173 loc) · 7.73 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
fig.retina = 2
)
```
<!-- badges: start -->
# ConfidenceEllipse <img src="man/figures/logo.png" align="right" height="159" alt="" />
[![R-CMD-check](https://github.com/ChristianGoueguel/ConfidenceEllipse/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ChristianGoueguel/ConfidenceEllipse/actions/workflows/R-CMD-check.yaml)
[![Project Status: Active – The project has reached a stable, usable
state and is being actively
developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![Lifecycle:
stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![License:
MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![codecov](https://codecov.io/gh/ChristianGoueguel/ConfidenceEllipse/graph/badge.svg?token=JAGMXN2F70)](https://app.codecov.io/gh/ChristianGoueguel/ConfidenceEllipse)
[![](https://cranlogs.r-pkg.org/badges/last-week/ConfidenceEllipse?color=orange)](https://cran.r-project.org/package=ConfidenceEllipse)
[![](https://cranlogs.r-pkg.org/badges/ConfidenceEllipse?color=yellowgreen)](https://cran.r-project.org/package=ConfidenceEllipse)
[![](https://cranlogs.r-pkg.org/badges/grand-total/ConfidenceEllipse)](https://cran.r-project.org/package=ConfidenceEllipse)
[![CRAN
status](https://www.r-pkg.org/badges/version/ConfidenceEllipse)](https://CRAN.R-project.org/package=ConfidenceEllipse)
[![HitCount](https://hits.dwyl.com/ChristianGoueguel/ConfidenceEllipse.svg?style=flat)](http://hits.dwyl.com/ChristianGoueguel/ConfidenceEllipse)
<!-- badges: end -->
The `ConfidenceEllipse` package computes the coordinate points of confidence region for a given bivariate and trivariate dataset. The size of the elliptical region is determined by the confidence level, and the shape is determined by the covariance matrix. The confidence level is usually chosen to be 95% or 99%, and the resulting confidence region contains the points that are expected to lie within the multivariate distribution.
## Installation
You can install `ConfidenceEllipse` from CRAN using:
```r
install.packages("ConfidenceEllipse")
```
Alternatively you can grab the development version from github using devtools:
```r
install.packages("devtools")
devtools::install_github("ChristianGoueguel/ConfidenceEllipse")
```
## Example
```{r message=FALSE, warning=FALSE}
library(magrittr)
library(tidyselect)
library(patchwork)
library(dplyr)
library(ggplot2)
library(purrr)
library(tidyr)
```
```{r message=FALSE, warning=FALSE}
library(ConfidenceEllipse)
```
### Dataset
The dataset is comprised of 13 different measurements for 180
archaeological glass vessels from different groups (Janssen, K.H.A., De
Raedt, I., Schalm, O., Veeckman, J.: Microchim. Acta 15 (suppl.) (1998)
253-267. Compositions of 15th - 17th century archaeological glass
vessels excavated in Antwerp.)
```{r}
data("glass", package = "ConfidenceEllipse")
```
```{r}
glass %>% glimpse()
```
### Confidence Region
#### Classical and robust confidence ellipse
First, the `confidence_ellipse` function is used to compute coordinate points of the confidence ellipse and then the ellipse is plotted on a two-dimensional plot `x` and `y` of the data. Points that lie outside the ellipse are considered to be outliers, while points that lie within the ellipse are considered to be part of the underlying distribution with the specified confidence level `conf_level`.
```{r message=FALSE, warning=FALSE}
ellipse_95 <- confidence_ellipse(glass, x = SiO2, y = Na2O, conf_level = 0.95)
rob_ellipse_95 <- confidence_ellipse(glass, x = SiO2, y = Na2O, conf_level = 0.95, robust = TRUE)
```
```{r message=FALSE, warning=FALSE}
ellipse_95 %>% glimpse()
```
```{r}
cutoff <- qchisq(0.95, df = 2)
MDsquared <- glass %>%
select(SiO2, Na2O) %>%
as.matrix() %>%
mahalanobis(colMeans(.), cov(.), inverted = FALSE)
```
```{r}
plot1 <-
ggplot() +
geom_path(data = ellipse_95, aes(x = x, y = y), color = "blue", linewidth = 1L) +
geom_point(data = glass %>% mutate(md = MDsquared) %>% filter(md <= cutoff), aes(x = SiO2, y = Na2O), shape = 21L, color = "black", fill = "lightblue", size = 3L) +
geom_point(data = glass %>% mutate(md = MDsquared) %>% filter(md > cutoff), aes(x = SiO2, y = Na2O), shape = 21L, color = "black", fill = "gold", size = 3L) +
labs(x = "SiO2 (wt.%)", y = "Na2O (wt.%)", title = "Classical confidence ellipse\nat 95% confidence level") +
theme_bw() +
theme(legend.position = "none")
```
```{r}
x_mcd <- glass %>%
select(SiO2, Na2O) %>%
as.matrix() %>%
robustbase::covMcd()
```
```{r}
rob_MDsquared <- glass %>%
select(SiO2, Na2O) %>%
as.matrix() %>%
mahalanobis(x_mcd$center, x_mcd$cov)
```
```{r}
plot2 <-
ggplot() +
geom_path(data = rob_ellipse_95, aes(x = x, y = y), color = "blue", linewidth = 1L) +
geom_point(data = glass %>% mutate(md = rob_MDsquared) %>% filter(md <= cutoff), aes(x = SiO2, y = Na2O), shape = 21L, color = "black", fill = "lightblue", size = 3L) +
geom_point(data = glass %>% mutate(md = rob_MDsquared) %>% filter(md > cutoff), aes(x = SiO2, y = Na2O), shape = 21L, color = "black", fill = "gold", size = 3L) +
labs(x = "SiO2 (wt.%)", y = "Na2O (wt.%)", title = "Robust confidence ellipse\nat 95% confidence level") +
theme_bw() +
theme(legend.position = "none")
```
```{r}
plot1 | plot2
```
##### Grouping
For grouping bivariate data, the `.group_by` argument can be used if the data contains an unique grouping variable (`.group_by = NULL` by default). When a grouping variable is provided, the function will compute the ellipses separately for each level of the factor, allowing you to explore potential differences or patterns within subgroups of the data.
It's important to note that the grouping variable should be appropriately coded as a factor before passing it to the `.group_by` argument. If the variable is currently stored as a character or numeric type, you may need to convert it to a factor using functions like `as.factor()` or `forcats::as_factor()`.
```{r}
rpca_scores <- glass %>%
select(where(is.numeric) )%>%
pcaPP::PCAproj(method = "qn") %>%
pluck("scores") %>%
as_tibble() %>%
mutate(glassType = glass %>% pull(glassType)) %>%
rename(PC1 = Comp.1, PC2 = Comp.2)
```
```{r message=FALSE, warning=FALSE}
ellipse_pca <- rpca_scores %>% confidence_ellipse(x = PC1, y = PC2, .group_by = glassType)
```
```{r}
ggplot() +
geom_point(data = rpca_scores, aes(x = PC1, y = PC2, color = glassType, shape = glassType), size = 3L) +
geom_path(data = ellipse_pca, aes(x = x, y = y, color = glassType), linewidth = 1L) +
scale_color_brewer(palette = "Set1", direction = 1) +
labs(x = "PC1", y = "PC2", title = "Principal component analysis") +
theme_bw() +
theme(legend.position = "none")
```
#### Ellipsoid
The `confidence_ellipsoid` function accepts an additional variable `z` and computes the ellipsoid for trivariate data.
```{r message=FALSE, warning=FALSE}
ellipsoid_grp <- glass %>% confidence_ellipsoid(SiO2, Na2O, Fe2O3, glassType)
```
```{r}
ellipsoid_grp %>% glimpse()
```
```{r message=FALSE, warning=FALSE}
rgl::setupKnitr(autoprint = TRUE)
rgl::plot3d(
x = ellipsoid_grp$x,
y = ellipsoid_grp$y,
z = ellipsoid_grp$z,
xlab = "SiO2 (wt.%)",
ylab = "Na2O (wt.%)",
zlab = "Fe2O3 (wt.%)",
type = "s",
radius = 0.03,
col = as.numeric(ellipsoid_grp$glassType)
)
rgl::points3d(
x = glass$SiO2,
y = glass$Na2O,
z = glass$Fe2O3,
col = as.numeric(glass$glassType),
size = 5
)
rgl::view3d(theta = 260, phi = 30, fov = 60, zoom = .85)
```