-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathREADME.Rmd
234 lines (161 loc) · 7.36 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
---
output: github_document
---
```{r echo=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
options(width = 100)
library(Twitmo)
library(magrittr)
```
# Twitmo <img src="man/figures/hexSticker.png" width="160px" align="right"/>
<!-- badges: start -->
[![](https://www.r-pkg.org/badges/version/Twitmo?color=green)](https://cran.r-project.org/package=Twitmo) [![R-CMD-check](https://github.com/abuchmueller/Twitmo/workflows/R-CMD-check/badge.svg)](https://github.com/abuchmueller/Twitmo/actions) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
<!-- badges: end -->
The goal of `Twitmo` is to facilitate topic modeling in R with Twitter data. `Twitmo` provides a broad range of methods to sample, pre-process and visualize contents of geo-tagged tweets to make modeling the public discourse easy and accessible.
## Common questions
### Can I use `Twitmo` for pseudo-document pooling if I already sampled data earlier from Twitter without `Twitmo`?
Yes, this is possible in the Github version of `Twitmo.` You can use `pool_tweets()` on any data frame, that has a 'text' and a 'hashtags' columns that are also named that way. Any additional columns you might have, can additionally be used as document meta-data in a STM (see below).
### Can I use `Twitmo` to model topical prevalence over time?
`Twitmo` has no built-in methods for this purpose, however slicing your data time wise and fitting multiple LDA models then comparing topical prevalence over time can be accomplished with `Twitmo` in conjunction with `ggplot2`.
## Installation
## Important Note for **NEW** users
If you are using `Twitmo` for the first time, you might not already have `rtweet` installed. If you have `rtweet` version \>= 1.0.0 installed, you will not be able to use certain parts of `Twitmo`, like parsing/loading tweets because of breaking changes in `rtweet`. Since CRAN, by default, only distributes the latest version of a package and R does not respect upper boundaries on dependencies I am currently working on a solution. **You make sure you have the correct version of `rtweet` installed by running**
```{r rtweet-installation, eval=FALSE}
## install remotes package if it's not already
if (!requireNamespace("devtools", quietly = TRUE)) {
install.packages("devtools")
}
devtools::install_version("rtweet", version = "0.7.0", repos = "http://cran.us.r-project.org")
```
You can install `Twitmo` from CRAN with:
```{r cran-installation, eval=FALSE}
install.packages("Twitmo")
```
or install from Github where the correct version of `rtweet` will automatically be installed.
You can install `Twitmo` from Github with:
```{r gh-installation, eval = FALSE}
## install remotes package if it's not already
if (!requireNamespace("remotes", quietly = TRUE)) {
install.packages("remotes")
}
## install dev version of Twitmo from github
remotes::install_github("abuchmueller/Twitmo")
```
**Note**: Installing from Github may require you to have Rtools on your system.
- [Windows](https://cran.r-project.org/bin/windows/Rtools/ "Rtools for Windows (CRAN)")
- [macOS](https://thecoatlessprofessor.com/programming/cpp/r-compiler-tools-for-rcpp-on-macos/ "Rtools for macOS")
## Collecting geo-tagged tweets
Make sure you have a regular Twitter Account before start to sample your tweets.
```{r eval=FALSE}
# Live stream tweets from the UK for 30 seconds and save to "uk_tweets.json" in current working directory
get_tweets(method = 'stream',
location = "GBR",
timeout = 30,
file_name = "uk_tweets.json")
# Use your own bounding box to stream US mainland tweets
get_tweets(method = 'stream',
location = c(-125, 26, -65, 49),
timeout = 30,
file_name = "tweets_from_us_mainland.json")
```
## Load your tweets from a json file into a data frame
A small sample with raw tweets is included in the package. Access via:
```{r message=FALSE, warning=FALSE}
raw_path <- system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo")
mytweets <- load_tweets(raw_path)
```
## Pool tweets into long pseudo-document
```{r}
pool <- pool_tweets(mytweets)
pool.corpus <- pool$corpus
pool.dfm <- pool$document_term_matrix
```
## Find optimal number of topics
```{r ldatuner, warning=FALSE}
find_lda(pool.dfm)
```
## Fitting a LDA model
```{r}
model <- fit_lda(pool.dfm, n_topics = 7)
```
## View most relevant terms for each topic
```{r}
lda_terms(model)
```
or which hashtags are heavily associated with each topic
```{r}
lda_hashtags(model)
```
## Inspecting LDA distributions
Check the distribution of your LDA Model with
```{r}
lda_distribution(model)
```
# Filtering tweets
Sometimes you can build better topic models by blacklisting or whitelisting certain keywords from your data. You can do this with a keyword dictionary using the `filter_tweets()` function. In this example we exclude all tweets with "football" or "mood" in them from our data.
```{r}
mytweets %>% dim()
filter_tweets(mytweets, keywords = "football,mood", include = FALSE) %>% dim()
```
Analogously if you want to run your collected tweets through a whitelist use
```{r}
mytweets %>% dim()
filter_tweets(mytweets, keywords = "football,mood", include = TRUE) %>% dim()
```
# Fitting a structural topic model (STM)
Structural topic models can be fitted with additional external covariates. In this example we metadata that comes with the tweets such as retweet count. This works with parsed unpooled tweets. Pre-processing and fitting is done with one function.
```{r echo=TRUE, results='hide'}
stm_model <- fit_stm(mytweets, n_topics = 7, xcov = ~ retweet_count + followers_count + reply_count + quote_count + favorite_count,
remove_punct = TRUE,
remove_url = TRUE,
remove_emojis = TRUE,
stem = TRUE,
stopwords = "en")
```
STMs can be inspected via
```{r}
summary(stm_model)
```
## Visualizing models with `LDAvis`
Make sure you have `LDAvis` and `servr` installed.
```{r ldavis-intallation, eval = FALSE}
## install LDAvis package if it's not already
if (!requireNamespace("LDAvis", quietly = TRUE)) {
install.packages("LDAvis")
}
## install servr package if it's not already
if (!requireNamespace("servr", quietly = TRUE)) {
install.packages("servr")
}
```
Export fitted models into interactive `LDAvis` visualizations with one line of code
```{r, eval=FALSE}
to_ldavis(model, pool.corpus, pool.dfm)
## for STM use (included in the stm package)
stm::toLDAvis(stm_model, stm_model$prep$documents)
```
![](man/figures/to_ldavis.png)
## Plotting geo-tagged tweets
Plot your tweets onto a static map
```{r fig.width = 6}
plot_tweets(mytweets, region = "USA(?!:Alaska|:Hawaii)", alpha=0.1)
```
or plot the distribution of a certain hashtag onto a static map (UK data not included)
```{r eval=FALSE}
plot_hashtag(uk_tweets, region = "UK", hashtag = "foodwaste", ignore_case=TRUE, alpha=0.2)
```
<img src="man/figures/ht_map_uk.png" width="247"/>
## Interactive maps with `leaflet`
Use scroll wheel to zoom into and out of the map. Click markets to see tweets. Make sure you have the `leaflet` package installed.
```{r eval=FALSE}
## install leaflet package if it's not already
if (!requireNamespace("leaflet", quietly = TRUE)) {
install.packages("leaflet")
}
cluster_tweets(mytweets)
```
![](man/figures/leaflet_us.png)