diff --git a/README.md b/README.md index 92cd3ee..ebadea7 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ My xaringan theme (from [xaringanthemer](https://pkg.garrickadenbuie.com/xaringa ``` mono_accent( base_color = "#F48024", - header_font_google = google_font("IBM Plex Sans"), + header_font_google = google_font("IBM Plex Sans", "700"), text_font_google = google_font("IBM Plex Sans Condensed"), code_font_google = google_font("IBM Plex Mono") ) diff --git a/css/footer_plus.css b/css/footer_plus.css index 0fafa16..2d74814 100644 --- a/css/footer_plus.css +++ b/css/footer_plus.css @@ -1,4 +1,4 @@ -.large { font-size: 150% } +.large { font-size: 160% } .title-slide .remark-slide-number { display: none; diff --git a/css/xaringan-themer.css b/css/xaringan-themer.css index ff9a7b7..9388451 100644 --- a/css/xaringan-themer.css +++ b/css/xaringan-themer.css @@ -17,7 +17,7 @@ * * ------------------------------------------------------- */ @import url(https://fonts.googleapis.com/css?family=IBM+Plex+Sans+Condensed); -@import url(https://fonts.googleapis.com/css?family=IBM+Plex+Sans); +@import url(https://fonts.googleapis.com/css?family=IBM+Plex+Sans:700); @import url(https://fonts.googleapis.com/css?family=IBM+Plex+Mono); diff --git a/slides.Rmd b/slides.Rmd index 817451e..7d80306 100644 --- a/slides.Rmd +++ b/slides.Rmd @@ -46,11 +46,11 @@ background-size: cover -# Text Modeling +# **Text Modeling** ### USING TIDY DATA PRINCIPLES -### Julia Silge | IBM Community Day: AI +.large[**Julia Silge | IBM Community Day: AI**] --- class: left, middle @@ -74,7 +74,7 @@ background-size: cover background-image: url(figs/white_bg.svg) background-size: cover -# Text in the real world +# **Text in the real world** -- @@ -122,7 +122,7 @@ background-size: 450px background-image: url(figs/white_title.svg) background-size: cover -# Two powerful NLP modeling approaches +# **Two powerful NLP techniques** -- @@ -137,13 +137,13 @@ background-size: cover background-image: url(figs/white_bg.svg) background-size: cover -# Topic modeling +# **Topic modeling** -- .large[Each document = mixture of topics] +- .large[Each **document** = mixture of topics] -- -- .large[Each topic = mixture of words] +- .large[Each **topic** = mixture of words] --- @@ -157,11 +157,11 @@ class: center, middle background-image: url(figs/white_title.svg) background-size: cover -# GREAT LIBRARY HEIST `r emo::ji("sleuth")` +# **GREAT LIBRARY HEIST** `r emo::ji("sleuth")` --- -## Downloading your text data +## **Downloading your text data** ```{r} library(tidyverse) @@ -180,7 +180,7 @@ books --- -## Someone has torn your books apart! `r emo::ji("sob")` +## **Someone has torn your books apart!** `r emo::ji("sob")` ```{r} @@ -198,7 +198,7 @@ by_chapter --- -## Can we put them back together? +## **Can we put them back together?** ```{r} library(tidytext) @@ -214,7 +214,7 @@ word_counts --- -## Can we put them back together? +## **Can we put them back together?** ```{r} words_sparse <- word_counts %>% @@ -225,7 +225,7 @@ class(words_sparse) --- -## Train a topic model +## **Train a topic model** Use a sparse matrix or a `quanteda::dfm` object as input @@ -240,7 +240,7 @@ summary(topic_model) --- -## Exploring the output of topic modeling +## **Exploring the output of topic modeling** .large[Time for tidying!] 
@@ -252,7 +252,7 @@ chapter_topics --- -## Exploring the output of topic modeling +## **Exploring the output of topic modeling** ```{r} top_terms <- chapter_topics %>% @@ -265,7 +265,7 @@ top_terms ``` --- -## Exploring the output of topic modeling +## **Exploring the output of topic modeling** ```{r, eval=FALSE} top_terms %>% @@ -293,7 +293,7 @@ top_terms %>% --- -## How are documents classified? +## **How are documents classified?** ```{r} chapters_gamma <- tidy(topic_model, matrix = "gamma", @@ -304,7 +304,7 @@ chapters_gamma --- -## How are documents classified? +## **How are documents classified?** ```{r} chapters_parsed <- chapters_gamma %>% @@ -315,7 +315,7 @@ chapters_parsed --- -## How are documents classified? +## **How are documents classified?** ```{r, eval=FALSE} chapters_parsed %>% @@ -343,14 +343,14 @@ class: center, middle background-image: url(figs/white_title.svg) background-size: cover -# GOING FARTHER `r emo::ji("rocket")` +# **GOING FARTHER** `r emo::ji("rocket")` --- background-image: url(figs/white_bg.svg) background-size: cover -## Tidying model output +## **Tidying model output** ### Which words in each document are assigned to which topics? @@ -364,7 +364,7 @@ background-size: 850px --- -## Using stm +## **Using stm** - .large[Document-level covariates] @@ -393,7 +393,7 @@ background-size: 950px background-image: url(figs/white_title.svg) background-size: cover -# Stemming? +# **Stemming?** .large[Advice from [Schofield & Mimno](https://mimno.infosci.cornell.edu/papers/schofield_tacl_2016.pdf)] @@ -417,12 +417,12 @@ background-image: url(figs/white_title.svg) background-size: cover -# Text classification +# **Text classification**

--- -## Downloading your text data +## **Downloading your text data** ```{r} library(tidyverse) @@ -440,7 +440,7 @@ books --- -## Making a tidy dataset +## **Making a tidy dataset** .large[Use this kind of data structure for EDA! `r emo::ji("nail")`] @@ -458,7 +458,7 @@ tidy_books --- -## Cast to a sparse matrix +## **Cast to a sparse matrix** .large[And build a dataframe with a response variable] @@ -474,7 +474,7 @@ books_joined <- data_frame(document = as.integer(rownames(sparse_words))) %>% --- -## Train a glmnet model +## **Train a glmnet model** ```{r} library(glmnet) @@ -490,7 +490,7 @@ model <- cv.glmnet(sparse_words, is_jane, family = "binomial", --- -## Tidying our model +## **Tidying our model** .large[Tidy, then filter to choose some lambda from glmnet output] @@ -508,7 +508,7 @@ Intercept <- coefs %>% --- -## Tidying our model +## **Tidying our model** ```{r} classifications <- tidy_books %>% @@ -522,7 +522,7 @@ classifications --- -## Understanding our model +## **Understanding our model** ```{r, eval=FALSE} coefs %>% @@ -551,7 +551,7 @@ coefs %>% --- -## ROC +## **ROC** ```{r} comment_classes <- classifications %>% @@ -569,7 +569,7 @@ roc <- comment_classes %>% --- -## ROC +## **ROC** ```{r} roc %>% @@ -592,7 +592,7 @@ roc %>% --- -## AUC for model +## **AUC for model** ```{r} roc %>% @@ -601,7 +601,7 @@ roc %>% --- -## Misclassifications +## **Misclassifications** Let's talk about misclassifications. Which documents here were incorrectly predicted to be written by Jane Austen? @@ -616,7 +616,7 @@ roc %>% --- -## Misclassifications +## **Misclassifications** Let's talk about misclassifications. Which documents here were incorrectly predicted to *not* be written by Jane Austen? @@ -636,14 +636,14 @@ background-image: url(figs/tmwr_0601.png) background-position: 50% 70% background-size: 750px -## Workflow for text mining/modeling +## **Workflow for text mining/modeling** --- background-image: url(figs/lizzieskipping.gif) background-position: 50% 55% background-size: 750px -# Go explore real-world text! +# **Go explore real-world text!** --- diff --git a/slides.html b/slides.html index 61d3597..546e3d1 100644 --- a/slides.html +++ b/slides.html @@ -33,11 +33,11 @@ <img src="figs/so-logo.svg" width="30%"/> -# Text Modeling +# **Text Modeling** ### USING TIDY DATA PRINCIPLES -### Julia Silge | IBM Community Day: AI +.large[**Julia Silge | IBM Community Day: AI**] --- class: left, middle @@ -61,7 +61,7 @@ background-image: url(figs/white_bg.svg) background-size: cover -# Text in the real world +# **Text in the real world** -- @@ -109,7 +109,7 @@ background-image: url(figs/white_title.svg) background-size: cover -# Two powerful NLP modeling approaches +# **Two powerful NLP techniques** -- @@ -124,13 +124,13 @@ background-image: url(figs/white_bg.svg) background-size: cover -# Topic modeling +# **Topic modeling** -- .large[Each document = mixture of topics] +- .large[Each **document** = mixture of topics] -- -- .large[Each topic = mixture of words] +- .large[Each **topic** = mixture of words] --- @@ -144,11 +144,11 @@ background-image: url(figs/white_title.svg) background-size: cover -# GREAT LIBRARY HEIST 🕵 +# **GREAT LIBRARY HEIST** 🕵️‍♂️ --- -## Downloading your text data +## **Downloading your text data** ```r @@ -185,7 +185,7 @@ --- -## Someone has torn your books apart! 😭 +## **Someone has torn your books apart!** 😭 @@ -221,7 +221,7 @@ --- -## Can we put them back together? +## **Can we put them back together?** ```r @@ -254,7 +254,7 @@ --- -## Can we put them back together? 
+## **Can we put them back together?** ```r @@ -272,7 +272,7 @@ --- -## Train a topic model +## **Train a topic model** Use a sparse matrix or a `quanteda::dfm` object as input @@ -315,7 +315,7 @@ --- -## Exploring the output of topic modeling +## **Exploring the output of topic modeling** .large[Time for tidying!] @@ -345,7 +345,7 @@ --- -## Exploring the output of topic modeling +## **Exploring the output of topic modeling** ```r @@ -376,7 +376,7 @@ ``` --- -## Exploring the output of topic modeling +## **Exploring the output of topic modeling** ```r @@ -394,7 +394,7 @@ --- -## How are documents classified? +## **How are documents classified?** ```r @@ -423,7 +423,7 @@ --- -## How are documents classified? +## **How are documents classified?** ```r @@ -452,7 +452,7 @@ --- -## How are documents classified? +## **How are documents classified?** ```r @@ -474,14 +474,14 @@ background-image: url(figs/white_title.svg) background-size: cover -# GOING FARTHER 🚀 +# **GOING FARTHER** 🚀 --- background-image: url(figs/white_bg.svg) background-size: cover -## Tidying model output +## **Tidying model output** ### Which words in each document are assigned to which topics? @@ -495,7 +495,7 @@ --- -## Using stm +## **Using stm** - .large[Document-level covariates] @@ -525,7 +525,7 @@ background-image: url(figs/white_title.svg) background-size: cover -# Stemming? +# **Stemming?** .large[Advice from [Schofield & Mimno](https://mimno.infosci.cornell.edu/papers/schofield_tacl_2016.pdf)] @@ -549,12 +549,12 @@ background-size: cover -# Text classification +# **Text classification** <h1 class="fa fa-balance-scale fa-fw"></h1> --- -## Downloading your text data +## **Downloading your text data** ```r @@ -590,7 +590,7 @@ --- -## Making a tidy dataset +## **Making a tidy dataset** .large[Use this kind of data structure for EDA! 💅] @@ -626,7 +626,7 @@ --- -## Cast to a sparse matrix +## **Cast to a sparse matrix** .large[And build a dataframe with a response variable] @@ -643,7 +643,7 @@ --- -## Train a glmnet model +## **Train a glmnet model** ```r @@ -659,7 +659,7 @@ --- -## Tidying our model +## **Tidying our model** .large[Tidy, then filter to choose some lambda from glmnet output] @@ -678,7 +678,7 @@ --- -## Tidying our model +## **Tidying our model** ```r @@ -710,7 +710,7 @@ --- -## Understanding our model +## **Understanding our model** ```r @@ -729,7 +729,7 @@ --- -## ROC +## **ROC** ```r @@ -748,7 +748,7 @@ --- -## ROC +## **ROC** ```r @@ -779,7 +779,7 @@ --- -## AUC for model +## **AUC for model** ```r @@ -796,7 +796,7 @@ --- -## Misclassifications +## **Misclassifications** Let's talk about misclassifications. Which documents here were incorrectly predicted to be written by Jane Austen? @@ -814,21 +814,21 @@ ## # A tibble: 10 x 2 ## Probability text ## <dbl> <chr> -## 1 0.890 " But who shall dwell in these worlds if they be" -## 2 0.857 "horizon--\"they're starving in heaps, bolting, treading o… -## 3 0.821 Her eyes met my brother's, and her hesitation ended. -## 4 0.961 did not know what to make of her. One shell, and they wou… -## 5 0.806 Such things, I told myself, could not be. -## 6 0.991 been out of England before, she would rather die than trus… -## 7 0.910 breed. I tell you, I'm grim set on living. And if I'm no… -## 8 0.876 all seriously. Yet though they wore no clothing, it was i… -## 9 0.893 her. 
-## 10 0.880 evening paper, after wiring for authentication from him an… +## 1 0.860 reading steadily with all his thoughts about his subject, … +## 2 0.927 range not very different from ours except that, according … +## 3 0.880 they did not wish to destroy the country but only to crush… +## 4 0.827 decorum were necessarily different from ours; and not only… +## 5 0.901 the innkeeper, she would, I think, have urged me to stay in +## 6 0.880 evening paper, after wiring for authentication from him an… +## 7 0.832 "\"Take this!\" said the slender lady, and she gave my bro… +## 8 0.806 Such things, I told myself, could not be. +## 9 0.962 "\"Be a man!\" said I. \"You are scared out of your wits!… +## 10 0.905 had my doubts. You're slender. I didn't know that it was… ``` --- -## Misclassifications +## **Misclassifications** Let's talk about misclassifications. Which documents here were incorrectly predicted to *not* be written by Jane Austen? @@ -846,16 +846,16 @@ ## # A tibble: 10 x 2 ## Probability text ## <dbl> <chr> -## 1 0.190 my part, except the shops and public places. The country i… -## 2 0.0759 me, I did not once put my foot out of doors, though I was … -## 3 0.176 Newcastle, a place quite northward, it seems, and there th… -## 4 0.173 I was never more annoyed! The insipidity, and yet the nois… -## 5 0.174 "of selecting a wife, as I certainly did.\"" -## 6 0.0456 They descended the hill, crossed the bridge, and drove to … -## 7 0.195 the first of September, than any body else in the country. -## 8 0.187 half-a-mile, and then found themselves at the top of a con… -## 9 0.184 glancing over it, said, in a colder voice: -## 10 0.0271 to the edge of the water, and one of its narrowest parts. … +## 1 0.135 occasional appearance of some trout in the water, and talk… +## 2 0.176 Newcastle, a place quite northward, it seems, and there th… +## 3 0.184 glancing over it, said, in a colder voice: +## 4 0.187 half-a-mile, and then found themselves at the top of a con… +## 5 0.0226 window that he wore a blue coat, and rode a black horse. +## 6 0.173 I was never more annoyed! The insipidity, and yet the nois… +## 7 0.174 "of selecting a wife, as I certainly did.\"" +## 8 0.157 "as I sit by the fire.\"" +## 9 0.164 one sleepless night out of two. +## 10 0.193 struck with the action of doing a very gallant thing, and … ``` --- @@ -865,14 +865,14 @@ background-position: 50% 70% background-size: 750px -## Workflow for text mining/modeling +## **Workflow for text mining/modeling** --- background-image: url(figs/lizzieskipping.gif) background-position: 50% 55% background-size: 750px -# Go explore real-world text! +# **Go explore real-world text!** ---
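
---

Most of the R chunks touched above are elided by the diff context, so here is a self-contained sketch of the topic-modeling half of the deck for reference. The four Gutenberg titles and the `^chapter` regex are assumptions (the deck's own download chunk is not shown); the object names mirror those visible in `slides.Rmd` (`word_counts`, `words_sparse`, `topic_model`, `chapter_topics`, `chapters_gamma`).

```r
# Hedged sketch of the "GREAT LIBRARY HEIST" workflow; titles are assumptions.
library(tidyverse)
library(gutenbergr)
library(tidytext)
library(stm)

titles <- c("Twenty Thousand Leagues under the Sea",
            "The War of the Worlds",
            "Pride and Prejudice",
            "Great Expectations")

books <- gutenberg_works(title %in% titles) %>%
  gutenberg_download(meta_fields = "title")

# "Someone has torn your books apart": split into chapters, then into words
by_chapter <- books %>%
  group_by(title) %>%
  mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
  ungroup() %>%
  filter(chapter > 0) %>%
  unite(document, title, chapter)

word_counts <- by_chapter %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(document, word, sort = TRUE)

# Cast to a sparse matrix (stm also accepts a quanteda::dfm as input)
words_sparse <- word_counts %>%
  cast_sparse(document, word, n)

# Train a four-topic model
topic_model <- stm(words_sparse, K = 4, init.type = "Spectral")

# Tidy the output: beta = per-topic word probabilities,
# gamma = per-document topic probabilities
chapter_topics <- tidy(topic_model, matrix = "beta")

top_terms <- chapter_topics %>%
  group_by(topic) %>%
  top_n(5, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)

chapters_gamma <- tidy(topic_model, matrix = "gamma",
                       document_names = rownames(words_sparse))
```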
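And a matching sketch of the classification half. The two titles follow from the misclassification examples visible in the rendered output (Austen vs. *The War of the Worlds*); the rare-word cutoff, the `lambda.1se` choice, and the yardstick ROC calls are assumptions where the deck's chunks are elided.

```r
# Hedged sketch: predict whether a line of text was written by Jane Austen.
library(tidyverse)
library(gutenbergr)
library(tidytext)
library(glmnet)
library(broom)
library(yardstick)

books <- gutenberg_works(title %in% c("Pride and Prejudice",
                                      "The War of the Worlds")) %>%
  gutenberg_download(meta_fields = "title") %>%
  mutate(document = row_number())   # one "document" per line of text

# Tidy dataset for EDA; the rare-word cutoff is an assumption
tidy_books <- books %>%
  unnest_tokens(word, text) %>%
  group_by(word) %>%
  filter(n() > 10) %>%
  ungroup()

# Cast to a sparse matrix
sparse_words <- tidy_books %>%
  count(document, word) %>%
  cast_sparse(document, word, n)

# Build a dataframe with a response variable, aligned to the matrix rows
books_joined <- tibble(document = as.integer(rownames(sparse_words))) %>%
  left_join(books %>% select(document, title), by = "document")

is_jane <- books_joined$title == "Pride and Prejudice"

# Train a regularized logistic regression with glmnet
model <- cv.glmnet(sparse_words, is_jane, family = "binomial", keep = TRUE)

# Tidy, then filter to choose some lambda from the glmnet output
coefs <- model$glmnet.fit %>%
  tidy() %>%
  filter(lambda == model$lambda.1se)

Intercept <- coefs %>%
  filter(term == "(Intercept)") %>%
  pull(estimate)

# Score each document: sum its word coefficients, then map to a probability
classifications <- tidy_books %>%
  inner_join(coefs, by = c("word" = "term")) %>%
  group_by(document) %>%
  summarize(score = sum(estimate)) %>%
  mutate(probability = plogis(Intercept + score))

# ROC and AUC
comment_classes <- classifications %>%
  left_join(books %>% select(document, title), by = "document") %>%
  mutate(title = factor(title, levels = c("Pride and Prejudice",
                                          "The War of the Worlds")))

roc <- comment_classes %>%
  roc_curve(title, probability)

comment_classes %>%
  roc_auc(title, probability)
```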