nika-akin/Social-Media-and-Text-Mining-Workshop-2022

Social Media and Text Mining Workshop with R

Workshop material on working with social media data and text mining methods in R

Made with woRkshoptools

Part of the conference: „Forschung zur Digitalisierung in der kulturellen Bildung“ (29-09-2022)

Contact: Veronika Batzdorfer ([email protected])


Background

Social media are central sites of collective opinion formation and an important basis for describing and explaining social phenomena (e.g., online radicalisation). However, decisions made in every phase of the research cycle, from data collection through pre-processing to analysis, can introduce bias that threatens the validity and reliability of the results.

About

This workshop introduces how large amounts of openly available text data from Twitter can be made accessible and usable for research purposes. It combines conceptual considerations with practical applications in R.

  • Strategies for collecting and processing textual data via application programming interfaces (APIs) using common R tools
  • Potential sources of bias in the research data cycle
  • Basics of natural language processing (NLP), data cleaning (e.g., with 'quanteda' or 'textclean') and application of common NLP tools for automated text analysis
  • Outlook on topic modelling (or word embeddings)
  • Bias and ethics in NLP
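The cleaning and text-analysis basics listed above can be sketched with 'quanteda'. This is a minimal illustration rather than the workshop's own code: the three example texts are made up, and the chosen cleaning steps (dropping punctuation, URLs, numbers and English stopwords) are an assumption about what a typical pipeline looks like.

```r
library(quanteda)

# Made-up example texts standing in for tweets (not workshop data).
texts <- c(
  t1 = "Learning text mining in R is fun! https://example.org",
  t2 = "Text mining with quanteda: tokens, then a document-feature matrix.",
  t3 = "Social media data are noisy, so cleaning the text matters."
)

# Tokenise and drop punctuation, URLs and numbers.
toks <- tokens(corpus(texts),
               remove_punct = TRUE,
               remove_url = TRUE,
               remove_numbers = TRUE)

# Lowercase, then remove English stopwords.
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, pattern = stopwords("en"))

# Build a document-feature matrix and list the most frequent terms.
dfm_mat <- dfm(toks)
top_terms <- topfeatures(dfm_mat, n = 5)
print(top_terms)
```

Every cleaning step is itself an analytical decision: removing stopwords or URLs changes which terms dominate the frequency table, which is one concrete way pre-processing can bias later results.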

Requirements

# Packages used in the workshop sessions
pkgs <- c("here", "lubridate", "quanteda", "quanteda.textstats", "tidyverse",
          "academictwitteR", "tibble", "kableExtra", "tidytext",
          "textclean")

install.packages(pkgs)
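To avoid reinstalling packages that are already present, the installation step can be restricted to the missing ones. The following is a sketch under assumptions: `missing_pkgs()` is a hypothetical helper name, not part of the workshop material, and installation is only triggered in an interactive session.

```r
# Same package list as in the requirements (duplicates removed).
pkgs <- c("here", "lubridate", "quanteda", "quanteda.textstats", "tidyverse",
          "academictwitteR", "tibble", "kableExtra", "tidytext", "textclean")

# Hypothetical helper: return the packages whose namespace cannot be loaded.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(pkgs)

# Only install when run interactively and something is actually missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```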

Contents

Slides

Time Content
09:00 - 10:30 Concepts & challenges when analysing social web data

https://github.com/nika-akin/-Social-Media-and-Text-Mining-Workshop-2022/blob/main/content/sessions/1_1_analyse_social_web_data.pdf

10:30 - 11:00 Coffee break
11:30 - 12:30 Getting started with Twitter data: (i) Sampling, (ii) Pre-processing / data wrangling & (iii) Basics of textual analyses (frequencies / co-occurrences / networks)

https://htmlpreview.github.io/?https://github.com/nika-akin/-Social-Media-and-Text-Mining-Workshop-2022/blob/main/content/sessions/2_1.nb.html

12:30 - 13:30 Lunch
13:30 - 15:00 Twitter Demo & Crawling Social web data

https://github.com/nika-akin/-Social-Media-and-Text-Mining-Workshop-2022/blob/main/content/sessions/3_1_analyse_social_web_data.pdf

15:00 - 15:30 Coffee break
15:30 - 17:00 Outlook: Advanced NLP techniques (e.g., topic modelling) & social web data collection; bias and ethics in NLP

https://github.com/nika-akin/-Social-Media-and-Text-Mining-Workshop-2022/blob/main/content/sessions/4_1_Ausblick.pdf

Data

Twitter Features

Feature ID    Type       Description
post_id       Numeric    identifier of tweet
followers     Numeric    number of followers in profile
friends       Numeric    number of friends in profile
post_created  Character  date of posting tweet
post_text     Character  text of original tweet
user_id       Numeric    identifier of user
label         Numeric    depression categorization: 1 = depression tweet, 2 = non-depression
favourites    Numeric    number of external favorites of the tweet
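
A data set with these features could be prepared for analysis roughly as follows. This is a sketch in base R: only the column names and the label coding come from the feature table, all values are made up, and the timestamp format ("YYYY-MM-DD HH:MM:SS") is an assumption about how `post_created` is stored.

```r
# Toy rows with made-up values; only the column names follow the feature table.
tweets <- data.frame(
  post_id      = c(1, 2, 3, 4),
  user_id      = c(10, 10, 11, 12),
  followers    = c(120, 120, 35, 980),
  friends      = c(80, 80, 40, 310),
  favourites   = c(5, 5, 0, 42),
  post_created = c("2022-01-03 08:15:00", "2022-01-04 21:40:00",
                   "2022-01-04 09:05:00", "2022-01-05 17:30:00"),
  post_text    = c("example tweet one", "example tweet two",
                   "example tweet three", "example tweet four"),
  label        = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Assumption: timestamps look like "YYYY-MM-DD HH:MM:SS"; adjust `format` if not.
tweets$post_created <- as.POSIXct(tweets$post_created,
                                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Recode the numeric label into a readable factor (coding as in the table).
tweets$label <- factor(tweets$label, levels = c(1, 2),
                       labels = c("depression", "non-depression"))

# Class balance and a simple per-class summary.
label_counts <- table(tweets$label)
median_followers <- aggregate(followers ~ label, data = tweets, FUN = median)

print(label_counts)
print(median_followers)
```

Checking the class balance before any modelling is worthwhile here, because a skewed `label` distribution is one of the sampling-related biases the workshop discusses.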