GitHub - rjshanahan/twitter_scraper: Web Scraper for Twitter pages

Twitter webscraper for specific pages

A Python web scraper that uses the Selenium and BeautifulSoup modules to extract text from various Twitter pages.

The program uses Selenium (with ChromeDriver) to automate user behaviour in a browser session: it loads a specific Twitter page (no login required) and scrolls to trigger the dynamic loading of further content. Once the page is rendered, the HTML is extracted and parsed with BeautifulSoup. Note: it will continue scraping until either 1) the end of the feed is reached or 2) it is interrupted manually by killing the connection.
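The scroll-until-end behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `twitter_selenium_scraper.py`: the function names `scroll_to_end` and `fetch_rendered_html`, the `pause` and `max_rounds` parameters, and the "page height stopped growing" end-of-feed test are all assumptions made for the example.

```python
import time


def scroll_to_end(driver, pause=2.0, max_rounds=None):
    """Scroll until the page height stops growing; return the number of scrolls.

    `driver` is any object with a Selenium-style execute_script() method.
    `pause` and `max_rounds` are hypothetical tuning knobs for this sketch.
    """
    rounds = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    while max_rounds is None or rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give dynamically loaded content time to arrive
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared: treat this as the end of the feed
        last_height = new_height
    return rounds


def fetch_rendered_html(url, pause=2.0):
    """Load `url` in Chrome, scroll to the end of the feed, return the HTML."""
    # Imported here so scroll_to_end() can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome and ChromeDriver are available
    try:
        driver.get(url)
        scroll_to_end(driver, pause=pause)
        return driver.page_source
    finally:
        driver.quit()
```

The string returned by `fetch_rendered_html` could then be handed to `BeautifulSoup(html, "html.parser")` for extraction. A fixed pause is the simplest way to wait for new content; a slow connection may need a longer value, otherwise the loop can stop before the feed has really ended.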

The program extracts the following fields and writes them to a CSV file, with punctuation and other non-text characters removed:

full tweet text from each Twitter page
date
header
url
user name
popularity metrics (string containing retweets/favourites)
like_fave: integer value for number of times 'favorited'
share_rtwt: integer value for number of times 'retweeted'
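As a rough sketch of how the fields listed above might be cleaned and written out, the example below strips punctuation, derives the two integer counts from the metrics string, and produces CSV text. Only `like_fave` and `share_rtwt` are names taken from this README; the other column names, the helper functions, and the assumed metrics format (for example `"12 retweets 3 likes"`) are hypothetical.

```python
import csv
import io
import re

# Only like_fave and share_rtwt come from the README; the rest are illustrative.
FIELDS = ["tweet", "date", "header", "url", "user", "metrics",
          "like_fave", "share_rtwt"]


def clean_text(text):
    """Replace punctuation and other non-text characters with spaces."""
    text = re.sub(r"[^\w\s]|_", " ", text)  # spaces keep adjacent words apart
    return re.sub(r"\s+", " ", text).strip()


def parse_metrics(metrics):
    """Extract integer counts from a metrics string (the format is assumed)."""
    def count(keyword_pattern):
        match = re.search(r"(\d[\d,]*)\s*(?:%s)" % keyword_pattern,
                          metrics, re.IGNORECASE)
        return int(match.group(1).replace(",", "")) if match else 0

    return {
        "share_rtwt": count(r"retweets?"),
        "like_fave": count(r"likes?|favou?rites?"),
    }


def rows_to_csv(rows):
    """Return CSV text for an iterable of dicts keyed by the first six FIELDS."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        record = dict(row)
        record["tweet"] = clean_text(record["tweet"])
        record.update(parse_metrics(record["metrics"]))
        writer.writerow(record)
    return buffer.getvalue()
```

Here the cleaning step is applied to the tweet text only and the counts default to 0 when no number is found; the actual script may treat other columns or missing metrics differently.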

Repository files: README.md, twitter_selenium_scraper.py
