Python web scraper that uses the Selenium and BeautifulSoup modules to extract text from Twitter pages.
The program uses Selenium (with ChromeDriver) to automate user behaviour in a browser session: it loads a specific Twitter page without logging in and scrolls so that dynamically loaded content is rendered. Once the page is rendered, the HTML is extracted and parsed with BeautifulSoup. Note: scraping continues until either 1) the end of the feed is reached or 2) it is interrupted manually by killing the connection.
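The scroll-until-done behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the names `scroll_to_end` and `scrape`, the `pause` and `max_rounds` parameters, and the use of Ctrl-C as the manual interrupt are all assumptions made for the example. It treats "end of feed" as "the page height stopped growing after a scroll", which is a common heuristic rather than something the page itself reports.

```python
import time


def scroll_to_end(driver, pause=2.0, max_rounds=None):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object with Selenium's `execute_script` method and
    `page_source` attribute. Returns the rendered HTML.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    try:
        while max_rounds is None or rounds < max_rounds:
            # Jump to the bottom so the page requests the next batch of tweets.
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(pause)  # give the dynamic content time to load
            new_height = driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                break  # nothing new was loaded: assume end of feed
            last_height = new_height
            rounds += 1
    except KeyboardInterrupt:
        # Assumption: a manual interrupt keeps whatever has loaded so far.
        pass
    return driver.page_source


def scrape(url):
    """Hypothetical entry point: load `url` in Chrome and return parsed HTML."""
    # Imported here so scroll_to_end can be exercised without a browser installed.
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        html = scroll_to_end(driver)
    finally:
        driver.quit()
    return BeautifulSoup(html, "html.parser")
```

Keeping the scroll loop separate from browser start-up means it can be checked against a stand-in driver object, and `max_rounds` gives a way to cap very long feeds.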
The program extracts the following fields and writes them to a CSV file, with punctuation and other non-text characters removed:
- full tweet text from each Twitter page
- date
- header
- url
- user name
- popularity metrics (string containing retweets/favourites)
- like_fave: integer value for number of times 'favorited'
- share_rtwt: integer value for number of times 'retweeted'