WordPress Content Scraper

Collects posts and pages from a CSV list of WordPress URLs, spins them, then compiles them into a JSON file.

Requirements

This set of scripts is specifically designed to run on:

Python 3
Windows 10 (although it should work on Vista, 7 and 8)
macOS Monterey

Setup

Install Python 3 (on Windows, use the Python for Windows installer)
From the project root, run python setup.py
Add appropriate values to the .env file
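
As a rough illustration of the KEY=VALUE format a .env file normally uses, here is a minimal parsing sketch. The names EXAMPLE_KEY and EXAMPLE_URL are hypothetical placeholders, not the keys this project actually expects, and the scripts may load the file in a different way.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical sample content; replace with the keys your .env really needs.
sample = '# comment line\nEXAMPLE_KEY=abc123\nEXAMPLE_URL = "https://example.com"\n'
settings = parse_env(sample)
print(settings)
```

If a value looks wrong at runtime, check for stray quotes or spaces around the equals sign first.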

Running the "application"

This is done in 3 parts:

1. Download the articles

Compile a list of the URLs of all articles or pages you want to pull content from
Add a CSV file containing that list of URLs to the ./sources folder
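
The exact CSV layout the scraper expects is not documented here. The sketch below assumes the simplest case: one URL per row in the first column, with an optional header row. The function name read_urls and the example.com URLs are illustrative only.

```python
import csv
import io


def read_urls(csv_file):
    """Return the URLs found in the first column of an open CSV file."""
    urls = []
    for row in csv.reader(csv_file):
        if not row:
            continue  # blank line
        candidate = row[0].strip()
        # Skips header rows and anything that is not an http(s) URL.
        if candidate.startswith(("http://", "https://")):
            urls.append(candidate)
    return urls


# In-memory stand-in for a file placed in ./sources (assumed layout).
sample = io.StringIO(
    "url\nhttps://example.com/first-post/\n\nhttps://example.com/about/\n"
)
urls = read_urls(sample)
print(urls)
```

To try it against a real file, open it with open(path, newline="", encoding="utf-8") and pass the file object to read_urls.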

2. Spin and compile the articles

Using a terminal (bash, PowerShell or similar), navigate to ./scrapers
Run python scrape-press.py
Wait for the script to finish compiling the JSON file to the ./data folder
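
For orientation, the compiled file can be pictured as a JSON array with one record per article. This is only a sketch of that idea: the field names url, title and content are assumptions, and the real output of scrape-press.py may be shaped differently.

```python
import json


def compile_articles(articles):
    """Serialise a list of article dicts to a JSON string, one record each."""
    records = [
        {
            "url": article["url"],
            "title": article["title"].strip(),
            "content": article["content"].strip(),
        }
        for article in articles
    ]
    # ensure_ascii=False keeps accented characters readable in the file.
    return json.dumps(records, indent=2, ensure_ascii=False)


# Hypothetical scraped article used purely as sample input.
articles = [
    {
        "url": "https://example.com/first-post/",
        "title": " First post ",
        "content": "<p>Hello</p>\n",
    }
]
output = compile_articles(articles)
print(output)
```

A flat list of records like this is usually straightforward to map field by field in an importer.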

3. Import to your blog

Install a processor / importer on your blogging platform (if you're using WordPress, WP All Import is brilliant)
Upload the ./data/____.json file to the importer
Map the appropriate fields
Run your importer