Checkpoint #2
Command-line interface
- Wrap your web scraper so that it can be called with different arguments from the terminal.
- Examples of arguments you can use: different data types to scrape (e.g. hashtags on Instagram, product categories on Amazon), a timespan/dates to scrape (scrape only data created within a certain timespan, etc.), and different technical parameters (DB params, number of iterations to scrape, etc.).
- Use the click or argparse package (a minimal argparse sketch follows this list).
- Document the different CLI arguments in the README.md file, including their default values.
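
For example, argparse can expose these options as a small, self-documenting CLI. The sketch below is only illustrative: the argument names (`--category`, `--start-date`, `--end-date`, `--iterations`, `--db-host`) and their defaults are hypothetical placeholders, not required choices, and your scraper's entry point may look different.

```python
"""cli.py - illustrative argument parsing for the scraper (a sketch, not a required layout)."""
import argparse
from datetime import date


def parse_args(argv=None):
    # All argument names and defaults below are hypothetical examples.
    parser = argparse.ArgumentParser(description="Scrape data and store it in the database.")
    parser.add_argument("--category", default="books",
                        help="data category to scrape (default: %(default)s)")
    parser.add_argument("--start-date", type=date.fromisoformat, default=None,
                        help="only scrape items created on or after this date (YYYY-MM-DD)")
    parser.add_argument("--end-date", type=date.fromisoformat, default=None,
                        help="only scrape items created on or before this date (YYYY-MM-DD)")
    parser.add_argument("--iterations", type=int, default=10,
                        help="number of pages/iterations to scrape (default: %(default)s)")
    parser.add_argument("--db-host", default="localhost",
                        help="MySQL host to store results in (default: %(default)s)")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args)  # e.g. python cli.py --category electronics --iterations 5
```

The defaults shown in the help strings are the same values you should list in the CLI documentation in README.md.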
Database implementation
- Design an ERD for your data. Pay attention to which fields should be primary and foreign keys, and think about how you distinguish new entries from ones that already exist.
- Write a script that creates your database structure (Python or SQL). It should be separate from the main scraper code, but still part of the project and submitted with it (a schema-creation sketch follows this list).
- Add to your scraper the ability to store the data it scrapes in the database you designed. It should store only new data and avoid duplicates (a duplicate-safe insert sketch also follows this list).
- Work with a MySQL database.
- If you'd like, you can use an ORM such as SQLAlchemy.
- Add a DB documentation section to the README.md file, including an ERD diagram and an explanation of each table and its columns.
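
A schema-creation script can be as simple as a short Python file that replays your CREATE statements. The sketch below is an assumption-laden illustration: the database name (`scraper_db`), the tables (`products`, `reviews`), their columns, and the pymysql driver are all placeholders for whatever your own ERD and stack define.

```python
"""create_db.py - illustrative, standalone schema-creation script (kept separate from the scraper)."""
import pymysql

# Hypothetical schema: replace the database, table, and column names with your own ERD.
SCHEMA = [
    "CREATE DATABASE IF NOT EXISTS scraper_db",
    "USE scraper_db",
    """
    CREATE TABLE IF NOT EXISTS products (
        id INT AUTO_INCREMENT PRIMARY KEY,
        source_id VARCHAR(64) NOT NULL UNIQUE,  -- the site's own item id, used to spot duplicates
        name VARCHAR(255) NOT NULL,
        category VARCHAR(100),
        created_at DATETIME
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS reviews (
        id INT AUTO_INCREMENT PRIMARY KEY,
        product_id INT NOT NULL,
        rating TINYINT,
        review_text TEXT,
        FOREIGN KEY (product_id) REFERENCES products(id)
    )
    """,
]


def create_schema(host="localhost", user="root", password=""):
    # Connection parameters are assumptions; pass your real DB params (e.g. from the CLI).
    connection = pymysql.connect(host=host, user=user, password=password)
    try:
        with connection.cursor() as cursor:
            for statement in SCHEMA:
                cursor.execute(statement)
        connection.commit()
    finally:
        connection.close()


if __name__ == "__main__":
    create_schema()
```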
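
To store only new data, one common approach is to give the table a UNIQUE key on the source's own identifier and let MySQL skip rows it has already seen. The sketch below assumes the hypothetical `products` table and `source_id` column from the previous example; `INSERT IGNORE` could be swapped for `INSERT ... ON DUPLICATE KEY UPDATE` if you also want to refresh rows that already exist.

```python
"""Illustrative duplicate-safe insert, assuming the hypothetical products table above."""
import pymysql

# INSERT IGNORE skips rows whose source_id already exists (UNIQUE constraint),
# so re-running the scraper does not create duplicate entries.
INSERT_PRODUCT = """
    INSERT IGNORE INTO products (source_id, name, category, created_at)
    VALUES (%s, %s, %s, %s)
"""


def save_products(rows, host="localhost", user="root", password=""):
    """rows: iterable of (source_id, name, category, created_at) tuples."""
    connection = pymysql.connect(host=host, user=user, password=password, database="scraper_db")
    try:
        with connection.cursor() as cursor:
            cursor.executemany(INSERT_PRODUCT, rows)
        connection.commit()
    finally:
        connection.close()
```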