Web crawler - books

My first attempt at creating a web crawler in Python. It traverses an online book shop and grabs book titles and URLs.

Lessons Learned

While working on this project I have learned the following:

In addition, the code accounts for the fact that the same data can sometimes be scraped more than once, so duplicates are ignored.
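The duplicate handling described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each scraped book is a dict with `title` and `url` keys and treats the URL as the unique identifier. The function name `drop_duplicates` and the example.com URLs are hypothetical.

```python
def drop_duplicates(books):
    """Return the books in scrape order, keeping the first entry per URL."""
    seen_urls = set()
    unique = []
    for book in books:
        url = book["url"]  # assumption: the URL uniquely identifies a book
        if url in seen_urls:
            continue  # already scraped, so ignore the duplicate
        seen_urls.add(url)
        unique.append(book)
    return unique


# Hypothetical sample data: the first book was scraped twice.
scraped = [
    {"title": "Book A", "url": "https://example.com/book-a"},
    {"title": "Book B", "url": "https://example.com/book-b"},
    {"title": "Book A", "url": "https://example.com/book-a"},
]
unique_books = drop_duplicates(scraped)
print(len(unique_books))
```

Keying on the URL rather than the title is a design choice for this sketch, since two different books could plausibly share a title.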

Python: Download it from https://www.python.org/downloads/ or install it from the Microsoft Store.

venv:

pip install virtualenv

Scrapy:

python -m pip install scrapy

MongoDB: Download the installer for your platform from https://www.mongodb.com/docs/manual/installation/#mongodb-community-edition-installation-tutorials. Additionally, you will need to run the command below in cmd:

python -m pip install pymongo

To deploy this project, run the following commands in cmd:

venv\Scripts\activate.bat

or add it to your PATH

scrapy startproject books

test> use books_db
switched to db books_db
books_db> db.createCollection("books")
{ ok: 1 }
books_db> show collections
books
books_db>
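Once the `books` collection exists, scraped items could be written to it from Python with pymongo. The sketch below is one possible approach, not the project's confirmed implementation: the helper names `save_book` and `get_books_collection`, the `title`/`url` document shape, and the default local connection URI are all assumptions. It uses an upsert keyed on the URL so that re-scraping the same book does not create a second document.

```python
def save_book(collection, book):
    """Insert the book, or update it if a document with the same URL exists."""
    # upsert=True inserts a new document when no existing one matches the filter
    collection.update_one({"url": book["url"]}, {"$set": book}, upsert=True)


def get_books_collection(uri="mongodb://localhost:27017"):
    """Connect to MongoDB and return books_db.books (the URI is an assumption)."""
    # Imported here so save_book can be used without pymongo being loaded
    from pymongo import MongoClient

    return MongoClient(uri)["books_db"]["books"]
```

With a local MongoDB server running, a call might look like `save_book(get_books_collection(), {"title": "Book A", "url": "https://example.com/book-a"})`, where the title and URL are placeholder values.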