CLI
Aas1kk edited this page Sep 21, 2024
Follow these steps for the CLI tool. It requires:
- Python
- requests - For making HTTP requests.
- beautifulsoup4 - For parsing the HTML.
- lxml - A fast XML and HTML parser.
- beautifultable - For displaying scraped data in a table.
Ollama is integrated with this tool so that data parsing can be done according to your needs!
Clone the repository, install the dependencies, and run the tool using the following commands:
git clone https://github.com/aa-sikkkk/WebScrape.git
cd WebScrape
pip install -r requirements.txt
python scrap.py
Data Storage
{
  "scraped_data": {
    "alias_name": {
      "url": "http://example.com",
      "title": "Example Website",
      "all_anchor_href": [...],
      "all_anchors": [...],
      "all_images_data": [...],
      "all_images_source_data": [...],
      "all_h1_data": [...],
      "all_h2_data": [...],
      "all_h3_data": [...],
      "all_p_data": [...],
      "scraped_at": "dd/mm/yyyy hh:mm:ss",
      "status": true,
      "domain": "example.com"
    }
  }
}
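As a rough illustration of the layout above, the following sketch builds one entry under an alias, serializes it to JSON, and reads it back. The helper names `build_record` and `add_record` are hypothetical and are not taken from `scrap.py`. Only a few of the fields are filled in, and how the tool itself writes its storage may differ.

```python
import json
from datetime import datetime
from urllib.parse import urlparse


def build_record(url, title, paragraphs):
    """Build one entry shaped like the layout above (hypothetical helper).

    Only a subset of the documented fields is populated here.
    """
    return {
        "url": url,
        "title": title,
        "all_p_data": list(paragraphs),
        # Timestamp in the dd/mm/yyyy hh:mm:ss form shown in the layout.
        "scraped_at": datetime.now().strftime("%d/%m/%Y %H:%M:%S"),
        "status": True,
        # Domain taken from the URL's network location.
        "domain": urlparse(url).netloc,
    }


def add_record(store, alias, record):
    """Place a record under its alias inside the top-level "scraped_data" key."""
    store.setdefault("scraped_data", {})[alias] = record
    return store


store = add_record(
    {},
    "alias_name",
    build_record("http://example.com", "Example Website", ["Hello"]),
)

# Round-trip through JSON text, as a file on disk would.
text = json.dumps(store, indent=2)
loaded = json.loads(text)

entry = loaded["scraped_data"]["alias_name"]
print(entry["domain"])
```

Keying entries by alias means a stored page can be looked up directly with `loaded["scraped_data"]["alias_name"]` without scanning a list.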