CLI

Follow these steps to set up and run the CLI tool.

Requirements

Python
requests - For making HTTP requests.
beautifulsoup4 - For parsing the HTML.
lxml - A fast XML and HTML parser.
beautifultable - For displaying scraped data in a table.

Ollama is integrated with this tool so that data parsing can be done according to your needs!

Clone the repository, install the dependencies, and run the tool using the following commands:

git clone https://github.com/aa-sikkkk/WebScrape.git
cd WebScrape

pip install -r requirements.txt

python scrap.py

Data Storage

Scraped data is stored as JSON in the following structure:

{
    "scraped_data": {
        "alias_name": {
            "url": "http://example.com",
            "title": "Example Website",
            "all_anchor_href": [...],
            "all_anchors": [...],
            "all_images_data": [...],
            "all_images_source_data": [...],
            "all_h1_data": [...],
            "all_h2_data": [...],
            "all_h3_data": [...],
            "all_p_data": [...],
            "scraped_at": "dd/mm/yyyy hh:mm:ss",
            "status": true,
            "domain": "example.com"
        }
    }
}
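The structure above can be read back with Python's standard library. The following is a minimal sketch, not the tool's own code: the page does not name the storage file, so the JSON is held in a string here, and the helper names (`load_records`, `parse_scraped_at`, `successful_aliases`) are hypothetical. It also assumes that `"status": true` marks a successful scrape and that `scraped_at` uses a 24-hour clock.

```python
import json
from datetime import datetime

# Sample document in the shape shown above; all values are made up.
SAMPLE = """
{
    "scraped_data": {
        "example": {
            "url": "http://example.com",
            "title": "Example Website",
            "all_anchor_href": ["http://example.com/about"],
            "all_h1_data": ["Example Domain"],
            "scraped_at": "05/03/2024 14:30:00",
            "status": true,
            "domain": "example.com"
        }
    }
}
"""


def load_records(text):
    """Return the alias -> record mapping stored under "scraped_data"."""
    return json.loads(text).get("scraped_data", {})


def parse_scraped_at(record):
    """Convert the "dd/mm/yyyy hh:mm:ss" timestamp into a datetime.

    Assumes a 24-hour clock, which the page does not state explicitly.
    """
    return datetime.strptime(record["scraped_at"], "%d/%m/%Y %H:%M:%S")


def successful_aliases(records):
    """List aliases whose record has a truthy "status" (assumed: success)."""
    return sorted(alias for alias, rec in records.items() if rec.get("status"))


records = load_records(SAMPLE)
for alias in successful_aliases(records):
    rec = records[alias]
    print(alias, rec["domain"], parse_scraped_at(rec).isoformat())
```

To use this against real output, replace `SAMPLE` with the contents of whichever file the tool writes, for example via `open(path).read()`.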
