Sentinel Crawler is an advanced web crawling tool designed for Open Source Intelligence (OSINT) professionals, security analysts, and investigators. It automates the systematic collection of web-based intelligence, extracting relevant URLs, analyzing structured content, and leveraging AI for in-depth analysis.
With capabilities including user-agent rotation, depth-based crawling, keyword identification, and AI-powered analysis, Sentinel Crawler enables precise data extraction for intelligence operations.
- Configurable Crawling Parameters – Define depth, URL limits, and seed URLs for targeted intelligence gathering.
- User-Agent Randomization – Rotates the User-Agent header between requests to reduce the likelihood of detection and request blocking.
- Keyword Detection & Content Parsing – Identify terms related to credentials, personal data, and operational intelligence.
- AI-Enhanced Content Analysis – Integrates Ollama Gemma for automated text evaluation and anomaly detection.
- Data Extraction & Comparison – Identifies significant differences between raw and analyzed content.
- Secure Output Storage – Retains all processed intelligence in structured reports.
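The crawling features above (depth and URL limits, User-Agent randomization, keyword detection) can be illustrated with a minimal sketch. This is not the code in OSINTai3.2.py: the names `crawl`, `fetch`, `LinkParser`, and `USER_AGENTS` are hypothetical, the seed URL is assumed to sit at depth 0, and the standard library stands in for requests and beautifulsoup4 so the example runs without extra installs.

```python
import random
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen

# Hypothetical pool of User-Agent strings; a real run would use a longer list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.0 Safari/605.1.15",
]


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url, timeout=10):
    """Fetch a page, sending a randomly chosen User-Agent header."""
    req = Request(url, headers={"User-Agent": random.choice(USER_AGENTS)})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed, max_depth, max_urls, terms=(), fetch_fn=fetch):
    """Breadth-first crawl. Returns {url: [search terms found on that page]}."""
    seen = {seed}
    queue = deque([(seed, 0)])  # assumption: the seed counts as depth 0
    results = {}
    while queue and len(results) < max_urls:
        url, depth = queue.popleft()
        try:
            html = fetch_fn(url)
        except Exception:
            continue  # skip pages that fail to load
        lowered = html.lower()
        results[url] = [t for t in terms if t.lower() in lowered]
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return results
```

Passing `fetch_fn` as a parameter lets the crawl logic be exercised against canned pages without touching the network, for example `crawl("https://example.com", 3, 100, ["email", "password"])` for a live run or a lambda over a dictionary for a dry run.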
Ensure you have Python 3.7+ installed.
git clone https://github.com/yourusername/OSINTai.git
cd OSINTai
python -m venv osintaiENV
source osintaiENV/bin/activate # macOS/Linux
osintaiENV\Scripts\activate # Windows
pip install -r requirements.txt
(If requirements.txt is unavailable, install the dependencies manually:)
pip install beautifulsoup4 requests
brew install ollama # macOS (Homebrew)
Execute the crawler interactively:
python OSINTai3.2.py
Enter the seed URL to start crawling: https://example.com
Enter the maximum depth for crawling: 3
Enter the maximum number of URLs to crawl: 100
Enter search terms (comma-separated) or press enter to skip: email, username, password
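One way the answers to these prompts might be validated is sketched below. The helper names `parse_terms`, `parse_int`, and `read_config` are hypothetical and not taken from OSINTai3.2.py; the lower bounds (depth of at least 0, at least one URL) are likewise assumptions.

```python
def parse_terms(raw):
    """Split a comma-separated string into trimmed, lowercase search terms."""
    return [t.strip().lower() for t in raw.split(",") if t.strip()]


def parse_int(raw, name, minimum):
    """Convert a prompt answer to int and enforce a lower bound."""
    value = int(raw.strip())  # raises ValueError for non-numeric input
    if value < minimum:
        raise ValueError(f"{name} must be at least {minimum}, got {value}")
    return value


def read_config(ask=input):
    """Ask the four questions in order and return the parsed settings."""
    return {
        "seed": ask("Enter the seed URL to start crawling: ").strip(),
        "max_depth": parse_int(
            ask("Enter the maximum depth for crawling: "), "depth", 0
        ),
        "max_urls": parse_int(
            ask("Enter the maximum number of URLs to crawl: "), "URL limit", 1
        ),
        "terms": parse_terms(
            ask("Enter search terms (comma-separated) or press enter to skip: ")
        ),
    }
```

Pressing enter at the last prompt yields an empty list of terms, which a crawler can treat as "skip keyword detection".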
- Extracted and categorized URLs
- Content intelligence analysis using Ollama Gemma
- Structural and semantic variations in extracted data
- Generated reports stored as structured text files
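As a rough illustration of the analysis and comparison outputs listed above, the sketch below sends page text to a locally running Ollama server over its REST API and diffs the raw text against the model's response. How OSINTai3.2.py actually invokes the model is not shown in this README, so the function names, the prompt wording, and the `gemma` model tag are all assumptions; the model must already be available to the local Ollama instance.

```python
import difflib
import json
from urllib.request import Request, urlopen

# Ollama's default local endpoint for single-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(text, model="gemma"):
    """Build the POST request; the prompt wording here is only an example."""
    payload = {
        "model": model,  # assumed tag; use whichever Gemma variant is installed
        "prompt": "Summarize the key findings in this page text:\n\n" + text,
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def analyze(text, model="gemma", timeout=120):
    """Send text to the local Ollama server and return the generated reply."""
    with urlopen(build_request(text, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


def diff_report(raw, analyzed):
    """Return a unified diff between raw and analyzed text ('' if identical)."""
    lines = difflib.unified_diff(
        raw.splitlines(),
        analyzed.splitlines(),
        fromfile="raw",
        tofile="analyzed",
        lineterm="",
    )
    return "\n".join(lines)
```

The string returned by `diff_report(raw, analyze(raw))` could then be written to a text file alongside the crawled URLs to form a report.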
- Precision OSINT Gathering – Automates intelligence collection efficiently.
- Obfuscation Features – Employs randomized User-Agents to reduce the likelihood of detection.
- AI-Driven Intelligence Processing – Enhances raw data with contextual analysis.
- Customizable Crawling – Define search parameters for tailored intelligence needs.
This tool is strictly for ethical intelligence research and compliance-driven investigations. Unauthorized data extraction or use of this tool for illicit activities is prohibited. The author assumes no liability for misuse.
This project is distributed under the MIT License.
Contributions are encouraged! Submit pull requests to improve functionality and security.