- List of OCR Datasets
- Images and OCR Texts from Medieval Documents
- inv3d - A high-resolution 3D invoice dataset for template-guided single-image document unwarping
- IAM Handwriting Database
- FUNSD - Form Understanding in Noisy Scanned Documents
- Real World Document Collections - 16-class document classification dataset
- COCO Text 2
- SynthText
- IIIT 5K-word - 5000 cropped word images from scene text and born-digital images
- v7 Labs Open Data - Collection of open CV datasets for different tasks
- List of German NLP Datasets - Also lists for other languages
- ConceptNet - A knowledge graph that connects words and phrases of natural language
- Huggingface Datasets - The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
- BigML Datasets
- OpenML Datasets
- Google Datasearch - Google's search engine for datasets
- Datasets for Chatbot training
- NetworkRepository - Repository of graph datasets categorized by type and domain
- Apify - Provides a free tier
- BrightData - Provides a free tier
- Webscraper.io - List of test crawling sites
- To Scrape Sandboxes - Fictitious bookstore and quotes site to scrape