GitHub - qaraqalpaq/TesseractOCR-API

# Tesseract OCR Flask API Documentation

This document provides detailed information about the API endpoints of the Flask application designed for OCR (Optical Character Recognition) using Tesseract.



## API Endpoints

### 1. Get Supported Languages

- **Endpoint:** `/languages`
- **Method:** `GET`
- **Description:** Returns a list of languages supported by the OCR service.
- **Response:**
  - **200 OK:** JSON containing the supported languages and their details.
  - **404 Not Found:** Error message if the `languages.json` file is not found.
  - **500 Internal Server Error:** Error message for any other internal errors.
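As a minimal client sketch using only the Python standard library, the following fetches `/languages` and maps each `language_code` to its `language_name`, using the field names from the example response under Usage. The base URL `http://localhost:5000` (Flask's default development address) and the helper names `parse_languages` and `get_languages` are assumptions for illustration; adjust the URL to match how you run the service. Note that `urlopen` raises `urllib.error.HTTPError` for the 404 and 500 responses listed above.

```python
import json
import urllib.request

# Assumption: Flask's default development address; change to match your deployment.
BASE_URL = "http://localhost:5000"


def parse_languages(payload: bytes) -> dict[str, str]:
    """Map each language_code to its language_name from a /languages response body."""
    entries = json.loads(payload)
    return {entry["language_code"]: entry["language_name"] for entry in entries}


def get_languages(base_url: str = BASE_URL) -> dict[str, str]:
    """GET /languages; raises urllib.error.HTTPError on 404 or 500."""
    with urllib.request.urlopen(f"{base_url}/languages") as response:
        return parse_languages(response.read())
```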

### 2. Upload File for OCR

- **Endpoint:** `/upload`
- **Method:** `POST`
- **Description:** Uploads a file (PDF or image) for OCR processing.
- **Input:**
  - Form-data with:
    - `file`: The file to be processed.
    - `language` (optional): Language code for OCR (defaults to `eng`).
- **Response:**
  - **202 Accepted:** JSON containing the task ID and file UUID upon successful receipt of the file.
  - **400 Bad Request:** Error message if no file is provided or if the file is invalid.
  - **500 Internal Server Error:** Error message for any processing errors.
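The `/upload` endpoint takes form-data, so a client has to send a `multipart/form-data` body. The sketch below builds that body by hand with the standard library to avoid third-party dependencies; the helper names, the base URL, and the field order are assumptions, not part of this project's code. It sends the `file` and `language` fields described above and returns the decoded JSON from the 202 response.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Assumption: Flask's default development address; change to match your deployment.
BASE_URL = "http://localhost:5000"


def build_upload_request(
    filename: str, content: bytes, language: str = "eng", base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a multipart/form-data POST to /upload with `file` and `language` fields."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the file bytes
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="language"\r\n\r\n',
        language.encode() + b"\r\n",
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        content + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/upload",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def upload(path: str, language: str = "eng", base_url: str = BASE_URL) -> dict:
    """Upload a PDF or image and return the JSON body (message, task_id, file_uuid)."""
    with open(path, "rb") as fh:
        request = build_upload_request(os.path.basename(path), fh.read(), language, base_url)
    # 202 Accepted is a 2xx status, so urlopen returns normally; 400/500 raise HTTPError.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

For example, `upload("sample.pdf", language="eng")` should return a dict containing `task_id` and `file_uuid`.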

### 3. Download Processed Text

- **Endpoint:** `/download/<filename>`
- **Method:** `GET`
- **Description:** Downloads the OCR-processed text file.
- **URL Parameters:**
  - `filename`: The name of the text file to download (for example, `file-uuid.txt`).
- **Response:**
  - **200 OK:** The requested text file.
  - **404 Not Found:** Error message if the requested file is not found.
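A matching download helper follows. It assumes, based on the Usage example (`GET /download/file-uuid.txt`), that the processed text is named `<file_uuid>.txt` and is UTF-8 encoded; both are assumptions, and the helper names `download_url` and `download_text` are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: Flask's default development address; change to match your deployment.
BASE_URL = "http://localhost:5000"


def download_url(file_uuid: str, base_url: str = BASE_URL) -> str:
    """Build the /download URL, assuming the output is named <file_uuid>.txt."""
    return f"{base_url}/download/{urllib.parse.quote(file_uuid)}.txt"


def download_text(file_uuid: str, base_url: str = BASE_URL, encoding: str = "utf-8") -> str:
    """GET the processed text; raises urllib.error.HTTPError (e.g. 404) if it is missing."""
    with urllib.request.urlopen(download_url(file_uuid, base_url)) as response:
        return response.read().decode(encoding)
```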

## Usage

### Getting Supported Languages

Request:

```
GET /languages
```

Response:

```json
[
  {
    "language_code": "eng",
    "language_name": "English",
    "traineddata_link": "https://..."
  },
  ...
]
```

### Uploading a File for OCR

Request:

```
POST /upload
Form-data:
  file: [file content]
  language: 'eng' (optional)
```

Response:

```json
{
  "message": "File received",
  "task_id": "task-uuid",
  "file_uuid": "file-uuid"
}
```

### Downloading Processed Text File

Request:

```
GET /download/file-uuid.txt
```

Response:

The processed text file, returned as a download.

## Notes

- The OCR process is asynchronous. After uploading a file, use the returned `file_uuid` to download the processed text from `/download/<file_uuid>.txt` once it is ready.
- Make sure the uploaded file is in a supported format (PDF or image).
- For the list of supported languages, refer to the response from the `/languages` endpoint.
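Because processing is asynchronous, a client typically polls the download endpoint until the text exists. This sketch assumes the server answers `404 Not Found` until the output file has been written (consistent with the `/download` responses documented above, but not confirmed), and that the file is named `<file_uuid>.txt`. The helper names, the base URL, and the interval and timeout defaults are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Assumption: Flask's default development address; change to match your deployment.
BASE_URL = "http://localhost:5000"


def fetch_text(url: str) -> Optional[str]:
    """Return the body of `url` as text, or None if the server answers 404."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Assumption: 404 means the OCR output has not been written yet.
        if err.code == 404:
            return None
        raise


def wait_for_text(
    file_uuid: str,
    base_url: str = BASE_URL,
    fetch: Callable[[str], Optional[str]] = fetch_text,
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll /download/<file_uuid>.txt until it returns text or `timeout` seconds pass."""
    url = f"{base_url}/download/{file_uuid}.txt"
    deadline = time.monotonic() + timeout
    while True:
        text = fetch(url)
        if text is not None:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} was not ready after {timeout} seconds")
        sleep(interval)
```

Passing `fetch` and `sleep` as parameters keeps the retry logic separate from network I/O, so the loop can be exercised without a running server.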

