Transcript Service

Overview

The Transcript Service is a Flask-based web application for fetching, managing, and synchronizing transcripts of YouTube videos. It integrates with the YouTube Data API, the YouTube Transcript API, the Claude API for translation, and Eleven Labs for audio generation. It can fetch transcripts for a whole channel or a single video, translate transcripts into English, and generate audio from transcript text.

Features

Fetch Transcripts: Retrieve transcripts for videos from a specified YouTube channel.
Translation: Translate transcripts into English using the Claude API (off by default).
Audio Generation: Generate audio files from transcripts using Eleven Labs.
Job Management: Track the status of transcript fetching jobs.
S3 Synchronization: Sync training data to an AWS S3 bucket.
Web Interface: A user-friendly web interface to interact with the service.

Setup

Prerequisites

Python 3.7 or higher
Flask
AWS account (for S3 synchronization)
YouTube Data API key
Anthropic API key (for translation)
Eleven Labs API key (for audio generation)

Installation

Clone the repository:
```
git clone <repository-url>
cd <repository-directory>
```
Create a virtual environment (optional but recommended):
```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install dependencies:
```
pip install -r requirements.txt
```
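Set up environment variables: Create a .env file in the root directory by copying the provided example, then fill in your own values (for example, the API keys listed under Prerequisites):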
```
cp .env.example .env
```
Run the application: You can start the Flask application using the provided Makefile:
```
make service
```

Usage

Fetching Transcripts

To fetch transcripts for a specific YouTube channel, you can use the following endpoint:

GET /transcripts?channel_name=<channel_name>&author=<author_name>
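A minimal Python sketch of calling this endpoint with only the standard library. It assumes the service is running at http://localhost:5000 (the address given under Web Interface), that `author` may be omitted as it can be in the CLI, and that the response body can be read as text. The function names are illustrative and not part of the service.

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:5000"  # assumption: local address from the Web Interface section


def build_transcripts_url(channel_name: str, author: Optional[str] = None,
                          base_url: str = BASE_URL) -> str:
    """Build the GET /transcripts URL with percent-encoded query parameters."""
    params = {"channel_name": channel_name}
    if author is not None:  # assumption: author is optional, as in the CLI
        params["author"] = author
    return f"{base_url}/transcripts?{urllib.parse.urlencode(params)}"


def fetch_transcripts(channel_name: str, author: Optional[str] = None) -> str:
    """Send the request and return the raw response body (its format is not documented here)."""
    url = build_transcripts_url(channel_name, author)
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_transcripts() once the service is running.
    print(build_transcripts_url("Example Channel", author="Jane Doe"))
```

Encoding the parameters with `urlencode` keeps channel or author names that contain spaces or punctuation from breaking the query string.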

Viewing Transcripts

To view a specific transcript, navigate to:

GET /transcripts/view/<video_id>

Generating Audio

To generate audio from a transcript, use the following endpoint:

GET /generate_audio/<video_id>/<type>

Syncing Training Data

To synchronize training data to your S3 bucket, use the following endpoint:

POST /sync_training_data
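Because this endpoint expects POST rather than GET, opening it in a browser will not trigger a sync. The sketch below shows one way to send the request from Python. It assumes the service listens on http://localhost:5000 and that the endpoint needs no request body; both are assumptions, and the helper names are hypothetical.

```python
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: local address from the Web Interface section


def build_sync_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a POST request for /sync_training_data."""
    # Assumption: no payload is required, so an empty body is sent.
    return urllib.request.Request(
        f"{base_url}/sync_training_data", data=b"", method="POST"
    )


def sync_training_data(base_url: str = BASE_URL) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_sync_request(base_url), timeout=300) as resp:
        return resp.status
```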

Job Status

To check the status of a transcript fetching job, use:

GET /job_status/<job_id>
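A fetching job may take a while, so a client will usually poll this endpoint. The sketch below is one possible polling loop. The response format is not documented here, so the caller supplies an `is_done` predicate that decides when a response body means the job has finished. The base URL, function names, and default intervals are all assumptions.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:5000"  # assumption: local address from the Web Interface section


def fetch_job_status(job_id: str, base_url: str = BASE_URL) -> str:
    """Return the raw response body of GET /job_status/<job_id>."""
    url = f"{base_url}/job_status/{urllib.parse.quote(job_id, safe='')}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def poll_until(fetch: Callable[[], str], is_done: Callable[[str], bool],
               interval: float = 5.0, max_attempts: int = 60) -> str:
    """Call fetch() repeatedly until is_done(body) is true, then return that body."""
    for attempt in range(max_attempts):
        body = fetch()
        if is_done(body):
            return body
        if attempt < max_attempts - 1:
            time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"job not finished after {max_attempts} attempts")
```

A call might look like `poll_until(lambda: fetch_job_status(job_id), is_done=my_check)`, where `my_check` inspects the body for whatever completion marker the service actually returns.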

Fetching a Single Transcript

To fetch a single transcript for a given video URL, use the following endpoint. The translate parameter defaults to false:

GET /single_transcript?url=<video_url>&translate=<true|false>
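The video URL itself contains characters such as `?` and `=`, so it has to be percent-encoded before it is passed as the `url` parameter. The following sketch builds the request URL under the assumption that the service runs at http://localhost:5000; the function name is illustrative only.

```python
import urllib.parse

BASE_URL = "http://localhost:5000"  # assumption: local address from the Web Interface section


def build_single_transcript_url(video_url: str, translate: bool = False,
                                base_url: str = BASE_URL) -> str:
    """Build the GET /single_transcript URL with the video URL safely encoded."""
    params = {
        "url": video_url,
        # The endpoint documents the values "true" and "false".
        "translate": "true" if translate else "false",
    }
    return f"{base_url}/single_transcript?{urllib.parse.urlencode(params)}"


if __name__ == "__main__":
    # VIDEO_ID is a placeholder, not a real video.
    print(build_single_transcript_url("https://www.youtube.com/watch?v=VIDEO_ID"))
```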

CLI Usage

The Transcript Service also provides a command-line interface (CLI) for interacting with the service. You can access the CLI commands by running:

python transcript_service.py

Available Commands

fetch-transcripts: Fetch transcripts for a given YouTube channel.

Usage:

python transcript_service.py fetch-transcripts --channel_name <channel_name> [--author <author_name>]

fetch-single-transcript: Fetch a single transcript for a given video URL.

Usage:

python transcript_service.py fetch-single-transcript <video_url> [--translate]

check-job-status: Check the status of a transcript fetching job.

Usage:
```
python transcript_service.py check-job-status <job_id>
```
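The CLI commands above can also be driven from another Python script. The sketch below assembles the `fetch-transcripts` command from the flags shown in this section and runs it with `subprocess`. It assumes the script is invoked from the repository root; the helper names are hypothetical, and what the command prints is not documented here.

```python
import subprocess
import sys
from typing import List, Optional

SCRIPT = "transcript_service.py"  # assumption: run from the repository root


def fetch_transcripts_command(channel_name: str,
                              author: Optional[str] = None) -> List[str]:
    """Build the argument list for the fetch-transcripts command."""
    cmd = [sys.executable, SCRIPT, "fetch-transcripts", "--channel_name", channel_name]
    if author is not None:  # --author is optional in the CLI usage line
        cmd += ["--author", author]
    return cmd


def run(cmd: List[str]) -> str:
    """Run the command and return its standard output; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when a channel name contains spaces.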

Web Interface

The application provides a web interface that can be accessed at http://localhost:5000. You can use this interface to interact with the various features of the service.

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue for any enhancements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.
