This Python script transcribes an audio file with OpenAI's Whisper ASR model and then feeds the raw transcription to GPT-4 to correct any spelling or grammar mistakes, producing a corrected transcript.
This script requires the openai Python library. You can install it using pip:
pip install openai
You'll need an API key from OpenAI to use their services. Once you have a key, set it as an environment variable:
export OPENAI_API_KEY='your-api-key'
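The openai Python client reads this variable automatically, so the key never needs to be hard-coded. A minimal sketch, assuming the v1 openai client:

import os
from openai import OpenAI

# The client picks up OPENAI_API_KEY from the environment on its own;
# passing it explicitly here only makes the dependency visible.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])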
- The script first checks that files named 'subtitle-correction-prompt.txt' and 'subtitle-transcription-prompt.txt' exist in the same directory. These files should contain the system prompts for the correction model and the subtitle creation model, respectively.
- It then transcribes an audio file using the 'whisper-1' model from the OpenAI API.
- The transcription is then processed by the GPT-4 model to generate a corrected transcript, as sketched below.
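A rough sketch of these two steps, assuming the v1 openai Python client; the helper names mirror the parameters documented below, but the script's actual internals may differ:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transcribe(audio_path):
    # Step 1: raw transcription with the whisper-1 model.
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    return result.text

def generate_corrected_transcript(temperature, system_prompt, audio_file):
    # Step 2: ask GPT-4 to fix spelling and grammar in the raw transcript.
    raw_text = transcribe(audio_file)
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=temperature,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content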
The script requires the following parameters:

- temperature: The temperature setting for the AI model.
- system_prompt: The system prompt for the AI model, read from the 'subtitle-correction-prompt.txt' file.
- audio_file: The audio file to transcribe.

These parameters should be passed to the generate_corrected_transcript function.
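For example (the prompt is read from 'subtitle-correction-prompt.txt' as described above; the audio filename and the temperature of 0 are just placeholder values):

with open("subtitle-correction-prompt.txt") as f:
    system_prompt = f.read()

corrected = generate_corrected_transcript(
    temperature=0.0,
    system_prompt=system_prompt,
    audio_file="example-talk.mp3",  # placeholder path
)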
- Make sure you have a file named 'subtitle-correction-prompt.txt' in the same directory as the script. This file should contain the system prompt used by the AI model to correct the text of the initial subtitle result.
- Set up a virtual environment:
python -m venv .venv
- Activate it:
source .venv/bin/activate
- Install requirements:
pip install -r requirements.txt
- Run the script:
python audio_to_srt.py
The transcriptions are saved to files in SubRip (.srt) format. Both the original and post-processed transcriptions are saved for comparison.
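A sketch of how timed segments can be written in SubRip format (the segment structure and helper below are illustrative assumptions, not necessarily the script's own code):

def format_timestamp(seconds):
    # SubRip timestamps look like HH:MM:SS,mmm
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def write_srt(segments, path):
    # segments: iterable of (start_seconds, end_seconds, text) tuples
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, text) in enumerate(segments, start=1):
            f.write(f"{i}\n")
            f.write(f"{format_timestamp(start)} --> {format_timestamp(end)}\n")
            f.write(f"{text.strip()}\n\n")

Note that the OpenAI transcription endpoint can also return SubRip directly when response_format="srt" is requested, which avoids hand-rolling the timestamp formatting.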
- Accept a YouTube link
- Parse audio from the YouTube video
- Create a simple web front end or API endpoint