Commit a16ed03
Michelle Asuamah authored and committed on Feb 28, 2025
1 parent: f06d580

Showing 3 changed files with 212 additions and 2 deletions.
fern/pages/01-getting-started/transcribe-an-audio-file/_category_.json
12 changes: 12 additions & 0 deletions
```json
{
  "position": 2,
  "label": "Transcribe a pre-recorded audio file",
  "collapsible": true,
  "collapsed": true,
  "link": {
    "type": "generated-index",
    "title": "Transcribe a pre-recorded audio file",
    "description": "Learn how to transcribe and analyze an audio file.",
    "slug": "/getting-started/transcribe-an-audio-file"
  }
}
```
fern/pages/01-getting-started/transcribe-an-audio-file/python.mdx
193 changes: 193 additions & 0 deletions
---
title: 'Transcribe a pre-recorded audio file in Python'
subtitle: 'Learn how to transcribe and analyze an audio file in Python.'
hide-nav-links: true
description: 'Learn how to transcribe and analyze an audio file in Python.'
---

<Info title="Universal-2 is live">
  Dive into our research paper to see how we're redefining speech AI accuracy. Read more [here](https://www.assemblyai.com/research/universal-2).
</Info>

## Overview

By the end of this tutorial, you'll be able to:

- Transcribe a pre-recorded audio file.
- Enable [Speaker Diarization](/docs/speech-to-text/speaker-diarization) to detect speakers in an audio file.

Here's the full sample code for what you'll build in this tutorial:

```python
import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

transcriber = aai.Transcriber()

# You can use a local filepath:
# audio_file = "./example.mp3"

# Or use a publicly-accessible URL:
audio_file = (
    "https://assembly.ai/sports_injuries.mp3"
)

config = aai.TranscriptionConfig(speaker_labels=True)

transcript = transcriber.transcribe(audio_file, config)

if transcript.status == aai.TranscriptStatus.error:
    print(f"Transcription failed: {transcript.error}")
    exit(1)

print(transcript.text)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```

## Before you begin

To complete this tutorial, you need:

- [Python](https://www.python.org/) installed.
- A <a href="https://www.assemblyai.com/dashboard/signup" target="_blank">free AssemblyAI account</a>.

## Step 1: Install the SDK

Install the package via pip:

```bash
pip install assemblyai
```

## Step 2: Configure the SDK

In this step, you'll create an SDK client and configure it to use your API key.

<Steps>
<Step>
Browse to <a href="https://www.assemblyai.com/app/account" target="_blank">Account</a>, and then click the text under **Your API key** to copy it.
</Step>

<Step>
Create a new `Transcriber` and configure it to use your API key. Replace `YOUR_API_KEY` with your copied API key.

```python
import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

transcriber = aai.Transcriber()
```
</Step>
</Steps>
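
If you prefer not to hardcode the key, here is a minimal sketch of reading it from an environment variable instead. The variable name `ASSEMBLYAI_API_KEY` is our own convention for this example; the SDK does not read it automatically.

```python
import os

import assemblyai as aai

# Read the key from an environment variable instead of hardcoding it.
# ASSEMBLYAI_API_KEY is an assumed name for this sketch; the SDK does not
# pick it up on its own, so we assign it explicitly.
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

transcriber = aai.Transcriber()
```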

## Step 3: Submit audio for transcription

In this step, you'll submit the audio file for transcription and wait until it completes. The time it takes to process an audio file depends on its duration and the enabled models. Most transcriptions complete within 45 seconds.

<Steps>
<Step>

Specify a URL to the audio you want to transcribe. The URL needs to be accessible from AssemblyAI's servers. For a list of supported formats, see the [FAQ](https://support.assemblyai.com/).

```python
audio_file = "https://assembly.ai/sports_injuries.mp3"
```

<Note title="Local audio files">
If you want to use a local file, you can also specify a local path, for example:

```python
audio_file = "./example.mp3"
```
</Note>

<Note title="YouTube">

YouTube URLs are not supported. If you want to transcribe a YouTube video, you need to download the audio first.

</Note>
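
For illustration, one way to download that audio is to shell out to the third-party `yt-dlp` tool. This is only a sketch: it assumes `yt-dlp` and `ffmpeg` are installed on your machine (neither ships with the AssemblyAI SDK), and `VIDEO_ID` is a placeholder.

```python
import subprocess

# Download only the audio track of a video as an MP3 (sketch; requires the
# separate yt-dlp and ffmpeg tools, and VIDEO_ID is a placeholder).
subprocess.run(
    [
        "yt-dlp",
        "-x",                     # extract audio only
        "--audio-format", "mp3",  # convert to MP3 via ffmpeg
        "-o", "example.mp3",      # output filename
        "https://www.youtube.com/watch?v=VIDEO_ID",
    ],
    check=True,
)
```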

</Step>
<Step>
To generate the transcript, pass the audio URL to `transcriber.transcribe()`. This may take a minute while we process the audio.

```python
transcript = transcriber.transcribe(audio_file)
```

<Tip title="Select the speech model">
You can select the class of models to use to make the cost-performance tradeoff that best suits your application. See [Select the speech model](/docs/speech-to-text/pre-recorded-audio#select-the-speech-model-with-best-and-nano).
</Tip>
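
As a rough sketch of what that can look like in the Python SDK (assuming your installed SDK version exposes the `aai.SpeechModel` enum):

```python
# Sketch: opt into the lower-cost "nano" model class via the config.
# Assumes aai.SpeechModel is available in your installed SDK version.
config = aai.TranscriptionConfig(speech_model=aai.SpeechModel.nano)
transcript = transcriber.transcribe(audio_file, config)
```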
</Step>
<Step>
If the transcription failed, the `status` of the transcript will be set to `error`. To see why it failed, you can print the value of `error`.

```python
if transcript.status == aai.TranscriptStatus.error:
    print(f"Transcription failed: {transcript.error}")
    exit(1)
```
</Step>
<Step>

Print the complete transcript.

```python
print(transcript.text)
```
</Step>
<Step>
Run the application and wait for it to finish.
</Step>
</Steps>

You've successfully transcribed your first audio file. You can see all submitted transcription jobs in the <a href="https://www.assemblyai.com/app/processing-queue" target="_blank">Processing queue</a>.

## Step 4: Enable additional AI models

You can extract even more insights from the audio by enabling any of our [AI models](/audio-intelligence) using _transcription options_. In this step, you'll enable the [Speaker diarization](/docs/speech-to-text/speaker-diarization) model to detect who said what.

<Steps>
<Step>
Create a `TranscriptionConfig` with `speaker_labels` set to `True`, and then pass it as the second argument to `transcribe()`.

```python
config = aai.TranscriptionConfig(speaker_labels=True)

transcript = transcriber.transcribe(audio_file, config)
```
</Step>
<Step>
In addition to the full transcript, you now have access to utterances from each speaker.

```python
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```
</Step>
</Steps>
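
As a small follow-up sketch, you could aggregate those utterances, for example to count how often each detected speaker talks. This reuses the `transcript` object from the step above.

```python
from collections import Counter

# Tally utterances per detected speaker for a rough sense of who spoke most.
utterances_per_speaker = Counter(u.speaker for u in transcript.utterances)

for speaker, count in utterances_per_speaker.most_common():
    print(f"Speaker {speaker}: {count} utterances")
```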

Many of the properties in the transcript object only become available after you enable the corresponding model. For more information, see the models under [Speech-to-Text](/speech-to-text) and [Audio Intelligence](/audio-intelligence).
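
For example, here is a defensive sketch that assumes speaker-level fields are simply unset (rather than raising) when the model wasn't enabled:

```python
# Sketch: guard against speaker-level data being absent when
# speaker_labels was not enabled on the transcription config.
if transcript.utterances is None:
    print("Enable speaker_labels to get per-speaker utterances.")
else:
    for utterance in transcript.utterances:
        print(f"Speaker {utterance.speaker}: {utterance.text}")
```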

## Next steps

In this tutorial, you've learned how to generate a transcript for an audio file and how to extract speaker information by enabling the [Speaker diarization](/docs/speech-to-text/speaker-diarization) model.

Want to learn more?

- For more ways to analyze your audio data, explore our [Audio Intelligence models](/audio-intelligence).
- If you want to transcribe audio in real-time, see [Transcribe streaming audio from a microphone](/getting-started/transcribe-streaming-audio-from-a-microphone).
- To search, summarize, and ask questions on your transcripts with LLMs, see [LeMUR](/lemur).

## Need some help?

If you get stuck, or have any other questions, we'd love to help you out. Contact our support team at [email protected] or create a [support ticket](https://www.assemblyai.com/contact/support).