(do not merge): add additional content to docs #144

Open · wants to merge 10 commits into main

13 changes: 13 additions & 0 deletions cookbook-master/audio-intelligence/README.md
@@ -0,0 +1,13 @@
# Audio Intelligence 🤖
Use our Audio Intelligence models to analyze audio and gain additional insights beyond speech-to-text.

## All Audio Intelligence Cookbooks

| Model | Cookbooks |
|----------------|-----------------------------------|
| **Content Moderation** | [Identify Hate Speech in Audio and Video Files](content_moderation.ipynb) |
| **Entity Detection** | [Create a Redacted Transcript with Entity Detection](entity_redaction.ipynb) |
| **Auto Chapters** | [Create Summarized Chapters from Podcasts](auto_chapters.ipynb) |
| **Summarization** | [Summarize Virtual Meetings](summarization.ipynb) |
| **Topic Detection** | [Label Content with Topic Tags](topic_detection.ipynb) |
| **Key Phrases** | [Identify Highlights in Audio and Video Files](key_phrases.ipynb) |
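
Each of these models is enabled by adding a parameter to a standard transcription request. As a quick sketch using the Node SDK (the audio URL is a placeholder; parameter names follow the AssemblyAI API):

```
import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({ apiKey: "<YOUR_API_KEY>" });

// Enable Audio Intelligence models alongside the core transcript.
const transcript = await client.transcripts.transcribe({
  audio: "https://example.com/podcast.mp3", // placeholder URL
  auto_chapters: true,    // Auto Chapters
  content_safety: true,   // Content Moderation
  entity_detection: true, // Entity Detection
});

console.log(transcript.chapters); // time-coded chapter summaries
```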
195 changes: 195 additions & 0 deletions cookbook-master/audio-intelligence/auto_chapters.ipynb
235 changes: 235 additions & 0 deletions cookbook-master/audio-intelligence/content_moderation.ipynb
205 changes: 205 additions & 0 deletions cookbook-master/audio-intelligence/entity_redaction.ipynb
229 changes: 229 additions & 0 deletions cookbook-master/audio-intelligence/key_phrases.ipynb
191 changes: 191 additions & 0 deletions cookbook-master/audio-intelligence/summarization.ipynb
126 changes: 126 additions & 0 deletions cookbook-master/audio-intelligence/topic_detection.ipynb

106 changes: 106 additions & 0 deletions cookbook-master/core-transcription/README.md
@@ -0,0 +1,106 @@
# Core Transcription 🎙️
The Speech Recognition model enables you to transcribe spoken words into written text and is the foundation of all AssemblyAI products.
On top of the core transcription, you can enable other features and models, such as [Speaker Diarization](https://www.assemblyai.com/docs/speech-to-text/speaker-diarization), by adding additional parameters to the same transcription request.
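
For example, with the Node SDK a single request can enable Speaker Diarization by adding one parameter (a minimal sketch; the API key placeholder and audio URL are assumptions):

```
import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({ apiKey: "<YOUR_API_KEY>" });

// One request: core transcription plus Speaker Diarization.
const transcript = await client.transcripts.transcribe({
  audio: "https://example.com/meeting.mp3", // placeholder URL
  speaker_labels: true, // additional parameter enables the model
});

// Each utterance is attributed to a speaker (A, B, ...).
for (const utterance of transcript.utterances) {
  console.log(`Speaker ${utterance.speaker}: ${utterance.text}`);
}
```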

## Table of Contents

- [Core Transcription 🎙️](#core-transcription-️)
- [Table of Contents](#table-of-contents)
- [All Core Transcription Cookbooks](#all-core-transcription-cookbooks)
- [Basic Transcription Workflows](#basic-transcription-workflows)
- [Batch Transcription](#batch-transcription)
- [Hosting Audio Files](#hosting-audio-files)
- [Speaker Labels](#speaker-labels)
- [Automatic Language Detection](#automatic-language-detection)
- [Subtitles](#subtitles)
- [Delete Transcripts](#delete-transcripts)
- [Error Handling and Audio File Fixes](#error-handling-and-audio-file-fixes)
- [Translation](#translation)
- [Async Chunking for Near-Realtime Transcription](#async-chunking-for-near-realtime-transcription)
- [Migration Guides](#migration-guides)
- [Do More with our SDKs](#do-more-with-our-sdks)


## All Core Transcription Cookbooks

<a name="basic"></a>
### Basic Transcription Workflows
[Transcribe an Audio File](transcribe.ipynb)<br>
[Specify a Language](specify-language.ipynb)<br>
[Transcribe YouTube videos](transcribe_youtube_videos.ipynb)<br>
[Build a UI for Transcription with Gradio](gradio-frontend.ipynb)<br>
[Detect Low Confidence Words in a Transcript](detecting-low-confidence-words.md)

<a name="batch"></a>
### Batch Transcription
[Transcribe a batch of files using AssemblyAI](transcribe_batch_of_files)<br>
[Transcribe multiple files simultaneously using our Python SDK](SDK_transcribe_batch_of_files/batch_transcription.ipynb)<br>
[Transcribe multiple files simultaneously using our Node.js SDK](SDK-Node-batch.md)

<a name="host-files"></a>
### Hosting Audio Files
[Transcribe from an AWS S3 Bucket](transcribe_from_s3.ipynb)<br>
[Transcribe Google Drive links](transcribing-google-drive-file.md)<br>
[Transcribe GitHub Files](transcribing-github-files.md)

<a name="speaker-labels"></a>
### Speaker Labels
[Identify Speakers in Audio Recordings](speaker_labels.ipynb)<br>
[Generate Speaker Labels with Make.com](make.com-speaker-labels.md)\
[Calculate Talk/Listen Ratio of Speakers](talk-listen-ratio.ipynb)<br>
[Create a speaker timeline with Speaker Labels](speaker_timeline.ipynb)\
[Use AssemblyAI with Pyannote to generate custom Speaker Labels](Use_AssemblyAI_with_Pyannote_to_generate_custom_Speaker_Labels.ipynb)<br>
[Speaker Diarization with Async Chunking](speaker-diarization-with-async-chunking.ipynb)<br>
[Speaker Identification Across Files w/ AssemblyAI, Pinecone, and Nvidia's TitaNet Model](titanet-speaker-identification.ipynb)

<a name="ald"></a>
### Automatic Language Detection
[Use Automatic Language Detection](automatic-language-detection.ipynb)<br>
[Automatic Language Detection as separate step from Transcription](automatic-language-detection-separate.ipynb)<br>
[Route to Default Language if Language Detection Confidence is Low - JS](automatic-language-detection-route-default-language-js.md)\
[Route to Default Language if Language Detection Confidence is Low - Python](automatic-language-detection-route-default-language-python.ipynb)<br>
[Route to Nano Speech Model if Language Confidence is Low](automatic-language-detection-route-nano-model.ipynb)

<a name="subtitles"></a>
### Subtitles
[Generate Subtitles for Videos](subtitles.ipynb)\
[Create Subtitles with Speaker Labels](speaker_labelled_subtitles.ipynb)<br>
[Create custom-length subtitles with AssemblyAI](subtitle_creation_by_word_count.ipynb)

<a name="delete"></a>
### Delete Transcripts
[Delete a Transcript](delete_transcript.ipynb)<br>
[Delete transcripts 24 hours after creation](schedule_delete.ipynb)

<a name="errors"></a>
### Error Handling and Audio File Fixes
[Troubleshoot common errors when starting to use our API](common_errors_and_solutions.md)<br>
[Automatically Retry Server Errors](retry-server-error.ipynb)<br>
[Automatically Retry Upload Errors](retry-upload-error.ipynb)\
[Identify Duplicate Channels in Stereo Files](identify_duplicate_channels.ipynb)\
[Correct Audio Duration Discrepancies with Multi-Tool Validation and Transcoding](audio-duration-fix.ipynb)

<a name="translate"></a>
### Translation
[Translate Transcripts](translate_transcripts.ipynb)<br>
[Translate Subtitles](translate_subtitles.ipynb)

<a name="chunking"></a>
### Async Chunking for Near-Realtime Transcription
🆕 [Near-Realtime Python Speech-to-Text App](https://github.com/AssemblyAI-Solutions/async-chunk-py)\
🆕 [Near-Realtime Node.js Speech-to-Text App](https://github.com/AssemblyAI-Solutions/async-chunk-js)\
[Split audio file to shorter files](split_audio_file)

<a name="migration-guides"></a>
### Migration Guides
🆕 [AWS Transcribe to AssemblyAI](migration_guides/aws_to_aai.ipynb)\
🆕 [Deepgram to AssemblyAI](migration_guides/dg_to_aai.ipynb)<br>
🆕 [OpenAI to AssemblyAI](migration_guides/oai_to_aai.ipynb)<br>
🆕 [Google to AssemblyAI](migration_guides/google_to_aai.ipynb)


<a name="do-more-with-sdk"></a>
### Do More with our SDKs
[Do more with the JavaScript SDK](do-more-with-sdk-js.md)\
[Do more with the Python SDK](do-more-with-sdk-python.ipynb)
157 changes: 157 additions & 0 deletions cookbook-master/core-transcription/SDK-Node-batch.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
# Transcribe Multiple Files Simultaneously Using the Node SDK

In this guide, we'll show you how to transcribe multiple files simultaneously using the Node SDK.

## Getting Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an account and get your API key from your [dashboard](https://www.assemblyai.com/app/account). This guide uses AssemblyAI's [Node SDK](https://github.com/AssemblyAI/assemblyai-node-sdk). If you haven't already, install the SDK in your project by following these [instructions](https://github.com/AssemblyAI/assemblyai-node-sdk#installation).
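
For reference, installing the SDK from npm typically looks like this:

```
npm install assemblyai
```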

## Step-by-Step Instructions

Set up your application folder structure by adding an `audio` folder, which will house the files you would like to transcribe, a `transcripts` folder, which will house your completed transcriptions, and a new `.js` file in the root of the project. Your file structure should look like this:
```
BatchApp
├───audio
│   ├───audio-1.mp3
│   └───audio-2.mp3
├───transcripts
└───batch.js
```

In the `batch.js` file, import the AssemblyAI package as well as the Node `fs` and `path` modules, then create an AssemblyAI client with your API key. Because this file uses ES module `import` syntax and top-level `await`, your `package.json` should include `"type": "module"` (or name the file `batch.mjs`):

```
import { AssemblyAI } from "assemblyai";
import * as path from 'path';
import * as fs from 'fs';

const client = new AssemblyAI({
  apiKey: "<YOUR_API_KEY>", // replace with your API key
});
```

Declare the variables `audioFolder`, `files`, `filePathArr`, and `transcriptsFolder`.
* `audioFolder` will be the relative path to the folder containing your audio files.
* `files` will read the files in the audio folder, and return them in an array.
* `filePathArr` will join the file names with the audio folder name to create the relative path to each individual file.
* `transcriptsFolder` will be the relative path to the folder containing your transcription files.

```
const audioFolder = './audio';
const files = await fs.promises.readdir(audioFolder);
const filePathArr = files.map(file => path.join(audioFolder, file));
const transcriptsFolder = './transcripts';
```

Next, we'll create a helper function that submits a file for transcription. The SDK's `transcribe` method already returns a promise, so we can return it directly. Make sure to add the parameters for the models you would like to run.

```
const getTranscript = (filePath) =>
  client.transcripts.transcribe({
    audio: filePath,
    language_detection: true,
  });
```

Next, we'll create an async function that calls `getTranscript` and writes the transcription text for each audio file to its own text file in the transcripts folder.

```
const processFile = async (file) => {
  const fileName = path.basename(file); // Extract the file name, e.g. 'audio/audio-1.mp3' -> 'audio-1.mp3'
  const filePath = path.join(transcriptsFolder, `${fileName}.txt`); // Relative path for the transcription text file

  const transcript = await getTranscript(file); // Request the transcript
  const text = transcript.text; // Grab the transcription text from the response

  // Write the transcription text to a text file
  await fs.promises.writeFile(filePath, text);

  return { ok: true, message: 'Text file created!' };
};
```

Next, we will create the `run` function. This function will:
* Map `processFile` over `filePathArr` to create an array of pending promises, one per file.
* Use `Promise.all` to wait for all of those promises to resolve.

Then we'll call the `run` function:
```
const run = async () => {
  const unresolvedPromises = filePathArr.map(processFile);
  await Promise.all(unresolvedPromises);
};

run();
```
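
Note that `Promise.all` rejects as soon as any one transcription fails, so you won't see results for the other files. If you would rather let the remaining files finish and report failures at the end, a variant using `Promise.allSettled` might look like this (a sketch, not part of the original guide):

```
const run = async () => {
  const results = await Promise.allSettled(filePathArr.map(processFile));

  // Report any files that failed without hiding the successful ones.
  results.forEach((result, i) => {
    if (result.status === 'rejected') {
      console.error(`Failed to process ${filePathArr[i]}:`, result.reason);
    }
  });
};
```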

Your final file will look like this:

```
import { AssemblyAI } from "assemblyai";
import * as path from 'path';
import * as fs from 'fs';

const client = new AssemblyAI({
  apiKey: "<YOUR_API_KEY>",
});

const audioFolder = './audio';
const files = await fs.promises.readdir(audioFolder);
const filePathArr = files.map(file => path.join(audioFolder, file));
const transcriptsFolder = './transcripts';

const getTranscript = (filePath) =>
  client.transcripts.transcribe({
    audio: filePath,
    language_detection: true,
  });

const processFile = async (file) => {
  const fileName = path.basename(file);
  const filePath = path.join(transcriptsFolder, `${fileName}.txt`);

  const transcript = await getTranscript(file);
  await fs.promises.writeFile(filePath, transcript.text);

  return { ok: true, message: 'Text file created!' };
};

const run = async () => {
  const unresolvedPromises = filePathArr.map(processFile);
  await Promise.all(unresolvedPromises);
};

run();
```
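
You can then run the app from the project root with `node batch.js`. When it finishes, the `transcripts` folder will contain one text file per audio file, e.g. `audio-1.mp3.txt` and `audio-2.mp3.txt`.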

If you have any questions, please feel free to reach out to our Support team at [email protected] or in our Community Discord!

@@ -0,0 +1 @@
AssemblyAI is a deep learning company that builds powerful APIs to help you transcribe and understand audio. The most common use case for the API is to automatically convert prerecorded audio and video files, as well as real time audio streams into text transcriptions. Our APIs convert audio and video into text using powerful deep learning models that we research and develop end to end in house. Millions of podcasts, zoom recordings, phone calls, or video files are being transcribed with AssemblyAI every single day. But where AssemblyAI really excels is with helping you understand your data. So let's say we transcribe Joe Biden's State of the Union using AssemblyAI's API, with our Auto chapter feature, you can generate time coded summaries of the key moments of your audio file. For example, with the State of the Union address, we get chapter summaries like this auto Chapters automatically segments your audio or video files into chapters and provides a summary for each of these chapters. With sentiment analysis, we can classify what's being spoken in your audio files as either positive, negative, or neutral. So, for example, in the State of the Union address, we see that this sentence was classified as positive, whereas this sentence was classified as negative. Content Safety Detection can flag sensitive content as it is spoken, like hate speech, profanity, violence, or weapons. For example, in Biden's State of the Union address, content safety detection flagged parts of his speech as being about weapons. This feature is especially useful for automatic content moderation and brand safety use cases. With Autohighlights, you can automatically identify important words and phrases that are being spoken in your data. Owned by the State of the Union address, AssemblyAI's API detected these words and phrases as being important. Lastly, with entity detection, you can identify entities that are spoken in your audio, like organization names or person names. In Biden's speech, these were the entities that were detected. This is just a preview of the most popular features of AssemblyAI API. If you want a full list of features, go check out our documentation linked in the description below. And if you ever need some support, our team of developers is here to help. Every day, developers are using these features to build really exciting applications. From meeting summarizers, to brand safety or contextual targeting platforms, to full blown conversational intelligence tools, we can't wait to see what you build with AssemblyAI.
@@ -0,0 +1 @@
Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines from Maine to Maryland to Minnesota are gray and smoggy. And in some places, the air quality warnings include the warning to stay inside. We wanted to better understand what's happening here and why, so we called Peter DiCarlo, an associate professor in the Department of Environmental Health and Engineering at Johns Hopkins University. Good morning, professor. Good morning. So what is it about the conditions right now that have caused this round of wildfires to affect so many people so far away? Well, there's a couple of things. The season has been pretty dry already, and then the fact that we're getting hit in the US. Is because there's a couple of weather systems that are essentially channeling the smoke from those Canadian wildfires through Pennsylvania into the Mid Atlantic and the Northeast and kind of just dropping the smoke there. So what is it in this haze that makes it harmful? And I'm assuming it is is it is the levels outside right now in Baltimore are considered unhealthy. And most of that is due to what's called particulate matter, which are tiny particles, microscopic smaller than the width of your hair, that can get into your lungs and impact your respiratory system, your cardiovascular system, and even your neurological your brain. What makes this particularly harmful? Is it the volume of particulate? Is it something in particular? What is it exactly? Can you just drill down on that a little bit more? Yeah. So the concentration of particulate matter I was looking at some of the monitors that we have was reaching levels of what are, in science speak, 150 micrograms per meter cubed, which is more than ten times what the annual average should be, and about four times higher than what you're supposed to have on a 24 hours average. And so the concentrations of these particles in the air are just much, much higher than we typically see. And exposure to those high levels can lead to a host of health problems. And who is most vulnerable? I noticed that in New York City, for example, they're canceling outdoor activities, and so here it is in the early days of summer, and they have to keep all the kids inside. So who tends to be vulnerable in a situation like this? It's the youngest. So children, obviously, whose bodies are still developing. The elderly who know their bodies are more in decline, and they're more susceptible to the health impacts of breathing, the poor air quality. And then people who have preexisting health conditions, people with respiratory conditions or heart conditions can be triggered by high levels of air pollution. Could this get worse? That's a good in some areas, it's much worse than others. And it just depends on kind of where the smoke is concentrated. I think New York has some of the higher concentrations right now, but that's going to change as that air moves away from the New York area. But over the course of the next few days, we will see different areas being hit at different times with the highest concentrations. I was going to ask you, more fires start burning, I don't expect the concentrations to go up too much higher. I was going to ask you and you started to answer this, but how much longer could this last? Or forgive me if I'm asking you to speculate, but what do you think? Well, I think the fires are going to burn for a little bit longer, but the key for us in the US. Is the weather system changing. 
And so right now, it's kind of the weather systems that are pulling that air into our mid Atlantic and Northeast region. As those weather systems change and shift, we'll see that smoke going elsewhere and not impact us in this region as much. And so I think that's going to be the defining factor. And I think the next couple of days we're going to see a shift in that weather pattern and start to push the smoke away from where we are. And finally, with the impacts of climate change, we are seeing more wildfires. Will we be seeing more of these kinds of wide ranging air quality consequences or circumstances? I mean, that is one of the predictions for climate change. Looking into the future, the fire season is starting earlier and lasting longer and we're seeing more frequent fires. So, yeah, this is probably something that we'll be seeing more frequently. This tends to be much more of an issue in the Western US. So the Eastern US getting hit right now is a little bit new. But yeah, I think with climate change moving forward, this is something that is going to happen more frequently. That's Peter DiCarlo, associate professor in the Department of Environmental Health and Engineering at Johns Hopkins University. Sergeant Carlo, thanks so much for joining us and sharing this expertise with us. Thank you for having me.


322 changes: 322 additions & 0 deletions cookbook-master/core-transcription/audio-duration-fix.ipynb
