Update subtitle in 6_3_GPU_Whisper-medium
plusbang authored Jul 24, 2024
1 parent 4cc288d commit a12e67a
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions ch_6_GPU_Acceleration/6_3_GPU_Whisper-medium.md
@@ -1,4 +1,4 @@
-# 6.2 Run Whisper (medium) on Intel GPUs
+# 6.3 Run Whisper (medium) on Intel GPUs

You can use IPEX-LLM to load Transformer-based automatic speech recognition (ASR) models for acceleration on Intel GPUs. With IPEX-LLM, PyTorch models (in FP16/BF16/FP32) for ASR can be loaded and optimized automatically on Intel GPUs with low-bit quantization (supported precisions include INT4/NF4/INT5/FP6/FP8/INT8).

@@ -7,11 +7,11 @@ In this tutorial, you will learn how to run speech models on Intel GPUs with IPE
> [!NOTE]
> Please make sure that you have prepared the environment for IPEX-LLM on GPU before you start. Refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more information regarding installation and environment preparation. In addition, to process audio files you need to install `librosa` by running `pip install -U librosa`.
-## 6.2.1 Download Audio Files
+## 6.3.1 Download Audio Files
To start, prepare some audio files for this demo. As an example, you can download one English clip from the multilingual audio dataset [voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli) and one Chinese clip from the Chinese audio dataset [AIShell](https://huggingface.co/datasets/carlot/AIShell). You are free to pick other recordings from within or outside these datasets.

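If you prefer to fetch the examples programmatically, here is a minimal sketch using the Hugging Face `datasets` library; the configuration name, split, and output file name are assumptions for illustration, and any local `.wav` recording works just as well.

```python
# A hedged sketch: stream one English clip from VoxPopuli and save it to disk.
# Depending on your `datasets` version, this dataset may also require trust_remote_code=True.
import soundfile as sf
from datasets import load_dataset

ds_en = load_dataset("facebook/voxpopuli", "en", split="validation", streaming=True)
sample = next(iter(ds_en))

# Each sample carries the decoded waveform and its sampling rate
sf.write("audio_en.wav", sample["audio"]["array"], sample["audio"]["sampling_rate"])
```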

-## 6.2.2 Load Model in Low Precision
+## 6.3.2 Load Model in Low Precision

One common use case is to load a model from Hugging Face with IPEX-LLM low-bit precision optimization. For Whisper (medium), you could simply import `ipex_llm.transformers.AutoModelForSpeechSeq2Seq` instead of `transformers.AutoModelForSpeechSeq2Seq`, and specify the `load_in_4bit=True` parameter in the `from_pretrained` function.

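A minimal sketch of this loading step is shown below, assuming the `openai/whisper-medium` checkpoint from Hugging Face:

```python
from ipex_llm.transformers import AutoModelForSpeechSeq2Seq

# Load Whisper (medium) and let IPEX-LLM apply INT4 optimization automatically
model_in_4bit = AutoModelForSpeechSeq2Seq.from_pretrained(
    pretrained_model_name_or_path="openai/whisper-medium",
    load_in_4bit=True,
)

# Move the optimized model to the Intel GPU
model_in_4bit_gpu = model_in_4bit.to("xpu")
```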
@@ -43,7 +43,7 @@ model_in_4bit_gpu = model_in_4bit.to("xpu")
>
> * You could refer to the [API documentation](https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/transformers.html) for more information.
-## 6.2.3 Load Whisper Processor
+## 6.3.3 Load Whisper Processor

A Whisper processor is also needed, both for audio pre-processing and for post-processing model outputs from tokens to text. IPEX-LLM does not provide a customized implementation for it, so you can use the official `transformers` API to load `WhisperProcessor`:

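For example, a minimal sketch, again assuming the `openai/whisper-medium` checkpoint:

```python
from transformers import WhisperProcessor

# The processor handles feature extraction from audio and decoding of output tokens
processor = WhisperProcessor.from_pretrained(
    pretrained_model_name_or_path="openai/whisper-medium"
)
```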
@@ -56,7 +56,7 @@ processor = WhisperProcessor.from_pretrained(pretrained_model_name_or_path="open
> [!NOTE]
> If you have already downloaded the Whisper (medium) model, you could set `pretrained_model_name_or_path` to the local model path.
-## 6.2.4 Run Model to Transcribe English Audio
+## 6.3.4 Run Model to Transcribe English Audio

Once you have optimized the Whisper model with IPEX-LLM INT4 optimization and loaded the Whisper processor, you are ready to transcribe audio through model inference.

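Below is a hedged sketch of the inference step; the file name `audio_en.wav` and the exact pre-/post-processing calls are assumptions based on the surrounding text and the standard `transformers` Whisper workflow.

```python
import torch
import librosa

# Whisper expects 16 kHz input; librosa resamples while loading
data_en, sample_rate_en = librosa.load("audio_en.wav", sr=16000)

with torch.inference_mode():
    # Convert the waveform into log-mel input features and move them to the Intel GPU
    input_features = processor(
        data_en, sampling_rate=sample_rate_en, return_tensors="pt"
    ).input_features.to("xpu")

    # Generate token ids on the GPU, then decode them into text
    predicted_ids = model_in_4bit_gpu.generate(input_features)
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription)
```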
@@ -100,7 +100,7 @@ with torch.inference_mode():
>

-## 6.2.5 Run Model to Transcribe Chinese Audio and Translate to English
+## 6.3.5 Run Model to Transcribe Chinese Audio and Translate to English

Next, let's move to the Chinese audio file `audio_zh.wav`, taken from the [AIShell](https://huggingface.co/datasets/carlot/AIShell) dataset. Whisper can transcribe multilingual audio files and translate the recognized text into English. The only difference here is to define the specific context tokens through `forced_decoder_ids`:

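A sketch of how this can look, reusing the model and processor objects from the earlier sections (decoding details are assumptions):

```python
import torch
import librosa

# Load the Chinese clip at the 16 kHz rate Whisper expects
data_zh, sample_rate_zh = librosa.load("audio_zh.wav", sr=16000)

# Context tokens telling Whisper the audio is Chinese and the task is translation
forced_decoder_ids = processor.get_decoder_prompt_ids(language="chinese", task="translate")

with torch.inference_mode():
    input_features = processor(
        data_zh, sampling_rate=sample_rate_zh, return_tensors="pt"
    ).input_features.to("xpu")

    predicted_ids = model_in_4bit_gpu.generate(
        input_features, forced_decoder_ids=forced_decoder_ids
    )
    translation = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(translation)
```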