From deb1742fcec64760634adf84460369ada4d24647 Mon Sep 17 00:00:00 2001
From: liyulingyue <852433440@qq.com>
Date: Sun, 17 Nov 2024 20:49:47 +0800
Subject: [PATCH 1/2] add new generate README.md
---
examples/librispeech/asr1/README.md | 415 +++++++++++++++++++++-------
1 file changed, 319 insertions(+), 96 deletions(-)
diff --git a/examples/librispeech/asr1/README.md b/examples/librispeech/asr1/README.md
index ca0081444c7..b6d83678df9 100644
--- a/examples/librispeech/asr1/README.md
+++ b/examples/librispeech/asr1/README.md
@@ -1,76 +1,135 @@
# Transformer/Conformer ASR with Librispeech
This example contains code used to train [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf) model) with [Librispeech dataset](http://www.openslr.org/resources/12)
+
## Overview
-All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its function.
+All the scripts you need are in the `run.sh`. There are several stages in the `run.sh`, and each stage has its function.
+
| Stage | Function |
|:---- |:----------------------------------------------------------- |
-| 0 | Process data. It includes:
(1) Download the dataset
(2) Calculate the CMVN of the train dataset
(3) Get the vocabulary file
(4) Get the manifest files of the train, development and test dataset
(5) Get the sentencepiece model |
+| 0 | Process data. It includes:
(1) Download the dataset
(2) Calculate the CMVN of the train dataset
(3) Get the vocabulary file
(4) Get the manifest files of the train, development, and test datasets
(5) Get the sentencepiece model |
| 1 | Train the model |
-| 2 | Get the final model by averaging the top-k models, set k = 1 means to choose the best model |
+| 2 | Get the final model by averaging the top-k models, setting k = 1 means choosing the best model |
| 3 | Test the final model performance |
-| 4 | Get ctc alignment of test data using the final model |
-| 5 | Infer the single audio file |
+| 4 | Get CTC alignment of test data using the final model |
+| 5 | Infer a single audio file |
+| 51 | Export the final model to a static graph format for deployment |
-You can choose to run a range of stages by setting `stage` and `stop_stage `.
+You can choose to run a range of stages by setting the `stage` and `stop_stage` parameters.
For example, if you want to execute the code in stage 2 and stage 3, you can run this script:
```bash
bash run.sh --stage 2 --stop_stage 3
```
-Or you can set `stage` equal to `stop-stage` to only run one stage.
+Or you can set `stage` equal to `stop_stage` to only run one stage.
For example, if you only want to run `stage 0`, you can use the script below:
```bash
bash run.sh --stage 0 --stop_stage 0
```
-The document below will describe the scripts in `run.sh` in detail.
+The script `run.sh` utilizes configuration files, GPU resources, and local scripts to perform the tasks outlined in each stage. Specifically:
+- Configuration files are loaded from `conf/transformer.yaml` and `conf/tuning/decode.yaml`.
+- GPU devices are specified via the `gpus` variable (e.g., `gpus=0,1,2,3`).
+- Local scripts (e.g., `data.sh`, `train.sh`, `avg.sh`, `test.sh`, `align.sh`, `test_wav.sh`, and `export.sh`) handle the respective tasks for data preparation, model training, averaging, testing, alignment, single audio file inference, and model export.
+
+The document below will describe the scripts in the `run.sh` in detail.
## The Environment Variables
-The path.sh contains the environment variables.
+The `path.sh` contains the essential environment variables required for the scripts to run correctly.
```bash
-. ./path.sh
-. ./cmd.sh
+. ./path.sh || exit 1;
+. ./cmd.sh || exit 1;
```
-This script needs to be run first. And another script is also needed:
+These scripts need to be sourced first to ensure that all necessary environment variables are set.
+
+Additionally, another script is also required:
```bash
-source ${MAIN_ROOT}/utils/parse_options.sh
+. ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
```
-It will support the way of using `--variable value` in the shell scripts.
+This script supports the use of `--variable value` options in the shell scripts, allowing for flexible configuration without directly modifying the scripts.
+
+The environment variables set in `path.sh` and `cmd.sh` include paths to directories, executable files, and other configurations necessary for the system to function properly. For example, `MAIN_ROOT` is typically set in `path.sh` to point to the main directory of the project.
+
+The script also uses several other environment variables that are set either directly in the script or through command-line arguments parsed by `parse_options.sh`. These variables include:
+
+- `gpus`: A comma-separated list of GPU IDs to use for training and inference.
+- `stage` and `stop_stage`: These variables control which stages of the process to run. For instance, setting `stage=1` and `stop_stage=3` will run only stages 1, 2, and 3.
+- `conf_path`: The path to the configuration file for the model.
+- `ips`: A placeholder for a comma-separated list of IP addresses (typically used for distributed training).
+- `decode_conf_path`: The path to the decoding configuration file.
+- `avg_num`: The number of best models to average for improving model performance.
+- `audio_file`: The path to an audio file for single-file testing.
+
+These variables are used throughout the script to control the behavior of different stages, such as data preparation, model training, model averaging, testing, alignment, and exporting the model. For example:
+
+```bash
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+ # prepare data
+ bash ./local/data.sh || exit -1
+fi
+
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ # train model
+ CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt} ${ips}
+fi
+
+# ... other stages follow similar patterns ...
+```
+
+By setting these environment variables appropriately, users can customize the behavior of the script to suit their specific needs and resources.
## The Local Variables
-Some local variables are set in `run.sh`.
-`gpus` denotes the GPU number you want to use. If you set `gpus=`, it means you only use CPU.
-`stage` denotes the number of stages you want to start from in the experiments.
-`stop stage` denotes the number of the stage you want to end at in the experiments.
-`conf_path` denotes the config path of the model.
-`avg_num` denotes the number K of top-K models you want to average to get the final model.
-`audio file` denotes the file path of the single file you want to infer in stage 5
-`ckpt` denotes the checkpoint prefix of the model, e.g. "conformer"
-You can set the local variables (except `ckpt`) when you use `run.sh`
+Some local variables are set in the `run.sh` script and can be configured to customize the execution of the script. Here is a detailed explanation of each variable:
+
+`gpus` denotes the GPU number or numbers you want to use for training or inference. If you set `gpus=` (an empty value), it means you only use the CPU. Multiple GPUs can be specified using a comma-separated list, e.g., `0,1,2,3`.
+
+`stage` denotes the number of the stage you want to start from in the experiments. This allows you to skip certain stages, such as data preparation, if they have already been completed.
+
+`stop_stage` denotes the number of the stage you want to end at in the experiments. This is useful when you only want to run a subset of the stages.
+
+`conf_path` denotes the path to the configuration file of the model. This file contains all the necessary parameters and settings for the model.
+
+`avg_num` denotes the number K of top-K models you want to average to get the final model. Averaging multiple models can improve the robustness and performance of the final model.
+
+`decode_conf_path` denotes the path to the configuration file used for decoding. This file contains settings related to decoding, such as beam size and language model weight.
+
+`audio_file` denotes the file path of the single audio file you want to infer in stage 5. This is useful for testing the model on a specific audio file.
+
+`ckpt` denotes the checkpoint prefix of the model. This is the name of the directory under which the model checkpoints are saved, e.g., "conformer". Note that you cannot set this variable via the command line; it is derived from the `conf_path`.
+
+`ips` (optional) can be used to specify the IP addresses of multiple machines for distributed training. This is not typically used in single-machine setups.
+
+You can set the local variables (except `ckpt`) when you use the `run.sh` script via command-line options. For example, you can set the `gpus` and `avg_num` when you use the following command:
-For example, you can set the `gpus` and `avg_num` when you use the command line:
```bash
bash run.sh --gpus 0,1 --avg_num 20
```
+
+The script uses the `parse_options.sh` utility to parse these command-line options and set the corresponding variables. The script then proceeds to execute the stages specified by `stage` and `stop_stage`, using the configured variables throughout the process.
## Stage 0: Data Processing
-To use this example, you need to process data firstly and you can use stage 0 in `run.sh` to do this. The code is shown below:
+To use this example, you need to process the data first. Stage 0 in the `run.sh` script is dedicated to this task. The relevant code snippet is shown below:
+
```bash
- if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
- # prepare data
- bash ./local/data.sh || exit -1
- fi
+if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
+ # prepare data
+ bash ./local/data.sh || exit -1
+fi
```
-Stage 0 is for processing the data.
-If you only want to process the data. You can run
+Stage 0 is specifically for processing the data. This stage prepares all necessary datasets and metadata required for subsequent training and evaluation phases.
+
+If you only want to process the data without proceeding to other stages, you can run the following command:
+
```bash
bash run.sh --stage 0 --stop_stage 0
```
-You can also just run these scripts in your command line.
+
+Alternatively, you can manually execute the data processing script in your command line. Ensure you have sourced the necessary environment setup scripts first:
+
```bash
-. ./path.sh
-. ./cmd.sh
+source path.sh
bash ./local/data.sh
```
-After processing the data, the `data` directory will look like this:
+
+After successfully processing the data, the `data` directory will be populated with the following structure:
+
```bash
data/
|-- dev.meta
@@ -88,131 +147,295 @@ data/
|-- test.meta
`-- train.meta
```
+
+This directory structure contains manifests, metadata files, vocabulary files, and mean-std normalization parameters necessary for the training and evaluation of your model. Each file and directory serves a specific purpose and is essential for the pipeline to function correctly.
## Stage 1: Model Training
-If you want to train the model. you can use stage 1 in `run.sh`. The code is shown below.
+If you want to train the model, you can use stage 1 in the `run.sh` script. The relevant code segment is shown below:
```bash
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
- # train model, all `ckpt` under `exp` dir
- CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt}
- fi
+ # train model, all `ckpt` under `exp` dir
+ CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt} ${ips}
+fi
```
-If you want to train the model, you can use the script below to execute stage 0 and stage 1:
+
+To train the model, you can use the following command to execute stages 0 and 1:
```bash
bash run.sh --stage 0 --stop_stage 1
```
-or you can run these scripts in the command line (only use CPU).
+
+Alternatively, you can run the necessary scripts directly in the command line. If you only want to use the CPU, you can use:
+```bash
+. ./path.sh
+. ./cmd.sh
+bash ./local/data.sh
+CUDA_VISIBLE_DEVICES= ./local/train.sh conf/transformer.yaml transformer
+```
+
+If you want to use GPUs (assuming you have multiple GPUs configured in the `gpus` variable, e.g., `gpus=0,1,2,3`), you can specify the GPU devices as follows:
```bash
. ./path.sh
. ./cmd.sh
bash ./local/data.sh
-CUDA_VISIBLE_DEVICES= ./local/train.sh conf/conformer.yaml conformer
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./local/train.sh conf/transformer.yaml transformer
```
+
+Remember to replace `` with the actual IP addresses of the machines if you are running the training in a distributed setting. If you are running on a single machine, you can leave the `ips` variable empty in the `run.sh` script.
+
+By running the above commands, the script will prepare the data (if `stage 0` is included), and then proceed to train the model (stage 1). The trained checkpoints will be saved under the `exp` directory.
## Stage 2: Top-k Models Averaging
-After training the model, we need to get the final model for testing and inference. In every epoch, the model checkpoint is saved, so we can choose the best model from them based on the validation loss or we can sort them and average the parameters of the top-k models to get the final model. We can use stage 2 to do this, and the code is shown below:
+
+After training the model, we need to get the final model for testing and inference. In every epoch, the model checkpoint is saved, so we can choose the best model from them based on the validation loss or we can sort them and average the parameters of the top-k models to get the final model. Averaging the top-k models often leads to a more robust and generalizable final model.
+
+We can use stage 2 to perform this model averaging, and the relevant code snippet is shown below:
+
```bash
- if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
- # avg n best model
- avg.sh best exp/${ckpt}/checkpoints ${avg_num}
- fi
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
+ # avg n best model
+ avg.sh best exp/${ckpt}/checkpoints ${avg_num}
+fi
```
-The `avg.sh` is in the `../../../utils/` which is define in the `path.sh`.
-If you want to get the final model, you can use the script below to execute stage 0, stage 1, and stage 2:
+
+Here, `avg.sh` is a script located in the `../../../utils/` directory, which is defined in the `path.sh` script. This script is responsible for averaging the parameters of the top-k models. The `${ckpt}` variable represents the basename of the configuration file (excluding the extension), and `${avg_num}` specifies the number of top models to average.
+
+To execute this stage along with the previous stages (stage 0 for data preparation and stage 1 for model training), you can use the following command:
+
```bash
bash run.sh --stage 0 --stop_stage 2
```
-or you can run these scripts in the command line (only use CPU).
+
+Alternatively, you can run the individual scripts in the command line (CPU-only) using the following sequence:
```bash
-. ./path.sh
+source path.sh
. ./cmd.sh
bash ./local/data.sh
-CUDA_VISIBLE_DEVICES= ./local/train.sh conf/conformer.yaml conformer
-avg.sh best exp/conformer/checkpoints 20
+CUDA_VISIBLE_DEVICES= ./local/train.sh conf/transformer.yaml transformer
+avg.sh best exp/transformer/checkpoints 30
```
+
+In this example, `conf/transformer.yaml` is the configuration file for the transformer model, `transformer` is the basename used for the checkpoints, and `30` is the number of top models to average.
+
+Make sure to adjust the `conf_path`, `ckpt`, `gpus`, and `avg_num` variables in the main script according to your specific setup and requirements. The `avg_ckpt` variable will be used in subsequent stages for testing and inference with the averaged model.
## Stage 3: Model Testing
-The test stage is to evaluate the model performance. The code of test stage is shown below:
+The test stage is designed to evaluate the model performance. This stage uses the averaged checkpoint (avg_n) obtained from the previous stage to assess the model's accuracy and reliability. The code snippet responsible for this stage is provided below:
```bash
- if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
- # test ckpt avg_n
- CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
- fi
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
+ # test ckpt avg_n
+ CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+fi
```
-If you want to train a model and test it, you can use the script below to execute stage 0, stage 1, stage 2, and stage 3 :
+
+Here's a breakdown of the script:
+- `CUDA_VISIBLE_DEVICES=0` specifies that only the first GPU (with index 0) should be used for testing.
+- `./local/test.sh` is the script that performs the testing.
+- `${conf_path}` is the path to the configuration file that defines the model architecture and other training parameters.
+- `${decode_conf_path}` is the path to the decoding configuration file that includes parameters like beam size and language model weight.
+- `exp/${ckpt}/checkpoints/${avg_ckpt}` specifies the location of the averaged checkpoint to be tested.
+
+If you want to train a model from scratch and test it up to stage 3, you can use the following script:
```bash
bash run.sh --stage 0 --stop_stage 3
```
-or you can run these scripts in the command line (only use CPU).
+
+Alternatively, you can manually run the relevant scripts in the command line. Below is an example using only the CPU (by omitting the `CUDA_VISIBLE_DEVICES` setting):
```bash
. ./path.sh
. ./cmd.sh
bash ./local/data.sh
-CUDA_VISIBLE_DEVICES= ./local/train.sh conf/conformer.yaml conformer
-avg.sh best exp/conformer/checkpoints 20
-CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
+# Assuming you have set the `gpus`, `conf_path`, `decode_conf_path`, `avg_num`, and other variables correctly
+# Train the model
+# (Note: This step is omitted here for brevity, but it involves running `./local/train.sh` with appropriate arguments)
+# Average the best models
+avg.sh best exp/${ckpt}/checkpoints ${avg_num}
+# Test the averaged checkpoint
+CUDA_VISIBLE_DEVICES= ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
```
+
+Remember to customize the paths and parameters according to your specific setup and model configuration.
## Pretrained Model
-You can get the pretrained transformer or conformer from [this](../../../docs/source/released_model.md).
+You can get the pretrained transformer or conformer models from [this](../../../docs/source/released_model.md) link.
+
+After downloading the pretrained model, you can use the `tar` command to unpack it. Once unpacked, you can leverage a series of scripts to handle various tasks such as training, testing, aligning, and exporting the model.
-using the `tar` scripts to unpack the model and then you can use the script to test the model.
+Here's a detailed guide on how to use these scripts:
-For example:
+### Unpacking the Pretrained Model
```bash
wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
source path.sh
-# If you have process the data and get the manifest fileļ¼ you can skip the following 2 steps
+```
+
+### Data Preparation
+If you haven't processed the data and obtained the manifest file, you need to run the following commands to prepare the data:
+```bash
bash local/data.sh --stage -1 --stop_stage -1
bash local/data.sh --stage 2 --stop_stage 2
-CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
```
-The performance of the released models are shown in [here](./RESULTS.md).
+These steps are optional if you already have the necessary data prepared.
-## Stage 4: CTC Alignment
-If you want to get the alignment between the audio and the text, you can use the ctc alignment. The code of this stage is shown below:
+### Training the Model
+You can train the model using the provided training script. Set the GPU devices, configuration file path, and other parameters as needed:
```bash
- if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
- # ctc alignment of test data
- CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
- fi
+gpus=0,1,2,3
+conf_path=conf/transformer.yaml
+stage=0
+stop_stage=1
+
+CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt_name} ${ips}
```
-If you want to train the model, test it and do the alignment, you can use the script below to execute stage 0, stage 1, stage 2, and stage 3 :
+Replace `${ckpt_name}` with an appropriate name for your checkpoint and `${ips}` with the IP addresses of the nodes (if you're using multiple nodes for distributed training).
+
+### Averaging Best Models
+After training, you can average the best models to improve performance:
```bash
-bash run.sh --stage 0 --stop_stage 4
+avg_num=30
+stage=2
+stop_stage=2
+
+avg.sh best exp/${ckpt_name}/checkpoints ${avg_num}
+```
+
+### Testing the Model
+You can test the averaged model using the testing script:
+```bash
+decode_conf_path=conf/tuning/decode.yaml
+avg_ckpt=avg_${avg_num}
+stage=3
+stop_stage=3
+
+CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt_name}/checkpoints/${avg_ckpt}
```
-or if you only need to train a model and do the alignment, you can use these scripts to escape stage 3(test stage):
+
+### CTC Alignment
+To perform CTC alignment on the test data:
```bash
-bash run.sh --stage 0 --stop_stage 2
-bash run.sh --stage 4 --stop_stage 4
+stage=4
+stop_stage=4
+
+CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt_name}/checkpoints/${avg_ckpt}
+```
+
+### Testing a Single WAV File
+You can also test a single WAV file using the provided script:
+```bash
+audio_file=data/demo_002_en.wav
+stage=5
+stop_stage=5
+
+CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt_name}/checkpoints/${avg_ckpt} ${audio_file}
+```
+
+### Exporting the Model
+Finally, you can export the model for inference:
+```bash
+stage=51
+stop_stage=51
+
+CUDA_VISIBLE_DEVICES= ./local/export.sh ${conf_path} exp/${ckpt_name}/checkpoints/${avg_ckpt} exp/${ckpt_name}/checkpoints/${avg_ckpt}.jit
+```
+
+The performance of the released models are shown in [this](./RESULTS.md) document.
+## Stage 4: CTC Alignment
+
+This stage is dedicated to obtaining the alignment between the audio and the text using Connectionist Temporal Classification (CTC). CTC is a powerful technique that aligns the input sequence to the target sequence without needing an alignment pre-specified.
+
+```bash
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+ # ctc alignment of test data
+ CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+fi
+```
+
+To perform CTC alignment, you need to have a trained model and its average checkpoint. The script above will use the specified configuration file (`conf_path`), decoding configuration (`decode_conf_path`), and the average checkpoint (`avg_ckpt`) located in the `exp/${ckpt}/checkpoints/` directory.
+
+If you have already trained a model and wish to obtain the alignment for your test data, you can use the following commands. These commands assume you have already set up your environment and paths correctly:
+
+```bash
+# Assuming you have already trained the model and obtained the average checkpoint
+bash run.sh --stage 0 --stop_stage 4
```
-or you can also use these scripts in the command line (only use CPU).
+
+Alternatively, if you only need to perform the alignment without retraining the model, you can specify the stages directly:
+
```bash
+# Load necessary environment variables
. ./path.sh
. ./cmd.sh
-bash ./local/data.sh
-CUDA_VISIBLE_DEVICES= ./local/train.sh conf/conformer.yaml conformer
-avg.sh best exp/conformer/checkpoints 20
-# test stage is optional
-CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
-CUDA_VISIBLE_DEVICES= ./local/align.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
+
+# Assuming `conf_path` and `decode_conf_path` are already set correctly
+CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
```
+
+In the script, `CUDA_VISIBLE_DEVICES=0` specifies that the alignment process should run on the first GPU. If you do not have a GPU or want to use a different one, you can adjust this setting accordingly.
+
+Note that the `align.sh` script will generate alignment information for your test data, which can be useful for various applications such as visualization, error analysis, and more.
+
+If you encounter any issues during this stage, ensure that all dependencies are correctly installed, and that the paths to the configuration files and checkpoints are correct. Additionally, check the logs for any error messages that might provide insight into the problem.
## Stage 5: Single Audio File Inference
-In some situations, you want to use the trained model to do the inference for the single audio file. You can use stage 5. The code is shown below
+
+In some situations, you may want to use the trained model to perform inference on a single audio file. This can be accomplished using Stage 5. The relevant code snippet is shown below:
+
+```bash
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+ # test a single .wav file
+ CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
+fi
+```
+
+Before running this script, you have the option to train the model yourself using the following command:
+
```bash
- if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
- # test a single .wav file
- CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
- fi
+bash run.sh --stage 0 --stop_stage 3
```
-you can train the model by yourself using ```bash run.sh --stage 0 --stop_stage 3```, or you can download the pretrained model through the script below:
+
+Alternatively, you can download a pretrained model using the script below. Please note that the configuration file and model checkpoint may vary depending on the specific experiment you are running. For this example, let's assume you are using a Conformer model trained on Librispeech:
+
```bash
wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
```
-You can download the audio demo:
+
+You can also download a sample audio file to test the inference:
+
```bash
wget -nc https://paddlespeech.bj.bcebos.com/datasets/single_wav/en/demo_002_en.wav -P data/
```
-You need to prepare an audio file or use the audio demo above, please confirm the sample rate of the audio is 16K. You can get the result of the audio demo by running the script below.
+
+Please ensure that your audio file, whether it's the sample provided or your own, has a sample rate of 16K. To run the inference on the sample audio file, you can use the following command:
+
+```bash
+CUDA_VISIBLE_DEVICES= ./local/test_wav.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_20 data/demo_002_en.wav
+```
+
+In this command:
+- `conf/conformer.yaml` is the configuration file for the Conformer model.
+- `conf/tuning/decode.yaml` contains decoding parameters.
+- `exp/conformer/checkpoints/avg_20` is the path to the averaged checkpoint of the trained model.
+- `data/demo_002_en.wav` is the path to the audio file you want to perform inference on.
+
+Make sure to adjust the paths and filenames according to your specific setup. If the inference runs successfully, you should see the transcription result of the audio file printed in the console or saved to a file (depending on how the `test_wav.sh` script is implemented).
+## Stage 5: Single Audio File Testing
+The single audio file testing stage is designed to evaluate the model's performance on a specific audio file. The code for this stage is shown below:
+```bash
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+ # test a single .wav file
+ CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
+fi
+```
+To perform the single audio file testing, you can use the script below to execute stages up to and including stage 5:
+```bash
+bash run.sh --stage 0 --stop_stage 5
+```
+Alternatively, you can manually run these scripts in the command line (using only CPU for inference if desired). Below is an example of the full sequence of commands:
```bash
-CUDA_VISIBLE_DEVICES= ./local/test_wav.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20 data/demo_002_en.wav
+source path.sh
+bash ./local/data.sh
+CUDA_VISIBLE_DEVICES= ./local/train.sh ${conf_path} ${ckpt} ${ips}
+avg.sh best exp/${ckpt}/checkpoints ${avg_num}
+CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
+CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
+CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file}
```
+
+Remember to replace `${conf_path}`, `${ckpt}`, `${ips}`, `${decode_conf_path}`, `${avg_num}`, and `${audio_file}` with your actual configuration file path, checkpoint name, IP addresses, decode configuration file path, average number of models, and the path to your audio file, respectively.
From a700c8fdc1740843548dd8f8b2ec145f382ea801 Mon Sep 17 00:00:00 2001
From: liyulingyue <852433440@qq.com>
Date: Sun, 17 Nov 2024 22:24:46 +0800
Subject: [PATCH 2/2] add new generate README.md
---
examples/librispeech/asr1/README.md | 476 +++++++++++++++-------------
1 file changed, 259 insertions(+), 217 deletions(-)
diff --git a/examples/librispeech/asr1/README.md b/examples/librispeech/asr1/README.md
index b6d83678df9..d32cd606679 100644
--- a/examples/librispeech/asr1/README.md
+++ b/examples/librispeech/asr1/README.md
@@ -8,11 +8,11 @@ All the scripts you need are in the `run.sh`. There are several stages in the `r
|:---- |:----------------------------------------------------------- |
| 0 | Process data. It includes:
(1) Download the dataset
(2) Calculate the CMVN of the train dataset
(3) Get the vocabulary file
(4) Get the manifest files of the train, development, and test datasets
(5) Get the sentencepiece model |
| 1 | Train the model |
-| 2 | Get the final model by averaging the top-k models, setting k = 1 means choosing the best model |
+| 2 | Get the final model by averaging the top-k models, setting k = 1 means choosing the best model |
| 3 | Test the final model performance |
| 4 | Get CTC alignment of test data using the final model |
| 5 | Infer a single audio file |
-| 51 | Export the final model to a static graph format for deployment |
+| 51 | Export the final model as a JIT (Just-In-Time) compiled static graph |
You can choose to run a range of stages by setting the `stage` and `stop_stage` parameters.
@@ -25,110 +25,148 @@ For example, if you only want to run `stage 0`, you can use the script below:
```bash
bash run.sh --stage 0 --stop_stage 0
```
-The script `run.sh` utilizes configuration files, GPU resources, and local scripts to perform the tasks outlined in each stage. Specifically:
-- Configuration files are loaded from `conf/transformer.yaml` and `conf/tuning/decode.yaml`.
-- GPU devices are specified via the `gpus` variable (e.g., `gpus=0,1,2,3`).
-- Local scripts (e.g., `data.sh`, `train.sh`, `avg.sh`, `test.sh`, `align.sh`, `test_wav.sh`, and `export.sh`) handle the respective tasks for data preparation, model training, averaging, testing, alignment, single audio file inference, and model export.
-
The document below will describe the scripts in the `run.sh` in detail.
## The Environment Variables
-The `path.sh` contains the essential environment variables required for the scripts to run correctly.
+The `path.sh` contains the essential environment variables required for the system to function correctly.
```bash
-. ./path.sh || exit 1;
-. ./cmd.sh || exit 1;
+. ./path.sh
+. ./cmd.sh
```
-These scripts need to be sourced first to ensure that all necessary environment variables are set.
+These scripts need to be sourced first to ensure all necessary paths and variables are set up properly.
Additionally, another script is also required:
```bash
-. ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
+source ${MAIN_ROOT}/utils/parse_options.sh
```
-This script supports the use of `--variable value` options in the shell scripts, allowing for flexible configuration without directly modifying the scripts.
+This script enhances the shell scripts by enabling the use of `--variable value` options.
-The environment variables set in `path.sh` and `cmd.sh` include paths to directories, executable files, and other configurations necessary for the system to function properly. For example, `MAIN_ROOT` is typically set in `path.sh` to point to the main directory of the project.
+The environment variables set in `path.sh` and `cmd.sh` typically include paths to directories, executable files, and other configuration settings. These variables are crucial for scripts like data preparation, model training, evaluation, and export.
-The script also uses several other environment variables that are set either directly in the script or through command-line arguments parsed by `parse_options.sh`. These variables include:
+Here are some key environment variables that might be set in `path.sh` and used throughout the scripts:
-- `gpus`: A comma-separated list of GPU IDs to use for training and inference.
-- `stage` and `stop_stage`: These variables control which stages of the process to run. For instance, setting `stage=1` and `stop_stage=3` will run only stages 1, 2, and 3.
+- `MAIN_ROOT`: The root directory of the project.
+- `gpus`: A comma-separated list of GPU IDs to use for training and evaluation.
+- `stage` and `stop_stage`: Control the stages of the process to run, enabling partial execution for debugging or testing purposes.
- `conf_path`: The path to the configuration file for the model.
-- `ips`: A placeholder for a comma-separated list of IP addresses (typically used for distributed training).
- `decode_conf_path`: The path to the decoding configuration file.
-- `avg_num`: The number of best models to average for improving model performance.
-- `audio_file`: The path to an audio file for single-file testing.
-
-These variables are used throughout the script to control the behavior of different stages, such as data preparation, model training, model averaging, testing, alignment, and exporting the model. For example:
+- `avg_num`: The number of best models to average for better performance.
+- `audio_file`: A path to an audio file for testing the model with a single input.
+Here is an example of how these variables are used in the script:
```bash
+#!/bin/bash
+set -e
+
+. ./path.sh || exit 1;
+. ./cmd.sh || exit 1;
+
+gpus=0,1,2,3
+stage=0
+stop_stage=50
+conf_path=conf/transformer.yaml
+ips= #xx.xx.xx.xx,xx.xx.xx.xx (fill with actual IP addresses)
+decode_conf_path=conf/tuning/decode.yaml
+avg_num=30
+audio_file=data/demo_002_en.wav
+
+. ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
+
+avg_ckpt=avg_${avg_num}
+ckpt=$(basename ${conf_path} | awk -F'.' '{print $1}')
+echo "checkpoint name ${ckpt}"
+
+# Prepare data, train model, average best models, test, align, test a single .wav file, and export model
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
- # prepare data
bash ./local/data.sh || exit -1
fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
- # train model
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt} ${ips}
fi
-# ... other stages follow similar patterns ...
+if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
+ ./local/avg.sh best exp/${ckpt}/checkpoints ${avg_num}
+fi
+
+if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
+ CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+fi
+
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+ CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
+fi
+
+if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
+ CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
+fi
+
+if [ ${stage} -le 51 ] && [ ${stop_stage} -ge 51 ]; then
+ CUDA_VISIBLE_DEVICES= ./local/export.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} exp/${ckpt}/checkpoints/${avg_ckpt}.jit
+fi
```
-By setting these environment variables appropriately, users can customize the behavior of the script to suit their specific needs and resources.
+Ensure that `path.sh` and `cmd.sh` are sourced correctly at the beginning of your scripts to avoid errors related to missing environment variables.
## The Local Variables
-Some local variables are set in the `run.sh` script and can be configured to customize the execution of the script. Here is a detailed explanation of each variable:
+Some local variables are set in the `run.sh` script and within the bash script itself for configuring the experiment. Here's a detailed breakdown of each variable:
-`gpus` denotes the GPU number or numbers you want to use for training or inference. If you set `gpus=` (an empty value), it means you only use the CPU. Multiple GPUs can be specified using a comma-separated list, e.g., `0,1,2,3`.
+`gpus` denotes the GPU number(s) you want to use. If you set `gpus=` (an empty value), it means you only use the CPU. Multiple GPUs can be specified with a comma-separated list, e.g., `0,1,2,3`.
-`stage` denotes the number of the stage you want to start from in the experiments. This allows you to skip certain stages, such as data preparation, if they have already been completed.
+`stage` denotes the number of the stage you want to start from in the experiments. This allows you to skip certain stages if they have already been completed.
-`stop_stage` denotes the number of the stage you want to end at in the experiments. This is useful when you only want to run a subset of the stages.
+`stop_stage` denotes the number of the stage you want to end at in the experiments. This is useful for running only a subset of the stages.
-`conf_path` denotes the path to the configuration file of the model. This file contains all the necessary parameters and settings for the model.
+`conf_path` denotes the config path of the model. This should point to a YAML file containing the model configuration.
-`avg_num` denotes the number K of top-K models you want to average to get the final model. Averaging multiple models can improve the robustness and performance of the final model.
+`avg_num` denotes the number K of top-K models you want to average to get the final model. Averaging multiple models can improve performance and robustness.
-`decode_conf_path` denotes the path to the configuration file used for decoding. This file contains settings related to decoding, such as beam size and language model weight.
+`decode_conf_path` denotes the path to the decoding configuration file, which is used during the testing phase.
-`audio_file` denotes the file path of the single audio file you want to infer in stage 5. This is useful for testing the model on a specific audio file.
+`ips` (not typically set via `run.sh`) allows you to specify IP addresses for distributed training. This variable is commented out by default in the script.
-`ckpt` denotes the checkpoint prefix of the model. This is the name of the directory under which the model checkpoints are saved, e.g., "conformer". Note that you cannot set this variable via the command line; it is derived from the `conf_path`.
+`audio_file` denotes the file path of the single audio file you want to infer in stage 5. This is useful for evaluating the model on a specific audio sample.
-`ips` (optional) can be used to specify the IP addresses of multiple machines for distributed training. This is not typically used in single-machine setups.
+`ckpt` denotes the checkpoint prefix of the model, e.g., "conformer". This prefix is used to identify the model checkpoints during training and evaluation.
-You can set the local variables (except `ckpt`) when you use the `run.sh` script via command-line options. For example, you can set the `gpus` and `avg_num` when you use the following command:
+You can set the local variables (except `ckpt`, which is derived from `conf_path`) when you use the `run.sh` script. The script uses `parse_options.sh` to handle command-line arguments and update the variables accordingly.
+For example, you can set the `gpus` and `avg_num` when you use the command line:
```bash
bash run.sh --gpus 0,1 --avg_num 20
```
-The script uses the `parse_options.sh` utility to parse these command-line options and set the corresponding variables. The script then proceeds to execute the stages specified by `stage` and `stop_stage`, using the configured variables throughout the process.
-## Stage 0: Data Processing
-To use this example, you need to process the data first. Stage 0 in the `run.sh` script is dedicated to this task. The relevant code snippet is shown below:
+The bash script also includes logic to handle different stages of the experiment, such as preparing data, training the model, averaging the best models, testing the averaged model, aligning test data, testing a single `.wav` file, and exporting the model for inference. Each stage is conditionally executed based on the `stage` and `stop_stage` variables.
+## Stage 0: Data Processing and Model Training
+
+If you want to process the data and train the model, you can use stages 0 and 1 in the `run.sh`. The code is shown below:
```bash
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
# prepare data
bash ./local/data.sh || exit -1
fi
-```
-Stage 0 is specifically for processing the data. This stage prepares all necessary datasets and metadata required for subsequent training and evaluation phases.
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ # train model, all `ckpt` under `exp` dir
+ CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt} ${ips}
+fi
+```
-If you only want to process the data without proceeding to other stages, you can run the following command:
+To process the data and train the model, you can use the script below to execute stages 0 and 1:
```bash
-bash run.sh --stage 0 --stop_stage 0
+bash run.sh --stage 0 --stop_stage 1
```
-Alternatively, you can manually execute the data processing script in your command line. Ensure you have sourced the necessary environment setup scripts first:
+Alternatively, you can run these scripts in the command line step-by-step. First, source the configuration scripts and process the data:
```bash
-source path.sh
+. ./path.sh
+. ./cmd.sh
bash ./local/data.sh
```
-After successfully processing the data, the `data` directory will be populated with the following structure:
+After processing the data, the `data` directory will look like this:
```bash
data/
@@ -148,76 +186,61 @@ data/
`-- train.meta
```
-This directory structure contains manifests, metadata files, vocabulary files, and mean-std normalization parameters necessary for the training and evaluation of your model. Each file and directory serves a specific purpose and is essential for the pipeline to function correctly.
-## Stage 1: Model Training
-If you want to train the model, you can use stage 1 in the `run.sh` script. The relevant code segment is shown below:
-```bash
-if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
- # train model, all `ckpt` under `exp` dir
- CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt} ${ips}
-fi
-```
+To train the model, set the `CUDA_VISIBLE_DEVICES` environment variable to specify the GPUs you want to use (e.g., `0,1,2,3` for multiple GPUs or `0` for a single GPU), and run the training script:
-To train the model, you can use the following command to execute stages 0 and 1:
```bash
-bash run.sh --stage 0 --stop_stage 1
+CUDA_VISIBLE_DEVICES=0,1,2,3 ./local/train.sh ${conf_path} ${ckpt} ${ips}
```
-Alternatively, you can run the necessary scripts directly in the command line. If you only want to use the CPU, you can use:
-```bash
-. ./path.sh
-. ./cmd.sh
-bash ./local/data.sh
-CUDA_VISIBLE_DEVICES= ./local/train.sh conf/transformer.yaml transformer
-```
+If you only have one GPU or want to use CPU, adjust the `CUDA_VISIBLE_DEVICES` accordingly:
-If you want to use GPUs (assuming you have multiple GPUs configured in the `gpus` variable, e.g., `gpus=0,1,2,3`), you can specify the GPU devices as follows:
```bash
-. ./path.sh
-. ./cmd.sh
-bash ./local/data.sh
-CUDA_VISIBLE_DEVICES=0,1,2,3 ./local/train.sh conf/transformer.yaml transformer
+CUDA_VISIBLE_DEVICES=0 ./local/train.sh ${conf_path} ${ckpt} ${ips} # For a single GPU
+# or
+CUDA_VISIBLE_DEVICES= ./local/train.sh ${conf_path} ${ckpt} ${ips} # For CPU (though it's much slower)
```
-Remember to replace `` with the actual IP addresses of the machines if you are running the training in a distributed setting. If you are running on a single machine, you can leave the `ips` variable empty in the `run.sh` script.
-
-By running the above commands, the script will prepare the data (if `stage 0` is included), and then proceed to train the model (stage 1). The trained checkpoints will be saved under the `exp` directory.
-## Stage 2: Top-k Models Averaging
+Replace `${conf_path}`, `${ckpt}`, and `${ips}` with your actual configuration file path, checkpoint name, and IP addresses for distributed training, respectively.
+## Stage 1: Top-k Models Averaging
-After training the model, we need to get the final model for testing and inference. In every epoch, the model checkpoint is saved, so we can choose the best model from them based on the validation loss or we can sort them and average the parameters of the top-k models to get the final model. Averaging the top-k models often leads to a more robust and generalizable final model.
+After training the model in Stage 1, we proceed to average the parameters of the top-k best models to improve robustness and generalization. This technique, often referred to as model averaging or ensemble averaging, combines multiple models to reduce variance and improve overall performance.
-We can use stage 2 to perform this model averaging, and the relevant code snippet is shown below:
+In our pipeline, each epoch saves a model checkpoint. We can sort these checkpoints based on validation loss and select the top-k models for averaging. This process is automated in Stage 1, and the relevant code is shown below:
```bash
+if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
+ # train model, all `ckpt` under `exp` dir
+ CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt} ${ips}
+fi
+
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
# avg n best model
avg.sh best exp/${ckpt}/checkpoints ${avg_num}
fi
```
-Here, `avg.sh` is a script located in the `../../../utils/` directory, which is defined in the `path.sh` script. This script is responsible for averaging the parameters of the top-k models. The `${ckpt}` variable represents the basename of the configuration file (excluding the extension), and `${avg_num}` specifies the number of top models to average.
+Here, the `avg.sh` script is used to average the top-k models. This script is located in the `../../../utils/` directory, which is sourced in the `path.sh` file. The `avg_num` variable specifies the number of top models to average.
-To execute this stage along with the previous stages (stage 0 for data preparation and stage 1 for model training), you can use the following command:
+To execute Stage 1 and proceed with model averaging, you can use the following command:
```bash
-bash run.sh --stage 0 --stop_stage 2
+bash run.sh --stage 1 --stop_stage 2
```
-Alternatively, you can run the individual scripts in the command line (CPU-only) using the following sequence:
+Alternatively, you can run the scripts in the command line manually (using only CPU if desired):
```bash
source path.sh
-. ./cmd.sh
bash ./local/data.sh
-CUDA_VISIBLE_DEVICES= ./local/train.sh conf/transformer.yaml transformer
-avg.sh best exp/transformer/checkpoints 30
+CUDA_VISIBLE_DEVICES= ./local/train.sh ${conf_path} ${ckpt}
+avg.sh best exp/${ckpt}/checkpoints ${avg_num}
```
-In this example, `conf/transformer.yaml` is the configuration file for the transformer model, `transformer` is the basename used for the checkpoints, and `30` is the number of top models to average.
+This will train the model, save the checkpoints, and then average the top-k models specified by `avg_num`. The averaged model will be used in subsequent stages for testing and inference.
+## Stage 2: Model Testing
+
+After averaging the top-k models, we proceed to the testing stage to evaluate the performance of the final averaged model. The testing stage ensures that the model works well on unseen data and provides insights into its accuracy and robustness. The code for the testing stage is shown below:
-Make sure to adjust the `conf_path`, `ckpt`, `gpus`, and `avg_num` variables in the main script according to your specific setup and requirements. The `avg_ckpt` variable will be used in subsequent stages for testing and inference with the averaged model.
-## Stage 3: Model Testing
-The test stage is designed to evaluate the model performance. This stage uses the averaged checkpoint (avg_n) obtained from the previous stage to assess the model's accuracy and reliability. The code snippet responsible for this stage is provided below:
```bash
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
# test ckpt avg_n
@@ -225,120 +248,139 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
fi
```
-Here's a breakdown of the script:
-- `CUDA_VISIBLE_DEVICES=0` specifies that only the first GPU (with index 0) should be used for testing.
-- `./local/test.sh` is the script that performs the testing.
-- `${conf_path}` is the path to the configuration file that defines the model architecture and other training parameters.
-- `${decode_conf_path}` is the path to the decoding configuration file that includes parameters like beam size and language model weight.
-- `exp/${ckpt}/checkpoints/${avg_ckpt}` specifies the location of the averaged checkpoint to be tested.
+In this script, the `test.sh` script is responsible for evaluating the model. It reads the configuration file specified by `${conf_path}`, decoding configuration file `${decode_conf_path}`, and the checkpoint directory of the averaged model `exp/${ckpt}/checkpoints/${avg_ckpt}`. The script runs the evaluation on the GPU specified by `CUDA_VISIBLE_DEVICES=0`.
+
+If you want to train a model, average the top-k models, and test the final averaged model, you can use the script below to execute stage 0, stage 1, stage 2, and stage 3:
-If you want to train a model from scratch and test it up to stage 3, you can use the following script:
```bash
bash run.sh --stage 0 --stop_stage 3
```
-Alternatively, you can manually run the relevant scripts in the command line. Below is an example using only the CPU (by omitting the `CUDA_VISIBLE_DEVICES` setting):
+Alternatively, you can run these scripts step-by-step in the command line (only using CPU if needed):
+
```bash
-. ./path.sh
-. ./cmd.sh
+source path.sh
bash ./local/data.sh
-# Assuming you have set the `gpus`, `conf_path`, `decode_conf_path`, `avg_num`, and other variables correctly
-# Train the model
-# (Note: This step is omitted here for brevity, but it involves running `./local/train.sh` with appropriate arguments)
-# Average the best models
-avg.sh best exp/${ckpt}/checkpoints ${avg_num}
-# Test the averaged checkpoint
-CUDA_VISIBLE_DEVICES= ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
+CUDA_VISIBLE_DEVICES= ./local/train.sh conf/transformer.yaml transformer xx.xx.xx.xx,xx.xx.xx.xx
+avg.sh best exp/transformer/checkpoints 30
+CUDA_VISIBLE_DEVICES=0 ./local/test.sh conf/transformer.yaml conf/tuning/decode.yaml exp/transformer/checkpoints/avg_30
```
-Remember to customize the paths and parameters according to your specific setup and model configuration.
+This will prepare the data, train the model, average the top-k models, and finally test the averaged model using the `test.sh` script. The output of the test stage will provide you with detailed performance metrics, such as word error rate (WER) or character error rate (CER), depending on your evaluation setup.
## Pretrained Model
-You can get the pretrained transformer or conformer models from [this](../../../docs/source/released_model.md) link.
+You can get the pretrained transformer or conformer models from [this](../../../docs/source/released_model.md) page.
-After downloading the pretrained model, you can use the `tar` command to unpack it. Once unpacked, you can leverage a series of scripts to handle various tasks such as training, testing, aligning, and exporting the model.
+Once you have downloaded the model, you can use the `tar` command to unpack it. After unpacking, you can leverage a series of bash scripts to train, average the best models, test the model, align the test data, test a single `.wav` file, and even export the model for inference.
-Here's a detailed guide on how to use these scripts:
+Here's a step-by-step guide to utilizing these scripts:
-### Unpacking the Pretrained Model
-```bash
-wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
-tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
-source path.sh
-```
+1. **Download and Unpack the Model**:
-### Data Preparation
-If you haven't processed the data and obtained the manifest file, you need to run the following commands to prepare the data:
-```bash
-bash local/data.sh --stage -1 --stop_stage -1
-bash local/data.sh --stage 2 --stop_stage 2
-```
-These steps are optional if you already have the necessary data prepared.
+ ```bash
+ wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
+ tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
+ source path.sh
+ ```
-### Training the Model
-You can train the model using the provided training script. Set the GPU devices, configuration file path, and other parameters as needed:
-```bash
-gpus=0,1,2,3
-conf_path=conf/transformer.yaml
-stage=0
-stop_stage=1
+2. **Prepare Data (if not already processed)**:
-CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt_name} ${ips}
-```
-Replace `${ckpt_name}` with an appropriate name for your checkpoint and `${ips}` with the IP addresses of the nodes (if you're using multiple nodes for distributed training).
+ If you haven't processed your data and generated the manifest file, you need to run the following commands to prepare the data:
-### Averaging Best Models
-After training, you can average the best models to improve performance:
-```bash
-avg_num=30
-stage=2
-stop_stage=2
+ ```bash
+ bash local/data.sh --stage -1 --stop_stage -1
+ bash local/data.sh --stage 2 --stop_stage 2
+ ```
-avg.sh best exp/${ckpt_name}/checkpoints ${avg_num}
-```
+3. **Training the Model**:
-### Testing the Model
-You can test the averaged model using the testing script:
-```bash
-decode_conf_path=conf/tuning/decode.yaml
-avg_ckpt=avg_${avg_num}
-stage=3
-stop_stage=3
+ You can train the model using multiple GPUs by specifying the `gpus` variable. Adjust the `conf_path` to the configuration file you want to use.
-CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt_name}/checkpoints/${avg_ckpt}
-```
+ ```bash
+ gpus=0,1,2,3
+ stage=0
+ stop_stage=1
+ conf_path=conf/transformer.yaml
+
+ . ${MAIN_ROOT}/utils/parse_options.sh
+
+ ckpt=$(basename ${conf_path} | awk -F'.' '{print $1}')
+
+ CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt}
+ ```
-### CTC Alignment
-To perform CTC alignment on the test data:
-```bash
-stage=4
-stop_stage=4
+4. **Average the Best Models**:
-CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt_name}/checkpoints/${avg_ckpt}
-```
+ After training, you can average the best `n` models to improve performance.
-### Testing a Single WAV File
-You can also test a single WAV file using the provided script:
-```bash
-audio_file=data/demo_002_en.wav
-stage=5
-stop_stage=5
+ ```bash
+ stage=2
+ stop_stage=2
+ avg_num=30
+
+ . ${MAIN_ROOT}/utils/parse_options.sh
+
+ avg.sh best exp/${ckpt}/checkpoints ${avg_num}
+ ```
-CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt_name}/checkpoints/${avg_ckpt} ${audio_file}
-```
+5. **Test the Averaged Model**:
-### Exporting the Model
-Finally, you can export the model for inference:
-```bash
-stage=51
-stop_stage=51
+ Use the test script to evaluate the averaged model.
-CUDA_VISIBLE_DEVICES= ./local/export.sh ${conf_path} exp/${ckpt_name}/checkpoints/${avg_ckpt} exp/${ckpt_name}/checkpoints/${avg_ckpt}.jit
-```
+ ```bash
+ stage=3
+ stop_stage=3
+ decode_conf_path=conf/tuning/decode.yaml
+ avg_ckpt=avg_${avg_num}
+
+ . ${MAIN_ROOT}/utils/parse_options.sh
+
+ CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
+ ```
+
+6. **CTC Alignment of Test Data**:
-The performance of the released models are shown in [this](./RESULTS.md) document.
-## Stage 4: CTC Alignment
+ If you need to align the test data using CTC, you can run:
-This stage is dedicated to obtaining the alignment between the audio and the text using Connectionist Temporal Classification (CTC). CTC is a powerful technique that aligns the input sequence to the target sequence without needing an alignment pre-specified.
+ ```bash
+ stage=4
+ stop_stage=4
+
+ . ${MAIN_ROOT}/utils/parse_options.sh
+
+ CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
+ ```
+
+7. **Test a Single `.wav` File**:
+
+ To test a single audio file, use the `test_wav.sh` script.
+
+ ```bash
+ stage=5
+ stop_stage=5
+ audio_file=data/demo_002_en.wav
+
+ . ${MAIN_ROOT}/utils/parse_options.sh
+
+ CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file}
+ ```
+
+8. **Export the Model**:
+
+ Finally, you can export the model for inference.
+
+ ```bash
+ stage=51
+ stop_stage=51
+
+ . ${MAIN_ROOT}/utils/parse_options.sh
+
+ CUDA_VISIBLE_DEVICES= ./local/export.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} exp/${ckpt}/checkpoints/${avg_ckpt}.jit
+ ```
+
+The performance of the released models are shown in [this](./RESULTS.md).
+## Stage 3: CTC Alignment
+
+The CTC Alignment stage aims to generate the alignment between the audio and the text. This step is crucial for understanding how the model maps audio signals to text characters. The code for this stage is provided below:
```bash
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
@@ -347,95 +389,95 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
fi
```
-To perform CTC alignment, you need to have a trained model and its average checkpoint. The script above will use the specified configuration file (`conf_path`), decoding configuration (`decode_conf_path`), and the average checkpoint (`avg_ckpt`) located in the `exp/${ckpt}/checkpoints/` directory.
+To perform CTC alignment, you need to have already trained and averaged your model. The script above uses the averaged checkpoint (`avg_ckpt`) to generate the alignment.
-If you have already trained a model and wish to obtain the alignment for your test data, you can use the following commands. These commands assume you have already set up your environment and paths correctly:
+If you want to train the model, test it, and perform the alignment in sequence, you can use the following script to execute stages 0 through 4:
```bash
-# Assuming you have already trained the model and obtained the average checkpoint
bash run.sh --stage 0 --stop_stage 4
```
-Alternatively, if you only need to perform the alignment without retraining the model, you can specify the stages directly:
+Alternatively, if you have already trained and averaged your model but skipped the test stage, you can directly proceed to the alignment stage using:
+
+```bash
+bash run.sh --stage 4 --stop_stage 4
+```
+
+Or, you can manually run the necessary scripts in the command line. Below is an example using only the CPU:
```bash
-# Load necessary environment variables
. ./path.sh
. ./cmd.sh
+bash ./local/data.sh
+# Assuming you have already trained and averaged your model
+# CUDA_VISIBLE_DEVICES= ./local/train.sh and avg.sh commands would be run previously
+# Now, proceed to the alignment stage
+CUDA_VISIBLE_DEVICES= ./local/align.sh conf/transformer.yaml conf/tuning/decode.yaml exp/${ckpt}/checkpoints/${avg_ckpt}
+```
-# Assuming `conf_path` and `decode_conf_path` are already set correctly
-CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
+Make sure to replace `conf/transformer.yaml` and other paths with the actual configuration and checkpoint paths you are using. The `decode_conf_path` is also crucial as it contains parameters related to decoding, which might affect the alignment result.
+## Stage 4: Export Model to JIT Format
+This stage involves exporting the trained model to a JIT (Just-In-Time) format, which is optimized for inference. The JIT format allows for faster and more efficient model loading and execution.
+
+```bash
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+ # export ckpt avg_n to JIT format
+ CUDA_VISIBLE_DEVICES= ./local/export.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} exp/${ckpt}/checkpoints/${avg_ckpt}.jit
+fi
```
-In the script, `CUDA_VISIBLE_DEVICES=0` specifies that the alignment process should run on the first GPU. If you do not have a GPU or want to use a different one, you can adjust this setting accordingly.
+If you have successfully trained and averaged your model in the previous stages, you can now export it to the JIT format by running the script above. The script takes the configuration file path, the path to the averaged checkpoint, and the desired output path for the JIT model as arguments.
-Note that the `align.sh` script will generate alignment information for your test data, which can be useful for various applications such as visualization, error analysis, and more.
+Here is an example command to export a model:
-If you encounter any issues during this stage, ensure that all dependencies are correctly installed, and that the paths to the configuration files and checkpoints are correct. Additionally, check the logs for any error messages that might provide insight into the problem.
+```bash
+source path.sh
+./local/export.sh conf/transformer.yaml exp/transformer/checkpoints/avg_30 exp/transformer/checkpoints/avg_30.jit
+```
+
+In this example, `conf/transformer.yaml` is the configuration file used for training, `exp/transformer/checkpoints/avg_30` is the path to the averaged checkpoint, and `exp/transformer/checkpoints/avg_30.jit` is the desired output path for the JIT model.
+
+Make sure to adjust the paths and filenames according to your specific setup. Once the export process is complete, you will have a JIT model ready for inference.
## Stage 5: Single Audio File Inference
-In some situations, you may want to use the trained model to perform inference on a single audio file. This can be accomplished using Stage 5. The relevant code snippet is shown below:
+In some situations, you want to use the trained model to perform inference on a single audio file. You can use stage 5 for this purpose. The code segment related to this stage is shown below:
```bash
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
- # test a single .wav file
+ # Test a single .wav file
CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
fi
```
-Before running this script, you have the option to train the model yourself using the following command:
+To use this script, you have the option to train the model yourself using the following command:
```bash
bash run.sh --stage 0 --stop_stage 3
```
-Alternatively, you can download a pretrained model using the script below. Please note that the configuration file and model checkpoint may vary depending on the specific experiment you are running. For this example, let's assume you are using a Conformer model trained on Librispeech:
+Alternatively, you can download a pretrained model using the script below. Make sure to adjust the URL to match the specific model you want to use:
```bash
wget https://paddlespeech.bj.bcebos.com/s2t/librispeech/asr1/asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
tar xzvf asr1_conformer_librispeech_ckpt_0.1.1.model.tar.gz
```
-You can also download a sample audio file to test the inference:
+You can also download a demo audio file to test the inference:
```bash
wget -nc https://paddlespeech.bj.bcebos.com/datasets/single_wav/en/demo_002_en.wav -P data/
```
-Please ensure that your audio file, whether it's the sample provided or your own, has a sample rate of 16K. To run the inference on the sample audio file, you can use the following command:
+Please ensure that your audio file, whether the demo or your own, has a sample rate of 16K. To run the inference on the demo audio file, use the following command:
```bash
CUDA_VISIBLE_DEVICES= ./local/test_wav.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_20 data/demo_002_en.wav
```
In this command:
-- `conf/conformer.yaml` is the configuration file for the Conformer model.
-- `conf/tuning/decode.yaml` contains decoding parameters.
+- `conf/conformer.yaml` is the configuration file for the model.
+- `conf/tuning/decode.yaml` contains decoding-related configurations.
- `exp/conformer/checkpoints/avg_20` is the path to the averaged checkpoint of the trained model.
-- `data/demo_002_en.wav` is the path to the audio file you want to perform inference on.
-
-Make sure to adjust the paths and filenames according to your specific setup. If the inference runs successfully, you should see the transcription result of the audio file printed in the console or saved to a file (depending on how the `test_wav.sh` script is implemented).
-## Stage 5: Single Audio File Testing
-The single audio file testing stage is designed to evaluate the model's performance on a specific audio file. The code for this stage is shown below:
-```bash
-if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
- # test a single .wav file
- CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
-fi
-```
-To perform the single audio file testing, you can use the script below to execute stages up to and including stage 5:
-```bash
-bash run.sh --stage 0 --stop_stage 5
-```
-Alternatively, you can manually run these scripts in the command line (using only CPU for inference if desired). Below is an example of the full sequence of commands:
-```bash
-source path.sh
-bash ./local/data.sh
-CUDA_VISIBLE_DEVICES= ./local/train.sh ${conf_path} ${ckpt} ${ips}
-avg.sh best exp/${ckpt}/checkpoints ${avg_num}
-CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
-CUDA_VISIBLE_DEVICES=0 ./local/align.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt}
-CUDA_VISIBLE_DEVICES=0 ./local/test_wav.sh ${conf_path} ${decode_conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file}
-```
+- `data/demo_002_en.wav` is the path to the audio file you want to test.
-Remember to replace `${conf_path}`, `${ckpt}`, `${ips}`, `${decode_conf_path}`, `${avg_num}`, and `${audio_file}` with your actual configuration file path, checkpoint name, IP addresses, decode configuration file path, average number of models, and the path to your audio file, respectively.
+Adjust these paths according to your specific setup and the model you are using.