funasr1.0 #1241

Merged: 89 commits, Jan 15, 2024

Commits
85cabd3
update with main (#1163)
LauraGPT Dec 11, 2023
d77910e
funasr2
LauraGPT Dec 11, 2023
6745487
funasr2
LauraGPT Dec 11, 2023
806a036
funasr2 paraformer biciparaformer contextuaparaformer
LauraGPT Dec 13, 2023
7012ca2
funasr2 paraformer biciparaformer contextuaparaformer
LauraGPT Dec 13, 2023
298ddd1
funasr2
LauraGPT Dec 15, 2023
53fcccc
Change cnn to a reasonable package name
yeyupiaoling Dec 19, 2023
0e622e6
funasr2
LauraGPT Dec 19, 2023
d71089d
funasr2
LauraGPT Dec 19, 2023
ea4453c
Merge branch 'yeyupiaoling-move-cnn' into dev_gzf_funasr2
LauraGPT Dec 19, 2023
00ea118
funasr2
LauraGPT Dec 19, 2023
836d57b
update seaco paraformer
R1ckShi Dec 20, 2023
c8bae0e
funasr2
LauraGPT Dec 21, 2023
f920ca6
Merge branch 'dev_gzf_funasr2' of github.com:alibaba-damo-academy/Fun…
LauraGPT Dec 21, 2023
a1b0cd3
rename register tables
LauraGPT Dec 21, 2023
5a8f379
vad + asr
LauraGPT Dec 21, 2023
b482519
funasr1.0
LauraGPT Dec 21, 2023
f2a68d0
funasr1.0
LauraGPT Dec 21, 2023
b66a41f
funasr1.0
LauraGPT Dec 21, 2023
bdc7a17
funasr1.0
LauraGPT Dec 26, 2023
f6b611d
funasr1.0
LauraGPT Dec 27, 2023
523e902
funasr1.0
LauraGPT Dec 27, 2023
c6d6c93
funasr1.0
LauraGPT Dec 27, 2023
b221246
funasr1.0
LauraGPT Dec 27, 2023
d339765
funasr1.0
LauraGPT Dec 27, 2023
9afc917
seaco paraformer inference
R1ckShi Dec 27, 2023
840657e
paper link
R1ckShi Dec 27, 2023
ccb9488
funasr1.0
LauraGPT Dec 27, 2023
fddb28f
Merge branch 'dev_gzf_funasr2' of github.com:alibaba-damo-academy/Fun…
LauraGPT Dec 27, 2023
4719ca4
funasr1.0
LauraGPT Dec 27, 2023
3e3eed1
update bicif, bicif seaco
R1ckShi Dec 28, 2023
c9c95ec
funasr1.0
LauraGPT Dec 28, 2023
f1d86e9
update scripts
R1ckShi Dec 28, 2023
36702d2
funasr1.0
LauraGPT Jan 4, 2024
e82bd3d
funasr1.0
LauraGPT Jan 4, 2024
32905d8
funasr1.0
LauraGPT Jan 5, 2024
276c443
update monotonic aligner
R1ckShi Jan 5, 2024
e9a015e
update demo file
R1ckShi Jan 5, 2024
ab122d5
update code
R1ckShi Jan 5, 2024
8567604
update
LauraGPT Jan 5, 2024
622d799
update
LauraGPT Jan 5, 2024
3e8b21a
update
LauraGPT Jan 5, 2024
e63169b
prepare_data_iterator
LauraGPT Jan 5, 2024
4f98546
load_audio_text_image_video
LauraGPT Jan 5, 2024
e6a7bbe
load_audio_text_image_video
LauraGPT Jan 5, 2024
fb17640
funasr1.0 emotion2vec
LauraGPT Jan 8, 2024
e8590bb
funasr1.0 emotion2vec
LauraGPT Jan 8, 2024
0a53be2
funasr1.0 emotion2vec
LauraGPT Jan 8, 2024
f14f9f8
funasr1.0 infer url modelscope
LauraGPT Jan 8, 2024
d8b586e
funasr1.0 modelscope
LauraGPT Jan 9, 2024
6eaf50a
funasr1.0 paraformer_streaming
LauraGPT Jan 9, 2024
f79d31d
update funasr-onnx
R1ckShi Jan 10, 2024
e30a17c
update funasr-onnx
R1ckShi Jan 10, 2024
2d0c827
update funasr-onnx
R1ckShi Jan 10, 2024
1028a8a
funasr1.0 paraformer_streaming WavFrontendOnline
LauraGPT Jan 10, 2024
d342c64
Merge branch 'funasr1.0' of github.com:alibaba-damo-academy/FunASR in…
LauraGPT Jan 10, 2024
668b830
update cam++ for embed extract
R1ckShi Jan 10, 2024
47088b8
funasr1.0 paraformer_streaming
LauraGPT Jan 10, 2024
78ffd04
Merge branch 'funasr1.0' of github.com:alibaba-damo-academy/FunASR in…
LauraGPT Jan 10, 2024
7037971
update asr with speaker
R1ckShi Jan 11, 2024
a75bbb0
funasr1.0 paraformer_streaming
LauraGPT Jan 11, 2024
c0e72dd
Merge branch 'funasr1.0' of github.com:alibaba-damo-academy/FunASR in…
LauraGPT Jan 11, 2024
487420b
funasr1.0 paraformer_streaming
LauraGPT Jan 11, 2024
f6c82b1
funasr1.0 paraformer_streaming
LauraGPT Jan 11, 2024
d72a449
support oracle num for asr with spk
R1ckShi Jan 11, 2024
cf2f143
funasr1.0 fsmn-vad streaming
LauraGPT Jan 11, 2024
02f580b
Merge branch 'funasr1.0' of github.com:alibaba-damo-academy/FunASR in…
LauraGPT Jan 11, 2024
b00f91e
funasr1.0 fsmn-vad streaming
LauraGPT Jan 11, 2024
247c763
funasr1.0 fsmn-vad streaming
LauraGPT Jan 12, 2024
0143122
funasr1.0 streaming demo
LauraGPT Jan 12, 2024
a0d7781
funasr1.0 streaming demo
LauraGPT Jan 12, 2024
40d1f80
funasr1.0 streaming demo
LauraGPT Jan 12, 2024
bafd056
funasr1.0 streaming
LauraGPT Jan 12, 2024
38524b2
funasr1.0 streaming
LauraGPT Jan 12, 2024
bcb8b0c
update (debugging)
R1ckShi Jan 12, 2024
09a28d1
update
R1ckShi Jan 12, 2024
c3442d9
update device
R1ckShi Jan 12, 2024
0c75e62
update device bug
R1ckShi Jan 12, 2024
c3c78fc
bug fix
R1ckShi Jan 12, 2024
c0b186b
funasr1.0 streaming
LauraGPT Jan 12, 2024
835369d
funasr1.0 fix punc model
LauraGPT Jan 13, 2024
bdfd27b
funasr1.0
LauraGPT Jan 13, 2024
99730b3
funasr1.0 ct-transformer streaming
LauraGPT Jan 14, 2024
8912e06
Resolve merge conflict
LauraGPT Jan 14, 2024
831c48a
download configuration.json
LauraGPT Jan 15, 2024
2a0b2c7
funasr1.0
LauraGPT Jan 15, 2024
c6361cc
funasr1.0
LauraGPT Jan 15, 2024
a035d68
funasr1.0
LauraGPT Jan 15, 2024
97d648c
code optimize, model update, scripts
R1ckShi Jan 15, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -21,3 +21,5 @@ docs/_build
modelscope
samples
.ipynb_checkpoints
outputs*
emotion2vec*
100 changes: 76 additions & 24 deletions README.md
@@ -76,57 +76,109 @@ FunASR has open-sourced a large number of pre-trained models on industrial data.

<a name="quick-start"></a>
## Quick Start
Quick start for new users ([tutorial](https://alibaba-damo-academy.github.io/FunASR/en/funasr/quick_start.html))

Below is a quick start tutorial. Test audio files ([Mandarin](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav), [English]()).

### Command-line usage

```shell
funasr +model=paraformer-zh +vad_model="fsmn-vad" +punc_model="ct-punc" +input=asr_example_zh.wav
```

Note: This supports recognition of a single audio file, as well as a file list in Kaldi-style wav.scp format: `wav_id wav_path`. An illustrative wav.scp is sketched below.
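For illustration, a minimal wav.scp might look like the following; the utterance IDs and paths are hypothetical, and the list file is passed in place of a single audio path (e.g. via the `+input=` argument shown above):

```shell
# wav.scp — one utterance per line: <wav_id> <wav_path>
ex_utt_001 /data/audio/ex_utt_001.wav
ex_utt_002 /data/audio/ex_utt_002.wav
```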

### Speech Recognition (Non-streaming)
```python
from funasr import AutoModel

model = AutoModel(model="paraformer-zh")
# for long audio, you can add a VAD model and a punctuation model:
# model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")

res = model(input="asr_example_zh.wav", batch_size=64)
print(res)
```

### Speech Recognition (Streaming)
```python
from funasr import AutoModel

chunk_size = [0, 10, 5]  # [0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming", model_revision="v2.0.0")

import soundfile
import os

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600ms

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size,
                encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)
```
Note: `chunk_size` is the configuration for streaming latency. `[0,10,5]` indicates that the real-time display granularity is `10*60=600ms`, and the lookahead information is `5*60=300ms`. Each inference input is `600ms` (`16000*0.6=9600` sample points), and the output is the corresponding text. For the last speech segment, `is_final=True` must be set to force the output of the last word.
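As a sanity check, the latency figures in this note follow directly from the chunk configuration; a minimal sketch of the arithmetic, assuming the 60 ms-per-unit granularity stated above:

```python
# Derive stride and latency from chunk_size = [0, 10, 5] at a 16 kHz sample rate.
chunk_size = [0, 10, 5]
unit_ms = 60                                     # each chunk_size unit covers 60 ms
sample_rate = 16000

display_ms = chunk_size[1] * unit_ms             # 600 ms of new audio per inference call
lookahead_ms = chunk_size[2] * unit_ms           # 300 ms of future context
chunk_stride = sample_rate * display_ms // 1000  # 9600 samples, i.e. chunk_size[1] * 960
print(display_ms, lookahead_ms, chunk_stride)    # 600 300 9600
```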

Quick start for new users can be found in [docs](https://alibaba-damo-academy.github.io/FunASR/en/funasr/quick_start_zh.html)

### Voice Activity Detection (Non-streaming)
```python
from funasr import AutoModel

model = AutoModel(model="fsmn-vad", model_revision="v2.0.2")

wav_file = f"{model.model_path}/example/asr_example.wav"
res = model(input=wav_file)
print(res)
```
### Voice Activity Detection (Streaming)
```python
from funasr import AutoModel

chunk_size = 200 # ms
model = AutoModel(model="fsmn-vad", model_revision="v2.0.2")

import soundfile

wav_file = f"{model.model_path}/example/vad_example.wav"
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = int(chunk_size * sample_rate / 1000)

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)
    if len(res[0]["value"]):
        print(res)
```
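To collect the detected segments across chunks, the non-empty results can simply be accumulated. A minimal sketch, assuming each entry of `res[0]["value"]` is a segment-boundary pair in milliseconds (the exact payload is defined by the model, not shown here), reusing `speech`, `chunk_stride`, `chunk_size`, and `total_chunk_num` from the block above:

```python
# Hypothetical accumulation of streaming VAD output across chunks.
cache = {}   # fresh streaming state for a new pass over the audio
segments = []
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    res = model(input=speech_chunk, cache=cache, is_final=(i == total_chunk_num - 1), chunk_size=chunk_size)
    if len(res[0]["value"]):
        segments.extend(res[0]["value"])  # assumed boundary pairs in ms
print(segments)
```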
### Punctuation Restoration
```python
from funasr import AutoModel

model = AutoModel(model="ct-punc", model_revision="v2.0.1")

res = model(input="那今天的会就到这里吧 happy new year 明年见")
print(res)
```
### Timestamp Prediction
```python
from funasr import AutoModel

model = AutoModel(model="fa-zh", model_revision="v2.0.0")

wav_file = f"{model.model_path}/example/asr_example.wav"
text_file = f"{model.model_path}/example/text.txt"
res = model(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)
```
[//]: # (FunASR supports inference and fine-tuning of models trained on industrial datasets of tens of thousands of hours. For more details, please refer to &#40;[modelscope_egs]&#40;https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html&#41;&#41;. It also supports training and fine-tuning of models on academic standard datasets. For more details, please refer to&#40;[egs]&#40;https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html&#41;&#41;. The models include speech recognition &#40;ASR&#41;, speech activity detection &#40;VAD&#41;, punctuation recovery, language model, speaker verification, speaker separation, and multi-party conversation speech recognition. For a detailed list of models, please refer to the [Model Zoo]&#40;https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md&#41;:)

## Deployment Service
128 changes: 95 additions & 33 deletions README_zh.md
@@ -57,68 +57,130 @@ FunASR has open-sourced a large number of models pre-trained on industrial data; you can, under the [Model License Agreement

(Note: [🤗]() denotes a Huggingface model repository link, [⭐]() denotes a ModelScope model repository link)


| Model Name | Task Details | Training Data | Parameters |
|:------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------:|:------------:|:----:|
| paraformer-zh <br> ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗]() ) | Speech recognition, with timestamps, non-streaming | 60,000 hours, Mandarin | 220M |
| paraformer-zh-spk <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) [🤗]() ) | Speech recognition with speaker diarization, with timestamps, non-streaming | 60,000 hours, Mandarin | 220M |
| paraformer-zh-streaming <br> ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() ) | Speech recognition, streaming | 60,000 hours, Mandarin | 220M |
| paraformer-en <br> ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | Speech recognition, non-streaming | 50,000 hours, English | 220M |
| paraformer-en-spk <br> ([⭐]() [🤗]() ) | Speech recognition, non-streaming | 50,000 hours, English | 220M |
| conformer-en <br> ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() ) | Speech recognition, non-streaming | 50,000 hours, English | 220M |
| ct-punc <br> ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() ) | Punctuation restoration | 100M, Mandarin and English | 1.1G |
| fsmn-vad <br> ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() ) | Voice activity detection, streaming | 5,000 hours, Mandarin and English | 0.4M |
| fa-zh <br> ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() ) | Character-level timestamp prediction | 50,000 hours, Mandarin | 38M |


<a name="快速开始"></a>
## Quick Start
FunASR supports inference and fine-tuning of models trained on tens of thousands of hours of industrial data; see ([modelscope_egs](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html)). It also supports training and fine-tuning of models on academic benchmark datasets; see ([egs](https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html)).

Below is a quick start tutorial. Test audio files ([Mandarin](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav), [English]())

### Command-line usage

```shell
funasr +model=paraformer-zh +vad_model="fsmn-vad" +punc_model="ct-punc" +input=asr_example_zh.wav
```

Note: This supports recognition of a single audio file, as well as a file list in Kaldi-style wav.scp format: `wav_id wav_path`

### Speech Recognition (Non-streaming)
```python
from funasr import AutoModel

model = AutoModel(model="paraformer-zh")
# for long audio, you can add a VAD model and a punctuation model:
# model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")

res = model(input="asr_example_zh.wav", batch_size=64)
print(res)
```

### Speech Recognition (Streaming)
```python
from funasr import AutoModel

chunk_size = [0, 10, 5]  # [0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming", model_revision="v2.0.0")

import soundfile
import os

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600ms

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size,
                encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)
```

Note: `chunk_size` is the streaming latency configuration. `[0,10,5]` indicates that the real-time display granularity is `10*60=600ms`, and the lookahead information is `5*60=300ms`. Each inference input is `600ms` (`16000*0.6=9600` sample points), and the output is the corresponding text. For the last speech segment, `is_final=True` must be set to force the output of the last word.

More detailed usage ([beginner docs](https://alibaba-damo-academy.github.io/FunASR/en/funasr/quick_start_zh.html))

### Voice Activity Detection (Non-streaming)
```python
from funasr import AutoModel

model = AutoModel(model="fsmn-vad", model_revision="v2.0.2")

wav_file = f"{model.model_path}/example/asr_example.wav"
res = model(input=wav_file)
print(res)
```

### Voice Activity Detection (Streaming)
```python
from funasr import AutoModel

chunk_size = 200 # ms
model = AutoModel(model="fsmn-vad", model_revision="v2.0.2")

import soundfile

wav_file = f"{model.model_path}/example/vad_example.wav"
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = int(chunk_size * sample_rate / 1000)

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)
    if len(res[0]["value"]):
        print(res)
```

### Punctuation Restoration
```python
from funasr import AutoModel

model = AutoModel(model="ct-punc", model_revision="v2.0.1")

res = model(input="那今天的会就到这里吧 happy new year 明年见")
print(res)
```

### Timestamp Prediction
```python
from funasr import AutoModel

model = AutoModel(model="fa-zh", model_revision="v2.0.0")

wav_file = f"{model.model_path}/example/asr_example.wav"
text_file = f"{model.model_path}/example/text.txt"
res = model(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)
```
More detailed usage ([examples](examples/industrial_data_pretraining))


<a name="服务部署"></a>
2 changes: 2 additions & 0 deletions data/list/audio_datasets.jsonl
@@ -0,0 +1,2 @@
{"key": "ID0012W0013", ",prompt": "<ASR>", "source": "/Users/zhifu/funasr_github/test_local/aishell2_dev_ios/wav/D0012/ID0012W0013.wav", "target": "当客户风险承受能力评估依据发生变化时", "source_len": 454, "target_len": 19}
{"key":"ID0012W0014", ",prompt": "<ASR>", "source": "/Users/zhifu/funasr_github/test_local/aishell2_dev_ios/wav/D0012/ID0012W0014.wav", "target": "杨涛不得不将工厂关掉", "source_len": 211, "target_len": 11}
1 change: 0 additions & 1 deletion docs/benchmark/benchmark_libtorch.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/benchmark/benchmark_onnx.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/benchmark/benchmark_onnx_cpp.md

This file was deleted.

16 changes: 0 additions & 16 deletions egs/aishell/bat/README.md

This file was deleted.

1 change: 0 additions & 1 deletion egs/aishell/bat/conf/decode_bat_conformer.yaml

This file was deleted.
