use of external LM to improve transcriptions #940
-
Could you explain how to use an external LM (N-gram or RNN) to improve the transcriptions?
Replies: 6 comments
-
We provide several decoding methods with external language models. Please have a look at https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md#pruned_transducer_stateless7-zipformer.
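For readers new to the idea: decoding with an external LM usually means adding a weighted LM log-probability to the ASR model's hypothesis score. A minimal, self-contained sketch (a toy bigram LM and n-best rescoring, not icefall's actual implementation):

```python
# Toy sketch (not icefall code): combining an external N-gram LM score
# with the ASR model's score when rescoring an n-best list.
import math
from collections import defaultdict

class BigramLM:
    """A tiny bigram LM with add-one smoothing over a toy corpus."""
    def __init__(self, corpus):
        self.unigrams = defaultdict(int)
        self.bigrams = defaultdict(int)
        for sentence in corpus:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            for a, b in zip(tokens, tokens[1:]):
                self.unigrams[a] += 1
                self.bigrams[(a, b)] += 1
        self.vocab_size = len({t for s in corpus for t in s.split()}) + 2

    def log_prob(self, sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        lp = 0.0
        for a, b in zip(tokens, tokens[1:]):
            lp += math.log((self.bigrams[(a, b)] + 1) /
                           (self.unigrams[a] + self.vocab_size))
        return lp

def rescore(hypotheses, lm, lm_scale=0.3):
    """Pick the hypothesis maximizing am_score + lm_scale * lm_score."""
    return max(hypotheses,
               key=lambda h: h[1] + lm_scale * lm.log_prob(h[0]))

lm = BigramLM(["the cat sat", "the cat ran", "a dog sat"])
# (text, ASR log-score) pairs; the LM breaks the near-tie.
hyps = [("the cat sat", -1.2), ("the cats at", -1.1)]
best = rescore(hyps, lm)  # → ("the cat sat", -1.2)
```

The `lm_scale` weight plays the same role as the LM scale swept over in the icefall results linked above.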
-
OK, but why do you use only a BPE-based N-gram LM instead of a word-level N-gram with a lexicon.txt of the type word -> BPE tokens?
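For context, the lexicon.txt the question refers to maps each word to its BPE pieces. In icefall this would be produced with the trained sentencepiece model; the sketch below substitutes a toy greedy longest-match segmenter over a hypothetical piece inventory, purely to show the file format:

```python
# Hypothetical sketch of a word -> BPE-pieces lexicon. The piece set and
# segmenter are toys; a real pipeline would query the trained BPE model.
def segment(word, pieces):
    """Greedy longest-match split of a word into BPE-style pieces."""
    out, s = [], "\u2581" + word  # sentencepiece marks word starts with ▁
    while s:
        for n in range(len(s), 0, -1):
            if s[:n] in pieces:
                out.append(s[:n])
                s = s[n:]
                break
        else:
            out.append(s[0])  # fall back to single characters
            s = s[1:]
    return out

PIECES = {"\u2581HE", "LLO", "\u2581WO", "R", "LD"}
words = ["HELLO", "WORLD"]
lexicon_lines = [w + " " + " ".join(segment(w, PIECES)) for w in words]
# lexicon_lines → ["HELLO ▁HE LLO", "WORLD ▁WO R LD"]
```

Each line then pairs a word with its token sequence, which is what an LG-style graph needs to map word-level N-gram arcs onto BPE units.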
-
We support word-level N-gram LMs as well; please have a look here
-
Well, I tried that: a word-level ARPA LM with a lexicon.txt of the type word -> BPE tokens, using the BPE model used for training.
-
Could you please share the decoding command you are using? BTW, using neural network LMs usually gives a larger performance improvement. If you want to have optimal performance, we recommend you use
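The neural-LM option mentioned here is typically applied as shallow fusion: at every decoding step, the LM's log-probability for the next token is added to the ASR score with a small weight, rather than rescoring a finished n-best list. A minimal sketch (toy AM and LM, not icefall's implementation):

```python
# Sketch of shallow fusion (hypothetical toy example, not icefall code):
# the external LM contributes lm_scale * log P_lm(token | prefix) at
# each step of a simple beam search.
import math

def beam_search_with_fusion(am_steps, lm_logprob, lm_scale=0.3, beam=2):
    """am_steps: list of per-step dicts token -> AM log-prob.
    lm_logprob(prefix, token): external LM log-prob, any callable."""
    hyps = [((), 0.0)]
    for step in am_steps:
        expanded = []
        for prefix, score in hyps:
            for tok, am_lp in step.items():
                fused = score + am_lp + lm_scale * lm_logprob(prefix, tok)
                expanded.append((prefix + (tok,), fused))
        hyps = sorted(expanded, key=lambda h: -h[1])[:beam]
    return hyps[0][0]

def toy_lm(prefix, tok):
    # A stand-in for an RNN/Transformer LM: strongly prefers "cat"
    # after "the", is indifferent otherwise.
    if prefix and prefix[-1] == "the" and tok == "cat":
        return math.log(0.9)
    return math.log(0.1)

am = [{"the": math.log(0.9), "a": math.log(0.1)},
      {"cat": math.log(0.5), "cap": math.log(0.5)}]
best = beam_search_with_fusion(am, toy_lm)  # → ("the", "cat")
```

The AM ties "cat" and "cap"; the fused LM term resolves the tie, which is exactly the effect shallow fusion is meant to provide.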
-
I think the problem was that the script generating LG was using a default path for the ARPA LM, whereas I thought that it would look for it in the lang_bpe_XXX directory...
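The pitfall described above can be sketched in a few lines (hypothetical flag names and default paths, not the actual icefall script): a script whose `--lm` default is fixed will not look inside `--lang-dir` unless the caller overrides it explicitly.

```python
# Hypothetical sketch of the default-path pitfall: the LM path is an
# independent argument, so the copy in the lang_bpe_XXX directory is
# silently ignored unless --lm is passed explicitly.
import argparse

def make_parser():
    p = argparse.ArgumentParser()
    p.add_argument("--lang-dir", default="data/lang_bpe_XXX")
    # Default is NOT derived from --lang-dir.
    p.add_argument("--lm", default="data/lm/G.arpa")
    return p

args = make_parser().parse_args(["--lang-dir", "data/lang_bpe_XXX"])
# args.lm is still "data/lm/G.arpa", not a file under args.lang_dir.
```

Passing the ARPA path explicitly (or deriving the default from the lang dir) avoids the silent mismatch.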