TLG for k2 #1025
-
Hi, I would like to build a TLG for k2. I found `ctc_token_fst.py` in WeNet, which uses OpenFST, shown below. Since this is not for k2, I think I can refer to `lexicon_to_fst`, but I don't know how to deal with the last line, which prints only a single zero. Any suggestions?

```python
import sys
print('0 1 ')
with open(sys.argv[1], 'r', encoding='utf8') as fin:
```
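For reference on the lone `0`: WeNet's script emits the FST in OpenFST's text format, where an arc line is `src dst ilabel olabel` and a line containing only a state number marks that state as final — so the final `print('0')` declares state 0 final. A minimal sketch of the same idea (the token handling and state layout are illustrative, not WeNet's exact script):

```python
def ctc_token_fst(tokens):
    """Emit a CTC token FST (T) as OpenFST text-format lines.

    Sketch only. Arc lines are 'src dst ilabel olabel'; a line holding
    a single state number marks that state as final.
    """
    lines = ['0 0 <blank> <eps>']  # stay in state 0 while eating blanks
    node = 1
    for tok in tokens:
        if tok in ('<eps>', '<blank>'):
            continue
        lines.append(f'0 {node} {tok} {tok}')       # first frame emits the token
        lines.append(f'{node} {node} {tok} <eps>')  # repeated frames are merged
        lines.append(f'{node} 0 <eps> <eps>')       # return for the next token
        node += 1
    lines.append('0')  # final-state line: just the state id
    return lines

print('\n'.join(ctc_token_fst(['<blank>', 'a', 'b'])))
```

When this text is compiled (e.g. with `fstcompile`), every path through the machine collapses a CTC frame sequence into its token sequence.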
Replies: 3 comments 13 replies
-
Please refer to the …
-
Is the model fine-tuned with CTC loss?

If not, then you cannot use TLG or T to decode its output.

If yes, then congratulations: you can use either T or TLG to decode its output.

The following is an example of decoding a Wav2Vec 2.0 model fine-tuned with CTC loss from torchaudio using a T graph: k2-fsa/k2#1096 (comment)

We also have a C++ runtime in sherpa to support it. Please see https://k2-fsa.github.io/sherpa/cpp/pretrained_models/offline_ctc/torchaudio.html

By the way, if you only want to use T for decoding, you don't need #0, #1, …

I suggest that you first make it work to decode with a T graph, and then you can build a TLG graph for decoding.
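For context on the #0, #1, … symbols that a plain T graph does not need: they are disambiguation symbols, appended to lexicon pronunciations that are repeated or are a prefix of another pronunciation so that L becomes determinizable before composing TLG. A minimal sketch of that idea (the function name and `(word, tokens)` lexicon format are illustrative, not icefall's or Kaldi's exact code):

```python
from collections import defaultdict

def add_disambig_symbols(lexicon):
    """Append #1, #2, ... to pronunciations that are duplicated or are a
    prefix of another pronunciation, so that L can be determinized.

    Sketch of the standard Kaldi-style idea, not the exact implementation.
    `lexicon` is a list of (word, token_list) pairs.
    """
    counts = defaultdict(int)
    prefixes = set()
    for _, tokens in lexicon:
        counts[tuple(tokens)] += 1
        for i in range(1, len(tokens)):
            prefixes.add(tuple(tokens[:i]))

    last_used = defaultdict(int)  # per-pronunciation disambig counter
    max_disambig = 0
    out = []
    for word, tokens in lexicon:
        key = tuple(tokens)
        if counts[key] > 1 or key in prefixes:
            last_used[key] += 1
            max_disambig = max(max_disambig, last_used[key])
            tokens = tokens + [f'#{last_used[key]}']
        out.append((word, tokens))
    return out, max_disambig
```

For example, two words sharing the pronunciation `a b` would come out as `a b #1` and `a b #2`, while a unique, non-prefix pronunciation is left untouched.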
-
Wow, this is PERFECT! Thank you so much! As you suggested, I tried to replace the log-softmax probability in your example with mine, using a T graph, but I encountered the following error. I used a fine-tuned wav2vec2 model from HuggingFace, which may differ from the one from Facebook. Could you please have a look at it? My log-softmax probability is here just in case.

```
[F] /usr/share/miniconda/envs/k2/conda-bld/k2_1669428702383/work/k2/csrc/intersect_dense_pruned.cu:155:void k2::MultiGraphDenseIntersectPruned::Intersect(std::shared_ptr<k2::DenseFsaVec>&)
Check failed: c_->IsCompatible(*b_fsas->Context())

[ Stack-Trace: ]
Traceback (most recent call last):
```