Replies: 3 comments
>>> reuben |
>>> monakons |
>>> reuben |
>>> monakons
[May 20, 2018, 8:32pm]
I know that this is also a TensorFlow question, but TensorFlow has poor documentation on this.
This is what I understand from some external documentation.
In general, this layer is capable of:
(1) implementing a softmax layer to convert the output into a probability distribution over symbols;
(2) eliminating the repeated characters and the blank symbols that come from the acoustic model;
(3) implementing beam search over a prefix (character) tree and extracting the most probable sequence;
(4) optionally feeding this output into the language model, which is responsible for 'correcting' the output at the word level based on known word sequences.
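To make (1)-(3) concrete, here is a minimal sketch of the bare tf.nn.ctc_beam_search_decoder call, assuming TensorFlow 1.x; the shapes and the random logits are purely illustrative and this is not the DeepSpeech pipeline itself:

```python
# Minimal sketch (TensorFlow 1.x): decode raw acoustic-model logits with the
# stock CTC beam search op. Shapes and the random input are illustrative only.
import numpy as np
import tensorflow as tf

batch_size, max_time, num_classes = 1, 50, 29   # e.g. 28 labels + 1 blank

# Stand-in for the acoustic model's raw (pre-softmax) output, batch-major.
logits = tf.placeholder(tf.float32, [batch_size, max_time, num_classes])
seq_len = tf.placeholder(tf.int32, [batch_size])

# The op expects time-major input: [max_time, batch_size, num_classes].
logits_tm = tf.transpose(logits, [1, 0, 2])

# No language model is involved here; the op only searches over the logits.
decoded, log_probs = tf.nn.ctc_beam_search_decoder(
    logits_tm, seq_len, beam_width=100, top_paths=1, merge_repeated=True)

with tf.Session() as sess:
    fake = np.random.randn(batch_size, max_time, num_classes).astype(np.float32)
    best, scores = sess.run([decoded[0], log_probs],
                            feed_dict={logits: fake, seq_len: [max_time]})
    # best is a SparseTensorValue; best.values holds the label indices of the
    # most probable path, with blanks removed and repeats collapsed.
    print(best.values, scores)
```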
If the above are correct, my questions are:
(1) Where is the prefix tree that tf.nn.ctc_beam_search_decoder includes? Or does it not include one?
(2) Is the language model indeed used only at the word level? Is it possible to use it at the character level?
(3) I removed the tf.nn.ctc_beam_search_decoder from the protobuf file and implemented it later in the pipeline; however, the results using tf.nn.ctc_beam_search_decoder inside and outside the protobuf are different. Why is that happening? (See the sketch after this list.)
(4) Is the language model necessary while training? Isn't the purpose to run the LM on top of the acoustic model as a standalone module?
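Regarding (3), this is roughly the kind of setup meant by "outside the protobuf"; it is only a sketch, assuming TensorFlow 1.x, and the file name and the tensor names "logits:0" / "input_lengths:0" are illustrative assumptions rather than the actual names in the exported graph:

```python
# Sketch: load a frozen graph and attach the CTC decoder outside it
# (TensorFlow 1.x). The file and tensor names below are assumptions.
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

    # Time-major logits and sequence lengths produced inside the frozen graph.
    logits = graph.get_tensor_by_name("logits:0")
    seq_len = graph.get_tensor_by_name("input_lengths:0")

    # Decoder re-created outside the protobuf. If the in-graph decoder was
    # built with different arguments (beam_width, top_paths, merge_repeated),
    # the two decodings can legitimately differ.
    decoded, log_probs = tf.nn.ctc_beam_search_decoder(
        logits, seq_len, beam_width=100, top_paths=1, merge_repeated=True)
```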
Thanks in advance!
[This is an archived TTS discussion thread from discourse.mozilla.org/t/how-exactly-the-decoder-and-especially-tf-nn-ctc-beam-search-decoder-works]