Output #50

Open
chenting0324 opened this issue Jan 22, 2018 · 3 comments

Comments

@chenting0324

Hello, I'd like to know what the model outputs. Is its output phonemes, characters, or words?
Also, is it an end-to-end model, or an acoustic model that is trained end-to-end?

@AMairesse
Collaborator

Hi,
The model outputs characters. It's an end-to-end model that currently consists of only an acoustic model, so the model is end-to-end and the acoustic model is also trained end-to-end.

@chenting0324
Author

Thanks! I'd also like to know how I can get the vector for the output characters. In which function can I find the character output: prediction, decode, or logits? What I actually want is the vector for each character.

@AMairesse
Collaborator

You should look at the _build_base_rnn method:

  • logits: the probability of each character for each timestep of the input, for each item in the batch
  • decoded: the different paths found by the CTC beam search decoder, with _log_prob being the cumulative log-probability of each path
  • prediction: this is decoded[0], so it's the path with the highest probability

So if you are looking for a list of probabilities for each character of the output, use logits.
For example, logits[0] contains a vector of probabilities, one per label, for the first chunk of audio; logits[1] contains another vector of probabilities, one per label, for the second chunk of audio; and so on.
This can be challenging to use because there are a lot of vectors. One audio file might have, for example, 2500 chunks of audio for only 200 characters, so you will see a lot of repeated characters in logits. The CTC algorithm takes care of this when it builds the best paths, i.e. the ones with the highest cumulative probability.
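
To make the "many chunks, few characters" point concrete, here is a minimal pure-Python sketch of reading one per-timestep vector and collapsing repeats. It is an illustration under assumptions, not the project's code: the label set, the position of the blank label, and the toy values are all made up, the batch dimension is left out, and it uses simple greedy (best label per timestep) decoding rather than the beam search decoder mentioned above. The softmax helper is only needed if the values you read from logits turn out to be unnormalized scores.

```python
import math

def softmax(scores):
    """Turn one timestep's raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_ctc_decode(logits, labels, blank_index):
    """Pick the best label per timestep, merge repeats, then drop blanks."""
    best_path = [max(range(len(step)), key=step.__getitem__) for step in logits]
    decoded = []
    previous = None
    for index in best_path:
        # Emit a character only when the label changes and is not the blank.
        if index != previous and index != blank_index:
            decoded.append(labels[index])
        previous = index
    return "".join(decoded)

# Hypothetical label set; assumption: the blank is the last label.
LABELS = ["a", "b", "c", "<blank>"]
BLANK = 3

# Toy data: 8 timesteps x 4 labels, values invented for illustration.
toy_logits = [
    [4.0, 0.1, 0.2, 0.5],  # a
    [3.5, 0.3, 0.1, 0.9],  # a again (repeat, merged)
    [0.2, 0.1, 0.3, 2.8],  # blank
    [0.1, 3.9, 0.4, 0.6],  # b
    [0.3, 3.1, 0.2, 0.7],  # b again (repeat, merged)
    [0.2, 0.4, 0.1, 3.0],  # blank
    [0.5, 2.9, 0.3, 0.2],  # b (a new b, because a blank came in between)
    [0.1, 0.2, 3.3, 0.4],  # c
]

first_step_probs = softmax(toy_logits[0])  # per-label vector for chunk 0
text = greedy_ctc_decode(toy_logits, LABELS, BLANK)
print(text)  # abbc
```

Here 8 timesteps collapse to 4 characters, which is the same effect as 2500 chunks producing about 200 characters. The beam search decoder keeps several candidate paths instead of just the single best label per timestep, so its result can differ from this greedy version.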
