In short, no. The model was trained to predict only the start and end timestamps of a segment, and it tends to predict the start of the next segment as the end of the current one. By the same logic, the timestamp predicted at a word token can be treated as the end of that word, which in turn is the start of the next word. That is what this script does: for each predicted word token, it takes the most probable predicted timestamp token and uses it as that word's timestamp. In its current state, though, the model simply cannot predict separate, meaningful start and end timestamps for each word. That is why the script gives a single timestamp per word and interprets it as the end of the current word / the start of the following word, if there is one.
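To make the mechanics concrete, here is a minimal sketch of that idea. It assumes you already have, for each decoded word token, the model's logits restricted to Whisper's timestamp tokens; the `timestamp_logits` array and the `word_boundaries` helper are hypothetical, and how you obtain those logits depends on your decoding loop. The 0.02 s spacing is Whisper's actual timestamp-token resolution.

```python
import numpy as np

TIME_PER_TOKEN = 0.02  # Whisper's timestamp tokens are spaced 0.02 s apart

def word_boundaries(words, timestamp_logits, segment_offset=0.0):
    """For each word token, pick the most probable timestamp token and
    interpret it as the end of that word / the start of the next one."""
    boundaries = []
    for word, logits in zip(words, timestamp_logits):
        t = segment_offset + int(np.argmax(logits)) * TIME_PER_TOKEN
        boundaries.append((word, t))
    return boundaries

# Hypothetical usage with fake logits: 1501 timestamp tokens cover 0.00-30.00 s.
words = ["hello", "there", "world"]
timestamp_logits = [np.random.randn(1501) for _ in words]
for word, t in word_boundaries(words, timestamp_logits):
    print(f"{word!r} ends (next word starts) at ~{t:.2f}s")
```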
-
Hello! Is that feasible?