predict the accurate end timestamp of a word #43
Replies: 3 comments
-
Facing the same issue; the predicted timestamps are not reliable.
-
I have the same issue with longer silences. I tried using VAD to slice the audio file, transcribing each slice separately and then using the VAD timestamps, but the transcription quality got very bad, even when using a prompt. I'm mainly facing two problems: the first is the issue described here with longer silences, and the second is that sometimes all the words spanning 10-15 seconds get the same timestamp.
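With the VAD-slicing approach, each slice's word timestamps start from zero, so they have to be shifted by the slice's start time before they line up with the original audio. Below is a minimal sketch, assuming you already have the VAD speech regions as `(start, end)` pairs in seconds and a list of timed words per slice. The `Word` dataclass and the `to_global` helper are hypothetical names for illustration, not part of stable-ts or any VAD library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def to_global(
    regions: List[Tuple[float, float]],
    slice_words: List[List[Word]],
) -> List[Word]:
    """Map per-slice word timestamps back onto the original audio's timeline.

    regions[i] is the (start, end) of the i-th VAD speech region in the full
    audio; slice_words[i] holds the words transcribed from that slice, timed
    from 0 at the start of the slice.
    """
    result: List[Word] = []
    for (reg_start, reg_end), words in zip(regions, slice_words):
        for w in words:
            # Shift by the region's offset, then clamp so a word cannot
            # extend past the end of the speech region it came from.
            start = min(reg_start + w.start, reg_end)
            end = min(reg_start + w.end, reg_end)
            result.append(Word(w.text, start, end))
    return result


regions = [(0.0, 4.0), (10.0, 12.5)]
slice_words = [
    [Word("hello", 0.1, 0.6), Word("world", 0.7, 1.2)],
    [Word("again", 0.2, 3.0)],
]
for w in to_global(regions, slice_words):
    print(f"{w.text}: {w.start:.2f}-{w.end:.2f}")
```

The clamp is a design choice: since VAD has already marked the end of the region as the end of speech, a word whose predicted end runs past it is cut back to the region boundary instead of drifting into the following silence.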
-
Have you tried the default settings in stable-ts? https://github.com/jianfch/stable-ts/blob/main/stable_whisper/whisper_word_level.py#L124-L137

```python
from stable_whisper import load_model

model = load_model('base')
results = model.transcribe('audio.mp3', remove_background=True, silence_threshold=1.0)
```

If the gaps are not completely silent, you can also try to increase the …
-
If there is a fairly long silent gap between two words, how can stable-ts get the accurate end timestamp of the first word?
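One general way to tighten an end timestamp that runs into a silent gap is to detect low-energy frames and, if the predicted end falls inside a silent run, pull it back to where the silence begins. The sketch below illustrates that idea in plain Python. It is a simplified assumption for illustration, not stable-ts's actual implementation; `silent_frames`, `snap_end`, the frame length and the RMS threshold are all hypothetical choices.

```python
import math
from typing import List, Sequence


def silent_frames(samples: Sequence[float], frame_len: int, threshold: float) -> List[bool]:
    """Split audio into frames of `frame_len` samples; True marks a frame whose
    RMS energy is below `threshold`."""
    flags = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        flags.append(rms < threshold)
    return flags


def snap_end(start: float, end: float, flags: List[bool], frame_len: int, sr: int) -> float:
    """If `end` (seconds) falls inside a silent run, move it back to the first
    silent frame of that run, never earlier than the word's `start`."""
    frame_s = frame_len / sr
    idx = int(end / frame_s)
    if idx >= len(flags) or not flags[idx]:
        return end  # end is already in a non-silent frame (or past the audio)
    while idx > 0 and flags[idx - 1]:
        idx -= 1  # walk back to where the silence begins
    return max(start, idx * frame_s)


sr = 100                                               # toy sample rate
audio = [0.5] * 100 + [0.0] * 100 + [0.5] * 100        # 1 s speech, 1 s silence, 1 s speech
flags = silent_frames(audio, frame_len=10, threshold=0.01)
print(snap_end(0.2, 1.7, flags, frame_len=10, sr=sr))  # prints 1.0
```

Here a word predicted to end at 1.7 s, halfway into the silent gap, is snapped back to 1.0 s, where the silence starts. On real recordings the threshold would need tuning to the noise floor, since background noise can keep a gap from registering as silent.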