predict the accurate end timestamp of a word #43

climber418 · 2022-11-25T07:37:13Z

If there is a fairly long silent gap between two words, how can stable-ts get an accurate end timestamp for a word?

sockthem · 2022-11-25T11:48:40Z

Facing the same issue; the predicted timestamps are not reliable.

tohe91 · 2022-11-25T13:19:17Z

I have the same issue with longer silences. I tried using VAD to slice the audio file, then transcribing the slices and using the VAD timestamps, but the transcription quality got very bad, even when using a prompt. I'm facing mainly two problems: first, the described issue with longer silences, and second, that sometimes all the words within a span of 10-15 seconds get the same timestamp.
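For readers unfamiliar with the slicing approach described above, here is a minimal sketch of the idea. It is not the code used in this thread and it does not call stable-ts or any real VAD library: `find_speech_segments` and `to_global_time` are hypothetical helper names, and the frame-energy threshold is an assumed stand-in for a proper voice activity detector.

```python
from typing import List, Sequence, Tuple


def find_speech_segments(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,
    energy_threshold: float = 0.01,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of runs of frames whose
    mean absolute amplitude exceeds energy_threshold (assumed heuristic)."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[float, float]] = []
    seg_start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(abs(s) for s in frame) / len(frame)
        t = i / sample_rate
        if energy > energy_threshold:
            if seg_start is None:
                seg_start = t  # a speech run begins at this frame
        elif seg_start is not None:
            segments.append((seg_start, t))  # the run ended at this quiet frame
            seg_start = None
    if seg_start is not None:
        # Audio ended while still inside a speech run.
        segments.append((seg_start, len(samples) / sample_rate))
    return segments


def to_global_time(segment_start: float, local_time: float) -> float:
    """Map a timestamp measured inside a slice back onto the full file."""
    return segment_start + local_time
```

Each slice would then be transcribed separately and its word timestamps shifted with `to_global_time`. As noted above, the model sees less context per slice, which may be why transcription quality drops.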

jianfch (Maintainer) · 2022-11-25T17:10:07Z

Have you tried the default settings in stable-ts?
If you're still getting this issue, you can change the settings to make the gap suppression more aggressive.

https://github.com/jianfch/stable-ts/blob/main/stable_whisper/whisper_word_level.py#L124-L137

from stable_whisper import load_model
model = load_model('base')
# Settings suggested here for more aggressive handling of silent gaps
results = model.transcribe('audio.mp3', remove_background=True, silence_threshold=1.0)

If the gaps are not completely silent, you can also try increasing lower_threshold.
You can also try different model sizes.
Also try to avoid beam search (beam_size), because it makes the gap suppression less effective.
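
To illustrate the general idea of keeping word timestamps out of silent gaps, here is a small post-processing sketch. It is not how stable-ts implements gap suppression: `clamp_word_ends`, the word dictionaries, and the `min_silence` parameter are all assumptions made for this example, and the silence intervals are assumed to come from some separate detector.

```python
from typing import Dict, List, Tuple


def clamp_word_ends(
    words: List[Dict],
    silences: List[Tuple[float, float]],
    min_silence: float = 0.3,
) -> List[Dict]:
    """Pull a word's end back to the start of the first sufficiently long
    silence that begins inside the word's predicted span.

    words:    dicts with 'word', 'start', 'end' (seconds); hypothetical layout
    silences: (start, end) intervals in seconds, sorted by start
    """
    adjusted = []
    for w in words:
        end = w["end"]
        for s_start, s_end in silences:
            if s_end - s_start < min_silence:
                continue  # ignore very short pauses
            if w["start"] < s_start < end:
                end = s_start  # the word cannot extend into the silence
                break
        adjusted.append({**w, "end": end})
    return adjusted


words = [
    {"word": "hello", "start": 0.0, "end": 2.5},  # end drifts across the gap
    {"word": "world", "start": 2.5, "end": 3.0},
]
fixed = clamp_word_ends(words, silences=[(0.6, 2.4)])
```

With these example values, the end of "hello" moves from 2.5 to 0.6, while "world" is left unchanged because no silence starts inside its span.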
