Replies: 2 comments
-
You can force-align the transcript with the audio, then compare the alignment results with the ground-truth timestamps to find the timing differences. Then edit the audio to offset those differences. You can edit the audio directly in code by manipulating the waveform as a NumPy array, or use a higher-level library such as pydub.
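A minimal sketch of the waveform-editing step mentioned above, assuming the audio is already loaded as a NumPy array at a known sample rate (the sample rate, function name, and offsets here are illustrative, not from any specific library):

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate of the loaded waveform


def shift_from(waveform: np.ndarray, start_s: float, offset_s: float) -> np.ndarray:
    """Delay everything from start_s onward by offset_s seconds by
    inserting silence, or pull it earlier by removing samples when
    offset_s is negative."""
    start = int(start_s * SAMPLE_RATE)
    n = int(abs(offset_s) * SAMPLE_RATE)
    if offset_s >= 0:
        # insert n samples of silence at the segment boundary
        silence = np.zeros(n, dtype=waveform.dtype)
        return np.concatenate([waveform[:start], silence, waveform[start:]])
    # drop n samples after the boundary to shift later audio earlier
    return np.concatenate([waveform[:start], waveform[start + n:]])


# 2 s of dummy audio; delay everything after the 1 s mark by 0.25 s
audio = np.zeros(2 * SAMPLE_RATE, dtype=np.float32)
shifted = shift_from(audio, start_s=1.0, offset_s=0.25)
print(len(shifted) - len(audio))  # 4000 extra samples = 0.25 s
```

In practice you would apply one such shift per mismatched segment, working back-to-front so earlier edits don't invalidate later sample indices.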
0 replies
-
Hi, this looks like an amazing piece of work.
I wanted to ask for some advice on aligning the audio itself. Meaning: I already have a transcription (which is the ground truth) with segment- and word-level timestamps, but the original audio is a bit off and needs to be sped up or slowed down in places to match the transcript. The audio may also have a few words that differ or are missing.
Do you know what the term for this is, and are there any libraries or projects you can recommend? Thank you for any help!