Replies: 2 comments
-
You can force-align the transcript with the audio, then compare the alignment results with the ground-truth timestamps to find the timing differences. Then edit the audio to offset those differences. You can edit the audio directly in code by manipulating the waveform as a NumPy array, or use a higher-level library such as pydub.
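A minimal sketch of the waveform-editing step mentioned above, assuming the audio is already loaded as a NumPy array at a known sample rate (the sample rate, function name, and offsets here are illustrative, not from any specific library):

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate of the loaded waveform


def shift_from(waveform: np.ndarray, start_s: float, offset_s: float) -> np.ndarray:
    """Delay everything from start_s onward by offset_s seconds by
    inserting silence, or pull it earlier by removing samples when
    offset_s is negative."""
    start = int(start_s * SAMPLE_RATE)
    n = int(abs(offset_s) * SAMPLE_RATE)
    if offset_s >= 0:
        # insert n samples of silence at the segment boundary
        silence = np.zeros(n, dtype=waveform.dtype)
        return np.concatenate([waveform[:start], silence, waveform[start:]])
    # drop n samples after the boundary to shift later audio earlier
    return np.concatenate([waveform[:start], waveform[start + n:]])


# 2 s of dummy audio; delay everything after the 1 s mark by 0.25 s
audio = np.zeros(2 * SAMPLE_RATE, dtype=np.float32)
shifted = shift_from(audio, start_s=1.0, offset_s=0.25)
print(len(shifted) - len(audio))  # 4000 extra samples = 0.25 s
```

In practice you would apply one such shift per mismatched segment, working back-to-front so earlier edits don't invalidate later sample indices.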
0 replies
-
Hi, this looks like an amazing piece of work.
I wanted to ask for some advice on aligning the audio itself. Meaning: I already have a transcription (which is the ground truth) with segment- and word-level timestamps, but the original audio is a bit off and needs to be sped up or slowed down in places to match the transcript. The audio may also have a few words that differ or are missing.
Do you know what the term for this is, and are there any libraries or projects you can recommend? Thank you for any help!