WIP: Adding Transcription/Subtitle Viewing Support to the Web Player (VTT) #2918
base: master
@@ -116,6 +116,7 @@
 this.router.post('/items/:id/chapters', LibraryItemController.middleware.bind(this), LibraryItemController.updateMediaChapters.bind(this))
 this.router.get('/items/:id/ffprobe/:fileid', LibraryItemController.middleware.bind(this), LibraryItemController.getFFprobeData.bind(this))
 this.router.get('/items/:id/file/:fileid', LibraryItemController.middleware.bind(this), LibraryItemController.getLibraryFile.bind(this))
+this.router.get('/items/:id/file/:fileid/transcript', LibraryItemController.middleware.bind(this), LibraryItemController.getTranscriptionFile.bind(this))
Check failure: Code scanning / CodeQL: Missing rate limiting (High), flagged on a file system access.
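For context on the alert above, here is a minimal sketch of what rate limiting an Express route can look like. It is an illustration only, not the fix adopted in this PR: `createRateLimiter`, its options, and the fixed-window strategy are all hypothetical, and a maintained package such as express-rate-limit would normally be preferable to hand-rolled middleware.

```javascript
// Hypothetical fixed-window rate limiter (illustrative names and defaults).
// Counts requests per client IP and rejects with 429 once `max` is exceeded
// within the current window. Entries are never evicted, so this is a sketch,
// not production code.
function createRateLimiter({ windowMs = 60000, max = 100, now = Date.now } = {}) {
  const hits = new Map() // ip -> { count, windowStart }

  return function rateLimit(req, res, next) {
    const t = now()
    let entry = hits.get(req.ip)

    // Start a new window on first contact or when the old window has expired.
    if (!entry || t - entry.windowStart >= windowMs) {
      entry = { count: 0, windowStart: t }
      hits.set(req.ip, entry)
    }

    entry.count += 1
    if (entry.count > max) {
      return res.status(429).send('Too many requests')
    }
    next()
  }
}
```

Such a middleware would sit in the route's handler chain ahead of the controller method, so that the file system is only touched for requests that pass the limit.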
The placement irks me for some reason. I think that this feature demands something like a "Now Playing" screen.
Great job on the project! For UX improvement, please consider looking into word highlighting in Snipd, as shown in this video: https://www.youtube.com/watch?v=jBi-OId37Uw
Snipd uses word-level timestamps. While such subs are easy to generate, the only sane format is SSA/ASS (SRT can blow up into megabytes, which is insane) afaik, and that format is not natively supported by browsers.
Look at WebVTT, which supports something similar to the "karaoke style" using the :past and :future pseudo-classes. However, VTT files need to be adapted for this as well, and I think it's not common to get a VTT file with this information. As for SSA/ASS and SRT support, I was checking what the best approach is. I was considering parsing them to VTT to keep the implementation consistent with how we show the transcriptions, but I'm not sure yet if this is the best way.
@mfcar I've used https://github.com/jianfch/stable-ts to generate ASS/SSA karaoke-style captions with a custom style for my podcasts/books. I don't remember if VTT is one of the options.
In the past, I have used stable-ts to create VTT files. I generated word-level timestamps with Whisper’s base.en model.
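To make the "karaoke style" idea from the comments above concrete: WebVTT allows inline timestamps inside a cue's text, and the :past and :future pseudo-classes then match the text before and after the current playback position. The sketch below builds one such cue from word-level timings. The input shape (`{ text, start, end }` in seconds) and the function names are assumptions for illustration, not the output format of stable-ts or Whisper.

```javascript
// Format seconds as a WebVTT timestamp, hh:mm:ss.ttt.
function formatVttTime(seconds) {
  const ms = Math.round(seconds * 1000)
  const pad = (n, width = 2) => String(n).padStart(width, '0')
  const h = Math.floor(ms / 3600000)
  const m = Math.floor((ms % 3600000) / 60000)
  const s = Math.floor((ms % 60000) / 1000)
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms % 1000, 3)}`
}

// Escape characters that have special meaning in WebVTT cue text.
function escapeCueText(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
}

// Build a single cue whose words are separated by inline timestamps.
// `words` is a hypothetical, non-empty array of { text, start, end },
// sorted by start time.
function buildKaraokeCue(words) {
  const start = formatVttTime(words[0].start)
  const end = formatVttTime(words[words.length - 1].end)
  const text = words
    .map((w, i) => (i === 0 ? '' : `<${formatVttTime(w.start)}>`) + escapeCueText(w.text))
    .join(' ')
  return `${start} --> ${end}\n${text}`
}
```

A stylesheet rule on `::cue(:past)` or `::cue(:future)` can then colour the already-spoken and upcoming words differently, provided the player renders the cues.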
@mfcar Has any progress been made on this?
I have begun work on adding transcription support to the Web Player.
I've used Whisper to generate transcriptions for some audiobooks and podcasts. Many tools based on Whisper support exports in VTT and SRT formats.
For this pull request, I'm only supporting VTT as it is natively supported by browsers. Support for SRT can be added in a future pull request.
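As a rough idea of what later SRT support could involve, the sketch below converts SRT text to WebVTT. It is not part of this PR and `srtToVtt` is a hypothetical helper. It relies on two differences between the formats: a VTT file starts with a `WEBVTT` header, and its timestamps use a dot instead of a comma before the milliseconds. SRT sequence numbers are left in place, since WebVTT accepts them as cue identifiers. SRT-specific styling tags are not handled.

```javascript
// Minimal SRT -> WebVTT conversion (illustrative sketch, not this PR's code).
function srtToVtt(srt) {
  const body = srt
    .replace(/^\uFEFF/, '') // drop a leading byte order mark, if any
    .replace(/\r\n?/g, '\n') // normalise line endings
    // Rewrite only timing lines: 00:00:01,000 --> 00:00:02,500
    .replace(
      /^(\d{2,}:\d{2}:\d{2}),(\d{3}) --> (\d{2,}:\d{2}:\d{2}),(\d{3})/gm,
      '$1.$2 --> $3.$4'
    )
    .trim()
  return `WEBVTT\n\n${body}\n`
}
```

Converting on the server or in the browser before handing the result to a `<track>` would keep a single rendering path for both formats.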
How does it work?
A new endpoint, api/items/:id/file/:fileid/transcript, has been created on the backend. This endpoint attempts to return a transcription for each audio track. For instance, if there's an audio file named adventuresherlockholmes_01_doyle_64kb.mp3, this endpoint will attempt to return the file adventuresherlockholmes_01_doyle_64kb.vtt.
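The file name mapping in the example above can be sketched with Node's `path` module. This is an assumption about how the lookup might be done, not the PR's actual implementation, and `transcriptPathFor` is a hypothetical helper name.

```javascript
const path = require('path')

// Given the path of an audio file, return the path of the .vtt file
// that sits next to it and shares its base name.
function transcriptPathFor(audioPath) {
  const { dir, name } = path.parse(audioPath)
  return path.join(dir, `${name}.vtt`)
}
```

With this mapping, a handler would still need to check that the derived file exists and respond with a 404 when no transcript is available for the track.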
On the frontend, when an audio file is set as the src of the <audio> HTML tag, a <track> is created and linked to that <audio>. The src attribute of the <track> HTML tag is populated with the link to the aforementioned endpoint.
What does this PR support?
Demo
Screen.Recording.2024-05-04.at.20.12.35.mov
What is missing for the scope of this PR
Known issues
MediaPlayerContainer.vue component not reloading the TranscriptionUi component.
Screen.Recording.2024-05-04.at.14.14.37.mov
Related