Right now when Sushi searches for an audio substream in the destination audio, it only considers the best match, even though OpenCV calculates the diff value for every possible candidate. There is no reason why we can't use this info for more accurate postprocessing.
The idea is to remember several of the best candidates so that during postprocessing we can check whether replacing the selected shift with one of the other candidates would make the value more similar to its surroundings.
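Roughly, that check could look like this (a minimal sketch; `refine_shift`, `candidates` and `neighbor_shifts` are illustrative names, not Sushi's actual API):

```python
import numpy as np

def refine_shift(selected, candidates, neighbor_shifts):
    """Replace the selected shift with a remembered candidate if that
    candidate agrees better with the shifts of the surrounding events.
    Hypothetical helper: names and signature are not Sushi's real API."""
    target = float(np.median(neighbor_shifts))
    best = min(candidates, key=lambda c: abs(c - target))
    # Keep the original selection unless an alternative candidate is
    # strictly closer to what the neighboring events agree on.
    return best if abs(best - target) < abs(selected - target) else selected
```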
We split the entire diff array into 50 ranges, find the best match in each of them and then select the 10 best matches from those 50. The best match in the entire array is stored in `candidates[0]`.
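A minimal sketch of that selection, assuming `diffs` is the 1-D array of per-position match errors produced by the OpenCV template match:

```python
import numpy as np

def select_candidates(diffs, n_ranges=50, n_candidates=10):
    """Split the diff array into n_ranges chunks, take the lowest-diff
    position in each chunk, and keep the n_candidates best of those.
    Illustrative reconstruction, not the actual Sushi code."""
    chunks = [c for c in np.array_split(np.arange(len(diffs)), n_ranges) if len(c)]
    # Best (lowest-diff) position inside each range.
    winners = [chunk[np.argmin(diffs[chunk])] for chunk in chunks]
    # Rank the per-range winners by diff value; the global minimum is
    # necessarily the winner of its own range, so it lands in slot 0.
    winners.sort(key=lambda idx: diffs[idx])
    return winners[:n_candidates]
```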
While this does find the correct shift for some of the tests, it still fails on many problematic cases with a lot of silence in the audio stream. Better ways of improving search accuracy might be preferable.
Regarding the problem with silent segments: I think in most cases they are typesetting lines. If I'm not wrong, typesetting lines are grouped as one search group.
To handle these silent segments, how about shifting the search group and snapping it to the nearest end keyframe, since most typesetting lines end at keyframes? Or, to be safer, calculate the interval in frames between the end of the search group and the nearest end keyframe, and use that for shifting.
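Something like this, perhaps (purely hypothetical names; the distance threshold is an assumption):

```python
def keyframe_snap_offset(group_end_frame, keyframes, max_distance=10):
    """Return the frame offset that would snap the end of a search group
    to the nearest keyframe, or 0 if no keyframe is close enough.
    Hypothetical helper, not part of Sushi."""
    nearest = min(keyframes, key=lambda kf: abs(kf - group_end_frame))
    offset = nearest - group_end_frame
    # Only trust the snap when a keyframe sits near the group's end.
    return offset if abs(offset) <= max_distance else 0
```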
Basing the initial search on keyframes is not feasible, for two main reasons:
- We might not have keyframes at all
- Scene length might change (e.g. different IVTC, redrawing)
Plus, it would require significant changes to our search algorithm, and it's not clear how to pick the appropriate scene (say, if you have two candidate scenes, each 50 frames long, near each other).
I think some audio preprocessing or merging lines into even larger search groups might yield much better results for typesetting, although right now I don't know of any specific way to do so (other than maybe merging all overlapping/adjacent lines).
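For reference, merging overlapping/adjacent lines could be as simple as this (a sketch over `(start, end)` event times in seconds; the gap tolerance is an assumption):

```python
def merge_adjacent(events, max_gap=0.5):
    """Merge overlapping or nearly-adjacent (start, end) time pairs into
    larger groups. Illustrative only; Sushi's real grouping differs."""
    if not events:
        return []
    events = sorted(events)
    groups = [list(events[0])]
    for start, end in events[1:]:
        if start - groups[-1][1] <= max_gap:
            # Overlaps or nearly touches the current group: extend it.
            groups[-1][1] = max(groups[-1][1], end)
        else:
            groups.append([start, end])
    return [tuple(g) for g in groups]
```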
Since silent segments mostly indicate slow-motion scenes, I think the nearest keyframe could be used as a last resort when the audio search fails. But I think this doesn't apply in some rare cases.
Linking search groups together is not a bad idea. Every search group could hold information about the search groups before and after it, so if its audio search fails, it could refer to the previous search group to apply its shift.
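In code the fallback might look roughly like this (illustrative; Sushi's search groups are richer objects than a flat list of shifts):

```python
def apply_fallback_shifts(shifts, failed):
    """For every group whose audio search failed, inherit the shift of
    the previous group (or the next one, for a failure at the start).
    Hypothetical sketch, not Sushi's actual data model."""
    result = list(shifts)
    for i, bad in enumerate(failed):
        if not bad:
            continue
        if i > 0:
            # Prefer the previous group; its shift may itself have been
            # filled in on an earlier iteration.
            result[i] = result[i - 1]
        elif len(result) > 1:
            result[i] = result[i + 1]
    return result
```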