Consider multiple candidates when searching for audio substream #6

tp7 opened this issue Nov 2, 2014 · 3 comments

tp7 (Owner) commented Nov 2, 2014

Right now, when Sushi searches for the audio substream in the destination audio, it only considers the single best match, even though OpenCV calculates the diff value for every possible candidate. There is no reason why we can't use this information for more accurate postprocessing.

The idea is to remember several of the best candidates so that during postprocessing we can check whether replacing the selected shift with one of the other candidates would make the value more similar to its surroundings.

A working implementation is below:

import numpy as np

# result[0] is the array of diff values for every possible position,
# as produced by the OpenCV matching call
splits = np.array_split(result[0], 50)
len_so_far = 0
candidates = []
for split in splits:
    min_index = np.argmin(split)
    candidates.append((min_index + len_so_far, split[min_index]))
    len_so_far += len(split)
# keep the 10 candidates with the smallest diff values
candidates.sort(key=lambda x: x[1])
candidates = candidates[:10]

We split the entire diff array into 50 ranges, find the best match in each of them, and then select the 10 best matches from those 50. The best match in the entire array ends up in candidates[0].
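
To illustrate the postprocessing step, here's a minimal sketch (not Sushi's actual code; the shift list and the (shift, diff) candidate structure are assumptions for the example) that swaps an event's shift for the candidate closest to the median of its neighbors:

import numpy as np

def refine_shifts(shifts, candidate_lists, window=3, tolerance=0.1):
    # shifts[i] is the currently selected shift for event i;
    # candidate_lists[i] is its list of (shift, diff) candidates
    refined = list(shifts)
    for i, candidates in enumerate(candidate_lists):
        neighbors = shifts[max(0, i - window):i] + shifts[i + 1:i + 1 + window]
        if not neighbors:
            continue
        target = float(np.median(neighbors))
        # prefer the candidate closest to the neighborhood median,
        # but only if it's meaningfully closer than the current selection
        best = min(candidates, key=lambda c: abs(c[0] - target))
        if abs(best[0] - target) + tolerance < abs(refined[i] - target):
            refined[i] = best[0]
    return refined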

While this does find the correct shift for some of the tests, it still fails on many problematic cases with a lot of silence in the audio stream. Better ways of improving search accuracy might be preferable.

shinchiro (Contributor) commented

For the problem with silent segments, I think in most cases they are typesetting lines. If I'm not wrong, typesetting lines are grouped as one search group.

To handle these silent segments, how about shifting the search group and snapping it to the nearest end keyframe, since most typesetting lines end at keyframes? Or, to be safer, calculate the interval in frames between the end of the search group and the nearest end keyframe, and use that for shifting.
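
A minimal sketch of that snapping, assuming a sorted list of keyframe timestamps in seconds and an arbitrary distance threshold (both are illustrative, not how Sushi actually does keyframe correction):

import bisect

def snap_to_nearest_keyframe(end_time, keyframe_times, max_distance=0.5):
    # keyframe_times must be sorted; returns the nearest keyframe timestamp,
    # or end_time unchanged if none lies within max_distance seconds
    idx = bisect.bisect_left(keyframe_times, end_time)
    neighbors = keyframe_times[max(0, idx - 1):idx + 1]
    if not neighbors:
        return end_time
    nearest = min(neighbors, key=lambda t: abs(t - end_time))
    return nearest if abs(nearest - end_time) <= max_distance else end_time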

tp7 (Owner, Author) commented Apr 6, 2015

Sushi is already doing something like that in the keyframe correction section.

Basing the initial search on keyframes is not feasible for two main reasons:

  1. We might not have keyframes at all
  2. Scene length might change (e.g. different IVTC, redrawing)

Plus, it'd require significant changes in our search algorithm, and it's not clear how to pick the appropriate scene (say, if you have two candidate scenes, each 50 frames long, near each other).

I think some audio preprocessing or merging lines into even larger search groups might yield much better results for typesetting, although right now I don't know of any specific way to do so (other than maybe merging all overlapping/adjacent lines).
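
For reference, merging all overlapping/adjacent lines is plain interval merging; a sketch, with events reduced to (start, end) tuples in seconds and an assumed gap threshold:

def merge_into_groups(events, max_gap=0.0):
    # merge (start, end) intervals that overlap or sit within max_gap seconds
    groups = []
    for start, end in sorted(events):
        if groups and start - groups[-1][1] <= max_gap:
            groups[-1][1] = max(groups[-1][1], end)
        else:
            groups.append([start, end])
    return [tuple(g) for g in groups]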

shinchiro (Contributor) commented

Since silent segments mostly indicate slow-motion scenes, I think it could use the nearest keyframe as a last resort when the audio search fails. But I think this doesn't apply in some rare cases.

Linking search groups together is not a bad idea. Every search group would hold information about the search groups before and after it, so if it fails the audio search, it can refer to the previous search group to apply its shift.
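
A sketch of that fallback, assuming each search group is an object with a shift attribute that is None when the audio search failed (both the attribute and the failure convention are assumptions for illustration):

def fill_failed_shifts(groups):
    # forward pass: inherit the shift of the nearest preceding successful group
    last_good = None
    for group in groups:
        if group.shift is not None:
            last_good = group.shift
        elif last_good is not None:
            group.shift = last_good
    # backward pass: cover groups that failed before any group succeeded
    next_good = None
    for group in reversed(groups):
        if group.shift is not None:
            next_good = group.shift
        elif next_good is not None:
            group.shift = next_good
    return groups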
