avfilter/tonemap_opencl: implement tradeoff mode as 3dlut #535
base: jellyfin
This reworks the current tradeoff mode, which used a single 1D LUT for linearization and was not fast enough on slow GPUs. Instead, the entire tonemap process becomes a 3D LUT lookup. The implementation first generates a 65x65x65 LUT on the GPU, which has a compute cost similar to a single 400p frame. It then applies the LUT to the actual frame using tetrahedral interpolation. The interpolation quality is quite decent, and the interpolation errors are very hard to notice except under extreme conditions.
Signed-off-by: gnattu <[email protected]>
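For readers unfamiliar with the technique, the two steps described above (bake a 65x65x65 LUT once, then sample it per pixel with tetrahedral interpolation) can be sketched on the CPU as follows. This is only an illustrative sketch, not the OpenCL kernel from this PR: `toy_tonemap`, `build_lut`, and `sample_tetrahedral` are hypothetical names, and the per-channel curve is a stand-in for the filter's real tonemap chain. Inputs are assumed to be normalized to [0, 1].

```python
LUT_SIZE = 65  # grid points per axis, matching the size named in the description


def toy_tonemap(r, g, b):
    """Hypothetical stand-in for the full per-pixel tonemap chain."""
    # Simple Reinhard-like curve per channel; NOT the filter's actual math.
    return tuple(2.0 * c / (1.0 + c) for c in (r, g, b))


def build_lut(fn, size=LUT_SIZE):
    """Evaluate fn once per grid point; flat layout, index = (ri*size + gi)*size + bi."""
    n = size - 1
    return [fn(ri / n, gi / n, bi / n)
            for ri in range(size)
            for gi in range(size)
            for bi in range(size)]


def sample_tetrahedral(lut, size, r, g, b):
    """Look up (r, g, b) in the LUT using tetrahedral interpolation."""
    n = size - 1
    # Scale to grid coordinates and clamp to the valid range.
    x, y, z = (min(max(c, 0.0), 1.0) * n for c in (r, g, b))
    # Lower corner of the enclosing cell; keep it at most n-1 so +1 stays in range.
    x0, y0, z0 = (min(int(v), n - 1) for v in (x, y, z))
    fx, fy, fz = x - x0, y - y0, z - z0

    def at(dx, dy, dz):
        return lut[((x0 + dx) * size + (y0 + dy)) * size + (z0 + dz)]

    c000, c111 = at(0, 0, 0), at(1, 1, 1)
    # The ordering of the fractions selects one of six tetrahedra in the cell.
    # Each case blends exactly four corners with weights that sum to 1.
    if fx > fy:
        if fy > fz:    # fx > fy > fz
            w = ((1 - fx, c000), (fx - fy, at(1, 0, 0)), (fy - fz, at(1, 1, 0)), (fz, c111))
        elif fx > fz:  # fx > fz >= fy
            w = ((1 - fx, c000), (fx - fz, at(1, 0, 0)), (fz - fy, at(1, 0, 1)), (fy, c111))
        else:          # fz >= fx > fy
            w = ((1 - fz, c000), (fz - fx, at(0, 0, 1)), (fx - fy, at(1, 0, 1)), (fy, c111))
    else:
        if fz > fy:    # fz > fy >= fx
            w = ((1 - fz, c000), (fz - fy, at(0, 0, 1)), (fy - fx, at(0, 1, 1)), (fx, c111))
        elif fz > fx:  # fy >= fz > fx
            w = ((1 - fy, c000), (fy - fz, at(0, 1, 0)), (fz - fx, at(0, 1, 1)), (fx, c111))
        else:          # fy >= fx >= fz
            w = ((1 - fy, c000), (fy - fx, at(0, 1, 0)), (fx - fz, at(1, 1, 0)), (fz, c111))
    return tuple(sum(wt * c[i] for wt, c in w) for i in range(3))
```

Tetrahedral interpolation reads four LUT entries per pixel instead of the eight needed for trilinear interpolation, which is presumably part of its appeal on bandwidth-limited GPUs. It is exact at grid points and for linear mappings; between grid points on a curved mapping it introduces a small error, which is the interpolation error discussed in this thread.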
Quite decent perf uplift - 2160p at ~90fps.
This is a bit input specific. On some inputs I got close to this, but on others I only got around 80. I think it might be related to cache hit rate or something.
Yes.
We can push it even higher if the libmali OpenCL runtime supports the AFBC 16x16 modifier in the future. In addition, the drm_prime<->opencl interop also causes some overhead.
The difference is quite visible though
This is the interpolation error that is expected to exist in some scenes
Actual benchmark numbers are not ready yet because I have not compiled this for RK3588 yet. I will do the benchmarking once the CI has finished. I hope this implementation will be fast enough to make 4K60 tonemapping possible on its slow GPU.
The performance exceeded my expectations:
On RK3588 it achieves 4K tonemapping at 78 fps, which means three simultaneous 4K24 streams are now possible.
This implementation is currently limited to OpenCL because that is the API most slow GPUs use.
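The stream count quoted above follows from simple division of the measured throughput by the per-stream frame rate. A minimal sketch of that arithmetic, assuming throughput divides evenly across streams with no per-stream overhead (an idealization; `max_streams` and `pixel_rate` are hypothetical helper names):

```python
def max_streams(measured_fps, stream_fps):
    """Whole number of streams that fit in the measured tonemap throughput."""
    return int(measured_fps // stream_fps)


def pixel_rate(width, height, fps):
    """Pixels processed per second at the given resolution and frame rate."""
    return width * height * fps


print(max_streams(78, 24))        # 78 / 24 = 3.25, so 3 whole 4K24 streams
print(max_streams(78, 60))        # 1, so a single 4K60 stream also fits
print(pixel_rate(3840, 2160, 78))
```

Real concurrent transcodes also share the GPU with decode, scaling, and the drm_prime<->opencl interop mentioned in the conversation, so the practical limit may be lower than this estimate.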