avfilter/tonemap_opencl: implement tradeoff mode as 3dlut #535

Open, wants to merge 1 commit into base: jellyfin

@gnattu (Member) commented Feb 6, 2025

This reworks the current tradeoff mode, which used a single 1D LUT for linearization and is not fast enough on slow GPUs. Instead, the entire tonemapping process becomes a single 3D LUT lookup. The implementation first generates a 65x65x65 LUT on the GPU, at a compute cost similar to processing a single 400p frame. It then applies the LUT to the actual frame using tetrahedral interpolation. The interpolation quality is quite decent, and interpolation errors are very hard to notice except in extreme conditions.
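A minimal CPU-side sketch in C of the two steps described above (generate the LUT once, then look it up per pixel with tetrahedral interpolation). It assumes an RGB-indexed LUT stored as a flat array of `LUT_N`³ float triples; `rgb_t`, `lut_generate`, `lut_apply_tetra` and `demo_tonemap` are hypothetical names, and the stand-in curve is not the filter's actual tonemap math, which runs as OpenCL on the GPU. For scale: 65³ = 274,625 grid points, close to the 284,400 pixels of a 711x400 frame (assuming "400p" means 16:9).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LUT_N 65 /* grid points per axis, as in the PR description */

typedef struct { float r, g, b; } rgb_t;

/* Hypothetical signature for the per-pixel transform baked into the LUT. */
typedef rgb_t (*pixel_fn)(rgb_t in);

/* Identity transform: useful for checking the interpolator. */
static rgb_t lut_identity(rgb_t in) { return in; }

/* Hypothetical stand-in curve, NOT the filter's tonemap: y = 1.25x / (x + 0.25). */
static float demo_curve(float x) { return 1.25f * x / (x + 0.25f); }

static rgb_t demo_tonemap(rgb_t in) {
    rgb_t out = { demo_curve(in.r), demo_curve(in.g), demo_curve(in.b) };
    return out;
}

/* Flat index of grid point (r, g, b) in an n*n*n table. */
static size_t lut_index(int n, int r, int g, int b) {
    return ((size_t)r * n + g) * n + b;
}

/* Step 1: evaluate fn once per grid point, n^3 evaluations in total. */
static rgb_t *lut_generate(int n, pixel_fn fn) {
    assert(n >= 2);
    rgb_t *lut = malloc(sizeof(rgb_t) * (size_t)n * n * n);
    if (!lut)
        return NULL;
    for (int r = 0; r < n; r++)
        for (int g = 0; g < n; g++)
            for (int b = 0; b < n; b++) {
                rgb_t in = { (float)r / (n - 1), (float)g / (n - 1), (float)b / (n - 1) };
                lut[lut_index(n, r, g, b)] = fn(in);
            }
    return lut;
}

/* Weighted sum of four LUT entries (the weights sum to 1). */
static rgb_t mix4(rgb_t p0, float w0, rgb_t p1, float w1,
                  rgb_t p2, float w2, rgb_t p3, float w3) {
    rgb_t o = { p0.r * w0 + p1.r * w1 + p2.r * w2 + p3.r * w3,
                p0.g * w0 + p1.g * w1 + p2.g * w2 + p3.g * w3,
                p0.b * w0 + p1.b * w1 + p2.b * w2 + p3.b * w3 };
    return o;
}

/* Find the grid cell containing v in [0, 1]; returns its lower index and
   stores the fractional position inside the cell in *frac. */
static int lut_cell(float v, int n, float *frac) {
    float p = (v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v)) * (float)(n - 1);
    int i = (int)p;
    if (i > n - 2)
        i = n - 2; /* keep i + 1 inside the grid when v == 1 */
    *frac = p - (float)i;
    return i;
}

/* Step 2: tetrahedral interpolation. The cell is split into six tetrahedra
   sharing the c000-c111 diagonal; ordering the fractional offsets selects
   one of them, and only its four corners are blended. */
static rgb_t lut_apply_tetra(const rgb_t *lut, int n, rgb_t in) {
    float fr, fg, fb;
    int r = lut_cell(in.r, n, &fr);
    int g = lut_cell(in.g, n, &fg);
    int b = lut_cell(in.b, n, &fb);
#define C(dr, dg, db) lut[lut_index(n, r + (dr), g + (dg), b + (db))]
    rgb_t out;
    if (fr >= fg && fg >= fb)
        out = mix4(C(0, 0, 0), 1 - fr, C(1, 0, 0), fr - fg, C(1, 1, 0), fg - fb, C(1, 1, 1), fb);
    else if (fr >= fb && fb >= fg)
        out = mix4(C(0, 0, 0), 1 - fr, C(1, 0, 0), fr - fb, C(1, 0, 1), fb - fg, C(1, 1, 1), fg);
    else if (fb >= fr && fr >= fg)
        out = mix4(C(0, 0, 0), 1 - fb, C(0, 0, 1), fb - fr, C(1, 0, 1), fr - fg, C(1, 1, 1), fg);
    else if (fg >= fr && fr >= fb)
        out = mix4(C(0, 0, 0), 1 - fg, C(0, 1, 0), fg - fr, C(1, 1, 0), fr - fb, C(1, 1, 1), fb);
    else if (fg >= fb && fb >= fr)
        out = mix4(C(0, 0, 0), 1 - fg, C(0, 1, 0), fg - fb, C(0, 1, 1), fb - fr, C(1, 1, 1), fr);
    else /* fb >= fg >= fr */
        out = mix4(C(0, 0, 0), 1 - fb, C(0, 0, 1), fb - fg, C(0, 1, 1), fg - fr, C(1, 1, 1), fr);
#undef C
    return out;
}
```

Only four of the cell's eight corners are read per pixel, which is the usual reason tetrahedral interpolation is preferred over trilinear for LUT application; it also reproduces any linear transform exactly, so an identity LUT round-trips its input.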

Actual benchmark numbers are not ready yet because I haven't compiled this for RK3588 yet. I will do the benchmarking once CI has finished. I hope this implementation will be fast enough to make 4K60 tonemapping possible on its slow GPU.

The performance exceeded my expectations:

frame= 1806 fps= 78 q=-0.0 Lsize=N/A time=00:01:11.40 bitrate=N/A speed=3.08x    

On RK3588 it achieves 4K tonemapping at 78fps, which means three simultaneous 4K24 streams are now possible.
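The stream estimate is simple division: 78 / 24 = 3.25, so three 24fps streams fit. A tiny C sketch of that arithmetic, assuming the tonemap filter is the only bottleneck and its throughput splits evenly across streams (decode and encode limits are ignored); `streams_supported` and `frame_time_ms` are hypothetical helpers.

```c
#include <assert.h>

/* Hypothetical helper: how many streams at target_fps fit into one measured
   throughput figure, assuming throughput divides evenly between streams. */
static int streams_supported(double measured_fps, double target_fps) {
    assert(target_fps > 0.0);
    return (int)(measured_fps / target_fps);
}

/* Per-frame time budget in milliseconds implied by a throughput figure. */
static double frame_time_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

At 78fps each 4K frame takes about 12.8 ms, comfortably inside the 16.7 ms that 60fps requires, which is why a single 4K60 stream also fits.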

This implementation is currently limited to OpenCL, because that is the API most slow GPUs use.



Signed-off-by: gnattu <[email protected]>
@gnattu requested a review from a team on February 6, 2025 07:43
@nyanmisaka (Member)

  Stream #0:0(eng): Video: vp9 (Profile 2), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x2160, SAR 1:1 DAR 16:9, 59.94 fps, 59.94 tbr, 1k tbn (default)
      Metadata:
        DURATION        : 00:04:55.644000000
      Side data:
        Content Light Level Metadata, MaxCLL=1100, MaxFALL=180
        Mastering Display Metadata, has_primaries:1 has_luminance:1 r(0.6780,0.3220) g(0.2450,0.7030) b(0.1380 0.0520) wp(0.3127, 0.3290) min_luminance=0.000000, max_luminance=1000.000000
  Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
      Metadata:
        DURATION        : 00:04:55.661000000
rga_api version 1.10.1_[3]
Stream mapping:
  Stream #0:0 -> #0:0 (vp9 (vp9_rkmpp) -> hevc (hevc_rkmpp))
Press [q] to stop, [?] for help
arm_release_ver: g24p0-00eac0, rk_so_ver: 6
Output #0, mp4, to '/tmp/1.mp4':
  Metadata:
    encoder         : Lavf61.1.100
  Stream #0:0(eng): Video: hevc (Main) (hev1 / 0x31766568), drm_prime(tv, bt709, progressive), 3840x2160 [SAR 1:1 DAR 16:9], q=2-31, 2000 kb/s, 59.94 fps, 19001 tbn (default)
      Metadata:
        DURATION        : 00:04:55.644000000
        encoder         : Lavc61.3.100 hevc_rkmpp
      Side data:
        Content Light Level Metadata, MaxCLL=1100, MaxFALL=180
        Mastering Display Metadata, has_primaries:1 has_luminance:1 r(0.6780,0.3220) g(0.2450,0.7030) b(0.1380 0.0520) wp(0.3127, 0.3290) min_luminance=0.000000, max_luminance=1000.000000
frame=12105 fps= 87 q=-0.0 size= 1004288KiB time=00:03:21.93 bitrate=40741.5kbits/s speed=1.46x

Quite decent perf uplift - 2160p at ~90fps.

@gnattu (Member, Author) commented Feb 6, 2025

> Quite decent perf uplift - 2160p at ~90fps.

This is somewhat input-specific. On some inputs I got close to this, but on others only ~80fps. I think it might be related to the cache hit rate or something similar.

@nyanmisaka (Member)

> > Quite decent perf uplift - 2160p at ~90fps.
>
> This is somewhat input-specific. On some inputs I got close to this, but on others only ~80fps. I think it might be related to the cache hit rate or something similar.

Yes.

  Stream #0:0: Video: av1 (libdav1d) (Main), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x2160, SAR 1:1 DAR 16:9, 59.94 fps, 59.94 tbr, 1k tbn (default)
      Metadata:
        HANDLER_NAME    : ISO Media file produced by Google Inc.
        VENDOR_ID       : [0][0][0][0]
        DURATION        : 00:05:37.704000000
      Side data:
        Content Light Level Metadata, MaxCLL=1100, MaxFALL=180
        Mastering Display Metadata, has_primaries:1 has_luminance:1 r(0.7080,0.2920) g(0.1700,0.7970) b(0.1310 0.0460) wp(0.3127, 0.3127) min_luminance=0.000000, max_luminance=1000.000000
  Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
      Metadata:
        DURATION        : 00:05:37.721000000
rga_api version 1.10.1_[3]
Stream mapping:
  Stream #0:0 -> #0:0 (av1 (av1_rkmpp) -> hevc (hevc_rkmpp))
Press [q] to stop, [?] for help
arm_release_ver: g24p0-00eac0, rk_so_ver: 6
Output #0, mp4, to '/tmp/1.mp4':
  Metadata:
    COMPATIBLE_BRANDS: iso6av01mp41
    MAJOR_BRAND     : dash
    MINOR_VERSION   : 0
    encoder         : Lavf61.1.100
  Stream #0:0: Video: hevc (Main) (hev1 / 0x31766568), drm_prime(tv, bt709, progressive), 3840x2160 [SAR 1:1 DAR 16:9], q=2-31, 2000 kb/s, 59.94 fps, 19001 tbn (default)
      Metadata:
        HANDLER_NAME    : ISO Media file produced by Google Inc.
        VENDOR_ID       : [0][0][0][0]
        DURATION        : 00:05:37.704000000
        encoder         : Lavc61.3.100 hevc_rkmpp
      Side data:
        Content Light Level Metadata, MaxCLL=1100, MaxFALL=180
        Mastering Display Metadata, has_primaries:1 has_luminance:1 r(0.7080,0.2920) g(0.1700,0.7970) b(0.1310 0.0460) wp(0.3127, 0.3127) min_luminance=0.000000, max_luminance=1000.000000
frame= 9894 fps= 75 q=-0.0 size=  935424KiB time=00:02:45.04 bitrate=46428.8kbits/s speed=1.26x

We could push it even higher if the libmali OpenCL runtime supports the AFBC 16x16 modifier in the future. In addition, the drm_prime<->OpenCL interop causes some overhead.

@nyanmisaka (Member)

It does come at a price, though. Even when outputting 10bit, there is a banding artifact in this scene. Still, it's good enough for most video clips. I may test it on a weak HD630 iGPU later.

Top: without 3D LUT; bottom: with 3D LUT.
[Screenshot: band-artifacts]

@Shadowghost (Contributor)

The difference is quite visible, though.

@gnattu (Member, Author) commented Feb 6, 2025

> Even when outputting 10bit there is a banding artifact in this scene.

This is an interpolation error that is expected in some scenes. It can be avoided by using a more powerful GPU, so that you don't need the 3D LUT path at all.
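Why a finite grid causes this: between two grid points the LUT can only blend the stored values, so wherever the curve bends sharply between them (typically near black in a tonemap), the result is a straight-segment approximation. If the LUT is indexed by RGB (an assumption), then along the neutral axis (r = g = b) the tetrahedral weights collapse to a plain linear blend of two diagonal corners, so a 1D piecewise-linear LUT shows the same effect. A minimal sketch; `demo_curve_1d` is a hypothetical curve with strong curvature near 0, not the filter's tonemap, and `lut1d_max_error` is likewise hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in curve with strong curvature near 0 (NOT the filter's
   tonemap): y = (1 + a) * x / (x + a), which maps 0 -> 0 and 1 -> 1. */
static double demo_curve_1d(double x) {
    const double a = 0.05;
    return (1.0 + a) * x / (x + a);
}

/* Largest absolute error of an n-point, piecewise-linear LUT of demo_curve_1d,
   measured at samples + 1 evenly spaced inputs in [0, 1]. */
static double lut1d_max_error(int n, int samples) {
    assert(n >= 2 && samples > 0);
    double worst = 0.0;
    for (int i = 0; i <= samples; i++) {
        double x = (double)i / samples;
        double p = x * (n - 1);
        int k = (int)p;
        if (k > n - 2)
            k = n - 2;              /* keep k + 1 inside the grid when x == 1 */
        double t = p - k;           /* position inside the cell */
        double y0 = demo_curve_1d((double)k / (n - 1));
        double y1 = demo_curve_1d((double)(k + 1) / (n - 1));
        double err = demo_curve_1d(x) - (y0 + t * (y1 - y0));
        if (err < 0.0)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Multiplying the returned error by 1023 expresses it in 10-bit code values. A finer grid shrinks the error roughly with the square of the grid spacing, but every extra grid point costs one more tonemap evaluation when the LUT is generated.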

3 participants