Support GFX12 #423
Merged
kiritigowda requested review from rrawther and AryanSalmanpour as code owners on September 16, 2024, 20:53
jeffqjiangNew approved these changes on Sep 16, 2024
AryanSalmanpour approved these changes on Sep 17, 2024
vamovsik pushed a commit that referenced this pull request on Nov 19, 2024:
* rocDecode/AV1: Performance improvement: prevent synchronous decode submissions. (#406)
  - Set the display delay to DECODE_BUF_POOL_EXTENSION (2) to avoid immediate output/display of a decoded frame.
* CTest Updates - Fix duplicates (#408)
  * Test - Fix CTest
  * CMakeLists - Clang Set
  * Ctest - support
  * Readme - Fix and updates
  * Readme - minor fix
  * Readme - MS template
  * Install - Minor instruction fix
  * Clang - Added as default CXX compiler
  * Update CHANGELOG.md: Remove unreleased
* License - Remove license from dev & test packages (#410)
* Added real decode speed report to set it apart from the current output speed report in sample apps (#409)
  * rocDecode: Added real decode speed report.
    - The current decode speed report is actually an output/display speed report.
    - Because AV1 makes extensive use of alternate reference frames that are not displayed, the AV1 decoded frame count and the output/displayed frame count can be quite different, so the current speed report is not an accurate decode speed measurement.
    - We now added the actual decode speed report alongside the existing speed report, which is now called output/display FPS.
  * rocDecode/Sample script: Added missing changes for sample_mode 0 case.
* rocDecode/Sample script: Sorted the files to enable easy post-processing of the performance data. (#411)
* rocDecode/Perf: Added resolution and bit rate info into csv output, to speed up performance data post-processing. (#412)
* update Doxyfile to strip Read the Docs dir (#418)
* Simplified MD5 string compare code and fixed potential incorrect conversion of MD5 string to integers. (#414)
  * rocDecode: Fixed potential incorrect conversion of MD5 string to integers.
  * rocDecode: Changed a string name.
  * rocDecode: Simplified the MD5 string compare code.
  * rocDecode: Added minor changes based on review comments.
  * rocDecode: Minor changes.
  * rocDecode/Sample script: Added units to Bit rate field in csv output.
* Support GFX12 (#423)
* Added a note pointing users to the official documentation and removed the local build information. This info is in the contribution documentation. (#417)
* Modify the videoDecodePerf app to take an argument for memory type (#424)
* rocDecode/Perf: Improved the accuracy of decode performance measurement for the performance sample. We need to wait for the decode completion of the last picture before sampling the end time. (#425)
* change clang++ path as suggested by packaging team (#427)
* Find rocDecode - Support added (#428)
  * Find rocDecode - Support added
  * Find rocDecode - Updates
  * Find rocDecode - Version fix
  * Find rocDecode - Version Var
  * Minor cleanup
  * Test - Find package updates
  * CTest - Upgrades
  * CTest - Enhancements

  Co-authored-by: Aryan Salmanpour
* Package - dependencies updated (#416)
  * Package - dependencies updated
  * Changelog - new format added
  * Setup - OS specific updates
  * CMakeList - Cleanup
  * Version Updates Fix
* Add new API rocDecParserMarkFrameForReuse() for Parser (#430)
  * added new API to release video frame for decoder and parser
  * removed ReleseFrame() from low level parser classes
  * Removed rocDecReleaseFrame() from decoder and added in parser
  * address review comments
  * revert un-necessary files
  * minor fix
  * remove unused function
  * minor formatting fix
* Fix libva requirements for rocdecode (#435)
  * Fix libva requirements for rocdecode

    mesa-amdgpu-va-drivers is built with libva 2.16 (VA-API 1.16), so it provides the entry point "__vaDriverInit_1_16". For rocdecode to use mesa, it also needs a high enough requirement on libva to be compatible with this function. Strictly speaking, it doesn't matter which libva is used as long as it's 2.16 or newer, since libva is backwards compatible.

    An OR condition is used to favour distro packages when possible, to avoid causing issues with existing libraries built against the distro version.

    For libva dev packages, we can just use libva-amdgpu-dev/el directly.

    Signed-off-by: Jeremy Newton
  * Update to use libva-amdgpu

    To reflect the package change, update the README, rocDecode-setup.py, and the CHANGELOG. Putting the minimum VA-API version in the README isn't required, as the user is expected to just install the latest libva-amdgpu to match the mesa VA-API version.

  Signed-off-by: Jeremy Newton
* Find the minimum supported libva version 1.16 when building rocdecode (#437)
  * Find the minimum supported libva version 1.16 when building rocdecode
  * Update the changelog
  * Update the Error message if libva-amdgpu-dev/libva-amdgpu-devel not found
  * Add missing comma
* Allow overriding CMAKE_CXX_COMPILER (#436)

  Using set as-is doesn't allow the user to set their own rocm path. This is useful for community packagers or debugging.

  Signed-off-by: Jeremy Newton
* rocDecode/AV1: Fixed an error in get Q index function during code inspection. (#438)
* Revert "Allow overriding CMAKE_CXX_COMPILER (#436)" (#440)

  This reverts commit 07ecb5e.
* updated the changelog for 6.3 (#439)
* VideoDecode samples - Set the default display_delay to 1 (#441)
* Setup - Fix status return (#444)

  The code is full of ERROR_CHECK(os.system("some shell commands")). Unfortunately the return value from os.system is a 16 bit value with the return code in the upper 8 bits and a number of flags related to the traps in the lower 8 bits. The existing code passes this 16 bit value to the sys.exit call, which just uses the bottom 8 bits. Unless the child process is killed by a signal these 8 bits will be zero, which is taken as "success", rather than passing on the exit status of the child process.

  So even something as simple as ERROR_CHECK(os.system("false")) will report a status of 256 in the print statement but will call sys.exit() with a value of 0 in the lower 8 bits.

  This change folds the top and bottom halves of the 16 bit value into an 8 bit value. This will be non-zero, so a shell script running rocDecode-setup.py will know something has failed an ERROR_CHECK, rather than the current situation where it thinks things are correct.
* fix for while loop hang (#447)
* set disp_delay to 1 for all samples (#446)
* GPU Arch Updates (#448)

Signed-off-by: Jeremy Newton
Co-authored-by: jeffqjiangNew
Co-authored-by: Kiriti Gowda
Co-authored-by: Peter Park
Co-authored-by: spolifroni-amd
Co-authored-by: Lakshmi Kumar
Co-authored-by: Rajy Rawther
Co-authored-by: Jeremy Newton
Co-authored-by: Icarus Sparry (work)
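The "Setup - Fix status return (#444)" entry in the commit message above describes how a 16 bit status from os.system loses its exit code when only the low 8 bits are kept. The arithmetic can be sketched as follows. This is a minimal illustration, not the code from rocDecode-setup.py: the function names `exit_code_seen_by_parent` and `fold_status` are hypothetical, and the OR-based fold is only one plausible way to "fold the top and bottom halves" as the message puts it.

```python
def exit_code_seen_by_parent(status: int) -> int:
    """Low 8 bits of `status`, which is what a POSIX parent process
    observes when the full value is handed to sys.exit()."""
    return status & 0xFF


def fold_status(status: int) -> int:
    """Hypothetical fold (an assumption, not the project's code):
    OR the upper byte (child exit code) into the lower byte (signal
    flags), so any failure yields a non-zero 8 bit value."""
    return ((status >> 8) | status) & 0xFF


# 256 is the status the commit message reports for os.system("false").
raw_status = 256
print(exit_code_seen_by_parent(raw_status))  # 0, so the failure looks like success
print(fold_status(raw_status))               # 1, so the failure is visible
```

On Python 3.9 and later, the standard library's `os.waitstatus_to_exitcode()` decodes the same kind of status on Unix. Whether the project uses it is not stated in the commit message.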
Support GFX12 till PR #415 is complete