
Support GFX12 #423

Merged
merged 2 commits into from
Sep 17, 2024


kiritigowda
Collaborator

Support GFX12 until PR #415 is complete.

@kiritigowda kiritigowda added bugfix ci:precheckin run mainline precheckin CI job labels Sep 16, 2024
@kiritigowda kiritigowda added the enhancement New feature or request label Sep 16, 2024
@AryanSalmanpour AryanSalmanpour merged commit eb5f2da into ROCm:develop Sep 17, 2024
9 of 13 checks passed
@kiritigowda kiritigowda deleted the kg/SWDEV-479038 branch September 21, 2024 18:37
vamovsik pushed a commit that referenced this pull request Nov 19, 2024
* * rocDecode/AV1: Performance improvement: prevent synchronous decode submissions. (#406)

- Set the display delay to DECODE_BUF_POOL_EXTENSION (2) to avoid immediate output/display of a decoded frame.
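
A minimal Python model of what a display delay does, assuming (hypothetically) that a decoded frame is released for output only once more than `display_delay` frames are queued. `display_schedule` is an illustrative helper, not part of rocDecode's API, and it ignores picture reordering.

```python
from collections import deque


def display_schedule(num_frames: int, display_delay: int) -> list[list[int]]:
    """Hypothetical model of a display delay (not rocDecode's implementation).

    For each submitted frame, return the frame indices released for output
    at that point; the final entry is the end-of-stream flush.
    """
    pending = deque()
    schedule = []
    for frame in range(num_frames):
        pending.append(frame)  # submit this frame for decode
        released = []
        # Hold frames back until more than display_delay frames are queued.
        while len(pending) > display_delay:
            released.append(pending.popleft())
        schedule.append(released)
    schedule.append(list(pending))  # flush whatever is left at end of stream
    return schedule


if __name__ == "__main__":
    print(display_schedule(4, 0))  # [[0], [1], [2], [3], []]
    print(display_schedule(4, 2))  # [[], [], [0], [1], [2, 3]]
```

In this model a delay of 0 asks for each frame right after it is submitted, so the caller would have to wait on that frame's decode; with a delay of 2 the first output only follows the third submission, leaving two frames in flight.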

* CTest Updates - Fix duplicates (#408)

* Test - Fix CTest

* CMakeLists - Clang Set

* Ctest - support

* Readme - Fix and updates

* Readme - minor fix

* Readme - MS template

* Install - Minor instruction fix

* Clang - Added as default CXX compiler

* Update CHANGELOG.md

Remove unreleased

* License - Remove license from dev & test packages (#410)

* Added real decode speed report to set it apart from the current output speed report in sample apps (#409)

* * rocDecode: Added real decode speed report.
 - The current decode speed report is actually an output/display speed report.
 - Because AV1 makes extensive use of alternate reference frames that are not displayed, the AV1 decoded frame count and the output/displayed frame count can differ considerably, so the current speed report is not an accurate measure of decode speed.
 - We have now added the actual decode speed report alongside the existing speed report, which is now called output/display FPS.


* * rocDecode/Sample script: Added missing changes for sample_mode 0 case.
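
The distinction between the two reported speeds in #409 can be sketched as follows, assuming (hypothetically) that the app counts decoded frames and output frames separately over the same elapsed time. `SpeedReport` and its fields are illustrative names, not rocDecode identifiers, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class SpeedReport:
    """Hypothetical container for the two speeds a sample app prints."""

    decoded_frames: int  # every frame decoded, including AV1 frames never shown
    output_frames: int   # frames actually output/displayed
    elapsed_s: float     # wall-clock decode time in seconds

    def decode_fps(self) -> float:
        return self.decoded_frames / self.elapsed_s

    def output_fps(self) -> float:
        return self.output_frames / self.elapsed_s


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    report = SpeedReport(decoded_frames=1200, output_frames=1000, elapsed_s=4.0)
    print(f"decode FPS: {report.decode_fps():.1f}")          # decode FPS: 300.0
    print(f"output/display FPS: {report.output_fps():.1f}")  # output/display FPS: 250.0
```

When a stream contains frames that are decoded but never shown, the two figures diverge, which is why only the first one reflects decoder throughput.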

* * rocDecode/Sample script: Sorted the files to enable easy post-processing of the performance data. (#411)

* * rocDecode/Perf: Added resolution and bit rate info into csv output, to speed up performance data post-processing. (#412)

* update Doxyfile to strip Read the Docs dir (#418)

* Simplified MD5 string compare code and fixed potential incorrect conversion of MD5 string to integers. (#414)

* * rocDecode: Fixed potential incorrect conversion of MD5 string to integers.

* * rocDecode: Changed a string name.

* * rocDecode: Simplified the MD5 string compare code.

* * rocDecode: Made minor changes based on review comments.

* * rocDecode: Minor changes.

* * rocDecode/Sample script: Added units to Bit rate field in csv output.
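
rocDecode's sample code is C++; the following Python sketch only illustrates the idea behind the MD5 change in #414, comparing a digest as a normalized hex string instead of converting it to integers. `md5_matches` is a hypothetical helper, and the assumption that the reference digest may carry different letter case or trailing whitespace is illustrative.

```python
import hashlib


def md5_matches(data: bytes, expected_hex: str) -> bool:
    """Compare an MD5 digest against a reference hex string.

    Comparing normalized hex strings avoids converting the 32-digit
    digest into integers at all.
    """
    actual = hashlib.md5(data).hexdigest()  # 32 lowercase hex digits
    return actual == expected_hex.strip().lower()


if __name__ == "__main__":
    # The MD5 of empty input is d41d8cd98f00b204e9800998ecf8427e.
    print(md5_matches(b"", "D41D8CD98F00B204E9800998ECF8427E\n"))  # True
```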

* Support GFX12 (#423)

* Added a note pointing users to the official documentation and removed the local build information. This info is in the contribution documentation. (#417)

* Modify the videoDecodePerf app to take an argument for memory type (#424)

* * rocDecode/Perf: Improved the accuracy of decode performance measurement for the performance sample. We need to wait for the decode completion of the last picture before sampling the end time. (#425)

* change clang++ path as suggested by packaging team (#427)

* Find rocDecode - Support added (#428)

* Find rocDecode - Support added

* Find rocDecode - Updates

* Find rocDecode - Version fix

* Find rocDecode - Version Var

* Minor cleanup

* Test - Find package updates

* CTest - Upgrades

* CTest - Enhancements

---------

Co-authored-by: Aryan Salmanpour <[email protected]>

* Package - dependencies updated (#416)

* Package - dependencies updated

* Changelog - new format added

* Setup - OS specific updates

* CMakeList - Cleanup

* Version Updates Fix

* Add new API rocDecParserMarkFrameForReuse() for Parser (#430)

* added new API to release video frame for decoder and parser

* removed ReleseFrame() from low level parser classes

* Removed rocDecReleaseFrame() from decoder and added in parser

* address review comments

* revert unnecessary files

* minor fix

* remove unused function

* minor formatting fix

* Fix libva requirements for rocdecode (#435)

* Fix libva requirements for rocdecode

mesa-amdgpu-va-drivers is built with libva 2.16 (VA-API 1.16), so it
provides the entry point "__vaDriverInit_1_16". For rocdecode to use
mesa, it also needs to make sure it has a high enough requirement on
libva to be compatible with this function.

Strictly speaking, it doesn't matter what libva is used as long as it's
2.16 or newer, since libva is backwards compatible. An OR condition is
used to favour distro packages when possible to avoid causing issues
with existing libraries built against the distro version.

For libva dev packages, we can just use libva-amdgpu-dev/el directly.

Signed-off-by: Jeremy Newton <[email protected]>

* Update to use libva-amdgpu

To reflect the package change, update the README, rocDecode-setup.py,
and the CHANGELOG.

Putting the minimum VA-API version in the README isn't required as the
user is expected to just install the latest libva-amdgpu to match the
mesa VA-API version.

---------

Signed-off-by: Jeremy Newton <[email protected]>
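
The compatibility rule described in #435 (any libva 2.16 or newer can load a driver exporting "__vaDriverInit_1_16") can be modeled roughly as below. This is a sketch, not how libva itself resolves entry points; it assumes libva 2.N implements VA-API 1.N, as with 2.16 and 1.16, and `required_va_api` / `libva_can_load` are hypothetical helpers.

```python
import re


def required_va_api(entry_point: str) -> tuple[int, int]:
    """Parse the VA-API version from a driver entry point like __vaDriverInit_1_16."""
    match = re.fullmatch(r"__vaDriverInit_(\d+)_(\d+)", entry_point)
    if match is None:
        raise ValueError(f"unexpected entry point name: {entry_point}")
    return int(match.group(1)), int(match.group(2))


def libva_can_load(libva_version: str, entry_point: str) -> bool:
    """Return True if this libva release should accept the driver's entry point.

    Assumes libva 2.N implements VA-API 1.N and that libva stays backwards
    compatible within VA-API major version 1.
    """
    lib_major, lib_minor = (int(part) for part in libva_version.split(".")[:2])
    va_major, va_minor = lib_major - 1, lib_minor
    need_major, need_minor = required_va_api(entry_point)
    return va_major == need_major and va_minor >= need_minor


if __name__ == "__main__":
    for version in ("2.14.0", "2.16.0", "2.20.1"):
        print(version, libva_can_load(version, "__vaDriverInit_1_16"))
```

Under these assumptions, 2.14.0 is rejected while 2.16.0 and 2.20.1 are accepted, matching the "2.16 or newer" requirement.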

* Find the minimum supported libva version 1.16 when building rocdecode (#437)

* Find the minimum supported libva version 1.16 when building rocdecode

* Update the changelog

* Update the error message if libva-amdgpu-dev/libva-amdgpu-devel is not found

* Add missing comma

* Allow overriding CMAKE_CXX_COMPILER (#436)

Using set as-is doesn't allow the user to set their own rocm path.
This is useful for community packagers or debugging.

Signed-off-by: Jeremy Newton <[email protected]>

* * rocDecode/AV1: Fixed an error in the get Q index function, found during code inspection. (#438)

* Revert "Allow overriding CMAKE_CXX_COMPILER (#436)" (#440)

This reverts commit 07ecb5e.

* updated the changelog for 6.3 (#439)

* VideoDecode samples - Set the default display_delay to 1 (#441)

* Setup - Fix status return (#444)

The code is full of ERROR_CHECK(os.system("some shell commands")).

Unfortunately, on POSIX systems the return value from os.system is a 16-bit wait status: the exit code is in the upper 8 bits, and the lower 8 bits hold the number of the signal that terminated the child (plus a core-dump flag). The existing code passes this 16-bit value to the sys.exit() call, and the resulting process exit status keeps only the bottom 8 bits. Unless the child process is killed by a signal, these 8 bits will be zero, which is taken as "success", rather than passing on the exit status of the child process.

So even something as simple as 
    ERROR_CHECK(os.system("false"))
will report a status of 256 in the print statement but will call sys.exit() with a value of 0 in the lower 8 bits.

This change folds the top and bottom halves of the 16 bit value into an 8 bit value. This will be non-zero, so a shell script running rocDecode-setup.py will know something has failed an ERROR_CHECK, rather than the current situation where it thinks things are correct.
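
A minimal sketch of the folding described above. The exact expression used in rocDecode-setup.py is not shown here, so `fold_status` and `error_check` are hypothetical stand-ins that implement the same idea: OR the two halves of the 16-bit status so the result is non-zero whenever either half is.

```python
import sys


def fold_status(status: int) -> int:
    """Fold a 16-bit os.system() wait status into a single byte.

    On POSIX, the upper 8 bits carry the child's exit code and the lower
    8 bits carry the terminating signal (plus the core-dump flag). OR-ing
    the two halves gives a value that is non-zero whenever either is.
    """
    return ((status >> 8) | status) & 0xFF


def error_check(status: int) -> None:
    """Hypothetical stand-in for ERROR_CHECK in rocDecode-setup.py."""
    if status != 0:
        print("ERROR_CHECK: command failed with status", status)
        sys.exit(fold_status(status))


if __name__ == "__main__":
    # `false` exits with code 1, so os.system("false") returns 256 on Linux.
    print(256 & 0xFF)        # 0: what the exit status collapses to without folding
    print(fold_status(256))  # 1: non-zero after folding
```

On Python 3.9 and newer, os.waitstatus_to_exitcode() is another option: it returns the child's exit code, or a negative signal number if the child was killed by a signal.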

* fix for while loop hang (#447)

* set disp_delay to 1 for all samples (#446)

* GPU Arch Updates (#448)

---------

Signed-off-by: Jeremy Newton <[email protected]>
Co-authored-by: jeffqjiangNew <[email protected]>
Co-authored-by: Kiriti Gowda <[email protected]>
Co-authored-by: Peter Park <[email protected]>
Co-authored-by: spolifroni-amd <[email protected]>
Co-authored-by: Lakshmi Kumar <[email protected]>
Co-authored-by: Rajy Rawther <[email protected]>
Co-authored-by: Jeremy Newton <[email protected]>
Co-authored-by: Icarus Sparry (work) <[email protected]>
Labels
bugfix ci:precheckin run mainline precheckin CI job enhancement New feature or request

3 participants