Optimize sequential draws of the same pipeline #2539
Some(x) => x,
None => return,
};
.cloned()
Another theoretical performance optimization that could be done: remove this clone. It clones the entire DescriptorSetState, which contains a HashMap, and if push descriptors are used, each one of those contains another HashMap. Instead it could be some Cow<Arc<_>>, so that if the state does not change between draws it isn't cloned again.
In practice it's insignificant, though: the clone happens once per draw/dispatch, and 103 draws take just 0.12ms. There are other, more significant bottlenecks.
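To make the copy-on-write idea concrete, here is a minimal sketch. It is not vulkano's code: `DescriptorSetState` below is a hypothetical stand-in with a single `HashMap`, `Recorder` is invented for illustration, and it uses `Arc::make_mut` rather than a literal `Cow<Arc<_>>`, which gives the same "clone only when shared and modified" behaviour.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for the real descriptor set state (assumption, not vulkano's type).
#[derive(Clone, Debug, Default, PartialEq)]
struct DescriptorSetState {
    bindings: HashMap<u32, u64>,
}

// Hypothetical recorder that snapshots the state for every draw.
#[derive(Default)]
struct Recorder {
    current: Arc<DescriptorSetState>,
    draws: Vec<Arc<DescriptorSetState>>,
}

impl Recorder {
    fn bind(&mut self, binding: u32, resource: u64) {
        // Copies the inner state only if an earlier draw still shares it;
        // otherwise the state is mutated in place.
        Arc::make_mut(&mut self.current)
            .bindings
            .insert(binding, resource);
    }

    fn draw(&mut self) {
        // Bumps a reference count instead of cloning the HashMap.
        self.draws.push(Arc::clone(&self.current));
    }
}
```

With this shape, consecutive draws without an intervening `bind` share one allocation, and the deep copy is paid only on the first `bind` after a draw.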
I'm uncertain how secondary cmd buffers are handled. I could imagine a use case where they contain many draws with constantly changing pipelines (which is generally frowned upon anyway) and very few buffers, which could maybe be slower with this PR. But I don't know, that may need some testing. Result: secondary cmd buffers have also improved significantly, from 40fps to 120fps (meshlet on sponza), but it is still significantly better to just record everything into the primary cmd buffer, where I could reach 550fps.
You have merge conflicts, but please don't trouble yourself with fixing them. Given that the old command buffer is destined to go the way of the dinosaurs, I would prefer it if all effort was directed to the new command buffer in vulkano-taskgraph, as any change like this always comes with the risk of introducing more bugs. I really appreciate the work you put into this though! ❤️
Update documentation to reflect any user-facing changes - in this repository.
Make sure that the changes are covered by unit-tests.
Run `cargo clippy` on the changes.
Run `cargo +nightly fmt` on the changes.
Please put changelog entries in the description of this Pull Request if knowledge of this change could be valuable to users. No need to put the entries to the changelog directly, they will be transferred to the changelog file by maintainers right after the Pull Request merge.
Please remove any items from the template below that are not applicable.
Describe in common words what is the purpose of this change, related Github Issues, and highlight important implementation aspects.
Profiling is based on my meshlet demo on sponza. Before this PR it ran at ~40fps and was fully CPU limited:
![image](https://private-user-images.githubusercontent.com/31222740/344099016-b1f0a81d-9d8c-41b1-83f2-06abea7ac062.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3MDAyMzYsIm5iZiI6MTczOTY5OTkzNiwicGF0aCI6Ii8zMTIyMjc0MC8zNDQwOTkwMTYtYjFmMGE4MWQtOWQ4Yy00MWIxLTgzZjItMDZhYmVhN2FjMDYyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE2VDA5NTg1NlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTdiM2YyZTc4NTFhZDY0YjhlMTgyMDk0M2U5NzFmYmU0ZjBjN2NmYTFmMGQyYjEyOWNhNjU5Yjc3MDI1ODdmNTgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Kx0j8YhQwMECo0hk6zhqt8Cw-3Q5-bzOeB5hy5wTHaY)
With this PR I get ~550fps and am most likely limited by CPU-GPU sync, due to not having a working frames-in-flight system.
![image](https://private-user-images.githubusercontent.com/31222740/344116587-248887db-06a4-4cc7-b443-70928b66d773.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3MDAyMzYsIm5iZiI6MTczOTY5OTkzNiwicGF0aCI6Ii8zMTIyMjc0MC8zNDQxMTY1ODctMjQ4ODg3ZGItMDZhNC00Y2M3LWI0NDMtNzA5MjhiNjZkNzczLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE2VDA5NTg1NlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTA2ODdmMWE2YjcxMzQxMThlOGVjOWZmMWE3MDljMDM5NzExYzAzZDY2ZDY3ZTNiMDA1NTllN2JlMjhlZWQzNGEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.5ohulNtH3NeYEJVu2Mpy-SJvISFGyrxtAgzcoXbotIE)
And the best part is: It's absolutely free and still safe!*
*(if I didn't mess up)
Current master evaluates all accessed resources (buffers, images) immediately whenever you call any draw or dispatch. My change defers the evaluation of descriptor sets, specifically, for their resources until the end of cmd buffer recording. This allows me to deduplicate draws and dispatches that use the same pipeline by merging their resources and descriptor sets together, and then to deduplicate the descriptor sets again before evaluating each unique one for its actual resources. You may even be able to merge more than that, but I'd rather be on the conservative side with different pipelines.
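The deferral described above can be sketched roughly as follows. This is an illustration of the idea only, not the code in this PR: `DeferredRecorder`, the integer ID aliases, and the `resources_of` lookup table are all hypothetical, and real descriptor sets and resources are of course not plain integers.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical IDs standing in for real pipeline, descriptor set and resource handles.
type PipelineId = u32;
type DescriptorSetId = u32;
type ResourceId = u32;

#[derive(Default)]
struct DeferredRecorder {
    // Descriptor sets seen so far, merged per pipeline instead of per draw.
    sets_by_pipeline: HashMap<PipelineId, HashSet<DescriptorSetId>>,
}

impl DeferredRecorder {
    // Recording a draw only notes which sets it uses; nothing is evaluated yet.
    fn draw(&mut self, pipeline: PipelineId, sets: &[DescriptorSetId]) {
        self.sets_by_pipeline
            .entry(pipeline)
            .or_default()
            .extend(sets.iter().copied());
    }

    // At the end of recording, deduplicate the sets across all pipelines and
    // evaluate each unique set once. Returns how many sets were evaluated and
    // the union of the resources they reference.
    fn finish(
        self,
        resources_of: &HashMap<DescriptorSetId, Vec<ResourceId>>,
    ) -> (usize, HashSet<ResourceId>) {
        let unique: HashSet<DescriptorSetId> =
            self.sets_by_pipeline.into_values().flatten().collect();
        let mut resources = HashSet::new();
        for set in &unique {
            if let Some(list) = resources_of.get(set) {
                resources.extend(list.iter().copied());
            }
        }
        (unique.len(), resources)
    }
}
```

In this sketch, a hundred draws with one pipeline and the same two descriptor sets cost two set evaluations at `finish` instead of two hundred during recording.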
Changelog: