Optimize sequential draws of the same pipeline #2539

Firestar99 · 2024-06-27T22:07:51Z

Profiling is based on my meshlet demo on Sponza. Before, at ~40fps and fully CPU limited:

With this PR I get ~550fps and am most likely limited by CPU-GPU synchronization, due to not having a working frames-in-flight system.

And the best part is: It's absolutely free and still safe!*
*(if I didn't mess up)

Current master evaluates all accessed resources (buffers, images) immediately whenever you call any draw or dispatch. My change defers that evaluation, specifically for the resources of descriptor sets, to the end of cmd buffer recording. This allows me to deduplicate draws and dispatches that use the same pipeline by merging their resources and descriptor sets together, and then to deduplicate the descriptor sets again before evaluating each unique one for its actual resources. You may even be able to merge more than that, but I'd rather stay on the conservative side when pipelines differ.
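To make the merging step concrete, here is a minimal sketch of the idea under simplifying assumptions. `Command`, `merge_sequential` and `unique_sets_to_evaluate` are hypothetical names for illustration, and plain integer IDs stand in for pipelines and descriptor sets; none of this is the actual vulkano code from this PR.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for illustration only; these are not vulkano types.
type PipelineId = u32;
type DescriptorSetId = u32;

/// One recorded draw or dispatch: the bound pipeline and the descriptor sets it uses.
#[derive(Debug, Clone, PartialEq)]
struct Command {
    pipeline: PipelineId,
    descriptor_sets: Vec<DescriptorSetId>,
}

/// Merge back-to-back commands that use the same pipeline into one group,
/// keeping each descriptor set only once per group.
/// Commands with a different pipeline always start a new group (the conservative choice).
fn merge_sequential(commands: &[Command]) -> Vec<Command> {
    let mut merged: Vec<Command> = Vec::new();
    for cmd in commands {
        let same_pipeline = merged
            .last()
            .map_or(false, |last| last.pipeline == cmd.pipeline);
        if !same_pipeline {
            merged.push(Command {
                pipeline: cmd.pipeline,
                descriptor_sets: Vec::new(),
            });
        }
        let group = merged.last_mut().expect("a group was just pushed or already existed");
        for set in &cmd.descriptor_sets {
            if !group.descriptor_sets.contains(set) {
                group.descriptor_sets.push(*set);
            }
        }
    }
    merged
}

/// At the end of recording, collect every descriptor set that still needs its
/// resources evaluated, visiting each unique set only once across all groups.
fn unique_sets_to_evaluate(merged: &[Command]) -> Vec<DescriptorSetId> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for group in merged {
        for &set in &group.descriptor_sets {
            if seen.insert(set) {
                out.push(set);
            }
        }
    }
    out
}
```

In this sketch, four draws that reference descriptor sets six times in total collapse to three groups and three unique sets to evaluate, which is the kind of saving the deferred approach is after.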

Changelog:

### Additions
- Significantly improved the performance of back-to-back draws/dispatches using the same pipeline

Firestar99 · 2024-06-27T22:14:19Z

~~I'm seeing some weird performance behavior in bistro I would like to investigate first before officially submitting this~~ Resolved: attached RenderDoc hates super large cmd buffers of 3000+ draws.

Firestar99 · 2024-06-28T10:27:16Z

vulkano/src/command_buffer/commands/pipeline.rs

-            Some(x) => x,
-            None => return,
-        };
+            .cloned()


Another theoretical performance optimization that could be done: remove this clone. It clones the entire DescriptorSetState, which contains a HashMap, and if push descriptors are used, each entry contains another HashMap. Instead it could be some Cow<Arc<_>>, so that the state isn't cloned again if it does not change between draws.
But in practice it's insignificant: it's done once for every draw/dispatch, and 103 draws take just 0.12ms. There are other, more significant bottlenecks.
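For reference, one way to get that clone-only-on-change behavior is sketched below. It uses `Arc::make_mut` from the standard library rather than a literal `Cow<Arc<_>>`, and `SetState` and `Recorder` are hypothetical stand-ins, not vulkano's DescriptorSetState or its command buffer builder.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the bound descriptor state (not the vulkano type).
#[derive(Clone, Debug, Default)]
struct SetState {
    bindings: HashMap<u32, u32>,
}

/// Hypothetical recorder that snapshots the current state on every draw.
#[derive(Default)]
struct Recorder {
    current: Arc<SetState>,
    recorded: Vec<Arc<SetState>>,
}

impl Recorder {
    /// Change the bound state. `Arc::make_mut` clones the inner value only if
    /// an earlier draw still holds a reference to it; otherwise it mutates in place.
    fn bind(&mut self, binding: u32, resource: u32) {
        Arc::make_mut(&mut self.current)
            .bindings
            .insert(binding, resource);
    }

    /// Record a draw. This only bumps a reference count instead of cloning the HashMap.
    fn draw(&mut self) {
        self.recorded.push(Arc::clone(&self.current));
    }
}
```

With this shape, consecutive draws without a state change share one allocation, and the deep clone happens at most once per state change instead of once per draw.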

Firestar99 · 2024-06-28T11:09:03Z

I'm uncertain how secondary cmd buffers are handled. I could imagine a use case where they contain many draws with constantly changing pipelines (which is generally frowned upon anyway) and use very few buffers, and that could maybe be slower with this PR. But I dunno, that may need some testing.

Result: secondary cmd buffers have also improved significantly, from 40fps to 120fps (meshlet on Sponza), but it is still significantly better to just record everything into the primary cmd buffer, where I could reach 550fps.

marc0246 · 2024-10-21T06:42:40Z

You have merge conflicts, but please don't trouble yourself with fixing them. Given that the old command buffer is destined to go the way of the dinosaurs, I would prefer it if all effort was directed to the new command buffer in vulkano-taskgraph, as any change like this always comes with the risk of introducing more bugs. I really appreciate the work you put into this though! ❤️

Firestar99 added 7 commits June 27, 2024 23:15

add deferred UsedResources to collect resources in descriptor sets later

009b5fb

pipe UsedResources through AutoSyncState

b25064c

merge CommandInfos before adding resources

4caf6a8

add debug derives

9bf10b7

merging with unassigned pipelines

7e59f5a

don't accumulate empty direct arrays

074b0da

deduplicate deferred DescriptorSetStates

edf5e1e

Firestar99 changed the title ~~Optimize sequential draws~~ Optimize sequential draws of the same pipeline Jun 27, 2024

fix clippy lints

51c12b9

resource error reporting uses correct cmd name, like previously

d61b399

Firestar99 marked this pull request as ready for review June 29, 2024 10:12

marc0246 closed this Oct 21, 2024
