Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__debug function performance #282

Open
TurkeyMan opened this issue Oct 22, 2024 · 26 comments
Open

__debug function performance #282

TurkeyMan opened this issue Oct 22, 2024 · 26 comments

Comments

@TurkeyMan
Copy link

Are there any opportunities to optimise or improve the performance of these __debug functions?
They are extremely slow. I have a custom string type and if a struct contains a few strings, stepping feels really sluggish, and if there's an array anywhere in view; it takes seconds to 10s of seconds each step.

Why are they so slow? It doesn't seem right that setting up the call should be that much trouble? If I can understand the performance characteristics, maybe I can make improvements on the app side...?

I have NOT enabled the "switch GC" option, since I am confident all my __debug functions are @nogc.

@rainers
Copy link
Member

rainers commented Oct 22, 2024

I shortly looked at the CPU diagnostics of displaying 100 of your Strings and the debugger mostly waits for the call to be performed in the other process. AFAICT this needs multiple inter-process-communication (devenv.exe <-> msvcmon.exe <-> debuggee.exe) so I suspect there is some inefficient waiting going on.
Maybe that can optimized for arrays of objects with __debug methods, but that will get messy e.g. if it is only one field of the array element that has an __debugOverview method.

@TurkeyMan
Copy link
Author

TurkeyMan commented Oct 22, 2024

Hmmm, well, in its current state it's almost unusable. I think I'm going to have to turn the __debug feature off, because it's just too slow :/
Sadly, I don't think it's really reasonable to work without it either; not being able to inspect strings and arrays severely undermines the usefulness of a debug session.

If it's about ipc waiting, then finding any opportunities for batching up requests seems like the way to go... is the code structured in such a way that you can gather the requests rather than resolving them immediately, and then send them all to the debuggee in one batch? That might require a lib compiled into the debuggee offering a function which can receive a bundle of requests from the debugger, and then the debuggee might iterate the bundle and resolve them all in one big go? :/
So, rather than calling each __debug function independently, call one global __debug function from a lib, which would locally iterate the bundle of requests calling each __debug function on each object in the bundle?
If the lib is not linked, fall back on existing semantics. Should be simple to check if a global __debugResolveBundle symbol is present in the binary.
Might help the GC swapping too; just do it once around the bundle...

@TurkeyMan
Copy link
Author

Maybe we could put a function in druntime that's present when building with symbols... at very least, I wouldn't be upset to link a lib.

@rainers
Copy link
Member

rainers commented Oct 23, 2024

The swapped GC is inside a DLL loaded into the target process, so that could contain helper functions. The calls by VS to evaluate locals or expressions are rather fine grained and probably not easily combinable (especially not delayed until more requests come in), but if an enumerator is returned to VS, you can predict that it is pretty likely that it will get enumerated to some extend, so the elements could be evaluated in larger chunks and be cached.
Not a small change, though...

@TurkeyMan
Copy link
Author

The swapped GC is inside a DLL loaded into the target process, so that could contain helper functions. The calls by VS to evaluate locals or expressions are rather fine grained and probably not easily combinable (especially not delayed until more requests come in),

Yeah, this was my concern. I can a Microsoft API being very event/response based... with an expectation of immediate responses :/

but if an enumerator is returned to VS, you can predict that it is pretty likely that it will get enumerated to some extend, so the elements could be evaluated in larger chunks and be cached. Not a small change, though...

I'm not sure I follow, but I'll take your word for it. Can the debugger accept and retain an enumerator object across requests, and how would that have an IPC advantage?

@rainers
Copy link
Member

rainers commented Oct 24, 2024

Yeah, this was my concern. I can a Microsoft API being very event/response based... with an expectation of immediate responses :/

On second look, the API does allow to return results asynchronously, so it could allow bundling multiple requests to the target process.

Can the debugger accept and retain an enumerator object across requests, and how would that have an IPC advantage?

For structs and arrays an enumerator is returned when asked for "children", and VS then calls something similar to GetItems(enumerator, start, count). Even if called individually for each child, these could return cached results filled from reading larger chunks.

On the other hand, calling a function in the process is set up via an abstract stack language that gets compiled into the target process. This currently happens for each evaluation. Maybe it is the compilation that takes most of the time (not the execution) and can be avoided by reusing the same compiled instruction sequence with only the this-pointer exchanged.

@TurkeyMan
Copy link
Author

Okay, it sounds like there's multiple promising paths forwards.

@rainers
Copy link
Member

rainers commented Oct 29, 2024

FYI: I tried executing the same "compiled" instructions multiple times, but that won't help speeding up evaluation as each execution takes 40-50 ms. So that idea can be dismissed.

Then I added partial experimental support for the asynchronous interface of the debugger functions. As a result each function call executed in the target process is added to a work list provided by the debugger host. The work list gets executed in another thread, but it is still sequential and takes about the same time for each operation. My implementation isn't yet complete but I assume that the advantage of this will be that it doesn't block the UI and operations can be cancelled before completion and you can continue stepping through the program without waiting for all debug windows being updated. Fully supporting that seems realistic, but I expect it's still quite some work to do.

I guess that might help for one of the pain points, but execution will still be slow. To avoid that we could reimplement the worklist and combine requests ourselves before sending them to the target process. Not sure how feasible this actually is, though. Anyway, it still needs the same asynchronous evaluation as above, so the changes above would not be lost.

A completely different approach could be the implementation of something similar to C++'s natvis definitions, that allows expressions and some conditions on the debugger side and just has to read target process memory. That might get tricky as well especially with respect to templated structs/classes, though.

@TurkeyMan
Copy link
Author

Where you say 'full support', you mean batching them and sending them to the debuggee all at once?

It's definitely a huge improvement if it stops locking up the UI, even if it still takes a while to populate the window when you stop. The slow stepping speed is definitely a show-stopper.

natvis does seem to be very fast, but it's certainly a weird definition language and it's really hard to implement some data structures. Your idea with __debug functions is definitely a far superior idea, if it can be made to work efficiently.
Implementing something like natvis likely requires some sort of DSL to describe the custom evaluations, and probably a huge implementation effort?

@rainers
Copy link
Member

rainers commented Oct 31, 2024

Where you say 'full support', you mean batching them and sending them to the debuggee all at once?

I meant full support for the asynchronous interface. My partial implementation is currently limited to displaying arrays of structs with __debugOverview definitions, other enumerations (struct members, locals, etc.) are still using synchronous calls. It seems feasible and worthwhile to adapt these similarly, but still a bit of work. I'll try that.

Implementing something like natvis requires a larger effort, indeed. The expression evaluation part is there, though. I can imagine evaluating a static __debugOverviewExpr struct member and evaluating that as if it was a watch expression, but it get's more elaborate if control flow with statements is needed.

@TurkeyMan
Copy link
Author

If you're spending some time with VisualD at the moment, do you also have any theories why go-to definition only works around 50% of the time? I guess the frontend is bailing out and much of the code has no semantic markup? Is there a way we can get a clear error output in cases where the semantic bails out? auto-complete and goto definition are really flakey now... I guess it's from the change to DMD-FE?

@TurkeyMan
Copy link
Author

Actually, just to add to that something I hadn't noticed earlier; I was just playing around trying to identify patterns why completion suggestions weren't working... I half-typed a symbol (an enum key in this case), and then ctrl-space which should auto-complete or pop-up the completion suggestions, but it was doing nothing like usual... and then I got distracted for a moment, and then a good few seconds later the completion suggestions popped up.
So, the semantic did know the proper suggestions, and it did eventually appear, but it took quite a long time for it to appear, and I just thought it wasn't working at all.
When typing quickly, you expect ctrl-space to perform a completion INSTANTLY, same with the suggestion pop-up; it needs to be instantaneous to match the flow of natural typing, so even though it's working, it's just very slow and I always did some operation like move the cursor or click something that disrupted the waiting period.

So, that's encouraging, but what could possibly disrupt the completion from working quickly? It doesn't feel like something that should be involving IPC or any of those latent operations.
Any thoughts?

@rainers
Copy link
Member

rainers commented Nov 5, 2024

While you edit the code you change the semantics at the current caret location, so the new source has to be evaluated first. At least the current file has to be analyzed semantically again. The dmdserver tries to minimize analyzing imported modules, but not sure how well this works.

In my experience, browsing the source with goto definition works quite ok as you don't edit it in that situation, but completion often is rather slow (I also found a few regressions regarding class members and fields). IIRC it is slower than when actually compiling, not sure what's causing that, I'll have to profile that.

I have prepared a version of the dmdserver that can log Visual D's requests and responses, maybe that can shed some insight. I want to complete the async debugger interface before the next release, though.

@rainers
Copy link
Member

rainers commented Nov 5, 2024

You can try the new build of MagoNatCC.dll from https://ci.appveyor.com/project/rainers/mago/builds/50930366/artifacts. This uses the async evaluation for __debugOverview calls, but doesn't work correctly in combination with switching the GC. It seems pretty responsive in my tests, but if I failed to handle completion or cancellation requests, it usually freezes the debugging session at some point, though.

@TurkeyMan
Copy link
Author

Where do I place this DLL? My current instal doesn't seem to have those files to replace:

image

@TurkeyMan
Copy link
Author

They're all under Program Files (x86), does that mean I should take the 32bit dll's?

@rainers
Copy link
Member

rainers commented Nov 6, 2024

Just like the last time, only this one needs to be replaced: "c:\Program Files\Microsoft Visual Studio\2022\Preview\Common7\Packages\Debugger\MagoNatCC.dll"

@TurkeyMan
Copy link
Author

I'm seeing a lot of this sort of thing now:
image

@rainers
Copy link
Member

rainers commented Nov 10, 2024

Too bad. Is this one of the places where you would expect one of your custom strings to be displayed? If something else, could you specify more details about its type?

@rainers
Copy link
Member

rainers commented Nov 10, 2024

Ah, classes instead of structs seem broken. I'll take a look...

@rainers
Copy link
Member

rainers commented Nov 10, 2024

There is a new build here: https://ci.appveyor.com/project/rainers/mago/builds/50957984/artifacts that fixes class display and a few other issues.

rainers added a commit to rainers/visuald that referenced this issue Nov 14, 2024
…nchronously

  * mago: fixed switching GC for funtion evaluation and adapted it to dmd 2.109
  * dmdserver: improve completion for member variables, code in switch statement
  * dmdseerver: added option to display type size and alignment in tool tip
  * added separate version of dbuild.dll linked against Microsoft.Build.CPPTasks.Common for VS 17.12
@rainers
Copy link
Member

rainers commented Nov 14, 2024

I fixed several more issues caused by the conversion to the async interface. It is now released as part of https://github.com/dlang/visuald/releases/tag/v1.4.0-rc3

@TurkeyMan
Copy link
Author

So I've been running with this new code for a while now. Not blocking the debugger is certainly a lot better, but it's still too slow to be practical to actually use in a lot of my code.
While it doesn't lock the UI, it still causes waits before input can be accepted; for instance, if you want to press F10 (step-next) or even F5 (continue), you have to wait the same very long time for everything to respond before the UI seems to respond to input.

@rainers
Copy link
Member

rainers commented Dec 7, 2024

That's not the case for me, I can step instructions immediately while the watch/local windows are still showing "busy". Maybe your use case still hits an expression that causes synchronous calls (tuples come to mind).

@TurkeyMan
Copy link
Author

Interesting. I haven't seen any cases where the pattern is different.
I do use tuples a bit, but definitely not in every scope. Any other such cases you can think of?

My main issue with tuples is that they always show like an empty struct. Just {} appears.

@rainers
Copy link
Member

rainers commented Dec 14, 2024

I noticed that I must have been looking mostly at locals or arrays of structs with __debugOverview. Expressions in the watch and auto windows could still be evaluated synchronously. https://github.com/dlang/visuald/releases/tag/v1.4.0-rc4 should have that fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants