
Early injection helper #197

Open
valinet opened this issue Jun 6, 2024 · 20 comments
Labels
enhancement New feature or request

Comments

@valinet

valinet commented Jun 6, 2024

Hi

First of all, I have to admit, yeah, I am a really big fan of this project. Good job, and hats off - from UI, mod deployment to coding and injection, everything is state of the art so to say - textbook example. Really glad to see this open sourced and so well maintained, congratulations.

Now, I have a small question: I have read about the technicalities behind this. Indeed, Windows does not have any proper hook or timely notification mechanism for when a new process is created, short of writing a kernel mode driver. I mean, that's a possibility as well, idk, I am curious to hear your opinion on this but it is not my main question today.

The main question is, so to say, how to help Windhawk more effectively inject these processes that it injects late? For example, since services.exe cannot be injected (it being PPL WinTcb), usually Windhawk fails to inject in time for most services, which is not that great. Now, depending on what we want to inject, there could be workarounds. For one of my needs, I have written a proxy stub forwarder DLL that I place in the same folder, beside the target executable. Because of the DLL search order, my custom proxy DLL is loaded, which forwards all exported functions of the original DLL to the original via a symlink in the same folder. Now, my question would be, how to "help" Windhawk properly with this DLL? Right now, what I do is simply Sleep(1000) there, which gives Windhawk enough time to pick up the new process and inject it while the process is still in a pristine state, since most threads haven't started yet and the loader is still loading the executable's imported DLLs.

But like, Sleep(1000) is hacky. Do you think a mechanism for somehow notifying Windhawk to trigger its internal "scan for processes/inject new processes" thing would make sense (something simple, like a named event, for example?). Another possibility I think would be to load windhawk.dll in process and call the InjectInit export, I think? Although yeah, the way it is right now, the path to windhawk.dll is not that easily obtainable, maybe add some registry path that specifies it? Otherwise, FindFirstFile & co for listing directories in Windhawk's Engine folder in reverse alphabetical order and picking the first entry? What's your recommendation on this use case, any advice or direction for making such functionality official?

Again, thank you for this great product.


@valinet valinet added the enhancement New feature or request label Jun 6, 2024
@m417z
Member

m417z commented Jun 8, 2024

Hi Valentin,

Thank you for the great feedback. I spent a lot of time and effort in Windhawk, I'm happy to see that it's appreciated!

I have to say that I'm also impressed by your projects, mainly Explorer Patcher but others too. I recommended it to some friends, many users, and mentioned it in several blog posts. Also, as you probably know, some bits of your work were ported to Windhawk mods.

Regarding your question:

writing a kernel mode driver. I mean, that's a possibility as well, idk, I am curious to hear your opinion on this but it is not my main question today.

The short answer, from my blog post: "I wanted Windhawk to be able to run even without administrator rights. And in general, I preferred to avoid installing a driver which is too intrusive to my taste and can affect the system's stability". Having to worry about the driver signature is yet another reason to avoid it. With all the downsides, I felt that it's not worth it.

since services.exe cannot be injected (it being PPL WinTcb)

Normally not, but people find workarounds from time to time, and Microsoft fixes them, sometimes quickly, sometimes not so much. Recently, I stumbled upon a new method: https://github.com/Slowerzs/PPLSystem. It's not sustainable for the long term but can be helpful depending on your goal.

Do you think a mechanism for somehow notifying Windhawk to trigger its internal "scan for processes/inject new processes" thing would make sense (something simple, like a named event, for example?)

Yes, I think it's a great idea, allowing for more customization opportunities without much effort or downsides.

The nifty thing about Windhawk is that instead of releasing a new version for testing a small fix or feature, I can create a mod :)
So here's a small mod that adds this functionality, and some simple C++ code that demonstrates the usage of the new event:
https://gist.github.com/m417z/5811d5eda32fc33b86c5053f2b54b16c

Note that I also added a mechanism for waiting for the Windhawk module to initialize by waiting for the CreateProcessInternalW hook to be placed. It's not very elegant, but should be fairly reliable. If I end up adding this feature to Windhawk, it'd probably be a good idea to add an event with the PID in the name that will be signaled when initialization is done.
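For illustration, the notifier side described above might look roughly like the C sketch below. It assumes the event is named Global\WindhawkScanForProcesses (the name that later ships in Windhawk 1.5) and that "the hook is placed" can be detected by the first byte of kernelbase!CreateProcessInternalW changing. That detection method is a guess based on this description, not the gist's actual code, and wait_for_byte_change and request_windhawk_injection are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*sleep_fn)(unsigned ms);

/* Polls *addr until it no longer equals `original` (taken to mean a hook was
   written there) or until timeout_ms has been accounted for.
   do_sleep may be NULL, in which case time is only counted, not slept. */
static bool wait_for_byte_change(const volatile unsigned char *addr, unsigned char original,
                                 unsigned timeout_ms, unsigned step_ms, sleep_fn do_sleep)
{
    unsigned waited = 0;
    if (step_ms == 0)
        step_ms = 1;                      /* avoid an endless loop */
    while (*addr == original) {
        if (waited >= timeout_ms)
            return false;                 /* gave up: byte never changed */
        if (do_sleep != NULL)
            do_sleep(step_ms);
        waited += step_ms;
    }
    return true;
}

#ifdef _WIN32
#include <windows.h>

static void win_sleep(unsigned ms) { Sleep(ms); }

/* Hypothetical helper: asks Windhawk to rescan, then waits up to timeout_ms
   for the first byte of CreateProcessInternalW to change (assumption). */
static bool request_windhawk_injection(unsigned timeout_ms)
{
    HMODULE kernelbase = GetModuleHandleW(L"kernelbase.dll");
    FARPROC fn = kernelbase ? GetProcAddress(kernelbase, "CreateProcessInternalW") : NULL;
    HANDLE ev;
    unsigned char original;

    if (fn == NULL)
        return false;
    original = *(const volatile unsigned char *)fn;    /* snapshot before signaling */

    ev = OpenEventW(EVENT_MODIFY_STATE, FALSE, L"Global\\WindhawkScanForProcesses");
    if (ev == NULL)
        return false;                                   /* Windhawk not running */
    SetEvent(ev);
    CloseHandle(ev);

    return wait_for_byte_change((const volatile unsigned char *)fn, original,
                                timeout_ms, 10, win_sleep);
}
#endif
```

If the hook is already in place before the snapshot is taken (for example because the process was injected via its parent), this helper simply waits out the full timeout.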

One downside of this implementation is that the named event won't be available in sandboxed UWP apps. Not sure if the feature is even relevant for them.

Another possibility I think would be to load windhawk.dll in process and call the InjectInit export, I think?

That's possible in theory, but it wouldn't be very convenient as InjectInit expects to receive some data that has to be prepared for it.

the path to windhawk.dll is not that easily obtainable

The relative path to the up-to-date engine folder can be obtained from C:\Program Files\Windhawk\windhawk.ini. The reason it's not a fixed path is that upon an update or reinstall, some DLLs might be locked, e.g. because they're loaded in suspended processes. In this case, instead of asking for a reboot, a different folder is created.

@valinet
Author

valinet commented Jun 22, 2024

Hi Michael,

First of all, thank you very much, such a module definitely helps address the issue.

Also, I apologize for the delayed reply; I took the time to play with this at length over the past few days (weeks), and even wrote a paper about this late injection dilemma for a course I am enrolled in. I have basically tested a bunch of methods (AppInit_DLLs, SetWindowsHookEx, load order hijacking, using the AppVerifier infrastructure, driver) and have looked into a bunch more (API set overrides, shims, job object notifications).

For my initial use case, I was using the load order hijacking method. But that is messy, since one needs to craft custom proxy DLLs and place them beside the targeted executable, which is not always doable (for example, when wanting to hook something specific from System32, like taskmgr.exe, but not everything in there). I have since come to the conclusion that the best methods for the "notifier" part of the solution (the thing that signals the WindhawkScanForProcesses event) would be:

1. A custom DLL injected using the AppVerifier infrastructure.

This could work nicely with Windhawk. First of all, indeed, it would require administrative privileges to be configured, but that is expected and okay for a lot of use cases, including mine.

Windhawk could include a section in settings where users write the names of the executables they want injected with this method. Then, all Windhawk has to do is place a universal injection DLL in System32 and SysWOW64 and configure the desired applications set by the user to load that DLL, by specifying appropriate GlobalFlag and VerifierDlls for each of the apps in HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options. For example, if we want to hook test.exe using this, Windhawk has to set:

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\test.exe:

  • GlobalFlag (REG_DWORD): 256 (0x100)
  • VerifierDlls (REG_SZ): WhSignalLib.dll

That's it, it's one DLL for all such apps the user specifies in the list. Injection is done by Windows and happens very early in the process' lifetime, even before kernel32.dll or kernelbase.dll are loaded (and we don't need those from such a simple DLL if we write everything using only calls from ntdll.dll). All that universal injection DLL then has to do is signal the WindhawkScanForProcesses event and then wait for the patching to be done; as you described, its job is then done (ofc, the AppVerifier infrastructure is pretty extensive - Windows can set hooks for any function for us, it has a built-in hooking engine when used like this, but yeah, that's another discussion). Busy waiting on that byte from CreateProcessInternal is fine for me, it's not that often anyway and could be calmed down using a bunch of alertable sleeps of some actual time anyway.
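A sketch of the registry setup described above, assuming the caller runs elevated (the values live under HKLM) and that overwriting any existing GlobalFlag value is acceptable; a real tool would probably OR 0x100 into the existing value instead. build_ifeo_subkey and enable_verifier_dll are hypothetical helper names, and WhSignalLib.dll is just the example DLL name from the text (the DLL itself still has to be placed in System32/SysWOW64).

```c
#include <stddef.h>
#include <wchar.h>

#define IFEO_PREFIX L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Image File Execution Options\\"

/* Builds the IFEO subkey path (relative to HKLM) for an executable name such
   as L"test.exe". Returns 0 on success, -1 if the buffer is too small. */
static int build_ifeo_subkey(wchar_t *buf, size_t cap, const wchar_t *exe)
{
    int n = swprintf(buf, cap, IFEO_PREFIX L"%ls", exe);
    return (n < 0) ? -1 : 0;            /* swprintf fails on truncation */
}

#ifdef _WIN32
#include <windows.h>

/* Sets GlobalFlag = 0x100 (REG_DWORD) and VerifierDlls = dll (REG_SZ) for exe,
   as listed in the text. Overwrites any existing GlobalFlag bits. */
static LSTATUS enable_verifier_dll(const wchar_t *exe, const wchar_t *dll)
{
    wchar_t subkey[512];
    HKEY key;
    DWORD flags = 0x100;
    LSTATUS st;

    if (build_ifeo_subkey(subkey, 512, exe) != 0)
        return ERROR_INVALID_PARAMETER;
    st = RegCreateKeyExW(HKEY_LOCAL_MACHINE, subkey, 0, NULL, 0, KEY_SET_VALUE,
                         NULL, &key, NULL);
    if (st != ERROR_SUCCESS)
        return st;
    st = RegSetValueExW(key, L"GlobalFlag", 0, REG_DWORD,
                        (const BYTE *)&flags, sizeof(flags));
    if (st == ERROR_SUCCESS)
        st = RegSetValueExW(key, L"VerifierDlls", 0, REG_SZ, (const BYTE *)dll,
                            (DWORD)((wcslen(dll) + 1) * sizeof(wchar_t)));
    RegCloseKey(key);
    return st;
}
#endif
```

Usage would be along the lines of enable_verifier_dll(L"test.exe", L"WhSignalLib.dll") from an elevated process; deleting the two values undoes it.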

With this method, Windhawk subsequently injects the new process by creating a remote thread in it, which is expected, since the process has actually started running.

2. A custom driver that signals the event on receiving notifications for new process creation using PsSetCreateProcessNotifyRoutineEx

The title is self-explanatory. This method, kind of as expected, works best. Basically, it's a simple driver that subscribes to process creation notifications and signals the WindhawkScanForProcesses event when the system creates a new process (i.e. calls our notification routine in the driver). If the event is reopened on each call, nothing bad can happen - if Windhawk is not running, the driver simply does not signal the event and that's it. The method is not error prone, with the driver being lightweight and using only officially sanctioned mechanisms. The driver doesn't even wait in the notification routine; it simply lets execution continue. I have seen that, in practice, in my tests, notifying Windhawk that early (but we are guaranteed that the user space "sees" the new process when the system calls the notification routine in the driver) is enough for Windhawk to be able to inject the new process using an APC on the main thread even (!), so the process is there but hasn't even started yet, which is optimal if you ask me. In practice this behaves similarly to how Windhawk treats processes which it knows about because of the CreateProcessInternalW hooks. Indeed, here there is no check, no wait for Windhawk to patch before "letting go" in the notification callback, but in practice, in my tests, it did not seem to be an issue - each and every time such processes were then injected by Windhawk using the "APC method", so the window of opportunity there is very good.
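A minimal sketch of such a notifier driver, assuming a WDK kernel-mode build (the guard uses _KERNEL_MODE, which WDK driver projects normally define; adjust it if yours doesn't) and that Windhawk's event is Global\WindhawkScanForProcesses, which the kernel sees as \BaseNamedObjects\WindhawkScanForProcesses. This is not the code from the gist linked below, and global_to_kernel_name is a hypothetical helper. Note that PsSetCreateProcessNotifyRoutineEx returns STATUS_ACCESS_DENIED unless the driver image is linked with /INTEGRITYCHECK.

```c
#ifdef _KERNEL_MODE
#include <ntifs.h>
#endif
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Maps a user-mode "Global\Name" to the kernel path "\BaseNamedObjects\Name".
   Case-sensitive prefix check for simplicity. Returns 0 on success, -1 if the
   prefix is missing or the output buffer is too small. */
static int global_to_kernel_name(wchar_t *out, size_t cap, const wchar_t *name)
{
    static const wchar_t user_prefix[] = L"Global\\";
    static const wchar_t kernel_prefix[] = L"\\BaseNamedObjects\\";
    const size_t ulen = sizeof(user_prefix) / sizeof(wchar_t) - 1;
    const size_t klen = sizeof(kernel_prefix) / sizeof(wchar_t) - 1;
    size_t rest;

    if (wcsncmp(name, user_prefix, ulen) != 0)
        return -1;
    rest = wcslen(name + ulen);
    if (klen + rest + 1 > cap)
        return -1;
    memcpy(out, kernel_prefix, klen * sizeof(wchar_t));
    memcpy(out + klen, name + ulen, (rest + 1) * sizeof(wchar_t)); /* includes terminator */
    return 0;
}

#ifdef _KERNEL_MODE
static VOID NotifyRoutine(PEPROCESS Process, HANDLE ProcessId, PPS_CREATE_NOTIFY_INFO CreateInfo)
{
    WCHAR nameBuf[64];
    UNICODE_STRING name;
    OBJECT_ATTRIBUTES oa;
    HANDLE hEvent;

    UNREFERENCED_PARAMETER(Process);
    UNREFERENCED_PARAMETER(ProcessId);
    if (CreateInfo == NULL)
        return;                                 /* NULL means the process is exiting */
    if (global_to_kernel_name(nameBuf, RTL_NUMBER_OF(nameBuf),
                              L"Global\\WindhawkScanForProcesses") != 0)
        return;
    RtlInitUnicodeString(&name, nameBuf);
    InitializeObjectAttributes(&oa, &name, OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE, NULL, NULL);
    /* Reopened on every call: if Windhawk isn't running, the open fails and we return. */
    if (NT_SUCCESS(ZwOpenEvent(&hEvent, EVENT_MODIFY_STATE, &oa))) {
        ZwSetEvent(hEvent, NULL);
        ZwClose(hEvent);
    }
}

static VOID DriverUnload(PDRIVER_OBJECT DriverObject)
{
    UNREFERENCED_PARAMETER(DriverObject);
    PsSetCreateProcessNotifyRoutineEx(NotifyRoutine, TRUE);    /* TRUE = remove */
}

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    UNREFERENCED_PARAMETER(RegistryPath);
    DriverObject->DriverUnload = DriverUnload;
    return PsSetCreateProcessNotifyRoutineEx(NotifyRoutine, FALSE);
}
#endif
```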

An interesting situation could happen where the system is executing the driver notification routine, we opened the event but haven't yet set it and closed it, while on the user land side suppose Windhawk restarts just then. Then, suppose the kernel is still there, slow, hasn't yet closed the event, and Windhawk starts, it will try to create the event and yeah, I think it succeeds, and actually will reuse the old event (since the kernel hasn't yet closed it), which is actually fine, so yeah, don't think it is a problem, just an interesting case to think about. As I said, the mechanism is so simple that I do not see much going wrong with it - we definitely do not want drivers causing havoc, but here it definitely should not be the case.

Ofc, as we know, for drivers we need admin access to setup, plus, the elephant in the room, having it signed. While I personally run it just fine using ssde, it is not doable for everyone. Do you think such a driver has any chance of getting signed by Microsoft? I mean, any exe could come and set up such an event and then be notified about new process creation. Idk what is so bad regarding that (the fact that there is no such user space API is pretty infuriating tbh), but maybe it is against some rules...? Do you happen to own an EV certificate for code signing? I was thinking maybe getting one for myself, but yeah, haven't made up my mind, if you have any experience, I'd definitely appreciate some tips. Yeah, the code signing requirement may definitely bury this idea, yet still, interesting to look at nevertheless.

Examples for the 2 techniques described above are here:
https://gist.github.com/valinet/0b61552b493079de1e3b4762378d352e

Also here a repo with an entire test setup I used for my paper:
https://github.com/valinet/R2Work

Again, thanks for Windhawk, such a great tool, a joy to have on my side each and every day. Looking forward to hearing your input on this.

Valentin

@m417z
Member

m417z commented Jul 1, 2024

A custom DLL injected using the AppVerifier infrastructure

I wanted to play with this method a while ago, but @namazso, the SecureUxTheme author, shared some negative experience with it, such as performance issues.

Here are some quotes:

Settings app slow [...] this is caused by app verifier enabling heap debugger

namazso/SecureUxTheme#87

Slow log out/shutdown
[...]
the main performance penalty [...] from the application verifier's initialization
[...]
On my severely resource limited VM lockscreen without SecureUxTheme took 0.716 seconds to appear with background, while 1.200 seconds with SecureUxTheme

namazso/SecureUxTheme#5

That made the method much less appealing.

I also wanted to play with shims, but haven't gotten the chance to do it. It seems like a powerful tool, but I'm not sure how usable it is for this scenario, how stable it is between Windows versions, etc.

A custom driver that signals the event on receiving notifications for new process creation using PsSetCreateProcessNotifyRoutineEx

a simple driver that subscribes to process creation notifications

Interesting, it's impressive how small it is. When I thought about a driver, I had an injdrv-style driver in mind, which is much more complex.

in practice, in my tests, notifying Windhawk that early [...] is enough for Windhawk to be able to inject the new process using an APC on the main thread

That's great, but it's slightly concerning that it's not guaranteed, as the behavior might differ under a load, or with different hardware or Windows version. Still, I must say that it looks very appealing.

Also, perhaps running the event listener thread in high priority can make it even more likely to get Windhawk notified on time.

but we are guaranteed that the user space "sees" the new process when the system calls the notification routine in the driver

Are you saying this based on tests, or based on your understanding of the flow? Relying on tests here can be dangerous, too, since the behavior might differ if Windhawk is notified very early. If, when in PsSetCreateProcessNotifyRoutineEx, the process and its first thread are already accessible via NtGetNextProcess/NtGetNextThread, it should be fine.

While I personally run it just fine using ssde, it is not doable for everyone

Interesting, but yeah, probably not a method that can be used by a legit app users install.

Do you think such a driver has any chance of getting signed by Microsoft? I mean, any exe could come and set up such an event and then be notified about new process creation.

I can't say for sure, but I see no reason why not. I can't see how it can be misused. If an exploit can use it, it can probably also just busy loop to catch a process in time.

the fact that there is no such user space API is pretty infuriating tbh

Yeah, totally.

Do you happen to own an EV certificate for code signing?

No, only code signing.

Actually, my certificate is about to expire soon, and since there's a new policy that forces using a physical Hardware Security Module, I'm looking into Azure Trusted Signing which simplifies the certificate signing process. Unfortunately, it doesn't issue EV certificates.

I also thought, for such a small driver, perhaps we can ask some open source project to help and sign it for us, for example System Informer.

I'd be happy to integrate such a driver into Windhawk. The main reason Windhawk injects its dll into all processes is the early injection, but it inevitably causes incompatibilities with some programs. You can see some of them in the pinned issues here. Having the driver will allow Windhawk to only inject its dll into processes which are targeted by a mod, which will greatly reduce the likelihood of an incompatibility for most users.

@namazso

namazso commented Jul 1, 2024

Re: HSM keys for signing

You can actually use Azure Key Vault and https://github.com/namazso/AzuKI or AzureSignTool for code signing without a HSM

@namazso

namazso commented Jul 1, 2024

Also, regarding drivers, I have a “universal” driver signed that I could offer for usage, however it can’t do any callbacks.

@m417z
Member

m417z commented Jul 27, 2024

The Global\WindhawkScanForProcesses event is now part of Windhawk in version 1.5.

About the driver, which seems to be the most promising solution so far, any thoughts about moving it forward? Perhaps a good start would be to move WhSignalDrv.c to a dedicated repository and make a one-click installation package via ssde. That will make it easier for me to start telling users to try it out.

@m417z
Copy link
Member

m417z commented Jul 28, 2024

After a bit of extra thought, these come to mind:

  • WMI. My guess is that it's going to be too slow and/or unreliable, but the code is right there so it should be easy to try.
  • Using an existing driver. Maybe ProcMon's? Will require some reversing, and probably problematic to bundle with Windhawk due to the license, but we can start with it as an option for users to set it up themselves and play with it while we think of a better solution. Maybe System Informer is a better option, both regarding reversing (it's open source) and license (it's MIT, not sure about just bundling their signed driver).

@valinet
Author

valinet commented Jul 29, 2024

WMI. My guess is that it's going to be too slow and/or unreliable, but the code is right there so it should be easy to try.

I think I tried it a while ago, and while it did notify, the notification didn't always arrive in time and it isn't blocking, so the code in the target executable might already have executed by then.

Using an existing driver. Maybe ProcMon's? Will require some reversing [...]. Maybe System Informer is a better option, both regarding reversing (it's open source) and license (it's MIT, not sure about just bundling their signed driver).

Yeah, idk. Like, the whole point with signed drivers is for the system to be more secure, they say. Yet, because of this (unnecessary imo) added friction, instead of having a REALLY simple, targeted driver that does the job and that's it (not much room to introduce bugs when all the driver does is signal a kernel object), we have to resort to hacking together a solution using some driver that's made for a totally different purpose, which naturally introduces a much larger attack surface due to its greater complexity. And there's always the possibility we find some hack to pull it off, but then the question comes: is that intended behavior of the driver, or really a quirk that should be patched out in the name of security? Personally, I am tired of hacks; I very much prefer a clean room, simple, targeted solution that does whatever job it has to do and nothing more, with easy to audit code etc. It's one of the core reasons I appreciate Windhawk so much; the ease of use and the modularity of it matter much more to me than anything else. But yeah, that's just my personal preference.

I wanted to play with this method a while ago, but @namazso, the SecureUxTheme author, shared some negative experience with it, such as performance issues. [...]

Okay, so AppVerifier is out.

Interesting, it's impressive how small it is. When I thought about a driver, I had an injdrv-style driver in mind, which is much more complex.

Me too initially, but before committing to doing a bunch of work, I took a step back and thought about what problem I was really trying to solve: it was not about the class of apps Windhawk is unable to inject, which I took as a given for the exercise as well, but rather about Windhawk not being notified in time - if it were, the existing mechanism worked just fine. So I went from there, and in the end the solution did not turn out to be that complicated indeed.

Are you saying this based on tests, or based on your understanding of the flow? Relying on tests here can be dangerous, too, since the behavior might differ if Windhawk is notified very early. If, when in PsSetCreateProcessNotifyRoutineEx, the process and its first thread are already accessible via NtGetNextProcess/NtGetNextThread, it should be fine.

No, I am relying on the official documentation and the semi-official documentation that the Windows Internals book is.

Firstly, PsSetCreateProcessNotifyRoutineEx states this:

The operating system calls the driver's process-notify routine at PASSIVE_LEVEL inside a critical region with normal kernel APCs disabled. When a process is created, the process-notify routine runs in the context of the thread that created the new process.

This correlates with what is written in the book. In chapter 3, "Processes and jobs", the entire process creation flow is described. They describe a couple of relevant stages for process creation there:

  • Stage 3: Creating the Windows executive process object
  • Stage 4: Creating the initial thread and its stack and context
  • Stage 5: Performing Windows subsystem-specific initialization
  • Stage 6: Starting execution of the initial thread
  • Stage 7: Performing process initialization in the context of the new process

Skipping through less relevant parts, of interest is "Stage 3F" paragraph 3:

The new process object is inserted at the end of the Windows list of active processes (PsActiveProcessHead). Now the process is accessible via functions like EnumProcesses and OpenProcess.

This step occurs way before the initial thread is created by the system later (in stage 4) or when that thread actually starts executing (in stage 6). I'd say detecting the process at this stage is too early, as the book also says that in stage 4, PspInsertThread does this:

  1. Checks are made to ensure that the process hasn't already been terminated, that the thread hasn't already been terminated, or that the thread hasn't even been able to start running.

So, if one injects at that point, it is too early. It is safe to assume that the best time to inject is after the system has created and started the initial thread, but before the process has made significant progress (i.e. started executing the main code in the executable), so stage 6 as they call it in the book. Any time before that, one risks perturbing the internal workings of the kernel/loader. The book also mentions this step in stage 4:

  1. If it's the first thread created in the process (that is, the operation happened as part of a CreateProcess* call), any registered callbacks for process creation are called. Then any registered thread callbacks are called. If any callback vetoes the creation, it will fail and return an appropriate status to the caller.

Note that this is before the first thread starts executing, but after it is allocated. They do not mention which callbacks they are talking about here, so one would naturally think about the ones set by PsSetCreateProcessNotifyRoutineEx. But if you read on, in stage 7, they say this:

The new thread begins life running the kernel-mode thread startup routine KiStartUserThread. KiStartUserThread lowers the thread's IRQL level from deferred procedure call (DPC) level to APC level and then calls the system initial thread routine, PspUserThreadStartup. The user-specified thread start address is passed as a parameter to this routine. PspUserThreadStartup performs the following actions: [...]

  1. It lowers IRQL to PASSIVE_LEVEL (0, which is the only IRQL user code is allowed to run at).
  1. It calls DbgkCreateThread, which checks whether image notifications were sent for the new process. If they weren't, and notifications are enabled, an image notification is sent first for the process and then for the image load of Ntdll.dll. Note: This is done in this stage rather than when the images were first mapped because the process ID (which is required for the kernel callouts) is not allocated at that time.

This correlates with the documentation for PsSetCreateProcessNotifyRoutineEx as mentioned earlier. That executes at PASSIVE_LEVEL, and the callback routine offers the process ID as a parameter, which the book explicitly says is available only here:

void PcreateProcessNotifyRoutineEx(
  _Inout_     PEPROCESS Process,
  _In_        HANDLE ProcessId,
  _Inout_opt_ PPS_CREATE_NOTIFY_INFO CreateInfo
);

It again mentions "notifications that were already sent", which I presume are those in stage 4, which may correspond to some other, older, process creation notification mechanism? Idk...

Anyway, after some more steps are performed, the book concludes:

Once the function returns, NtContinue restores the new user context and returns to user mode. Thread execution now truly starts. RtlUserThreadStart uses the address of the actual image entry point and the start parameter and calls the application's entry point. The two parameters have also already been pushed onto the stack by the kernel.

I think it is safe to conclude that PsSetCreateProcessNotifyRoutineEx notifications are executed before any of the actual code in the target executable's image has executed, which is what we are after. Also, at that point in time, the process is "visible" from user space, so Windhawk will be able to find it, per the documentation. Furthermore, it is in a suitable state with regard to what Windhawk expects (there is a thread and it hasn't started executing, so the APC method is employed).

The only remaining open question here is whether/how we can be sure that Windhawk injects the newly created process (via an APC) before the actual code in the image starts executing in that system-prepared initial thread. I think APCs are executed before execution is handed over to the code in the image, but how can we be sure Windhawk injects the target executable by that time, after receiving a "ping" from a PsSetCreateProcessNotifyRoutineEx notification callback? In my tests, it seemed to work at all times, regardless of the load on the actual system, but I don't necessarily know how to demonstrate it.

One cannot "wait" for Windhawk to inject in the notification routine, since the APC gets a chance to execute only after the routine returns. I say "wait" in the sense of expecting something from Windhawk that is done in that APC. But maybe we could wait for a sign from Windhawk, as long as that sign comes from the main executable that does the injection. At this point, what we are interested in is some notification that Windhawk has returned from WaitForSingleObject/friends, and that it finished looping through the process list and has scheduled the APC on the thread. Actually, knowing that it has looped through the process list again is enough - maybe it fails scheduling the APC for some reason, we shouldn't care, execution should carry on and this is an "uninjectable" process and life moves on.

Maybe the driver could be expanded a bit; I suggest this workflow:

  1. In the PsSetCreateProcessNotifyRoutineEx notification routine, open the Global\WindhawkScanForProcesses event (hMainEvent). If open fails, then Windhawk is not running/dead/crashed/not installed/whatever, we bail and simply return from the routine.
  2. If opening succeeds, create an event called Global\WindhawkNewProcess{%d}, where %d is the process ID we simply get as a parameter in the notification routine (hEvent). *
  3. Signal hMainEvent.
  4. WaitForSingleObject on hEvent for 200 ms or so. That's the time we give Windhawk to "wake up", scan for new processes, identify this one and inject it. After it injects it, Windhawk should signal Global\WindhawkNewProcess{%d} (it can get the process ID when it enumerates and identifies the new process) and then close the event.
  5. The WaitForSingleObject on hEvent in the driver returns. Simply bail out of the notification routine, we are done, we are sure the process has been identified by Windhawk and injection has been attempted on it. That it succeeded or not is not the business of this exercise. If we look at the return value, we can see whether Windhawk attempted something (WAIT_OBJECT_0 + 0), or that it crashed/it is not there (anymore) (WAIT_TIMEOUT).
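The driver half of steps 1 to 5 could be sketched as below. This is a hedged illustration assuming a WDK kernel-mode build guarded by _KERNEL_MODE, registration through PsSetCreateProcessNotifyRoutineEx (DriverEntry omitted), and hypothetical names (OpenNamedEvent, ms_to_relative_timeout). Security descriptors are left at their defaults, which may need adjusting so that Windhawk can actually open the per-PID event. Creating the per-PID event without OBJ_OPENIF makes creation fail if the name already exists, which is one way to implement the bail-out for the PID-reuse case in the footnote below.

```c
#ifdef _KERNEL_MODE
#include <ntifs.h>
#include <ntstrsafe.h>
#endif

/* NT relative timeouts are negative values in 100-nanosecond units. */
static long long ms_to_relative_timeout(unsigned long ms)
{
    return -(long long)ms * 10000LL;
}

#ifdef _KERNEL_MODE
/* Opens (create == FALSE) or creates (create == TRUE) a named event.
   Returns NULL on failure. Without OBJ_OPENIF, ZwCreateEvent fails with
   STATUS_OBJECT_NAME_COLLISION if the name already exists. */
static HANDLE OpenNamedEvent(PCWSTR path, BOOLEAN create)
{
    UNICODE_STRING name;
    OBJECT_ATTRIBUTES oa;
    HANDLE h = NULL;
    NTSTATUS status;

    RtlInitUnicodeString(&name, path);
    InitializeObjectAttributes(&oa, &name, OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE, NULL, NULL);
    status = create ? ZwCreateEvent(&h, EVENT_ALL_ACCESS, &oa, NotificationEvent, FALSE)
                    : ZwOpenEvent(&h, EVENT_MODIFY_STATE, &oa);
    return NT_SUCCESS(status) ? h : NULL;
}

static VOID NotifyRoutine(PEPROCESS Process, HANDLE ProcessId, PPS_CREATE_NOTIFY_INFO CreateInfo)
{
    WCHAR path[64];
    HANDLE mainEvent, pidEvent;
    LARGE_INTEGER timeout;

    UNREFERENCED_PARAMETER(Process);
    if (CreateInfo == NULL)
        return;                                                /* process exit */

    /* Step 1: bail out if Windhawk's event does not exist. */
    mainEvent = OpenNamedEvent(L"\\BaseNamedObjects\\WindhawkScanForProcesses", FALSE);
    if (mainEvent == NULL)
        return;

    /* Step 2: per-process acknowledgement event, e.g. ...\WindhawkNewProcess1234. */
    if (NT_SUCCESS(RtlStringCchPrintfW(path, RTL_NUMBER_OF(path),
                                       L"\\BaseNamedObjects\\WindhawkNewProcess%lu",
                                       (ULONG)(ULONG_PTR)ProcessId))) {
        pidEvent = OpenNamedEvent(path, TRUE);
        if (pidEvent != NULL) {
            ZwSetEvent(mainEvent, NULL);                       /* step 3 */
            timeout.QuadPart = ms_to_relative_timeout(200);
            /* Steps 4/5: STATUS_SUCCESS = acknowledged, STATUS_TIMEOUT = gave up. */
            ZwWaitForSingleObject(pidEvent, FALSE, &timeout);
            ZwClose(pidEvent);
        }
    }
    ZwClose(mainEvent);
}
#endif
```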

If Windhawk crashes in the new process enumeration stage, process creation will be delayed by 200 ms once, which is not the end of the world, since it will happen only once. If Windhawk takes longer, whatever, we only allow 200 ms. To make it configurable, again, you could employ the registry, or keep it simple and have n named events, like Global\WindhawkNewProcessWait100, Global\WindhawkNewProcessWait200, and Global\WindhawkNewProcessWait300, .... Whichever of these exists, we wait that many milliseconds. Just a thought.

  • There is also this hypothetical case: Windhawk loops through the process list, opens Global\WindhawkNewProcess3, signals it, but then loses its CPU quantum. In the meantime, PID 3 ends, and another process is created and assigned PID 3. In that case, the PsSetCreateProcessNotifyRoutineEx notification routine runs for that new process and finds that the Global\WindhawkNewProcess3 event already exists, as it is still held by Windhawk. What to do then? Resignaling hMainEvent will have Windhawk loop through the process list again, but will it detect this new process? If so, then there is nothing to do there. If not, then we should bail out if we find the event already opened, and in that case we simply fail to inject early. This case should be practically impossible to hit in the real world anyway.
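On the Windhawk side, acknowledging a process could be as small as the sketch below: after the scan has attempted injection into a PID, open the per-PID event, signal it, and close it immediately, so Windhawk holds the handle as briefly as possible, which narrows the window for the PID-reuse case above. The name format (Global\WindhawkNewProcess followed by the PID) follows this proposal; format_new_process_event_name and acknowledge_new_process are hypothetical and not part of Windhawk.

```c
#include <stddef.h>
#include <wchar.h>

/* Builds L"Global\\WindhawkNewProcess<pid>". Returns 0 on success,
   -1 if the buffer is too small. */
static int format_new_process_event_name(wchar_t *buf, size_t cap, unsigned long pid)
{
    int n = swprintf(buf, cap, L"Global\\WindhawkNewProcess%lu", pid);
    return (n < 0) ? -1 : 0;            /* swprintf fails on truncation */
}

#ifdef _WIN32
#include <windows.h>

/* Called after the scanner has attempted injection into `pid`. */
static void acknowledge_new_process(DWORD pid)
{
    wchar_t name[64];
    HANDLE h;

    if (format_new_process_event_name(name, 64, pid) != 0)
        return;
    h = OpenEventW(EVENT_MODIFY_STATE, FALSE, name);
    if (h == NULL)
        return;                          /* no driver waiting for this PID */
    SetEvent(h);
    CloseHandle(h);                      /* release right away to keep the reuse window short */
}
#endif
```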

Ofc, this complicates things a bit, but just a bit honestly. It's not that much more complicated and it's just event ping pong, nothing fancier than that, so once the quirks are ironed out, it should be rock solid. What do you think, do you have some time to test this out on your build and see if it works satisfactorily?

Also, perhaps running the event listener thread in high priority can make it even more likely to get Windhawk notified on time.

Yeah, it should help indeed.

I also thought, for such a small driver, perhaps we can ask some open source project to help and sign it for us, for example System Informer.

Yeah, indeed, that's a good route to take imo as well.

I'd be happy to integrate such a driver into Windhawk. The main reason Windhawk injects its dll into all processes is the early injection, but it inevitably causes incompatibilities with some programs. You can see some of them in the pinned issues here. Having the driver will allow Windhawk to only inject its dll into processes which are targeted by a mod, which will greatly reduce the likelihood of an incompatibility for most users.

Yeah, with such a notifier driver, Windhawk will get notified for basically each and every new process, and from there it can attempt injecting only processes of interest (those targeted by a mod, as you say). Since it is no longer required to hook CreateProcessInternal/friends to detect new process creation, it indeed doesn't have to inject all processes anymore. In the long run, I think that's the better approach. I think we are safer the simpler the driver is kept and the more of the grunt work is actually done in user space, leveraging the driver only for the tiny aspects not available from user space and nothing more. A clear workflow and only working with events in a highly controlled way, without other shenanigans, should ensure that the driver turns out mostly bug free.

Finally, while going through the Windows Internals book, I just realized that yet another mechanism for receiving process creation notifications, which I had overlooked, is the Debugger value under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\procname.exe. Would that be usable, and would it work better than a driver? One disadvantage is that it obviously breaks the natural process creation chain some applications might expect: if A creates a, what now happens is that A creates the Windhawk helper, which has to figure out a way to launch a with all the expectations A might have set for it. On the other hand, it knows about a before a is created. Idk, it's definitely a mechanism, but it looks tricky to me. What do you think?

There is also the AppCert DLLs method, which I haven't really looked into (I didn't know about it). I can't seem to find much documentation, yet it seems to provide user-space notifications precisely for process creation. The link I included says "This technique doesn't work reliably with GUI applications", but I don't know what that means in practice.

Looking forward to hearing your thoughts about this.

@Chaoses-Ib

There is also the AppCert DLLs method which I haven't really looked into (did not know about it). I can't seem to find much documentation, yet it seems to provide user space notifications precisely for process creation. The link I included says "This technique doesn't work reliably with GUI applications", but yeah, idk what that means in practice.

From microsoft/Detours#210:

I want to inject my DLL into every process whenever it is started by the user. I found that AppCertDLLs does exactly this. I tried this DLL and it worked (the system did not want to start at first, but in later boots I managed to log in; this time, though, Explorer froze whenever I tried to copy, delete or move a file. I guess it is related to the message box, since it tries to use the GUI; Windows (version 21H1) was problematic).

However, I also found https://github.com/leecher1337/ntvdmx64, a project that uses AppCertDLLs and works. From my cursory look at its issues, AppCertDLLs doesn't seem to cause many compatibility problems. Its DLL depends only on ntdll.dll and KERNEL32.dll, unlike the DLL in the Detours issue, which used user32.dll; that may be the key to keeping it compatible with other processes. Depending only on ntdll.dll may provide the best compatibility.

@WildByDesign

From a basic kernel-driver perspective, there is CreateProcessNotifyEx by hazelfazel.

From the README:

The driver registers a callback routine to be called whenever a process is created or deleted. This driver can be used for process creation monitoring. You can easily expand the driver to also block process creation attempts for specific parents invoking new processes.

This is source code for a CreateProcessNotifyEx kernel driver, so the issue remains that the compiled driver has to be digitally signed (and signed by Microsoft as well).

@namazso

namazso commented Jul 29, 2024

I do have a compiled and signed “universal” driver that can implement anything as long as it doesn’t need multithreading or callbacks. But I don’t see an obvious way to avoid the need for callbacks in this use case.

@m417z
Member

m417z commented Jul 31, 2024

WMI. My guess is that it's going to be too slow and/or unreliable, but the code is right there so it should be easy to try.

I think I tried it a while ago, and while it did notify, it wasn't always in time, nor was it blocking, so the code in the target executable might already have executed.

Yeah, that's what I thought, the WMI architecture is probably too optimized with buffering and stuff, so it takes time until consumers see the event.

A friend suggested to check out WNF. After a quick search, I found that it has the WNF_SHEL_APPLICATION_STARTED, WNF_SHEL_DESKTOP_APPLICATION_STARTED events which seem relevant. Might be worth giving it a try.

I very much prefer a clean room, simple, targeted solution that does whatever job it has to do and nothing more, with easy to audit code etc.

I totally agree, it's just that right now all ideas feel either unacceptable or out of reach, so my bar for ideas is low. Indeed, while it can be nice for playing around, I'd be very hesitant to include a third-party driver with Windhawk, surely not as a default option.

No, I am relying on the official documentation and the semi-official documentation that the Windows Internals book is.
[...]
I think it is safe to conclude that PsSetCreateProcessNotifyRoutineEx notifications are executed before any of the actual code in the target executable's image has executed

Yes, looks solid.

Maybe the driver could be expanded a bit; I suggest this workflow:

That's very similar to what I had in mind as well when thinking about it, just not as detailed and formalized.

  • There is also this hypothetical case: Windhawk loops through the process list, opens Global\WindhawkNewProcess3, signals it, but then loses its CPU quantum. In the meantime, PID 3 ends, and another process is created and assigned PID 3. In that case, the PsSetCreateProcessNotifyRoutineEx callback runs for that new process, and it will find that the Global\WindhawkNewProcess3 event already exists, as it is held by Windhawk. What to do then? Resignaling hMainEvent will have Windhawk loop through the process list again, but will it detect this new process? If so, then there is nothing to do there. If not, then we should bail out if we find the event already open, and in that case we simply fail to inject early. This case should be practically impossible to hit in the real world anyway.

"event already exists, as it is held by Windhawk. What to do then?" - Just open and signal it as usual. I believe there's no problem here, just as with the previous case of Windhawk relaunching while the kernel holds the main event, it will just get reopened and reused.

"Resignaling hMainEvent will have Windhawk loop through the process list again" - Right.

"will it detect this new process" - It should, since it didn't encounter it in the previous iteration. Windhawk doesn't track PIDs. It's a new process which was appended to the end of the list and it should be returned by NtGetNextProcess.

Let me know if I missed anything in this scenario.

What do you think, do you have some time to test this out on your build and see if it works satisfactorily?

Sure, I can implement the Windhawk side if that's what you meant.

with such a notifier DLL [...] it can attempt injecting only processes of interest (those targeted by a mod, as you say)

There's a bit of extra development here - if a new mod is installed, I need to go over the processes again and inject the engine to processes which became relevant. And, if I want it to be extra slick, to unload it from processes which are no longer targeted when a mod is disabled. But it's solvable.

In the long run, I think that's the better approach [...] leveraging the driver only for the tiny aspects [...]

Agree.

yet another mechanism for receiving process creation notifications [...] Image File Execution Options

I'm familiar with it, but I don't think it's very helpful here. Aside from the downsides you mentioned, it needs to be registered for each executable name, and you can't use a wildcard or something. That is unless it provides additional functionality I'm not aware of.

AppCert DLLs method

I'm not familiar with this one. Might be worth exploring.

@valinet
Author

valinet commented Jul 31, 2024

A friend suggested to check out WNF. After a quick search, I found that it has the WNF_SHEL_APPLICATION_STARTED, WNF_SHEL_DESKTOP_APPLICATION_STARTED events which seem relevant. Might be worth giving it a try.

A new bomb has dropped =))) I wasn't aware this infrastructure existed at all; REALLY interesting stuff, I will surely take a look at it. Since it's officially undocumented, I don't know how robust a solution built around it might be: are the structs/functions changing between Windows versions? Like, is this used by any third party or in first- or third-party products not bundled with the OS? (Because that is what usually stabilizes such an API from breaking ABI changes) Open question, I presume you do not necessarily know the answer, just something to be aware of.

I totally agree, it's just that right now all ideas feel either unacceptable or out of reach, so my bar for ideas is low. Indeed, while it can be nice for playing around, I'd be very hesitant to include a third-party driver with Windhawk, surely not as a default option.

Yeah, I'd also refrain from including any third-party driver from Procmon/Process Hacker/whatever. A simple driver similar to the one described above, well tested, developed in tandem and kept alongside Windhawk, and properly signed by Microsoft, is an acceptable option, I think. It could live behind a toggle in the advanced program settings and only be deployed on the user's machine once enabled.

That's very similar to what I had in mind as well when thinking about it, just not as detailed and formalized.

That's great to hear. Regarding the scenario I mentioned, I don't have anything to add. I just wanted to mention it so you could give it a thought as well and check whether I had missed something obvious. Other than that, it looks fine to me too, no issue there.

Sure, I can implement the Windhawk side if that's what you meant.

Yeah, that would be great. It basically boils down to signaling Global\WindhawkNewProcess{%d} events for each new PID that was found after a scan, after injection was attempted by Windhawk on the respective process. That should set up the infrastructure on the Windhawk side, and then I will update the driver and test it against this new behavior.
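As a rough sketch of this Windhawk-side step, the event names could be produced as below. It assumes narrow strings for portability (a real implementation would presumably build a wide string for OpenEventW), and `EventsToSignalAfterScan` is a hypothetical helper that receives the PIDs a scan has just enumerated; the actual injection and OpenEvent/SetEvent/CloseHandle calls are only marked in comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Builds the per-process event name used in this thread, e.g.
// "Global\WindhawkNewProcess1234". Narrow strings are used for portability;
// a Windows implementation would pass a wide string to OpenEventW.
std::string NewProcessEventName(std::uint32_t pid) {
    char buf[64];
    int len = std::snprintf(buf, sizeof(buf), "Global\\WindhawkNewProcess%u",
                            static_cast<unsigned>(pid));
    assert(len > 0 && static_cast<std::size_t>(len) < sizeof(buf));
    return std::string(buf, static_cast<std::size_t>(len));
}

// Hypothetical helper: given the PIDs a scan has just enumerated, returns the
// names of the events to signal, in order. Each name is produced only after
// the injection attempt for that PID, matching the ordering described above.
std::vector<std::string> EventsToSignalAfterScan(
    const std::vector<std::uint32_t>& newPids) {
    std::vector<std::string> names;
    names.reserve(newPids.size());
    for (std::uint32_t pid : newPids) {
        // TryInject(pid);  // hypothetical: the injection attempt goes here
        names.push_back(NewProcessEventName(pid));
        // OpenEvent + SetEvent + CloseHandle on this name would follow here.
    }
    return names;
}
```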

Mass automated deployment with ssde is not an option, since users would have to enroll a Secure Boot platform key (PK) they create themselves, which is very vendor specific, so the final solution involves getting the driver signed by Microsoft. Unless we find someone willing to sign it, I will probably end up getting an EV certificate for myself and signing it with that. I have contemplated this before, yet I don't know which provider to choose (balancing the cost, which is already quite high, against how smooth the certification process is). Another, sketchier option is to sign the driver with a leaked, expired code signing certificate, but Microsoft has lately been on a spree of blocking drivers signed with such certificates from loading altogether in recent Windows versions, so that's not really an option.

There's a bit of extra development here - if a new mod is installed, I need to go over the processes again and inject the engine to processes which became relevant. And, if I want it to be extra slick, to unload it from processes which are no longer targeted when a mod is disabled. But it's solvable.

Yeah, in the long run, it is something to look at. For now, a quick patch adding the minimal infrastructure described above would suffice. Even in the future, the old behavior would kind of have to be kept around, for cases when the user doesn't/can't load the driver to enable the more advanced behavior.

AppCert DLLs method

Yeah, I have to look into it personally as well.

@m417z
Member

m417z commented Aug 17, 2024

are the structs/functions changing between Windows versions? Like, is this used by any third party or in first- or third-party products not bundled with the OS? (Because that is what usually stabilizes such an API from breaking ABI changes) Open question, I presume you do not necessarily know the answer, just something to be aware of.

Yeah, I don't really know. I saw WNF wrappers from WIL (the unpublished parts) used by the taskbar code. I didn't find much code using it on GitHub, mainly exploitation tools and experimental stuff.

I played a bit with a WNF01.cpp example I found, which monitors the WNF_SHEL_APPLICATION_STARTED and WNF_SHEL_DESKTOP_APPLICATION_STARTED events. From a quick test, they're not fired quickly enough to allow injecting via APC. Also, they're not fired for some processes, for example StartMenuExperienceHost.exe.

So far, doesn't look so promising.

Actually, while typing this message, I decided to dive a bit deeper. It seems that these WNF events are actually published by the taskbar code, namely CreativeFramework::Triggers::PublishWnfStateForApplicationEvent in Taskbar.dll. Call stack:

taskbar.CreativeFramework::Triggers::PublishWnfStateForApplicationEvent
taskbar.<lambda_768b330fb0fd69b248c2627ffae48033>::operator+80
taskbar.wil::details::functor_wrapper_void<<lambda_768b330fb0fd69b248c2627ffae48033> &>::Run+D
taskbar.wil::details::RunFunctorWithExceptionFilter+2D
taskbar.CreativeFramework::Triggers::LogEventIfNecessary+11B
taskbar.<lambda_c4cbaf1e7368103966e4b98896a32361>::operator+9C
taskbar.wil::details::functor_wrapper_void<<lambda_c4cbaf1e7368103966e4b98896a32361> &>::Run+D
taskbar.wil::details::RunFunctorWithExceptionFilter+2D
taskbar.CreativeFramework::Triggers::LogEventForWin32IfNecessary+91
taskbar.CTaskBand::_HandleItemResolved+586
taskbar.CTaskBand::_HandleWindowResolved+116
taskbar.CTaskBand::v_WndProc+9F0
taskbar.CImpWndProc::s_WndProc+C8
user32.UserCallWinProcCheckWow+2D1
user32.DispatchMessageWorker+1F1
explorer.CTray::_MessageLoop+1AF
explorer.CTray::MainThreadProc+60
shcore._WrapperThreadProc+11D
kernel32.BaseThreadInitThunk+1D
ntdll.RtlUserThreadStart+28

So the events notify when an app appears on the taskbar. That's not very helpful for our use case.

signaling Global\WindhawkNewProcess{%d} events for each new PID that was found after a scan

I gave it some more thought, and there's a potential problem with it and the way Windhawk currently enumerates processes:

Windhawk uses NtGetNextProcess to get new processes since the last enumeration. NtGetNextProcess automatically skips inaccessible processes, which means that if Windhawk doesn't have permissions to access a process, it won't be enumerated and Windhawk won't know about it and won't signal the appropriate event.

I looked for an alternative solution that is still robust, simple and fast. I came up with the following; it's not as simple as I'd like, so perhaps you'll have a better idea. Basically, the idea is to use a shared counter instead of process IDs. Pseudo-code:

int* sharedStartedCounter;
int* sharedEndedCounter;

PsSetCreateProcessNotifyRoutineEx() {
    HANDLE hMainEvent = OpenEvent("WindhawkScanForProcesses");
    if (!hMainEvent) {
        // Windhawk is not available.
        return;
    }

    int lastStarted = InterlockedRead(sharedStartedCounter);
    HANDLE hScanEvent = CreateEvent(sprintf("WindhawkScan%d", lastStarted + 1));

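    // Probably will never happen in practice, but just for extra correctness:
    // a scan that started after the read above has already finished.
    if (InterlockedRead(sharedEndedCounter) >= lastStarted + 1) {
        CloseHandle(hScanEvent);
        CloseHandle(hMainEvent);
        return;
    }

    SetEvent(hMainEvent);
    CloseHandle(hMainEvent);

    // Wait for the scan to finish, giving up after 200 ms. Either way, close
    // the scan event handle so that it isn't leaked.
    WaitForSingleObject(hScanEvent, 200);
    CloseHandle(hScanEvent);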
}

WindhawkLoop() {
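    // Auto-reset event: each SetEvent from the driver triggers at least one
    // more scan, and signals that arrive during a scan are coalesced.
    HANDLE hMainEvent = CreateEvent("WindhawkScanForProcesses");

    while (WaitForSingleObject(hMainEvent, INFINITE) == WAIT_OBJECT_0) {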
        int counter = InterlockedIncrement(sharedStartedCounter);

        HandleNewProcesses();

        InterlockedExchange(sharedEndedCounter, counter);

        HANDLE hScanEvent = OpenEvent(sprintf("WindhawkScan%d", counter));
        if (hScanEvent) {
            SetEvent(hScanEvent);
            CloseHandle(hScanEvent);
        }
    }
}

What do you think?
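To sanity-check the two-counter handshake without a driver, here is a single-process model of it. This is a sketch, not Windhawk or driver code: named events are replaced by a mutex and a std::condition_variable, open scan-event handles by a reference count, the 200 ms timeout by a generous 5 s one, and `ScanHandshake`, `Notify`, `Loop` and `RunDemo` are hypothetical names. The property being checked is that every notifier returns only after a scan that started after its counter read has finished.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <map>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Single-process model of the two-counter handshake. Named events become a
// mutex + condition_variable; open scan-event handles become a refcount.
class ScanHandshake {
public:
    // Driver side: returns true if a scan that started after this call's
    // counter read has finished before the timeout.
    bool Notify(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        const int target = started_.load() + 1;
        ++scanEventRefs_[target];                 // CreateEvent("WindhawkScan%d")
        // "Extra correctness" check; the shared lock makes it unreachable in
        // this model, unlike in the real cross-process version.
        bool done = ended_.load() >= target;
        if (!done) {
            mainSignaled_ = true;                 // SetEvent(hMainEvent)
            cv_.notify_all();
            done = cv_.wait_for(lock, timeout,
                                [&] { return signaled_.count(target) > 0; });
        }
        if (--scanEventRefs_[target] == 0) {      // last CloseHandle destroys it
            scanEventRefs_.erase(target);
            signaled_.erase(target);
        }
        return done;
    }

    // Windhawk side: one scan per (auto-reset) main event signal.
    template <typename Scan>
    void Loop(Scan handleNewProcesses) {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [&] { return mainSignaled_ || stop_; });
            if (stop_) return;
            mainSignaled_ = false;                // auto-reset
            const int counter = ++started_;
            assert(ended_.load() == counter - 1); // scans never overlap
            lock.unlock();
            handleNewProcesses();                 // HandleNewProcesses()
            lock.lock();
            ended_.store(counter);
            if (scanEventRefs_.count(counter) > 0) {  // OpenEvent succeeded
                signaled_.insert(counter);             // SetEvent
                cv_.notify_all();
            }
        }
    }

    void Stop() {
        std::lock_guard<std::mutex> lock(m_);
        stop_ = true;
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::atomic<int> started_{0};  // sharedStartedCounter
    std::atomic<int> ended_{0};    // sharedEndedCounter
    bool mainSignaled_ = false;
    bool stop_ = false;
    std::map<int, int> scanEventRefs_;
    std::set<int> signaled_;
};

// Usage: one loop thread and `n` concurrent "process creations".
int RunDemo(int n) {
    ScanHandshake h;
    std::atomic<int> ok{0};
    std::thread loop([&] { h.Loop([] { /* enumerate + inject */ }); });
    std::vector<std::thread> creations;
    for (int i = 0; i < n; ++i) {
        creations.emplace_back([&] {
            if (h.Notify(std::chrono::seconds(5))) ++ok;
        });
    }
    for (auto& t : creations) t.join();
    h.Stop();
    loop.join();
    return ok.load();
}
```

`RunDemo(8)` is expected to return 8, since the loop easily keeps up within the timeout. The model serializes both sides with one lock, so it only exercises the counter logic, not real cross-process timing or priority effects.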

Edit: A slightly more elegant solution with a single counter:

// Each process handling loop increases the counter before the loop starts, and
// then increases it again after the loop finishes, before signaling the event.
// That means that the loop is running when the counter is odd. Example:
// 0 - idle, 1 - started, 2 - finished, 3 - started again, ...
int* sharedCounter;

PsSetCreateProcessNotifyRoutineEx() {
    HANDLE hMainEvent = OpenEvent("WindhawkScanForProcesses");
    if (!hMainEvent) {
        // Windhawk is not available.
        return;
    }

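    // Even counter (idle): the next scan finishes at counter + 2.
    // Odd counter (running): the current scan may already be past this
    // process, so wait for the next one, which finishes at counter + 3.
    int nextFinishCounter = (InterlockedRead(sharedCounter) + 3) & ~1;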
    HANDLE hScanEvent = CreateEvent(sprintf("WindhawkScan%d", nextFinishCounter));

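    // Probably will never happen in practice, but just for extra correctness:
    // the scan ending at nextFinishCounter has already completed.
    if (InterlockedRead(sharedCounter) >= nextFinishCounter) {
        CloseHandle(hScanEvent);
        CloseHandle(hMainEvent);
        return;
    }

    SetEvent(hMainEvent);
    CloseHandle(hMainEvent);

    // Wait for the scan to finish, giving up after 200 ms. Either way, close
    // the scan event handle so that it isn't leaked.
    WaitForSingleObject(hScanEvent, 200);
    CloseHandle(hScanEvent);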
}

WindhawkLoop() {
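    // Auto-reset event: each SetEvent from the driver triggers at least one
    // more scan, and signals that arrive during a scan are coalesced.
    HANDLE hMainEvent = CreateEvent("WindhawkScanForProcesses");

    while (WaitForSingleObject(hMainEvent, INFINITE) == WAIT_OBJECT_0) {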
        InterlockedIncrement(sharedCounter);

        HandleNewProcesses();

        int counter = InterlockedIncrement(sharedCounter);

        HANDLE hScanEvent = OpenEvent(sprintf("WindhawkScan%d", counter));
        if (hScanEvent) {
            SetEvent(hScanEvent);
            CloseHandle(hScanEvent);
        }
    }
}
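The `(counter + 3) & ~1` expression is the subtle part of the single-counter version, so here is a small worked check of it (a sketch; `NextFinishCounter` is a hypothetical name). If the observed counter is even, Windhawk is idle and the next scan finishes at counter + 2; if it is odd, the running scan may already have passed the new process, so the notifier waits for the following scan, which finishes at counter + 3.

```cpp
#include <cassert>

// Single-counter scheme: the counter is odd while a scan is running and even
// while Windhawk is idle (0 idle, 1 started, 2 finished, 3 started again, ...).
// Given the value a notifier observed, returns the counter value at which the
// first scan guaranteed to start after that observation will finish.
int NextFinishCounter(int observed) {
    assert(observed >= 0);
    // Even (idle):   the next scan runs as observed + 1, ends at observed + 2.
    // Odd (running): the current scan may already be past the new process,
    //                so wait for the next one, which ends at observed + 3.
    return (observed + 3) & ~1;
}
```

For example, observing 1 (a scan is running) and observing 2 (idle right after it) both lead to waiting for `WindhawkScan4`.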

@namazso

namazso commented Aug 17, 2024

I've re-read the thread, and I think if we're going the kernel driver way, I'd recommend a rather different approach: Just inject everything. Antiviruses have been doing this forever, despite their horrible code quality, so someone else also doing it is just a drop in the ocean.

While this may still sound scary, there are several ways to soften the blow and make the whole thing much safer than even AV garbage:

When to inject

Probably the most compatible way to inject anything at all is via the APCs that get called by NtTestAlert right before the main entry point runs. This can be easily achieved by queueing them in PsCreateProcessNotifyRoutine.

What to inject

A minimal ntdll-only client that can notify the main service and determine whether actual injections are supposed to take place. Being ntdll-only should avoid messing up everyone else's initialization order. It also lets us do the "you should never do this" thing: calling LoadLibrary in DllMain without any consequences, since initialization order loops are impossible.

How to deal with permissions

I'd recommend making this injection driver semi-universal and reusable across projects, while also making permission management simpler. One way to do this would be to require software that wants to inject into other processes to register its injection requests with the driver, e.g. via an IOCTL. Such an injection request would stay "live" as long as the requesting process keeps the handle open (and, as a consequence, is running). During the PsCreateProcessNotifyRoutine callback, you could simply use the token of the process owning the injection request currently being processed, and impersonate it to do the injection flow (OpenProcess, AllocateVirtualMemory, WriteVirtualMemory, QueueAPC, CloseProcess) from the kernel. This will inherently follow Windows permissions (and anti-cheats' Ob callbacks too, I think?) without any extra effort.


I think this proposed design would satisfy about all concerns around permissions, safety, timing, plus it makes reusing easier for other projects.
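To illustrate the "live as long as the requesting process has the handle open" part of this proposal, here is a user-mode model in which closing a handle (here, destroying a `Handle` object) automatically withdraws the registration. It is only a sketch: in an actual driver the registration would presumably arrive via an IOCTL and be dropped when the handle is closed (for example in an IRP_MJ_CLEANUP handler), and `InjectionRegistry`, `Request` and `Snapshot` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// User-mode model: an injection request stays registered exactly as long as
// the requester holds its Handle. In a driver, the handle would be a device
// handle and removal would happen when it is closed.
class InjectionRegistry {
public:
    struct Request {
        std::uint32_t requesterPid;  // whose token would be impersonated
        std::string dllPath;         // what to inject
    };

    class Handle {
    public:
        Handle(InjectionRegistry* reg, std::uint64_t id) : reg_(reg), id_(id) {
            assert(reg_ != nullptr);
        }
        ~Handle() { reg_->Remove(id_); }  // "closing the handle"
        Handle(const Handle&) = delete;
        Handle& operator=(const Handle&) = delete;

    private:
        InjectionRegistry* reg_;
        std::uint64_t id_;
    };

    // The returned Handle must not outlive the registry it came from.
    std::unique_ptr<Handle> Register(Request req) {
        std::lock_guard<std::mutex> lock(m_);
        std::uint64_t id = nextId_++;
        requests_.emplace(id, std::move(req));
        return std::make_unique<Handle>(this, id);
    }

    // What the process-creation callback would iterate over, performing each
    // injection while impersonating the corresponding requester.
    std::vector<Request> Snapshot() const {
        std::lock_guard<std::mutex> lock(m_);
        std::vector<Request> out;
        for (const auto& entry : requests_) out.push_back(entry.second);
        return out;
    }

private:
    void Remove(std::uint64_t id) {
        std::lock_guard<std::mutex> lock(m_);
        requests_.erase(id);
    }

    mutable std::mutex m_;
    std::uint64_t nextId_ = 1;
    std::map<std::uint64_t, Request> requests_;
};
```

The appeal of this design is recovery: if the requesting service crashes or is stopped, its handle is closed and its requests disappear without any explicit cleanup step.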

@m417z
Member

m417z commented Aug 17, 2024

AppCert DLLs method

I did a quick test using this example from 2012, and it just worked. Very nice, but:

  • The DLL isn't loaded by protected processes, which makes sense, but since the idea is that it's loaded by the processes that call CreateProcess, it has the same limitation as the current method:

For example, since services.exe cannot be injected (it being PPL WinTcb), usually Windhawk fails to inject in time for most services, which is not that great

  • For the DLL to be loaded on CreateProcess, the calling process must have been started after the DLL was registered. That means that after installing Windhawk, it won't affect existing processes, and a reboot is required to get it fully working.
  • While less likely, the DLL can cause incompatibilities in the processes it's loaded in.

All in all, while it's an interesting tool, I'm not sure it brings significant benefits over the current method Windhawk uses.

Just inject everything

If it can be avoided, I prefer to avoid it. That's what Windhawk does now, and it causes incompatibilities. While incompatibilities can be reduced, the best way to not cause incompatibilities is to not inject any code.
Here are several examples, past and present:

As I see it, a search for a better injection method that's explored in this thread has two main goals: ensuring early injection and reducing incompatibilities (if possible, by only injecting code into target processes).

This can be easily achieved by queueing them in PsCreateProcessNotifyRoutine

Do you have an example? Is it still easy with the edge cases, WOW64 etc.? I'm familiar with injdrv, but it's not as simple as queueing an APC that gets called by NtTestAlert. It has more ambitious goals such as being called earlier.

If that's indeed easy, perhaps it's a route that can be explored as well. To address the "only inject into target processes" point, perhaps Windhawk can maintain a list of targets in the registry, and the driver can read that list to decide whether the process is an injection target. One downside is that it will be more difficult to make more advanced rules, such as excluding all child processes of x.exe.
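A minimal sketch of the matching such a driver would do, assuming the registry list simply holds executable names (the storage format and `IsInjectionTarget` itself are hypothetical) and comparing only the file-name part of the image path, case-insensitively (ASCII-only here; kernel code would work on UNICODE_STRING instead of std::string). It also shows the limitation mentioned above: a flat name list has no way to express rules such as excluding all child processes of x.exe.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lower-cased file-name part of a path, e.g. C:\Windows\Explorer.EXE ->
// explorer.exe. ASCII-only case folding, for illustration.
std::string BaseNameLower(const std::string& path) {
    std::size_t sep = path.find_last_of("\\/");
    std::string name = (sep == std::string::npos) ? path : path.substr(sep + 1);
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return name;
}

// Hypothetical check: is the new process's image one of the target names
// Windhawk wrote to the registry? No wildcards, no parent/child rules.
bool IsInjectionTarget(const std::string& imagePath,
                       const std::vector<std::string>& targets) {
    const std::string name = BaseNameLower(imagePath);
    return std::any_of(targets.begin(), targets.end(), [&](const std::string& t) {
        return BaseNameLower(t) == name;
    });
}
```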

What are the upsides compared to the event method, aside from performance?

@namazso

namazso commented Aug 17, 2024

Here are several examples, past and present

Every single one of these are caused by Windhawk being present during operation of the program. If the hypothetical injection determines no further injection is needed, it can just return FALSE in DllMain, which will unload the module, before anything happens at all.

In general, the whole effect of such an injection can only be observed from threads created in the DllMain of static imports (not recommended by MSDN, btw): as soon as the Ldr lock is released, they can execute code that could observe the section being momentarily mapped. (It won't show up in the Ldr list, however, as the Ldr lock is held again for the whole duration of LdrLoadDll.)

Do you have an example?

Not right now, but it should be fairly easy to create, many antiviruses do this already. (I don't know why, since it's wrong: user code can run before that. But oh well, that's not my problem to solve.)

Is it still easy with the edge cases, WOW64 etc.?

I don't think it should be particularly hard; after all, it's just a mirror of a normal usermode injection as far as capabilities are concerned.

It has more ambitious goals such as being called earlier.

Yes, that makes it much more complicated, but I don't think such early inject is really necessary for Windhawk?

One downside is that it will be more difficult to make more advanced rules, such as excluding all child processes of x.exe.

Yeah, plus rule parsing in the kernel just sounds like a bad idea. The same goes for recovering from issues: with injections being ephemeral and tied to a handle, just disabling or stopping the service would automagically fix anything.

What are the upsides compared to the event method, aside from performance?

Performance and avoiding race conditions. In the previous pseudocode it seems that timing out may make Windhawk inject at the wrong time. In general, calling from kernel to usermode is just a bad idea, no matter how the "call" is made. Users are also very creative in breaking things by messing with process priorities and whatnot.

@m417z
Member

m417z commented Aug 17, 2024

Every single one of these are caused by Windhawk being present during operation of the program.

AVs? I don't think so. But I guess they'll be more tolerant to code injection from kernel mode.
Games? I don't know, I didn't research these. If they have some kernel-mode verification (say PsSetLoadImageNotifyRoutine) or other anti-cheat tricks (e.g. self-debugging) they may still be affected.
VirtualBox? The incompatibility is probably due to the hook at CreateProcess, so yes, it'll be solved.

I still think that if it's possible to avoid code injection, it's better to do so.

it should be fairly easy to create, many antiviruses do this already

Well, that's not an indication that it's easy. And I wouldn't say that usermode injection, with all its edge cases, is easy. And of course, with a driver, mistakes have more serious consequences.

Yes, that makes it much more complicated, but I don't think such early inject is really necessary for Windhawk?

No, NtTestAlert is great.

In the previous pseudocode it seems that timing out may make Windhawk inject at the wrong time.

Anything specific? Or in case of the 200 ms timeout?

@teknixstuff

teknixstuff commented Aug 20, 2024

After a bit of extra thought, these come to mind:

  • WMI. My guess is that it's going to be too slow and/or unreliable, but the code is right there so it should be easy to try.
  • Using an existing driver. Maybe ProcMon's? Will require some reversing, and probably problematic to bundle with Windhawk due to the license, but we can start with it as an option for users to set it up themselves and play with it while we think for a better solution. Maybe System Informer is a better option, both regarding reversing (it's open source) and license (it's MIT, not sure about just bundling their signed driver).

An idea I have would be to use the mimikatz driver. This driver can be used to make Windhawk itself a WinTcb protected process (the maximum level allowed), which would give Windhawk the ability to inject into all PPs and PPLs with very minimal changes to Windhawk itself (just add the code required to load the driver and call the API it exposes to protect the Windhawk service process during its initialization). The mimikatz driver is already signed, so that's not an issue.

@m417z
Member

m417z commented Aug 21, 2024

@teknixstuff it can be an interesting experiment to make it work, but I won't be bundling a driver that's intended for exploitation with Windhawk.
