Kernel Mode Threats & Practical Defenses: Part 2

In our last post, we described the evolution of kernel mode threats. These remain a prominent mode of compromise for nation-state attackers, as they are difficult to detect and enable robust persistence. Despite advances in platform protections, kernel mode threats continue to evolve and have been employed in many high profile attacks, such as WannaCry and NotPetya. What can organizations do to protect against kernel mode threats? Fortunately, the state of the art in defense also has evolved to counter this impactful attack trend.

Our own offensive tradecraft research informs our approach to improving defenses against kernel threat vectors. We will first cover our own research, which evolved from Red versus Blue exercises, and then detail a deeper analysis of evading current platform protections. With the state of the art in offensive tradecraft established, we will then discuss several approaches to defending against kernel mode threats. We will also introduce two open source tools to help defend against kernel mode threats. Some of these defenses are fast wins, while others involve hunting and establishing real-time protections. In all cases, we still highly recommend upgrading to the latest Windows 10 and enabling as many protections as feasible in your organization.


Offensive Tradecraft Research

To test and push the boundaries of our defenses, it is essential to comprehend the state of the art in offensive tradecraft. During an internal red vs blue, we first explored leveraging kernel mode malware to evade endpoint security products and the most commonly deployed kernel protections (such as Driver Signature Enforcement). Next, we investigated methods for evading the most advanced kernel protections such as Virtualization Based Security (VBS) and Hypervisor Code Integrity (HVCI).


Red vs. Blue

Endgame periodically conducts internal Red vs Blue exercises to test our product and our team’s skills. The Red Team is tasked with emulating adversaries of varying sophistication levels. This includes everything from very noisy commodity malware to mid and upper tier APTs. Typically, those of us on the Red Team try to stay stealthy with the latest user-mode in-memory techniques. However, our Blue Team is constantly upping their game and became increasingly efficient at zeroing in on our user mode injection techniques. We decided to pursue kernel-mode in-memory techniques to raise the bar.

Turla Driver Loader (TDL) was a key piece of our kernel tradecraft. TDL is an open source implementation of the Turla/Uroburos driver loading technique. In a nutshell, it loads a vulnerable VirtualBox driver. From there, the VirtualBox driver is exploited to load and execute arbitrary kernel shellcode. TDL is built with shellcode that leverages a technique like "MemoryModule" in user mode to manually map an arbitrary driver and call its entry point. Using TDL helps the Red Team achieve two objectives: evade driver signature enforcement and never write our driver to disk on the target machine.

We had some other high level design goals for the implant. First, we wanted to avoid any user mode components. A very typical design is to use a kernel mode component which injects into a user mode process to perform the primary "implant" functions. The kernel mode malware referenced in these two posts has some user mode components. Avoiding user mode makes basic features more time consuming to build, but we felt it would be worth it. Injecting anything into user mode would have a high chance of getting caught by our Blue Team. This required us to do our network command and control from kernel mode. We chose Winsock Kernel (WSK) for networking because it is very well documented, has great sample code, and presents a relatively easy interface for doing network communications from the kernel.

To further confuse our Blue Team, we did not want a beacon style implant. Beaconing is by far the most popular technique for malware and we knew it was something they would be looking for on the range. Our initial port opening concept unfortunately could be detected easily. We settled on a more stealthy approach by re-using the DoublePulsar function pointer hook trick to hijack some existing kernel socket. However, we didn't want to leverage the same hook point, expecting it to now be monitored by PatchGuard. After digging around in various stock network enabled drivers we settled on the srvnet driver which opens port 445. Our driver egg hunts to locate and hook srvnet’s WskAccept function with our own accept function. This allows our implant to selectively hijack port 445 traffic.



Leveraging TDL meant that our kernel driver would never touch disk. However, there was still a high risk that the loader itself could be caught. As a result, we wanted to ensure the loading process itself was as fileless as possible. This meant starting the chain with either PowerShell or JavaScript. We opted for JS because defenders generally have less visibility into it. Instead of launching cscript/wscript itself, we used the squiblydoo technique to run a scriptlet from a regsvr32 process. For our actual Black Hat demo (below), we updated this to use SquiblyTwo and the winrm.vbs evasion technique.

From here, we used DotNetToJS to load/execute an arbitrary .NET executable from JavaScript. We could have exploited and loaded our driver from this .NET executable, but the code for doing this was already written in C. The easier option was to use a MemoryModule style .NET loader to then load and execute the native executable. The native executable (TDL) would then load the vulnerable VirtualBox driver, and exploit it to load and map our implant driver into memory. In this whole process, the only executable that truly touched disk in native form was the legitimate VirtualBox driver.



Demo 1: Fileless Kernel Mode


Evading VBS/HVCI

When we were accepted to present at Black Hat, we wanted to push the bounds of our offensive tradecraft. Currently, Microsoft's Virtualization Based Security (VBS) combined with Hypervisor Code Integrity (HVCI) will block any unsigned code from ever running in the kernel. This includes DoublePulsar and the implant we wrote for our RvB exercise.

We first identified a vulnerable driver because the VirtualBox driver from TDL won’t load while HVCI is enabled. There is a virtually unlimited supply of vulnerable drivers. We grabbed a known-vulnerable sample from Parvez Anwar's site. Since we had the option to choose the vulnerability brought to the endpoint, we chose an easy to exploit write-what-where vulnerability, as opposed to a more difficult one like a static one byte write. The latter would typically involve some pool corruption (like a win32k GDI object) to achieve a full read/write primitive. Since the vulnerable driver would dereference a user-supplied pointer when doing the overwrite, it also gave us a handy arbitrary read primitive.

HVCI prevents unsigned code from running in the kernel. However, it does nothing to protect the integrity of kernel mode data. Tampering with key data structures can significantly compromise system integrity. For example, attackers can "NOP" out certain function calls by modifying the IAT. They can disable EDR kernel to user communications or security focused kernel ETW providers such as Microsoft-Windows-Threat-Intelligence. Data corruption attacks can also be leveraged to elevate privileges by modifying tokens or handles; many other techniques are possible.

To explore real world implications of this, we examined the Sysmon driver’s method for sending events from kernel mode to user mode for logging. We found that if we modified the IoCsqRemoveNextIrp pointer in the Import Address Table (IAT) to point to a "xor rax, rax; ret" gadget, events would no longer be logged. A real world attacker could selectively drop events to avoid raising suspicion. It's worth pointing out this is not in any way a flaw in Sysmon, as likely every security product is vulnerable to data corruption attacks like this. However, Microsoft could potentially expand VBS to protect certain data regions which should not be modified (such as the IAT in this example).

While data corruption attacks do their part keeping us up at night, we wanted to explore whether it was still possible to achieve arbitrary code execution. A talk from Microsoft’s Dave Weston at BlueHat IL provided excellent detail into the design of VBS. However, it also clued us in to a gap in the current approach: control flow guard does not cover the rear edge of control flow, that is, return addresses. Essentially, it is still open season for return oriented programming (ROP) attacks in the Windows kernel.

As mentioned in Peter Hlavaty’s 2015 REcon talk, a read-write primitive can be abused to perform stack hooking to achieve code execution via ROP. We were interested in weaponizing this technique against an HVCI-hardened system. We created a surrogate thread as our hooking target. From there, our PoC would dynamically build a ROP chain based on the number of parameters in the target function. It only required 10 gadgets to achieve a full N-argument function call primitive. In the next step, we exploited the signed and vulnerable driver to corrupt the kernel stack of the surrogate thread in order to execute the generated ROP chain. The end result is that our PoC could call any arbitrary kernel mode function. In one example, attackers could leverage this to inject into protected user mode processes and be largely invisible to AV/EDR. The video below demonstrates how to bypass HVCI using this technique.


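// Write Size bytes from pBuf (in the current process) to destAddress in targetProcess.
// CallFunction is taken to be the PoC's ROP-based kernel call primitive described above.
NTSTATUS WPM(DWORD_PTR targetProcess, DWORD_PTR destAddress, void * pBuf, SIZE_T Size)
{
    SIZE_T Result;  // receives the number of bytes copied
    DWORD_PTR srcProcess = CallFunction("PsGetCurrentProcess");

    // Argument order: source process, source address, target process,
    // target address, size, previous mode, returned size.
    NTSTATUS ntStatus = (NTSTATUS)CallFunction("MmCopyVirtualMemory", srcProcess,
        (DWORD_PTR)pBuf, targetProcess, destAddress, Size, KernelMode, (DWORD_PTR)&Result);

    return ntStatus;
}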


Demo 2: Evading Hypervisor Code Integrity



Defending Against Kernel Mode Threats

First and foremost, the simplest thing you can do to defend your enterprise from kernel mode threats is to ensure that you are logging driver load events. There are readily-available tools to do this, including Sysinternals Sysmon and Windows Defender Application Control in audit mode. You should be looking for low-prevalence and known-exploitable drivers. It’s important to build a baseline if possible. As many of you know, you have to understand your company's assets and infrastructure before you can begin looking for adversaries. The same applies to understanding the exposure of kernel modules throughout your organization.
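As a rough illustration of the baseline idea, the sketch below counts how many distinct hosts loaded a given driver and flags drivers seen on fewer hosts than a chosen threshold. It is not taken from any Endgame tool. The record layout, the function names, and the use of a SHA-256 string as the driver identity are all assumptions made for the example; it presumes you have already exported driver load events (for instance from Sysmon) into an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One driver-load event as it might be exported from your telemetry.
 * The field names are hypothetical. */
struct driver_load {
    const char *host;    /* endpoint that loaded the driver */
    const char *sha256;  /* hash identifying the driver image */
};

/* Count the distinct hosts on which the driver `sha256` was loaded.
 * Quadratic on purpose: clarity matters more than speed in a sketch. */
size_t driver_host_count(const struct driver_load *ev, size_t n,
                         const char *sha256)
{
    size_t hosts = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ev[i].sha256, sha256) != 0)
            continue;
        /* Count this host only the first time it appears for this hash. */
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (strcmp(ev[j].sha256, sha256) == 0 &&
                strcmp(ev[j].host, ev[i].host) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            hosts++;
    }
    return hosts;
}

/* A driver is "low prevalence" if fewer than min_hosts hosts loaded it. */
int is_low_prevalence(const struct driver_load *ev, size_t n,
                      const char *sha256, size_t min_hosts)
{
    assert(sha256 != NULL);
    return driver_host_count(ev, n, sha256) < min_hosts;
}
```

Calling is_low_prevalence for every distinct hash in the event set yields a review list. The threshold is a tuning decision: too high and common vendor drivers flood the list, too low and a driver dropped on a handful of machines slips through.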

If possible, defenders should deploy hypervisor code integrity policies to block most legacy drivers. Ideally, you should be whitelisting driver publishers, but maintaining effective whitelists can be very hard. At a minimum, you should mandate WHQL signatures on all drivers. To get a WHQL signature, you have to upload your driver to Microsoft. This theoretically mitigates the threat of stolen certificates because attackers can no longer stealthily sign their malware. However, WHQL is not a panacea. For example, the driver we exploited to evade HVCI is WHQL signed.

Additionally, defenders can supplement code integrity policies with blacklisting of known-exploitable drivers. Starting with Windows 10 Redstone 5, Microsoft will block many known-exploitable drivers by default if HVCI is enabled. This is great for some users, but it doesn’t help those who are running earlier versions of Windows, or who are on Redstone 5 but can’t enable HVCI.


Open Source Defense: Kernel Attack Surface Reduction (KASR)

To help mitigate the risk of these forever-days for the rest of the world, we released KASR, a free tool which blocks a list of known-exploitable drivers. KASR adds a roadblock for unsophisticated attackers re-using known exploits. We understand that blacklisting does not scale. It will not prevent attackers who know how to find and exploit vulnerabilities in kernel drivers, but we hope that it will at least stop script kiddies from rampaging around in the kernel. Microsoft is in the process of finalizing their RS5 driver blacklist. When it is complete, we will incorporate it into a future version of KASR.



Kernel Hunting

Looking back at our RvB, we realized that we needed a better way to hunt for kernel mode threats like DoublePulsar and our fileless implant. Traditional forensic-style techniques involve full memory acquisition and offline analysis, which are both time- and bandwidth-intensive. This approach doesn’t scale. To address this problem, we leveraged the same techniques to acquire kernel memory, but instead did the analysis on the endpoint, similar to what "blackbox" rootkit scanners have done for years. This means we can complete a scan in milliseconds.


There are several techniques available to read physical memory on a Windows machine, including the PhysicalMemory device and MDL-based APIs. One of our favorites was Page Table Entry Remapping (shown above) due to its simplicity and performance.


Our goal was to generically detect DoublePulsar as it lay dormant, without signatures. We could have scanned through kernel pool/heap memory looking for shellcode-like memory blobs. Unfortunately on Windows 7, the entire NonPagedPool is executable, leaving a fairly large search space, which seemed prone to false positives. Instead, we focused on identifying the function pointer hook. The first trick is to identify where function pointers exist in memory. Function pointers are absolute addresses, which means that they need to be relocated if the image is relocated. Thus, to find function pointers, we walk the PE relocation tables of all loaded drivers. Next, we check whether the relocated value points to an executable section of the driver in the original on-disk copy. Then we check whether the value currently in memory points outside of every loaded driver. Finally, if the unbacked memory region it points to is executable, we consider this a hit. This technique detects both DoublePulsar and the socket handler hook installed by our kernel mode implant.

We released Marta, a free tool that leverages this technique to scan all drivers on the system (typically in milliseconds) and identify any active infections. We named it Marta after Marta Burgay, the first astronomer to discover a real life double pulsar. Demo 3 in the following section shows Marta quickly identifying a dormant DoublePulsar infection.
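The three checks applied to each relocated pointer can be sketched in a few lines of user-mode C. This is a simplified model and not Marta's source: the structures are hypothetical, and the two boolean fields stand in for work a real scanner would do by parsing the on-disk PE file and inspecting kernel page protections.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Address range of one loaded driver image. */
struct image_range {
    uint64_t base;
    uint64_t size;
};

/* One pointer slot named by a driver's relocation table (hypothetical layout). */
struct reloc_slot {
    bool     disk_target_is_code;  /* on-disk value pointed into an executable section */
    uint64_t live_value;           /* pointer value currently in kernel memory */
    bool     live_target_is_exec;  /* page at live_value is mapped executable */
};

/* True if addr falls inside any loaded driver image. */
static bool in_any_image(const struct image_range *imgs, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= imgs[i].base && addr - imgs[i].base < imgs[i].size)
            return true;
    }
    return false;
}

/* Apply the checks in order; every one must pass for a hit. */
bool slot_is_hooked(const struct reloc_slot *slot,
                    const struct image_range *imgs, size_t n)
{
    assert(slot != NULL);
    if (!slot->disk_target_is_code)
        return false;  /* was never a function pointer in the original image */
    if (in_any_image(imgs, n, slot->live_value))
        return false;  /* still backed by some loaded driver */
    return slot->live_target_is_exec;  /* unbacked and executable: a hit */
}
```

Note that, as in the description above, a pointer redirected into a different loaded driver is not flagged. The checks target code running from memory that no driver image accounts for.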


Realtime Protections

On-demand scans are great, but we wanted to take this a step further and see if we could catch, and potentially stop, these types of attacks in real time (before any damage is done). Having worked on Endgame’s HA-CFI™ product, we are familiar with the Performance Monitoring Unit, or PMU, present on most modern CPUs. The PMU is a component of the CPU that can be programmed to count the number of times specific low-level events occur on each core. In this case, we count mispredicted indirect near call branches. When one of these events occurs, the PMU generates an interrupt, which executes our interrupt service routine. In this routine, we have a chance to validate and enforce a policy. The video below demonstrates our real-time approach to detecting DoublePulsar as a system is infected.


Demo 3: Detecting DOUBLEPULSAR


To detect unbacked code execution, we keep a list of memory ranges corresponding to the loaded drivers, and validate that the instruction pointer resides within one of those ranges. However, our proof of concept suffers from a few weaknesses, including the fact that PatchGuard itself uses unbacked pages in an attempt to hinder reverse engineering. While it’s fun to catch PatchGuard, this false positive would need to be addressed in a reliable and robust manner, which is difficult given the fact that PatchGuard is undocumented and subject to change at any time. Another weakness is that kernel code has the ability to program the PMU. An attacker with knowledge of this system could reprogram the PMU or disable interrupts. Finally, as with all kernel drivers, this unbacked detection driver is vulnerable to data attacks, such as IAT patching or attacks on our policy structures.
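The range check itself is simple. Below is a minimal user-mode sketch, assuming the driver ranges are kept sorted and non-overlapping so that the lookup can be a binary search, which matters when the check runs inside an interrupt handler. The structure and function names are hypothetical, and the sorted-array layout is our assumption rather than a description of the proof of concept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end) covered by one loaded driver. */
struct driver_range {
    uint64_t start;
    uint64_t end;
};

/* Returns true if ip lies inside one of the ranges.
 * `ranges` must be sorted by start address and must not overlap. */
bool ip_is_backed(const struct driver_range *ranges, size_t n, uint64_t ip)
{
    assert(ranges != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ip < ranges[mid].start)
            hi = mid;            /* ip is below this range */
        else if (ip >= ranges[mid].end)
            lo = mid + 1;        /* ip is above this range */
        else
            return true;         /* start <= ip < end */
    }
    return false;                /* no driver image accounts for ip */
}
```

An interrupt handler built on this would treat a false result as a candidate detection, subject to the false positive sources described above.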

As we mentioned earlier, there are currently no kernel protections against ROP (rear flow CFG). Microsoft’s plan to defend against ROP requires Intel Control-flow Enforcement Technology (CET). While promising, CET doesn’t exist in any production processor today.

To cover this gap, we propose a PMU-based protection system that can detect rear flow control flow policy violations. We can configure the CPU’s Last Branch Record (LBR) mechanism to record every return in the kernel into a circular buffer. We can generate a control flow policy by scanning all the loaded drivers and identifying call instructions. Immediately after these call instructions are their corresponding return sites. The policy we generate is a bitmap listing these valid return sites. We generate policy at startup, and update it as new drivers are loaded.
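To make the policy bitmap concrete, here is a deliberately simplified sketch that marks the byte following each near relative call (opcode E8 followed by a 32-bit displacement) in a code buffer. It is an assumption-laden toy, not the implementation described here: it scans byte by byte instead of disassembling, so it also matches E8 bytes that are not really calls, which makes the policy more permissive than it should be, and it ignores indirect calls entirely. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set the bit for code offset `off` (one bit per code byte). */
static void bitmap_set(uint8_t *bitmap, size_t off)
{
    bitmap[off / 8] |= (uint8_t)(1u << (off % 8));
}

/* Returns 1 if code offset `off` is marked as a valid return site. */
int bitmap_test(const uint8_t *bitmap, size_t off)
{
    return (bitmap[off / 8] >> (off % 8)) & 1;
}

/* Mark every offset that immediately follows an "E8 rel32" call.
 * `bitmap` must hold at least (len + 7) / 8 bytes, all zero on entry.
 * Returns the number of return sites marked. */
size_t mark_return_sites(const uint8_t *code, size_t len, uint8_t *bitmap)
{
    assert(code != NULL && bitmap != NULL);
    size_t marked = 0;
    for (size_t i = 0; i + 5 <= len; i++) {
        if (code[i] != 0xE8)
            continue;
        size_t site = i + 5;  /* opcode byte plus 4 displacement bytes */
        /* Skip sites that fall outside the buffer or are already marked. */
        if (site < len && !bitmap_test(bitmap, site)) {
            bitmap_set(bitmap, site);
            marked++;
        }
    }
    return marked;
}
```

One bit per code byte keeps the lookup to a shift and a mask. A production version would need an instruction-boundary-aware disassembler and coverage of indirect call encodings.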

Generating an interrupt for every return instruction is too costly. Instead we exploit the fact that ROP tends to generate a lot of branch mispredictions. We program the PMU to only generate interrupts for mispredicted branches. When the interrupt fires, we validate every return address that was recorded in the LBR. If any of them are not in the aforementioned policy (that is, not call-preceded), we consider that a control flow violation. If you don’t tune these systems correctly, they can generate too many interrupts and adversely affect system performance. As the demo below shows, with proper tuning we saw roughly a 1% reduction in the JetStream browser benchmark score, while still maintaining a 100% detection rate against our exploit. The final demo below walks through kernel mode ROP detection.
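The validation step inside the interrupt handler can be modeled as a loop over the recorded branches. The sketch below is a user-mode illustration under stated assumptions: the LBR contents are represented as an array of from/to pairs, the policy covers a single contiguous code region with one bit per byte, and every name is hypothetical. Reading the real LBR registers and handling multiple driver images are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One recorded branch: where it came from and where it landed. */
struct lbr_entry {
    uint64_t from;
    uint64_t to;
};

/* Policy for one code region: bit n set means base + n is a valid return site. */
struct return_policy {
    uint64_t       base;
    uint64_t       size;
    const uint8_t *bitmap;
};

/* True if addr is inside the region and marked as call-preceded. */
static bool is_valid_return_site(const struct return_policy *p, uint64_t addr)
{
    if (addr < p->base || addr - p->base >= p->size)
        return false;
    uint64_t off = addr - p->base;
    return (p->bitmap[off / 8] >> (off % 8)) & 1;
}

/* Returns the index of the first return whose target violates the policy,
 * or -1 if every recorded return landed on a valid return site. */
int find_violation(const struct lbr_entry *lbr, size_t n,
                   const struct return_policy *p)
{
    assert(p != NULL && p->bitmap != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!is_valid_return_site(p, lbr[i].to))
            return (int)i;
    }
    return -1;
}
```

Because the handler only runs on mispredicted branches, the cost of this loop is paid rarely. The trade-off is that a violation is only noticed if it sits in the LBR when an interrupt happens to fire.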


Demo 4: Detecting Kernel ROP




Windows platform security has greatly improved over the last decade, but kernel mode threats are still a big concern. To leverage the latest defenses from Microsoft, you should upgrade to the latest Windows 10 and enable as many protections as feasible in your organization (Secure Boot, VBS, HVCI, etc). Virtualization Based Security is the single largest pain point from a kernel mode attacker’s perspective, but unfortunately it does come with many compatibility issues. Ensure that you are collecting telemetry on the drivers being loaded across your endpoints. Leverage this data to spot anomalous or vulnerable drivers being loaded. Finally, leverage tools that allow you to hunt and detect kernel mode malware that may already be present in your network. Though this may seem like a large endeavor, we hope these two blog posts and our two open source tools help raise awareness of kernel mode threats and facilitate protections as these threats evolve.