Disarming Control Flow Guard Using Advanced Code Reuse Attacks
Advanced exploitation is moving away from ROP-based code-reuse attacks. Over the last two years, there has been a flurry of papers related to one novel code-reuse attack, Counterfeit Object-Oriented Programming (COOP). COOP represents a state of the art attack targeting forward-edge control-flow integrity (CFI), and caught our attention in 2016 as we were integrating our CFI solution (HA-CFI) into our endpoint product. COOP largely remains in academia, and has yet to show up in exploit kits. This may be because attackers migrate towards a path of least resistance. In the case of Microsoft Edge on Windows 10 Anniversary Update, protected by Control Flow Guard (CFG), that path of least resistance is the absence of backward-edge CFI. But what happens when Return Flow Guard (RFG) emerges and the attacker can no longer rely on corrupting a return address on the stack?
We were curious to evaluate COOP against modern CFI implementations. This not only is a useful exercise to keep us on top of cutting-edge research in academia and the hacking community, but it also allows us to measure the effectiveness, alter the design, or generally improve upon our own mitigations when necessary. This first of our two-part blog series covers our adventures evaluating COOP function-reuse attacks against Microsoft’s CFG and later our own HA-CFI.
Microsoft Control Flow Guard
There have been a number of papers, blogs, and conference talks already discussing Microsoft’s Control Flow Guard (CFG) at length. Trail of Bits does an excellent job of comparing Clang CFI and Microsoft CFG in two recent posts. The first post focuses on Clang, the second emphasizes Microsoft’s implementation of CFI, while additional research provides further detail on the implementation of CFG.
Bypassing CFG has also been a popular subject at security conferences the past few years. Before we cite some notable bypasses it is first important to note that CFI can be further broken down into two types: forward-edge and backward-edge.
- Forward-Edge CFI: Protects indirect CALL or JMP sites. Forward-edge CFI solutions include Microsoft CFG and Endgame’s HA-CFI.
- Backward-Edge CFI: Protects RET instructions. Backward-edge CFI solutions would include Microsoft Return Flow Guard, components of Endgame’s DBI exploit prevention, as well as other ROP detections including Intel’s CET.
This categorization helps delineate what CFG is designed to protect – indirect call sites – and what it’s not meant to protect – the return stack. For instance, a recent POC ended up in exploit kits targeting Edge using a R/W primitive to modify a return address on the stack. This is not applicable to CFG, and should not be considered a weakness of CFG. If anything, it demonstrates CFG successfully pushing attackers to hijack control flow somewhere other than at indirect call sites. Examples that actually demonstrate flaws or limitations of CFG include: leveraging unprotected call sites, remapping read-only memory regions containing CFG code pointers and changing them to point to code that always passes a check, a race condition with the JIT encoder in Chakra, and using memory-based indirect calls. COOP or function-reuse attacks in general are an acknowledged limitation for some CFI implementations and noted as out-of-scope for Microsoft’s bypass bounty due to “limitations of coarse-grained CFI”. That said, we are not aware of any public domain POCs that demonstrate COOP to specifically attack CFG hardened binaries.
CFG adds a __guard_fids_table to each protected DLL, composed of a list of RVAs of valid or sensitive targets for indirect call sites within the binary. An address is used as an index into a CFG bitmap, where bits can be toggled depending upon whether the address should be a valid destination. An API also exists to modify this bitmap, for example, to support JIT encoded pages: kernelbase!SetProcessValidCallTargets which invokes ntdll!SetInformationVirtualMemory before making the syscall to update the bitmap.
A new enhancement to CFG in Windows 10 Creators Update enables suppression of exports. In other words, exported functions can now be marked as invalid target addresses for CFG protected call sites. The implementation of this requires using a second bit for each address within the CFGBitmap, as well as a flags byte in the __guard_fids_table for each RVA entry when building the initial per process bitmap.
For 64-bit systems, bits 9-63 of the address are used as an index to retrieve a qword from the CFG bitmap, and bits 3-10 of the address are used (modulo 64) to access a specific bit within the qword. With export suppression, the CFG permissions for a given address are represented by two bits in the CFG bitmap. Additionally, __guard_dispatch_icall_fptr in most DLLs is now set to point to ntdll!LdrpDispatchUserCallTargetES where a valid call target must omit ‘01’ from the CFG bitmap.
Implementing this new export suppression feature becomes a bit complicated when you factor in dynamically resolving symbols, since using GetProcAddress implies subsequent code may invoke the return value as a function pointer. Control Flow Guard handles this by changing the corresponding two-bit entry in the CFG bitmap from ‘10’ (export suppressed) to ‘01’ (valid call site) as long as the entry was not previously marked as sensitive or not marked valid at all (e.g. VirtualProtect, SetProcessValidCallTargets, etc.). As a result, some exports will begin as invalid indirect call targets on process creation, but eventually become a valid call target due to code at runtime. This is important to remember later in our discussion. For reference, a sample call stack when this occurs looks as follows:
Schuster et al. identified counterfeit object-oriented programming (COOP) as a potential weakness to CFI implementations. The attack sequences together and reuses existing virtual functions in order to execute code while passing all forward-edge CFI checks along the way. In a similar manner to ROP, the result is a sequence of small valid functions that individually perform minimal computation (e.g. load a value into RDX), but when pieced together perform some larger task. A fundamental component of COOP is to leverage a main loop function, which might iterate over a linked-list or array of objects, invoking a virtual method on each object. The attacker is then able to piece together “counterfeit” objects in memory, in some cases overlapping the objects, such that the main loop will call valid virtual functions of the attacker’s choosing in a controlled order. Schuster et al. demonstrated the approach with COOP payloads targeting Internet Explorer 10 on Windows 7 32-bit and 64-bit, and Firefox on Linux 64-bit. The research was later extended, demonstrating that recursion or functions with many indirect call invocations could also be used instead of a loop, and extended yet again into targeting the Objective-C runtime.
This prior research is extremely interesting and novel. We wanted to apply the concept to some modern CFI implementations to assess: a) the difficulty of crafting a COOP payload in a hardened browser; b) whether we could bypass CFG and HA-CFI; and c) whether we could improve our CFI to detect COOP style attacks.
- Hardcoded offsets are allowed, as dynamically finding gadgets at run-time is out of scope for this particular effort.
- All of Microsoft’s latest mitigations in Creators update are enabled (e.g. ACG, CIG, CFG with export suppression).
- The attacker must not bypass or avoid CFG in any way other than using COOP.
For our initial research, we leveraged a POC from Theori for Microsoft Edge on Windows 10 Anniversary update (OS build 14393.953). However, we designed our payload with Creators update mitigations in mind, and validated our final working COOP payload on Windows 10 Creators update (OS build 15063.138) with export suppression enabled.
An ideal POC would execute some attacker shellcode or launch an application. A classic code execution model for an attacker is to map some controlled data in memory as +X, and then jump to shellcode in that newly modified +X region. However, our real goal is to generate COOP payloads that execute something meaningful while protected by forward-edge CFI. Such a payload provides data points with which we can test and refine our own CFI algorithms. Further, attacking Arbitrary Code Guard (ACG) or the child process policy in Edge is slightly out of scope. We decided an acceptable end goal for our research on Windows 10 Creators Update was to use COOP to effectively disable CFG, opening up the ability to then jump or call any arbitrary location within a DLL. We thus ended up with two primary COOP payloads:
- For Windows 10 Anniversary Update, and a lack of ACG, our payload maps data we control as executable, and then jumps into that region of controlled shellcode after disabling CFG.
- For Windows 10 Creators Update, our end goal was to simply disarm CFG.
Finding COOP Gadgets
Following the blueprint left by Schuster et al., our first order of business was to agree upon a terminology for the various components of COOP. The academic papers refer to each reused function as a virtual function gadget or vfgadget, and when describing each specific type of vfgadget an abbreviation is used such as ML-G for a main loop vfgadget. We opted to name each type of gadget in a more informal way. Terms you find in the remaining post are defined here:
- Looper: the main loop gadget critical to executing complex COOP payloads (ML-G in paper)
- Invoker: a vfgadget which invokes a function pointer (INV-G in paper)
- Arg Populator: a virtual function which preps an argument, either loading a value into a register (LOAD-R64-G in paper), or moving the stack pointer and/or loading values on the stack (MOVE-SP-G in paper)
- Function present on __guard_fids_table
- Contain a loop with exactly 1 indirect call taking 0 arguments
- Loop must not clobber argument registers
Of all the classes of vfgadgets, the search for loopers was the most time consuming. Many potential loopers have some restrictions that make it hard to work with. Our hunt for invokers turned up not only vfgadgets for invoking function pointers, but also many vfgadgets that can very quickly and easily populate up to six arguments at once all from a single counterfeit object. For this reason, there are shortcuts available for COOP when attempting to invoke a single API, which completely avoid requiring a loop or recursion, unless a return value is needed. Numerous register populators were found for all argument registers on x64. It is worth mentioning that a number of the original vfgadgets proposed in the Schuster et al. COOP paper from mshtml can still be found in edgehtml. However, we added a requirement to our effort to avoid reusing any of these and instead find all new vfgadgets for our COOP payloads.
For brevity, I will only step through one of our simplest payloads. A common theme to all of our payloads is the requirement to invoke VirtualProtect. Since VirtualProtect and the eshims APIs are marked as sensitive and not a valid target for CFG, we have to use a wrapper function in Creators Update. As originally suggested by Thomas Garnier, a number of convenient wrappers can be found in .NET libraries mscoree.dll and mscories.dll such as UtilExecutionEngine::ClrVirtualProtect. Because Microsoft’s ACG prevents creating new executable memory, or changing existing executable memory to become writable, an alternate approach is required. Read-only memory can be remapped as writable with VirtualProtect, so I borrow the technique from a BlackHat 2015 presentation, and remap the page containing chakra!__guard_dispatch_icall_fptr as writable, then overwrite the function pointer to point to an arbitrary place in chakra.dll that contains a jmp rax instruction. In fact, there already exists a function in most DLLs, __guard_dispatch_icall_nop, which is exactly that – a single jmp rax instruction. As a result, I can effectively disable CFG since all protected call sites within chakra.dll will immediately just jump to the target address as if it passed all checks. Presumably one could take this a step further to explore function-reuse to attack ACG. To accomplish this mini-chain, the following is required:
- Load mscoree.dll into the Edge process
- Invoke ClrVirtualProtect +W on a read-only memory region of chakra.dll
- Overwrite __guard_dispatch_icall_fptr to always pass check
As seen from the list of vfgadgets above, edgehtml is an important library for COOP. Thus, the first order of business is to leak the base address for edgehtml as well as any other necessary components, such as our counterfeit memory region. This way the payload can contain hardcoded offsets to be rebased at runtime. Using the info leak bug in Theori’s POC, we can obtain all the base addresses we need.
//OS Build 10.0.14393
var chakraBase = Read64(vtable).sub(0x274C40);
var guard_disp_icall_nop = chakraBase.add(0x273510);
var chakraCFG = chakraBase.add(0x5E2B78); //_guard_dispatch_icall...
var ntdllBase = Read64(chakraCFG).sub(0x95260);
//Find global CDocument object, VTable, and calculate EdgeHtmlBase
var [hi, lo] = PutDataAndGetAddr(document);
CDocPtr = Read64(newLong(lo + 0x30, hi, true));
EdgeHtmlBase = Read64(CDocPtr).sub(0xE80740);
//Rebase our COOP payload
rebaseOffsets(EdgeHtmlBase, chakraBase, ntdllBase, pRebasedCOOP);
var hijackedObj = new Array(0);
[hi, lo] = PutDataAndGetAddr(hijackedObj);
var objAddr = new Long(lo, hi, true);
Object.seal(hijackedObj); //Trigger initial looper
Hijack #1: Invoking LoadLibrary
As previously stated, my end goal was bypassing CFG on Edge on Win10 Creators update with export suppression enabled. Looking at the various LoadLibrary calls exported in kernel32 and kernelbase, it turns out loading a new DLL into our process is rather easy even with the latest CFG feature in place. The reason for this is two-fold. First, LoadLibraryExW is actually marked as a valid call target in the __guard_fids_table within kernel32.dll.
Second, the rest of the LoadLibrary calls within both kernel32 and kernelbase start out as suppressed, but in Edge they eventually become valid call sites. This appears to stem from some delayed loading in MicrosoftEdgeCP!_delayLoadHelper2, which eventually results in GetProcAddr being called on the LoadLibraryX APIs. As foreshadowed earlier, this demonstrates the difficulty of making all function exports invalid call targets. Even if these other LoadLibrary call gates remained suppressed or were only opened temporarily, for our purposes we can just use kernel32!LoadLibraryExW since it’s initialized as a valid target.
To get our desired VirtualProtect wrapper loaded into the Edge process, we need to invoke LoadLibraryExW(“mscoree.dll”, NULL, LOAD_LIBRARY_SEARCH_SYSTEM32). We could cut corners here, and leverage one of the aforementioned invokers to populate all of our parameters at once, but instead let’s create a traditional COOP payload using a looper vfgadget to iterate over four counterfeit objects.
Our first iteration will populate r8d with 0x800. CHTMLEditor::IgnoreGlyphs is a nice vfgadget to populate r8d as seen in the assembly below. Our parameter 0x800 (LOAD_LIBRARY_SEARCH_SYSTEM32) will be loaded from this + 0xD8h. Recall that the next pointer in our counterfeit objects must be at +0x80h. We could create four contiguous counterfeit objects in memory to each be of size greater than 0xD8h, or we could treat the next pointer to be located at the end of our object. I chose the latter. In this case, we will have an overlapping object so we must be careful that the offset of this + 0xD8 does not interfere with the vfgadget from our second iteration that operates on the second object in memory. The first counterfeit object for populating r8d looks as follows:
Upon return from this vfgadget, the looper then iterates over our fake linked list and must now invoke another vfgadget this time to populate rdx with a value of 0x0 (NULL). To achieve this I use Tree::ComputedRunTypeEnumLayout::BidiRunBox::RunType(). We can load our value (0x0) from our counterfeit object + 0x28h.
Now that we have populated parameters 2 and 3 for our API call, we need to populate the first argument, a pointer to our ‘mscoree.dll’ string, and then invoke a function pointer to LoadLibraryExW. A perfect invoker vfgadget exists for this purpose, Microsoft::WRL::Details::InvokeHelper::Invoke(). The assembly and corresponding third counterfeit object are as follows:
//Retrieve loadlibrary return val from coop region
var mscoreebase = Read64(pRebasedCOOP.add(0x128));
alert("mscoree.dll loaded at: 0x" + mscoreebase.toString(16));
Hijack #2: Invoking VirtualProtect Wrapper
Having successfully completed our first COOP payload, we can now rebase a second COOP payload to invoke ClrVirtualProtect on the read-only memory region that contains chakra!__guard_dispatch_icall_fptr in order to make it writable. Our objective is to call ClrVirtualProtect(this, chakraPageAddress,0x1000,PAGE_READWRITE,pScratchMemory). This time we will demonstrate a COOP payload that does not make use of a loop or recursion by using a single counterfeit object to populate all arguments and invoke a function pointer. We’ll use the same invoker vfgadget as before, only this time it is primarily used to move a counterfeit object into rcx.
Notice that our fake object will load all the required parameters for ClrVirtualProtect, as well as populate the address of ClrVirtualProtect into rax by resolving index +0x100h from another fake vtable. Upon completion, this will map the desired page in chakra.dll to be writable.
//Change chakra CFG pointer to NOP check
//trigger hijack to 0x4141414141414141
As the WinDbg output below illustrates, with CFG now disabled our hijack was successful and the process crashes when trying to jump to the unmapped address 0x4141414141414141. It’s important to point out that we could have made this hijack jump to anywhere in the process address space due to CFG being disabled. By comparison, with CFG in place an exception would have been thrown since 0x4141414141414141 is not valid in the bitmap, and we would have seen the original CFG routine that we swapped out, ntdll!LdrpDispatchUserCallTargetES, in our call stack.
In this post, I discussed COOP, a novel code-reuse attack proposed in academia, and demonstrated how it can be used to attack modern Control-Flow Integrity implementations, such as Microsoft CFG. Overall, COOP is fairly easy to work with, particularly when breaking up payloads into smaller chains. Piecing together vfgadgets is not unlike the exercise of assembling ROP gadgets. Perhaps the most time consuming portion is finding and labeling candidate vfgadgets of various types within your target process space.
Microsoft’s Control Flow Guard is considered a coarse-grained CFI implementation and is thus more vulnerable to function reuse attacks such as described here. By comparison, fine-grained CFI solutions are able to validate call sites beyond just the target address considering elements such as expected VTable type, validating number of arguments, or even argument types, for a given indirect call. A key tradeoff between the two approaches is performance, as introducing too much complexity into a CFI policy can add significant overhead. Nonetheless, mitigating advanced code-reuse attacks is important moving forward as applications become hardened with some form of forward-edge and backward-edge CFI.
To offset some of the limitations of CFG, Microsoft appears to be focused on diversifying its preventions such as protecting critical call gates like VirtualProtect with export suppression in CFG and Arbitrary Code Guard. However, one important takeaway from this post should be the challenges of designing and enforcing mitigations from user-space. As we saw with EMET a couple of years ago, researchers were able to disarm EMET by reusing code inserted by EMET itself. Further, as was originally demonstrated at BlackHat 2015, here we are similarly taking advantage of critical CFG function pointers residing in user-space to alter the behavior of CFG.
By comparison, Endgame’s HA-CFI solution is implemented and enforced entirely from the kernel and uses hardware features that even if vulnerable to function reuse attacks, make it more difficult to tamper with because of the privilege separation. In the second part of this series, I will discuss the COOP adventures with our own HA-CFI and ongoing research, and how our detection logic evolves to account for advanced function reuse attacks.