Defeating the Latest Advances in Script Obfuscation
As the security research community develops newer and more sophisticated means of detecting and mitigating malware, malicious actors continue to look for ways to broaden their range of attack vectors and use whatever means are necessary to bypass protections. The use of scripting languages by malicious actors has spiked in recent years because of their flexibility and straightforward use in many attack scenarios, even though these languages offer varying and often limited access to native operating system functionality.
Scripts are frequently leveraged to detect a user’s operating system and browser environment configuration, or to extract or download a payload to disk. Malicious actors may also obfuscate their scripts to mask the intent of their code and circumvent detection, while also deterring future reverse engineering. As I presented at DerbyCon 6 Recharge, many common obfuscation techniques can be subverted and defeated. Although obfuscated scripts seem confusing at first glance, a variety of techniques can help you deobfuscate them quickly. In this post, I’ll cover the latest advances in script obfuscation and how they can be defeated. I’ll also provide some practical tips for quickly cleaning up convoluted code and transforming it into something human-readable and comprehensible.
When discussing malicious scripts, obfuscation is a technique attackers use to purposefully obscure their source code. They do this primarily for two purposes: subverting antivirus and intrusion detection / prevention systems and deterring future reverse engineering efforts.
Obfuscation is typically applied with an automated obfuscator. There are many to choose from, including the following publicly available tools:
- Stunnix (Multiple Languages)
- Crunchcode (VBA)
- Code Protection (VBA)
- Vbad (VBA)
- ISESteroids (PowerShell)
- Scripts Encryptor (Multiple Languages)
Since obfuscation does not alter a script’s core functionality (even when superfluous code is added to further obscure its purpose), could we simply use dynamic malware analysis methods to determine the script’s intended functionality and extract indicators of compromise (IOCs)? Unfortunately for analysts and researchers, it’s not quite that simple. Dynamic analysis can certainly be part of the process for analyzing more sophisticated scripts. However, deobfuscation and static analysis are needed to truly know the full extent of a script’s capabilities, and they may provide insight into the script’s origin.
Tips for Getting Started
When beginning script deobfuscation, you should keep four goals in mind:
- Human-readability: Simplified, human-readable code is the most obvious outcome of the deobfuscation process.
- Simplified code: The simpler and more readable the code is, the easier it will be to understand the script’s control flow and data flow.
- Understand control flow / data flow: To statically trace through a script and its potential paths of execution, you need a high-level understanding of its control flow and data flow.
- Obtain context: Context pertaining to the purpose of the script and why it was utilized will likely be a byproduct of the first three goals.
Prior to starting the deobfuscation process, you should be sure you have the following:
- Virtual machine
- Fully-featured source code editor with syntax and function / variable highlighting
- Language-specific debugger
It should also go without saying that familiarity with scripting languages is a prerequisite since you’re trying to (in most cases) understand how the code was intended to work without executing it. The following scripting language documentation will be particularly useful:
- JScript and VBScript
- Windows PowerShell Reference
- Office VBA Language Reference
Online script testing frameworks provide a straightforward means for executing script excerpts. These frameworks can serve as a stepping-stone between statically evaluating code sections and setting up a full-fledged debugging session. The following frameworks are highly recommended:
Before you begin, it is important to know that there is no specific sequence of steps required to properly deobfuscate a script. Deobfuscation is a non-linear process that relies on your intuition and ability to detect patterns and evaluate code. So, you don’t have to force yourself to go from top to bottom or the perceived beginning of the control flow to the end of the control flow. You’ll simply want to deobfuscate code sections that your eyes gravitate towards and that are not overly convoluted. The more sections you’re initially able to break down, the easier the overall deobfuscation process will be.
Code uniformity is crucial to the deobfuscation process. As you’re deobfuscating and writing out your simplified version of the code, you’ll want to employ consistent coding conventions and indentation wherever possible. You’ll also want to standardize and simplify how function calls are written where possible and how variables are declared and defined. If you take ownership of the code and re-write it in a way that you easily understand, you'll quickly become more familiar with the code and may pick up on subtle nuances in the control or data flow that may otherwise be overlooked.
Also, as previously mentioned, simplify where possible. It can't be reiterated enough!
Obfuscators will sometimes throw in superfluous code sections. Certain variables and functions may be defined, but never referenced or called. Some code sections may be executed, but ultimately have no effect on the overall operation of the script. Once discovered, these sections may be commented out or removed.
In the following example, several variables within the subroutine are defined and set to the value of an integer plus the string representation of an integer. These variables are not referenced elsewhere within the code, so they can be safely removed from the subroutine and not affect the result.
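Spotting unused variables can be partly automated. The Python sketch below is a heuristic, not a parser. It assumes one statement per line and treats `Dim` lines as declarations rather than reads. It then flags any name that appears on the left-hand side of an assignment but is never read anywhere else in the text. The subroutine and its names (`qWz`, `pLr`, `tKs`) are hypothetical.

```python
import re

# "name = expr" or "Set name = expr" at the start of a line (one statement per line assumed).
ASSIGN_RE = re.compile(r"^\s*(?:Set\s+)?([A-Za-z_]\w*)\s*=(.*)$", re.IGNORECASE)
DIM_RE = re.compile(r"^\s*Dim\s", re.IGNORECASE)

def find_unused_variables(source: str) -> list[str]:
    """Return names that are assigned but never read anywhere in `source`."""
    assigned: dict[str, str] = {}  # lowercase name -> spelling as first seen
    reads: list[str] = []          # text in which a name could be read
    for line in source.splitlines():
        if DIM_RE.match(line):
            continue  # a declaration is not a read
        m = ASSIGN_RE.match(line)
        if m:
            assigned.setdefault(m.group(1).lower(), m.group(1))
            reads.append(m.group(2))  # only the right-hand side can read a name
        else:
            reads.append(line)
    read_text = "\n".join(reads)
    return [
        name for name in assigned.values()
        # VBA identifiers are case-insensitive, so match ignoring case.
        if not re.search(rf"\b{re.escape(name)}\b", read_text, re.IGNORECASE)
    ]

# Hypothetical obfuscated subroutine: qWz and pLr are set but never used.
SAMPLE = '''Sub AutoOpen()
    Dim qWz, pLr, tKs
    qWz = 4 + "7"
    pLr = 12 + "3"
    tKs = "notepad.exe"
    Shell tKs
End Sub'''

print(find_unused_variables(SAMPLE))  # ['qWz', 'pLr']
```

Confirm anything it reports by hand before deleting it. Names that are built dynamically or referenced from other modules will fool a text-only check like this one.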
Arguably the most common technique associated with obfuscation is the use of overly complicated variable and function names. Strings containing a combination of uppercase and lowercase letters, numbers, and symbols are difficult to look at and differentiate at first glance. These should be replaced with more descriptive, easier-to-digest names, reinforcing the human-readability goal. You can use your text editor’s find / replace function, but be careful about global versus local scope: the same name may refer to different variables in different subroutines, so a blind global replace can conflate them. Later in the deobfuscation process, once you better understand the purpose of a function or variable, you can go back and update these names to something more informative, like “post_request” or “decoding_loop.”
In the above example, each variable and function that is local to the subroutine is renamed to a more straightforward label that reflects its role and its limited scope. Variables and function calls that are referenced without being declared or defined within the subroutine are left alone for the moment. They will be handled individually at the global scope.
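A scope-limited rename can be sketched in Python as follows. The sketch assumes you have already extracted the text of a single subroutine, and it renames only the names you list. Matching is case-insensitive, because VBA identifiers are. String literals are left untouched, so text that merely looks like a variable name is not rewritten. The subroutine and the names `xQ9z` and `gLobalVar` are hypothetical.

```python
import re

# VBA string literal; embedded quotes are escaped by doubling them ("").
STRING_RE = re.compile(r'"(?:[^"]|"")*"')

def rename_locals(sub_source: str, mapping: dict[str, str]) -> str:
    """Rename identifiers inside one subroutine, skipping string literals."""
    # Longest names first so a short name never pre-empts a longer one.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    lookup = {old.lower(): new for old, new in mapping.items()}

    def rename_code(code: str) -> str:
        return pattern.sub(lambda m: lookup[m.group(1).lower()], code)

    pieces, last = [], 0
    for lit in STRING_RE.finditer(sub_source):
        pieces.append(rename_code(sub_source[last:lit.start()]))
        pieces.append(lit.group(0))  # keep string contents exactly as written
        last = lit.end()
    pieces.append(rename_code(sub_source[last:]))
    return "".join(pieces)

# Hypothetical subroutine: xQ9z is local, gLobalVar is declared elsewhere.
SUB_SOURCE = '''Sub lKjH()
    Dim xQ9z
    xQ9z = "xQ9z is text"
    MsgBox XQ9Z & gLobalVar
End Sub'''

renamed = rename_locals(SUB_SOURCE, {"xQ9z": "local_msg"})
print(renamed)
```

The global `gLobalVar` and the subroutine name are deliberately left out of the mapping, matching the approach of deferring globals until later. Comments containing quote characters would confuse this simple literal scanner.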
Indirect Calls and Obscured Control Flow
Obscured control flow is usually not evident until much later in the deobfuscation process. You will generally look for ways to simplify function calls so they’re more direct. For instance, suppose one function is called by three different functions, and each of those functions transforms the input in exactly the same way and calls the underlying function identically. In that case, the three functions could be merged into one simple function. Function order can also come into play. If reordering the functions to match the control flow you are observing will help you understand the code, then by all means rearrange them.
In this case, we have five subroutines. After these subroutines are defined, there is a single call to sub5. If you trace through the subroutines, the ultimate subroutine that is executed is sub2. Thus, any calls outside of this code section to any of these five subs will result in a call to sub2, which actually carries out an operation other than calling another subroutine. So, removing sub1, sub3, sub4, and sub5 and replacing any calls to those subs with a direct call to sub2 would be logically equivalent to the original code sequence.
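Collapsing chains of pass-through subroutines can also be scripted. This Python sketch assumes parameterless `Sub ... End Sub` blocks with one statement per line. It also assumes that call sites use the same casing as the declaration, although VBA itself is case-insensitive. A sub whose body is a single call to another known sub is treated as a pass-through, and the chain is followed until it reaches a sub that does real work. The five subs mirror the structure described above, but their bodies are hypothetical. The trailing top-level call is VBScript-style.

```python
import re

SUB_RE = re.compile(r"^\s*Sub\s+([A-Za-z_]\w*)\s*\(\s*\)", re.IGNORECASE)
END_SUB_RE = re.compile(r"^\s*End\s+Sub\b", re.IGNORECASE)
# A bare call such as "sub3", "sub3()" or "Call sub3".
CALL_RE = re.compile(r"^(?:Call\s+)?([A-Za-z_]\w*)\s*(?:\(\s*\))?$", re.IGNORECASE)

def parse_subs(source: str) -> dict[str, list[str]]:
    """Map each parameterless Sub to its non-empty body lines."""
    subs: dict[str, list[str]] = {}
    current = None
    for line in source.splitlines():
        if (m := SUB_RE.match(line)):
            current = m.group(1)
            subs[current] = []
        elif END_SUB_RE.match(line):
            current = None
        elif current is not None and line.strip():
            subs[current].append(line.strip())
    return subs

def resolve_trampolines(subs: dict[str, list[str]]) -> dict[str, str]:
    """Map each sub to the first sub in its call chain that does real work."""
    def final_target(name: str, seen: frozenset) -> str:
        body = subs[name]
        if len(body) == 1 and (m := CALL_RE.match(body[0])):
            callee = m.group(1)
            if callee in subs and callee not in seen:  # guard against cycles
                return final_target(callee, seen | {name})
        return name
    return {name: final_target(name, frozenset()) for name in subs}

# Hypothetical bodies: only sub2 does anything other than call another sub.
SOURCE = '''Sub sub1()
    sub3
End Sub
Sub sub2()
    MsgBox "payload"
End Sub
Sub sub3()
    Call sub2
End Sub
Sub sub4()
    sub1
End Sub
Sub sub5()
    sub4
End Sub
sub5'''

print(resolve_trampolines(parse_subs(SOURCE)))  # every sub resolves to 'sub2'
```

With this mapping in hand, each call to a pass-through sub can be rewritten as a direct call to its target, and the pass-through subs can be deleted.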
When it comes to hard-coded numeric values, obfuscators may employ simple arithmetic to thwart reverse engineers. Beyond doing the actual math, it is important to research exactly how the scripting language implements its mathematical functions, for example how it rounds values or converts between numeric types.
In the line above, eight double values are added to and subtracted from each other, and the result is passed into the ASCII character function. On closer inspection, the obfuscator likely threw in the "86" values, since they cancel each other out and sum to zero. The remaining values add up to 38, which the character function converts to an ampersand (&).
While the code section above may look quite intimidating, it reduces to one simple variable definition by the end of the deobfuscation process. The first line initializes an array of double values that is referenced only on the second line. The second line declares a variable and sets it to the second value in the array. Since the array is not referenced again, it can be removed from the code. The variable from the second line is used only once, in the third line, so its value can be placed inline, which lets us remove the second line as well. The code simplifies further because the Sgn function calls cancel each other out and the absolute value function yields a positive integer. That integer is subtracted from the value previously held by the variable from the second line.
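The arithmetic in both of the examples above can be checked with a small evaluator instead of by hand. This Python sketch parses an expression with Python's `ast` module. It accepts only numbers, `+`, `-`, `*`, and the VBA functions `Chr`, `Sgn`, and `Abs`. Array reads such as `arr(1)` are looked up in a dictionary you supply. Several assumptions are worth flagging:
- The `#` Double type suffix is simply stripped.
- Arrays are assumed to use the default `Option Base 0`.
- `Chr` rounds its argument with Python's `round()`, which we assume matches VBA's round-half-to-even conversion from Double to Long.

The sample values are hypothetical. They are chosen to match the shapes described: eight doubles whose 86s cancel out, and a Sgn/Abs expression.

```python
import ast

def vba_sgn(x):
    """VBA Sgn: -1 for negative, 0 for zero, 1 for positive."""
    return (x > 0) - (x < 0)

def vba_chr(code):
    # Assumption: VBA converts the Double argument to Long with
    # round-half-to-even, which is what Python's round() does.
    return chr(round(code))

FUNCTIONS = {"chr": vba_chr, "sgn": vba_sgn, "abs": abs}

def eval_vba_expr(expr, env=None):
    """Evaluate a small, side-effect-free VBA arithmetic expression."""
    env = {k.lower(): v for k, v in (env or {}).items()}  # VBA names ignore case
    tree = ast.parse(expr.replace("#", ""), mode="eval")  # drop Double suffixes

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id.lower()]
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub, ast.Mult)):
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Add):
                return left + right
            if isinstance(node.op, ast.Sub):
                return left - right
            return left * right
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id.lower()
            args = [walk(arg) for arg in node.args]
            if name in FUNCTIONS:
                return FUNCTIONS[name](*args)
            if name in env:  # VBA indexes arrays with (), e.g. arr(1)
                return env[name][args[0]]
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return walk(tree.body)

# Hypothetical obfuscated values.
print(eval_vba_expr("Chr(86# + 10# - 86# + 20# + 86# + 5# - 86# + 3#)"))  # &
arr = [17.0, 53.0, 9.0]                      # arr = Array(17#, 53#, 9#)
v = eval_vba_expr("arr(1)", {"arr": arr})    # second element under Option Base 0
print(eval_vba_expr("v - Abs(Sgn(-4#) * Sgn(-9#) * -6#)", {"v": v}))  # 47.0
```

Restricting the evaluator to a whitelist of node types means a malicious expression cannot trigger anything beyond plain arithmetic, unlike passing it to Python's `eval`.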
Obfuscated String Values
As for obfuscated string values, you’ll want to simplify the use of any ASCII character functions, eliminate any obvious null strings, and then standardize how strings are concatenated and merge together any substrings where possible.
This line of code primarily relies on the StrReverse VBA function which, you guessed it, reverses a string. The null strings are removed right off the bat since they serve no purpose. Once the string is reversed and appended to the initial “c” string, we’re left with code which invokes a command shell to run the command represented by the variable and terminate itself.
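The string clean-up steps described here can be sketched in Python: eliminating null strings, merging adjacent substrings, and resolving `StrReverse`. The sketch treats `&` as the only concatenation operator. VBA's `+` can also concatenate, but it adds numbers too, so it is left alone. `StrReverse` is resolved only when its argument is a plain string literal. The sample expression and the variable `strCmd` are hypothetical and merely follow the pattern described above.

```python
import re

STRREVERSE_RE = re.compile(r'StrReverse\(\s*"((?:[^"]|"")*)"\s*\)', re.IGNORECASE)
LITERAL_RE = re.compile(r'"(?:[^"]|"")*"')

def resolve_strreverse(expr: str) -> str:
    """Replace StrReverse("literal") with the reversed literal."""
    def repl(m):
        text = m.group(1).replace('""', '"')[::-1]  # unescape, then reverse
        return '"' + text.replace('"', '""') + '"'  # re-escape for VBA
    return STRREVERSE_RE.sub(repl, expr)

def split_concat(expr: str) -> list[str]:
    """Split on & operators that sit outside string literals and parentheses."""
    parts, buf, depth, in_str = [], [], 0, False
    for ch in expr:
        if ch == '"':
            in_str = not in_str  # a doubled "" inside a literal toggles twice
        elif not in_str and ch in "()":
            depth += 1 if ch == "(" else -1
        elif not in_str and ch == "&" and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf).strip())
    return parts

def simplify_concat(expr: str) -> str:
    """Drop "" operands and merge neighbouring string literals."""
    merged = []
    for part in split_concat(expr):
        if part == '""':
            continue  # a null string contributes nothing
        if LITERAL_RE.fullmatch(part) and merged and LITERAL_RE.fullmatch(merged[-1]):
            merged[-1] = merged[-1][:-1] + part[1:]  # "ab" & "cd" -> "abcd"
        else:
            merged.append(part)
    return " & ".join(merged) if merged else '""'

# Hypothetical argument to Shell.
expr = '"c" & "" & StrReverse(" c/ exe.dm") & "" & strCmd'
print(simplify_concat(resolve_strreverse(expr)))  # "cmd.exe /c " & strCmd
```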
A common technique employed in malicious VBA macros is dropping and invoking scripts in other scripting languages. The macro in this case builds a Windows batch file, which will later be executed. While it is quite evident that a batch file is being constructed, the exact purpose of the file is initially unclear.
If we carry out string concatenations, eliminate null strings, and resolve ASCII characters, we can see that the batch file is used to invoke a separate VBScript file located in a subdirectory of the current user’s temp directory.
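The same approach extends to a macro that writes a batch file line by line. This Python sketch assumes the file is written with `Print #n, expr` statements. It further assumes each expression contains only string literals and `Chr(n)` calls joined with `&`. Anything the sketch cannot resolve is kept in braces so it stays visible. The VBA snippet is hypothetical, including the `batPath` variable and the `%TEMP%\upd\svc.vbs` path.

```python
import re

PRINT_RE = re.compile(r"^\s*Print\s*#\s*\w+\s*,\s*(.*)$", re.IGNORECASE)
LITERAL_RE = re.compile(r'"((?:[^"]|"")*)"')
CHR_RE = re.compile(r"Chr\$?\(\s*(\d+)\s*\)", re.IGNORECASE)

def split_concat(expr):
    """Split on & operators outside string literals and parentheses."""
    parts, buf, depth, in_str = [], [], 0, False
    for ch in expr:
        if ch == '"':
            in_str = not in_str
        elif not in_str and ch in "()":
            depth += 1 if ch == "(" else -1
        elif not in_str and ch == "&" and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf).strip())
    return parts

def eval_operand(operand):
    """Resolve a literal or Chr(n); leave anything else visibly unresolved."""
    if (m := LITERAL_RE.fullmatch(operand)):
        return m.group(1).replace('""', '"')  # unescape doubled quotes
    if (m := CHR_RE.fullmatch(operand)):
        return chr(int(m.group(1)))
    return "{" + operand + "}"

def rebuild_written_file(vba_source):
    """Reconstruct the text emitted by Print # statements, one line each."""
    lines = []
    for line in vba_source.splitlines():
        if (m := PRINT_RE.match(line)):
            lines.append("".join(eval_operand(p) for p in split_concat(m.group(1))))
    return "\n".join(lines)

# Hypothetical macro excerpt that builds a batch file.
VBA_SNIPPET = r'''
Open batPath For Output As #1
Print #1, "@ec" & "" & "ho off"
Print #1, "ws" & "cri" & "pt " & Chr(34) & "%TEMP%\upd\" & "svc.vbs" & Chr(34)
Close #1
'''
print(rebuild_written_file(VBA_SNIPPET))
```

Each `Print #` statement writes its expression followed by a line break, so joining the results with newlines approximates the file on disk.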
Okay, so you tried everything and your script is still obfuscated and you have no idea what else to do…
Well, in this case you’ll want to use a debugger and start doing some more dynamic analysis. The goal here is to circumvent the obfuscation and seek out any silver bullets in the form of eval functions or string decoding routines. Going back to the “resolve what you can first” approach, you might also want to start by commenting out code sections to restrict program execution. Sidestepping C2 and download functions with the aid of a debugger may also be necessary.
If you follow the function highlighted in green in the above example, you can see that it is called several times. It takes as input one hexadecimal string and one alphanumeric string of mixed letter case, and it returns a value. The function is called in a context that suggests the returned value is a string: in most cases, the call is an argument to a native scripting language function. Thus, we can hypothesize that this function is a string decoding routine. Since we are using a debugger, we don’t need to perform the decoding manually or reverse engineer all of its inner workings. We can simply set breakpoints before the function is called and just before it returns in order to resolve the decoded strings. Once we have resolved them, we can substitute them inline in the code or add them as inline comments, as I did in the sample code.
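Once the decoded values have been captured at the breakpoints, writing them back into the script can be automated. This Python sketch assumes the decoder is always called with two literal string arguments, a hex string and a mixed-case alphanumeric key, as described above. The decoder name `xD4f`, the argument values, and the decoded results in the dictionary are all hypothetical. They stand in for whatever you record in your own debugging session. The sketch either appends the decoded values as VBA comments or substitutes them inline as string literals.

```python
import re

def annotate_decoder_calls(code: str, func_name: str,
                           decoded: dict[tuple[str, str], str],
                           inline: bool = False) -> str:
    """Annotate or replace calls like func_name("hex", "key") with decoded values."""
    call_re = re.compile(
        rf'\b{re.escape(func_name)}\(\s*"([0-9A-Fa-f]+)"\s*,\s*"([A-Za-z0-9]+)"\s*\)'
    )
    out_lines = []
    for line in code.splitlines():
        found = []

        def repl(m):
            value = decoded.get((m.group(1), m.group(2)))
            if value is None:
                return m.group(0)  # not captured yet: leave the call as-is
            found.append(value)
            if inline:
                return '"' + value.replace('"', '""') + '"'  # VBA string literal
            return m.group(0)

        new_line = call_re.sub(repl, line)
        if found and not inline:
            new_line += "  ' decoded: " + ", ".join(found)  # VBA comment
        out_lines.append(new_line)
    return "\n".join(out_lines)

# Hypothetical calls and values recorded at breakpoints.
CODE = '''Set o = CreateObject(xD4f("0a1f3c", "kQz"))
o.Open xD4f("77e0", "Bb1"), url, False'''
DECODED = {("0a1f3c", "kQz"): "MSXML2.XMLHTTP", ("77e0", "Bb1"): "GET"}

print(annotate_decoder_calls(CODE, "xD4f", DECODED))
print(annotate_decoder_calls(CODE, "xD4f", DECODED, inline=True))
```

Comment mode keeps the original call visible for later verification. Inline mode produces the cleaner, final form once you trust the captured values.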
Script deobfuscation doesn't require any overly sophisticated tools. Your end result should be simple, human-readable code that is logically equivalent to the original obfuscated script. As part of the process, rely on your intuition to guide you, and resolve smaller sections of code to derive context about how they’re used. Whenever possible, remove unnecessary code and simplify code sections to make the overall script more readable and easier to comprehend. Finally, be sure to consult the official scripting language documentation when needed. These simple yet effective tips should give you a range of techniques to draw on the next time you encounter obfuscated code. Good luck!