Friday, February 7, 2014

IDAScope v1.1: YARA scanning

Today I integrated something into the master branch of IDAscope that I have wanted to have available for quite some time: seamless scanning with YARA signatures from within IDA, for the win!

In late December 2013, YARA's author Victor M. Alvarez gave us a Christmas gift with his YARA 2.0 release.
I read the release notes but did not realize the implications of "YARA has experienced an almost complete rewrite for version 2.0" at that point in time.

Around mid-December, I tasked one of my student assistants (Christopher Kannen) with developing a minimal, pure-Python version of YARA. The goal of this task was to enable its use in IDA and work around the issues experienced in the past.
Last week, he finished the code for loading YARA rules into convenient objects. When he was about to start with implementing basic matching, I became aware that importing YARA in the IDA Python shell no longer failed.
Happy day, this meant that the desired functionality could be immediately integrated into IDAscope with full native support for YARA rule files, avoiding any side-effects.

Here is a screenshot of the widget in action:

I used some binary (BISCUIT, 268eef019bf65b2987e945afaf29643f) from @snowfl0w's Contagio Malware Dump collection of APT1 stuff and the signatures as provided by AlienVaultLabs for developing/testing and the demo screenshot. Keep up the good work!

However, one might assume that Christopher's work is now useless and was wasted time. ;)
No! It comes in handy, as I will outline in the description up next.

Fiddling with YARA in Python

Everyone who has ever played with YARA and Python is probably familiar with its basic usage, like (examples taken from YARA's manual):
    import yara
    rules = yara.compile(filepath="/path")
    suspicious = "some data to be scanned"
    rules.match(data=suspicious)
Since YARA is intended to be fast, the "rules" object potentially contains multiple signatures from a single file compiled into one object.
I always missed the ability to inspect the loaded signatures in detail, for example by accessing the names and strings of individual signatures. Maybe it is possible; I just never managed to do so.
A probably lesser-known but cool feature of YARA is match callbacks. They come in pretty handy as a workaround in this context:

    import yara

    def cb(data):
        print(data)
        # return the constant so YARA continues with the next rule
        return yara.CALLBACK_CONTINUE

    rules = yara.compile(filepath="/path")
    suspicious = "some data to be scanned"
    rules.match(data=suspicious, callback=cb)
Each time the callback is fired, we receive a dictionary "data" like this one:
    {
        'tags': ['foo', 'bar'],
        'matches': True,
        'namespace': 'default',
        'rule': 'my_rule',
        'meta': {},
        'strings': [(81, '$a', 'abc'), (141, '$b', 'def')]
    }
As you can see, we can derive from that data which signatures from the rules object have just been run against the target input and what their individual outcomes are.
We can also derive information about partial matches by checking the "matches" and "strings" entries.
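To illustrate how such callback dictionaries can be turned into a per-rule overview, here is a minimal sketch. It is not the IDAscope code: the class MatchCollector and its method names are made up for this example, the "strings" entries are assumed to be (offset, identifier, data) tuples as shown above, and CALLBACK_CONTINUE is assumed to have the same value as yara.CALLBACK_CONTINUE so that the sketch runs without YARA installed.

```python
# Hypothetical helper, not part of IDAscope: collects YARA callback dictionaries.
CALLBACK_CONTINUE = 0  # assumption: same value as yara.CALLBACK_CONTINUE


class MatchCollector(object):
    """Gathers one summary per rule from YARA's match callback."""

    def __init__(self):
        self.results = {}

    def callback(self, data):
        # "strings" holds (offset, identifier, matched_data) tuples,
        # following the dictionary layout shown in the post.
        hit_ids = sorted(set(identifier for _, identifier, _ in data["strings"]))
        self.results[data["rule"]] = {
            "matches": data["matches"],
            "hit_ids": hit_ids,
        }
        return CALLBACK_CONTINUE

    def partial_matches(self):
        """Names of rules whose condition failed although some strings hit."""
        return sorted(name for name, result in self.results.items()
                      if not result["matches"] and result["hit_ids"])
```

In real use, one would pass collector.callback as the callback argument of rules.match() and return yara.CALLBACK_CONTINUE instead of the local constant.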

This is basically what I now use in the IDAscope widget to derive the scores and detailed views for signatures.
Christopher's rule loader additionally allows reading the signatures as given in the source file, which makes it possible to compare which of a rule's strings show up in the callbacks and which do not.
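The comparison between declared and matched strings can be sketched as follows. This is only an illustration under simplifying assumptions: the regular expression is a naive stand-in for a real rule loader (it ignores comments and multi-line constructs), the function names are invented for this example, and the score (fraction of declared strings that were hit) is one possible definition, not necessarily the one IDAscope uses.

```python
import re

# Naive pattern for string declarations such as:  $a = "abc"
# A real rule loader has to deal with comments, hex strings and modifiers.
DECLARATION = re.compile(r"^\s*(\$\w*)\s*=", re.MULTILINE)


def declared_identifiers(rule_source):
    """Return the string identifiers declared in one rule's source text."""
    return sorted(set(DECLARATION.findall(rule_source)))


def coverage(rule_source, callback_data):
    """Split declared identifiers into matched and unmatched ones."""
    declared = declared_identifiers(rule_source)
    hit = set(identifier for _, identifier, _ in callback_data["strings"])
    matched = [name for name in declared if name in hit]
    missing = [name for name in declared if name not in hit]
    # Assumed scoring: share of declared strings found in the input.
    score = float(len(matched)) / len(declared) if declared else 0.0
    return matched, missing, score
```

Given a rule with three strings of which two appear in the callback data, coverage() reports the two matched identifiers, the missing one, and a score of about 0.67.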

Combining all of those parts results in the extension just added to IDAscope.

If you want to use it, make sure to install YARA Python first and adjust the paths specified in ./idacope/ to your local collection of signature files.

Should you find any errors, please report them here or via mail.

Here is IDAscope v1.1 (mirror) as a downloadable snapshot from the repository (commit f3d58ad). If the latest extensions should prove to be stable and usable, I might not even need to push another version like last time, lol.

Development Plans

I'm not entirely sure if I am going to push this IDAscope widget further than its current functionality.

Instead, a full-blown interactive YARA editor (plugin) seems more attractive to me right now. So basically something independent from IDAscope, since the other tabs may be of less interest to a signature writer.
If it does not turn out to be too heavy code-wise, I might opt later on to integrate it back into IDAscope. I'm open to feedback on this matter.

But first: enjoy YARA in your IDA! :)

Wednesday, January 29, 2014

PyBox Relaunch

Far too much time has passed since I last blogged. I guess the main reason for this is that I've been pretty busy with #DAYJOB for the last half year, and while I did several things I considered blog-worthy, I just didn't put in the extra effort to go for an appropriate write-up.

This is not going to be a late New Year's resolution, but I sincerely want to be more active again in terms of releases. I will likely go for smaller, incremental posts (like I did during the main IDAscope development), as these are easier to bring to an acceptable level of quality. If there is interest, I might also start covering more concrete malware analysis content, but I have been reluctant to do so thus far.

Today marks a milestone for an old project of mine. I have migrated the repository of PyBox from googlecode to bitbucket since googlecode has been more or less killed by disabling new downloads earlier this month (Git > SVN anyway).
I had wanted to do the migration for half a year, but now I finally had the time needed to accompany it with a little story of what PyBox is and how we got there. In the same run, I compiled the DLL and pydasm for the two most recent versions of Python 2.7.
I hope that there are one or two interesting aspects in the code that might find use elsewhere. Maybe PyBox by itself is interesting enough to know about as well. ;)

History of PyBox

Back in 2010, when Felix Leder was still at the University of Bonn, he thought it would be great to have a highly customizable analysis framework / sandboxing tool for daily malware analysis. I guess he was inspired by the outcome of the Honeynet Project's Google Summer of Code (GSoC) project Cuckoo Sandbox, for which he was an advisor at that time. :)
The idea behind Cuckoo has always been to inject a DLL into a target process and have that DLL serve as a platform for monitoring. In Cuckoo, the DLL sets up a number of hooks for interesting Windows API functions. Later during execution, whenever a hook is triggered, the call to its target function is logged together with the respective parameters, which results in a trace of API calls.
PyBox is based on the same idea. As with Cuckoo, a DLL is injected into a target process to serve as a platform. However, upon injection, the PyBox DLL starts a fully fledged Python interpreter within the target process, allowing the execution of arbitrary Python scripts within the context of that process. Since Python is a great language for rapid prototyping, this approach allowed us to quickly design analysis modules, e.g. custom sandboxes, tailored to certain aspects of chosen malware families.
Lately I've noticed that a similar concept is being realized by Frida, but using JavaScript instead of Python.

PyBox architecture

In the following, I'll explain the architecture of PyBox when being used as a sandbox, its original intended use case.


As mentioned earlier, the core of PyBox is being injected as a DLL (./DLL/PyBox.cpp) into a target process, so we first need an injector. It is located at ./src/ and the approach realized here is pretty straightforward. If the target process does not exist yet, start it (optionally suspended). For injection, first get a handle to the process (kernel32!OpenProcess), allocate some memory in it (kernel32!VirtualAllocEx) to store a string holding the path to our PyBox DLL (kernel32!WriteProcessMemory). Finally, use our good old friend kernel32!CreateRemoteThread to start a thread within the target process with kernel32!LoadLibraryA and the PyBox DLL path as argument.
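The injection steps described above can be sketched with ctypes. This is not the PyBox injector itself: the function names inject_dll and encode_dll_path are made up for this example, error handling is minimal, and the sketch assumes that injector and target have the same bitness, so that kernel32!LoadLibraryA resides at the same address in both processes.

```python
import ctypes
import sys

PROCESS_ALL_ACCESS = 0x1F0FFF  # value used on Windows XP
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
PAGE_READWRITE = 0x04


def encode_dll_path(dll_path):
    """Return the NUL-terminated ANSI string that LoadLibraryA expects."""
    encoding = "mbcs" if sys.platform == "win32" else "ascii"
    return dll_path.encode(encoding) + b"\x00"


def inject_dll(pid, dll_path):
    """Load dll_path into the process with the given pid (Windows only)."""
    if sys.platform != "win32":
        raise OSError("DLL injection via kernel32 requires Windows")
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    vp, dword, size_t = ctypes.c_void_p, ctypes.c_uint32, ctypes.c_size_t
    k32.OpenProcess.restype = vp
    k32.OpenProcess.argtypes = [dword, ctypes.c_int, dword]
    k32.VirtualAllocEx.restype = vp
    k32.VirtualAllocEx.argtypes = [vp, vp, size_t, dword, dword]
    k32.WriteProcessMemory.argtypes = [vp, vp, ctypes.c_char_p, size_t,
                                       ctypes.POINTER(size_t)]
    k32.GetModuleHandleA.restype = vp
    k32.GetModuleHandleA.argtypes = [ctypes.c_char_p]
    k32.GetProcAddress.restype = vp
    k32.GetProcAddress.argtypes = [vp, ctypes.c_char_p]
    k32.CreateRemoteThread.restype = vp
    k32.CreateRemoteThread.argtypes = [vp, vp, size_t, vp, vp, dword,
                                       ctypes.POINTER(dword)]
    k32.CloseHandle.argtypes = [vp]

    payload = encode_dll_path(dll_path)
    process = k32.OpenProcess(PROCESS_ALL_ACCESS, False, pid)
    if not process:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # Reserve memory in the target and copy the DLL path into it.
        remote = k32.VirtualAllocEx(process, None, len(payload),
                                    MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE)
        if not remote:
            raise ctypes.WinError(ctypes.get_last_error())
        if not k32.WriteProcessMemory(process, remote, payload,
                                      len(payload), None):
            raise ctypes.WinError(ctypes.get_last_error())
        # The new remote thread starts in LoadLibraryA with the path as argument.
        load_library = k32.GetProcAddress(
            k32.GetModuleHandleA(b"kernel32.dll"), b"LoadLibraryA")
        thread = k32.CreateRemoteThread(process, None, 0, load_library,
                                        remote, 0, None)
        if not thread:
            raise ctypes.WinError(ctypes.get_last_error())
        k32.CloseHandle(thread)
    finally:
        k32.CloseHandle(process)
```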
For some side tasks, we use the module ProcessRigger (./src/). For example, in order to easily perform follow-up tasks, it's nice to grant our injector and the target process the privilege SE_DEBUG. A more interesting functionality implemented in ProcessRigger is its ability to execute an arbitrary API call in the context of the remote target process. For this, we dynamically generate and write a short shellcode to the target process, consisting of the expected number of push instructions for the arguments (either immediates or pointers to strings / structures), the desired call to the target API, and a subsequent call to kernel32!ExitThread. Nothing new, but useful.
For PyBox, we only use this to set some environment variables in the context of the target process in which PyBox is injected, but since I think the concept has potential for more, here is sample code and a diagram:
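A call stub of the kind described above can be generated with a few lines of Python. This is a minimal sketch and not the ProcessRigger code: the function name build_call_stub is invented, the stub targets 32-bit x86 with the stdcall convention, and it calls through EAX with absolute addresses, which is one of several possible encodings.

```python
import struct


def build_call_stub(api_address, exit_thread_address, args):
    """Assemble x86 code: push args, call the API, then call ExitThread."""
    code = bytearray()
    for value in reversed(args):                        # stdcall: last argument first
        code += b"\x68" + struct.pack("<I", value)      # push imm32
    code += b"\xB8" + struct.pack("<I", api_address)    # mov eax, api_address
    code += b"\xFF\xD0"                                 # call eax
    code += b"\x50"                                     # push eax (API result as exit code)
    code += b"\xB8" + struct.pack("<I", exit_thread_address)  # mov eax, ExitThread
    code += b"\xFF\xD0"                                 # call eax
    return bytes(code)
```

The resulting bytes would be written into the target with kernel32!WriteProcessMemory and started with kernel32!CreateRemoteThread. Arguments that are strings or structures need to be written to the target first, so that their remote addresses can be pushed.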


When the DllMain() of PyBox is executed in the target process, it first checks for the presence of said environment variables, then opens a file for logging and initializes the Python interpreter. The PyBox DLL additionally makes itself available to the interpreter environment as an embedded module, enabling easy access to some native system functionality, like access to the process environment block (PEB), enumeration of exports for other DLLs, hook/callback handling and emergency termination. Finally, the PyBox DLL hands over control to the target "box" starter script (example: ./src/), which then executes the desired analysis functionality.


As just mentioned, PyBox is intended to be used with independent "boxes" that are specialized for certain purposes. These boxes are powered by the functionality provided through the PyBox API.
First, there is MemoryManager (./src/pybox/), granting access to memory manipulation functions in the process address space via Python ctypes. A bunch of convenience functions automatically handle the read/write permissions of the given memory to enable read/write operations.
Next, there is the ModuleManager (./src/pybox/), which enumerates all other loaded DLL files (= executable modules) in the target process' address space in preparation for hooking. The enumeration is done through the embedded module provided by the PyBox DLL itself, in order to speed up this procedure.
The PyHookManager (./src/pybox/) provides an interface to PyBox's hooking functionality: adding and removing hooks, checking whether an address is already hooked (the same address can be hooked with multiple hooks, which are then executed in chain), and selecting the appropriate hook through its function find_and_execute(). The reason for this last function is that all hooks first point to the same callback address in the PyBox DLL, which handles mutual exclusion (as well as the Global Interpreter Lock (GIL) of Python) prior to transferring control to the hook code implemented in Python.
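The bookkeeping behind such a hook manager can be pictured with a small registry. This sketch is hypothetical and not the PyHookManager implementation: only the name find_and_execute() is taken from the description above, while the class name, the other method names and the choice of keying the hook chains by address are assumptions.

```python
class HookRegistry(object):
    """Hypothetical stand-in for the bookkeeping a hook manager needs."""

    def __init__(self):
        self._hooks = {}  # address -> list of handlers, in insertion order

    def add_hook(self, address, handler):
        self._hooks.setdefault(address, []).append(handler)

    def remove_hook(self, address, handler):
        chain = self._hooks.get(address, [])
        if handler in chain:
            chain.remove(handler)
        if not chain:
            # Drop empty chains so is_hooked() stays accurate.
            self._hooks.pop(address, None)

    def is_hooked(self, address):
        return address in self._hooks

    def find_and_execute(self, address, context):
        """Run every hook registered for the address, in chain."""
        chain = list(self._hooks.get(address, []))
        return [handler(context) for handler in chain]
```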
There exist three classes of hooks: PyFunctionEntryHook, PyReturnAddressHook, and PyHookClone. When a new PyFunctionEntryHook is created for a target address (e.g. a Windows API function), up to 20 bytes of memory are read and disassembled via pydasm. The reason for this is that we need to overwrite 5 bytes with a jmp instruction to our hook trampoline (PyTrampoline) while preserving the integrity of the modified code. If more than 5 bytes are taken, the rest is padded with NOPs { 90 }. For most Windows API functions, we run into no trouble as they usually start with "mov edi, edi; push ebp; mov ebp, esp", which sums up to exactly 5 bytes, but this may not be the case for arbitrary other functions. A PyReturnAddressHook is realized by overwriting the original return address of a function with a PyTrampoline address. A PyHookClone is used when hooking one address with multiple hooks and references the original first hook.
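The arithmetic behind the entry patch can be sketched as follows. The function names are invented for this example, and the instruction lengths are assumed to come from a disassembler such as pydasm; the sketch only computes how many whole instructions have to be displaced and what the 5-byte jmp plus NOP padding looks like.

```python
import struct

JMP_REL32_SIZE = 5  # E9 opcode plus 32-bit displacement


def bytes_to_displace(instruction_lengths, needed=JMP_REL32_SIZE):
    """Sum whole instructions until at least `needed` bytes are covered."""
    total = 0
    for length in instruction_lengths:
        if total >= needed:
            break
        total += length
    if total < needed:
        raise ValueError("not enough instructions to place a 5-byte jmp")
    return total


def build_entry_patch(hook_address, trampoline_address, displaced_size):
    """Return the jmp rel32 to the trampoline, padded with NOPs."""
    # The displacement is relative to the end of the jmp instruction.
    rel32 = (trampoline_address - (hook_address + JMP_REL32_SIZE)) & 0xFFFFFFFF
    padding = b"\x90" * (displaced_size - JMP_REL32_SIZE)
    return b"\xE9" + struct.pack("<I", rel32) + padding
```

For the typical prologue "mov edi, edi; push ebp; mov ebp, esp" the instruction lengths are 2, 1 and 2, so exactly 5 bytes are displaced and no padding is needed.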
The PyTrampoline is a dynamically generated shellcode that prepares the call of a hooking function. It is optionally prefixed with a "jmp self" { EB FE }, which turns out to be useful when writing a box for unpacking: it can be used to intercept the control flow before the OEP is reached (in order to attach a debugger and proceed manually). Next, the current register state is saved (PUSHAD { 60 }) and an identifier for our hook is pushed (this allows distinguishing multiple hooks on the same address). Hooks can optionally be used with their own parameters, so these are pushed { 68 11223344 } now. Finally, a call to the hook function is made { E8 ca11bacc }. When the hook returns, the register state is restored (POPAD { 61 }) and any original opcode bytes that were saved when the target address was overwritten during hook setup are executed. Finally, we jump back to the instruction behind the originally hooked address { E9 001dc0de }.
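The trampoline layout described above can be assembled as follows. This is a sketch, not the PyBox generator: the function and parameter names are made up, the called handler is assumed to clean up the pushed values itself (stdcall-like), and displaced bytes are copied verbatim, which only works if they contain no position-dependent instructions.

```python
import struct


def _rel32(source_end, target):
    """Encode the displacement of a relative jmp/call (little-endian)."""
    return struct.pack("<I", (target - source_end) & 0xFFFFFFFF)


def build_trampoline(base, hook_id, params, handler, displaced, resume,
                     spin=False):
    """Assemble trampoline bytes that will be placed at address `base`."""
    code = bytearray()
    if spin:
        code += b"\xEB\xFE"                             # jmp self
    code += b"\x60"                                     # pushad
    code += b"\x68" + struct.pack("<I", hook_id)        # push hook identifier
    for value in params:
        code += b"\x68" + struct.pack("<I", value)      # push hook parameter
    # rel32 is relative to the end of the 5-byte call instruction.
    code += b"\xE8" + _rel32(base + len(code) + 5, handler)   # call handler
    code += b"\x61"                                     # popad
    code += displaced                                   # displaced original bytes
    code += b"\xE9" + _rel32(base + len(code) + 5, resume)    # jmp back
    return bytes(code)
```

Here resume would be the hooked address plus the number of displaced bytes, i.e. the first instruction that was not overwritten by the entry patch.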
In summary, hook execution looks like this:
When a hook is called, it can access and modify the current function execution context (return address, stack) and register context (from EAX to EDI) through two respective objects passed to it as arguments.
Besides these modules, there is also an unfinished dumper module and a ProcTrack module, which hooks API functions for spawning new processes and injects the current box into these when triggered.

Box Scripts

I have included two sample boxes in the version pushed to the repository.

The first example is the standard sandbox (stdbox), which hooks a range of interesting API functions one would be interested in when tracing the execution of malware. I have gone for a harmless example and traced the creation of a new file on disk through notepad.exe (uploaded here).
Most of the lines in the log file are noise; the important ones are these:

2014-01-29 12:05:59,878 - INFO - kernel32.dll.CreateFileW(\
C:\Documents and Settings\redacted\Desktop\test.txt, 0x80000000, 3, 0, 3, 0x00000080,0)
2014-01-29 12:05:59,898 - INFO - kernel32.dll.CreateFileW(\
C:\Documents and Settings\redacted\Desktop\test.txt, 0xc0000000, 3, 0, 4, 0x00000080,0)
2014-01-29 12:05:59,898 - INFO - kernel32.dll.WriteFile(\
0x00000138, 0x000e0db8, 0x0000000c, 0x0007faf0, 0x00000000)

As you can see, the filename is "test.txt" on the desktop. The first call to CreateFileW() with GENERIC_READ access and creation disposition 3 (OPEN_EXISTING; the disposition is a single enumerated value, not a combination of flags) only tries to open an already existing file. The second call with GENERIC_READ+GENERIC_WRITE access and creation disposition 4 (OPEN_ALWAYS) creates the file if it does not exist yet and opens it for writing. This is ultimately followed by WriteFile() putting the bytes "just a test" in there.
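The numeric arguments in the log can be decoded against the documented Win32 constants; the creation disposition is a single enumerated value rather than a set of flags. A small helper for this (the function name is made up for this example, and only the two generic access rights seen in the log are covered):

```python
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
ACCESS_FLAGS = [("GENERIC_READ", GENERIC_READ),
                ("GENERIC_WRITE", GENERIC_WRITE)]

# dwCreationDisposition is an enumeration, not a bit mask.
CREATION_DISPOSITION = {
    1: "CREATE_NEW",
    2: "CREATE_ALWAYS",
    3: "OPEN_EXISTING",
    4: "OPEN_ALWAYS",
    5: "TRUNCATE_EXISTING",
}


def decode_create_file(desired_access, creation_disposition):
    """Translate the 2nd and 5th CreateFileW argument into constant names."""
    access = [name for name, bit in ACCESS_FLAGS if desired_access & bit]
    disposition = CREATION_DISPOSITION.get(creation_disposition, "UNKNOWN")
    return access, disposition
```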

The second example is the more useful RWX box. This box will only log calls that are being made from RWX memory. The idea behind this would be e.g. PyBox injection into explorer.exe in order to monitor the behaviour of malware self-injecting into that process. Once again, here is an example, this time of Citadel injecting into explorer.exe.
  • In lines 8-813 you can observe Citadel creating its dynamic import table within the context of explorer.exe.
  • In lines 817-1137, you can see Citadel creating its own hooks for a range of Windows API functions.
  • In lines 1138-1271 you can see Citadel starting some threads and guess about their intention (ObtainUserAgentString -> mimic target system's browser, WABOpen -> crawl address book for email addresses).
  • From line 1272 on, you can observe Citadel searching for other processes to inject into.
  • Finally, from line 1372 on, Citadel found a target and injects itself.
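The filtering idea behind the RWX box can be sketched like this. It is an illustration under assumptions, not the code of the box: the function names are invented, and query_protection stands for any lookup that returns the page protection for an address (in a real box this would be backed by kernel32!VirtualQuery on the caller's return address).

```python
PAGE_EXECUTE_READWRITE = 0x40  # Win32 memory protection constant
PROTECTION_MASK = 0xFF         # strips modifiers such as PAGE_GUARD (0x100)


def is_rwx(protection):
    """True if the page is readable, writable and executable."""
    return (protection & PROTECTION_MASK) == PAGE_EXECUTE_READWRITE


def should_log(return_address, query_protection):
    """Log a hooked call only if its caller resides in RWX memory.

    query_protection is a hypothetical lookup: address -> page protection.
    """
    return is_rwx(query_protection(return_address))
```

Regular modules are mapped without write access to their code sections, so calls originating from them are dropped, while calls coming from injected code in RWX memory end up in the log.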

I have included a startup batch file for both boxes so you can easily try them out.
These boxes are rather generic and simple, but it should be easy to imagine that more powerful use cases for automation can be covered (conditional controlling and patching of "interesting" memory).
More advanced stuff was shown by Felix in his talk at the Troopers conference in 2011, e.g. how to intercept network payloads before they enter an SSL connection (Firefox) or how to analyze obfuscated PDF exploits by pyboxing Acrobat.


So that's my short intro to the PyBox framework. Notice that it only works up to WinXP since we are practicing evil process/memory voodoo here and there that does not like modern memory protection mechanisms.
Currently, there is also no intention to further pursue development of this framework unless there is unexpectedly huge interest in it. ;)
Hopefully some of the design or implementation details may be of interest to some of you so the PyBox spirit can live on!