Phoenix Testing Facility: crypto

Showing posts with label crypto. Show all posts

Saturday, October 27, 2012

Online WinAPI & hack.lu slides

This is just a short update, in order to capture some of the things that otherwise would have been just noted on my twitter. If you only follow the blog, sorry to serve this a bit late but I just didn't find it worth blogging about 3-4 lines and a little modification of an already existing feature.

Online WinAPI lookups

Since October 16th, 2012, the WinAPI lookup widget of IDAscope also supports online lookups.

Why is this interesting?
Well, as some of you might have already experienced, the offline mode didn't capture all of MSDN, mainly CRT functions and Kernel functions were missing. I'm quite happy that they are covered as well now.
But in my opinion more importantly, the online lookup cancels the need to download that huge data blob from the Windows SDK and makes IDAscope usable without any prerequisites.

Online mode is enabled by default (./config.json):

"winapi": {
        "search_hotkey": "ctrl+y",
        "load_keyword_database": true,
        "online_enabled": true
        }

As always, the update can be found in the repository.

hack.lu (slides + new feature)

On a different note, this week I've been to hack.lu in Luxembourg. It was a great conference with a lot of nice people and many interesting presentations.

On the second day, I took my chance and gave a lightning talk on IDAscope. It basically gave a walkthrough of all the currently implemented features as well as some on plans I have for the future. The slides are contain mostly screenshots in order to bring you as close to a demo as possible and can be found here.

Furthermore, during the conference I took the time to implement another addition to the crypto widget: Public Key Cryptography Standards (PKCS) detection.
While having a lot of signatures integrated already, IDAscope did not support the detection of PKCS elements, such as public/private keys and certificates until now. However, I believe this is useful, as some of the modern botnets are using asymmetric crypto in order to verify authenticity of commands and updates. Moreover, it might be interesting to have this feature available when analyzing memory dumps of software where you assume the presence of such keys and certificates.
The detection is already integrated in my local version and I want to add the functionality to directly dump those elements from a binary to disk. When I've tested it some more, I'll push the updates to the repo. Of course there will be another more technical blog post in the next days to cover this new addition in detail.
Thanks to mortis for pointing me to the idea for this feature. ;)

Tuesday, September 18, 2012

IDAscope: fixed

Today I learned that I am bad at releasing code.
The initial release had a stupid string conversion bug that I introduced when trying to make the console output a bit prettier.
It's fixed now, the release link should now point to the new version.

Loop Awareness in Crypto Identification

However, you might be interested in downloading the new version I just pushed to the repo. It contains a new check box in Crypto Identification that allows to choose only those basic blocks that are contained in loops.

This new option again reduces dramatically false positives and supports a faster finding of potential crypto routines.

The following screenshot shows the option in action:

Loop awareness in the Crypto Identification widget.

Previously I would have considered the settings shown in the sliders as rather "wide" and thus giving lots of misleading / uninteresting blocks. In fact, without the additional limiting to looped blocks it would have shown 194 instead of the 28 blocks it shows now.

Tarjan's Algorithm

The calculation of the blocks in loops is also amazingly fast. Luckily it didn't introduce a noticeable overhead. That would have surely been the case if I had tried to implement this naively. ;)

Thankfully there is Tarjan's algorithm known from graph theory for detecting strongly connected components.

A function graph can be treated as a directed graph (N, V) with N being the nodes and V (N x N) being the vertices. N is represented by all basic blocks of the function. The set of vertices can be collected by looping over all blocks (e.g. accessible via idaapi.FlowChart()) and obtaining their successors generated by idaapi.BasicBlock.succs().

Tarjan's algorithm is executed once on every node of the graph, treating it as a potential "root" node. The paths leaving the root node are explored via depth-first search (DFS), meaning successive nodes are immediately explored as well as long as they have not been explored earlier.

While traversing the graph, two steps are performed per encountered node. First, the node is assigned both a "DFS index" that represents the depth in which the node has been found and a "low link", which is the smallest "DFS index" reachable among this node's successors. Second, the node is stored on a stack.

When no more nodes can be explored, the stack is walked backwards. Whenever a node is encountered that has an equal "DFS index" and "low link", this node is the root of a strongly connected component and all elements on the stack above it are members of its component.

The algorithm has linear complexity in nodes and vertices: O(|N| + |V|) and thus is very efficient.

There is a nice visualization explaining the algorithm in Wikipedia (German).

The algorithm does not allow to differ single, unlooped blocks from single, trivially looped blocks (blocks with a loop by pointing to themselves). Therefore,

the set of successors of a block has to be additionally checked for the identical block.

I will try to provide some theoretical background on the implementation details of IDAscope once in a while and hope it's not too boring. :)

Wednesday, August 15, 2012

IDAscope update: Crypto Identification

After being quiet for almost three weeks, today I want to share with you my latest additions to IDAscope.

Focus of this post will be a new widget that I call Crypto Identification.
Now you may say "oh no, yet another crypto detection tool?" Well, yes, but before you stop reading let me introduce you to an approach you might find useful.

Heuristics-based crypto detection by code properties

About 2 years ago, during literature research on network protocol reverse engineering, I came across an interesting paper called "Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering" by Juan Caballero et al. Besides the description of an approach on how to identify and dissect message buffers into protocol fields, it contains a section on automated detection of cryptographic routines ("Detecting Encoding Functions", p. 10).
The main idea is pretty straight forward:

Evaluate the ratio of arithmetic/logic instructions related to all instructions in a function.
Assumption: Cryptographic functions usually consist mainly of
arithmetic/logic instructions, thus they should have a higher ratio.
If the function has a size of 20 or more instructions, flag the function as encoding function.

While the approach described in the paper is applied to dynamically achieved instruction traces, there is no reason why not to employ it in static code analysis. So my goal for today is to show you how to make "academic things" practically usable. ;)

I use the following set of arithmetic/logic instructions, please tell me if I missed something:

["add", "and", "or", "adc", "sbb", "and", "sub", "xor", "inc", "dec", "daa",
"aaa", "das", "aas", "imul", "aam", "aad", "salc", "not", "neg", "test", "sar",
"cdq"]

The following screenshot shows the widget in action:

IDAscope: Crypto Identification widget

The functionality I just described is located in the upper part of the widget. There are three double-sliders that can be used to adjust the following parameters:

Range of Arithmetic/Logic Rating: The above mentioned ratio of
arithmetic/logic instruction to total instructions, but calculated on basic
block level instead of function level.
Considered Basic Block Size: Only blocks having a size within the
boundaries are taken into concern.
Allowed Calls in Function: Number of calls allowed from the function
containing the analyzed basic block to any other code location. This is
based on the assumption that most actual cryptographic/compression
functions are "leafs" in the overall program flow graph, not having any
child functions.

With these filter functions, we can greatly narrow down the number of suspicious basic blocks to those really containing interesting crypto or compression algorithms. Once the initial scanning has been performed (sample with 700 functions, less than one second), the sliders update the visualization in real-time. Qt only chokes when viewing all 9500 basic blocks at once, but that's not what you want anyway.

The two checkboxes give further ways to refine the search:

Exclude zeroing Instructions: This can be used to reduce false positives
that may distort the rating. You will often find instructions like xor eax, eax
or sbb eax, eax being used to clear register contents. However, they would
normally be included in the calculation of the rating because XOR is in the
set of arithmetic/logic instructions.
Group Results by Functions: This is just an alternative display method,
giving a better overview on how many suspicious blocks are contained in
the same functions.

Here is a use case for this widget: When I am trying to identify cryptography in malware samples, I often have problems finding compact but frequently used crypto algorithms such as RC4 that usually do not carry constants with them (which would allow to spot them by simple signature matching).
In the above screenshot (from a current Citadel sample with 724 functions) you can see that the candidate blocks have been reduced to 23 out of 9526 basic blocks. The filters are set to show only blocks with a rating of above 30%, with a size of 10 or more instructions and 1 or 0 call instructions. 23 blocks is a number small enough for me to look at in just a few minutes, identifying the relevant parts in a very short amount of time.

Among the 23 blocks is the following one:

Citadel's modified stream cipher.

containing the modified stream cipher that is used in Citadel. In addition to the normal XOR/substitutions, Citadel also XORs against the characters of a static hash contained in the binary, which is considered one of the "advancements" from its predecessor Zeus 2.
While this may be a weak example because the block is easily identified by searching for exactly this hash, you probably get the idea on how to use the widget.
The heuristic also successfully identifies all the other crypto parts in the sample like the AES and CRC32 algorithms.

If you wonder about how you get double-sliders in Qt (because it is not a standard widget): The idea and code of this widget called "BoundsEditor" is adapted from Enthought's TraitsUI, which luckily is open-source software. I took the code and reduced it back to a standard Qt widget, having a great and compact control element to adjust my parameters.

Signature-based crypto identification

The second part of the widget does what you might have expected in the first place. It simply uses a set of constants in order to find well-known cryptographic algorithms. It's basically inspired by tools like the IDA findcrypt plugin or the KANAL plugin for PEiD. It does the same job, except being directly coupled to IDA and allowing to instantly jump to the code locations referencing the identified constants.
The following screenshot (from an old but gold conficker sample) shows both types of matches:

[black] referenced by: constant somewhere (e.g. data section), referenced by code.
[red] referenced by: constant immediately used in code, just as shown in the basic block to the left.

The currently supported algorithms are (with ingredients from Ilfak Guilfanov's findcrypt, Felix Gröbert's kerckhoff's, a crypto detection implementation by Felix Matenaar from his Bachelor thesis, and some of my own adaptions):

ADLER 32
AES
Blowfish
Camellia
CAST256
CAST
CRC32
DES
GOST
HAVAL
IDEA
MARS
MD2, MD5, MD6
MD5MAC
PKCS (various initialization values)
RawDES
RC2, RC5
Rijndael
Ripe-MD160
SAFER
SHA224
SHA256
SHA384
SHA512
SHARK
SKIPJACK
SQUARE
Serpent
Square/SHARK
TIGER
Twofish
WAKE
Whirlpool
Zlib

The only thing missing right now is renaming / tagging those functions based on the signatures, maybe I will implement that, too.

Other changes to IDAscope

To conclude this post, I want to briefly discuss some more changes I did to IDAscope since the last post.

In my last post, I mentioned that the WinAPI widget only worked against the
offline data from the Windows SDK. This is no longer the case, as it now
supports doing online lookups (controllable by a checkbox) in the case it
does not find local information. This is great because by that, the missing
documentation of CRT and NTDLL functions are now also covered. Parsing
of the MSDN webpage can be optimized but works for now.
Hotkey support for widgets. As an example, [CTRL+Y] will now look up the
currently highlighted identifier (in IDA View) in the WinAPI widget and
change focus to this widget.
More changes under the hood, data structures, refactoring, etc. I feel that
the code is better organized and easier to understand now.
Experimental code for visualizing the function relationship starting by
Thread start addresses (cmp. Alex' last blog post).

Next to come is the integration of Alex' latest scripts into widgets.