The initial release had a stupid string conversion bug that I introduced when trying to make the console output a bit prettier.
It's fixed now, the release link should now point to the new version.
Loop Awareness in Crypto Identification
However, you might be interested in downloading the new version I just pushed to the repo. It contains a new check box in Crypto Identification that allows to choose only those basic blocks that are contained in loops.
This new option again reduces dramatically false positives and supports a faster finding of potential crypto routines.
The following screenshot shows the option in action:
Loop awareness in the Crypto Identification widget. |
Previously I would have considered the settings shown in the sliders as rather "wide" and thus giving lots of misleading / uninteresting blocks. In fact, without the additional limiting to looped blocks it would have shown 194 instead of the 28 blocks it shows now.
Tarjan's Algorithm
The calculation of the blocks in loops is also amazingly fast. Luckily it didn't introduce a noticeable overhead. That would have surely been the case if I had tried to implement this naively. ;)
Thankfully there is Tarjan's algorithm known from graph theory for detecting strongly connected components.
A function graph can be treated as a directed graph (N, V) with N being the nodes and V (N x N) being the vertices. N is represented by all basic blocks of the function. The set of vertices can be collected by looping over all blocks (e.g. accessible via idaapi.FlowChart()) and obtaining their successors generated by idaapi.BasicBlock.succs().
Tarjan's algorithm is executed once on every node of the graph, treating it as a potential "root" node. The paths leaving the root node are explored via depth-first search (DFS), meaning successive nodes are immediately explored as well as long as they have not been explored earlier.
While traversing the graph, two steps are performed per encountered node. First, the node is assigned both a "DFS index" that represents the depth in which the node has been found and a "low link", which is the smallest "DFS index" reachable among this node's successors. Second, the node is stored on a stack.
When no more nodes can be explored, the stack is walked backwards. Whenever a node is encountered that has an equal "DFS index" and "low link", this node is the root of a strongly connected component and all elements on the stack above it are members of its component.
The algorithm has linear complexity in nodes and vertices: O(|N| + |V|) and thus is very efficient.
There is a nice visualization explaining the algorithm in Wikipedia (German).
The algorithm does not allow to differ single, unlooped blocks from single, trivially looped blocks (blocks with a loop by pointing to themselves). Therefore,
I will try to provide some theoretical background on the implementation details of IDAscope once in a while and hope it's not too boring. :)