biodiff: introduction


biodiff is a new binary diffing tool by @8051enthusiast. You can use it to identify additions, removals, and changes between two files, even if they’re not text files. I expect it to be useful during malware analysis, for example to highlight configuration changes among variants of the same malware family.

In the past, I’ve used 010 Editor for this; however, it requires a lot of pointy-clicky GUI interactions. biodiff may be a better fit for diffing strictly within a terminal.

I’ve also heard that other analysts use vbindiff, though u/northbound-goat shares:

“(unless I was using it wrong all the time) it can realistically only compare headers (or beginnings of manually prepared files), as the comparison breaks after the first variable length part that’s different in the two files.”

biodiff uses algorithms from bioinformatics (think DNA sequencing) to find better alignment within files. Let’s see how it works in an example situation.
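
To make the alignment idea concrete, here is a minimal sketch of Needleman-Wunsch global alignment, a classic bioinformatics algorithm, applied to bytes. I have not checked which algorithm biodiff actually implements, so treat this as an illustration of the general technique rather than a description of the tool. The function name `align` and the scoring values are my own choices.

```python
def align(a: bytes, b: bytes, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align two byte strings (Needleman-Wunsch).

    Returns two equal-length lists of byte values; None marks a gap.
    Illustrative only: time and memory are O(len(a) * len(b)).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell holds the best score for aligning a[:i] with b[:j].
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] against a gap
                score[i][j - 1] + gap,       # b[j-1] against a gap
            )

    # Walk back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]


if __name__ == "__main__":
    old = bytes.fromhex("4d5a9000")
    new = bytes.fromhex("4d5aff9000")  # one byte inserted in the middle
    for row in align(old, new):
        print(" ".join("--" if x is None else f"{x:02x}" for x in row))
```

The inserted byte ends up opposite a gap (`--`) in the shorter row, and the bytes after it line up again. That resynchronization is what a fixed-offset comparison cannot do. The quadratic table makes this sketch suitable only for tiny inputs.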

example: finding malware configuration changes

Imagine you’ve recovered two similar-looking malware payloads during an incident:

filename    size (bytes)  magic
virus.exe   7168          PE32+ executable (GUI) x86-64, for MS Windows
hacker.exe  7168          PE32+ executable (GUI) x86-64, for MS Windows

For quick triage, you use objdump to enumerate sections and other header details for each file, then diff the two reports to confirm that the files share a similar structure:

[image: diff of the two objdump reports]

This shows two differences:

  1. the PE header checksum, and
  2. the first section name.

However, diffing the objdump reports doesn’t explain whether or how the code differs.
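
The header comparison above can be scripted. This is a rough sketch, assuming GNU objdump is on your PATH and that its `-x` (all headers) report is what you want to compare; the helper names `objdump_report` and `diff_reports` are mine, and the exact objdump invocation used for the screenshot may have differed.

```python
import difflib
import subprocess


def objdump_report(path: str) -> list:
    """Return the lines of `objdump -x <path>` (assumes GNU objdump is installed)."""
    result = subprocess.run(
        ["objdump", "-x", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def diff_reports(a_lines, b_lines, a_name="a", b_name="b") -> list:
    """Unified diff of two text reports, as a list of lines."""
    return list(
        difflib.unified_diff(a_lines, b_lines, fromfile=a_name, tofile=b_name, lineterm="")
    )


if __name__ == "__main__":
    # File names taken from the example in this post.
    left = objdump_report("virus.exe")
    right = objdump_report("hacker.exe")
    print("\n".join(diff_reports(left, right, "virus.exe", "hacker.exe")))
```

Expect the diff to also flag any lines where objdump echoes the file name, since the two reports are generated from differently named files.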

You can approximate a content comparison by diffing the strings output of the two files, but this has obvious limits: for one, it only considers data with a human-readable representation. You can still try it, though the results are not very useful:

[image: diff of the strings output]
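
For reference, the strings comparison can be approximated in pure Python. This is a sketch under my own assumptions: it only extracts printable ASCII runs (no UTF-16), uses a minimum length of 4 to mirror the GNU strings default, and compares the results as sets, so it ignores ordering and duplicates. `ascii_strings` and `strings_diff` are names I made up.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return re.findall(pattern, data)


def strings_diff(a: bytes, b: bytes, min_len: int = 4):
    """Return (strings only in a, strings only in b), each sorted."""
    sa = set(ascii_strings(a, min_len))
    sb = set(ascii_strings(b, min_len))
    return sorted(sa - sb), sorted(sb - sa)


if __name__ == "__main__":
    # File names taken from the example in this post.
    with open("virus.exe", "rb") as f:
        left = f.read()
    with open("hacker.exe", "rb") as f:
        right = f.read()
    only_left, only_right = strings_diff(left, right)
    print("only in virus.exe: ", only_left)
    print("only in hacker.exe:", only_right)
```

This makes the limitation easy to see: a change inside a short, non-printable field, such as a packed port and IP address, never shows up in either list.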

This is a perfect time to use biodiff! Here’s what it looks like:

[image: biodiff output]

biodiff displays hex dumps of the two files side-by-side and highlights changes. It found three differences:

  1. PE header checksum (known from objdump)
  2. the second section’s name (known from objdump)
  3. six code changes around offset 0x1810 (new!)

Let’s triage item (3), the code changes, to see if any are meaningful. Here is what biodiff shows:

[image: biodiff output 2]

Red indicates content that has changed, while green indicates content that has been added (and empty space is content that’s been removed). Note that biodiff seamlessly handles cases where the alignment falls out of sync, such as when the length of a region changes. This makes it easier to use than diff <(xxd virus.exe) <(xxd hacker.exe).
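
To see why a line-based diff of hex dumps copes poorly with length changes, consider this small sketch. It renders two buffers as fixed-width hex dump lines (a simplified stand-in for xxd, not its exact output format) and counts how many line positions differ after a single byte is inserted. Because every line carries its offset, a line-based diff can only treat lines at the same position as equal, so a positional comparison is a fair model here. The helper names are hypothetical.

```python
def hexdump_lines(data: bytes, width: int = 16) -> list:
    """Render data as 'offset: hex bytes' lines, width bytes per line."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        lines.append(f"{off:08x}: " + " ".join(f"{x:02x}" for x in chunk))
    return lines


def count_differing_lines(a: bytes, b: bytes, width: int = 16) -> int:
    """Count line positions whose hex dump text differs between a and b."""
    la, lb = hexdump_lines(a, width), hexdump_lines(b, width)
    n = max(len(la), len(lb))
    la += [""] * (n - len(la))  # pad the shorter dump so every position is compared
    lb += [""] * (n - len(lb))
    return sum(1 for x, y in zip(la, lb) if x != y)


if __name__ == "__main__":
    original = bytes(range(256))
    # Insert a single byte at offset 0x20.
    modified = original[:0x20] + b"\xff" + original[0x20:]
    print(count_differing_lines(original, modified), "of",
          len(hexdump_lines(modified)), "lines differ")
```

A one-byte insertion shifts every later byte by one column, so every hex dump line from the insertion point onward is reported as changed, even though only one byte is actually new.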

Of these code changes, the first five are instruction reorderings that we can ignore:

[image: IDA showing the reordered instructions]

and the final one is a meaningful change to an immediate constant:

[image: IDA showing the immediate constants]

With a bit more inspection (or perhaps by recognizing that these are Metasploit payloads) you can realize that these constants are the sin_port and sin_addr fields of a struct sockaddr_in:

struct sockaddr_in {
    short   sin_family;
    u_short sin_port;
    struct  in_addr sin_addr;
    char    sin_zero[8];
};

This means that the final difference corresponds to the C2 server address used by each backdoor. You can decode the constants with a bit of Python:

11 5C C0 A8 01 21:

In [1]: import struct, socket

In [2]: "{1}:{0}".format(
   ...:     struct.unpack_from(">H", bytes.fromhex("115C"))[0],
   ...:     socket.inet_ntoa(bytes.fromhex("C0A80121")),
   ...: )
Out[2]: '192.168.1.33:4444'

5B 26 2F 5F 0D A8:

In [1]: import struct, socket

In [2]: "{1}:{0}".format(
   ...:     struct.unpack_from(">H", bytes.fromhex("5B26"))[0],
   ...:     socket.inet_ntoa(bytes.fromhex("2F5F0DA8")),
   ...: )
Out[2]: '47.95.13.168:23334'
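
If you expect to decode more of these, the two sessions above fold into one small helper. This is only a convenience sketch: `decode_endpoint` is a name I made up, and it assumes the six bytes are laid out as in the immediates above, with a big-endian sin_port followed by a four-byte IPv4 sin_addr.

```python
import socket
import struct


def decode_endpoint(raw: bytes) -> str:
    """Decode 6 bytes (big-endian port, then IPv4 address) into 'ip:port'."""
    if len(raw) != 6:
        raise ValueError("expected exactly 6 bytes: 2 for the port, 4 for the address")
    (port,) = struct.unpack_from(">H", raw, 0)  # network byte order
    addr = socket.inet_ntoa(raw[2:6])
    return f"{addr}:{port}"


if __name__ == "__main__":
    for blob in ("11 5C C0 A8 01 21", "5B 26 2F 5F 0D A8"):
        print(blob, "->", decode_endpoint(bytes.fromhex(blob)))
```

`bytes.fromhex` ignores the spaces, so you can paste the bytes exactly as they appear in the hex view.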

So, biodiff does a good job of identifying changes between a pair of binary files. While using it to identify rearranged instructions may be overkill (bindiff or diaphora is probably a better choice for that), I think I’ll use it to find configuration changes in malware and perhaps when reversing file formats. Overall, I suspect its alignment algorithms may work better than those of some other tools (though I don’t have any examples yet). Thanks @8051enthusiast for the new tool!

update: comparison with other tools

Recall that this is what biodiff shows:

[image: biodiff output 2]

and for comparison, here is the output from 010 Editor:

[image: 010 Editor view]

and with vbindiff:

[image: vbindiff view]

and with rz-diff:

[image: rz-diff view]