biodiff is a new binary diffing tool by @8051enthusiast. You can use it to identify additions, removals, and changes between two files, even if they’re not text files. I expect it to be useful during malware analysis, for example to highlight configuration changes among variants of the same malware family.
In the past, I’ve used 010 Editor for this; however, it requires a lot of pointy-clicky GUI interactions. biodiff may be a better fit for strictly diff-ing within a terminal.
I’ve also heard that other analysts use vbindiff, though u/northbound-goat shares:
> (unless I was using it wrong all the time) it can realistically only compare headers (or beginnings of manually prepared files), as the comparison breaks after the first variable length part that’s different in the two files.
`biodiff` uses algorithms from bioinformatics (think DNA sequencing) to find better alignment within files. Let’s see how it works in an example situation.
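To make the bioinformatics idea concrete, here is a minimal sketch of Needleman-Wunsch global alignment applied to bytes. This is a textbook sequence alignment algorithm picked for illustration; the post doesn't say which algorithm biodiff actually uses, and the function name `align` and the scoring values are assumptions.

```python
def align(a: bytes, b: bytes, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align two byte strings (Needleman-Wunsch).

    Returns a list of (byte_from_a, byte_from_b) pairs; None marks a gap.
    Scoring values are illustrative, not taken from biodiff.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] has no partner in b
                score[i][j - 1] + gap,       # b[j-1] has no partner in a
            )

    # Walk back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs
```

Aligning `b"ABCDE"` with `b"ABXCDE"` pairs every shared byte and reports the extra `X` as a single gap, rather than treating everything after it as changed. The table is quadratic in the input sizes, so this sketch only suits small regions.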
example: finding malware configuration changes
Imagine you’ve recovered two similar-looking malware payloads during an incident:
filename | size (bytes) | magic |
---|---|---|
virus.exe | 7168 | PE32+ executable (GUI) x86-64, for MS Windows |
hacker.exe | 7168 | PE32+ executable (GUI) x86-64, for MS Windows |
A quick triage with `objdump`, enumerating sections and other header details and comparing the two reports, shows that they have a similar structure:
This shows two differences:
- the PE header checksum, and
- the first section name.
However, diffing the `objdump` reports doesn’t explain whether or how the code differs.
You can approximate a content comparison by diffing the strings of the two files, but this has obvious limits: for one, it only considers data with a human-readable representation. Still, you can try it, though the results are not very useful:
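The strings comparison can be sketched in Python as follows. This is a rough stand-in for piping `strings` into `diff`, not the exact commands used here; the helper names `ascii_strings` and `diff_strings` and the sample bytes in the usage note are hypothetical.

```python
import difflib
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [s.decode("ascii") for s in re.findall(pattern, data)]


def diff_strings(old: bytes, new: bytes) -> list[str]:
    """Unified diff of the printable strings found in two blobs."""
    return list(
        difflib.unified_diff(
            ascii_strings(old), ascii_strings(new), "old", "new", lineterm=""
        )
    )
```

With made-up inputs such as `b"\x00.text\x00kernel32.dll\x00\x11\x5c"` and `b"\x00.data\x00kernel32.dll\x00\x5b\x26"`, the diff reports the renamed section but says nothing about the trailing two bytes, because they never form a printable string of the minimum length.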
This is a perfect time to use `biodiff`!
Here’s what it looks like:
`biodiff` displays hex dumps of the two files side-by-side and highlights changes.
It found three differences:
- PE header checksum (known from `objdump`)
- the second section’s name (known from `objdump`)
- six code changes around offset 0x1810 (new!)
Let’s triage the third item, the code changes, to see if any are meaningful.
Here is what `biodiff` shows:
Red indicates content that has changed, while green indicates content that has been added (and empty space is content that’s been removed).
Note that `biodiff` seamlessly handles cases where the alignment falls out of sync, such as when the length of a region changes.
This makes it easier to use than `diff <(xxd virus.exe) <(xxd hacker.exe)`.
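The difference between comparing at fixed offsets and comparing after alignment can be sketched with Python's standard `difflib`. `SequenceMatcher` is used here only as a convenient stand-in for an alignment-aware diff; it is not biodiff's algorithm, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def positional_mismatches(a: bytes, b: bytes) -> int:
    """Count offsets whose bytes differ, plus any length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def aligned_changes(a: bytes, b: bytes) -> list[tuple[str, str, str]]:
    """List non-equal regions as (operation, hex from a, hex from b)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2].hex(), b[j1:j2].hex())
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


original = bytes(range(16))
# Insert a single 0xff byte after the first four bytes.
modified = original[:4] + b"\xff" + original[4:]

print(positional_mismatches(original, modified))  # 13
print(aligned_changes(original, modified))        # [('insert', '', 'ff')]
```

One inserted byte shifts everything after it, so a fixed-offset comparison flags the rest of the buffer, while the aligned comparison reports only the insertion.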
Of these code changes, the first five are instruction reorderings that we can ignore:
and the final is a meaningful change of an immediate constant:
With a bit more inspection (or perhaps by recognizing that these are Metasploit payloads), you can see that these constants are the `sin_port` and `sin_addr` fields of a `struct sockaddr_in`:
struct sockaddr_in {
short sin_family;
u_short sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
This means that the final difference corresponds to the C2 server used by these backdoors. You can decode the values with a bit of Python:
`11 5C C0 A8 01 21`:
In [1]: import struct, socket
In [2]: "{1}:{0}".format(
...: struct.unpack_from(">H", bytes.fromhex("115C"))[0],
...: socket.inet_ntoa(bytes.fromhex("C0A80121")),
...: )
Out[2]: '192.168.1.33:4444'
`5B 26 2F 5F 0D A8`:
In [1]: import struct, socket
In [2]: "{1}:{0}".format(
...: struct.unpack_from(">H", bytes.fromhex("5B26"))[0],
...: socket.inet_ntoa(bytes.fromhex("2F5F0DA8")),
...: )
Out[2]: '47.95.13.168:23334'
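If you expect to decode more of these, the two sessions above can be folded into one helper. This is a small sketch; the name `decode_endpoint` is hypothetical, and it assumes the six bytes are laid out as in the payloads above: a 2-byte big-endian port followed by a 4-byte IPv4 address.

```python
import socket
import struct


def decode_endpoint(raw: bytes) -> str:
    """Decode sin_port (big-endian u16) + sin_addr (IPv4) into 'addr:port'."""
    # ">H4s" = network byte order, unsigned short, then 4 raw bytes (6 total).
    port, addr = struct.unpack(">H4s", raw)
    return f"{socket.inet_ntoa(addr)}:{port}"


print(decode_endpoint(bytes.fromhex("11 5C C0 A8 01 21")))  # 192.168.1.33:4444
print(decode_endpoint(bytes.fromhex("5B 26 2F 5F 0D A8")))  # 47.95.13.168:23334
```

`struct.unpack` raises `struct.error` if the input is not exactly six bytes, which is a useful guard when copying bytes out of a hex view by hand.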
So, `biodiff` does a good job of identifying changes between a pair of binary files.
While using it to identify rearranged instructions may be a bit overkill (probably use bindiff/diaphora instead), I think I’ll use it to find configuration changes in malware and perhaps when reversing file formats.
Overall, I suspect its alignment algorithms may work better than some other tools (though I don’t have any examples yet).
Thanks @8051enthusiast for the new tool!
update: comparison with other tools
Recall, this is what `biodiff` shows:
and for comparison, here is the output from 010 Editor:
and with `vbindiff`:
and with rz-diff: