Evan Pratten
Evan Pratten
Software Developer

During a computer science class today, we were talking about embedding code and metadata in jpg and bmp files. @exvacuum was showing off a program he wrote that watched a directory for new image files, and would display them on a canvas. He then showed us a special image. In this image, he had injected some metadata into the last few pixels, which were not rendered, but told his program where to position the image on the canvas, and it's size.

This demo got @hyperliskdev and I thinking about what else we can do with image data. After some talk, the idea of converting application binaries to images came up. I had seen a blog post about visually decoding OOK data by converting an IQ capture to an image. With a little adaptation, I did the same for a few binaries on my laptop.

Program design

Like all ideas I have, I wrote some code to test this idea out. Above is a small sample of the interesting designs found in the gimp binary. The goals for this script were to:

If you would like to see how the code works, read "check out the script".

A note on data wrapping

By using a generator, and the range function's 3rd argument, any list can be easily split into a 2d list at a set interval.

# Assuming l is a list of data, and n is an int that denotes the desired split location
for i in range(0, len(l), n):
    yield l[i:i + n]

Binaries have a habit of not being rectangular

Unlike photos, binaries are not generated from rectangular image sensors, but instead from compilers and assemblers (and sometimes hand-written binary). These do not generate perfect rectangles. Due to this, my script simply removes the last line from the image to "reshape" it. I may end up adding a small piece of code to pad the final line instead of stripping it in the future.

Other file types

I also looked at other file types. Binaries are very interesting because they follow very strict ordering rules. I was hoping that a wav file would do something similar, but that does not appear to be the case. This is the most interesting pattern I could find in a wav file:

Back to executable data, these are small segments of a dll file:

Segment 1

Segment 2

Check out the script

This script is hosted on my GitHub account as a standalone file. Any version of python3 should work, but the following libraries are needed: