Where do you begin if you think a competitor has infringed your patented and/or copyrighted source code
or misappropriated your software trade secrets? Might extracting a binary executable from the Flash/ROM
memory at the heart of the competitor’s product be a worthwhile investment? What, if anything, can be
learned from a binary alone? That is the subject of this article on digital forensics and embedded software.
I don’t mean to suggest that extracting the binary contents from a physical chip is easy. Indeed, it can
sometimes be very difficult to extract the binary code and data out of a chip, and the known techniques
are worthy of a full-length article in their own right. Suffice it to say that an efficient first step, and one
that is frequently possible, is to search the Internet or certain Web sites for binaries that are being
distributed as “firmware updates” to said competitor’s existing customers. In some situations, however, it is impossible to do anything with a downloaded update or with physically extracted bits because they have been scrambled or encrypted. For the rest of this article, I presume you have somehow been able to obtain an unscrambled copy of your competitor’s binary executable.
Reverse Engineering Analysis Techniques
A February 2013 article by Daniel Cabezas and Bram Mooij in DFI News (www.dfinews.com/articles/2013/02/detecting-source-code-re-use-through-binary-analysis-hybrid-approach) described a so-called “hybrid approach” to comparing binaries to detect “plagiarism”. Unfortunately, the techniques of
binary differencing and cryptographic hashing described therein generally only work on binary executables
from computers that run a version of the Microsoft Windows or Linux operating system. These analysis
techniques are of little to no use when the binary executables are from embedded systems. The complication is that most embedded systems do not use dynamically linked libraries, are built around microcontrollers from many different processor families, interface to custom electronics, and do not reliably contain any
operating system at all.
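To make the idea of hash-based comparison concrete, here is a minimal sketch. It is not the method of the cited article; the function names, the 64-byte block size, and the scoring rule are all assumptions chosen for illustration. It hashes fixed-size blocks of two images and reports what fraction of blocks they share.

```python
import hashlib

def block_hashes(image: bytes, block_size: int = 64) -> set[str]:
    """SHA-256 digest of every complete, non-overlapping block in the image."""
    return {
        hashlib.sha256(image[i:i + block_size]).hexdigest()
        for i in range(0, len(image) - block_size + 1, block_size)
    }

def shared_block_fraction(a: bytes, b: bytes, block_size: int = 64) -> float:
    """Fraction of blocks (relative to the smaller set) found in both images."""
    ha, hb = block_hashes(a, block_size), block_hashes(b, block_size)
    if not ha or not hb:
        return 0.0
    return len(ha & hb) / min(len(ha), len(hb))

# Synthetic stand-ins for two firmware images (not real binaries).
original = bytes(range(256))
shifted = b"\x00" + original  # same content, moved by a single byte

print(shared_block_fraction(original, original))  # identical images
print(shared_block_fraction(original, shifted))   # one-byte shift
```

In this toy case the identical images share every block, while the copy shifted by one byte shares none. A comparison of this kind depends on the two binaries having the same layout, which is part of why it cannot be relied upon when each product is built with a different toolchain for a different processor.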
Knowledge of the target processor’s model number, and thus its binary opcodes, is generally necessary to
begin reverse engineering the code. Once this is known, one step that can be undertaken is the use of a
disassembler, such as the popular IDA Pro (www.hex-rays.com/products/ida/). A tool like IDA Pro makes
the process of disassembly manageable by allowing the analyst to assign his own names to variables and
subroutines and insert other notes, as he tries to make sense of the binary. Progress can generally be made,
albeit in an iterative manner. For example, if it is known that a DSP algorithm of interest must involve use
of a particular peripheral IC, then the relevant code can be found by locating the physical memory addresses
of that chip’s registers in the read and write instructions in the binary.
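A first pass at locating such register references can be made even before disassembly, by scanning the raw image for the address constant itself. The sketch below is only an illustration under stated assumptions: the function name and the register address 0x4000C000 are hypothetical, and it assumes the address is stored as a whole 32-bit word (as in a literal pool). Processors that build addresses from two or more instructions will not be caught by a raw scan like this and need a real disassembler.

```python
import struct

def find_address_refs(image: bytes, address: int, little_endian: bool = True) -> list[int]:
    """Return every byte offset at which the 32-bit address constant appears."""
    pattern = struct.pack("<I" if little_endian else ">I", address)
    offsets = []
    pos = image.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(pattern, pos + 1)  # continue just past this hit
    return offsets

# Hypothetical register address of the peripheral IC (an assumption).
PERIPHERAL_REG = 0x4000C000

# Synthetic stand-in for a firmware image containing two references.
demo_image = (
    b"\x00" * 16
    + struct.pack("<I", PERIPHERAL_REG)
    + b"\xff" * 8
    + struct.pack("<I", PERIPHERAL_REG)
)

for offset in find_address_refs(demo_image, PERIPHERAL_REG):
    print(hex(offset))
```

Each offset reported is a starting point for the analyst: the instructions that load from the surrounding area are candidates for the subroutines that talk to the peripheral.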
The disassembly and reverse engineering approach is most cost-effective when only a portion of the
code needs to be analyzed and that portion can be located quickly. In this way it may be possible to, for
example, establish patent infringement sufficiently to bring litigation with confidence and even to make a
prima facie showing to the Court to defeat defense dismissal motions and trigger production of the actual source code.
Litigants should be warned that it can be very time-consuming to fully reverse engineer a binary via
disassembly techniques alone. Furthermore, even a perfect tool used by a perfect expert can never reproduce important source code elements such as the human-readable subroutine names, variable names, and
comments that ordinarily litter source code. Sometimes the code cannot even be fully separated from the
data without at least some guesswork.
String-Centered Analysis Techniques
A surprisingly powerful and less costly binary analysis technique, which does not require reverse engineering, is a review of the character strings contained in the executable. In an ATM, for example, these
strings might include words like:
Binary Executable Analysis Techniques
for Embedded Systems