Building software that resists reverse engineering, tampering, and malicious modification is an ongoing research problem. Reverse engineers use a variety of attack methods to understand how software works, steal intellectual property, or inject their own malicious code. One such method is dynamic symbolic execution (DSE), which executes a program with both concrete and symbolic inputs and then determines what the inputs must be in order to reach particular paths of the program. An attacker can use it to learn which inputs bypass checks in the code, such as a license check. Coupled with other dynamic and static attack methods, DSE is a powerful part of a reverse engineering toolkit. Two main families of protection defend against reverse engineering, each with many sub-categories: obfuscation and encryption. Encryption locks the contents of a program so that they can be unlocked only with a key kept secret from the reverse engineer. Obfuscation transforms a program's instructions or data into a functionally equivalent form that is much harder for a human or automated tool to analyze and understand. Each new attack method devised by reverse engineers, such as DSE, prompts a new protection to counter it. In the case of DSE, obfuscation transforms have been developed that exploit a key weakness of the attack: branch explosion. SymbexSplit and SymbexFor are two obfuscation transforms that introduce additional branches into code for the DSE engine to evaluate. When these branches are constructed intelligently, the number of constraints the engine must solve grows exponentially. This thesis presents ObfuscatorDynamic-LLVM (OD-LLVM), a prototype obfuscation tool that injects the SymbexSplit and SymbexFor protections into code at compile time. We leverage the LLVM compiler infrastructure and implement these protections as an optimization pass of the Clang front-end compiler.
For portability and scalability, we implement SymbexSplit and SymbexFor as transformations of LLVM Intermediate Representation (IR). LLVM IR is the language that sits between a program's high-level source language (such as C or C++) and its target architecture's language (such as x86 or ARM). The LLVM project views a compiler as a collection of modular components that form a full compiler when strung together properly, and the LLVM optimizer operates only on LLVM IR. Because SymbexSplit and SymbexFor are applied to LLVM IR, our approach is agnostic of source language and target architecture. These protections are made available for free as an open-source contribution to the LLVM project to foster further research. We conduct experiments to evaluate OD-LLVM's protection capabilities against a state-of-the-art obfuscator, mimicking a reverse engineering attack scenario with the KLEE dynamic symbolic execution engine. KLEE is also built on the LLVM compiler infrastructure, which makes it a natural fit for this work. Our evaluation finds that OD-LLVM successfully induces timeouts in the KLEE DSE engine without dramatically increasing the run-time overhead or code size of the protected program.
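The branch-explosion idea described in this abstract can be illustrated at the source level with a small C sketch. This is not OD-LLVM's output and not the thesis's definition of SymbexFor; it is a hypothetical, hand-written loop-based rewrite in the same general spirit, and the names `check_plain`, `rebuild_byte`, `check_obfuscated`, and `self_check`, as well as the constant `0x5A`, are assumptions made for illustration. The original check has a single branch on the input byte. The rewritten check first rebuilds the byte in a loop whose bound depends on the input, so a symbolic executor that treats the byte as symbolic may have to fork at every loop test (up to 256 paths for one byte) before it even reaches the comparison.

```c
#include <assert.h>

/* Original check: a single comparison on the input byte. */
static int check_plain(unsigned char key) {
    return key == 0x5A; /* 0x5A is an arbitrary illustrative constant */
}

/* Hypothetical loop-based rewrite (illustrative only): rebuilds the
 * input one increment at a time. The result always equals `in`, but
 * the loop condition depends on the input, so each iteration is a
 * potential fork point for a symbolic execution engine. */
static unsigned char rebuild_byte(unsigned char in) {
    unsigned char out = 0;
    for (unsigned int i = 0; i < in; i++) {
        out++;
    }
    return out;
}

/* Functionally equivalent check that uses the rebuilt byte. */
static int check_obfuscated(unsigned char key) {
    return rebuild_byte(key) == 0x5A;
}

/* Exhaustively confirms both checks agree on every possible byte. */
static void self_check(void) {
    for (unsigned int k = 0; k < 256; k++) {
        unsigned char b = (unsigned char)k;
        assert(check_plain(b) == check_obfuscated(b));
    }
}
```

Calling `self_check()` confirms that the two variants agree on all 256 inputs, which is the functional-equivalence property an obfuscation transform must preserve. Note that an optimizing compiler may fold a source-level loop like this back into a plain copy; that is one reason a sketch such as this can only approximate a transform applied inside the compiler pipeline.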
Details
Title
ObfuscatorDynamic-LLVM
Creators
Daniel Dinu
Contributors
Anup Das (Advisor)
Awarding Institution
Drexel University
Degree Awarded
Master of Science (M.S.)
Publisher
Drexel University; Philadelphia, Pennsylvania
Number of pages
xi, 58 pages
Resource Type
Thesis
Language
English
Academic Unit
College of Engineering (1970-2026); Electrical (and Computer) Engineering [Historical]; Drexel University