Hi All,

I'm collecting little tricks that stymie a disassembler (that is, prevent it from disassembling the code correctly) for a book project I'm working on ("The Art of Disassembly"). I've collected a bunch of tricks over the years (OhMyGosh, it's getting to be decades now), but chances are good that I've missed some worthwhile ones.
Here are some of the ideas I'm using in the book:
- Burying data in the code stream
- Placing code in the middle of data objects (a variant of the previous item).
- Arithmetic expressions involving two relocatable addresses (e.g., lbl1-lbl2)
- Burying instructions within the opcodes of other instructions
- Using alignment operations in code and data
- Writing code that does not have well-defined procedure/function boundaries
- Overlapping data tables and, in general, making data boundaries fuzzy.
- Using unions and variant types to make it difficult to infer a data object's type
- Writing interpreters that allow a mixture of 80x86 and interpretive code in the code stream
- Using the breakpoint (int 3) and trace flag facilities within the application
- Using the machine instructions that correspond to a copyright notice (or other string) to do useful computations within the program.
- Using the data at some location as both program data and executable machine code (a generalization of the previous item). This includes, for example, self-modifying code.
- Using lots of dynamically-linked libraries to make it difficult (or even impossible) for a disassembler to infer much about the external code.
- Creating wrappers for system APIs to make it difficult for heuristic analysis to make any headway processing those calls.
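
As a rough illustration of two of the ideas above (burying data in the code stream, and hiding instructions inside the bytes of another instruction), here is a minimal Python sketch. It is a toy, not a real disassembler: the decoder knows only five hand-picked 32-bit x86 opcodes (EB, B8, 40, 90, C3), and the names `CODE`, `decode`, `linear_sweep`, and `follow_flow` are invented for this example. A single junk byte (0xB8) placed after a short jump makes a straight-line decoder swallow the next four bytes as an immediate operand, while a decoder that follows the jump recovers the real instructions.

```python
import struct

# jmp short +1 ; one junk byte (0xB8) ; inc eax ; inc eax ; nop ; nop ; ret
CODE = bytes([0xEB, 0x01, 0xB8, 0x40, 0x40, 0x90, 0x90, 0xC3])

# One-byte instructions the toy decoder understands (32-bit mode encodings).
ONE_BYTE = {0x40: "inc eax", 0x90: "nop", 0xC3: "ret"}


def decode(code, off):
    """Decode one instruction; return (length, text, jump_target_or_None)."""
    op = code[off]
    if op == 0xEB and off + 2 <= len(code):      # jmp short rel8
        rel = struct.unpack_from("b", code, off + 1)[0]
        target = off + 2 + rel                   # relative to the next instruction
        return 2, f"jmp short {target:#x}", target
    if op == 0xB8 and off + 5 <= len(code):      # mov eax, imm32
        imm = struct.unpack_from("<I", code, off + 1)[0]
        return 5, f"mov eax, {imm:#x}", None
    if op in ONE_BYTE:
        return 1, ONE_BYTE[op], None
    return 1, f"db {op:#04x}", None              # unknown byte: emit as data


def linear_sweep(code):
    """Decode every byte in address order, ignoring control flow."""
    out, off = [], 0
    while off < len(code):
        length, text, _ = decode(code, off)
        out.append((off, text))
        off += length
    return out


def follow_flow(code, start=0):
    """Decode along the execution path (only unconditional jmp and ret are handled)."""
    out, off, seen = [], start, set()
    while 0 <= off < len(code) and off not in seen:
        seen.add(off)
        length, text, target = decode(code, off)
        out.append((off, text))
        if text == "ret":
            break
        off = target if target is not None else off + length
    return out


if __name__ == "__main__":
    for name, listing in (("linear sweep", linear_sweep(CODE)),
                          ("follow flow", follow_flow(CODE))):
        print(name)
        for off, text in listing:
            print(f"  {off:04x}: {text}")
```

The linear sweep reports a `mov eax` with a bogus immediate and never shows the two `inc eax` instructions; following the jump skips the junk byte and lists them. Real disassemblers use far more elaborate heuristics, so treat this only as a picture of the failure mode.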
My interest in this subject is twofold. I want to be able to discuss how to overcome these problems when using (or writing) a disassembler; I also want to discuss how to obfuscate object code to make it difficult to disassemble.

Any and all constructive comments, suggestions, and examples are welcome.

Cheers,
Randy Hyde