8052 emulator in C

Anybody got a simple 8052 emulator in C source? I'm trying to reverse engineer some code and would like to emulate/simulate it to get a better understanding. It looks like it was written in C and compiled by a very bad compiler.

joolz

Reply to
joolzg

In message , joolzg writes

What is the target MCU? The 51 family is huge (over 600 variants) and whilst the cores are similar there are some big differences.

Why do you want the source of the simulator?

How do you know the binary was written in C?

How big is the binary?

What is it supposed to do?

Reply to
Chris H

You don't want an emulator, you want a de-compiler or reverse compiler.

An emulator will just execute the binary code as the real hardware would.

Using the binary to get the C back is impossible!!!!

Except for very simple programs.

Even if you had the compiler sources and understood the compile process, you still would not be able to get the binary -> C conversion to work.

But, have fun and good luck.

hamilton

Reply to
hamilton

Actually, for some simple-minded compilers, you can often reverse engineer the code to get much of the "C" source (neglecting variable names, some expressions, etc.). This is especially true of old/early compilers that didn't do much optimization.

I was able to recreate C source for a client's libraries from binaries using this approach. Though it required a fair bit of "organic computing" to recognize the "patterns" in the code (a decompiler wasn't available). Of course, familiarity with the product (application) goes a long way -- especially when it comes to annotating the sources!

Note that the "organic" method can be painfully slow -- I was only able to decompile a few KB per week. :< But, the alternative is to recreate the sources from the *specification*...

[if you've never done this, it can be a really fun problem! Sure beats crossword puzzles!]
Reply to
D Yuniskis

For years I have heard that story.

I have always asked people to show me links to the compiler in question, so I will ask: do you have any links to this "simple compiler"?

I took a compiler class 30 years ago, and my professor at the time stated that it was not possible. With the better compilers available today it would be even more impossible.

Being familiar with the code is the only way to get back the C code. But the OP seems to have no knowledge of the application.

I have lost sources in disk crashes and have had to re-create the C sources by watching the operation of the application.

Reverse-engineering is always easier when you have a good idea of what is supposed to happen.

Yes, building a spec for a function's code is not bad, but as you say, very slow.

A few years ago I had a company that needed to reverse engineer its legacy 68HC11 assembly product.

I was able to recreate most of the application in C, but some of the algorithms were so convoluted that I could never understand the disassembly.

So, we repackaged the assembly into a C in-line assembly function and everything still worked.

Lucky!!!

Reply to
hamilton

Have you looked at e.g. Hex-Rays? From what I've seen of it, it's pretty good at what it does.

-a

Reply to
Anders.Montonen

There's Daniel's s51 simulator [1], which is used in the SDCC [2] debugger.

-a

[1] [2]
Reply to
Anders.Montonen

In reply to "Chris H" who wrote the following:

Analog Devices ADuC84x

So I can add in a serial driver and the output display, you know, make the simulator behave like the real thing, with inputs and outputs.

I can tell from the way the code is written!! Can't you tell the difference between human and machine-created code?

64k, but not all used.

Can't say.

Reply to
joolzg

In reply to "hamilton" who wrote the following:

I've got that already, I want to SIMULATE THE CODE and give the code real inputs so I can validate my findings.

I will be rewriting it for another CPU as well, so I want to find out as much as I can.

joolz

Reply to
joolzg

In reply to " snipped-for-privacy@kapsi.spam.stop.fi.invalid" who wrote the following:

I want to simulate the code, hence the question!!!!! I already have the binary and disassembly.

joolz

Reply to
joolzg


I guess "simple compiler" refers to some 1970's compilers for PDP-11, Intel Intellecs and Motorola Exorcisers.

Writing compilers for these platforms was problematic due to the 64 KiB address space limit. Overlay loading helped a lot (each compilation phase in a separately loaded overlay branch), but you still had to reserve space for the symbol table, which had to be kept in memory constantly. Overlay loading from floppies was also very slow, so not much optimization could be done. For this reason, getting good assembly output from a compiler was not the standard situation.

I once wrote an object code disassembler for the PDP-11. Compared to ordinary disassemblers, an object code disassembler can also display the global symbols defined in the module, as well as any external function names (including library function names) in plain text.

I analyzed quite a few object files generated by Fortran, Pascal and C compilers, and by "organic matching" I was able to work out how each compiler generated code. After that, it was quite easy to reverse engineer some algorithms.

These days, with good compilers, it is much harder to reverse engineer things based purely on the executable code.

Reply to
upsidedown

The DOS version was written in Pascal; the Unix version is written in C++, as you would have noticed if you'd downloaded the source code.

-a

Reply to
Anders.Montonen

In message , joolzg writes

Use the Keil simulator

Reply to
Chris H

I doubt it will work.

Reply to
Chris H

In message , joolzg writes

This is NOT a true 8051/52 core. Read the documentation: it is "based on" an 8052. Not all the 8051 simulators will handle non-standard 8051 parts like this one.

Then use the Keil simulator, which can do this already.

Yes... However, you cannot tell which HLL was used.

Use the Keil simulator.

Reply to
Chris H

You are going to a lot of work to reverse engineer an application. Why is this needed?

w..

Reply to
Walter Banks

Of course you do.

-a

Reply to
Anders.Montonen

It shouldn't be too hard to write a simulator yourself for a processor like this. It's quite an effort if you want it to be fast, or to accurately simulate interrupts and peripherals, but the core itself is easy -- you have an array to hold "ram", an array for "flash", a struct holding the registers, and a huge switch statement interpreting each instruction.
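
Roughly like this -- just a sketch to show the shape of it, with made-up names, only a handful of opcodes decoded, and no flags, SFRs, timers or interrupts (which is where the real work ends up):

#include <stdint.h>
#include <stdio.h>

/* Sketch of an 8052-style core interpreter. Names are illustrative;
   PSW flags, SFRs and interrupts are all omitted. */
typedef struct {
    uint8_t  acc;      /* accumulator */
    uint8_t  sp;       /* stack pointer */
    uint16_t dptr;     /* data pointer */
    uint16_t pc;       /* program counter */
} cpu_t;

static uint8_t code_mem[65536];   /* "flash" (code space) */
static uint8_t iram[256];         /* internal RAM (SFRs lumped in here) */

static void step(cpu_t *cpu)
{
    uint16_t at = cpu->pc;
    uint8_t op = code_mem[cpu->pc++];

    switch (op) {
    case 0x00:                                 /* NOP */
        break;
    case 0x74:                                 /* MOV A,#imm */
        cpu->acc = code_mem[cpu->pc++];
        break;
    case 0x24:                                 /* ADD A,#imm (flags ignored here) */
        cpu->acc += code_mem[cpu->pc++];
        break;
    case 0xF5:                                 /* MOV direct,A (>= 0x80 is really an SFR) */
        iram[code_mem[cpu->pc++]] = cpu->acc;
        break;
    case 0x80:                                 /* SJMP rel */
        cpu->pc += (int8_t)code_mem[cpu->pc] + 1;
        break;
    /* ...and so on for the remaining ~250 opcodes... */
    default:
        printf("unhandled opcode %02X at %04X\n", op, (unsigned)at);
        break;
    }
}

int main(void)
{
    cpu_t cpu = { 0 };
    /* load code_mem[] from the binary image here, then: */
    for (int i = 0; i < 1000; i++)
        step(&cpu);
    return 0;
}

Adding the peripherals then mostly means trapping reads and writes of particular SFR addresses (the UART data register, port latches, and so on) instead of treating them as plain RAM, which is where a serial driver or display hook would go.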

Reply to
David Brown

It's relatively easy to disprove a negative. :> I'll drag out some examples and post them here. I think you'll see that most of these early compilers were pretty "straightforward" in the way they emitted code. You could look at stanzas and deduce what they were created from (of course, you couldn't tell "a == b" from "b == a" -- though sometimes you could distinguish "a > b" from "b < a"!).

I remember thinking about "peephole optimizers" and wondering how they could be effective ("Shirley the compiler knows what code it *just* emitted? Why would it ever do something as inane as 'STORE X; LOAD X'?"). But, if you saw how stanzas were "pasted" together, you could see lots of opportunities for this kind of micro-optimization!

Perhaps Walter can shed some light on what his products were doing in the mid 80's and how they've progressed (along with *why*)?

A lot depends on the code being compiled, the level of optimization used, the optimizations *available* and the actual target itself. E.g., older "single register" machines required lots of shuffling to get arguments into an accumulator where they could be operated on.

Also, older devices didn't have niceties like "MUL" or (gasp!) "DIV". So, the repertoire of "helper functions" gave you lots of insight into what the code was actually doing. And, those helpers didn't have "short-circuits" where the compiler could do a "partial" operation, etc.
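
For example, on a target with no hardware multiply, the compiler typically drags in a shift-and-add helper that does roughly the following (a sketch in C here; the real helper is hand-written assembly in the runtime library, and its name depends on the compiler):

#include <stdint.h>

/* Illustrative 16x16 -> 16 multiply helper of the kind an early compiler
   calls when the target has no MUL instruction (result truncated to 16 bits).
   This just shows the shift-and-add pattern you learn to recognize. */
uint16_t mul16(uint16_t a, uint16_t b)
{
    uint16_t result = 0;

    while (b != 0) {
        if (b & 1)        /* low bit of multiplier set: accumulate */
            result += a;
        a <<= 1;          /* multiplicand doubles each iteration */
        b >>= 1;          /* move to the next multiplier bit */
    }
    return result;
}

Once you've spotted that loop (or its unrolled cousin) once, every call to that address can be annotated as a multiply, which cleans up a surprising amount of the listing.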

I disagree. You can get back code that will recompile into the same binary. You can further embellish that with some ideas as to what the code is *likely* doing. As far as the ultimate application...

If you have the compiler (and binary libraries) available, you have a huge headstart. You can feed it test cases to see what the code looks like for various C constructs. You can see which helper functions get dragged in and, thus, start giving those real "names".
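
Concretely, you can feed it a little "fingerprint" file like the one below (the function names are purely illustrative) and diff the generated code against the stanzas in the target binary -- it quickly tells you how that compiler handles comparisons and loops, and which runtime helpers get pulled in:

#include <stdint.h>

/* Tiny fingerprint file: compile with the suspect compiler at the likely
   optimization settings and compare the output against the target binary. */

volatile uint8_t port;                        /* forces real loads/stores */

uint16_t probe_mul(uint16_t a, uint16_t b) { return a * b; }   /* which mul helper? */
uint16_t probe_div(uint16_t a, uint16_t b) { return a / b; }   /* which div helper? */

uint8_t probe_cmp(uint8_t a, uint8_t b)       /* comparison/branch stanza */
{
    if (a > b)
        return a - b;
    return b - a;
}

void probe_loop(uint8_t n)                    /* loop + store stanza */
{
    uint8_t i;

    for (i = 0; i < n; i++)
        port = i;
}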

If you have the hardware available (or at least the memory map), you have known starting points for the code -- instead of picking a spot "at random".

Chances are, it uses some part of the standard libraries. These are relatively easy to recognize. So, you can put names on their entry points and back-annotate all references to them as they are encountered.

It's trivial to identify the strings in most applications (though some might go to some lengths to protect or hide them -- but that is rare and starts competing with the compiler since *it* has a notion of what constitutes a "string"). So, library functions that use strings (e.g., printf et al.) can be identified. Also, strings often give you information about the data *referenced* there -- e.g., "%d records processed.\n".

Finally, most older processors used in embedded systems were small. Few systems could afford gobs of (EP)ROM for multimegabyte images. Likewise, tens of KB of RAM was a lot. It's not like trying to reverse engineer MSBloatware...

Sure. But it isn't a necessary prerequisite.

There are (big name) firms whose businesses are based on reverse engineering other people's products -- e.g., to make something "compatible" with a closed system.

In the process, one can often find obvious "mistakes" or opportunities for improvement that the original designers overlooked.

One of my first jobs was at a firm that designed marine navigation equipment (among other things). I recall the "excitement" when a Japanese firm expressed an interest in one of our RADAR sets. I think they purchased 25 of them "for evaluation".

Some time later, *they* produced a similar product. It was very obvious that it was "heavily inspired" (avoiding the term "copied") by our set.

My boss grumbled at the lost business and having been "suckered". In the next breath, he pointed out how the "competing design" had lots of little changes that were incredibly obvious after-the-fact... but, that had been omitted in our design!

E.g., the antenna (rotor) emitted rotational pulses to tell the display which way it was pointed. This allowed the sweep in the display to be synchronized (angularly) to the antenna's position. Of course, this was done by mounting an optointerrupter and encoder wheel (slotted disc) on the antenna's shaft. I think the encoder had perhaps 1 degree azimuth resolution -- or something like that. It was relatively costly to manufacture the disc since it was done photographically, etc.

The competing product had a crude disc with perhaps 9 (!) slots cut in it. It looked like something that a child would fashion out of cardboard. *But*, the disc was mounted on the high side of the reducing gearbox that drove the antenna shaft. So, it rotated 40 times faster than the antenna! (i.e., same sort of information coming from the antenna but much lower manufacturing costs).

Without seeing "our design" with that modification made to it, I doubt it ever would have occurred to anyone!

Reply to
D Yuniskis
