AI and decompilation?

The programs were reading our files. We already had record layouts for those files.

--
Dan Espen
Dan Espen

Yep.

One place I was working, they had a program whose source code had been lost and was reconstructed from the object code, and they were complaining that no one could work on it because of the variable and routine names.

Seemed easy enough to me and I fixed it up in a day or 2.

--
Dan Espen
Dan Espen

Why would you do that instead of reading a reference manual for the target architecture?

--
https://www.greenend.org.uk/rjk/
Richard Kettlewell

The documentation for the GPU on the RPi has not been published; he seeks to reverse engineer it from the binary code that implements a published API on it.

--
Steve O'Hara-Smith                          |   Directable Mirror Arrays 
C:\>WIN                                     | A better way to focus the sun 
The computer obeys and wins.                |    licences available see 
You lose and Bill collects.                 |    http://www.sohara.org/
Ahem A Rivet's Shot

Yes. I am certain that certain compilers and certain languages leave a fingerprint: always THAT register, used to do THAT job, always that particular sequence of assembly to mimic that high-level construct. I cut my teeth on microprocessor assembly, then C. Some things that are neat in assembler are ugly as sin in C. Take a call table. In assembler, you set up a range of memory whose contents are the addresses of subroutines. You load the accumulator with a number, left shift it once (each address is two bytes), add it to the contents of a register set to point to the base of that memory block, and use that register as a pointer to the location holding the address you want to 'call'. Simple, efficient and, provided you ensure nothing out of bounds is in the accumulator, bombproof.
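
A rough way to see what that assembler sequence does is to model it in C. This is a sketch under stated assumptions, not anyone's real machine: a hypothetical 8-bit CPU with 16-bit little-endian addresses (as on the 6502 or Z80), a table at an arbitrary base address 0x40, and made-up helper names. It computes the address the indirect call would go to rather than actually jumping there.

```c
#include <stdint.h>

/* Hypothetical 8-bit machine model: 256 bytes of memory holding a table
   of 16-bit little-endian subroutine addresses at TABLE_BASE. */
#define TABLE_BASE    0x40u
#define TABLE_ENTRIES 4u

static uint8_t memory[256];

/* Fill one table slot, as an assembler address-table directive would. */
void set_table_entry(uint8_t index, uint16_t addr)
{
    if (index >= TABLE_ENTRIES)
        return;
    uint16_t slot = (uint16_t)(TABLE_BASE + (index << 1));
    memory[slot]     = (uint8_t)(addr & 0xFF); /* low byte first */
    memory[slot + 1] = (uint8_t)(addr >> 8);   /* then high byte */
}

/* Mimic the call-table sequence: shift the accumulator left once (each
   entry is two bytes), add the table base, then fetch the address stored
   there. Unlike the bare assembler version, out-of-range indices are
   rejected here by returning 0. */
uint16_t call_table_target(uint8_t acc)
{
    if (acc >= TABLE_ENTRIES)
        return 0;
    uint16_t slot = (uint16_t)(TABLE_BASE + (acc << 1));
    return (uint16_t)(memory[slot] | (memory[slot + 1] << 8));
}
```

The bounds check is the part the assembler programmer has to guarantee by hand; leave it out and a stray accumulator value sends the CPU off through whatever bytes follow the table.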

Now try that in C: you need an array of pointers to functions, a simple check on the index you use, and then a call to the function whose address is in that array. I never ever managed to get an 8-bit compiler to actually do that. People just don't call through an array of pointers to functions.
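
For comparison, a minimal C sketch of the same call table written as an array of pointers to functions with a bounds check. The handler names and the -1 error value are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Hypothetical handlers standing in for the subroutines in the table. */
static int cmd_reset(int x)  { (void)x; return 0; }
static int cmd_inc(int x)    { return x + 1; }
static int cmd_double(int x) { return x * 2; }

/* The call table: addresses of the handlers, indexed from 0. */
static handler_fn const handlers[] = { cmd_reset, cmd_inc, cmd_double };
#define N_HANDLERS (sizeof handlers / sizeof handlers[0])

/* Check the index, then call through the pointer stored in the table. */
int dispatch_table(unsigned idx, int arg)
{
    if (idx >= N_HANDLERS)
        return -1; /* out of range: refuse rather than jump wild */
    return handlers[idx](arg);
}
```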

It's easier by far to set up a switch statement, which takes care of out-of-bounds values with a default, and ends up producing a chain of if..else if..else conditional calls to hardwired functions.
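
The switch-statement version of the same dispatch, again with hypothetical handler names and error value. The default label catches out-of-range values. Note that optimising compilers may themselves turn a dense switch like this into a jump table rather than a chain of comparisons, so the generated code can end up closer to the assembler version than the source suggests.

```c
/* Hypothetical handlers, same behaviour as in a call-table version. */
static int op_reset(int x)  { (void)x; return 0; }
static int op_inc(int x)    { return x + 1; }
static int op_double(int x) { return x * 2; }

int dispatch_switch(unsigned idx, int arg)
{
    switch (idx) {
    case 0:  return op_reset(arg);
    case 1:  return op_inc(arg);
    case 2:  return op_double(arg);
    default: return -1; /* out-of-range index handled here */
    }
}
```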

That's how you write it, because it's pretty much as fast on a pipelined processor, RAM is cheap, and comprehensibility beats programming elegance hands down in the real world.

I've examined a lot of compiled machine code, and it's pretty easy to tell what language it was written in, and roughly how it was written. Stack-based variables are a bit of a giveaway, pointing to C or a similar language. Highly optimised compilers of course automatically obfuscate things, but that's the fun, isn't it?

I gave up writing assembler for x86 CPUs when the GNU compiler was patently doing a better job than I could in assembler. Its ability to take something long-winded and easy to understand, completely rearrange it and turn it into three lines of incomprehensible assembler was to be respected.

I think it is, up to a point, entirely possible to make an AI that could replace machine code with editable and compilable source code. But there will always be the Problem Of Induction. Many, many possible source constructs, using an infinite number of arbitrary variable and function names, could compile to the same object code. And there is no way to reinstate the comments either, so it ultimately becomes an exercise in hand editing and reinstating the comments manually, almost as big a job as writing from scratch.
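
A small illustration of that many-to-one mapping, with hypothetical names: the two C functions below are written differently but compute the same thing, and with optimisation enabled a compiler will often emit the same instructions for both. Nothing in the object code records which spelling, or which names, the programmer chose.

```c
/* How a human might have written it (name chosen for illustration). */
int scale_and_offset(int count, int base)
{
    return count * 2 + base;
}

/* The same computation, spelled differently, under a machine-style name. */
int sub_401a(int a, int b)
{
    int t = a + a; /* doubling by addition instead of multiplication */
    t = t + b;
    return t;
}
```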

I suspect this is how Linux developers write free drivers for proprietary hardware: disassemble the manufacturer's drivers and at least mimic the program flow, if not the actual source code.

--
     "I know that most men, including those at ease with problems of the
greatest complexity, can seldom accept even the simplest and most
obvious truth if it be such as would oblige them to admit the falsity of
conclusions which they have delighted in explaining to colleagues, which
they have proudly taught to others, and which they have woven, thread by
thread, into the fabric of their lives."

     - Leo Tolstoy
The Natural Philosopher

+1001
--
"First, find out who are the people you can not criticise. They are your  
oppressors." 
      - George Orwell
The Natural Philosopher

Yes, I understand how you can disassemble a simple program. I did it myself in the 1980s.

However modern programs are much more complex. They are built upon many levels of indirection, libraries, composition, inheritance, function pointers, events, etc, etc... We use structure, design patterns and such like to allow us to recognise complex ideas quickly. That gets lost in compilation.

I just can't see how I would reverse engineer an understanding of anything but the most simple disassembly in any reasonable time frame.

Pancho

Well, you have hints from what the code does. Let's say you have code that loads data from two stack-based memory locations, adds them together and uses the result to access what is clearly an array. That gives a strong hint that the original variables are integers, and that the index is simply a temporary used to address that array, so you call it 'i' or 'arrayIndex' pro tem...

Then, once you have an idea of what data that array holds, you can rename it and the index to something more meaningful.
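
A sketch of that renaming step in C. The first function is written in the style of raw decompiler output, with names invented from stack offsets; the second is the same logic after a human has guessed what the array holds. All names here are hypothetical.

```c
/* Style of raw decompiler output: names derived from stack offsets. */
int sub_1040(const int *arg_0, int var_8, int var_c)
{
    return arg_0[var_8 + var_c];
}

/* Same logic after renaming, once the array's contents are understood.
   "readings", "first_slot" and "offset" are illustrative guesses. */
int reading_at(const int *readings, int first_slot, int offset)
{
    int arrayIndex = first_slot + offset; /* the pro tem index name */
    return readings[arrayIndex];
}
```

Both functions compile to the same behaviour; only the second tells the next reader anything.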

The whole process is actually covered in philosophy: It is the problem of induction. How do you work back from results to causes?

Given that the answer to Life The Universe and Everything was '42', what in fact was the question? (40+2)? (6x7)?

There are an infinite number of expressions that give that answer, and an infinite number that don't.

This is where Karl Popper's philosophy of science steps in. Instead of regarding there as being One True Reason why science works, namely that scientists are in the business of discovering the Truth, he pointed out that just because stuff worked (and 6x7 does indeed give 42), that was no reason to suppose that some other, completely different construct might not work equally well, and that had indeed happened with relativity and Newtonian gravity.

The Problem of Induction is that many theories can give the same predicted result. Sherlock Holmes is a sham. The Dog That Didn't Bark in the Night didn't bark, allegedly, because it knew the thief. Why? It might have been abducted by aliens, drugged, actually out hunting rabbits, or in a soundproof box, or the Russians did it using a robot, or it was just too plumb wore out with old age to care.

The truth is not provable. All we have is stuff that works. Given running machine code, there are an infinite number of source codes that might have produced it, and an infinite number that did not.

We aren't there, ultimately, to reproduce *the* exact source, but to arrive at *an* editable source that we can use. Like science, and religion, it doesn't have to be true to be useful, and like science, and religion, its ultimate content will be forever truth-undecidable.

--
"First, find out who are the people you can not criticise. They are your  
oppressors." 
      - George Orwell
The Natural Philosopher

I was under the impression it was a VideoCore IV, which appears to be sufficiently documented for GNU toolchain port.

formatting link
formatting link

--
https://www.greenend.org.uk/rjk/
Richard Kettlewell

If that became possible, it would not be a far step for an AI machine to analyse itself or another AI machine. It could make clones and unwittingly modify them.

Who knows where that could lead, or what mutations could happen? Life?

The Chinese would be very interested in you.

I'm sure some of the architecture is provided in layers, some public, like frame buffers, and some not, like acceleration features. So your machine code experiments could be done on the former, to learn to walk first. Or choose another, more open graphics chipset if you need more documentation to get to first base. Perhaps there is one on a low-end mobile phone?

Here's a manual way of reverse engineering random Chinese hardware.

[016] IT9919 Hacking - part 1 - Reading firmware with flashrom
formatting link

Your AI solution would have to replicate the ability of the human.

--
Adrian C
Adrian Caspersz

ISTR that my attack on the executable started by seeking out lines of code that might be subroutine calls ("JSR PC, address" in the PDP-11 code). This served to create a number of identifiable, separate blocks from which to proceed.
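
A naive sketch of that first pass in C, assuming a raw memory image of 16-bit PDP-11 words and a caller-supplied load address; the function name is hypothetical. It looks for two common encodings of JSR PC: 004767 (PC-relative, the usual form of "JSR PC, label") and 004737 (absolute, "JSR PC, @#addr"). A linear scan like this will also report data words that merely look like instructions, so the results are only candidate entry points.

```c
#include <stddef.h>
#include <stdint.h>

/* PDP-11 encodings (octal). JSR is 004RDD: R = link register, DD = dest. */
#define JSR_PC_RELATIVE 0004767u /* JSR PC, label  (mode 6, reg 7) */
#define JSR_PC_ABSOLUTE 0004737u /* JSR PC, @#addr (mode 3, reg 7) */

/* Scan nwords 16-bit words loaded at byte address base and record up to
   max candidate subroutine entry points in out. Returns the count. */
size_t find_jsr_targets(const uint16_t *words, size_t nwords, uint16_t base,
                        uint16_t *out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i + 1 < nwords && found < max; i++) {
        uint16_t op = words[i];
        uint16_t operand = words[i + 1];
        uint16_t here = (uint16_t)(base + 2 * i); /* byte address of op */

        if (op == JSR_PC_RELATIVE) {
            /* PC has moved past the opcode and the offset word */
            out[found++] = (uint16_t)(here + 4 + operand);
            i++; /* skip the operand word */
        } else if (op == JSR_PC_ABSOLUTE) {
            out[found++] = operand;
            i++; /* skip the operand word */
        }
    }
    return found;
}
```

Sorting and de-duplicating the returned addresses gives the list of block boundaries to work from.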

Of course, this was much easier as it was a stand-alone paper tape program with no operating system underneath to muddy the water.

gareth evans

+1
--
--   
Martin    | martin at 
Gregorie  | gregorie dot org
Martin Gregorie

Indeed!

I've discussed this before (and probably too often, according to my biographers and stalkers!), but I'm interested in computers for themselves, as wonderful complex machines, and not in what you can use them for.

My frustration lies with the Raspberry Pi series, which come, for very little outlay of pennies, with a multiprocessor graphics chip that is believed to exceed the capabilities of the associated ARM processor, but about which no detailed information is forthcoming.

gareth evans

Because no such manuals are available. The Broadcom GPUs are a closely guarded proprietary secret as far as hoi polloi are concerned.

gareth evans

That's an interesting and thought-provoking aside!

gareth evans

The first of those does not produce anything.

Does the second describe the GPU in some detail and describe the instruction set such that I might produce my own binary blob to do something completely different?

Also, AIUI, a different GPU has been incorporated into the 64-bit RPis.

Anyway, thanks for your input.

gareth evans

Because there are features not described in the reference manual.

J. Clarke

One of my former colleagues did a Ph.D. on it:

formatting link

--
Using UNIX since v6 (1975)... 

Use the BIG mirror service in the UK: 
 http://www.mirrorservice.org
Bob Eager

The Natural Philosopher wrote:

One thing that is hard to do with C is to have different entries to the same function, something like:

    bar:    .cfi_startproc
            ... do something
    foo:    ... do something else
            ret

and then either call foo or bar.
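
C has no direct way to give one function two entry points. The closest sketch, with a hypothetical global counter standing in for "do something", is to have bar do its own work and then call foo; an optimising compiler may inline foo or turn that final call into a jump, which gets close to the fall-through layout above, but the source no longer says so.

```c
/* Hypothetical shared state touched by both entry points. */
int work_done;

/* "foo": the second entry point, does something else. */
void foo(void)
{
    work_done += 10;
}

/* "bar": does something, then carries on into foo's code. */
void bar(void)
{
    work_done += 1;
    foo();
}
```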

Thomas Koenig

Adrian Caspersz wrote:

The solution to the halting problem :-)

Thomas Koenig

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.