Not sure what you're after, but one weakness of the GNU tools is the lack of proper listings. The way you get one is via objdump, but that means the code has to assemble in the first place (i.e., you can't get a listing showing where your errors are).
Example:
pi1> cat argh1.s
ldr r0,=0x20200000
mov r1,#1
str r1,[r0,#4]
mov r1,#1
lsl r1,#16
str r1,[r0,#40]
loop$: b loop$
pi1> cc -c argh1.s
pi1> objdump --disassemble-all argh1.o
There is some rather interesting stuff going on. The assembler will change the LDR into a MOV if possible. Otherwise it inserts it as a load from memory (from a literal pool - which is usually close to that instruction).
Note that in ARM assembler the immediate form of the second operand can be 12 bits long at most. It isn't actually stored as a 12-bit number, though. The low 8 bits hold a value and the upper 4 bits encode the number of Rotate Rights to apply to it. Also, it only does an even number of RORs, so the number of rotate rights is 2 * the number encoded in those 4 bits.
This is apparently more efficient.
BTW - it is all documented.
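As a rough illustration of the rotated-immediate scheme described above, here is a small Python sketch (Python is just a convenient calculator here, and the function names are made up for this example). It searches for an 8-bit value and 4-bit rotate field that reproduce a given 32-bit constant, and returns nothing when no such pair exists.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_arm_immediate(value):
    """Return the 12-bit operand field for `value`, or None if there is none."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Rotating left by 2*rot undoes a rotate right by 2*rot.
        imm8 = ror32(value, (32 - 2 * rot) % 32)
        if imm8 < 256:
            return (rot << 8) | imm8
    return None


def decode_arm_immediate(imm12):
    """Expand a 12-bit operand field back into the 32-bit value it stands for."""
    return ror32(imm12 & 0xFF, 2 * (imm12 >> 8))


if __name__ == "__main__":
    for n in (0xFF, 0x40000, 0xDE000000, 0x20200000):
        field = encode_arm_immediate(n)
        text = "no single-instruction encoding" if field is None else hex(field)
        print(hex(n), "->", text)
```

With this sketch, 0x40000 (that is, 1 shifted left by 18) comes out as the field 0x701, while 0x20200000 has no encoding at all because its two set bits are nine positions apart. That is consistent with the assembler turning `ldr r0,=0x20200000` into a literal pool load rather than a MOV.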
--
Andy Leighton => andyl@azaal.plus.com
"The Lord is my shepherd, but we still lost the sheep dog trials"
- Robert Rankin, _They Came And Ate Us_
It is not unusual for a 32-bit instruction machine to require two (or more on a less efficient architecture) instructions to synthesize a 32-bit immediate value. Of course, if the actual value fits in much less than 32 bits, an efficient architecture can load it in a single instruction with immediate data.
And you are correct, a lazy compiler/assembler writer may only implement the most general case, even though it deoptimizes the most common cases.
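To make the "two or more instructions" point concrete, here is a hedged Python sketch that builds an arbitrary 32-bit value from one `mov` and up to three `orr` instructions, one byte at a time, and skips bytes that are zero. The function names are invented for the example, and the byte-at-a-time split is only one simple strategy, not what any particular assembler does. A smarter encoder would notice values that fit a single rotated immediate even when they straddle a byte boundary.

```python
import struct

MOV_IMM = 0xE3A00000  # mov rd, #imm      (condition "always")
ORR_IMM = 0xE3800000  # orr rd, rn, #imm  (condition "always")


def synthesize_constant(rd, value):
    """Return ARM instruction words that build `value` in register `rd`."""
    value &= 0xFFFFFFFF
    words = []
    for k in (3, 2, 1, 0):                  # most significant byte first
        byte = (value >> (8 * k)) & 0xFF
        if byte == 0:
            continue                        # nothing to OR in for this byte
        rot = ((32 - 8 * k) % 32) // 2      # rotate field that puts byte at bit 8*k
        imm12 = (rot << 8) | byte
        if not words:
            words.append(MOV_IMM | (rd << 12) | imm12)
        else:
            words.append(ORR_IMM | (rd << 16) | (rd << 12) | imm12)
    if not words:
        words.append(MOV_IMM | (rd << 12))  # value is zero: mov rd, #0
    return words


def to_bytes(words):
    """Pack instruction words as little-endian bytes, as they sit in memory."""
    return struct.pack("<%dI" % len(words), *words)


if __name__ == "__main__":
    for word in synthesize_constant(0, 0xDEADBEEF):
        print("%08x" % word)
```

A value such as 0xAA needs only the single `mov`, so the common small-constant case stays at one instruction and only the general case grows to four.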
The ARM architecture is quite well designed, and that sometimes means that only hardware-significant fields, like source registers, are all aligned. Other fields may just "fill in" around the fields that must be available to hardware prior to complete decoding.
--
-michael - NadaNet 3.1 and AppleCrate II: http://home.comcast.net/~mjmahon
OK let's design a minimal Virtual-machine. The idea is to have a minimal set of instructions, which can be put at the start/reset-vector, to debug a dud rPi, in the most primitive way.
AFAIK the ports are memory mapped? Here's a simple task:
REPEAT
  IF Inport=%5 THEN OutPort:=%A ELSE OutPort:=%5
FOREVER
So let's see what general purpose instructions this needs.
Probably the most 'general' is to do arithmetic/logical operations in 2 standard accumulators, which I expect ARM can do?
Or is it more 'general' to ALU memory? I want to select a minimal general instruction set.
Let's accept ARM's method of having the 'constants' in memory. In this case: Mem%5, Mem%A
Ah. I've noticed that the particular ARM Architecture Reference Manual I've referenced here (DDI-0100) is ARMv5. While it's good for the embedded stuff I do, Raspberry Pi uses the ARMv6 architecture. I don't think ARM makes the appropriate manual available anymore (I'm only seeing ARMv6-M, which covers the microcontrollers using only Thumb instructions), but you should be able to make use of ARMv7-AR manual (DDI-0406) if you're careful to avoid the ARMv7 specific bits.
OK, thanks. Since you are well experienced with ARM/RPi, can you see any way of producing a relative-address P-code? That means a sequence of bytes which will do, e.g.:
#N -> rX
i.e. LoadImmediate Hex:000000AA into RegisterX. I've used 32-bit data, since I assume that's what's done.
As we've discussed here, the assembler would put the constant [%AA in this case] at a location after the 'executing code'. Ie. the "#N" is separated from the complete/atomic instruction: "#N -> rX". That's no good!
I want to build a set of Pseudo-instructions, which would make a virtual-machine, which is directly programmable, by just `dd`-ing the byte-sequences, of the various p_codes, directly into the file [which has been backed-up], which normally 'starts' the RPi, as explained in the excellent Cambridge-Uni tutorial.
Perhaps I don't actually need access to 2 registers/accumulators, if ALU-ing is easily done to memory, but as an example, the 2 p-codes: #5 -> R0 #4 -> R1 must be executable by 2 stand-alone sequences of bytes, with no absolute address references, so that the byte-sequences can be relocated to any address.
LoadImmediate is the most elementary P-code, and if I can't do that, then my project has failed.
Another approach I've used when I was generating code on the fly and didn't have a good place to stash the constant was to break it up into bytes. Something like:
I want to be able to enter at the shell/CLI eg. Reg1
That seems not to solve the problem.
I don't want to know another syntax. Years ago I wrote a kwik assembler, realising that the docos explained eg: `ldi A0, #43` as: "#43 -> Reg0". So I just used: "#43 -> R0" for my syntax.
====
]Something like:
]
]   ldr r3,[pc,#0]   ; pick up the constant (pc points to .+8)
]   b .+8            ; branch around constant
]   .long 0xdeadbeef ; R3 would be good.
If you can post the actual byte sequence, eg. via the listing, I'll test it per:
formatting link
which shows:
mov r1,#1
lsl r1,#18
str r1,[r0,#4]
...
mov r1,#1
lsl r1,#16
str r1,[r0,#40]
...
loop$: b loop$
So we need Pseudo instructions for:
#N -> Rr
Rr -> Memory:m
branchRelative
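For the store and branch items in that list, a hand-encoder is short enough to sketch in Python. This is only an illustration under a few assumptions: ARM (not Thumb) state, the "always" condition unless told otherwise, a store addressed as base register plus a small byte offset, and helper names (`enc_str`, `enc_b`, `le_bytes`) that are invented here. Both encodings contain only register numbers and relative offsets, so the resulting bytes do not depend on where they are loaded.

```python
import struct


def enc_str(rt, rn, offset):
    """str rt, [rn, #offset] with a 12-bit byte offset (either sign)."""
    if not -4095 <= offset <= 4095:
        raise ValueError("offset does not fit in 12 bits")
    up = 1 if offset >= 0 else 0            # U bit: add or subtract the offset
    return 0xE5000000 | (up << 23) | (rn << 16) | (rt << 12) | abs(offset)


def enc_b(offset, cond=0xE):
    """b to (this instruction's address + offset); offset is in bytes."""
    if offset % 4:
        raise ValueError("branch target must be word aligned")
    imm24 = ((offset - 8) >> 2) & 0xFFFFFF  # pc reads as instruction address + 8
    return (cond << 28) | 0x0A000000 | imm24


def le_bytes(word):
    """One instruction word as the four little-endian bytes stored in memory."""
    return struct.pack("<I", word)


if __name__ == "__main__":
    for word in (enc_str(1, 0, 4), enc_str(1, 0, 40), enc_b(0)):
        print("%08x" % word, le_bytes(word).hex())
```

Here `enc_str(1, 0, 4)` corresponds to `str r1,[r0,#4]` and `enc_b(0)` to a branch-to-self such as `loop$: b loop$`. The bytes printed are in memory order, which is the order you would want for writing them into a file with `dd`.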
I'll post my test results and proposed script. If it's ok, we can extend it to comparing-values & conditional branching.
Without extra effort I haven't been able to parse your contribution. But the intention is to AVOID 'the assembler'. Apparently I managed to explain to roger ivie.
Each one of the few pseudo-instructions must be a stand-alone/atomic byte-sequence, and position independent.
Since you're @ Organization: University of Cambridge, England, perhaps you can send the byte-sequences for the later examples of cam.ac.uk/projects/raspberrypi/tutorials/os/ok02.html ..03...4 ==TIA
What he is saying is that position-independent code to load constants is perfectly possible, and although the assembler would normally take care of it, you don't have to use it: you can 'hand assemble'.
Essentially 'load register from contents of where the program counter is now, plus three, then add 5 to program counter to skip the bit of memory with the constant in, ' is how this is normally achieved in ARM code.
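For the `ldr` / `b .+8` / `.long` sequence quoted earlier in the thread, the real offsets can be pinned down with a short Python sketch. The helper name is made up for the example, and it assumes ARM state and little-endian memory. Because the PC reads as the current instruction's address plus 8, an offset of 0 in the `ldr` lands exactly on the word after the branch, and the branch needs a displacement of 0 to hop over that word.

```python
import struct


def load_constant_pcrel(rd, value):
    """Bytes for: ldr rd,[pc,#0] ; b .+8 ; .long value"""
    ldr = 0xE59F0000 | (rd << 12)   # ldr rd, [pc, #0]; pc reads as this address + 8
    skip = 0xEA000000               # b .+8: hop over the constant word
    return struct.pack("<III", ldr, skip, value & 0xFFFFFFFF)


if __name__ == "__main__":
    seq = load_constant_pcrel(3, 0xDEADBEEF)
    print(seq.hex())
```

For r3 and 0xdeadbeef the twelve bytes are 00 30 9f e5, 00 00 00 ea, ef be ad de. Nothing in them refers to an absolute address, so the same twelve bytes work wherever they are placed, provided they start on a word boundary.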
As I understand it. Never programmed an ARM in me life tho :-(
It's the ability to load from PC plus offset that is key to this. It means you have plenty of room at or near the current PC to stick constants in. And adding a small constant to the PC works as a sort of relative unconditional GOTO.
Because label one and label2 are close to the program counter they can be expressed as relative offsets rather than absolute locations.
This is NOT the 'Intel Way' of course. To do the same in that you would have to transfer the contents of the program counter to a register, add a constant to it, load the register from that memory and then probably increment the pointer again, and transfer it to the program counter or do a relative jump. Far simpler to do a load immediate, which it understands.
It looks to me like the genius of ARM at work 'how can we work with one size of instruction only'
Depending on how big op codes are, the tail end of a 32 bit fetch as an offset or an address is likely to be quite a big number.
So you CAN address the full 32 bits of memory OR load a 32-bit constant simply by putting that constant IN a bit of memory and using a load from memory addressed by register plus offset, also in a 32-bit instruction.
So it's no worse than a 32-bit load immediate, which is also two bus cycles.
Looks like a nice processor to code in assembler anyway. So long as you understand how to map what you want onto its instruction set, but that was always the challenge of assembler anyway.
IIRC Z80 was a pig for inter-register transfer, but all registers could be pushed and popped off the stack so that was how you did it.
--
Ineptocracy
(in-ep-toc'-ra-cy) - a system of government where the least capable to
lead are elected by the least capable of producing, and where the
members of society least likely to sustain themselves or succeed, are
rewarded with goods and services paid for by the confiscated wealth of a
diminishing number of producers.
====snippage of stuff outside of my area of expertise====
Just to pick up on this point, if speed was critical, you could always toggle between the prime and secondary registers (IIRC, there were two such directives, one for the AF register pair and another for the general register pairs (BC, DE and HL)).
Getting stuff between the normal and shadow registers is a bit of a fiddle because BC/BC', DE/DE', and HL/HL' all swap together with the exx instruction. That's where you end up pushing them on to the stack, exchanging register set and popping them off:
Copying between normal register (pairs) you just use ld R,R instructions:
]what he is saying is that position independent code to load constants is
]perfectly possible and although the assembler would normally take care
]of it, you don't have to use it, you can 'hand assemble'
]
]Essentially 'load register from contents of where the program counter is
]now, plus three, then add 5 to program counter to skip the bit of memory
]with the constant in, ' is how this is normally achieved in ARM code.
Yes. That's how Roger Ivie explained it.
Assuming 4-byte instructions, let's have the byte-list for:-
0: [.+4] -> r0
4:
8: r0 -> c:
10: Branch Relative
Or if I'm wrong, just post the corrected version.
So, 3 P-codes:
#N -> rR = 8bytes
rR -> (mem) = 8bytes ? or 12 ?
br #N = 4bytes ? for short-range ? perhaps ?
Then design/add some more P-codes.
How would you eg. decrement and compare rR against a constant, and BranchOnEqual ...etc?
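One possible answer, sketched in Python under the same assumptions as the rest of the thread (ARM state, 4-byte instructions, little-endian): decrement with `sub`, test with `cmp`, then use a conditional branch. The helper names and the choice of r0 and the constant 5 are purely illustrative, and the sketch only handles immediates in the range 0..255.

```python
import struct


def dp_imm(opcode, set_flags, rn, rd, imm8):
    """Data-processing instruction with a small (0..255) immediate operand."""
    if not 0 <= imm8 <= 255:
        raise ValueError("sketch only handles unrotated 8-bit immediates")
    return (0xE2000000 | (opcode << 21) | (set_flags << 20)
            | (rn << 16) | (rd << 12) | imm8)


def branch(cond, offset):
    """Branch to (this instruction's address + offset), offset in bytes."""
    imm24 = ((offset - 8) >> 2) & 0xFFFFFF  # pc reads as instruction address + 8
    return (cond << 28) | 0x0A000000 | imm24


SUB, CMP = 0x2, 0xA   # data-processing opcodes
EQ, AL = 0x0, 0xE     # condition codes: equal, always

program = [
    dp_imm(SUB, 0, 0, 0, 1),   # 0x00: sub r0, r0, #1
    dp_imm(CMP, 1, 0, 0, 5),   # 0x04: cmp r0, #5
    branch(EQ, 8),             # 0x08: beq 0x10  (leave the loop when r0 == 5)
    branch(AL, -12),           # 0x0c: b   0x00  (otherwise go round again)
]
image = struct.pack("<4I", *program)

if __name__ == "__main__":
    print(image.hex())
```

All four words use only register numbers, a small immediate, and relative branch offsets, so the 16-byte image is position independent in the sense asked for above.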