Please paste small asm-listing of:

Unknown 12 years ago

    ldr r0,=0x20200000
    mov r1,#1
    lsl r1,#18
    str r1,[r0,#4]

    mov r1,#1
    lsl r1,#16
    str r1,[r0,#40]

loop$:
    b loop$

=== and also of:-

    mov r2,#0x3F0000
wait1$:
    sub r2,#1
    cmp r2,#0
    bne wait1$

I'll tell you what happens when I `dd` the byte-stream into SD, based on

formatting link

== TIA.

Roger Ivie 12 years ago

--- etc ---

Not sure what you're after, but one weakness of the GNU tools is the lack of proper listings. The way you get one is via objdump, but that means the code has to assemble in the first place (i.e., you can't get a listing showing where your errors are).

example:

pi1> cat argh1.s
        ldr r0,=0x20200000
        mov r1,#1
        str r1,[r0,#4]

        mov r1,#1
        lsl r1,#16
        str r1,[r0,#40]

loop$:  b loop$

pi1> cc -c argh1.s
pi1> objdump --disassemble-all argh1.o

argh1.o: file format elf32-littlearm

Disassembly of section .text:

00000000 :
   0:   e59f0014    ldr     r0, [pc, #20]   ; 1c
   4:   e3a01001    mov     r1, #1
   8:   e5801004    str     r1, [r0, #4]
   c:   e3a01001    mov     r1, #1
  10:   e1a01801    lsl     r1, r1, #16
  14:   e5801028    str     r1, [r0, #40]   ; 0x28

00000018 :
  18:   eafffffe    b       18
  1c:   20200000    eorcs   r0, r0, r0

Disassembly of section .ARM.attributes:

00000000 :
   0:   00001c41    andeq   r1, r0, r1, asr #24
   4:   61656100    cmnvs   r5, r0, lsl #2
   8:   01006962    tsteq   r0, r2, ror #18
   c:   00000012    andeq   r0, r0, r2, lsl r0
  10:   06003605    streq   r3, [r0], -r5, lsl #12
  14:   09010806    stmdbeq r1, {r1, r2, fp}
  18:   2c020a01    stccs   10, cr0, [r2], {1}
  1c:   Address 0x0000001c is out of bounds.

roger ivie rivie@ridgenet.net

Roger Ivie 12 years ago

Whadda ya know; I'm wrong. Poking about a bit, I found the -a option on the assembler. Don't know why I never found it before.

pi1> as -a argh1.s

ARM GAS argh1.s page 1

   1 0000 14009FE5          ldr r0,=0x20200000
   2 0004 0110A0E3          mov r1,#1
   3 0008 041080E5          str r1,[r0,#4]
   4
   5 000c 0110A0E3          mov r1,#1
   6 0010 0118A0E1          lsl r1,#16
   7 0014 281080E5          str r1,[r0,#40]
   8
   9 0018 FEFFFFEA  loop$:  b loop$
  10 001c 00002020

ARM GAS argh1.s page 2

DEFINED SYMBOLS
    argh1.s:1      .text:00000000 $a
    argh1.s:9      .text:00000018 loop$
    argh1.s:10     .text:0000001c $d

NO UNDEFINED SYMBOLS

roger ivie rivie@ridgenet.net

Tauno Voipio 12 years ago

Have you tried the switch

-Wa,-ahlms=mylist.lst

on the gcc command line?

-T.

fld 12 years ago

Thanks for the listing. It's completely different from 8-bit. I don't like it. How can `ldr r0,=0x20200000` encode to `14009FE5`?

Perhaps it can't loadImmediate [the most fundamental instruction], so the 20200000 needs to be fetched from memory (listing line 10), per:-

  10 001c 00002020

Of course that doesn't mean there's no loadImmediate instruction. It may just be the assembler's design: to put all the constants after the executing code.

OK!! Your previous post shows:-

   0:   e59f0014    ldr     r0, [pc, #20]   ; 1c

00000018 :
  18:   eafffffe    b       18
  1c:   20200000    eorcs   r0, r0, r0

Perhaps comp.sys.arm will give me the primitive/unoptimised individual byte-codes for the operations?

Surely the 32 bits have meaningful 'fields', like the original 8-bit CPUs?
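The 32 bits do split into fields. As a rough sketch (the function name `decode_ldr_str_imm` is made up for illustration, and the field layout is the immediate-offset load/store format as I understand it from the ARM Architecture Reference Manual), the word from line 1 of the listing can be pulled apart as follows. Note that `as -a` prints the four bytes in memory order (little-endian), while `objdump` prints the same word as one 32-bit value, which is why `14009FE5` and `e59f0014` are the same instruction.

```python
import struct

def decode_ldr_str_imm(word):
    """Split an ARM load/store word (immediate-offset form) into its fields."""
    return {
        "cond":  (word >> 28) & 0xF,   # 0xE means "always execute"
        "class": (word >> 26) & 0x3,   # 0b01 marks single data transfer
        "I":     (word >> 25) & 1,     # 0 = the low 12 bits are an immediate offset
        "P":     (word >> 24) & 1,     # 1 = apply the offset before the access
        "U":     (word >> 23) & 1,     # 1 = add the offset, 0 = subtract it
        "B":     (word >> 22) & 1,     # 1 = byte access, 0 = word access
        "W":     (word >> 21) & 1,     # 1 = write the address back to Rn
        "L":     (word >> 20) & 1,     # 1 = load, 0 = store
        "Rn":    (word >> 16) & 0xF,   # base register (15 is the pc)
        "Rd":    (word >> 12) & 0xF,   # register being loaded or stored
        "imm12": word & 0xFFF,         # unsigned byte offset
    }

# Bytes exactly as printed by `as -a` for `ldr r0,=0x20200000`.
listing_bytes = bytes.fromhex("14009FE5")
(word,) = struct.unpack("<I", listing_bytes)   # little-endian word
fields = decode_ldr_str_imm(word)

print(hex(word))
for name, value in fields.items():
    print(f"{name:6} = {value}")
```

Read this way, the word says: always, load a word into r0 from the address pc + 20, which is the `ldr r0, [pc, #20]` that objdump shows.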

== TIA.

Andy Leighton 12 years ago

It is what the ARM does.

There is some rather interesting stuff going on. The assembler will change the LDR into a MOV if possible. Otherwise it inserts it as a load from memory (from a literal pool - which is usually close to that instruction).

Note that in ARM assembler the immediate second operand can be 12 bits long at most. It doesn't actually store the second operand as a 12-bit number, though. It is expressed as an 8-bit number, and the remaining 4 bits encode the number of rotate-rights to apply. It also only does an even number of RORs, so the rotation amount is 2x the number encoded in those 4 bits.

This is apparently more efficient.
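To make the 8-bit-plus-rotation rule concrete, here is a small sketch that searches for a valid encoding of a constant. The helper names (`ror32`, `encode_arm_immediate`, `mov_imm`) are invented for this example, and picking the smallest rotation is an assumption about what an assembler would choose; any rotation that works is equally valid to the CPU.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def encode_arm_immediate(value):
    """Return the 12-bit operand field (rot << 8 | imm8), or None if impossible."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        # The CPU computes imm8 ROR (2*rot); undo that by rotating left 2*rot.
        imm8 = ror32(value, 32 - 2 * rot)
        if imm8 <= 0xFF:
            return (rot << 8) | imm8
    return None

def mov_imm(rd, value):
    """Encode `mov rd,#value` (condition: always), or None if not encodable."""
    operand = encode_arm_immediate(value)
    if operand is None:
        return None
    return 0xE3A00000 | (rd << 12) | operand

for constant in (1, 0x3F0000, 0x20200000):
    word = mov_imm(1, constant)
    print(hex(constant), "->", "no single mov" if word is None else hex(word))
```

With this rule, `#1` and `#0x3F0000` fit in a single `mov`, while `0x20200000` has set bits too far apart for any 8-bit window, which is why the assembler fell back to a load from memory for it.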

BTW - it is all documented.

Andy Leighton => andyl@azaal.plus.com "The Lord is my shepherd, but we still lost the sheep dog trials" - Robert Rankin, _They Came And Ate Us_

Roger Ivie 12 years ago

Indeed.

ARM can load immediate, but it's quite limited. It can load an 8-bit value rotated into any byte position.

Perhaps you should head over to arm.com and look for the ARM Architecture Reference Manual (document number DDI-0100).

roger ivie rivie@ridgenet.net

Michael J. Mahon 12 years ago

It is not unusual for a 32-bit instruction machine to require two (or more on a less efficient architecture) instructions to synthesize a 32-bit immediate value. Of course, if the actual value fits in much less than 32 bits, an efficient architecture can load it in a single instruction with immediate data.

And you are correct, a lazy compiler/assembler writer may only implement the most general case, even though it deoptimizes the most common cases.

The ARM architecture is quite well designed, and that sometimes means that only hardware-significant fields, like source registers, are all aligned. Other fields may just "fill in" around the fields that must be available to hardware prior to complete decoding.

-michael - NadaNet 3.1 and AppleCrate II: http://home.comcast.net/~mjmahon

Avoid9Pdf 12 years ago

OK let's design a minimal Virtual-machine. The idea is to have a minimal set of instructions, which can be put at the start/reset-vector, to debug a dud rPi, in the most primitive way.

AFAIK the ports are memory mapped? Here's a simple task:

REPEAT
  IF InPort=%5 THEN OutPort:=%A ELSE OutPort:=%5
FOREVER

So let's see what general purpose instructions this needs.

Probably the most 'general' is to do arithmetic/logical operations in 2 standard accumulators, which I expect ARM can do?

Or is it more 'general' to ALU memory? I want to select a minimal general instruction set.

Let's accept ARM's method of having the 'constants' in memory. In this case: Mem%5 , Mem%A

Consider:
    InPort -> r0
    Subtract r0,Mem%5
    bnz ELSE
    Mem%A -> OutPort
    br Consider
ELSE:
    Mem%5 -> OutPort
    br Consider

Mem%5 Mem%A

But now, do you need to "ROM" the InPort, OutPort-adr and access it indirectly?

So:- Mem%5 Mem%A Mem[InPort] Mem[OutPort]

For this task, what's being optimised is the MINIMAL user knowledge/effort.

If I have to crack open a pdf ARMdoc, then I've failed.

How would you do this, so that your instructions are general purpose: good for other tasks?

== TIA.

Roger Ivie 12 years ago

Ah. I've noticed that the particular ARM Architecture Reference Manual I've referenced here (DDI-0100) is ARMv5. While it's good for the embedded stuff I do, Raspberry Pi uses the ARMv6 architecture. I don't think ARM makes the appropriate manual available anymore (I'm only seeing ARMv6-M, which covers the microcontrollers using only Thumb instructions), but you should be able to make use of ARMv7-AR manual (DDI-0406) if you're careful to avoid the ARMv7 specific bits.

roger ivie rivie@ridgenet.net

Avoid9Pdf 12 years ago

OK, thanks. Since you are well experienced with ARM/RPi, can you see any way of producing a relative-adr P-code? That means a sequence of bytes which will do, eg: `#N -> rX`, ie. LoadImmediate Hex:000000AA into RegisterX. I've used 32bit data, since I assume that's what's done.

As we've discussed here, the assembler would put the constant [%AA in this case] at a location after the 'executing code'. Ie. the "#N" is separated from the complete/atomic instruction: "#N -> rX". That's no good!

I want to build a set of Pseudo-instructions, which would make a virtual-machine, which is directly programmable, by just `dd`-ing the byte-sequences, of the various p_codes, directly into the file [which has been backed-up], which normally 'starts' the RPi, as explained in the excellent Cambridge-Uni tutorial.

Perhaps I don't actually need access to 2 registers/accumulators, if ALU-ing is easily done to memory, but as an example, the 2 p-codes: #5 -> R0 #4 -> R1 must be executable by 2 stand-alone sequences of bytes, with no absolute address references, so that the byte-sequences can be relocated to any address.

LoadImmediate is the most elementary P-code, and if I can't do that, then my project has failed.

Please advise.

==TIA.

Theo Markettos 12 years ago

.code
    LDR   r3,[pc,#4]    ; pc-relative load (pc points to current instruction +8)
    STMFD r13!,{r3}     ; other unrelated instructions
    MOV   pc,r14

.label                  ; address=(.code+8)+4
    DCD   0x12345678    ; the constant to load into r3

In real life you'd want the assembler to calculate the offset for you so you'd write a pseudo-op:

    LDR r3,label

which is assembled as

    LDR r3,[pc,#some_offset]

If you want the assembler to embed the constant into the code as well you can do

LDR r3,=0x12345678

These are all functionally equivalent (though the exact offset may vary)
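The offset arithmetic the assembler does for you is small enough to sketch by hand. The function names below (`literal_offset`, `ldr_literal`) are hypothetical, and the encoding assumes the plain ARM (not Thumb) immediate-offset `ldr` with the pc as base register.

```python
def literal_offset(instr_addr, literal_addr):
    """Offset to put in `ldr rd,[pc,#offset]` so it reads from literal_addr."""
    pc_as_seen = instr_addr + 8          # the pc reads as current instruction + 8
    offset = literal_addr - pc_as_seen
    if not -4095 <= offset <= 4095:      # the offset field is 12 bits plus a sign bit
        raise ValueError("literal is out of range for a single ldr")
    return offset

def ldr_literal(rd, instr_addr, literal_addr):
    """Encode `ldr rd,[pc,#offset]` (condition: always)."""
    offset = literal_offset(instr_addr, literal_addr)
    up_bit = 1 if offset >= 0 else 0     # U=1 adds the offset, U=0 subtracts it
    return 0xE51F0000 | (up_bit << 23) | (rd << 12) | abs(offset)

# Theo's example: LDR at .code, constant three instructions later.
print(literal_offset(0x0, 0xC), hex(ldr_literal(3, 0x0, 0xC)))
# Roger's listing: ldr at 0x0, literal at 0x1c.
print(literal_offset(0x0, 0x1C), hex(ldr_literal(0, 0x0, 0x1C)))
```

Because only the distance between the instruction and the constant is encoded, the pair can be moved to any address together without changing a single byte.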

Theo

Roger Ivie 12 years ago

Can you explain why that's no good?

Another approach I've used when I was generating code on the fly and didn't have a good place to stash the constant was to break it up into bytes. Something like:

mov r1,#0xef ; r1
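The listing in this post is cut off in the archive after its first instruction. Purely as an illustration of the byte-at-a-time idea, and not as a reconstruction of what was originally posted, one possible sequence is a `mov` for the low byte followed by an `orr` for each higher byte. The function name `build_constant_bytewise` and the choice of `orr` are assumptions; the value 0xdeadbeef is borrowed from the next post.

```python
import struct

def build_constant_bytewise(rd, value):
    """Return little-endian bytes for mov/orr instructions that build `value` in rd."""
    value &= 0xFFFFFFFF
    # mov rd,#low_byte : rotation field 0, imm8 is the low byte.
    words = [0xE3A00000 | (rd << 12) | (value & 0xFF)]
    for byte_index in (1, 2, 3):
        imm8 = (value >> (8 * byte_index)) & 0xFF
        if imm8 == 0:
            continue                      # nothing to OR in for a zero byte
        # imm8 ROR (2*rot) must equal imm8 << (8*byte_index).
        rot = (16 - 4 * byte_index) % 16
        # orr rd,rd,#(imm8 << 8*byte_index)
        words.append(0xE3800000 | (rd << 16) | (rd << 12) | (rot << 8) | imm8)
    return struct.pack("<%dI" % len(words), *words)

code = build_constant_bytewise(1, 0xDEADBEEF)
print(code.hex(" ", 4))
```

The result contains no addresses and no separate constant, so it is position independent, at the cost of up to four instructions (16 bytes) per 32-bit value.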

Roger Ivie 12 years ago

Something like:

    ldr   r3,[pc,#0]    ; pick up the constant (pc points to .+8)
    b     .+8           ; branch around constant
    .long 0xdeadbeef    ;
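Hand-assembling those three lines gives a 12-byte, position-independent "load 32-bit immediate" sequence. The sketch below is derived from the ARM encoding rules rather than from an assembler run, so treat the bytes as something to verify with `as -a` before `dd`-ing them anywhere; `load_imm32_pcode` is a name invented for this example.

```python
import struct

def load_imm32_pcode(rd, value):
    """Bytes for: ldr rd,[pc,#0] ; b .+8 ; .long value  (little-endian)."""
    ldr_word = 0xE59F0000 | (rd << 12)   # ldr rd,[pc,#0]: pc reads as . + 8
    branch_word = 0xEA000000             # b .+8: branch offset field of zero
    return struct.pack("<III", ldr_word, branch_word, value & 0xFFFFFFFF)

pcode = load_imm32_pcode(3, 0xDEADBEEF)
print(pcode.hex(" ", 4))
```

The `ldr` sits at offset 0 and sees the pc as offset 8, which is exactly where the constant lives; the branch at offset 4 lands on offset 12, the first byte after the constant. Neither instruction contains an absolute address.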

Avoid9Pdf 12 years ago

I want to be able to enter at the shell/CLI eg. Reg1 constant was to break it up into bytes. Something like:

That seems not to solve the problem.

I don't want to know another syntax. Years ago I wrote a kwik assembler, realising that the docos explained eg: `ldi A0, #43` as: "#43 -> Reg0". So I just used: "#43 -> R0" for my syntax.
====
]Something like:
]
] ldr r3,[pc,#0] ; pick up the constant (pc points to .+8)
] b .+8 ; branch around constant
] .long 0xdeadbeef ;

R3 would be good.

If you can post the actual byte sequence, eg. via the listing, I'll test it per:

formatting link

which shows:

    mov r1,#1
    lsl r1,#18
    str r1,[r0,#4]
    ...
    mov r1,#1
    lsl r1,#16
    str r1,[r0,#40]
    ...
loop$: b loop$

So we need Pseudo instructions for:

    #N -> Rr
    Rr -> Memory:m
    branchRelative
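For the second pseudo-instruction, a store to an absolute address needs that address in a register first, so one possible stand-alone form is 16 bytes: load the address pc-relative into a scratch register, skip the constant, then store. This is a hand-derived sketch, not assembler output; `store_pcode` and the use of a scratch register are assumptions made for illustration.

```python
import struct

def store_pcode(rs, address, scratch):
    """Bytes for: ldr scratch,[pc,#0] ; b .+8 ; .long address ; str rs,[scratch]."""
    if rs == scratch:
        raise ValueError("scratch register must differ from the source register")
    ldr_word = 0xE59F0000 | (scratch << 12)              # ldr scratch,[pc,#0]
    branch_word = 0xEA000000                             # b .+8, skips the constant
    str_word = 0xE5800000 | (scratch << 16) | (rs << 12) # str rs,[scratch,#0]
    return struct.pack("<IIII", ldr_word, branch_word,
                       address & 0xFFFFFFFF, str_word)

# r1 -> Memory:0x20200004, using r0 as scratch.
print(store_pcode(1, 0x20200004, 0).hex(" ", 4))
```

The `str` word has the same shape as the `e5801004` and `e5801028` words in the objdump listing earlier in the thread, only with a zero offset, since the full address is carried by the scratch register.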

I'll post my test results and proposed script. If it's ok, we can extend it to comparing-values & conditional branching.

==TIA

Avoid9Pdf 12 years ago

Without extra effort I haven't been able to parse your contribution. But the intention is to AVOID 'the assembler'. Apparently I managed to explain to roger ivie.

Each one of the few pseudo-instructions must be a stand-alone/atomic byte-sequence, and position independent.

Since you're @ Organization: University of Cambridge, England perhaps you can send the byte-sequences for the later examples of cam.ac.uk/projects/raspberrypi/tutorials/os/ok02.html ..03...4 ==TIA ipNews.Send *

The Natural Philosopher 12 years ago

what he is saying is that position independent code to load constants is perfectly possible and although the assembler would normally take care of it, you don't have to use it, you can 'hand assemble'

Essentially 'load register from contents of where the program counter is now, plus three, then add 5 to program counter to skip the bit of memory with the constant in, ' is how this is normally achieved in ARM code.

As I understand it. Never programmed an ARM in me life tho :-(

It's the ability to load from PC counter plus offset that is key to this. It means you have plenty of room at or near the current PC counter to stick constants in. And adding a small constant to the PC counter acts as a sort of relative unconditional GOTO.

in sort of pseudo BASIC that goes to

REGX=LABEL1
GOTO LABEL2:
LABEL1: FF5FOH
LABEL2: ....

Because label one and label2 are close to the program counter they can be expressed as relative offsets rather than absolute locations.

This is NOT the 'Intel Way' of course. To do the same in that you would have to transfer the contents of the program counter to a register, add a constant to it, load the register from that memory and then probably increment the pointer again, and transfer it to the program counter or do a relative jump. Far simpler to do a load immediate, which it understands.

It looks to me like the genius of ARM at work 'how can we work with one size of instruction only'

Depending on how big op codes are, the tail end of a 32 bit fetch as an offset or an address is likely to be quite a big number.

so you CAN address the full 32 bits of memory OR load a 32bit constant simply by putting that constant IN a bit of memory and using a load from memory addressed by register plus offset, also in a 32 bit instruction.

So it's no worse than a load immediate 32 bit, which is also two bus cycles.

looks like a nice processor to code in assembler anyway. So long as you understand how to map what you want onto its instruction set, but that was always the challenge of assembler anyway.

IIRC Z80 was a pig for inter-register transfer, but all registers could be pushed and popped off the stack so that was how you did it.

Ineptocracy (in-ep-toc'-ra-cy) - a system of government where the least capable to lead are elected by the least capable of producing, and where the members of society least likely to sustain themselves or succeed, are rewarded with goods and services paid for by the confiscated wealth of a diminishing number of producers.

Johny B Good 12 years ago

====snippage of stuff outside of my area of expertise====

Just to pick up on this point, if speed was critical, you could always toggle between the prime and secondary registers (IIRC, there were two such directives, one for the AF register pair and another for the general register pairs (BC, DE and HL)).

Regards, J B Good

Guesser 12 years ago

Getting stuff between the normal and shadow registers is a bit of a fiddle because BC/BC', DE/DE', and HL/HL' all swap together with the exx instruction. That's where you end up pushing them on to the stack, exchanging register set and popping them off:

Copying between normal registers (or pairs), you just use ld R,R instructions:

Avoid9Pdf 12 years ago

]what he is saying is that position independent code to load constants is
]perfectly possible and although the assembler would normally take care
]of it, you don't have to use it, you can 'hand assemble'
]
]Essentially 'load register from contents of where the program counter is
]now, plus three, then add 5 to program counter to skip the bit of memory
]with the constant in, ' is how this is normally achieved in ARM code.

Yes. That's how Roger Ivie explained it.

Assuming 4-byte instructions, let's have the byte-list for:-

 0: [.+4] -> r0
 4:
 8: r0 ->
 c:
10: Branch Relative

Or if I'm wrong, just post the corrected version.

So, 3 P-codes:

    #N -> rR     = 8bytes
    rR -> (mem)  = 8bytes ? or 12 ?
    br #N        = 4bytes ? for short-range ? perhaps ?

Then design/add some more P-codes.

How would you eg. decrement and compare rR against a constant, and BranchOnEqual ...etc?
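Decrement, compare and conditional branch each fit in one 4-byte word, and the branch is already relative. A hand-derived sketch follows, with hypothetical helper names (`sub_imm`, `cmp_imm`, `branch`) and only small immediates (0..255) handled; the delay loop from the first post is used as the worked case.

```python
# Condition codes used here; the CPU has more, these three cover the examples.
COND = {"eq": 0x0, "ne": 0x1, "al": 0xE}

def sub_imm(rd, rn, imm8):
    """Encode `sub rd,rn,#imm8` for 0 <= imm8 <= 255."""
    return 0xE2400000 | (rn << 16) | (rd << 12) | (imm8 & 0xFF)

def cmp_imm(rn, imm8):
    """Encode `cmp rn,#imm8` for 0 <= imm8 <= 255 (sets the flags only)."""
    return 0xE3500000 | (rn << 16) | (imm8 & 0xFF)

def branch(cond, instr_addr, target_addr):
    """Encode a conditional relative branch from instr_addr to target_addr."""
    delta = target_addr - (instr_addr + 8)   # the pc reads as instruction + 8
    if delta % 4 != 0:
        raise ValueError("branch target must be word aligned")
    words = delta >> 2
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("branch target out of range")
    return (COND[cond] << 28) | 0x0A000000 | (words & 0xFFFFFF)

# wait1$: sub r2,#1 ; cmp r2,#0 ; bne wait1$   (placed at offsets 0, 4, 8)
loop = [sub_imm(2, 2, 1), cmp_imm(2, 0), branch("ne", 8, 0)]
print([hex(w) for w in loop])
```

A BranchOnEqual is the same word with the condition field changed from `ne` to `eq`. Since the branch stores only a distance, the three words can be placed at any address as a group.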

==TIA.
