arm11/armv6 right shift signed packed values

Aug 12, 2008 3 Replies

johann.koenig 17 years ago

I'm attempting to pack some numbers for output after doing some work. They're currently r7 = v0|v4 and r10 = v1|v5. They all need to be >>3, before or after repacking. Output will be v1|v0 and v5|v4 (little endian architecture). I managed to get the v1|v0 written reasonably efficiently:

    mov   r8, r10, asr #3        ; 1>>3|xxx
    pkhtb r8, r8, r7, asr #19    ; 1>>3|0>>3
    str   r8, [r0], r2           ; o1|o0, post inc
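To see why this sequence yields (v1>>3)|(v0>>3), here is a minimal C sketch that models the two instructions on 32-bit values. The helper names `asr32`, `pkhtb`, `pack16` and `pack_v1_v0` are made up for illustration; only the instruction behaviour they imitate comes from the post.

```c
#include <stdint.h>

/* ASR: arithmetic shift right of a 32-bit register by n (1..31).
   Built from an unsigned shift plus sign fill, because >> on a
   negative signed value is implementation-defined in C. */
static uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}

/* PKHTB Rd, Rn, Rm, ASR #n: top half from Rn, bottom half from (Rm ASR n). */
static uint32_t pkhtb(uint32_t rn, uint32_t rm, unsigned n)
{
    return (rn & 0xFFFF0000u) | (asr32(rm, n) & 0x0000FFFFu);
}

/* Hypothetical helper: build a register holding hi|lo from two halfwords. */
static uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* r7 = v0|v4, r10 = v1|v5  ->  (v1>>3)|(v0>>3) */
static uint32_t pack_v1_v0(uint32_t r7, uint32_t r10)
{
    uint32_t r8 = asr32(r10, 3);   /* mov   r8, r10, asr #3      */
    r8 = pkhtb(r8, r7, 19);        /* pkhtb r8, r8, r7, asr #19  */
    return r8;                     /* the word that str writes   */
}
```

With v0 = -100, v4 = 1234, v1 = 77 and v5 = -8, `pack_v1_v0` returns 0x0009FFF3: 77>>3 = 9 in the top half and -100>>3 = -13 (0xFFF3) in the bottom half. The low halves v4 and v5 only ever land in bits that are discarded.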

But v5|v4 is a little ugly because I'm starting with the least significant bits, so right shifting is going to drag in the bottom of the upper word (right?). Right now I'm sign extending, then writing individual shorts.

    mov  r8, r10, asr #3    ; 5 >> 3
    strh r8, [r0, #2]       ; o5
    sxth r1, r1             ;
    sxth r7, r7             ;
    mov  r8, r7, asr #3     ; 4 >> 3
    strh r8, [r0], r2       ; o4, post inc

I found

formatting link

    PKHBT R3, R1, R2, LSL #15    ; R3 = [R2>>1, R1]
    PKHTB R3, R3, R1, ASR #1     ; R3 = [R2>>1, R1>>1]

However, that seems to rely on the input being full words.
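The dependence on full-word inputs can be checked with a small C model of the two pack instructions. This is a sketch only; `pkhbt`, `pkhtb`, `asr32` and `pack_halved` are illustrative names, not part of any library.

```c
#include <stdint.h>

/* ASR modelled with an unsigned shift plus sign fill (n in 1..31). */
static uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}

/* PKHBT Rd, Rn, Rm, LSL #n: bottom half from Rn, top half from (Rm LSL n). */
static uint32_t pkhbt(uint32_t rn, uint32_t rm, unsigned n)
{
    return (rn & 0x0000FFFFu) | ((rm << n) & 0xFFFF0000u);
}

/* PKHTB Rd, Rn, Rm, ASR #n: top half from Rn, bottom half from (Rm ASR n). */
static uint32_t pkhtb(uint32_t rn, uint32_t rm, unsigned n)
{
    return (rn & 0xFFFF0000u) | (asr32(rm, n) & 0x0000FFFFu);
}

/* r1 and r2 are each assumed to hold one sign-extended value. */
static uint32_t pack_halved(uint32_t r1, uint32_t r2)
{
    uint32_t r3 = pkhbt(r1, r2, 15);   /* R3 = [R2>>1, R1]    */
    r3 = pkhtb(r3, r1, 1);             /* R3 = [R2>>1, R1>>1] */
    return r3;
}
```

With r1 = 6 and r2 = 10 the result is 0x00050003, as the comments promise. If r1 instead carries another value in its top half, for example 0x00010006, the `ASR #1` pulls bit 16 into the bottom half and the low lane becomes 0x8003 rather than 3.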

Is there a better way to do this?

Johann 17 years ago

Bit lazy with the copy/paste. Should be:

    sxth r10, r10           ; sign extend 5
    sxth r7, r7             ; sign extend 4
    mov  r8, r10, asr #3    ; 5 >> 3
    strh r8, [r0, #2]       ; o5
    mov  r8, r7, asr #3     ; 4 >> 3
    strh r8, [r0], r2       ; o4, post inc
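The reason the sign extension matters can be shown with a short C sketch that compares shifting the packed register directly against sign extending first. The function names are invented for this illustration.

```c
#include <stdint.h>

/* ASR modelled with an unsigned shift plus sign fill (n in 1..31). */
static uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}

/* SXTH: sign-extend the bottom halfword to 32 bits. */
static uint32_t sxth(uint32_t x)
{
    return ((x & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}

/* Shift the packed register as-is: bits of the top halfword leak
   into the bottom halfword that strh would store. */
static uint16_t low_asr3_naive(uint32_t r)
{
    return (uint16_t)asr32(r, 3);
}

/* sxth, then asr #3, as in the corrected sequence. */
static uint16_t low_asr3_sxth(uint32_t r)
{
    return (uint16_t)asr32(sxth(r), 3);
}
```

For a register holding 1|0 (0x00010000), the naive shift stores 0x2000 while the sign-extended version stores 0. For 0|-8 (0x0000FFF8), the naive shift stores 0x1FFF, losing the sign, while the sign-extended version stores 0xFFFF (-1).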

Wilco Dijkstra 17 years ago

An easy alternative would be to shift r10 and r7 left by 16 and then apply your first sequence. This way you save an instruction and can use a single str.

However the best option would be to avoid shifting at this stage. Unless it is the final result, delaying the shift until the next processing step might be cheaper. Another possibility is to use halving additions if you do any, so that the result is already shifted.

Wilco
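As an illustration of what a halving addition does, here is a C sketch of a signed halving add on two packed halfword lanes. It assumes the suggestion refers to the ARMv6 SIMD instructions of the SHADD16/SHSUB16 family, which the post does not name; the helper names are illustrative.

```c
#include <stdint.h>

/* ASR modelled with an unsigned shift plus sign fill (n in 1..31). */
static uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}

/* Sign-extend the bottom halfword to 32 bits. */
static uint32_t sxth(uint32_t x)
{
    return ((x & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}

/* Signed halving add per 16-bit lane (assumed to match SHADD16):
   each lane becomes (a + b) >> 1, with the sum formed at 32-bit
   precision so the lane cannot overflow. */
static uint32_t shadd16(uint32_t rn, uint32_t rm)
{
    uint32_t lo = asr32(sxth(rn) + sxth(rm), 1);
    uint32_t hi = asr32(sxth(rn >> 16) + sxth(rm >> 16), 1);
    return (hi << 16) | (lo & 0xFFFFu);
}
```

For lanes 7|-3 and 2|-4 this gives 4|-4 (0x0004FFFC), so one of the three shift steps is absorbed by the addition itself.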

Johann 17 years ago

Thanks for the tip. At first I thought it would use extra instructions to do the shift, but then I realized that would just replace the sign extends. Unfortunately, this is the only way I can do the operation. The shift has to be the last thing, and can't be pre-processed at the receiving end. New code saves 1 store per loop:

    mov   r10, r10, lsl #16         ; 5|x
    mov   r7, r7, lsl #16           ; 4|x
    mov   r10, r10, asr #3          ; 5>>3|xxx
    pkhtb r10, r10, r7, asr #19     ; 5>>3|4>>3
    str   r10, [r0], r2             ; o5|o4, post inc
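A C sketch of this sequence, for checking it against sample values. The function and helper names are illustrative only; the instruction order follows the listing above.

```c
#include <stdint.h>

/* ASR modelled with an unsigned shift plus sign fill (n in 1..31). */
static uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}

/* PKHTB Rd, Rn, Rm, ASR #n: top half from Rn, bottom half from (Rm ASR n). */
static uint32_t pkhtb(uint32_t rn, uint32_t rm, unsigned n)
{
    return (rn & 0xFFFF0000u) | (asr32(rm, n) & 0x0000FFFFu);
}

/* r7 = v0|v4, r10 = v1|v5  ->  (v5>>3)|(v4>>3) */
static uint32_t pack_v5_v4(uint32_t r7, uint32_t r10)
{
    r10 <<= 16;                  /* mov   r10, r10, lsl #16      ; 5|x       */
    r7  <<= 16;                  /* mov   r7, r7, lsl #16        ; 4|x       */
    r10 = asr32(r10, 3);         /* mov   r10, r10, asr #3       ; 5>>3|xxx  */
    r10 = pkhtb(r10, r7, 19);    /* pkhtb r10, r10, r7, asr #19  ; 5>>3|4>>3 */
    return r10;                  /* the word that str writes                 */
}
```

With r7 = 0xFF9C04D2 (v0 = -100, v4 = 1234) and r10 = 0x004DFFF8 (v1 = 77, v5 = -8), the result is 0xFFFF009A: -8>>3 = -1 on top and 1234>>3 = 154 below. Moving the low halves to the top lets the arithmetic shift pick up the correct sign bit, which is the job the sxth instructions did before.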

You mentioned halving addition, but I can't find anything about that. It probably wouldn't help in this case, since the math goes like (x+y+4)>>3 or (x-y+4)>>3. I can't add 4 first because the subtraction is not associative.
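For reference, the rounding described here can be modelled in scalar C. The sketch below also checks one way a halved sum or difference could feed it: because floor divisions nest, ((s>>1)+2)>>2 gives the same value as (s+4)>>3, where s stands for x+y or x-y. Whether that saves any instructions in the actual loop is not something the thread establishes, and all function names are illustrative.

```c
#include <stdint.h>

/* Floor-style arithmetic right shift for signed values (n in 0..30),
   written to avoid implementation-defined >> on negative numbers. */
static int32_t asr(int32_t x, unsigned n)
{
    return x >= 0 ? (x >> n) : ~((~x) >> n);
}

/* The rounding from the post: (s + 4) >> 3. */
static int32_t round_shift3(int32_t s)
{
    return asr(s + 4, 3);
}

/* The same rounding reached from an already halved s. */
static int32_t round_shift3_from_halved(int32_t s)
{
    return asr(asr(s, 1) + 2, 2);
}

/* Count the inputs in [lo, hi] where the two forms disagree. */
static int count_mismatches(int32_t lo, int32_t hi)
{
    int bad = 0;
    for (int32_t s = lo; s <= hi; ++s)
        if (round_shift3(s) != round_shift3_from_halved(s))
            ++bad;
    return bad;
}
```

Calling `count_mismatches(-70000, 70000)` covers every sum or difference of two signed 16-bit values and reports no disagreements.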

-Johann
