C18 Compiler again

Meindert Sprang 16 years ago

Unbelievable.....

I'm playing around with the Microchip C18 compiler after a hair-splitting experience with CCS. Apparently the optimizer of C18 is not that good. For instance: LATF = addr >> 16; where addr is a uint32, is compiled into a loop where 4 registers really get shifted 16 times. Any decent compiler should recognise that a shift by 16, stored to an 8-bit port, could easily be done by simply accessing the 3rd byte.... sheesh....
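A minimal sketch of the equivalence being described, assuming a little-endian byte layout (as on the PIC18). The function names are made up for illustration; on the real part the destination would be LATF rather than a return value.

```c
#include <stdint.h>
#include <string.h>

/* What the source line asks for: a 32-bit shift, then truncation to 8 bits. */
uint8_t byte2_by_shift(uint32_t addr)
{
    return (uint8_t)(addr >> 16);
}

/* What an optimizer could emit instead on a little-endian target:
   a plain read of the third byte (index 2) of the object. */
uint8_t byte2_by_access(uint32_t addr)
{
    uint8_t bytes[sizeof addr];
    memcpy(bytes, &addr, sizeof addr);  /* well-defined way to view the bytes */
    return bytes[2];                    /* assumes little-endian storage */
}
```

For addr = 0x12345678 both functions return 0x34 on a little-endian machine; on a big-endian machine the second one would need index 1 instead.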

Meindert

D Yuniskis 16 years ago

Is LATF *defined* as a uint8_t? (i.e., does the compiler *know* it can discard all but the lowest 8 bits?)

Is uint32_t *really* unsigned (and not a cheap hack to "long int")? I.e., can the compiler be confused (by the definition) into thinking it is signed and opting for a sign-preserving shift?

How about:

uint8_t pointer;

pointer = (uint8_t *) &addr;
LATF = pointer[2];

Clumsy, admittedly, but perhaps more obvious what's going on? (I would have added that this would be easy for an optimizer to reduce to an "addressing operation" but I also would have expected your shift to be recognized as an easy optimization!)

D Yuniskis 16 years ago

uint8_t *pointer;

(sorry, too early in the morning to be writing code :> )

Joe Chisolm 16 years ago

From the Microchip supplied USB code

POINTER addr;

LATF = addr.bHigh; //simple and to the point

If addr is static this will probably compile to a simple movff. If addr is on the stack it gets a little more complicated.

#ifndef TYPEDEFS_H
#define TYPEDEFS_H

typedef unsigned char   byte;   // 8-bit
typedef unsigned int    word;   // 16-bit
typedef unsigned long   dword;  // 32-bit

typedef union _BYTE
{
    byte _byte;
    struct
    {
        unsigned b0:1;
        unsigned b1:1;
        unsigned b2:1;
        unsigned b3:1;
        unsigned b4:1;
        unsigned b5:1;
        unsigned b6:1;
        unsigned b7:1;
    };
} BYTE;

typedef union _WORD
{
    word _word;
    struct
    {
        byte byte0;
        byte byte1;
    };
    struct
    {
        BYTE Byte0;
        BYTE Byte1;
    };
    struct
    {
        BYTE LowB;
        BYTE HighB;
    };
    struct
    {
        byte v[2];
    };
} WORD;
#define LSB(a)      ((a).v[0])
#define MSB(a)      ((a).v[1])

typedef union _DWORD
{
    dword _dword;
    struct
    {
        byte byte0;
        byte byte1;
        byte byte2;
        byte byte3;
    };
    struct
    {
        word word0;
        word word1;
    };
    struct
    {
        BYTE Byte0;
        BYTE Byte1;
        BYTE Byte2;
        BYTE Byte3;
    };
    struct
    {
        WORD Word0;
        WORD Word1;
    };
    struct
    {
        byte v[4];
    };
} DWORD;
#define LOWER_LSB(a)    ((a).v[0])
#define LOWER_MSB(a)    ((a).v[1])
#define UPPER_LSB(a)    ((a).v[2])
#define UPPER_MSB(a)    ((a).v[3])

typedef void(*pFunc)(void);

typedef union _POINTER
{
    struct
    {
        byte bLow;
        byte bHigh;
    };
    word _word;         // bLow & bHigh
    byte* bRam;         // Ram byte pointer: 2 bytes pointer pointing
                        // to 1 byte of data
    word* wRam;         // Ram word pointer: 2 bytes pointer pointing
                        // to 2 bytes of data

    rom byte* bRom;     // Size depends on compiler setting
    rom word* wRom;
} POINTER;

typedef enum _BOOL { FALSE = 0, TRUE } BOOL;

#define OK      TRUE
#define FAIL    FALSE

#endif //TYPEDEFS_H
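For the 32-bit addr in the original question, the DWORD union looks like the closer fit, since bHigh in POINTER is the second byte of a 16-bit word. A sketch of that idea follows, assuming little-endian storage. The type dword_view and the function bank_byte are hypothetical stand-ins written with fixed-width types so the code also builds on a desktop compiler; only the UPPER_LSB macro shape is taken from the header.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the DWORD union in the header. */
typedef union {
    uint32_t value;   /* the whole 32-bit quantity */
    uint8_t  v[4];    /* the same storage viewed as four bytes */
} dword_view;

#define UPPER_LSB(a) ((a).v[2])   /* same macro shape as in the header */

/* Returns the byte that "addr >> 16" would store to an 8-bit port. */
uint8_t bank_byte(uint32_t addr)
{
    dword_view d;
    d.value = addr;
    return UPPER_LSB(d);   /* byte 2, assuming little-endian storage */
}
```

The union makes the byte access explicit in the source, so the result does not depend on how well the optimizer handles the shift.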

Joe Chisolm Marble Falls, Tx.

George Neuner 16 years ago

You're asking a lot.

I've been programming since 1977 and I have never seen any compiler turn a long word shift (and/or mask) into a corresponding short word or byte access. Every compiler I have ever worked with would perform the shift.

That said, something is wrong if it takes 4 registers. I don't know the PIC18, but I never encountered any chip that required more than 2 registers to shift a value. Many chips have only a 1-bit shifter and require a loop to do larger shifts - but many such chips microcode the shift loop so the programmer sees only a simple instruction. But, occasionally, you do run into oddballs that need large shifts spelled out.

Most likely you're somehow reading the (dis)assembly incorrectly: 4 temporaries that are really mapped into the same register. If the compiler (or chip) really does need 4 registers to do a shift, then it's a piece of sh*t.

George

Joe Chisolm 16 years ago

You have an 8-bit architecture shifting a 32-bit value, shifting out of one byte and into the next, thus 4 temps. You have 1-bit shifts. I suspect the compiler is generating a right shift into carry so the code can tell if a 1 needs to be moved into the most significant bit of the next byte.

Joe Chisolm Marble Falls, Tx.

D Yuniskis 16 years ago

It would be informative to know what sort of "helper routines" the compiler calls on. E.g., it might (inelegantly) treat this as "CALL SHIFT_LONG_RIGHT, repeat" -- in which case the 4 temp access is the canned representation of *any* "long int".

I think George is commenting that a *smart* compiler can realize that an (e.g.) 8 bit shift is:

foo[0] = foo[1]
foo[1] = foo[2]
foo[2] = foo[3]

(if you are casting to a narrower data type and can discard foo[3])

and a *9* bit shift is the same as the above with a *single* bit shift introduced (i.e., you operate on a byte at a time instead of the entire "long")

(recall, the shift amount is a constant available at compile time)
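The byte-at-a-time idea above can be sketched roughly as follows. This is an illustration only, not what any particular compiler emits; the function names are invented, and the bytes are assumed to be stored least significant first.

```c
#include <stdint.h>

/* Right-shift a 32-bit value held as four bytes (b[0] least significant)
   by n bits, 0 <= n <= 31: whole-byte moves first, then leftover bits. */
void shift_right_bytes(uint8_t b[4], unsigned n)
{
    unsigned whole = n / 8;   /* number of whole-byte moves */
    unsigned bits  = n % 8;   /* leftover bit shifts, 0..7 */
    unsigned i;

    /* Step 1: move whole bytes down, zero-filling from the top. */
    for (i = 0; i < 4; i++)
        b[i] = (i + whole < 4) ? b[i + whole] : 0;

    /* Step 2: shift the remaining bits, pulling in bits from the byte above. */
    if (bits != 0) {
        for (i = 0; i < 4; i++) {
            uint8_t from_above = (i + 1 < 4) ? b[i + 1] : 0;
            b[i] = (uint8_t)((b[i] >> bits) | (from_above << (8 - bits)));
        }
    }
}

/* Convenience wrapper: split, shift byte-wise, recombine. */
uint32_t shift_right_const(uint32_t x, unsigned n)
{
    uint8_t b[4];
    unsigned i;

    for (i = 0; i < 4; i++)
        b[i] = (uint8_t)(x >> (8 * i));
    shift_right_bytes(b, n);
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

With n known at compile time, step 1 reduces to a fixed set of byte moves and step 2 disappears entirely whenever n is a multiple of 8, which is the case in the original LATF = addr >> 16 example.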

Grant Edwards 16 years ago

Really?

I've seen quite a few compilers do that. For example, gcc for ARM does:

------------------------------testit.c------------------------------
unsigned long ul;

unsigned char foo(void) { return ul>>8; }

unsigned short bar(void) { return ul>>16; }
------------------------------testit.c------------------------------

$ /home/nextgen/toolchain/bin/arm-linux-gcc -c -Os -S -fomit-frame-pointer testit.c

------------------------------testit.s------------------------------
        .arch armv5te
[...]
        .file   "testit.c"
        .text
        .align  2
        .global foo
        .type   foo, %function
foo:
        ldr     r3, .L3
        ldrb    r0, [r3, #1]    @ zero_extendqisi2
        bx      lr
.L4:
        .align  2
.L3:
        .word   ul
        .size   foo, .-foo
        .align  2
        .global bar
        .type   bar, %function
bar:
        ldr     r3, .L7
        ldrh    r0, [r3, #2]
        bx      lr
.L8:
        .align  2
.L7:
        .word   ul
        .size   bar, .-bar
        .comm   ul,4,4
[...]
------------------------------testit.s------------------------------

Grant Edwards, grant.b.edwards at gmail.com
Yow! I'm young ... I'm HEALTHY ... I can HIKE THRU CAPT GROGAN'S LUMBAR REGIONS!

David Brown 16 years ago

That's irrelevant (or should be!) - expressions are evaluated in their own right, and /then/ cast to the type of the LHS. The compiler should, as it does, initially treat it as a 32-bit shift, but it's a poor compiler that can't optimise a 32-bit shift by 16 to something better than this. Optimising it to a single byte transfer comes logically at a later stage.
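A small sketch of the evaluation order described here: the shift is carried out at the width of its operand, and only the finished result is converted to the narrower destination. The function names are invented for illustration.

```c
#include <stdint.h>

/* Shift at 32 bits first; the conversion to 8 bits happens last. */
uint8_t shift_then_narrow(uint32_t addr)
{
    uint8_t out = (uint8_t)(addr >> 16);
    return out;
}

/* Narrow first, then shift: the wanted byte is already gone. */
uint8_t narrow_then_shift(uint32_t addr)
{
    uint8_t low = (uint8_t)addr;            /* keeps only bits 0..7 */
    return (uint8_t)((uint32_t)low >> 16);  /* always 0 */
}
```

For addr = 0x12345678 the first function returns 0x34 and the second returns 0, which is why the type of the left-hand side does not change what the shift itself has to compute; it only makes it legal for an optimizer to drop the bytes that are never stored.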

I believe that uint32_t /must/ be an unsigned 32-bit integer. If the compiler cannot work with such a type, then no such type should exist in <stdint.h>. A standards-compliant compiler is not allowed to cheat in that way. Of course, I don't know if Microchip's compiler claims to be standards compliant...

Joe Chisolm 16 years ago

I agree with your statement. The C18 suite has some canned libraries like 32 bit division and such. There are other helper routines for doing delays and such.

I just did a test using C18. I chose an 18F86J10 (for no particular reason other than I remember it has a port F and thus a LATF).

For:

static unsigned long addr;
LATF = addr >> 16;

I get results similar to what you have above. The compiler "shifts" addr into a 32 bit temp by doing two byte moves and two clear byte instructions. It then does a 1 byte move into LATF from the temp. I'm not sure what version the OP is using or what else might be going on behind the scenes with addr. I agree a compiler should be smarter but for the price (free) C18 is not bad for smaller projects.

BTW: I did a quick test with gcc 4.4.1 and it does a load, shift 16 and a store byte.

Joe Chisolm Marble Falls, Tx.

George Neuner 16 years ago

Interesting. But now that I think about it, I almost never use shift with a constant count - it's almost always a computed shift - and even when the shift is constant, the value is often in a variable anyway due to surrounding processing.

- What version of GCC is it?

- What does it do if the shift count is a variable?

- What does it do for ((ul & 0xFFFFFF) >> 8) or ((ul >> 8) & 0xFFFF)?

If it recognizes the last as wanting just the middle word then that would be impressive.

George

Thad Smith 16 years ago

And the later stage optimally comes before generating final code. It is logical that a good optimizer transforms the statement into a single byte move.

Thad

Grant Edwards 16 years ago

4.4.3

It uses a shift instruction. There's not really anything else it could do with a variable shift count.

        ldr     r0, [r3, #0]
        mov     r0, r0, asl #8
        mov     r0, r0, lsr #16

Recognizing the last two as wanting just the middle word is moot because that 16-bit word is misaligned and can't be accessed using a 16-bit load instruction.
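As a cross-check, the two mask expressions George asked about and the asl #8 / lsr #16 pair in the listing all select the same field, bits 8 to 23. A sketch in C, with invented function names:

```c
#include <stdint.h>

/* ((ul & 0xFFFFFF) >> 8): drop the top byte, then the bottom byte. */
uint16_t middle_mask_then_shift(uint32_t ul)
{
    return (uint16_t)((ul & 0xFFFFFFUL) >> 8);
}

/* ((ul >> 8) & 0xFFFF): drop the bottom byte, then the top byte. */
uint16_t middle_shift_then_mask(uint32_t ul)
{
    return (uint16_t)((ul >> 8) & 0xFFFFUL);
}

/* The shift pair from the ARM listing: left by 8 pushes the top byte
   out of the 32-bit register, right by 16 then drops the bottom byte. */
uint16_t middle_shift_pair(uint32_t ul)
{
    uint32_t t = (uint32_t)(ul << 8);
    return (uint16_t)(t >> 16);
}
```

For ul = 0x12345678 all three return 0x3456. The shift pair needs no mask constant, which is presumably why the compiler prefers it when the field cannot be fetched with a single aligned load.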

Grant

George Neuner 16 years ago

testit.c

GCC 4.4.0 on x86 with the same flags gives:

------------------------------testit.s------------------------------
        .file   "testit.c"
        .text
.globl _foo
        .def    _foo;   .scl    2;      .type   32;     .endef
_foo:
        movl    _ul, %eax
        shrl    $8, %eax
        ret
.globl _bar
        .def    _bar;   .scl    2;      .type   32;     .endef
_bar:
        movzwl  _ul+2, %eax
        ret
        .comm   _ul, 16  # 4
------------------------------testit.s------------------------------

It optimized the half shift but not the quarter shift.

George

John Temples 16 years ago

Here's what Hi-Tech's PIC18 compiler does:

;t.c: 59: LATF = addr >> 16;
00FFFA  C0FE FF8E   movff   _addr+2,3982    ;volatile

John W. Temples, III

David Brown 16 years ago

Yes, I meant a later logical stage within the compiler. Note that it may be an /actual/ later stage (such as a peephole optimisation), or combined with earlier optimisations. It comes later logically, but the actual order is implementation dependent.

David Brown 16 years ago

Some compilers will use shifts, some will use byte or word movements.

On the ARM, a compiler will often use shifts because shifts (especially by constants) are very cheap on the ARM architecture, while unaligned and non-32-bit memory accesses may be expensive or illegal (depending on the ARM variant).

A quick test with avr-gcc shows that it uses byte register movements rather than shifts, although it's not optimal for 32-bit values (it is fine for 16-bit values, which are much more common in an 8-bit world). For your example below of "((ul& 0xFFFFFF)>> 8)" it is close to perfect.

That's very nice code generation - faster (on an ARM anyway) than using masking.

Meindert Sprang 16 years ago

Yes.

Both types are explicitly typed as unsigned. That is as far as my influence goes. Even the crappy toy compiler of CCS does this right. My Imagecraft AVR compiler does it right. I even remember that my old Franklin/Keil C51 compiler does it right.

Meindert

Meindert Sprang 16 years ago

I beg to differ...

Well, my experience with embedded cross compilers is different, see my other post. And I think it is fair to demand such a thing since embedded compilers are supposed to be tight on hardware resources.

My AVR compiler for instance does a real load-OR-store operation when more than one bit is set in the constant but a nice single SBI instruction when only one bit needs to be set. This keeps the C code ANSI compliant and this makes optimal use of processor resources. And that is IMO how an embedded cross compiler should work.
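The decision described here hinges on whether the constant has exactly one bit set. A rough sketch of that test follows; the helper names are invented, and a real compiler would also have to check that the target register is reachable by the bit-set instruction, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit of mask is set, the case in which a single
   bit-set instruction can replace the load-OR-store sequence. */
bool is_single_bit(uint8_t mask)
{
    return mask != 0 && (mask & (mask - 1)) == 0;
}

/* Bit number of a single-bit mask, i.e. the operand a bit-set
   instruction would need. Only meaningful when is_single_bit() holds. */
unsigned bit_index(uint8_t mask)
{
    unsigned n = 0;
    while ((mask >>= 1) != 0)
        n++;
    return n;
}
```

So a statement like PORTB |= 0x08 qualifies (one bit, index 3), while PORTB |= 0x0A does not and keeps the load-OR-store form; the C source stays the same in both cases.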

Meindert

Meindert Sprang 16 years ago

This is the code that does the shift:

0FCC8  0E10  MOVLW 0x10
0FCCA  90D8  BCF 0xfd8, 0, ACCESS
0FCCC  3203  RRCF 0x3, F, ACCESS
0FCCE  3202  RRCF 0x2, F, ACCESS
0FCD0  3201  RRCF 0x1, F, ACCESS
0FCD2  3200  RRCF 0, F, ACCESS
0FCD4  06E8  DECF 0xfe8, F, ACCESS
0FCD6  E1F9  BNZ 0xfcca

The loop is executed 16 times (>>16) and 4 locations are shifted through the carry bit, if I understand this correctly.... yuck!
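To check that reading of the listing, here is a small C model of the loop. It assumes that 0xfd8 bit 0 is the carry flag, that 0xfe8 is the W register used as the loop counter, and that locations 0 to 3 hold the value with location 3 most significant. The function names are invented.

```c
#include <stdint.h>

/* Rotate one byte right through carry, as RRCF does:
   old carry enters bit 7, old bit 0 becomes the new carry. */
static uint8_t rrcf(uint8_t value, unsigned *carry)
{
    unsigned out = value & 1u;
    uint8_t result = (uint8_t)((value >> 1) | (*carry << 7));
    *carry = out;
    return result;
}

/* Model of the listed loop; count must be at least 1 (the listing uses 16). */
uint32_t shift_by_loop(uint32_t addr, unsigned count)
{
    uint8_t tmp[4];
    unsigned i;
    int b;

    for (i = 0; i < 4; i++)
        tmp[i] = (uint8_t)(addr >> (8 * i));

    do {
        unsigned carry = 0;              /* BCF: clear carry */
        for (b = 3; b >= 0; b--)         /* RRCF on locations 3, 2, 1, 0 */
            tmp[b] = rrcf(tmp[b], &carry);
    } while (--count != 0);              /* DECF, BNZ */

    return (uint32_t)tmp[0] | ((uint32_t)tmp[1] << 8) |
           ((uint32_t)tmp[2] << 16) | ((uint32_t)tmp[3] << 24);
}
```

With count = 16 the model performs 64 single-bit rotates to produce a value whose low byte is just byte 2 of the input, which is the work a single byte move would replace.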

Meindert
