whose cc recognises byte moves

> > > Newsgroups: comp.arch.embedded

> > > > ...
> > > > using the usual compilers for 8-bit
> > > > embedded chips has been a let down
> > > ...
> > > Do you have any concrete examples?
> > > ...

Lately elsewhere I saw people mocking the practice of writing explicit byte assignments in C, which brings me here now asking: anyone with an 8051 compiler want to try compiling the following code snippet? Possibly before I saw:

1) Some compilers actually involve 32-bit arithmetic, rather than moving bytes, ouch.

2) Some compilers allocate the unsigned long twice, once as a local variable, then again as a separate result, ouch.

3) Many compilers fail to produce the same machine code for both of these expressions of the same idea, e.g. my 32-bit Linux desktop gcc (`gcc --version` says 3.2.2) here now, when run as c.a.e. lately helpfully suggested:

gcc -c -fomit-frame-pointer -O3 -Wall -W hi.c
objdump -dS hi.o

A frustrating failure of the C compilers for 8 and 16 bit microcontrollers to understand what I meant by what I said plainly and simply in the first place, yes?

Pat LaVarre

/// ways of fetching a potentially misaligned big-endian 32-bits ...
/// (op x25 Read Capacity bytes[4:5:6:7] is disc bytes per read block)

long sub1(char * chars) {
    long result;
    result = (chars[4] << 24) + (chars[5] << 16) + (chars[6] << 8) + chars[7];
    return result;
}
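A minimal sketch of the byte-move counterpart sub2 that the replies refer to (body assumed, not preserved; a big-endian host, 8-bit chars, and a 32-bit long are assumed):

long sub2(char * chars) {
    long result;
    char * rchars = (char *) &result;
    rchars[0] = chars[4];  /* most significant byte lands first on a big-endian host */
    rchars[1] = chars[5];
    rchars[2] = chars[6];
    rchars[3] = chars[7];
    return result;
}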

Reply to
Pat LaVarre

I do not agree that both functions are the same - what if long's and char's are signed? I have tried it on KEIL, and sub2 works pretty neat - only a few moves and that's it... sub1 depends on the type of long's and char's... Anyway, a lot of shifting occurred. On the other hand - you told your compiler to shift the bytes, so it did... Also I have declared the pointers as KEIL data-space pointers - that makes the code clearer to follow.

BTW, you have forgotten that we do not have an endian.h file ;)

regards

Dejan

Reply to
Dejan Durdenic

True.

True.

Some may have optimizers that recognize the shift-by-8/shift-by-16. Many don't.

This code has two bugs. For targets with 16-bit ints, (chars[4] << 24) shifts by at least the width of int, which is undefined behavior; and plain char may be signed, so bytes with the high bit set sign-extend.

#define LIL(I, N) BIG(I, sizeof (I) - (N))

This version will, in most (all?) cases, generate much better code for 8-bit processors (and probably 16-bit processors). It doesn't work for mixed-endian targets, though. It also makes an assumption that chars are 8 bits in length, which the earlier version doesn't.

Thad

Reply to
Thad Smith

No comment?

Aye, writing explicit byte moves works.

But to have to jump thru such hoops to get a reasonable result frustrates me. Mostly when someone else didn't jump thru the appropriate hoop and I have to fix their code. I remember once, with nothing but source-to-source transformations, dropping from over a thousand milliseconds to twenty.

Help I don't quite understand: can you easily give an example?

But if we want to compile the same C for a variety of platforms, we don't want our C understood this way. A byte move is a byte move. On a processor with a single cycle barrel shifter that likes 32-bit arithmetic and aligned memory access, we want shifts. On a processor with byte-wide registers and memory, we want moves. Writing either gives me the wrong answer on the other platform, ouch.

What is a "mixed-endian" target?

To pretend to have a Gnu endian.h was the clearest way I knew to say in C that all the processors that I ever programmed much were either big-endian or else little-endian. I imagine the C standard allows more variability than that.
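A sketch of the sort of pretend <endian.h> meant here, with the macro names borrowed from glibc and the values assumed:

#ifndef __BYTE_ORDER
#define __LITTLE_ENDIAN 1234
#define __BIG_ENDIAN 4321
#define __BYTE_ORDER __BIG_ENDIAN  /* edit per target */
#endif

#if __BYTE_ORDER == __BIG_ENDIAN
/* byte moves in 0:1:2:3 order */
#else
/* byte moves in 3:2:1:0 order */
#endif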

Pat LaVarre

P.S.

Reply to
Pat LaVarre

I haven't seen it and don't expect it with a reasonable compiler, but I wouldn't be terribly surprised, either.

Good work. That's sometimes part of the job.

Then it isn't written correctly. The shift method is robust if written correctly. The byte move method is appropriate for most cases, assuming adjustment is made for endianness.

Sometimes when making these kinds of optimizations, I use conditional code to specify two versions -- a generic version that should work on all platforms and an optimized version for a particular platform. The optimized version is only included if preprocessor symbols indicate the associated target. That makes the code portable, and the generic version usually serves as a more readable description of what is being done.
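For instance, a minimal sketch of that shape, with TARGET_HAS_FAST_SHIFTS standing in for whatever target symbol the build defines (name invented):

unsigned long fetch_be32(const unsigned char *p)
{
#ifdef TARGET_HAS_FAST_SHIFTS
    /* tuned version, for parts with a barrel shifter and cheap 32-bit math */
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16)
         | ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
#else
    /* generic version: portable everywhere, and doubles as documentation */
    unsigned long r = 0;
    int i;
    for (i = 0; i < 4; i++)
        r = r * 256 + p[i];
    return r;
#endif
}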

One in which the byte order is something other than 0123 or 3210. Was that the VAX?

OK, that should work with a 32-bit int computer. On the other hand, if you or others are concerned about 8-bit processors, why not at least modify it to be well defined for implementations with 16-bit ints (most 8-bit systems included)? It doesn't cost anything for 32-bit systems, just makes it portable to ALL standard C implementations. Just replace 0xFF with 0xFFL.

Thad

Reply to
Thad Smith

... snip ...

No, it isn't portable. Left shifting bits into the sign bit of an int results in undefined behavior. Bytes may be more than 8 bits, although this is fairly rare, and chars may be signed. The above is obviously intended to convert a bigendian (most significant byte first) stream. A portable version is:

/* Convert 4 bytes of bigendian to an unsigned long */
unsigned long bigendian4toul(const unsigned char *chars)
{
    int i;
    unsigned long result;

    result = 0;
    i = 0;
    do {
        result = result * 256 + (chars[i++] & 0xff);
    } while (i < 4);
    return result;
}  /* untested */
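For instance, against the op x25 Read Capacity response from the top of the thread (response being a hypothetical unsigned char buffer holding the reply):

unsigned long bytes_per_block = bigendian4toul(&response[4]);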

And you can replace "result * 256" with "result << 8".

Reply to
CBFalconer

Hmmm. Yes, to me the orders 1:0:3:2 and 2:3:0:1 feel almost as familiar as 0:1:2:3 and 3:2:1:0 do, and I did work on vaxen in the early 80's ...

Thank you, yes I see, on C platforms where long essentially means int, 0xFFL essentially means 0xFF. Possibly I could be pushed as far as:

result = ((chars[4] & 0xffL) << 24) ...

Ouch I see I misled you sorry. I meant to say:

Writing shifts or byte moves commonly provokes actual C compilers to produce machine code that expends unreasonable amounts of time or space in order to calculate the correct value. Compilers for processors that prefer shifts misunderstand byte moves, compilers for processors that prefer byte moves misunderstand shifts.

In theory I like writing a plain version and also a tuned version, but in practice my colleagues and I rarely find the time to maintain the code we never run. Sometimes I do persuade people to run the naive code as a check on the clever code.

Portability isn't free: some people find platform-specific code, in particular cast-free code, easier to read.

I actually launched this c.a.e. thread because I saw some Linux folk who mostly target processors that prefer aligned shifts casually mock people who write byte moves. I was trying to remember why I tended to favour writing byte moves.

No.

Writing code like everyone else writes leads to correct binary-code-only compiler behaviour, because everyone else has already fixed the bugs I'd otherwise discover.

Writing compliant code only guarantees correct output from a hypothetical standard C compiler whose authors have fixed its last bug, and doesn't relate to the time and space expended.

Pat LaVarre

Reply to
Pat LaVarre

Sorry I neglected to mention that. Fortunately many (all?) two's complement machines agree on what left shifting a 1 into the sign bit means.

Some compilers reward deferring the promotion to long e.g.

#define octet(n) ((unsigned long)(chars[n] & 0xff))

Some compilers reward substituting | for + when in fact they give the same result. Using | says explicitly that the addition needn't carry, which helps, but also says explicitly that the addition shall not carry, which can hurt.
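Put together, a sketch of the expression these remarks point toward (same chars and result as in the earlier snippet):

result = (octet(4) << 24) | (octet(5) << 16) | (octet(6) << 8) | octet(7);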

Do we have a practical way to establish whether or not the source code we wrote depends only on what the C standard promises a hypothetically compliant compiler will provide?

Hello here we are. Actual C compilers for actual 8-bit and 16-bit micros commonly do waste much code space and time by wrongly inferring we meant 32-bit shifts, multiplication, and addition when in fact we wanted byte moves.

That kind of "portable" is hypothetical. Me, I care about time and space and plainly correct machine code, not just arguably correct C source code.

If actual C compilers actually understood the equivalence between shift and byte move, then I'd get precisely the same result from saying either, and I'd not be tempted to observe, model, discuss, and exploit the difference.

Pat LaVarre

Reply to
Pat LaVarre

The VAX is (was) a little endian machine when dealing with integers.

However, the floating point format inherited from the PDP-11 looked quite strange, with the first 16-bit word containing the sign, exponent and the most significant part of the mantissa, while the second word contained the least significant part of the mantissa.

However, when looking at byte addresses, the lowest byte contained the most significant part of the mantissa (and the least significant bit from the exponent), the second byte the sign and most of the exponent, the third byte the least significant part of the mantissa and the last byte the middle bits from the mantissa.

However, on the little endian PDP-11, some compilers might put the most significant word of a 32 bit variable into the lower address and the least significant word in the higher address, although the hardware supported only 16 bit memory references for integers. With such compilers, the byte order was 2301 :-).
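A sketch (function name invented) of fetching a 32-bit value stored in that 2301 order: each 16-bit word is little-endian, but the most significant word sits at the lower address.

unsigned long fetch_2301(const unsigned char *p)
{
    unsigned long hi = ((unsigned long)p[1] << 8) | p[0];  /* word at the lower address: MS word */
    unsigned long lo = ((unsigned long)p[3] << 8) | p[2];  /* word at the higher address: LS word */
    return (hi << 16) | lo;
}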

Paul

Reply to
Paul Keinanen

You are correct. If the intent is to convert a 4-octet two's-complement big-endian representation into a native 32-bit signed value, more logic is needed.

Here's one candidate:

long bigendian2c4otolong (const unsigned char *be)
{
    unsigned long r;
    r = ((be[0] & 0xffLU) << 24) | ((be[1] & 0xffLU) << 16)
      | ((be[2] & 0xffLU) << 8) | (be[3] & 0xffLU);
    if (r <= 0x7fffffffUL)
        return (long) r;
    return -(long) (0xffffffffUL - r) - 1;  /* fold the top half down without overflow */
}

Reply to
Thad Smith
[...]

No they are not. They are encodings. The range of encoded values of the execution character set must be able to be represented by the (char) type. Character string literals (e.g. "Hello world!") have type (char *). Character literals (e.g. 'A') have type (int).

The types (char), (signed char) and (unsigned char) are three distinct types. The (char) type itself may be signed or unsigned: you don't know unless you read the documentation.

If sign matters, it is better to use (signed char) or (unsigned char).
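A small illustration of the trap (a sketch; which line prints depends on the platform's choice for plain char):

#include <stdio.h>

int main(void)
{
    char c = '\xFF';   /* bit pattern 0xFF; value is -1 if plain char is signed */
    if (c == 0xFF)     /* c promotes to int before the comparison */
        printf("plain char is unsigned here\n");
    else
        printf("plain char is signed here\n");
    return 0;
}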

Good idea. Regards,

-=Dave

--
Change is inevitable, progress is not.
Reply to
Dave Hansen

Yes we corrected that typo once already. Sorry it exists. Specifically I omitted the & 0xFF.

A comparably popular competing rule of thumb is to use signed quantities always, e.g. in C always say signed char.

Pat LaVarre

Reply to
Pat LaVarre

Fully agreed.

I strongly doubt that this rule is anywhere near popular. The average C programmer isn't that silly --- or so I keep hoping.

To be perfectly clear about this: this rule-of-thumb is too stupid to live. For starters, there's no way you can correctly use the standard functions/macros in a portable manner while sticking to that rule.

Signed chars have exactly one kind of non-silly usage: as very small integers. Using them to represent characters in actual text is just a disaster waiting to happen.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Reply to
Hans-Bernhard Broeker

I see this in code fragments written to run as C or as Java. I thought these might be coming out of a tradition of signed-char 32-bit-int Unix, but personally I'm decidedly vague on how the world has split between signed and unsigned bytes expressed in C as type char.

Sorry I was unclear. Java uses 16 unsigned bits, perhaps unsigned short in C, to mean UTF-16 char. I meant to be talking about signed bytes. Java has only signed bytes, no unsigned bytes.

Pat LaVarre

Reply to
Pat LaVarre

And in that case, it's entirely the *Java* side of that which would be governing it. IIRC, Java has no unsigned types at all.

OTOH, trying to write code that works in more than one language is even sillier than that rule of thumb. It can be a nice game (the record achievement being more than 20 languages that can all execute a given file, IIRC), but it's quite certainly useless in any productive environment. The idea of there being a programming language "C/C++" you can write programs in has wreaked enough havoc on people's education already --- no point in making it even worse.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Reply to
Hans-Bernhard Broeker

Whether Java does or does not accurately represent a signed-char 32-bit-int Unix C design tradition passed person-to-person, I do not know.

That is, in Java as specified, signedness is always implicit, never explicit or otherwise indeterminate. Instead, all of byte, short, int, long are always signed and char is always unsigned. That is, in Java (96?) we see a different way of spelling the C99 ideas of int8_t, int16_t, int32_t, int64_t and uint16_t.

In particular, Java char by definition works like the unsigned two's-complement sixteen bits that often we can get via C unsigned short (and perhaps wchar?) e.g.

$ cat hi.java
class hi {
    public static void main(String[] args) {
        int i = -1;
        char ch = (char) i;
        int j = (int) ch;
        System.err.println(j);
        System.err.println("x" + Integer.toHexString(j));
    }
}
$ javac hi.java
$ java hi
65535
xffff
$
$ # Pat LaVarre
Reply to
Pat LaVarre

I've always wondered about this. A char is a char, unless it's a character in which case it's an int! Makes about as much sense as some of the other strange "features" of C.

Right. And to just make matters "interesting", the developers of an OS with which I've worked for a number of years decided that char strings should be UBYTE which then causes the compilers to complain because UBYTE != char!

Right.

Why?

Reply to
Everett M. Greene

[...]

Shifting a bit left into the sign bit is undefined. Shifting the sign bit to the right is implementation-defined. Summary: you don't know what you're gonna get.

Of course, using unsigned types doesn't always help if the type is narrower than int. In that case, it gets promoted to (signed) int before the shift. For example, assuming 16-bit int:

unsigned char x = 0xFF, y;
unsigned int z;

y = x << 8;  /* x promotes to signed int first; 0xFF << 8 overflows a 16-bit int */
z = x << 8;  /* same undefined shift, even though z is unsigned */

Reply to
Dave Hansen

Yes, on paper. But how often is this issue real? Anybody actually selling a 16-bit processor these days whose int's are not two's complement, such that (0x00FF << 8) doesn't simply come out as 0xFF00?

Reply to
Pat LaVarre

Agree with that to the extent that /in general/ you don't know, but for a particular compiler/processor combination you will. Source code portability is questionable when using shift operations -- signed vs. unsigned isn't as significant a factor as shifting of any sort.

You snipped my rant! I just found out that strncpy() null-fills the destination out to n characters whereas strncat() does not!
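For anyone following along, a sketch of the asymmetry (buffer names invented):

#include <string.h>

char a[8], b[8] = "";

void demo(void)
{
    strncpy(a, "hi", sizeof a);      /* copies "hi", then pads a[2..7] with '\0' */
    strncat(b, "hi", sizeof b - 1);  /* appends "hi" plus one '\0'; no padding */
}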

Reply to
Everett M. Greene
