Alignment problems with MIPS architecture

- mahendraguduru

Sun, Mar 12, 2006 1:44 PM

Hi, we are trying to port our application from a PPC405 to a MIPS 4Kc board, but we are getting many address alignment exceptions while porting. We have read that MIPS has the lwl, lwr, swl and swr instructions to deal with unaligned accesses.

We would like to know how to instruct the compiler to use these special instructions (instead of the normal lw and sw) whenever an unaligned address is accessed.

[We have noticed that the packed attribute makes the compiler use these instructions. But we cannot use it, as our structure length has to stay the same for protocol compatibility.]

Our compiler toolchain is MIPS32sfgnu and our ccmips version is gcc-2.96.

Thanks in advance for all your help. Regards, Mahendra

- Tauno Voipio

Sun, Mar 12, 2006 2:14 PM

Get the GCC manuals from .

Please note that GCC-2.96 is a) very old, and b) an unofficial Red Hat bastardization of the GNU compiler. You can probably use the 2.95 version documents.

Have a look at the 'packed' attribute. Using it creates spectacularly ugly and inefficient machine code. A far better approach is to correct the data structures for proper alignment. This applies especially to RISC processors such as SPARC and ARM, besides your MIPS.

--

Tauno Voipio
tauno voipio (at) iki fi

- Hans-Bernhard Broeker

Sun, Mar 12, 2006 2:22 PM

There's your problem. You're making the old assumption that you should (or even could) model your protocol in terms of C structures. That's *wrong*.

C structs are for internal use by the program *only*. You don't use them to try and match externally predefined protocols, and you absolutely don't define a protocol based on what a randomly picked C compiler makes out of a given C struct definition.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

- Mike Silva

Sun, Mar 12, 2006 3:47 PM

As others have noted, your problem is trying to use the exact layout of your external, protocol-related structures for your internal structures as well. They are two separate things. Define, and then let your compiler create, a corresponding internal data structure for every one of your external protocol structures. Whenever you are reading data from one of your external protocol structures, copy the data element by element into the corresponding internal structure. Likewise, whenever you are writing data to one of your external protocol structures, copy the data element by element from the corresponding internal structure. The rest of the time your code will be working only with your correctly aligned internal structures, and everything will run much smoother and faster.

If this data realigning sounds like a lot of extra work for your code, remember that if you could force the compiler to do it invisibly with various packing directives, your code would still be doing the extra work, but it would be doing it every single time a data element was accessed, rather than just on the initial read-in and the final write-out. Better to do it upfront in a controlled and visible manner and get it out of the way.
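
A minimal sketch of the read-in step described above, assuming a hypothetical 6-byte big-endian wire message (a 16-bit id followed by a 32-bit value). The names `struct sensor_msg`, `SENSOR_MSG_WIRE_SIZE` and `sensor_msg_decode` are invented for illustration and do not come from the original application.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout (big-endian, no padding):
 *   offset 0: 16-bit id
 *   offset 2: 32-bit value
 */
#define SENSOR_MSG_WIRE_SIZE 6u

/* Internal representation: the compiler may pad and align it freely. */
struct sensor_msg {
    uint16_t id;
    uint32_t value;
};

/* Copy the message element by element out of the wire buffer.
 * Only byte accesses touch buf, so its alignment does not matter.
 * Returns 0 on success, -1 if the buffer is too short. */
int sensor_msg_decode(const unsigned char *buf, size_t len,
                      struct sensor_msg *out)
{
    if (len < SENSOR_MSG_WIRE_SIZE)
        return -1;
    out->id = (uint16_t)(((unsigned)buf[0] << 8) | buf[1]);
    out->value = ((uint32_t)buf[2] << 24) | ((uint32_t)buf[3] << 16) |
                 ((uint32_t)buf[4] << 8)  |  (uint32_t)buf[5];
    return 0;
}
```

After the call, the rest of the code works only on the naturally aligned `struct sensor_msg`, so the cost of the byte-wise copy is paid once per message rather than on every field access.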

- Steve at fivetrees

Mon, Mar 13, 2006 5:45 AM

Absolutely. It's not just a question of alignment, but of representation: little/big-endian, int size, float format, etc.

I've recently seen this up close and personal - it's painful. (Project involving a pre-existing protocol defined in terms of i386/Borland structures - and of course my implementation was the first to use a different architecture & compiler. Sigh.) ASCII - or a well-defined and easily-parsed independent binary format - is the only way to fly.

Steve

- ?
  

Mon, Mar 13, 2006 9:05 AM

But then, how *do* you model your protocol? Sure, your reference doc should not use C structs to describe a protocol, but once you have, e.g., a textual description of it, you'll have to write code to implement it. Are you suggesting _not_ to use C structs at this point?

Rob

-- Robert Kaiser email: rkaiser AT sysgo DOT com SYSGO AG
Klein-Winternheim / Germany

- John B

Mon, Mar 13, 2006 12:03 PM

On 13/03/2006 the venerable ? etched in runes:

Absolutely. Create a message buffer and then fill it with the appropriate values, which are plucked out of the C structure. If the protocol is UART based (which most are) then the buffer should be byte oriented and all values should be inserted using byte operations. Even the endianness of a particular processor can screw you up here.
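
As a rough sketch of that approach, assume a hypothetical 7-byte status message sent most significant byte first. The struct, the wire size and the function name below are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define STATUS_MSG_WIRE_SIZE 7u   /* 1 + 2 + 4 bytes, hypothetical layout */

/* Internal structure; its in-memory layout is never transmitted. */
struct status_msg {
    uint8_t  code;
    uint16_t count;
    uint32_t timestamp;
};

/* Fill a byte-oriented message buffer from the structure, one byte at a
 * time, most significant byte first. Because only shifts and byte stores
 * are used, the result is the same on big- and little-endian hosts.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t status_msg_encode(const struct status_msg *m,
                         unsigned char *buf, size_t cap)
{
    if (cap < STATUS_MSG_WIRE_SIZE)
        return 0;
    buf[0] = m->code;
    buf[1] = (unsigned char)(m->count >> 8);
    buf[2] = (unsigned char)(m->count & 0xFFu);
    buf[3] = (unsigned char)(m->timestamp >> 24);
    buf[4] = (unsigned char)((m->timestamp >> 16) & 0xFFu);
    buf[5] = (unsigned char)((m->timestamp >> 8) & 0xFFu);
    buf[6] = (unsigned char)(m->timestamp & 0xFFu);
    return STATUS_MSG_WIRE_SIZE;
}
```

The buffer can then be handed to the UART driver byte by byte.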

--
John B

- Hans-Bernhard Broeker

Mon, Mar 13, 2006 12:08 PM

Not for the actual protocol. Possibly for the internal representation of it.

You write a conversion layer between external and internal representation of the protocol. Typically, each type of message of the protocol will internally correspond to one C aggregate data type (a struct, an array, combinations thereof). The conversion layer reads the incoming protocol octet by octet, or even bit by bit if necessary, and fills the C structure's fields. The opposite activity happens whenever your C code has to talk that protocol. Treating 16-bit or 32-bit unsigned ints in a single I/O operation and fixing up the breakage by ntohl() & friends is an option that should be handled with care --- those functions aren't terribly portable themselves.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

- ?
  

Mon, Mar 13, 2006 2:25 PM

This means that you need a separate conversion method for every type of telegram. That's OK, but how do you suggest to describe the physical format of the messages? The only practical way I can see to do this in C without using structures would be to use a bunch of constants that define offsets of individual data items. Depending on the protocol, this can be a _very_ large bunch and a nightmare to maintain or even understand.

I would prefer to have C structs instead, knowing (and documenting) that they may need to be adjusted if toolchain/platform is changed.

The way I see it, ntohl() & friends are not supposed to be portable themselves. They are to be provided by the platform in order to allow other code to be (sort of) portable by using these functions. So, they might be considered part of the conversion layer you mentioned.

Rob

-- Robert Kaiser email: rkaiser AT sysgo DOT com SYSGO AG
Klein-Winternheim / Germany

- Vadim Borshchev

Mon, Mar 13, 2006 3:49 PM

With pictures, as usual. Look at the TCP description, scroll down to "3.1. Header Format".

C structures are implementation. Say one specifies something like struct { uint8_t x; uint8_t y; } foo; Depending on the architecture and/or compiler in question, the members x and y might be aligned on a 1-, 2-, or 4-byte boundary. It might happen that they are always aligned on a 4-byte boundary, and sending this structure out would actually transmit 8 bytes, not 2 as intended. So a hefty comment is necessary to describe what was meant by that C statement.
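
The padding effect is easiest to observe with members of different sizes. The following sketch (the struct `mixed` and the helper `mixed_padding` are made-up names) computes how many bytes the compiler added beyond what the fields themselves need; the result depends on the ABI, so no particular value is assumed here.

```c
#include <stddef.h>
#include <stdint.h>

/* One 8-bit and one 32-bit member: 5 bytes of payload. */
struct mixed {
    uint8_t  x;
    uint32_t y;
};

/* Bytes the two fields would occupy if sent back to back. */
#define MIXED_WIRE_SIZE (sizeof(uint8_t) + sizeof(uint32_t))

/* Number of padding bytes the compiler inserted into struct mixed.
 * On many 32-bit ABIs this is 3 (y is placed at offset 4), but the
 * C standard does not fix the value. */
size_t mixed_padding(void)
{
    return sizeof(struct mixed) - MIXED_WIRE_SIZE;
}
```

Writing `sizeof(struct mixed)` bytes to the wire would therefore send the padding along with the data, which is exactly the mismatch described above.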

Then, C is not the only language, though spoken widely. And English (in comments) might not be the first language of a person who uses the code.

Vadim

- John Devereux

Mon, Mar 13, 2006 4:39 PM

In "The Practice of Programming" Kernighan & Pike suggest what they call a "little language" for this. By this they mean something like the format string in printf.

They then have variadic functions

int pack(unsigned char* buf, char* fmt, ...);

int unpack(unsigned char* buf, char* fmt, ...);

fmt is a string containing the description of the packet layout, their example is "cscl" where c=8 bit char, s=16 bit int, l = 32 bit int.

pack() steps through the format string, and for each character encodes the next variadic argument into the buffer.

unpack() does the reverse, except its variadic arguments are pointers to the destination variables.
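
A compact sketch in the spirit of that description, not a copy of the book's code. It assumes big-endian encoding, takes `const` pointers where the prototypes above use plain ones, and returns -1 for an unknown format character; those choices are mine.

```c
#include <stdarg.h>

/* pack: encode the variadic arguments into buf according to fmt.
 * 'c' = 8-bit, 's' = 16-bit, 'l' = 32-bit, all big-endian.
 * 'c' and 's' values are passed as int, 'l' values as unsigned long.
 * Returns the number of bytes written, or -1 on a bad format character. */
int pack(unsigned char *buf, const char *fmt, ...)
{
    va_list args;
    unsigned char *bp = buf;
    const char *p;
    unsigned int s;
    unsigned long l;

    va_start(args, fmt);
    for (p = fmt; *p != '\0'; p++) {
        switch (*p) {
        case 'c':
            *bp++ = (unsigned char)va_arg(args, int);
            break;
        case 's':
            s = (unsigned int)va_arg(args, int);
            *bp++ = (unsigned char)((s >> 8) & 0xFFu);
            *bp++ = (unsigned char)(s & 0xFFu);
            break;
        case 'l':
            l = va_arg(args, unsigned long);
            *bp++ = (unsigned char)((l >> 24) & 0xFFu);
            *bp++ = (unsigned char)((l >> 16) & 0xFFu);
            *bp++ = (unsigned char)((l >> 8) & 0xFFu);
            *bp++ = (unsigned char)(l & 0xFFu);
            break;
        default:
            va_end(args);
            return -1;
        }
    }
    va_end(args);
    return (int)(bp - buf);
}

/* unpack: the reverse; the variadic arguments are pointers to
 * unsigned char ('c'), unsigned short ('s') or unsigned long ('l').
 * Returns the number of bytes consumed, or -1 on a bad format character. */
int unpack(const unsigned char *buf, const char *fmt, ...)
{
    va_list args;
    const unsigned char *bp = buf;
    const char *p;
    unsigned char *pc;
    unsigned short *ps;
    unsigned long *pl;

    va_start(args, fmt);
    for (p = fmt; *p != '\0'; p++) {
        switch (*p) {
        case 'c':
            pc = va_arg(args, unsigned char *);
            *pc = *bp++;
            break;
        case 's':
            ps = va_arg(args, unsigned short *);
            *ps = (unsigned short)(((unsigned)bp[0] << 8) | bp[1]);
            bp += 2;
            break;
        case 'l':
            pl = va_arg(args, unsigned long *);
            *pl = ((unsigned long)bp[0] << 24) | ((unsigned long)bp[1] << 16) |
                  ((unsigned long)bp[2] << 8)  |  (unsigned long)bp[3];
            bp += 4;
            break;
        default:
            va_end(args);
            return -1;
        }
    }
    va_end(args);
    return (int)(bp - buf);
}
```

A call such as `pack(buf, "cscl", 0x01, 0x1234, 0x02, 0xDEADBEEFul)` writes 8 bytes. Neither function checks the buffer length, so the caller has to size the buffer from the format string.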

--

John Devereux

- Hans-Bernhard Broeker

Mon, Mar 13, 2006 6:34 PM

In plain text, being very careful about the exact interpretation of technical terms. Or possibly in diagrams, or a formal grammar. I'd also have an in-depth look at established prior art, first. With a bit of luck, e.g. some Internet RFC already fits the bill, and if not, you at least know how to write that kind of document.

That knowledge (and documentation) would basically bear the message: "I, the designer of this protocol, don't know what its messages actually look like any better than you do, thanks for asking".

That would be a terribly bad idea. You might just as well try to define the shape of the Matterhorn by providing an approximate model of unspecified scale, in Jell-O (Wackelpudding).

C simply is the wrong tool for this kind of work, period.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

- Paul Keinanen

Tue, Mar 14, 2006 8:46 AM

For portable applications, I use structs containing arrays of unsigned chars (which is OK for most cases, except some DSPs). A 16-bit variable is an array with two elements and a 32-bit item is an array of four elements. The value assembly/disassembly is done by shifts and masks. Apart from the obvious alignment problems, this also solves the endianness problems.
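
A small sketch of this technique, assuming 8-bit chars and a big-endian wire format. The struct `wire_msg` and the `get_`/`put_` helpers are hypothetical names chosen for the example.

```c
/* Hypothetical wire message: every field is an array of unsigned char,
 * so no member needs more than byte alignment. Compilers normally add
 * no padding between such members, though the standard does not
 * forbid it. */
struct wire_msg {
    unsigned char type[1];
    unsigned char length[2];   /* 16-bit value, big-endian */
    unsigned char serial[4];   /* 32-bit value, big-endian */
};

/* Assemble a 16-bit value from two bytes with shifts. */
unsigned int get_u16(const unsigned char f[2])
{
    return ((unsigned int)f[0] << 8) | f[1];
}

/* Assemble a 32-bit value from four bytes with shifts. */
unsigned long get_u32(const unsigned char f[4])
{
    return ((unsigned long)f[0] << 24) | ((unsigned long)f[1] << 16) |
           ((unsigned long)f[2] << 8)  |  (unsigned long)f[3];
}

/* Disassemble a 16-bit value into two bytes with shifts and masks. */
void put_u16(unsigned char f[2], unsigned int v)
{
    f[0] = (unsigned char)((v >> 8) & 0xFFu);
    f[1] = (unsigned char)(v & 0xFFu);
}

/* Disassemble a 32-bit value into four bytes with shifts and masks. */
void put_u32(unsigned char f[4], unsigned long v)
{
    f[0] = (unsigned char)((v >> 24) & 0xFFu);
    f[1] = (unsigned char)((v >> 16) & 0xFFu);
    f[2] = (unsigned char)((v >> 8) & 0xFFu);
    f[3] = (unsigned char)(v & 0xFFu);
}
```

Fields are then read with calls like `get_u32(msg.serial)` instead of a direct member access, which keeps every memory access byte-sized.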

Paul

- ?
  

Tue, Mar 14, 2006 9:27 AM

Sorry, I was a bit unclear. I had already taken a formal description for granted, and I totally agree that C structures are clearly not the way to formally specify a protocol(*). When I asked how to "describe" the physical format of messages, I was referring to the process of *implementing* the protocol in -say- C, and that's the context where this ...

... IMHO still applies.

Hmmm, well, practically speaking, it *can* be a little better than that. This, for instance...

--
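  #include <assert.h>   /* assert() */
  #include <stddef.h>   /* offsetof() */

  typedef unsigned int   my_uint32; /* 32-bit unsigned integer */
  typedef unsigned short my_uint16; /* 16-bit unsigned integer */

  struct xx {
        my_uint32 a;
        my_uint16 b;
  };

  /* Call once during initialization; aborts if an assumption does not hold. */
  #define check_assumptions() \
        do { \
              assert(sizeof(my_uint32) == 4); \
              assert(sizeof(my_uint16) == 2); \
              assert(offsetof(struct xx, b) == 4); \
        } while (0)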

- Hans-Bernhard Broeker

Tue, Mar 14, 2006 12:23 PM

You implement it by implementing it.

C, if written not too badly, should be self-documenting for this kind of job.

It's also a waste of typing, because it's basically guaranteed to not work as an actual implementation of such a protocol. It's Pseudo-C, not real, usable code.

Those assert()s will *fail*. Maybe not in this particular case, on today's hardware, but if you try to actually use this for real, they will. Reverse the element types in the record, and try it on a 32-bit compiler, and it'll break today. That's exactly what I was referring to when I said C struct are not usable for this. A correct implementation would look more like this:

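  #include <assert.h>      /* assert() */
  #include <limits.h>      /* CHAR_BIT */
  #include <stdio.h>       /* FILE, fread() */
  #include <arpa/inet.h>   /* ntohl(), ntohs(); POSIX, not ISO C */
  #include "my_types.h"    /* U32, U16 */

  struct xx { U32 a; U16 b; };

  void read_xx(struct xx *this, FILE *f)
  {
        assert(CHAR_BIT == 8); /* to keep this simple */
        fread(&this->a, 4, 1, f); this->a = ntohl(this->a);
        fread(&this->b, 2, 1, f); this->b = ntohs(this->b);
  }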

You can make this even easier with some helper functions, like

U32 read_U32(FILE *f);
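
One possible shape for such a helper, as a sketch only. It assumes the stream carries the value in network (big-endian) byte order and uses a local `typedef` as a stand-in for the `U32` from `my_types.h`, which is not shown in this thread.

```c
#include <stdio.h>

/* Assumption: stand-in for the U32 in my_types.h; unsigned long is
 * guaranteed to hold at least 32 bits. */
typedef unsigned long U32;

/* Read a 32-bit big-endian value one octet at a time.
 * No struct member is written directly, so neither host alignment nor
 * host byte order matters. The prototype leaves no room for an error
 * code: on EOF or a read error the returned value is incomplete and the
 * caller should check feof()/ferror(). */
U32 read_U32(FILE *f)
{
    U32 v = 0;
    int i;

    for (i = 0; i < 4; i++) {
        int c = getc(f);
        if (c == EOF)
            break;
        v = (v << 8) | (U32)(c & 0xFF);
    }
    return v;
}
```

With helpers like this, reading a message field becomes `this->a = read_U32(f);` and no byte-swapping call is needed afterwards.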

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

- ?
  

Wed, Mar 15, 2006 10:03 AM

Doesn't that contradict what you said earlier?

It's not code at all, just definitions. It doesn't even _want_ to implement any protocol. It's just an example of how to write down the physical layout of data (i.e. messages) with the means of C.

It may not be nice, but it serves the purpose. Surely, a separate (textual or graphical) document could do a less clumsy job in describing physical layout, but external documents tend to go out of sync with the implementation as development advances. Being written in the implementation's language (i.e. C in this example), these definitions can be _part_ of the implementation and will therefore always be in sync.

It is the *purpose* of those assert()s to fail, if the assumptions do not apply in any given environment. The intended use of the macro would be for code to invoke it at some point early during initialization, so the program clearly and visibly fails if assumptions are not met rather than exhibiting obscure failures later on.

Actually it would be better if the code could be made to not even compile, e.g.:

  #if (sizeof(my_uint32) != 4)
  #error "Please adapt my_uint32 to your compiler!"
  #endif

Unfortunately, that doesn't work with C, so that macro is just a makeshift replacement, but it does help to document the assumptions while at the same time providing some means to prevent them from being lost when buried under heaps of other code.
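
The `#if` form fails because the preprocessor does not evaluate `sizeof`. A widely used workaround, sketched here with a made-up macro name `COMPILE_TIME_ASSERT`, is to declare an array type whose size becomes negative when the condition is false, which the compiler has to reject. (C11 later added `_Static_assert` for the same purpose.)

```c
#include <stddef.h>   /* offsetof() */

typedef unsigned int   my_uint32; /* 32-bit unsigned integer */
typedef unsigned short my_uint16; /* 16-bit unsigned integer */

struct xx {
    my_uint32 a;
    my_uint16 b;
};

/* Hypothetical helper: if cond is false the array size is -1, which is
 * a constraint violation, so compilation stops at this line. */
#define COMPILE_TIME_ASSERT(name, cond) \
    typedef char compile_time_assert_##name[(cond) ? 1 : -1]

COMPILE_TIME_ASSERT(uint32_is_4_bytes, sizeof(my_uint32) == 4);
COMPILE_TIME_ASSERT(uint16_is_2_bytes, sizeof(my_uint16) == 2);
COMPILE_TIME_ASSERT(b_at_offset_4, offsetof(struct xx, b) == 4);
```

The error message is less readable than an `#error` text, but the name passed to the macro usually shows up in the diagnostic and points at the violated assumption.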

Sure, no argument about that.

Rob

-- Robert Kaiser email: rkaiser AT sysgo DOT com SYSGO AG
Klein-Winternheim / Germany

- Hans-Bernhard Broeker

Wed, Mar 15, 2006 2:29 PM

No. The trick is you can't use C data structures to codify the external data layout. That doesn't rule out using the rest of the language, though.

No, they won't, because that implementation doesn't work. Trying to use it would generate non-working code or error messages, depending on the host platform and on whether the assert()s are actually called. So people would #ifdef it out and use their own. It'll fall out of sync with the actual implementation just as quickly as a text file would. If you want the documentation of the physical data format close to the implementation, use some kind of literate programming or a documentation extractor.

So what do you expect the user of this supposed reference implementation to do if they fail on their C compiler? Run crying to their supervisors and claim that, sorry, this protocol can't be implemented on this new platform --- because, look, sir, the assertions are failing?

If you're going to write a "definition by implementation", that had better be an implementation that can actually be used on other compilers than the one you happen to be using today.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.