Octets with non-8 bit bytes...

Ok, I'm sure this has been beaten to death, but Google, etc. turned up a lot of descriptions of the problem and none of a portable solution.

I'm working with some firmware drivers which are intended to be as portable as possible. Data moves thru a switchable 8- or 16-bit data bus chip (a USB device controller specifically). Performance is critical so 16-bit is pretty much necessary. Following that example, let's look at the USB mass storage class. You get commands from the host in 31 octet command wrappers that look like this (endian issues aside...):

typedef struct {
    u32 Signature;
    u32 Tag;
    u32 TransferLength;
    u8  Flags;
    u8  Lun;
    u8  CommandLength;
    u8  Command[15];
} Cbw;

If I have 8 bit data types that's easy enough to get and deal with. But right now I'm working with a TMS320C55x variant with nothing smaller than 16-bit data types. So naturally the 8 bit types get all mixed up when I read them and when I send back similar data every other octet is garbage. Some responses are filled at runtime, a few are global constants. I can pack things early, but then I need to unpack, modify, and repack. Or I can pack before transmission, but that'd take a bite out of performance. Or I can break things down:

typedef struct {
    BYTE Signature0;
    BYTE Signature1;
    BYTE Signature2;
    BYTE Signature3;
    BYTE Tag0;
    BYTE Tag1;
    BYTE Tag2;
    BYTE Tag3;
    BYTE TransferLength0;
    BYTE TransferLength1;
    BYTE TransferLength2;
    BYTE TransferLength3;
    BYTE Flags;
    BYTE Lun;
    BYTE CommandLength;
    BYTE Command[15];
} Cbw;

Ugly. I'd really like to avoid that...

Now, I see this problem described countless times (yes, yes, sizeof(char)==sizeof(int)==1, 16 bit byte is 100% ok by the standard), but what's the best portable solution to dealing with this? Or at least *mostly* portable. All the messages I see say "don't store binary data and don't worry about how many bits are in anything". Great, but that embedded command field being sent from my host computer 5 meters away is 15 octets whether I like it or not. I don't care if everything's stored locally inefficiently so long as performance is reasonable (and it's clear! Other people *will* be dealing with this code!)

I'm making progress getting things to work, but it's getting ugly so I was curious how people deal with this in real life.

Thanks for whatever guidance you can provide, alex

Reply to
Alex Sanks

"Octets" and "Bytes" are always 8 bits. The term you want is "Words."

Reply to
Guy Macon

Not in C. A C byte is the smallest of 1) a character used by the system, 2) the smallest memory chunk that can be individually addressed, or 3) eight bits.

On most DSPs, the C compiler considers a byte to contain 16 or 32 bits.

sizeof(char) is always 1. sizeof() returns storage size in bytes. On most DSPs, sizeof(int) is 1. On many, sizeof(long) is also 1. Try it.
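For anyone who wants to "try it", here is a minimal sketch of such a check. The helper names bits_in and print_sizes are made up for illustration; only CHAR_BIT and sizeof come from the language itself.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Width in bits of an object that occupies size_in_bytes C bytes. */
static unsigned long bits_in(size_t size_in_bytes)
{
    return (unsigned long)size_in_bytes * CHAR_BIT;
}

/* Print storage sizes in C bytes (what sizeof reports) and in bits. */
static void print_sizes(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("char: %lu byte(s), %lu bits\n",
           (unsigned long)sizeof(char), bits_in(sizeof(char)));
    printf("int:  %lu byte(s), %lu bits\n",
           (unsigned long)sizeof(int), bits_in(sizeof(int)));
    printf("long: %lu byte(s), %lu bits\n",
           (unsigned long)sizeof(long), bits_in(sizeof(long)));
}
```

Calling print_sizes() from main on a desktop compiler will normally report CHAR_BIT as 8; on a DSP toolchain of the kind discussed here the same code would be expected to report 16 or more, with sizeof(int) possibly equal to 1.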

Jerry

--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Reply to
Jerry Avins

On 10 Jun 2004 16:59:49 -0700, snipped-for-privacy@sanks.net (Alex Sanks) wrote in comp.arch.embedded:

I ran across something similar in parsing and formatting CAN packets for the TI 2812 DSP, which likewise has 16-bit chars and ints. A CAN packet may contain between 0 and 8 octets in the data field of the frame. In our interface, any octet may be part of an 8-bit, 16-bit, or 32-bit value.

I wrote two low-level routines to pack/unpack to an array of eight words, one octet per word. When compiled with full optimization they are quite short and fast, at least on the 2812, which has a C-friendly architecture compared to some older DSPs. The result was good enough that I had no need to write it in assembly language. In fact one of my colleagues who wrote the other side of the interface on an ARM used the code unchanged.

You might be able to adapt something from them:

#define OCTET_MASK 0xFFU

static void split_frame(const uint16_t words [4], uint_least8_t *split)
{
    /* can't just walk a pointer to unsigned char through the octets of the */
    /* data frame because unsigned char is 16 bits on the 2812 DSP!         */
    split [0] = words[0] & OCTET_MASK;
    split [1] = (words[0] >> 8) & OCTET_MASK;
    split [2] = words[1] & OCTET_MASK;
    split [3] = (words[1] >> 8) & OCTET_MASK;
    split [4] = words[2] & OCTET_MASK;
    split [5] = (words[2] >> 8) & OCTET_MASK;
    split [6] = words[3] & OCTET_MASK;
    split [7] = (words[3] >> 8) & OCTET_MASK;
}

static void assemble_frame(const uint_least8_t *split, uint16_t *words)
{
    /* can't just walk a pointer to unsigned char through the octets of the */
    /* data frame because unsigned char is 16 bits on the 2812 DSP!         */
    words [0] = ((uint16_t)split [1]
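The archived listing breaks off partway through assemble_frame. What follows is a guess at how it might be completed, assuming it is simply the inverse of split_frame (odd-numbered octets go to the high half of each word). It is a reconstruction for illustration, not the code as originally posted.

```c
#include <stdint.h>

#define OCTET_MASK 0xFFU

/* Assumed inverse of split_frame: rebuild four 16-bit words from eight
   octets, with split[2*i] as the low octet and split[2*i + 1] as the
   high octet of words[i]. The masks guard against stray high bits when
   uint_least8_t is wider than 8 bits, as it is on the 2812. */
static void assemble_frame(const uint_least8_t *split, uint16_t *words)
{
    int i;

    for (i = 0; i < 4; i++) {
        words[i] = (uint16_t)((((uint16_t)split[2 * i + 1] & OCTET_MASK) << 8)
                              | (split[2 * i] & OCTET_MASK));
    }
}
```

A loop is used here only to keep the sketch short; the unrolled style of split_frame would work just as well.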
Reply to
Jack Klein

Yup. One more reason to hate C.

Reply to
Guy Macon

In C, octets yes, but bytes contain CHAR_BIT bits, as defined in <limits.h>.

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
Reply to
CBFalconer

Please excuse me, as I can give no guidance myself. But another question for you: can you tell me where to get information about the USB mass storage class?

Thanks, Wolfgang

Reply to
Wolfgang

What the data bus of that chip is should be pretty much irrelevant. What you need to know is what size the registers are. Or more generally, how that 16-bit layout actually works. The makers of that USB controller *must* be aware of this problem, so check them for app notes.

So don't do that. Marshal your incoming data into something your CPU can use (e.g. one 16-bit word for each octet, let 32-bit words keep 32-bit words, and forget about possible waste), right at the interface between the USB controller and the DSP.
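A rough sketch of that kind of marshalling for the 31-octet command wrapper might look like this. Everything named here (CbwNative, wire_octet, wire_le32, cbw_unmarshal) is hypothetical, and two things are assumed: the controller hands over 16 bus words with the earlier octet in the low half of each word, and multi-octet fields are little-endian on the wire. The command block is taken as octets 15 through 30, which is what makes the total come to 31.

```c
#include <stdint.h>

#define CBW_OCTETS     31u  /* wire size stated in the original post */
#define CBW_BUS_WORDS  16u  /* 31 octets rounded up to 16-bit words  */
#define CBW_CMD_OFFSET 15u  /* first octet of the command block      */
#define CBW_CMD_OCTETS (CBW_OCTETS - CBW_CMD_OFFSET)

/* Native form: one element per octet, wasteful but easy to use. */
typedef struct {
    uint_least32_t signature;
    uint_least32_t tag;
    uint_least32_t transfer_length;
    uint_least8_t  flags;
    uint_least8_t  lun;
    uint_least8_t  command_length;
    uint_least8_t  command[CBW_CMD_OCTETS];
} CbwNative;

/* Octet n of the wire stream (assumed: even octets in the low half). */
static uint_least8_t wire_octet(const uint_least16_t *bus, unsigned n)
{
    return (uint_least8_t)((bus[n / 2u] >> (8u * (n % 2u))) & 0xFFu);
}

/* 32-bit value starting at octet n (assumed little-endian). */
static uint_least32_t wire_le32(const uint_least16_t *bus, unsigned n)
{
    return  (uint_least32_t)wire_octet(bus, n)
         | ((uint_least32_t)wire_octet(bus, n + 1u) << 8)
         | ((uint_least32_t)wire_octet(bus, n + 2u) << 16)
         | ((uint_least32_t)wire_octet(bus, n + 3u) << 24);
}

/* Unpack the raw bus words once, right at the controller interface. */
static void cbw_unmarshal(const uint_least16_t bus[CBW_BUS_WORDS],
                          CbwNative *cbw)
{
    unsigned i;

    cbw->signature       = wire_le32(bus, 0u);
    cbw->tag             = wire_le32(bus, 4u);
    cbw->transfer_length = wire_le32(bus, 8u);
    cbw->flags           = wire_octet(bus, 12u);
    cbw->lun             = wire_octet(bus, 13u);
    cbw->command_length  = wire_octet(bus, 14u);
    for (i = 0u; i < CBW_CMD_OCTETS; i++) {
        cbw->command[i] = wire_octet(bus, CBW_CMD_OFFSET + i);
    }
}
```

The rest of the firmware then only ever sees CbwNative, so the 16-bit char issue stays confined to these few functions. Only masks and shifts on unsigned values are used, so the same source should also behave on a machine with 8-bit chars.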

You won't manage to avoid all the ugliness --- you've maneuvered yourself into too ugly a situation for that.

Essentially the same one you use to work with single bits in a C byte: masks and shifts. Or, only if you know your compiler will _never_ change its behaviour in that aspect, bit-fields.
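As an illustration of the masks-and-shifts approach, here is a small sketch of accessors that read and write a single octet inside a buffer of 16-bit words. The names get_octet and put_octet are invented for this example, and the layout (even-numbered octets in the low half of each word) is an assumption that would have to match the actual controller.

```c
#include <stdint.h>

/* Read octet n from a buffer holding two octets per 16-bit word.
   Assumption: even-numbered octets occupy the low half of each word. */
static unsigned get_octet(const uint_least16_t *buf, unsigned n)
{
    unsigned shift = 8u * (n & 1u);

    return ((unsigned)buf[n >> 1] >> shift) & 0xFFu;
}

/* Overwrite octet n, leaving the other octet in the same word intact. */
static void put_octet(uint_least16_t *buf, unsigned n, unsigned value)
{
    unsigned shift = 8u * (n & 1u);
    unsigned word  = (unsigned)buf[n >> 1];

    word &= (~(0xFFu << shift)) & 0xFFFFu;  /* clear the target octet  */
    word |= (value & 0xFFu) << shift;       /* insert the masked value */
    buf[n >> 1] = (uint_least16_t)word;
}
```

This keeps the data packed, so it trades a few extra instructions per access for not having to unpack and repack whole structures.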

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Reply to
Hans-Bernhard Broeker

USB.org has a good collection of documents, including class specs.


HTH,

Vadim

Reply to
Vadim Borshchev

Close. It's the smallest addressable unit which will hold a character.

From the standard:

byte: addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

NOTE 1 It is possible to express the address of each individual byte of an object uniquely.

NOTE 2 A byte is composed of a contiguous sequence of bits, the number of which is implementation defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit.
--
Al Balmer
Balmer Consulting
Reply to
Alan Balmer

Closer, but still no cigar. It must be addressable, and must be able to represent each character distinctly. But by no means does it *have* to be the _smallest_ addressable unit fulfilling those requirements. E.g. a C translation system targeting a 32-bit x86 PC yet using 19-bit chars, although obviously a total perversion, is quite certainly allowed by the C standard.
--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Reply to
Hans-Bernhard Broeker

Guy Macon writes:

Octets yes, but not so bytes, so I am told. Back in the depths of computing history bytes could be other than 8 bits, hence the use of "octet".

--
Chris Hills, Staffs, England

Reply to
Chris Hills

So how large is a byte on such a machine? I'm not clear if this is 24 or 32 bits. I guess this would be 32 bits since 24 bits would not be "directly" addressable. How is that different from what Alan said?

--
Rick "rickman" Collins

rick.collins@XYarius.com
Reply to
rickman

Just to stir the pot a little, a PDP-8 was a 12-bit machine and 12 bits was common usage for a byte. The machine addressed data in 12-bit bytes, but also common usage was to store and manipulate chars as 6-bit nibbles packed 2 per byte. This was back on TTYs where upper case was irrelevant.
Reply to
Jim Stewart

Providing that the 32-bit words are addressable in 19-bit chunks? My head hurts.

--
Al Balmer
Balmer Consulting
Reply to
Alan Balmer

Now the question is, what is a character ?

I can think of character sets based on 5, 6, 7, 8, (9), 16, 21 and 31 (32) bits.

Paul

Reply to
Paul Keinanen

Just to stir the pot a little more...

byte: /bi:t/, n. [techspeak] A unit of memory or data equal to the amount used to represent one character; on modern architectures this is invariably 8 bits. Some older architectures used byte for quantities of 6, 7, or (especially) 9 bits, and the PDP-10 supported bytes that were actually bitfields of 1 to 36 bits! These usages are now obsolete, killed off by universal adoption of power-of-2 word sizes.

Historical note: The term was coined by Werner Buchholz in 1956 during the early design phase for the IBM Stretch computer; originally it was described as 1 to 6 bits (typical I/O equipment of the period used 6-bit chunks of information). The move to an 8-bit byte happened in late 1956, and this size was later adopted and promulgated as a standard by the System/360. The word was coined by mutating the word "bite" so it would not be accidentally misspelled as bit. See also nybble.

chawmp: n. [University of Florida] 16 or 18 bits (half of a machine word). This term was used by FORTH hackers during the late 1970s/early 1980s; it is said to have been archaic then, and may now be obsolete. It was coined in revolt against the promiscuous use of "word" for anything between 16 and 32 bits; "word" has an additional special meaning for FORTH hacks that made the overloading intolerable. For similar reasons, /gaw´bl/ (spelled "gawble" or possibly "gawbul") was in use as a term for 32 or 48 bits (presumably a full machine word, but our sources are unclear on this). These terms are more easily understood if one thinks of them as faithful phonetic spellings of "chomp" and "gobble" pronounced in a Florida or other Southern U.S. dialect. For general discussion of similar terms, see nybble.

nybble: /nib´l/, nibble, n. [from v. nibble by analogy with "bite" → "byte"] Four bits; one hex digit; a half-byte. Though "byte" is now techspeak, this useful relative is still jargon. Compare byte; see also bit. The more mundane spelling "nibble" is also commonly used. Apparently the "nybble" spelling is uncommon in Commonwealth Hackish, as British orthography would suggest the pronunciation /ni:´bl/.

Following "bit", "byte" and "nybble" there have been quite a few analogical attempts to construct unambiguous terms for bit blocks of other sizes. All of these are strictly jargon, not techspeak, and not very common jargon at that (most hackers would recognize them in context but not use them spontaneously). We collect them here for reference together with the ambiguous techspeak terms "word", "half-word", "double word", and "quad" or quad word; some (indicated) have substantial information separate entries.

2 bits: crumb, quad, quarter, tayste, tydbit, morsel

4 bits: nybble

5 bits: nickle

10 bits: deckle

16 bits: playte, chawmp (on a 32-bit machine), word (on a 16-bit machine), half-word (on a 32-bit machine).

18 bits: chawmp (on a 36-bit machine), half-word (on a 36-bit machine)

32 bits: dynner, gawble (on a 32-bit machine), word (on a 32-bit machine), longword (on a 16-bit machine).

36 bits: word (on a 36-bit machine)

48 bits: gawble (under circumstances that remain obscure)

64 bits: double word (on a 32-bit machine), quad (on a 16-bit machine)

128 bits: quad (on a 32-bit machine)

The fundamental motivation for most of these jargon terms (aside from the normal hackerly enjoyment of punning wordplay) is the extreme ambiguity of the term word and its derivatives.


Comment by Guy Macon: Concerning the statement "on modern architectures this is invariably 8 bits", in my opinion C has no resemblance to anything that can reasonably be called "modern."

--
Guy Macon, Electronics Engineer & Project Manager for hire. 
Remember Doc Brown from the _Back to the Future_ movies? Do you 
Reply to
Guy Macon
[...]

I first used Forth in the late 70's-early 80's (though I've never been a "Forth Hacker"), and I've never seen this term before.

The term I've always heard used is "cell," which is the size of a single entry on the data stack, and at least 16 bits wide in ANSI standard Forth. A "cell pair" holds "double cell" values. A "character" is allowed (but not required) to be narrower than a "cell."

FWIW, a "word" in Forth is what you might call an "operator" or a "function" in C. Actually, it's a little more generic than that. Almost everything in a Forth program is a word.

Never heard of that one either...

[...]

Shave and a haircut...

Regards,

-=Dave

--
Change is inevitable, progress is not.
Reply to
Dave Hansen

Would that make 8 bits a dollar? I've never liked calling 8 bits an octet, it sounds like an overgrown musical group...

Let's see... a dollar buys an ASCII char, two dollars gets you signed numbers from 32767 to -32768 and four dollars can buy... well you get the idea. :)

I am building a 50 cent CPU! Cool, that's what I'll call it, FiftyCents.

--
Rick "rickman" Collins

rick.collins@XYarius.com
Reply to
rickman

Yes, and it comes from the practice of cutting a silver dollar into 8 "bits" for making change.

--
Bill
Posted with XanaNews Version 1.16.3.1
Reply to
William Meyer
