Octets with non-8 bit bytes...

Ok, I'm sure this has been beaten to death, but google, etc. found a
lot of descriptions of the problem but none of a portable solution.

I'm working with some firmware drivers which are intended to be as
portable as possible.  Data moves thru a switchable 8- or 16-bit data
bus chip (a USB device controller specifically).  Performance is
critical so 16-bit is pretty much necessary.  Following that example,
let's look at the USB mass storage class.  You get commands from the
host in 31 octet command wrappers that look like this (endian issues
aside...):

typedef struct
{
    u32    Signature;
    u32    Tag;
    u32    TransferLength;
    u8     Flags;
    u8     Lun;
    u8     CommandLength;
    u8     Command[16];    /* 4+4+4+1+1+1+16 = 31 octets in all */
} Cbw;

If I have 8 bit data types that's easy enough to get and deal with.
But right now I'm working with a TMS320C55x variant with nothing
smaller than 16-bit data types.  So naturally the 8 bit types get all
mixed up when I read them and when I send back similar data every
other octet is garbage.  Some responses are filled at runtime, a few
are global constants.  I can pack things early, but then I need to
unpack, modify, and repack.  Or I can pack before transmission, but
that'd take a bite out of performance.  Or I can break things down:

typedef struct
{
    BYTE    Signature0;
    BYTE    Signature1;
    BYTE    Signature2;
    BYTE    Signature3;
    BYTE    Tag0;
    BYTE    Tag1;
    BYTE    Tag2;
    BYTE    Tag3;
    BYTE    TransferLength0;
    BYTE    TransferLength1;
    BYTE    TransferLength2;
    BYTE    TransferLength3;
    BYTE    Flags;
    BYTE    Lun;
    BYTE    CommandLength;
    BYTE    Command[16];
} Cbw;

Ugly.  I'd really like to avoid that...
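One common middle ground between the two layouts, sketched here under stated assumptions: keep the raw transfer in a buffer of 16-bit words exactly as the controller delivers it, and touch individual octets only through two small accessors. This assumes each word carries two octets with the first one in the low half (the real order depends on how the controller's data bus is wired and configured), and `get_octet`/`put_octet` are hypothetical names, not part of any TI or USB library. Only `uint_least16_t` is used, because the exact-width `uint8_t` need not exist on a 16-bit-char target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for a buffer in which each 16-bit word carries two
 * octets, ASSUMED to be stored low half first. If the controller puts the
 * first octet in the high half, use (8 - shift) instead of shift. */

/* Read octet number i (0-based) from the packed buffer. */
unsigned get_octet(const uint_least16_t *words, size_t i)
{
    unsigned shift = (unsigned)(i % 2u) * 8u;   /* 0 for even i, 8 for odd i */
    return ((unsigned)words[i / 2u] >> shift) & 0xFFu;
}

/* Write octet number i, leaving the other octet in the same word intact. */
void put_octet(uint_least16_t *words, size_t i, unsigned value)
{
    unsigned shift = (unsigned)(i % 2u) * 8u;
    unsigned mask  = 0xFFu << shift;
    unsigned word  = words[i / 2u];

    word = (word & ~mask) | ((value & 0xFFu) << shift);
    words[i / 2u] = (uint_least16_t)word;
}
```

With this, the flags octet of a received wrapper would be `get_octet(rx, 12)`, and responses can be filled in place in the transmit buffer rather than packed in a separate pass.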

Now, I see this problem described countless times (yes, yes,
sizeof(char)==sizeof(int)==1, 16 bit byte is 100% ok by the standard),
but what's the best portable solution to dealing with this?  Or at
least *mostly* portable.  All the messages I see say "don't store
binary data and don't worry about how many bits are in anything".
Great, but that embedded command field being sent from my host
computer 5 meters away is 16 octets whether I like it or not.  I don't
care if everything's stored locally inefficiently so long as
performance is reasonable (and it's clear!  Other people *will* be
dealing with this code!)
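A sketch of another common approach: leave the in-memory struct in whatever types are natural for the target and convert only at the wire boundary. It assumes the 31 received octets have already been spread one per `unsigned char` (only the low 8 bits are used, so it works whether `CHAR_BIT` is 8 or 16) and relies on USB's little-endian order for multi-byte fields. `CbwHost`, `rd_le32`, and `cbw_decode` are illustrative names; 0x43425355 is the "USBC" signature from the Bulk-Only Transport specification.

```c
#include <stddef.h>

#define CBW_OCTETS    31
#define CBW_SIGNATURE 0x43425355UL   /* "USBC" read as a little-endian 32-bit value */

/* In-memory form: natural types, no assumption about CHAR_BIT. */
typedef struct {
    unsigned long signature;
    unsigned long tag;
    unsigned long transfer_length;
    unsigned      flags;
    unsigned      lun;
    unsigned      command_length;
    unsigned      command[16];
} CbwHost;

/* Assemble a little-endian 32-bit value from four octets, masking each
 * element to 8 bits in case char is wider than an octet. */
static unsigned long rd_le32(const unsigned char *p)
{
    return  ((unsigned long)(p[0] & 0xFFu))
          | ((unsigned long)(p[1] & 0xFFu) << 8)
          | ((unsigned long)(p[2] & 0xFFu) << 16)
          | ((unsigned long)(p[3] & 0xFFu) << 24);
}

/* Decode one wrapper; returns 0 on success, -1 if the signature is wrong. */
int cbw_decode(const unsigned char oct[CBW_OCTETS], CbwHost *out)
{
    size_t i;

    out->signature       = rd_le32(oct + 0);
    out->tag             = rd_le32(oct + 4);
    out->transfer_length = rd_le32(oct + 8);
    out->flags           = oct[12] & 0xFFu;
    out->lun             = oct[13] & 0xFFu;
    out->command_length  = oct[14] & 0xFFu;
    for (i = 0; i < 16; i++)
        out->command[i] = oct[15 + i] & 0xFFu;

    return out->signature == CBW_SIGNATURE ? 0 : -1;
}
```

The transmit side would be a mirror-image encoder that writes each field out one octet at a time, lowest octet first.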

I'm making progress getting things to work, but it's getting ugly so I
was curious how people deal with this in real life.

Thanks for whatever guidance you can provide,
alex

Re: Octets with non-8 bit bytes...

"Octets" and "Bytes" are always 8 bits.  The term you want is "Words."


Re: Octets with non-8 bit bytes...

Not in C. A C byte is the smallest of

1) a character used by the system,
2) the smallest memory chunk that can be individually addressed, or
3) eight bits.

On most DSPs, C compilers consider a byte to contain 16 or 32 bits.

sizeof(char) is always 1. sizeof() returns storage size in bytes. On
most DSPs, sizeof(int) is 1. On many, sizeof(long) is also 1. Try it.
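Taking up the "try it" suggestion, a minimal sketch (`type_bits` and `report_type_sizes` are made-up names): call `report_type_sizes()` from `main` on the target. Since `sizeof` counts C bytes, multiplying by `CHAR_BIT` gives bits. A host PC will typically show 8; on a 16-bit-char DSP like the C55x described in the original post you would expect `CHAR_BIT` of 16 and `sizeof(int)` of 1, but the compiler's documentation is the authority.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* sizeof counts C bytes (units of char), so bits = bytes * CHAR_BIT. */
unsigned type_bits(size_t n_bytes)
{
    return (unsigned)(n_bytes * CHAR_BIT);
}

/* Print how this implementation sizes a few basic types.
 * %u with casts avoids relying on C99's %zu, which older
 * embedded compilers may not support. */
void report_type_sizes(void)
{
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    printf("sizeof(int)  = %u (%u bits)\n",
           (unsigned)sizeof(int), type_bits(sizeof(int)));
    printf("sizeof(long) = %u (%u bits)\n",
           (unsigned)sizeof(long), type_bits(sizeof(long)));
}
```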

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯


Re: Octets with non-8 bit bytes...

Yup.  One more reason to hate C.



Re: Octets with non-8 bit bytes...

Close. It's the smallest addressable unit which will hold a character.

From the standard:

byte
addressable unit of data storage large enough to hold any member of
the basic character set of the execution environment

NOTE 1 It is possible to express the address of each individual byte
of an object uniquely.

NOTE 2 A byte is composed of a contiguous sequence of bits, the
number of which is implementation-defined. The least significant bit
is called the low-order bit; the most significant bit is called the
high-order bit.
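Because the bit count is implementation-defined, code that must handle octets on both kinds of target can pick its storage type at compile time. A minimal sketch, assuming a C99 `<stdint.h>` is available (older DSP toolchains may lack it): `uint8_t` is optional, and when it exists `CHAR_BIT` must be 8, so its absence is a usable signal. The names `octet_t`, `to_octet`, and `octets_per_byte` are made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* uint8_t exists only if the implementation has an exact 8-bit unsigned
 * type with no padding, which forces CHAR_BIT == 8. */
#if defined(UINT8_MAX)
typedef uint8_t octet_t;          /* exact 8-bit type available */
#else
typedef uint_least8_t octet_t;    /* e.g. 16 bits when CHAR_BIT == 16 */
#endif

/* Store a value as an octet: mask explicitly, because octet_t may be
 * wider than 8 bits and would otherwise keep stray high bits. */
octet_t to_octet(unsigned value)
{
    return (octet_t)(value & 0xFFu);
}

/* Whole octets that fit in one C byte: 1 when CHAR_BIT is 8, 2 when 16. */
unsigned octets_per_byte(void)
{
    return CHAR_BIT / 8u;
}
```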

--
Al Balmer
Balmer Consulting
Re: Octets with non-8 bit bytes...

Closer, but still no cigar.  It must be addressable, and must be able
to represent each character distinctly.  But by no means does it
*have* to be the _smallest_ addressable unit fulfilling those
requirements.  E.g. a C translation system targeting a 32-bit x86 PC
yet using 19-bit chars, although obviously a total perversion, is
quite certainly allowed by the C standard.

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: Octets with non-8 bit bytes...

So how large is a byte on such a machine?  I'm not clear if this is 24
or 32 bits.  I guess this would be 32 bits since 24 bits would not be
"directly" addressable.  How is that different from what Alan said?  

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com
Re: Octets with non-8 bit bytes...

Just to stir the pot a little, the PDP-8 was a
12-bit machine, and 12 bits was common usage
for a byte.  The machine addressed data in
12-bit bytes, but it was also common practice to
store and manipulate chars as 6-bit nibbles
packed 2 per byte.  This was back in the days of TTYs,
when the distinction between upper and lower case was irrelevant.
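To make the two-per-word packing concrete, a small illustrative sketch in C with a 12-bit word modeled as an `unsigned` value (octal constants, in keeping with PDP-8 habits). Which half held the first character is an assumption here, and `pack6`/`unpack6` are hypothetical names, not a historical API.

```c
/* Pack two 6-bit character codes into one 12-bit word, first code in the
 * high half (an illustrative choice, not a documented PDP-8 convention). */
unsigned pack6(unsigned first, unsigned second)
{
    return ((first & 077u) << 6) | (second & 077u);   /* 077 octal = six 1-bits */
}

/* Extract character 0 (high half) or character 1 (low half). */
unsigned unpack6(unsigned word, int which)
{
    return which == 0 ? (word >> 6) & 077u : word & 077u;
}
```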




Re: Octets with non-8 bit bytes...


Just to stir the pot a little more...


http://www.catb.org/~esr/jargon/html/B/byte.html
byte: /bi:t/, n.
[techspeak] A unit of memory or data equal to the amount used to
represent one character; on modern architectures this is invariably
8 bits. Some older architectures used byte for quantities of 6, 7,
or (especially) 9 bits, and the PDP-10 supported bytes that were
actually bitfields of 1 to 36 bits! These usages are now obsolete,
killed off by universal adoption of power-of-2 word sizes.

Historical note: The term was coined by Werner Buchholz in 1956
during the early design phase for the IBM Stretch computer; originally
it was described as 1 to 6 bits (typical I/O equipment of the period
used 6-bit chunks of information). The move to an 8-bit byte happened
in late 1956, and this size was later adopted and promulgated as a
standard by the System/360. The word was coined by mutating the word
‘bite’ so it would not be accidentally misspelled as bit.
See also nybble.


http://www.catb.org/~esr/jargon/html/C/chawmp.html
chawmp: n.
[University of Florida] 16 or 18 bits (half of a machine word).
This term was used by FORTH hackers during the late 1970s/early
1980s; it is said to have been archaic then, and may now be
obsolete. It was coined in revolt against the promiscuous use
of ‘word’ for anything between 16 and 32 bits; ‘word’ has an
additional special meaning for FORTH hacks that made the
overloading intolerable. For similar reasons, /gaw´bl/ (spelled
‘gawble’ or possibly ‘gawbul’) was in use as a term for 32 or
48 bits (presumably a full machine word, but our sources are
unclear on this). These terms are more easily understood if
one thinks of them as faithful phonetic spellings of ‘chomp’
and ‘gobble’ pronounced in a Florida or other Southern U.S.
dialect. For general discussion of similar terms, see nybble.


nybble: /nib´l/, nibble, n.
[from v. nibble by analogy with ‘bite’ → ‘byte’] Four bits; one
hex digit; a half-byte. Though ‘byte’ is now techspeak, this
useful relative is still jargon. Compare byte; see also bit.
The more mundane spelling “nibble” is also commonly used.
Apparently the ‘nybble’ spelling is uncommon in Commonwealth
Hackish, as British orthography would suggest the pronunciation
/ni:´bl/.

Following ‘bit’, ‘byte’ and ‘nybble’ there have been quite a
few analogical attempts to construct unambiguous terms for
bit blocks of other sizes. All of these are strictly jargon,
not techspeak, and not very common jargon at that (most
hackers would recognize them in context but not use them
spontaneously). We collect them here for reference together
with the ambiguous techspeak terms ‘word’, ‘half-word’,
‘double word’, and ‘quad’ or quad word; some (indicated)
have substantial information separate entries.

2 bits: crumb, quad, quarter, tayste, tydbit, morsel

4 bits: nybble

5 bits: nickle

10 bits: deckle

16 bits: playte, chawmp (on a 32-bit machine), word (on a 16-bit machine),
         half-word (on a 32-bit machine).

18 bits: chawmp (on a 36-bit machine), half-word (on a 36-bit machine)

32 bits: dynner, gawble (on a 32-bit machine), word (on a 32-bit machine),
         longword (on a 16-bit machine).

36 bits: word (on a 36-bit machine)

48 bits: gawble (under circumstances that remain obscure)

64 bits: double word (on a 32-bit machine), quad (on a 16-bit machine)

128 bits: quad (on a 32-bit machine)

The fundamental motivation for most of these jargon terms (aside from
the normal hackerly enjoyment of punning wordplay) is the extreme
ambiguity of the term word and its derivatives.

Also see:

http://www.catb.org/~esr/jargon/html/P/playte.html
http://www.catb.org/~esr/jargon/html/T/tayste.html
http://www.catb.org/~esr/jargon/html/Q/quarter.html
http://www.catb.org/~esr/jargon/html/B/bit.html
http://www.catb.org/~esr/jargon/
http://www.catb.org/~esr/jargon/jargoogle.html

Comment by Guy Macon: Concerning the statement "on modern
architectures this is invariably 8 bits", in my opinion C has
no resemblance to anything that can reasonably be called "modern."
See [ http://cm.bell-labs.com/cm/cs/who/dmr/chist.html ].


--
Guy Macon, Electronics Engineer & Project Manager for hire.
Remember Doc Brown from the _Back to the Future_ movies? Do you
Re: Octets with non-8 bit bytes...
On Fri, 11 Jun 2004 12:17:15 -0700, Guy Macon
<http://www.guymacon.com> wrote:

[...]
I first used Forth in the late 70's-early 80's (though I've never been
a "Forth Hacker"), and I've never seen the term "chawmp" before.

The term I've always heard used is "cell," which is the size of a
single entry on the data stack, and at least 16 bits wide in ANSI
standard Forth.  A "cell pair" holds "double cell" values.  A
"character" is allowed (but not required) to be narrower than a
"cell."

FWIW, a "word" in Forth is what you might call an "operator" or a
"function" in C.  Actually, it's a little more generic than that.
Almost everything in a Forth program is a word.

Never heard of that one either...

[...]

Shave and a haircut...

Regards,

                               -=Dave
--
Change is inevitable, progress is not.

Re: Octets with non-8 bit bytes...

Would that make 8 bits a dollar?  I've never liked calling 8 bits an
octet; it sounds like an overgrown musical group...

Let's see...   a dollar buys an ASCII char, two dollars gets you signed
numbers from -32768 to 32767, and four dollars can buy... well, you get
the idea.  :)
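For reference, the general rule behind the joke: an n-bit two's complement value spans -2^(n-1) through 2^(n-1) - 1, so 8 bits give -128..127 and 16 bits give -32768..32767. A tiny sketch with made-up function names, using `long long` so that 2^(n-1) stays exact for n up to 32:

```c
/* Range of an n-bit two's complement integer, for 1 <= n <= 32.
 * long long is at least 64 bits, so 1LL << 31 does not overflow. */
long long signed_min(unsigned n)
{
    return -(1LL << (n - 1));
}

long long signed_max(unsigned n)
{
    return (1LL << (n - 1)) - 1;
}
```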

I am building a 50 cent CPU!  Cool, that's what I'll call it,
FiftyCents.  

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com
Re: Octets with non-8 bit bytes...
On 11-Jun-04, rickman said:

Yes, and it comes from the practice of cutting a silver dollar into 8
"bits" for making change.

--
Bill
Posted with XanaNews Version 1.16.3.1

Re: Octets with non-8 bit bytes...
On Fri, 11 Jun 2004 22:23:05 GMT, "William Meyer"

I think it was a Spanish "dollar", though. Pieces of eight.

--
Al Balmer
Balmer Consulting
Re: Octets with non-8 bit bytes...
On 11-Jun-04, Alan Balmer said:

Yes, you're right.

--
Bill
Posted with XanaNews Version 1.16.3.1

Re: Octets with non-8 bit bytes...
On 11-Jun-04, rickman said:

Actually, I was in error, according to this link:

http://www.geocities.com/fifth_grade_tpes/twobits.html

From this it's clear that the usage predates the US silver dollar.

--
Bill
Posted with XanaNews Version 1.16.3.1

Re: Octets with non-8 bit bytes...
Guy Macon <http://www.guymacon.com> wrote in message

Does it matter if they are made of wood?
 

12 bits?
 

What is it called on an 18-bit machine?

20 bits?

21 bits?

24 bits?
 

Best Wishes

Re: Octets with non-8 bit bytes...
14 bits: fortbit


31 bits: month


52 bits: deck


76 bits: trombone

80 bits: PhileasFogg

144 bits: dergrossbit

365 bits: year
366 bits: leapyear

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com
Re: Octets with non-8 bit bytes...
[...]

Neither.  It's 19 bits.  By definition of "byte" in the C standard,
one 'byte' is whatever the size of type char is.  And, perverse as it
may be, CHAR_BIT == 19 is allowed.  Even if the native addressing unit
of the processor is, say, 11 bits ;-).  Now, don't get me wrong, no
compiler writer in a remotely sane state of mind would actually do
that, but it's their customers and their own mental health that
dictate that, not the definition of C.

Read the fine print on pointer arithmetic in the C standard with such
an implementation in mind, and many of the seemingly crazy clauses and
restrictions will suddenly begin to make sense...

> I guess this would be 32 bits since 24 bits would not be "directly" addressable.

You're not getting my point.  Which is that the C standard only
demands that a byte is directly addressable, but not that everything
directly addressable by the hardware must be a byte (by C's definition
of the term) of its own.  Otherwise, on 8051s a byte would have to be
1 bit wide, because they can address single bits.

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: Octets with non-8 bit bytes...
On 11 Jun 2004 23:18:08 GMT, Hans-Bernhard Broeker

In fact CHAR_BIT == 21 would even make a lot of sense.

The Unicode Scalar Value (USV) is from 0x0..0x10FFFF, thus fitting
nicely into 21 bits.
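The arithmetic behind the 21-bit figure, as a small sketch (`bits_needed` is an illustrative name): the largest scalar value 0x10FFFF is 2^20 + 0xFFFF, so its highest set bit is bit 20 and it needs 21 bits, while 20 bits would stop at 0xFFFFF.

```c
/* Minimum number of bits needed to represent v (returns 0 for v == 0). */
unsigned bits_needed(unsigned long v)
{
    unsigned n = 0;

    while (v != 0) {
        v >>= 1;    /* drop the lowest bit until nothing is left */
        n++;
    }
    return n;
}
```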

Paul


Re: Octets with non-8 bit bytes...
You're right, I'm not getting your point.  If a byte must be directly
addressable and be able to hold a character, then it would have to be
either 24 or 32 bits on this hardware since 19 bits is not directly
addressable.  

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com