Squeezing bytes out of a struct

Hi,

[I'll post this here instead of c.l.c as I suspect folks here are far more adept at "squeezing bytes" out of things (vs. the "desktop jocks")]

While arguably not legitimate, the following is in the *spirit* of the language (consider much of this pseudo-code):

typedef struct {
    size_t length;
    int foo;
    char bar;
    element_t array;
} fussbucket_t;

fussbucket_t sample;

sample.array[0] = (element_t) foo;
sample.array[1] = (element_t) bar;

ASSERT(sample.array == sample.array[0]);

etc.

To accommodate cases where sizeof(array) is "too large" (whatever that means) to dangle off the bussbucket_t, you could optionally store a pointer to the array in place of array[0]. E.g.,

typedef struct {
    size_t length;
    int foo;
    char bar;
    union {
        element_t array;
        element_t *pointer;
    } u;
} fussbucket_t;

fussbucket_t sample;
element_t longArray[A_BIG_NUMBER];

sample.u.array[0] = &longArray[0];

If sizeof(element_t) is smaller than sizeof(element_t *), then you can cheat and pack a few element_t's into the union. I.e., as if:

struct {
    size_t length;
    int foo;
    char bar;
    union {
        element_t array[sizeof(element_t *) / sizeof(element_t)];
        element_t *pointer;
    } u;
}

I *think* this may be portable -- I haven't even checked the syntax to see if I've shot myself in the foot before I even got started! :< Is there a cleaner way of doing this? I can already see some pathological cases where it would *look* right and yet result in entities that would choke at run-time.

Thx!

--don

Reply to
D Yuniskis
[Grrr... should probably have run this through cc before posting :< ]

sample.array[0] = (element_t) fu;
sample.array[1] = (element_t) baz;

(eliminate any confusion with struct members)

s/bussbucket_t/fussbucket_t/

sample.u.pointer = &longArray[0];

[sorry!]
Reply to
D Yuniskis

It's probably fair to say that the c.l.c. crowd are more concerned with what is theoretically correct in C, while in c.a.e. we are more concerned with practical working code.

Your code is inarguably not legitimate, and is not in the "spirit" of any language I know of.

It would help enormously if you could give a short description of what you are trying to do - it's very hard to guess from this post. Your second "correction" post doesn't really help either - it is still mangled code, and mixing in partial quotations from your first post with partial corrections and smart-ass "sed" notation makes it even less clear.

Figure out what you are trying to do, and explain it in a post. Then someone will probably be able to help you.

mvh.,

David

Reply to
David Brown

The best I could make of it is that the op wanted something like:

typedef struct {
    size_t length;
    int foo;
    char bar;
    element_t array[1];
} fussbucket_t;

And then use:

fussbucket_t *sample = (fussbucket_t *) malloc(sizeof(fussbucket_t)
                                               + sizeof(element_t) * (n - 1));
sample->length = n;
sample->array[0] = (element_t) foo;
sample->array[1] = (element_t) bar;
    .
    .

up to some limit determined by sample->length.

But I think that is standard C and discussed in the standard.

So that left me more certain I don't understand.

Oh, well.

Jon

Reply to
Jon Kirwan

Agreed. Not only working but also maintainable. The sample code posted was anything but that.

-- Chris Burrows CFB Software


Reply to
Chris Burrows

You can only index array types and pointer types. What is the type of element_t?

That is true if array is a multidimensional array, but not, in general, otherwise.

That works if sizeof(element_t*) >= sizeof(element_t). Of course, that is true only for small types. In the general case, if that is not true, you have declared an array of zero elements, which isn't allowed in that part of the declaration.

You could make it work by using a max() macro:

element_t array[max(sizeof(element_t), sizeof(element_t *)) / sizeof(element_t)];

which guarantees at least one item in array. Unfortunately, you cannot use the standard preprocessor for selecting different code, depending on which is greater.
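
A minimal sketch of what that declaration might look like in context (untested, and assuming element_t is typedef'd elsewhere; the conditional in max() is still a constant expression, so it is usable as an array dimension):

#define max(a, b)  (((a) > (b)) ? (a) : (b))

typedef struct {
    size_t length;
    int foo;
    char bar;
    union {
        element_t *pointer;
        element_t array[max(sizeof(element_t), sizeof(element_t *))
                        / sizeof(element_t)];
    } u;
} fussbucket_t;

Whichever type is larger, the dimension works out to at least 1.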

--
Thad
Reply to
Thad Smith

I also thought it was something like that he meant. But then it looked like he wanted to have space for a few element_ts within the original struct, or alternatively a pointer to an external array of element_ts, and that he hoped to have the same simple syntax for accessing everything.

As you say, arrays of length 1 used as above are legal in standard C. With C99, you can use a flexible array (i.e., "element_t array[];"), but I'm not sure if sizeof(fussbucket_t) is then valid (it should work as desired in gcc, but is not necessarily standard). But it saves space if the array is sometimes empty, and makes the malloc size calculation slightly neater.

With gcc, you can also use "element_t array[0];" for pretty much the same effect in this case.
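
For illustration only, a C99 flexible-array sketch of that idea (the member names and the malloc arithmetic are just illustrative, not anyone's actual code):

typedef struct {
    size_t length;
    int foo;
    char bar;
    element_t array[];          /* C99 flexible array member */
} fussbucket_t;

/* allocate room for n trailing elements */
fussbucket_t *p = malloc(sizeof(fussbucket_t) + n * sizeof(element_t));
p->length = n;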

Reply to
David Brown

Yes, this was intended just to illustrate dangling the balance of the "array" off the end of the struct. See the later example for better syntax...

Again, it is intended to state that the 0th element of the array resides within the struct. How else could I express it (array[0] == array[0] is too obvious :> )

Yes, as I said: "If sizeof(element_t) is smaller than sizeof(element_t *), ..." it also works if sizeof(element_t *) == sizeof(element_t) but that degenerates to the original example.

Exactly. If element_t is "big" (relatively speaking), then there are no economies in this approach. But, in *practice*, this lets me pack as many as eight in.

Aside from the syntactic complexity, there is no real cost to this approach (though on some older compilers the extra addressing could benefit from a good peephole optimizer).

On more modern platforms, there was an unexpected and notable performance *benefit* (besides space economies). I suspect this may be a side-effect of references to the struct's members priming the cache (exploiting locality of reference between the "element_t's" and the other struct members that are associated with them).

The risk only happens when sizeof(element_t) > sizeof(element_t *). This is something that the developer has to safeguard against as there is no way that I can "#error" at compile time. What I have done is added a tiny executable to my makefile that essentially compares the sizeof's, prints an error message and returns a nonzero result code that causes the make(1) to fail with the message that it displays.

I can think of no other way to enforce this requirement.
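
Roughly along these lines (a sketch only -- the file name, header, and message are made up):

/* testConfiguration.c -- run by the makefile before the real build;
 * a nonzero exit status makes make(1) stop with the message shown. */
#include <stdio.h>
#include <stdlib.h>
#include "element.h"    /* hypothetical header that defines element_t */

int main(void)
{
    if (sizeof(element_t) > sizeof(element_t *)) {
        fprintf(stderr,
                "element_t (%lu bytes) won't fit in an element_t * (%lu bytes)\n",
                (unsigned long) sizeof(element_t),
                (unsigned long) sizeof(element_t *));
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}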

This is only legal in newer C (C99?). E.g., you couldn't put such a declaration in file scope... (?)

Exactly. An unfortunate consequence of sizeof not being valid in the preprocessor (though I have found many compilers that *will* recognize its use there).

There is also a hazard that can bite a sloppy coder if sizeof(element_t *) mod sizeof(element_t) != 0. But, you can't protect against incompetence :-) (I guess I could provide a macro to simplify accessing the array elements while avoiding this hazard).

Reply to
D Yuniskis

Perhaps you could suggest how this could be made *more* maintainable? (keeping in mind that you don't want to make it *brittle* in the process!)

The following should compile and "work". I can't imagine how to change it to make it more "maintainable" :-(

=====================================
typedef struct {
    size_t length;
    int fu;
    char baz;
    union {
        element_t *pointer;
        element_t array[sizeof(element_t *) / sizeof(element_t)];
    } u;
} fussbucket_t;

#define FREEBIES (sizeof(element_t *) / sizeof(element_t))

int main(int argc, char *argv[])
{
    fussbucket_t fuss, *fptr;
    element_t elem, longArray[(CNT)];
    int count;

    ASSERT((CNT) > 0);
    ASSERT((NUM) >= 0);
    ASSERT(sizeof(element_t *) >= sizeof(element_t));   /* FREEBIES > 0 */

    /* trivial case */
    fuss.u.array[0] = elem;             /* guaranteed to fit */

    /* optimization */
    for (count = 0; count < (FREEBIES); count++) {
        fuss.u.array[count] = elem;     /* guaranteed to fit */
    }

    /* dangling elements */
    fptr = (fussbucket_t *) malloc(sizeof(fussbucket_t)
                                   + ((NUM) * sizeof(element_t)));
    ASSERT(fptr != NULL);

    for (count = 0; count < (FREEBIES) + (NUM); count++) {
        fptr->u.array[count] = elem;    /* guaranteed to fit */
    }
    free(fptr);

    /* external storage */
    fuss.u.pointer = &longArray[0];
    for (count = 0; count < (CNT); count++) {
        *fuss.u.pointer++ = elem;       /* guaranteed to fit */
    }

    return 0;
}

Reply to
D Yuniskis

A big thing that makes code maintainable is good comments. Describe what each variable/structure/type is used for. Describe input & output parameters. Include a description of each function/procedure's purpose, assumptions, and limitations.

Reply to
Mike Paff

You can't do this directly with an #error (though C++0x should include static asserts, and it will probably turn up in C compilers too).

However, you can "cheat" to get static asserts by using declarations like this:

extern char sizeCheck[ (int) sizeof(element_t *) - (int) sizeof(element_t)];

If sizeof(element_t) is greater than that of a pointer, this is a declaration of an extern array of negative size, giving a compile-time error. (Without the (int) casts, this would be calculated with unsigned arithmetic and just declare a very big array.)

I believe that with a "standard" max macro, it is legal for pretty much any C.

#define max(a, b) (((a) > (b)) ? (a) : (b))

extern char modCheck[!(sizeof(element_t *) % sizeof(element_t)) - 1];

Reply to
David Brown

Some more information is required first. AFAICS you have proposed a solution without clearly describing the problem domain that it is attempting to address.

What exactly are you trying to achieve?

What alternative solutions have you rejected and why?

Have some particular real-world circumstances driven you to devise such an obscure approach or is this just an intellectual exercise?

-- Chris Burrows CFB Software Armaide: ARM Integrated Development System


Reply to
Chris Burrows

I was just about to ask the same thing.

Jon

Reply to
Jon Kirwan

I try to keep everything compatible with C89 (!). It helps when/if I have to back port something to an older design (sometimes for processors that are no longer being produced; or, for toolchains that are no longer maintained). Even using "//" comments leaves me nervous (though I can clean them up with a sed(1) script, if need be)

Ha! That is clever! I'd have to surround it with enough commentary to guarantee someone didn't just "fix" this by changing the array dimension.

I think I will still keep my little "testConfiguration" executable as it can be more exhaustive in checking the various other conditions that apply to each library/application...

Grrrr... yes. Brain fart. I was thinking of "sizeof()" as being a function invocation...

Yes, as above. Though I still think a macro might be more appropriate, here, as it cleans up the syntax a bit.

Reply to
D Yuniskis

Of course! Though folks often don't create *good* comments! As a result, you often spend (maintenance) time "debugging the comments" instead of the code! You then end up with code that has changed but commentary that hasn't kept up with those changes.

How is my code any different from (apologies for any typos as I just quickly removed the struct syntax):
====================================

#define FREEBIES (sizeof(element_t *) / sizeof(element_t))

element_t array[sizeof(element_t *) / sizeof(element_t)];

int main(int argc, char *argv[])
{
    element_t fuss, *fptr, *pointer;
    element_t elem, longArray[(CNT)];
    int count;

    ASSERT((CNT) > 0);
    ASSERT((NUM) >= 0);
    ASSERT(sizeof(element_t *) >= sizeof(element_t));   /* FREEBIES > 0 */

    /* trivial case */
    array[0] = elem;                    /* guaranteed to fit */

    /* optimization */
    for (count = 0; count < (FREEBIES); count++) {
        array[count] = elem;            /* guaranteed to fit */
    }

    /* dangling elements */
    fptr = (element_t *) malloc(((FREEBIES) + (NUM)) * sizeof(element_t));
    ASSERT(fptr != NULL);

    for (count = 0; count < (FREEBIES) + (NUM); count++) {
        fptr[count] = elem;             /* guaranteed to fit */
    }
    free(fptr);

    /* external storage */
    pointer = &longArray[0];
    for (count = 0; count < (CNT); count++) {
        *pointer++ = elem;              /* guaranteed to fit */
    }

    return 0;
}
============================
I.e., if I had posted this, would you complain that it was not maintainable? (note the comments are intended for *this* presentation -- as a code sample, not as "production code")

Is the following more to your standards? (It won't compile since it is just a *snippet* from a speech synthesizer I wrote) Apologies for the long lines and embedded tabs... I also had to selectively remove words from comments to try to reduce the number of lines that wrapped :<

It's amusing that the sources for this application consume just about half a megabyte -- yet the binary is only about 20K (most of that being tables -- because they are *easy* to document and maintain :> )

=========================================
// Note RIGHT and LEFT context processing is virtually identical. An
// obvious difference is the direction in which the ``input'' pointer is
// advanced. This also affects limit checking -- since the ``end''
// of the input string in a LEFT context parse is to the left of the
// point at which the parse was begun.
//
// This is a recursive algorithm. However, the depth of the recursion
// is controlled entirely by the ``template'' provided. The ``input''
// being examined has no effect on the level of stack penetration. So,
// we can control the depth of recursion at compile-time without fear
// of unbounded run-time resource requirements.
bool
context(
    const char *input,          // -> text being examined
    const char *const end,      // -> last character to examine
    const char **template,      // -> templates to match against
    const int direction         // parse direction {LEFT,RIGHT}
    )
{
    const char *pattern;

    // contractual obligations
    ASSERT(template != NULL);
    ASSERT(*template != NULL);

    ASSERT(input != NULL);

    ASSERT(end != NULL);
    // if (direction == RIGHT)
    //     ASSERT(end >= input);
    // else
    //     ASSERT(end

Reply to
D Yuniskis

What does it matter whether I am designing a surgical robot vs. an HVAC controller? I, with my knowledge of the application and my design approach to it, have decided that I need to package "things" in structs. Think of them as objects (without a genuine OOPL).

But, obviously, at least one member of that struct has the potential for being "large" and I don't want to penalize all instances of that struct (i.e., object class) with the overhead of being able to completely contain that member.

On the other hand, it is not fair to penalize all the *other* instances of that struct with the overhead of storing a pointer just to handle those instances where that "member" must be stored external to the struct. E.g.,

typedef struct {
    size_t length;
    int fu;
    char baz;
    element_t *pointer;
    element_t array[SOME_NUMBER];
} fussbucket_t;

is wasteful of the storage set aside for pointer (array) in those struct instances where the member is actually in the array (pointer).

So, a union makes perfect sense (as memory isn't without cost in a limited resource environment).

But, putting a simple one-element union in potentially wastes space when the pointer is NOT used -- since an element_t can be smaller than an element_t * (e.g., consider element_t being a char on a 64-bit machine). Hence packing several element_t's into the space the pointer would otherwise occupy.

There are lots of other subtle advantages to this approach that manifest themselves in *some* applications (but not others). But, I don't really see any downside to using this solution other than the fact that the syntax gets a bit more sugary (this isn't a real issue as I use lots of other "hairy syntax" quite freely that could just as easily intimidate a maintainer -- "hire competent people")

See above.

See above. Also see the "other" code sample I posted in response to Mike Paff's comment (different problem entirely)

I don't get paid to "pontificate" :>

This is c.a.e; as with most problems, it all boils down to resource constraints: time, space and money! This approach eases the memory requirement substantially, offers a small performance increase ("for free") on some processors and, as a result of both of the above, saves *money* (in DM+DL).

Reply to
D Yuniskis

Do you also keep copies of your source code on 3.5" floppies, just to be compatible with ten year old computers?

Write your code to a portability level that /realistically/ suits your work. If you are writing code aimed at processors and tools from this century, take advantage of the language and tools currently available. It may be that you one day have to backport your code to work with an ancient development system - but you do that work when that day comes, rather than hobbling your current work.

Never use comments for something that can be expressed in the language - too many comments in code is at least as bad as too few. As you mention below, this is best wrapped in a macro.

#define staticassert(name, claim) \
    extern char staticassert_##name[ -!(claim) ] \
    __attribute__((unused));

(Note the use of a gcc extension, because it aids in getting better static code checking. If I were to use this on a non-gcc compiler, I'd just define __attribute__() as an empty macro.)
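
i.e., something like (just a sketch; which guard macro to test depends on the toolchain):

/* For compilers without gcc-style attributes, make the attribute vanish. */
#ifndef __GNUC__
#define __attribute__(x)    /* nothing */
#endif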

Put in comments at the definition of staticassert, of course. But use it as:

staticassert(checkElementSize, sizeof(element_t *) >= sizeof(element_t));

There is no need to comment such a line.

Reply to
David Brown

No. Everything gets moved to a RAID5 array. But, I keep a wide variety of drives/media so that I *can* support whatever media a client brings to me or on which they want their work delivered. This includes 8"/5"/3.5" floppies, several different (physical) size/format MO, MD, CD, DVD, ORB, ZIP, various tape (incl 1/2" 9-track), etc. I *did* misplace my 8" HARD sectored floppy but haven't wasted any time trying to find a replacement :> I can even support PPT -- though only for a short while as I need to find a new home for the ASR-33 :-/

That is *exactly* what I am doing! :> I don't force clients to do things the way *I* want but, rather, adapt to their requirements. "They're the Boss".

I work in lots of different industries and for companies of varying sizes. If a firm has invested $20K in tools, do I tell them "No, I want to use these *other* tools, instead? And, if you want to be able to maintain what I have done, you should buy a copy for yourself, too?" Tail/dog/wag.

In some industries, products are highly regulated -- including the *process*. Using something that "suits me" might cost the client a fair bit of time/effort just to get that toolchain "certified".

Over the years (decades?) I have found that the safest approach is to follow the greatest common denominator in a particular language. The more specific your choice, the less flexible.

If you work for *one* employer or in *one* industry, you can just adopt whatever is en vogue, there, and never need to worry. :> I like the variety of challenges that I am exposed to so I adapt my style to be most receptive to these.

Probably 3/4 of the products that I have designed were in the *last* century. Many of them are still in production (I know of two that are 20 and 30 years old, respectively -- I have no idea how they are even buying *parts* for these things!). The fact that I can make a change to them in a man-week keeps them viable. If I had to tell the clients to redesign their products so that I could use newer tools and language constructs, I'd get a blank stare: "You want us to do *what*??"

The products that I am designing for deployment years *hence* don't suffer from my design choices. I don't deprive myself of the ability to use long longs, for example. Or, restrict myself to using only integer arithmetic (as some processors don't support hardware assist), or restrict myself to use only floating point (as some processors *prefer* floating point arithmetic!).

I write in a very flexible way (hence "element_t" and not "long", "long long", "char", etc.) and adapt the code AT COMPILE TIME to my needs. E.g., my BigDecimal library looks at the capabilities of the target machine and picks data types to optimize for speed or size (based on a single configuration switch). Sure, I could make parts of it run significantly faster by dropping into ASM in key places (e.g., handling carries is a pig), but what it buys me doesn't make up for what it *costs* me (limiting me to a particular processor/assembler/compiler/debugger)
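
The flavor of that compile-time adaptation, purely as an illustration (OPTIMIZE_FOR_SPEED and digit_t are made-up names, not from the actual library):

#include <limits.h>

/* pick the working "digit" type at compile time from one switch */
#if defined(OPTIMIZE_FOR_SPEED) && (UINT_MAX >= 0xFFFFFFFFUL)
typedef unsigned int  digit_t;   /* wide digits: fewer loop iterations */
#else
typedef unsigned char digit_t;   /* narrow digits: smaller tables/RAM */
#endif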

How does anything I have proposed here hobble my current work? If I *need* something that is only available in "X", then there is no choice. But, adopting "what's en vogue today" just because its "today" is ludicrous. Should I start convincing clients to write everything in C++, Java? Python? Should I go on a religious crusade to "reeducate" them on the approaches and tools that are "hot" today? Should I point to studies touting one design methodology over another and project cost savings that they might attain?

Do *you* do this with your employer(s)? How successful have you been at it? Is this a part of your job that you *enjoy*? ;-)

You're assuming someone will understand *why* this is the case. You will note that the code I have posted here has not changed substantially from my original post. Yet folks (yourself included) seemed to have a hard time wrapping their heads around that original! (devoid of comments) Despite the fact that the concepts were relatively straightforward (in hindsight).

Yet, you are expecting someone to see a compiler error ("array index out of bounds", etc.) emitted from a make(1) and be familiar enough with the code to understand *why* it happened?? Without supporting commentary?

Exactly. The comments that you include should pertain to what you are doing and how *generally* you are doing it and "why". Instead, people seem to spend much of their time describing what the code already describes. It is akin to: IAC ;add one to accumulator (d'uh...) in the i4004 days. :-/

I don't use gcc extensions. Nor MS extensions. Nor extensions to the language. The code I posted earlier will run (i.e., be deployed) on three different processors in three different libraries (using three different compilers). Why should I litter the code with all sorts of "#ifdef GCC", etc. when this doesn't buy me anything in *ALL* OF THE ENVIRONMENTS IN WHICH I'M BUILDING? (i.e., if it is "necessary" in one place, then why isn't it necessary in the others? Conversely, if it ISN'T necessary in those other places, then why is it in the *one* place?) Every time I have to fix a desktop application that has been "widely ported", I cringe at sorting through all the #ifdef's, etc. It just clutters things up.

[granted, most/many of these things have effectively been back-ported so it would be a lot to expect someone to rewrite them from scratch just to make the code "prettier". It is far more practical for me just to run the sources through cpp(1) and then remove superfluous contiguous blank lines (with another smart-ass sed script :> ).]

Note that this isn't intended as an indictment of "your" way of doing things. Rather, it's just an explanation of why *I* do things "my" way. "Because it is right for *me*" :>

End of rant. :> I've got to go cover the citrus trees...

--don

Reply to
D Yuniskis

Since such a large proportion of your work involves maintenance and revisions of old systems, then I fully understand your need to work with older standards.

My point was just that it makes sense to take advantage of more modern languages and tools if these are suitable for a major part of your work - you don't have to stick to the lowest common denominator. For example, I have some systems that were developed with a 15 year old C compiler. I need to be sure that I can modify and re-compile the code if there are changes, but when I write code for current systems, I don't make any effort to be backwards compatible with the 15 year old tools. I think many developers are in a similar situation - clearly, you are not, since you need to share code between many different older systems.

There have been many improvements in the C language since C89, and there have been many improvements in the tools since those days. And people who work with only a few different compilers can take advantage of tool-specific extensions. An example would be the use of "long long" (since you mentioned it above) - if you need 64-bit integers in your software, with modern tools you just use an "int64_t", rather than having to write your own library. (Even once the library is written, you still have to maintain it and possibly port it, and its usage will cost you in code space and run time.)

However, for your work, compatibility with existing systems is more important than getting the benefits of newer tools. It's all a matter of balance.

That's taking it to extremes. You might be being sarcastic (I must apologise for my somewhat sarcastic tone earlier), but some people do actually think that way. I am a fan of weighing up the pros and cons of language and tool choices, to get a reasonable balance and the right tool for the job. I would not consider something like Java for a small embedded system (and certainly not moving a working software base to a different language), but neither would I consider C for a large piece of PC software.

I /do/ use different languages, and different variants or versions of languages, and different tools at my job. And yes, it /is/ a part I enjoy - I like the variation. But I don't pick different tools or languages just for fun. Code that can be reused between projects has to be fairly portable between tools, while code that is more specific for a particular target can be written in whatever way works best at the time.

I don't mean to say you don't need comments or shouldn't use them, just that you should not use a comment if you can say the same thing in the language. For example, if a good choice of name (for the macro, function, type, whatever) can make the code clear, then it is better than using a poorer name choice and a comment.

The problem was that your code had significant errors in it - it was far from clear what you were trying to do. After your other posts, it made more sense - as you say, the concepts are not that hard.

I expect two types of comment here. One is some commentary on the definition of the macro (in a common header file), saying what it does and how it is used, and also noting what error messages are given when the check fails. When the macro is actually used, there may or may not be comments depending on how clear you feel it is.

All I am saying is that unnecessary comments are a bad idea. They detract from the flow of the code, and can make it harder to see what is actually going on. If the code does not need comments, then they should not be there.

In the case of checks like this, you probably expect that the check will succeed unless someone uses the code without understanding it. It's /okay/ if a hard-to-understand error message turns up in this case. The programmer then has to look up the definition of the macro (with its comments) to see what is going wrong. It might take him more time than it would if there were comments beside the macro call explaining about the "array size negative" error messages - but that would detract from the legibility of the code. It is better that the extra effort is needed in the rare case of it going wrong, rather than the common case of things going right.

I think we agree on most points here, it's probably just my posts were not very clear.

You typically aim to put the #ifdefs hidden away in a single place in a header file, rather than scattered through your code. I agree that #ifdefs in the main code are to be avoided where possible.

But why do I use such an extension? Well, if I compile code using the macro (at function scope) with gcc and most of its warnings enabled, then I get a warning about an unused name. This is a useful warning - it often indicates a mistake in the code. I don't want a warning here, because I like my code to compile without warnings - but I don't want to disable the warnings for unused variables in general. If the same code has to be compiled with compilers that don't support gcc __attribute__ (some non-gcc compilers also support it), then a simple #define is used to remove the attribute.

This comes back to using the features available. Using __attribute__ in gcc often lets the compiler give you better warnings, or build more optimal code. Thus I use it in my code. And if I'm compiling the same code with another compiler, the attributes will be ignored. Writing portable code does not mean just limiting yourself to the least common denominator of your tools - it means getting a balance between a common subset for all targets, and the use of the preprocessor to get better object code matched for individual targets. Where that balance lies will vary according to the developer's needs, of course - my priorities are not the same as your priorities.

sed scripts are just another form of preprocessor (or postprocessor). Such scripts can be very useful.

Ditto!

mvh.,

David

Reply to
David Brown
