Using C preproc to uniquely identify structure members

- John Speth

Sat, Apr 10, 2021 5:51 PM

I could use some advice for a problem that I've had on my mind for years. I'm looking for a black box function that will examine a memory block of a known structure and output a list of IDs that point to any members that have changed. I have a strong feeling that this problem has been solved by many people before me but I've never seen any solution. I'm looking for an elegant solution that requires little to no source code changes when a structure member is changed, added, or removed. Please follow my problem description below to get an idea of what I'm looking for.

I have a largish C structure containing data members of various types and sizes. The structure definition will probably have to change over the course of project development. Data packaged in the structure will arrive at some processing function. The processing function will always keep a copy of the previously processed structure so that a compare operation can be used to identify any structure members that have changed. There is such a large number of members that my quest is to automate as much as possible the ID and comparison operations. Not automating would require the programmer to redo the comparison which would be a giant if/elseif/else statement with customized comparison for each structure member. That would invite mistakes if not done accurately.

Below is what I've hacked out so far using the C preprocessor. It quickly runs into various problems that make the task seem like too much work for the preprocessor (for example, it won't work for c[3]). Maybe a code generation step using perl or python would work better. I thought I'd check with the experts before working on it further.

#define MEMBER(t,v) t v; size_t size_ ## v; size_t id_ ## v

typedef struct {
    MEMBER(int,i);
    MEMBER(char,c);
    MEMBER(float,f);
} STRUCT;

STRUCT s;

s.size_i = sizeof(s.i);
s.id_i = offsetof(STRUCT,i);
s.size_c = sizeof(s.c);
s.id_c = offsetof(STRUCT,c);
s.size_f = sizeof(s.f);
s.id_f = offsetof(STRUCT,f);

/* size_t values need %zu, not %d */
printf("I: Size = %zu, ID = %zu\n",s.size_i,s.id_i);
printf("C: Size = %zu, ID = %zu\n",s.size_c,s.id_c);
printf("F: Size = %zu, ID = %zu\n",s.size_f,s.id_f);

Thanks - JJS

- Hans-Bernhard Bröker

Sat, Apr 10, 2021 9:16 PM

On 10.04.2021 at 19:51, John Speth wrote:

The key flaw in that plan, as executed so far, is that you're trying to use a native C struct for this job. Arbitrary data of varying amount and composition doesn't live comfortably in such a rigidly structured format. Among other things, forget about ever loading a piece of memory stored by a program referring to one definition of that struct and having it interpreted sensibly by a program referring to a changed definition. C structs are not at all meant for such work.

This is a job for a low-level type of database, or at the very least something like a tagged record-based data file.

Automate what: the work of doing the actual comparison, or the work of creating/updating the code that does the comparison?

I don't see a re-do, just an update to the existing comparison.

This design also violates an important principle: do not mix constant and variable data in the same structure. This is even more important in smallish, bare-metal embedded work, where you want the constants to actually be in (flash) PROM, not waste space in RAM.

Worse than that, they wouldn't even contribute to the solution of the actual problem. If these data remain strictly inside a realm of executable code using the exact same definition of that struct (on binary compatible platforms), you don't need the extra fields for anything. If they can cross the boundary of that realm, having those elements doesn't help with the actual problem --- you then need a proper interface specification, which will _not_ be a C struct definition, for the transfer of those data, and serialization/deserialization functions to handle the translation between internal and external formats.
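To make the serialization/deserialization point concrete, here is a minimal sketch. The JOB struct, the 6-byte little-endian external layout, and the function names are all assumptions made up for illustration; a real project would take the layout from its interface specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal representation; padding and byte order are the compiler's business. */
typedef struct {
    uint32_t job_id;
    int16_t  slices;
} JOB;

/* External format (an assumption for this sketch): 6 bytes, little-endian, no padding. */
enum { JOB_WIRE_SIZE = 6 };

/* Write the external form into out[]; returns bytes written, or 0 if cap is too small. */
static size_t job_serialize(const JOB *job, uint8_t *out, size_t cap)
{
    uint16_t slices;

    assert(job != NULL && out != NULL);
    if (cap < JOB_WIRE_SIZE)
        return 0;

    out[0] = (uint8_t)(job->job_id);
    out[1] = (uint8_t)(job->job_id >> 8);
    out[2] = (uint8_t)(job->job_id >> 16);
    out[3] = (uint8_t)(job->job_id >> 24);

    slices = (uint16_t)job->slices;
    out[4] = (uint8_t)(slices);
    out[5] = (uint8_t)(slices >> 8);
    return JOB_WIRE_SIZE;
}

/* Rebuild the internal form from in[]; returns 1 on success, 0 if the input is too short. */
static int job_deserialize(JOB *job, const uint8_t *in, size_t len)
{
    uint16_t slices;

    assert(job != NULL && in != NULL);
    if (len < JOB_WIRE_SIZE)
        return 0;

    job->job_id = (uint32_t)in[0]
                | ((uint32_t)in[1] << 8)
                | ((uint32_t)in[2] << 16)
                | ((uint32_t)in[3] << 24);

    slices = (uint16_t)((unsigned)in[4] | ((unsigned)in[5] << 8));
    job->slices = (int16_t)slices;  /* relies on the usual two's complement conversion */
    return 1;
}
```

Because the byte layout is written out explicitly, changing the in-memory struct (reordering members, different padding, a different compiler) does not change what crosses the boundary.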

- George Neuner

Sat, Apr 10, 2021 10:12 PM

On Sat, 10 Apr 2021 10:51:18 -0700, John Speth wrote:

While I agree with Don and Hans-Bernhard re: maintainability, I did something similar in one of my projects.

In my case, I was translating messages between machine format (C structs) and readable text format (to be sent via TCP or logged to a file, depending).

I used a table-driven solution shown below. Note this is just one example - there were hundreds of such message structures in the application, and keeping everything consistent was a chore. The C structs were used by many tasks, but only one needed to translate them.

YMMV, George

*****************************************

typedef struct
{
    DWORD      jobId;
    float      position;        /* position for _first_ exposure */
    int        slices;
    long       H1exposureTime;  /* ms total */
    long       H2exposureTime;  /* ms total for all - this may be a guesstimate */
    int        H2s;             /* number of H2s */
    char*      magazineId;
    RECTANGLE  border;          /* image border */
    char*      patientName;
    char*      printDate;
} MSG_JOB_PARAMETERS;

*****************************************

typedef struct
{
    BOOLEAN      use;
    char*        text;
    t_Parameter  type;
    int          offset;
} MESSAGE_FORMAT;

static MESSAGE_FORMAT JobParametersFormat[] =
{
    { 1, "[EXPJ]"                  , PARAM_NONE   , 0 },
    { 1, "%JobId="                 , PARAM_INTEGER, offsetof( MSG_JOB_PARAMETERS, jobId ) },
    { 1, "%StartPosition="         , PARAM_FLOAT  , offsetof( MSG_JOB_PARAMETERS, position ) },
    { 1, "%NumberOfSlabs="         , PARAM_INTEGER, offsetof( MSG_JOB_PARAMETERS, slices ) },
    { 1, "%TotalH1ExposeTime="     , PARAM_INTEGER, offsetof( MSG_JOB_PARAMETERS, H1exposureTime ) },
    { 1, "%NumberOfH2s="           , PARAM_INTEGER, offsetof( MSG_JOB_PARAMETERS, H2s ) },
    { 1, "%TotalH2ExposeTimeGuess=", PARAM_INTEGER, offsetof( MSG_JOB_PARAMETERS, H2exposureTime ) },
    { 1, "%MagazineId="            , PARAM_STRING , offsetof( MSG_JOB_PARAMETERS, magazineId ) },
    { 1, "%HotspotX1="             , PARAM_INTEGER, offsetof( MSG_JOB_PARAMETERS, border.left ) },
    { 1, "%HotspotY1="             , PARAM_INTEGER, offsetof( MSG_JOB_PARAMETERS, border.top ) },
    { 1, "%HotspotX2="             , PARAM_INTEGER, offsetof( MSG_JOB_PARAMETERS, border.right ) },
    { 1, "%HotspotY2="             , PARAM_INTEGER, offsetof( MSG_JOB_PARAMETERS, border.bottom ) },
    { 1, "%PatientName="           , PARAM_STRING , offsetof( MSG_JOB_PARAMETERS, patientName ) },
    { 1, "%PrintDateTime="         , PARAM_STRING , offsetof( MSG_JOB_PARAMETERS, printDate ) },
    { 1, "[]"                      , PARAM_NONE   , 0 },
    { 0, NULL                      , PARAM_NONE   , 0 }
};

static BOOLEAN LM_Encode( LM_ConnectionInfo* connection, BYTE* msg, MESSAGE_FORMAT* format )
{
    char buffer[64];
    int retcode;
    int index;
    int length;
    int i;

    int* iptr;
    float* fptr;
    char** sptr;

    retcode = LM_OK;

    index = 0;
    for ( i = 0; format->use == TRUE; ++i )
    {
        if ( i == 1 )
        {
            /* extra line inserted after the first table entry; does not consume an entry */
            length = sprintf( buffer, "%%ConnectionId=%d\r\n", connection->socketId );
        }
        else
        {
            iptr = (int*) (msg + format->offset);
            fptr = (float*) (msg + format->offset);
            sptr = (char**) (msg + format->offset);

            switch ( format->type )
            {
                case PARAM_NONE   : length = sprintf( buffer, "%s\r\n"  , format->text ); break;
                case PARAM_INTEGER: length = sprintf( buffer, "%s%d\r\n", format->text, *iptr ); break;
                case PARAM_FLOAT  : length = sprintf( buffer, "%s%f\r\n", format->text, *fptr ); break;
                case PARAM_STRING : length = sprintf( buffer, "%s%s\r\n", format->text, *sptr ? *sptr : "" ); break;
            }
            format++;
        }

        if ((index + length) < connection->sendLimit)
        {
            strcpy( &connection->sendBuffer[index], buffer );
            index += length;
        }
        else
        {
            retcode = LM_BAD_MESSAGE_SIZE;
            break;
        }
    }

    return ( retcode );
}

- Don Y

Sun, Apr 11, 2021 1:20 AM

But you're essentially (and unconditionally) doing the same thing with each member.

The OP's comment suggests he wants to act on CHANGED values. What's not specified is what those actions are likely to be. I'm assuming if member X changes, he may want to do X_action() while if Y has changed, he may want to do Y_action(). When member Z is added, he still relies on the developer to create Z_action().

Also, your application benefits directly from the definition of the members -- in terms of defining which are present in the "messages" as well as the order in which they are to be emitted (which may not be significant). Nothing is ever ignored (if it's in the table, it's emitted). The OP's comment suggests if there is no *change*, then none of the *_action()'s are invoked (?)

- John Speth

Sun, Apr 11, 2021 2:42 PM

Thanks for the alternative thoughts you have provided. I admit my idea is only half thought out. My quest is to be able to specify a structure definition, run make, and get some code of some sort that will output a list of changed structure members when a new copy of the structure arrives during run time. The details are all TBD by way of this discussion.

After reading the replies and thinking it through further, I think a perl or python assisted code generation step in the makefile will be the best way to go. The C preproc alone is insufficient. The script will take the structure definition as input. Script output will be a list of #defines that uniquely identify each structure member and a processing function that scans the incoming memory block for changes. There could be more than that for output after all is designed and implemented.

The proposed output above is something that could be coded manually, but I see that something like this could have a high degree of re-usability and be worth the investment.

JJS

- George Neuner

Mon, Apr 12, 2021 5:06 PM

All true, however I said I did something "similar" - not the same. The first problem to be solved here is how to iterate over the struct members - and I've shown a fairly simple way to do that.

My approach easily can be modified to detect changes in member values. It's simple to cache the last value seen for each member, or in the case of a string to keep a hash value so as not to have to copy and store characters.
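A minimal sketch of that modification, assuming the previous copy of the whole struct is kept and each table row records a member's offset and size. SAMPLE, MEMBER_DESC, DESC and find_changes are hypothetical names invented for this sketch, not part of the code posted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example struct; the members are illustrative only. */
typedef struct {
    int   i;
    char  c[3];
    float f;
} SAMPLE;

/* One table row per member: name, byte offset, and size within SAMPLE. */
typedef struct {
    const char *name;
    size_t      offset;
    size_t      size;
} MEMBER_DESC;

#define DESC(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

static const MEMBER_DESC sample_members[] = {
    DESC(SAMPLE, i),
    DESC(SAMPLE, c),
    DESC(SAMPLE, f),
};

#define SAMPLE_MEMBER_COUNT (sizeof sample_members / sizeof sample_members[0])

/* Compare two copies member by member. Stores the table index of each
   changed member in ids[] (up to max_ids) and returns how many were stored. */
static size_t find_changes(const SAMPLE *prev, const SAMPLE *curr,
                           size_t *ids, size_t max_ids)
{
    const unsigned char *p = (const unsigned char *)prev;
    const unsigned char *q = (const unsigned char *)curr;
    size_t n = 0;

    assert(prev != NULL && curr != NULL && ids != NULL);
    for (size_t k = 0; k < SAMPLE_MEMBER_COUNT && n < max_ids; ++k) {
        const MEMBER_DESC *d = &sample_members[k];
        if (memcmp(p + d->offset, q + d->offset, d->size) != 0)
            ids[n++] = k;
    }
    return n;
}
```

Comparing raw bytes has limits: a char* member compares the pointer rather than the text it points to (hence the hash idea above), and floats compare by bit pattern, so 0.0 and -0.0 count as different. Only member bytes are compared, so padding between members never triggers a false change.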

What I don't have a good solution for is automating construction of the tables. In C++ it might be doable with some template wizardry (I haven't actually tried it), but in plain C it's up to the programmer to keep the tables congruent with their target structs.

YMMV, George
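One preprocessor technique that was not raised in the thread, commonly called an X-macro, can keep a struct and its table congruent for simple cases. The sketch below is only an illustration under that assumption: CONFIG, CONFIG_MEMBERS, FIELD_INFO and the ID_ names are hypothetical, and the approach does not extend gracefully to nested structs or unions.

```c
#include <assert.h>
#include <stddef.h>

/* The single place where members are listed: X(type, name, array_suffix).
   An empty third argument (allowed since C99) means "not an array". */
#define CONFIG_MEMBERS(X) \
    X(int,   i, )         \
    X(char,  c, [3])      \
    X(float, f, )

/* Expansion 1: the struct definition itself. */
#define AS_FIELD(type, name, dim) type name dim;
typedef struct { CONFIG_MEMBERS(AS_FIELD) } CONFIG;

/* Expansion 2: one unique ID per member, plus a count. */
#define AS_ID(type, name, dim) ID_##name,
typedef enum { CONFIG_MEMBERS(AS_ID) ID_COUNT } CONFIG_ID;

/* Expansion 3: a table of name, offset, and size, indexed by CONFIG_ID. */
typedef struct {
    const char *name;
    size_t      offset;
    size_t      size;
} FIELD_INFO;

#define AS_INFO(type, name, dim) \
    { #name, offsetof(CONFIG, name), sizeof(((CONFIG *)0)->name) },
static const FIELD_INFO config_fields[ID_COUNT] = { CONFIG_MEMBERS(AS_INFO) };

/* Look up the table row for a member ID. */
static const FIELD_INFO *config_field(CONFIG_ID id)
{
    assert((unsigned)id < (unsigned)ID_COUNT);
    return &config_fields[id];
}
```

Adding, removing, or resizing a member means editing one line of CONFIG_MEMBERS; the struct, the IDs, and the table all follow on the next compile. Note that the array case c[3], which defeated the MEMBER() macro in the original post, is handled by the third argument.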

- Don Y

Mon, Apr 12, 2021 6:13 PM

Yes. But, what you're ULTIMATELY doing with the results (creating a message) is uniform across all members. You're not doing one thing with magazineIDs and something different with printDates.

My point was the OP seems to be wanting to automate the "easy stuff"; the hard stuff is likely "member-dependent". I don't see how he's going to add that in a way that makes it resilient to developer errors (if he can't expect developers to get comparison operations correct!)

So, consider how to focus on the *actions* that are associated with each "detected change" instead of just automating the detection of the change.

In my IDL compiler, I just copy member function names to a "foo.h" as I parse the interface (with appropriate command line switch). The developer dutifully #includes that where needed.

Similarly, building the "dispatch tables" for each object class. (Of course, I'm in the same boat as the OP when it comes to actually crafting those stubs; there's no way to tell a tool what you want the code to do -- other than writing the code to do what you want it to do! :> )

So, having one place where the interface is defined ensures everything that needs to coincide with that definition gets defined properly (as an output of the IDL compiler). The goal being to ensure the REAL compiler throws an error if you've forgotten to do something that *only* you can do (like fleshing out the stubs or failing to #include a header file, etc.).

But, this comes at the expense of developing a tool to do that work for you.

I have no idea how you could do this in the C preprocessor. M4 may be able to lend a hand. But, in general, most (old) assemblers had more flexible "macro languages" where you could actually parse arguments, etc.

And, you have to also consider how robust the tool/technique will be. What if the input isn't what you'd expected? Will you throw an (compile) error? Or, silently generate gobbledygook?

In my world, the C compiler doesn't provide any effective type checking/enforcement. E.g., a foo_t and a bar_t are both the same underlying basic (C) type. So, no way of ensuring that you're actually invoking a member defined for a particular object type *on* an instance of that type! (i.e., you'd have to rely on run-time error detection instead of catching it at compile-time; rolling a tool gives you that added benefit)

Again, I repeat the observation that you are now in the tools business once you head down this path. And, perpetually obligated to ensure your tools track the needs of the project AT ANY POINT IN ITS DEVELOPMENT *history*. You're now developing, testing and maintaining a tool instead of just "your code".

- George Neuner

Tue, Apr 13, 2021 1:51 AM

Detecting changes AND doing something with them (the next part below) can all be table driven.

Include in each table entry a (pointer to a) function to be called with the value(s) if/when a change is detected.
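Roughly what such a table entry could look like, as a sketch only. STATUS, WATCH_ENTRY, the handlers and dispatch_changes are hypothetical names, and the handlers just bump counters where real code would do member-specific work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct; none of these names come from the posted code. */
typedef struct {
    int   speed;
    float temperature;
} STATUS;

typedef void (*CHANGE_HANDLER)(const STATUS *prev, const STATUS *curr);

typedef struct {
    size_t         offset;     /* where the member starts */
    size_t         size;       /* how many bytes to compare */
    CHANGE_HANDLER on_change;  /* called when the bytes differ; may be NULL */
} WATCH_ENTRY;

static int speed_events;        /* counters stand in for real actions */
static int temperature_events;

static void speed_changed(const STATUS *prev, const STATUS *curr)
{
    (void)prev; (void)curr;
    ++speed_events;
}

static void temperature_changed(const STATUS *prev, const STATUS *curr)
{
    (void)prev; (void)curr;
    ++temperature_events;
}

static const WATCH_ENTRY status_watch[] = {
    { offsetof(STATUS, speed),       sizeof(((STATUS *)0)->speed),       speed_changed       },
    { offsetof(STATUS, temperature), sizeof(((STATUS *)0)->temperature), temperature_changed },
};

/* Call the handler of every member whose bytes differ; returns the number of changed members. */
static int dispatch_changes(const STATUS *prev, const STATUS *curr)
{
    const unsigned char *p = (const unsigned char *)prev;
    const unsigned char *q = (const unsigned char *)curr;
    int changed = 0;

    assert(prev != NULL && curr != NULL);
    for (size_t k = 0; k < sizeof status_watch / sizeof status_watch[0]; ++k) {
        const WATCH_ENTRY *e = &status_watch[k];
        if (memcmp(p + e->offset, q + e->offset, e->size) != 0) {
            ++changed;
            if (e->on_change != NULL)
                e->on_change(prev, curr);
        }
    }
    return changed;
}
```

Passing both the old and new struct to every handler is one way around the scoping question: each handler picks out whatever members it needs.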

Back to the maintenance issue.

However, the OP himself raised the possibility of custom preprocessing tools. If you really need to work in C, a tool that reads struct declarations and generates code for walking them probably IS the best generic solution.

However, if the input is limited to legal C syntax, associating arbitrary struct members with arbitrary functions will be ... let's call it "really, really hard" and leave it there.

The complexity will depend on whether structs/unions recursively can contain other structs/unions. The more general, the more complex.

Personally, I wouldn't want to try doing it with M4 - it would be easier to use a real parser tool and grab/modify the struct handling code from an already written C parser. There are a number of them available.

The input should be legal C structs, so if a custom preprocessor fails, so should the C compiler. Not a problem if it fails silently.

Similarly, if a preprocessor produces illegal C code, the C compiler should catch that later.

The worrisome case is that the preprocessor produces legal code that is incorrect given the input. Only inspection will catch this.

Yup. Tools to make tools.

YMMV, George

- Don Y

Tue, Apr 13, 2021 3:49 AM

The *hook* to invoke an "X_action" can be part of the table. But, the developer is still faced with writing that code; likely more involved than checking for changes.

Note, also, that using a function leaves you with scoping decisions for variables; it's unlikely X_changed() will want to access the same stuff as Y_changed().

See above.

Subject to all these caveats... :>

M4 is... "disappointing" :>

I think you'd want to "expose" the struct early so the compiler can throw errors before the "tool" has to try to make sense of them. That way, the tool *knows* the input is, at least, syntactically correct.

I'm finding it hard not to "evolve" the tools -- largely because I'm discovering new uses for them as I use them.

But, the idea of being able to support earlier versions is more trouble than it's worth; anything older than X is simply "unsupported".

[As I have very few people using the tools, I can likely get away with this -- as long as I don't piss anyone off by making "radical" changes.]

- David Brown

Tue, Apr 13, 2021 11:07 AM

That is an odd thing to say. C /does/ provide type checking and enforcement. If you choose to use it in particular ways, it can have very strong typing - but that comes at a cost in writing convenient and clear code. In particular, structs and unions introduce new types and you have to explicitly write messy pointer casts in order to break the type safety.

The challenge with C is that it makes it quite easy to break type safety, and people often write code that does this.

The way you ensure that you are dealing with a member of a particular struct type on an appropriate object of that type is to avoid pointer casts except under very controlled and very necessary circumstances.
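A small sketch of the point, assuming foo_t and bar_t would otherwise both be typedefs of int. The wrapper structs and accessor names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* With "typedef int foo_t; typedef int bar_t;" the two names are the same
   type and can be mixed freely. Wrapping each in its own struct makes them
   distinct, incompatible types. */
typedef struct { int handle; } foo_t;
typedef struct { int handle; } bar_t;

static foo_t foo_make(int handle) { foo_t f = { handle }; return f; }
static bar_t bar_make(int handle) { bar_t b = { handle }; return b; }

/* Accepts only a foo_t; no cast is needed or wanted by callers. */
static int foo_id(const foo_t *f)
{
    assert(f != NULL);
    return f->handle;
}

/* Accepts only a bar_t. */
static int bar_id(const bar_t *b)
{
    assert(b != NULL);
    return b->handle;
}

/* Given "bar_t b;", the call foo_id(&b) passes a pointer of an incompatible
   type, which the compiler is required to diagnose. Only an explicit cast
   such as foo_id((const foo_t *)&b) would silence it, and that is exactly
   the kind of cast to avoid. */
```

The cost is the one mentioned above: every access goes through .handle or an accessor instead of using the value directly.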