C preprocessor magic to put a define in a string

Title says it all.

I'd like to be able to pass version numbers in as command-line defines, then do some magic such that they can appear both in strings and in numbers.

So, I call something like:

g++ -DMAJOR_VERSION=3 -DMINOR_VERSION=10 version.cpp

and in the code I do one thing in one place:

unsigned int major = MAJOR_VERSION;
unsigned int minor = MINOR_VERSION;

and in another I end up with a string that says

"blah tee blah tee blah version 3.10"

But -- how do I make the blasted string?

At worst, I can make everything strings and use strtol, but I prefer to do as much during the compilation step as possible.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
Reply to
Tim Wescott

#include <stdio.h>

#define VERSION_STRING_1(major, minor) \
    "blah tee blah tee blah version " #major "." #minor
#define VERSION_STRING(major, minor) VERSION_STRING_1(major, minor)

int main ()
{
    fputs(VERSION_STRING(MAJOR_VERSION, MINOR_VERSION), stdout);
    return 0;
}

Reply to
Radey Shouman

The C preprocessor can concatenate two strings with ##

#define MyString(maj,minor) blah tee blah tee blah ## major##.##minor

called with

#define MAJOR_VERSION 3
#define MINOR_VERSION 10

Or command line define

MyString(MAJOR_VERSION,MINOR_VERSION)

generates

blah tee blah tee blah 3.10

w..

Reply to
Walter Banks

Welcome to quoting hell. :)

Compile line: gcc -DMAJOR_REV="\"4.3\"" -o eff.exe eff.c

Please note the extra, *sacrificial*, quote marks!

Program:

#include <stdio.h>

#ifndef MAJOR_REV
#error OHNO
#endif

const char MAJOR[] = MAJOR_REV;

int main(void)
{
    printf("MAJOR=%s\n", MAJOR);
    return 0;
}

When doing this sort of thing with gcc, the -E option is most helpful.

-- Les Cargill

Reply to
Les Cargill


Thank you. It's even in my C book, now that I know what to look for.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
Reply to
Tim Wescott


You might want an #ifndef to define fallback values, or to issue a warning, in case you forget to set the magic values.

Reply to
Andy

#define str(s) #s
#define DtoSTR(x) str(x)

unsigned int major = MAJOR_VERSION;
unsigned int minor = MINOR_VERSION;

char version_string[] = "blah version "
    DtoSTR(MAJOR_VERSION) "." DtoSTR(MINOR_VERSION)
#if ALPHA_VERSION > 0
    "_a" DtoSTR(ALPHA_VERSION);
#else
#if BETA_VERSION > 0
    "_b" DtoSTR(BETA_VERSION)
#endif
    ;
#endif

char copyright[] = "Copyright " DtoSTR(CPYEAR) " " DtoSTR(CPCOMPANY);

CPYEAR and CPCOMPANY are normal -D defines as well, with possible quoting if there is a space in the company name or such. Other than that, no funny quoting.

--
Chisolm
Republic of Texas
Reply to
Joe Chisolm

You can glue strings together with the "##" pseudo-operator:

#define NUMBER 4

"This is a string " "that is built from " ## NUMBER ## " pieces."

Note that this doesn't assume NUMBER is actually a "number"!

However, when it comes to versioning, I much prefer letting the version control system (RCS, SCCS, CVS, SVN, etc.) keep track of the actual version identifiers for me. Otherwise, you have to use some other mechanism to "remember" that "*this* binary was built with *this* snapshot of *these* sources using the command line XXXXXXXX (which is where you specified minor/major)".

In this case, you can freely use whatever keywords your VCS supports to tag the sources themselves! (so, when you check out a particular version of the sources, you get the sources as they existed at that time -- along with the associated versioning information).

E.g., under SVN, you might stick the following in your main.c:

char version = "$Id$";

which SVN will "fill in" (text substitution) for you when you check out the file so it actually looks more like:

char version = "$Id: main.c 295 2012-05-15 21:18:57 tim $";

(keywords other than "Id" can be used to get smaller portions of this/different variations)

Knowing this format, you can then extract whatever information you want from the "version" string at run time:

if (9 == sscanf(version, "$Id: %s %d %d-%d-%d %d:%d:%d %s $",
                filename, &revision, &year, &month, &day,
                &hour, &minute, &second, who) ) {
    printf("This file, named '%s', was last modified by %s "
           "at %d:%d:%d on %d/%d/%d. This is its %d-th revision.",
           filename, who, hour, minute, second, month, day, year, revision);
}

(Of course, you can do far *less* with this information, too!)

N.B. if you *don't* reference "version" in your code, you usually have to protect it from lint.

I don't like having to put the commands I used for each "build" under version control in order to keep track of variations I might have made in invoking the build tools :<

(though makefiles are under the VCS)

YMMV

Reply to
Don Y

No. "##" does not concatenate strings. It concatenates tokens, and requires both sides of the "##" operator as well as the result to be valid tokens.

For me, it generates the errors (after fixing the maj->major typo):

foo.c:10:1: pasting "blahMAJOR_VERSION" and "." does not give a valid preprocessing token
foo.c:10:1: pasting "." and "MINOR_VERSION" does not give a valid preprocessing token

and the output:

blah tee blah tee blahMAJOR_VERSION. 10

The correct operator for stringification is "#". There is no need to actually concatenate the result strings using the C preprocessor, because the compiler will do that later just fine. That is,

#define MyString(maj,minor) "blah tee blah tee blah " #maj "." #minor
#define MyString1(maj,minor) MyString(maj,minor)

produces

"blah tee blah tee blah " "3" "." "10"

which the compiler turns into one string.

Stefan

Reply to
Stefan Reuther

It is /really/ ugly to split up your initialiser like this with preprocessor conditionals in the middle. It is hideous style, and it makes it very easy to make mistakes with things like the semicolons - as you did in the example.

More controversially, I think it is also bad practice to use "#if" on macros that might not be defined - use "#ifdef ALPHA_VERSION" in preference.

A better way to write this would be:

#ifdef ALPHA_VERSION
#define ALPHA_VERSION_STRING "_a" DtoSTR(ALPHA_VERSION)
#else
#define ALPHA_VERSION_STRING ""
#endif

#ifdef BETA_VERSION
#define BETA_VERSION_STRING "_b" DtoSTR(BETA_VERSION)
#else
#define BETA_VERSION_STRING ""
#endif

char version_string[] = "blah version " DtoSTR(MAJOR_VERSION) "." DtoSTR(MINOR_VERSION) ALPHA_VERSION_STRING BETA_VERSION_STRING;

Reply to
David Brown

No, subversion will /not/ do such keyword substitution unless you specifically tell it to do so for backwards compatibility with outdated version control philosophies. The modern style (for at least the last decade) has been that a source code version control system is there to track your source code - not to fiddle with it, re-write parts of it, or modify the code. There are lots of good reasons /not/ to do this.

First, someone other than /you/, the programmer, is modifying your source files. It's fine to have software write parts of your source code - but the software should write /its/ files, not /your/ files.

Secondly, it disturbs the integrity of the source code by making changes at unexpected times. A system where a check out changes the code is broken - a version control system should give you exactly the files you asked for, when you ask for them. No more and no less. When I compile a project, then check it in, then check it out again and re-compile, the source code had better be 100% identical and give 100% identical binaries.

Thirdly, it doesn't work for binary files or files which might happen to use the same syntax in the file. So either you are only applying it to some files, which makes it inconsistent and you have the chance of forgetting to enable it for important files, or you break your binary files.

Fourth, the information contained in the version control system is not the same as the information you /actually/ want. There are all sorts of changes that can be made without passing through the version control system, but which might need to be tracked by version numbers or similar information. Conversely, you will often want to check things in without changing version numbers or revision numbers.

There are also better ways to achieve the same effects if you really want them. If you need information about the versions or logs tracked by the version control system, ask the version control system at the time that you need them (your binary output should not need these details, only your developers). If you need some general information to be included in the build, then add makefile commands to get that information from the repository at build time.

It is with good reason that many modern version control systems (such as git and mercurial) do not support keyword expansion, or discourage them (as subversion does).

Reply to
David Brown

I looked for "stringification". Hard to forget such a silly sounding name.

Reply to
Radey Shouman

And I *strongly* disagree with this!

I consider it *imperative* that I be able to identify the branch and version that I am working on BY EXAMINING THE DOCUMENT ITSELF. Ever drop a bag full of screws and try to sort them *back* into their appropriate "bins"? Boy, it *sure* would be nice if they were color coded or easily differentiable by some means OTHER than examining head style (pan, round, flat, etc.), drive type (philips, slotted, clutch, etc.), thread diameter (#2, #4, etc.), thread pitch (20, 32, 56, etc.), length, measurement system (english/metric), etc.

Ever look at two pieces of (source) code and try to figure out where each resides in the chronology of the product's development? (remember, timestamps on files mean nothing) If sorting screws is hard, imagine having to compare files in the hope of finding a perfect match! (*assuming* you haven't made even the *slightest* alteration to the file since checkout)

By extension, are you suggesting SCHEMATICS shouldn't carry version information on them? Instead, each individual should use some ad hoc method of remembering which version they are viewing/modifying? Should PCBoards *not* bear revision identifiers? What about final products? Maybe we should just discipline ourselves to write the version number on the antistatic envelope that the board travels in -- and hope like hell that we never take it *out* of that envelope when it might be confused with *other* similar boards! Else, we'll spend a lot of EFFORT trying to figure out just what version we have in our left hand vs. our right!

The extent to which I support use of the $Log$ keyword in files varies. Usually, I err on the side of including it so that the chronology of a particular file version can be easily inspected by the maintainer WITHOUT HAVING TO GO FETCH OTHER DATA (and manually "staple" it to the file).

This is probably a throwback to my hardware design practices where the "revision block" on the title page contained terse descriptions of each sequential update to the board (while the formal ECO describes each individual change in detail).

Schematics are much denser than source listings. You tend to only find sparse comments *on* the schematic. For theory of operation, you look elsewhere (this is contrary to how software tends to be documented -- *when* it is documented!). So, you don't see lengthy explanations as to *why* the rating on a particular device was changed. Or, why a note regarding layout was added to the schematic. etc.

But, software is more fickle than hardware. Five revisions to a board might be "a fair number" whereas *fifty* revisions to a piece of code/module might be comparable! So, $Log$ tends to get lengthy. Often, part of a commit's log entry will be "trimmed old log entries from source file" -- since the truly inquisitive can always fetch them back from the repository.

Someone other than me, the programmer, is modifying my *object* files each time I invoke a compiler/assembler.

Note that a keyword expansion need not alter the *code* -- keywords can reside in comments (but, then can't be accessed *from* the code)

Anything you do with a file *before* checking it *in* is "illegal". The VCS has no way of "recording" the file before you've *used* it (to create some other controlled object). You check your changes *in*. Then, check the objects *out*, build them, and "do whatever".

Three weeks from now, the copy of your file that existed before checkin may not exist. Or, may still exist but no one can vouch that it has NOT been altered. It would be foolish to allow someone to build anything from that -- at least nothing that was going to be *controlled*!

If that file *disappears* before checkin, anything built *from* it is crap. Simply because you can't *recreate* the process by which those items were built!

Do you put your software through final test and *then* check it in? You might try a preliminary run of the test procedure but it's not "official" -- because the thing it is testing doesn't exist *formally* -- it has no part number, etc.

Formal test would *begin* by checking OUT "version XXX" of your software suite (using whatever tag you assigned to the release), building it using the documented procedures for that release. Then, running it through the test suite specified for that release.

You can compile your code before checking it in to verify that it *compiles*. You can test it before checking it in to verify that it performs as expected. But, until it is formally checked in, none of that is formally "blessed" by the organization.

"Wow! You've cured the common cold? Great! But I don't think the FDA is going to rely on these notes on the back of this napkin..."

I checkin binary images after they are built. Otherwise, how do the manufacturing folks know what to put into the FLASH? (if it isn't under VCS, then it isn't a referenceable entity!)

The keywords are resolved for each check-in/change.

When you move the version information into the command line, now you have to add "whatever" resolves *those* keywords (or constants!) to your VCS. "How did I build the binary that identifies itself as 4.203?" or "What were MAJOR and MINOR as referenced herein when this was built?"

Binary files require *different* handling. But, that's just a consequence of the VCS implementation! E.g., a header prepended to the file could carry this information -- which is stripped from the file *as* it is used (in much the same way that comments and source tokens are stripped from a file when it is compiled!).

As for "files which might happen to use the same syntax in the file" (i.e., "coincidentally" need to have the token "$Id$" appear somewhere in a file *as* the literal "$Id$" and NOT as a keyword to be expanded), you *avoid* those sorts of things!

I.e., you *know* never to put two question marks adjacent in a C program: printf("What are you, crazy?" "?" "?\n"); You can discipline yourself for that but *not* to avoid putting letters after dollar signs (for example)?

We have no problem worrying that the string "/bin/sh" in a "text" file might "accidentally" cause that file to be executable as a script! We simply insist that those *wanting* that behavior ensure that the FIRST LINE of the file has the special form: #!/bin/sh This frees the balance of any such file from that "complication".

That's a separate issue and can be controlled separately. Create a file called "versions.c" and link it into your build. PUT A KEYWORD IN THAT FILE SO YOU CAN TRACK WHICH REVISION OF THAT FILE YOU ARE USING IN ANY PARTICULAR BRANCH!

Each check-in, by definition, creates a unique version of the file. You might *call* them all "version 5" in layman's terms. But, there is really only *one* version from which the released "version 5" was created -- even if there are no *code* differences between ten consecutive "revisions" of a file (e.g., maybe all you did was edit comments or whitespace).

If you checkout a schematic and then check it back in, it's a NEW schematic if it contains *any* changes. Even if you just changed the color of a signal. If that change isn't "significant" IN SOME WAY, then the VCS could DISCARD the change and you shouldn't complain!

You might still *call* this "version 3" of the schematic. But, it's *real* revision "number" has now changed.

I can replace the entire implementation of a product with a different one -- and still *call* it the same product. I can still call it the same *version* (in user terms)! But, as far as manufacturing and engineering are concerned, it might be an entirely different *product*!

(witness my USB hub comments a while back -- NOTHING visible to the consumer to differentiate one device from another -- despite vastly different implementations AND BEHAVIORS!)

And use a self-adhesive label to affix it to the file(s)? Like putting a circuit board in a *labeled* antistatic bag and hoping it never comes out??

As I said, I strongly disagree with this philosophy.

Rationalize the parallel to the guy in manufacturing that has just been given two stacks of boards -- neither of which carries version identification. Or, the engineer handed two copies of add.c -- perhaps even from different parts of the build tree!

"Gee, these two files look nothing like each other -- yet they are both called 'add.c'!"

"Yes, one adds ints, the other adds floats. The namespace resolves the difference between the two filenames -- unfortunately, that namespace isn't *tied* to the files themselves! You can't peek *inside* the file to get that information. So once you've taken them out of their respective 'bags', it's up to you to keep track of that information in some ad hoc manner.

Bobby renames each file as soon as he takes it out of its bag. You'll have to ask him how he comes up with the new names for the files without incurring other conflicts -- I think he uses cyrillic characters for the changes he makes! Tom tries to recreate portions of the original namespace (even if 99.99453% of it is empty!) to preserve that information. Find something that works for *you*. We're pretty lax around here about 'little details'..."

[I don't *think* so! :> But, hey, find a system that works for you and live with *its* consequences. There's no such thing as perfection when it comes to engineering activities! You trade this for that; I might opt to trade that for this!]
Reply to
Don Y

Yes, I was looking for something that would let me do this without having the numbers quoted. And, the # operator (which must be in a macro to work, interestingly enough) works great.

Thanks for the reminder -- I should have remembered that. While I'm with the tide that doesn't want the number coming from the version control system, I'll be defining the MAJOR and MINOR macros within my version.cpp file, which will be under version control and hence keep everything reproducible.
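The arrangement Tim describes might look something like this sketch (the file layout and macro names are assumed; shown as code that compiles as either C or C++):

```c
/* version.cpp -- the ONLY place the numbers are written down.
 * The file itself is under version control, so every build
 * is reproducible without any command-line -D flags. */

#define MAJOR_VERSION 3
#define MINOR_VERSION 10

/* Two-level stringification, as discussed in the thread. */
#define STR_(x) #x
#define STR(x)  STR_(x)

unsigned int major = MAJOR_VERSION;
unsigned int minor = MINOR_VERSION;

const char version_string[] =
    "blah tee blah tee blah version "
    STR(MAJOR_VERSION) "." STR(MINOR_VERSION);
```

Other translation units would just declare `extern const char version_string[];` and never touch the macros directly.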

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
Reply to
Tim Wescott

Could this help:

---- clip clip ----

/* Expanded argument stringification macro pair */

#define STR(s) #s
#define XSTR(s) STR(s)

/* Program identifier */

#define MAJOR 3    /* major version */
#define MINOR 14   /* minor version */

#define MANUFACTURE "Tim's Shop"
#define DEVICENAME "XYZZY-1"

#define DEVVERSION XSTR(MAJOR) "." XSTR(MINOR)
#define HEXVERSION ((MAJOR << 8) | MINOR)

Reply to
Tauno Voipio

It costs you an extra level of "indirection" -- it only applies to macro "arguments" so has to be wrapped in one.

If you just want to concatenate two (or more) strings, then you can do that anywhere -- but you need two *strings* to do so! :> Depending on how you are passing these "values" around, you can end up with all sorts of "extra" quoting (e.g., the shell strips one level of quotes).

The token pasting operator (##) can let you do things that are a bit less constrained as it allows the whitespace that implicitly surrounds tokens to be elided -- "gluing" the tokens together (by contrast, explicit rules *cause* adjacent string tokens to be catenated).

E.g., in my test suites, I use ## to "dynamically" create the name of the function being tested based on arguments elsewhere in the script (#define testname test_ ## ROUTINE)

There are two different, orthogonal issues wrt VCS.

The first (discussed at length, elsewhere) is whether you "litter" the files with information that is generated and/or maintained by the VCS. I.e., does the file identify itself wholly -- or does its versioning information require some *other* context to provide that information? I prefer being able to take a file and transport it to another machine/medium and *still* know what version it was declared to be.

E.g., for at least the last decade (?) your MP3's have NOT had to carry file names of the form: artist - album - track - title.mp3. ID3 tags let you embed that information within the actual *file*, itself (along with lots of other cruft: genre, year, artwork, composer, etc.). In the absence of such a mechanism, you are stuck with cumbersome filenames (as above) and/or having to carry a *separate* document that provides these details to the user.

The same is true of digital photos (EXIF), etc. This is recognition of the fact that there is lots of metadata that belongs *to* a file that really best belongs *in* that file (instead of in a separate mechanism/medium).

Since EVERYTHING that affects the resulting file needs to be reproducible (assuming you operate in a structured environment), then you want everything that goes into the release to appear *somewhere* as a version controlled document/object.

The other issue is how you convey "version information" to the end user. I call this the "vanity version" because it bears no relationship with the version(s) of the various files/modules that comprise the release.

For example, I could be using version 3.2.5blue of strlen() and 1.0red.1.1 of printf() together in version "6.1" of the *released* product. And, I could even *rerelease* this same set of binaries under the (vanity) version identifier "6.2" -- new and improved (ha!).

[I use the term "vanity version" as a parallel to "vanity addresses" where the *artificial* vanity address has nothing to do with the actual *physical* address. E.g., you could claim to live at "1 Tim Esplenade" while someone else tracks the *real* street address of the edifice.]

E.g., during development, my top level makefile emits a timestamp that gets *embedded* in the binary being built to reflect when it was created (I find this easier than tracking "build numbers"). This makes it easy for me to examine the binary to identify it. I can then keep it around (think: regression testing -- was this *future* discovered behavior present in this particular version? What about the version before it??) for future reference.

This has proven invaluable with the speech synthesizers! I can tag the audio files captured from the synthesizer's output so I can quickly review the differences in pronunciations without having to rebuild previous versions of the synthesizers; or, resynthesize particular phrases (for comparison), etc.

[N.B. The version.build scheme of identifying a release seems to be common. E.g., "4.1.2.0876" where the "0876" is somewhat redundant with the "4.1.2". I.e., there was no 4.1.1.0876, nor any other x.y.z.0876, nor will there be. I.e., the first half of the version designator (4.1.2) is treated independently of the second half (0876) -- the latter of which simply increments with each build (scripted in the make target).

I dislike this approach as it suggests that the vanity release is 4.1.2 -- yet it allows *another* version of 4.1.2 to be issued (4.1.2.0899?). Which of these is 4.1.2? *Neither*. One is 4.1.2.0876 and the other is 4.1.2.0899, so it is hazardous to let your customers *think* either of these is "4.1.2" :< ]
Reply to
Don Y

Well, each to his own. It would be a boring world if everyone agreed :-)

If you think that is *imperative*, then I agree that keyword expansion from the source code management system is one possible step towards this. But I don't agree that it is the /only/ way, and I certainly don't agree that it is a useful idea, let alone an essential one.

Long, long ago, in a galaxy far, far away, "directories" were invented. If you want to be able to easily tell that the file you are working on is on the "experimental" branch, then make sure there is a directory called "experimental" in the file's path.

As some wise Noob so recently said, a bad analogy is like a leaky screwdriver.

I don't know about you, but I'm a professional. If I had a collection of screws of different sorts, I'd put them into a box for emergency or ad-hoc usage, and forget about them. It's not worth the cost of my time to sort them.

For things that /are/ worth money or effort, I keep them sorted. My development files are in a source code management system - I know what files are what, where they came from, what project they belong to, who wrote them, and so on.

Yes, I use a source code management system that lets me do that. I can use it to view logs, timestamps, branches, etc. And I can use it to compare files historically or between different versions.

/I/ have good hope of finding perfect matches quickly and easily, because /I/ make the changes to my source files. /You/ don't have a hope of getting perfect matches because you force your version control system to put extra changes into files that are exactly the same.

You misunderstand about versions.

The /developer/ knows about version numbers and other indicators of changes and developments. And the developer should put these into files at an appropriate level - perhaps a version number at the start of a file, perhaps a small changelog on each file, perhaps a "versions.txt" file in a directory, or whatever suits the project and the developer or development team.

A revision control system knows /nothing/ about version numbers. All it knows is a tracking indicator, such as a revision ID, a checksum, a timestamp, etc. That's not a version number - it just tracks when the file was last checked in to the system.

The revision control system's tracking indicators are /not/ good enough as a substitute for properly keeping track of versions and changes. And they do not add much of significance once you /do/ have good tracking of your versions of changes. They are, in fact, completely orthogonal, and are for a different purpose entirely.

That is a petty detail, and you know the difference.

Most people only ever use keyword expansion in comments.

That makes almost no sense whatsoever.

You check in files when you have reached a useful point to check in the files. That could be after writing a file, after fixing an error, after adding a feature. It could be the end of the day's work, or because another developer wants access to the file. It could be because you are tagging a release version. There are many reasons. Basically, you check in your code when you have something you might want to look back at some time in the future.

What you normally /don't/ do is write the code, then check it in before you have even bothered trying to build it. Different development philosophies have different rules as to when to check in, and whether or not the trunk should be kept for buildable code only, but you'd be foolish to clutter your VCS and its logs with lots of checkins that are of no use to anyone.

If you are using your VCS properly, then code that hasn't been checked in (other than the code you are currently working on, of course) doesn't exist. You don't keep copies of it lying around - that just leads to confusion.

That's correct - such builds are just temporary, for testing, bug checking, and other development. But that happens all the time - the fraction of builds that are ever kept is tiny.

I suspect you are using "build" in a different way from me. I "build" my projects continuously during development, but the builds are not important or preserved until they are "releases" that other people make use of. If by "build", you mean a complete released build that is delivered to others (either internally for testing, or externally to customers), then I am much more in agreement with you.

That sounds reasonable enough - though I call that a "release" (or "release candidate"), not just a "build".

However, I would have built the executable and tested it myself before checking it in as "tag version X".

In particular, it is absolutely essential that I get the same builds by doing a "make" during development, or by doing a clean check out of "trunk" and doing "make", or by "copy to tag X", then check out out of "tag X", then "make". There is usually some difference in the "elf" files and similar, because they track dates and directories, but the final "hex", "bin", "exe" files should be the same.

This is one of the main reasons why "keyword expansion" is bad, especially if it is in code rather than comments.

And it is of no benefit in comments - anyone who has access to the source code has access to the VCS system, and can get the same information directly from there.

I also check in the final binaries in the VCS. It is not for production - handover from development to production involves much more than just saying "this tag/revision ID is the production version", and production should not have access to the source code. But it is very useful for testing or for working with others in development - not everyone has the same development tools installed.

You don't use the command line for building - you use makefiles, which are included in the VCS.

It is easier to avoid the whole issue by keeping your version information as version information, and your files as files.

No, create a file called "versions.txt" that has information you want. A file called "versions.c" (or "versions.h" according to preference) contains the part that is really relevant for the finished binary, such as version number or timestamp.

Each check-in creates a new revision number in the VCS system - that is the /only/ thing it does "by definition". It is not a new "version" of the source code. You might go through several "versions" of a project between checkins (though it's not advisable), and you will certainly go through many checkins between "versions".

If you want to track the versions of the program (and you /should/ do so), use a "versions.txt" or perhaps a comment block at the start of "main.c". If you want to track versions of particular files (and normally you do not want to do that), put comment blocks at the start of the file or use separate text files.

Again, you are being petty and you know it. If you are incapable of including information in an appropriate format in files beside your source code, or incapable of ensuring that these stay together during development and VCS tracking, then you really need to learn about some basic development methodologies.

And as for using a label on a bag for your circuit boards as identification - if your production and test departments are not confident of being able to keep the circuit board and label together during the production process, then your production department is incompetent.

Look, it's quite simple - you don't mix up two "add.c" files or two different boards, because you /never/ have just those two files or two boards.

For software, you have your version control system. Your world revolves around it. No file stands alone - "svn info add.c" (or equivalent) tells you /exactly/ which "add.c" you have.

This last sentiment I agree with.

mvh.,

David

Reply to
David Brown

Careful: the token pasting operator (##) works on tokens generally, not strings specifically, so it doesn't do the quoting # to stringify its parameters. It generally isn't necessary with strings anyway since consecutive string literals are concatenated together by the compiler anyway.

## is mostly used to make a general chunk of code specific by modifying identifier names. The example that comes to mind would be Unix device drivers, where traditionally the function names for each driver differ by a driver-specific prefix. Note that ## suppresses macro expansion of its operands, so -- just as with the stringifying example above -- an extra level of indirection is needed to paste the *expansion* of DEV_PREFIX rather than the name DEV_PREFIX itself. For example, code along the lines of:

#define PASTE_1(a, b) a##b
#define PASTE(a, b) PASTE_1(a, b)
#define STRATEGY PASTE(DEV_PREFIX, _strategy)

...

STRATEGY(param1, param2, ...);

Can be made to call the correct driver by changing a #define at compile time. If the prefix is "cgd" a -DDEV_PREFIX=cgd turns the above code into

cgd_strategy(param1, param2, ...);

--
Andrew Smallshaw
andrews@sdf.lonestar.org
Reply to
Andrew Smallshaw

Hi David,

[much elided -- probably not enough! :> ]

Have you tried importing (checking out): ./a/very/long/series/of/directories/from/an/OS/that/supports/ \ truly/long/pathnames/myfile.foo into an OS that puts *tiny* limits on pathnames (e.g., UN*X vs. Windows)? Ever been screwed by trying to move that portion of your filesystem hierarchy to some other place -- like: ./my/playpen/this/project/this/version -- only to discover that the copy just *balks* in the process? And you want to *add* information to that pathname (e.g., putting a "/experimental" somewhere within) -- instead of putting that *information* in the file? Ditto the "version" number, author, etc.?

In Windows, that may be as little as ~250 bytes (or, perhaps, 250 wchar's?) -- while in most Eunices it's ~1000. "experimental" throws away a dozen of those (250) just to represent one or two bits of "state" (release, experimental, etc.)

What happens when some of that "information" is represented with characters that the "local" filesystem's namespace doesn't support? "oops! -- represent 'information' in a way that is representable in *any*/every filesystem namespace".

Ever been screwed by checking out: ./some/file/Name.c and ./some/file/name.c and wondering where *one* of them disappeared to? (because the local filesystem was case-insensitive but the hosting filesystem WASN'T?) "oops! -- use lowercase characters for all pathname components lest you risk conflict on some *other* filesystem"

The point of the analogy was to draw attention to the disproportionate amount of effort required to identify/differentiate between screws -- objects having very *few* differentiable features. I.e., if you chose to represent each "feature" that a screw could exhibit as a unique *letter* and encoded those features positionally into a *word*, you might end up with *a* 10 letter word.

A source file (or other controlled object) would *easily* have three orders of magnitude of information within! (i.e., 10K).

*If* the file hasn't been touched in any way, you could conceivably compare it to every checked in version of files having that name (regardless of where they reside in your filesystem -- do you know if this file was "experimental"? Do you know which version it may have been? Do you know if this add.c resides under the float, double, long, short, etc. directory hierarchy?).

OTOH, if the file *has* been touched, now you can't reliably tell from whence it came: is this 2.5 with a few additions? Or, 2.6 with a few *deletions*?? Which branch was it selected from? What *peers* will I likely want to chase down (i.e., it is invoking foo()... how do I know *which* foo is likely to mate with it: the RELEASE_4.0 version or the EXPERIMENTAL_5.1?)

But you only know that by noticing WHERE they reside in the file system hierarchy -- because you are "storing" that information in that namespace instead of in the file!

If you want to use version 4 of a file and version 5 of that same file to evolve the *new* version *6*, you have to open:
/playpen/myname/myproject/release/ver4/os/HAL/drivers/uart.c
/playpen/myname/myproject/release/ver4/os/HAL/drivers/uart.h
/playpen/myname/myproject/release/ver5/os/HAL/drivers/uart.c
/playpen/myname/myproject/release/ver5/os/HAL/drivers/uart.h
and, from these, fabricate:
/playpen/myname/myproject/experimental/ver6/os/HAL/drivers/uart.c
/playpen/myname/myproject/experimental/ver6/os/HAL/drivers/uart.h
While doing so, you'd constantly be looking up at the filename in the titlebar/etc. to sort out which file you are actively viewing/editing. (What if the titlebar only represents the last 20 characters of the file name due to space constraints?)

OTOH, if the information that you rely on as being present in the pathname is, instead, present in the *file*, itself, then you can *find* it, directly, in the file regardless of where the file is moved to.

What if the status of the version 5 uart.c was still "experimental" but version 5 of uart.h was "release"? Are you forcing them to reside in different parts of the filesystem *just* so you can keep "experimental" and "release" in their respective paths? Seems a bit draconian when the file, itself, can have "exp" in its "$Id$" entry (and *reside* wherever you want it to!)

E.g., I routinely check out files from different branches and releases and toss them together into a single "working" directory. They get arbitrary names: x, y, z, etc. (sometimes a, s, d, f, ...). I can open them *all* without having to rummage around to different parts of the filesystem locating them. They represent a "work unit" to me while I am fabricating "newfile". At any time, I know the origins of the files I am consulting -- because it is written *in* the files themselves. Ditto for pertinent parts of the revision history, etc.

And, when I am *done*, I can commit "newfile" using whatever name is appropriate and whatever *status*/author/etc. I choose. I don't have to fabricate some arbitrary directory hierarchy to ensure it bears "experimental" status, etc.

And you *never* look at those files anywhere other than with that repository continually available to you? You keep the software installed on *every* machine that you use and replicate the repository to each of them?

Seems like a lot of overhead just to be able to compare two files!

If the files are *intended* to be the same, then they have no superfluous changes. If they are NOT intended to be the same (even if they happen to be) then they aren't treated as such!

Agreed. I call these "vanity versions" as they are, essentially, a marketing necessity.

I track "versions" with specific *tags* on specific *branches*. So, I can take a set of sources for "version X of product A" and incrementally modify them -- going through various "experimental" stages -- to "version Y of product A". I can also take "version X of product A" and fork a branch to become "version q of product B".

I can then tag new versions of that "product B" branch without regard for product A -- using entirely different designations (i.e., the "gemini" release of the "sandstone" branch)

Agreed. Where it MAKES NO IMPACT ON THE EXECUTABLE! I use a pretty printer on my sources to ensure they have a consistent visual style. As long as it doesn't alter the functionality of my code, I can apply it at will. Keywords in comments are the same issue.

I didn't say you *can't* compile it. Or test it. But, the compiled code is (must!) be discarded. It doesn't exist because the sources from which it was compiled do not (yet) officially exist! (unless you happened to JUST have checked them out).

So, compile, test, do *whatever* you want. When you *think* you are ready for a release, you make sure EVERYTHING is checked in -- including the makefiles, *tools*, etc.

*Then*, you check these things out and build the release binary. *If* you have been methodical in this, the released executable will behave identically to the "tested" one.

If you have *not* been fastidious, then something will break and you'll have to figure out what *you* did wrong! We, for example, allow *nothing* to be done before the release compile. I.e., you can't "make test" before "saving" the previously built executable -- to minimize the risk that the "test" target might silently alter something that changes the behavior of the built executable (imagine someone opting NOT to 'make test' and the executable NOT being silently changed in that way!)

I check in *frequently*. It lets me and others see where I've been, what I am working on and where I am headed. The same is true of the work of others. I'd sure be annoyed to discover that Bob has been making significant changes to the design of a piece of code and happens to beat me to checkin -- leaving me with a boatload of conflicts to resolve! People *try* to keep each other abreast of changes they are making (harder when we are in different geographical locations) but it is easy to get preoccupied with something and forget to tell your peers about your activities until it's too expensive (for them!).

E.g., you think you are going to make "a little tweak" to some innocuous piece of code -- only to discover that it escalates into something much more. So, your previous plans of notifying folks of the "little tweak" get delayed or forgotten as you get absorbed in the (unexpected!) problem in front of you. Similarly, you resist the *duty* to go back and fork a new branch for these big changes -- putting the onus of merging them back into the main branch on *you*, where it should belong -- since you keep wanting to believe it's "almost done".

Frequent commits let others see *where* you are working and the types of things you are doing -- and *why*: "Don, don't waste time on that! We're revamping the I/O subsystem so you'll have the hooks you need in a more consistent manner."

Do you *only* rely on modifications to a single source file at a time? Does it not rely on other modules to achieve its functionality? How do you keep track of where those all came from?

E.g., if I am writing a math library to support 40b ints, I definitely want to consult the 32b and 64b versions of those library modules (plural) to see which techniques are best suited to the 40b representation. Do I have to check them out fresh each day? What happens if I get pulled off to put out a fire for manufacturing? Will I *know* what versions those were? Will I know what I had *done* on the 40b version that I am creating? Will someone *else* be able to step into my shoes to take over that task if I am distracted for an extended period?

I keep almost every build as they are invaluable in comparing performance of different versions (call them "nonreleased revisions") of the software. Disk space is cheap (I have many TB of stuff archived). Rebuilding an executable from scratch is *expensive* (check out the tools, check out the sources, run the makefiles, etc.) and easily avoided if you simply treat all builds as "version controlled objects".

E.g., each time I built my backup speech synthesizer, I wanted to see how it fared on the various test cases that I had compiled. "Ah, version X2.7.4 handles silent p's much better than version X2.7.3!" Or: "Fricatives were more distinct in version X2.7.3 than they are in X2.7.4!"

And, as the synthesizer matured, I started "exposing" more test cases to it (early on, you focus on getting the *basics* working and don't consider more exotic test cases). When I found something that I was disappointed with, the *first* thing I did was (quickly) run the test case through *all* the previous (saved -- checked in!) versions of the *executable* to see if *past* performance may have been better than *current* performance -- if I had thought to try the test case "back then".

[With speech synthesis, it is hard to look at code and "imagine" what it will sound like -- especially for languages as jumbled as US English.]

Yes. But that "for yourself" version doesn't formally exist! I.e., if you dropped dead, it wouldn't be recreatable. If the company is sold, it's not a verifiable *asset* -- it's just cruft sitting on your workstation.

There is nothing that causes the "value" of a keyword to change after checkin. I.e., once "foo" is checked in, every check *out* yields the same file image. Each time I have added a keyword to (e.g.) CVS, I've had to make sure that they observed this rule. I.e., you can't have a "$CheckedOutDate$" keyword because that varies with each *access* to the repository. OTOH, "$Date$" tracks when it was checked *in* (which can "only" be done once for a given revision).

No. You can only get access to that information *if* you have access to the VCS! If you *need* that information, then you have to carry the VCS with you -- along with the repository.

I disagree. And, it seems modern thinking in other areas seems to share that belief.

You embed ID3 tags in MP3 files so the information pertinent to the "song" is present *with* the song. The lame approach of creating directories for each performer, then subdirectories for each "album" easily falls apart when you have anything other than a trivial "collection".

E.g., "Mozart as performed by the LSO". Does this belong under "\Mozart", then ".\LSO"? Or "\LSO" and then ".\Mozart"? Maybe all of it needs to be rooted under "\Classical"? What about "Mozart and Brahms performed by the LSO"? Under "\Mozart and Brahms"?

How do we deal with guest conductors? And guest *performers*? I.e., it quickly becomes a mess when we try to force objects into a *structure* (i.e., filesystem) just to track *information* that is easier embedded *in* the object!

You embed EXIF tags in photographs so the information is available *in* the photograph. Try sorting your photos sometime: family, friends, pets, family&friends, family&pets, friends&pets, vacation, etc. You end up with objects littering the filespace just because you opted to use (sub)directories to convey information.

Or, create a ".info" that accompanies the file and hope it never gets separated from the file (gee, maybe *tie* it to the file so that it is INSEPARABLE!)

I'm trying to show that any scheme that doesn't embed this information *in* the object in question relies on keeping that object "with" something ELSE for that information. Whether it is a place in a filesystem, a ".info" file or an antistatic bag (I find the bag analogy easier to relate to since you can see how easy it is to misplace one or the other)

*You* probably work in a much more "vanilla" work environment than I. *I* might be called upon to edit a file on a client's server, remotely -- in whatever context *he* has chosen to represent the object. Or, might quickly copy a file onto a laptop (devoid of any tools other than a text editor) to begin fleshing out some changes to it. Or, sit on a real workstation with gobs of tools and resources available. And, it is possible that I will do all three of these things in a given day -- on the same file or different files.

Competence/incompetence are great for scapegoating.

*Fire* the incompetent ones! But, that's not getting product out the door today or even this *week*!

The surgeon that cuts off the wrong leg can be disciplined, censured, sued, etc. But, that doesn't do anything for the

*no*-legged patient (since the "right" leg will eventually also be cut off)!

I prefer systems that are more resilient to screwups. That's why we have makefiles, etc. We don't *rely* on people being competent and knowing/remembering dependencies, build options, etc.

So you force yourself to never put those two add.c's in the same directory (where it might be easy to edit/view them simultaneously)? And, you always ensure that whenever you carry either of them to another machine, you preserve the entire filesystem hierarchy above them (lest you "forget" which is which, which is "exp" vs. "rel", etc.)

Seems more likely that someone will forget this sort of detail than failing to notice the revision, status, date, author, etc. AT THE TOP OF THE FILE!

Yay! :>

If I was starting over today, I would opt for a different sort of VCS. I've used SCCS, RCS, VSS (ickickick), CVS, and Hg (of necessity). I toyed with SVN early on but found it didn't offer me much (to offset the cost of porting my **huge** repositories!). Perforce seems like it might be a good balance (concept, approach, implementation, feature set, etc.) but "for pay" means I would have to fight with "cheapskates" who seem to want everything for free :-/

So, CVS tends to manage most of my sources. It's reasonably efficient (especially now that it's no longer "scripts"!) and has some features that I've grown to love (branches *and* tags; "watches"; etc.). In fact, I have learned to do much of my development as different "users" -- creating a userID for each project -- and setting up watches as "forward reminders" of modules that a project *might* be interested in if/when I (later) choose to update them.

E.g., I might have "projectA" set a watch on a math library that it relies on for much of its performance. If "projectB" later makes some changes to that library, the next time I log in as "projectA" I will be notified (reminded) of those activities.

I'd also create a "directory" of all objects in the repository and any specific options used to access them (checkin/out/update/etc.). E.g., I have an "EndOfHeader" keyword that I use to disable (further) keyword substitutions (in that file, only). This allows me to make unconstrained use of keywords upstream from that keyword without concern for what might *follow*.

But, I don't want every file (think "legacy" code that existed before the keyword was created!) to respond to that keyword, as its presence might NOT be intended (coincidental). Recording *which* files/objects should recognize it and which *shouldn't* would be nice if it was automated.

Similarly, to be able to script checkin and check out actions so I can automatically strip off headers that I have added to files, etc. (for file formats that place overly restrictive constraints on their contents)

But, regardless, I want files to be self-contained objects without reliance on external mechanisms to provide the information I want *about* the file.

(I can do some of these things in CVS, but it's clumsy, so it is likely to be forgotten or poorly enforced.)

Reply to
Don Y

And have /you/ ever tried thinking about what makes sense, instead of trying to create imaginary problems? You are inventing issues that the rest of the world dealt with fine 20 years ago.

For pity's sake, please learn to use a VCS. That's what they do - they keep track of files, their histories, their branches. You access that information with simple commands like "svn log". Learn to use a filesystem with directories and hierarchies.

And please learn that source code files are /not/ stand-alone files. They are /always/ part of a system of files. If you keep losing track of your files, dropping them on the floor, copying them randomly around your filesystem, then you have such big problems with your organisation that no amount of technical measures (including "keyword expansion") can help you.

Modern thinking is that mp3 files are music files, and C files are part of the source code for a program. The fact that they are both files is irrelevant. Your comparison is as sensible as suggesting that since bicycles and bananas both begin with "b", then bicycles should have sell-by dates and bananas should come with manuals.

mp3 files have tags because that is part of the file - they are not just sound recordings, but contain other information as well.

A C file that is part of a project is useless as a file on its own.

Occasionally you will have C files that /are/ useful on their own, or get moved around on their own - in which case they need extra information in comments at the start, such as copyright information and version or history information. You don't want such stuff in normal project files, as it gets in the way of finding the actual source code. And taking that directly from your version control system is a silly way to get that information - it is such a small proportion of the information needed, and most of it is utterly irrelevant to people receiving the file.

Why on earth would you think that that working methodology would suggest "VCS keyword expansion" as a useful technique? When you have working environments like that, it can make a great deal of sense to put some version or historic information in comments at the start of your source files - but you do it /manually/, giving the information that /you/ want to put there. What you describe is the worst possible situation for trying to do things automatically using VCS tools.

I live in Norway - we /teach/ people that are doing things wrong, rather than firing them.

There are two sides to this - first, make a system that avoids or prevents screwups. Secondly, make it deal well when there /are/ screwups.

makefiles are an essential part of the build system, and therefore an essential part of the source code for a project.

Most filesystems I have used over the years have had trouble with two files of the same name in the same directory. So no, I don't put the two "add.c" files in the same directory.

And if there are two files in two different directories that I want to compare, view, or edit, I really don't find it very difficult.

I carry them to a different machine by doing "svn checkin" on one machine, and "svn update" or "svn checkout" on the other machine. This is one of the main reasons for using a VCS!

I am normally connected to our subversion server. If your working environment doesn't allow that, then a decentralised VCS (such as mercurial) may be a better choice.

I can understand not wanting to move or change existing systems. But SVN is, I think, the top contender for a centralised VCS system today - lots of features, easy to use, with good cross-platform tools. There is no reason to use CVS instead of SVN, except when you have an old CVS repository that you don't want to migrate (there are tools for automating that migration).

Distributed version control systems are getting more popular, with Hg probably being the best general-purpose choice (git is more suitable for some types of project, but has a steeper learning curve and is highly Linux-oriented).

Reply to
David Brown
