This is about code that clings to "embedded" by its fingernails -- it's running on a fast PC-compatible single-board computer, under Windows, as a DLL. So it's not exactly some little thing shoehorned into 4 kB of flash.
At any rate:
I have a rather complicated algorithm that I've coded up, to do marvelous stuff for my customer. It recently grew quite a bit, and in the process I've introduced some subtle bugs. I'm looking for ideas on what to check so I can figure out what's going on.
Here's the deal:
First, some time this spring I got a shiny new machine, and went ahead and loaded 64-bit Linux onto it, with all its 64-bit appurtenances. This did not, at the time, cause problems.
I coded up a bunch of changes, tested it on my 64-bit machine, and happily shipped it off to my customer -- who reported that it broke, horribly.
Oh drat. On top of this, at some point the MinGW stream library broke, so my test code no longer worked under Wine -- I could only test with the Linux version.
After much trial and tribulation, I managed to get 32-bit and 64-bit Linux versions and a 32-bit Windows version all working. I tracked down my problems (size_t and unsigned int are not the same size under 64-bit gcc on Linux), fixed them, and shipped.
So now I'm getting four different results from three different software loads and two different circumstances. I can't go into detail, but I'm going to give a general story 'cause I'm looking for general things to look for:
Under Linux 32-bit I get behavior A (correct operation)
Under Linux 64-bit I get behavior B (correct operation, just different)
Under Wine running a 32-bit Windows program I get behavior B
My customer calls my DLL from LabVIEW. Nine times out of ten he gets some correct behavior -- he's not sophisticated enough that I can know whether it's A, B or something else. The tenth time the thing fails to work correctly.
So, I suspect that I've got some uninitialized memory someplace. But, I'm running the Linux versions under Valgrind and it's not finding any problems (Valgrind is great, by the way -- great enough that for my embedded ARM stuff I do unit testing under Linux and Valgrind).
I'm going through the code with a fine-toothed comb, and so far I've only found a few very minor problems that border on the stylistic, although one of the changes that I made did improve things a bit.
So -- other than picking through the code line by line, can you guys suggest anything specific that I can do or look for?
Also, does anyone know of a Linux tool that'll randomly populate the heap with junk then call a program? I suspect that I'm not seeing the "sometimes it is, sometimes not" behavior that my customer is because of the different environment, not because Linux is magically fixing my bugs. Suggestions on how to make the bugs apparent would be helpful.
Thanks for reading, suggestions welcome -- I'm becoming a candidate for a rubber room over this one.