books for embedded software development

Typically, the BSS segment is cleared in a loop, so there's no code-space cost for clearing a few extra variables, only a bit of runtime. Usually that runtime is small compared to the time required for other startup events (like waiting for oscillator stability).
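For illustration, a minimal sketch of the kind of clearing loop being described, assuming the usual pair of linker-script symbols (the names __bss_start / __bss_end are placeholders; the real names vary by toolchain):

/* Hypothetical linker-script symbols bracketing the BSS section. */
extern unsigned char __bss_start[];
extern unsigned char __bss_end[];

static void clear_bss(void)
{
    unsigned char *p = __bss_start;

    /* One small loop covers every zero-initialized static variable,
     * so extra variables cost no extra code space, only a little time. */
    while (p < __bss_end)
        *p++ = 0;
}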

A compiler that warns for uninitialized static variables is broken.

A decent compiler will issue a warning for uninitialized auto variables.

Just curious: do you also initialize your initialized variables during code execution?

Reply to
Arlet Ottens

I.e., the device expects to see "AC".

I think the military has some systems where they have what amounts to a bandpass filter, so the *right* frequency (of pulses, in your case) has to be generated.

E.g., if everything crashes *except* the idle task, the pulses will come at too high a frequency. If the idle task isn't invoked "nominally", they come at too *low* a frequency.

Hmmm... I've always just had the watchdog task toggle the state of an output.

I wonder how much more/less robust it would be to have *two* INDEPENDENT (!!) tasks running. One (unconditionally) *sets* the output while the other clears it. The premise being that if the system is functioning properly, both tasks would INDEPENDENTLY run at roughly the same frequency. So, there would be roughly as many 'sets' as 'clears'.

If the conditions for either task were violated, then that task would become less effective and the effective frequency would drop.

You'd probably have to target 2X the mandatory update frequency in order to ensure pathological conditions didn't cause you to miss "toggles". (since even *one* missed set/clear could result in the watchdog firing)

Coming up with independent criteria for the two tasks would be the challenging part!
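As a rough sketch of that idea (the task names, criteria functions, and board-support calls below are hypothetical, not from any particular RTOS):

void set_watchdog_pin(int level);      /* assumed board-support call */

/* Runs only while its own health criteria hold. */
void setter_task(void)
{
    for (;;) {
        wait_for_setter_criteria();    /* e.g. all queues serviced recently */
        set_watchdog_pin(1);
        task_sleep(WDT_PERIOD / 2);    /* ~2x the mandatory rate, per above */
    }
}

/* Independent criteria, independent code path. */
void clearer_task(void)
{
    for (;;) {
        wait_for_clearer_criteria();   /* e.g. idle task ran recently */
        set_watchdog_pin(0);
        task_sleep(WDT_PERIOD / 2);
    }
}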

Reply to
Don Y

It's still a "wasted" (Hans-Bernhard's word) effort if the next reference to that variable in the body of the code is not a *read*!

Every variable is explicitly (re)initialized prior to use. Almost every variable is effectively an auto variable -- since they sit in "task contexts". (We already know I'm a big fan of dynamic memory allocation :> ) If I want to start or restart a task, I have to ensure that ALL of its state is as it is expected to be. The alternative is to have a process loader that does crt0.s for each independent task (and another that does it for the "system" as a whole). I can't read a variable until *I* have written it.

What happens if you jump to main()? What happens if you jump to the "main" for this task/process? (i.e., crt0.s isn't reexecuted unless you "reboot/reset" our device)

Think about the products and applications that you've had to mysteriously "fix" by resetting or restarting them...

Reply to
Don Y

And people were supposed to deduce that information from ... what?

If you're going to jump out of context like that, you could at least say so.

Seems like just about the entire world of C programming disagrees with you on that.

Aha, so you force initialization by disabling initialization. Interesting logic.

No. I do that with a plan of getting a stack consumption estimate out of it later.

No heap usage in embedded. You know that.

Aren't different from other variables.

You're really trying hard to be difficult, aren't you? If zero is the wrong initial value, then the variable will of course be defined with the correct one as its explicit initializer, instead.

It's not the amount of memory, it's the number of variables that counts. Your method has to initialize every variable on its own. And of course, gazillions can be a remarkably small number.

Hmmm... if they're just running on and on, why do they even need re-initialization of anything to begin with? How does an application that runs "forever" manage to de-initialize itself like that?

The problem is: it'll be everywhere. I.e. you've replaced the concentrated, small initialization routine in the startup with the aforementioned gazillions of individual initialization code fragments all over the place.

No, I don't. Because that situation can only happen if I either broke my C compiler deliberately, or that "application" code was using uninitialized variables before it ever reached "otherstuff".

At least a factor of 10 more in any but the most trivial programs. The startup code doing this often fits on a single screenful of assembly code. You replace that by individual code fragments for handling every single variable, spread all over the place. That'll easily be ten screenfuls, combined. Probably over a hundred.

Now you ignore that the linker does the same type of optimization for non-zero initial values, too, if you let it. Since I'm pretty sure you know that, I have to assume you're again being difficult just for the sake of it.

Pretty much all of it. All that's left is an equivalent of

memset(start_of_bss, 0, size_of_bss);
memcpy(start_of_data, start_of_data_init_in_ROM, size_of_data);

in the startup code. Or, if your machine is one of those blessed with enough memory to make such tricks worthwhile:

uncompress(start_of_statics, start_of_compressed_image);

Easy: _all_ such warnings are show-stoppers. People showing up in review without having linted out such warnings will get sent up to their rooms without dessert.

By paying attention to the loud protests of lint and the compiler that I would get otherwise.

All cases where that would be a problem refuse to compile or link after such a rough transformation anyway. Think about it: if that variable was automatic before, there's no change to worry about. If it was of static duration, it either was fully global (so somebody must now be missing it), or it was already marked "static" --- so again, no change in initialization state.

Those variables the startup clears are _not_ uninitialized. They're just not initialized /explicitly/ at their point of definition --- so they're initialized implicitly, to zero.

Only automatic variables can ever be uninitialized. And heap objects usually are uninitialized, but they're not exactly variables.
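A two-line illustration of that distinction, per the C standard:

static int total;      /* static storage duration: implicitly initialized to 0 */

void count_something(void)
{
    int scratch;       /* automatic: indeterminate until assigned */
    scratch = 0;       /* must be written before it is read */
    total += scratch;
}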

Reply to
Hans-Bernhard Bröker

Interesting. Thanks, Robert. I knew some did and some didn't; wasn't aware that it was a C89 or C99 requirement.

-- Les Cargill

Reply to
Les Cargill

A windowed watchdog. Some micros have one natively, e.g., the STM32F10x CM3 processors from ST. Like a conventional watchdog it needs to be kicked before it times out but it will also cause a reset if the kick occurs too soon.

I'm sure others have this feature but I've been working with these recently. Nice chips.
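A hedged sketch of servicing such a watchdog; kick_watchdog(), window_open() and the other helpers are placeholders, not the ST register interface:

/* On a windowed watchdog an early kick causes a reset just like a late one. */
void watchdog_task(void)
{
    for (;;) {
        wait_until_system_healthy();   /* assumed: monitored tasks checked in */
        if (window_open())             /* too-early kicks would also reset */
            kick_watchdog();
        task_sleep(KICK_PERIOD);
    }
}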

--
Rich Webb     Norfolk, VA
Reply to
Rich Webb

Interesting idea. One thing to watch out for though would be the possibility that each routine was trying to access the output at the same time (or nearly). This might result in a very short pulse which the Pulse Maintained Relay would not recognise.

One task doing a walking memory test sets the flag.

The other task, checking the programme CRC, Input State Complements, Incorrect Procedure Flags etc., resets the flag.

Just one suggestion. YMMV.

--
********************************************************************
Paul E. Bennett...............
Forth based HIDECS Consultancy
Mob: +44 (0)7811-639972
Tel: +44 (0)1235-510979
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************
Reply to
Paul E. Bennett

So far, none of my projects had tasks that could be restarted this way. I start the tasks at boot time, and leave them running forever. Or, I use Linux and let the kernel/crt0.s take care of the memory.

I can't recall any that would be fixed by explicit variable initialization in the code.

Reply to
Arlet Ottens

Why are you so confident that if ld21k was created there *must* be an "ar21k" as well? Whoever started the effort of porting the binutils may very well have gotten tired partway through and left the rest of the work to someone else. I personally have been looking for this for three months now and haven't found anything. The only - broken - source that I found does not contain ar. Here's the - unmaintained - source:

formatting link

It took me some time to make it compile, and I still haven't gotten rid of all the warnings (even though most of them are understood and I only need some time to fix them).

I would certainly use it if I could find it. Haven't so far.

Reply to
Alessandro Basili

On 12/17/2011 10:13 AM, Steve B wrote: [...]

Indeed, an artificial defocusing is often - and in our case also - introduced. This gives a better chance of estimating the sigma of the image, hence providing better precision in the pointing information. It's called the "hyperacuity technique" and is used to enhance the centroiding algorithm and obtain a resolution below the dimension of your pixel (bin).

This is just a reference you can look up:

Reply to
Alessandro Basili

Apologies for the delay; it took me some time to digest all that meat ;-)

That's correct. The bootstrap resides in the protected area of the FLASH and performs a majority check over three identical copies of the "loader" while copying it to RAM. Should one of the three copies fail to match the others, the content is taken from one of the other two copies. This operation is performed bit-wise.

If we name p1, p2, p3 the elements of each copy, the majority element will be:

pmaj = ((p1 xor p2) and (p3)) or ((not (p1 xor p2)) and (p1))
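In C, the same expression can be applied a word at a time; a minimal sketch:

#include <stdint.h>

/* Two-out-of-three majority, bit-wise over whole words, directly from
 * the expression above: where copies 1 and 2 disagree, take copy 3;
 * where they agree, take copy 1. */
static inline uint32_t majority3(uint32_t p1, uint32_t p2, uint32_t p3)
{
    uint32_t differ = p1 ^ p2;
    return (differ & p3) | (~differ & p1);
}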

At the end of the majority check the flow jumps to the beginning of the memory location where the loader has been copied. The loader copies reside in the protected area of the flash. The protection is enabled by a jumper removal (performed prior to launch - indeed prior to assembly of the flight board within the flight crate).

(Or, is loader also executed from ROM?) And, once

Once the loader is running we can execute a 'start main', which actually copies (after a checksum check) the 'main program' from the re-writable section of the Flash into the PM and then jumps to the beginning of the memory location designated for the 'main' application.

The memory location where the loader was residing is left as is; in theory one can 'jump back' to the loader, but I don't know to what benefit. So I believe the memory can be released and reused.

I currently don't quite understand the reason behind it. Maybe it is worth knowing.

Uhm, I have not systematically reproduced this issue, therefore it will be difficult for me to answer this question at this time. What I can say is that after a power cycle of the unit the loader is loaded correctly, is waiting for commands, and is sending replies as expected.

Given how unstructured the software is as it stands, it's hard to say whether the stack has been corrupted.

Unfortunately - or maybe fortunately - the loader cannot be modified, since it resides in the protected area of the FLASH. I am not sure how harmful the loader's resident code in the PM can be when control is passed to a different memory segment. It is true, though, that an uncontrolled jump may end up in a part of the PM which was not intended to be run and can do effectively random things. Do you think the 'main' program should take care of removing the loader from the PM segment? How?

This was my first idea as well, but as I realized that I may be losing interrupts I started to wonder about a different mechanism, trying to get the WCET of each function under control in order to meet timing - even though, how much can I rely on my WCET analysis?

Which begs the question: why didn't the hardware have a FIFO in the first place? And the answer is straightforward: no specs covering the subject. :-(

In addition to that, there are two serial ports, but only one active at any time. Now, the hardware engineer was a very creative one and decided to automatically swap from one to the other simply by detecting a transition. Of course the transmitter also follows the same logic, so a glitch on the secondary port will prompt the receiver to change from serial port 0 to serial port 1, and the transmitter as well, where on serial port 1 nobody is listening, being configured to communicate over port 0 (too much freedom may be as dysfunctional as many other less free social models!).

The problem here is that I can detect a FIFO overrun, but then how do I continue? Indeed, I believe the protocol should have a format such that the software knows when a message has started (message header) and how long it should be; otherwise detecting the overrun is not sufficient. Even with this scheme, multiple overruns may make the search for the next message a little trickier, since a priori there's no way to find out how many bytes were lost.
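A hedged sketch of the kind of resynchronization that framing would allow; the 0xA5 header, the length byte and the crc_ok() helper are placeholders for illustration, not the actual protocol:

#include <stdint.h>
#include <stddef.h>

#define MSG_HEADER 0xA5u

int crc_ok(const uint8_t *frame, size_t len);    /* assumed CRC check */

/* After an overrun, scan the receive buffer for the next plausible
 * frame: header byte, length byte, and a CRC that checks out. */
size_t find_next_frame(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 2 <= len; i++) {
        if (buf[i] != MSG_HEADER)
            continue;
        size_t frame_len = buf[i + 1];
        if (i + 2 + frame_len <= len && crc_ok(&buf[i], 2 + frame_len))
            return i;                            /* resync here */
    }
    return len;                                  /* nothing yet; keep reading */
}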

Since only a message from the master can travel on the serial port, which eventually triggers a function (or many) to execute, I believe that an 'event' is a higher level object which is the outcome of processing the message.

Since the serial port is the communication link with the master(s), every message should be served, otherwise there's a potential risk of losing control over the unit. [Assume a 'write memory' operation is ongoing and one of the bytes may get lost because the ISR is not in the correct state - and assume that memory location is in the flash memory and the content is the image of my new version of the code...]

I had a similar idea in mind. The idea of having a single output function is very appealing. I thought about marking 'messages' to be transmitted, so every functionality has its own message type where data has been processed. Once the processing is finished, the message is marked for transmission. A monitor function checks all the message types, sees which ones have to be transmitted, and takes care of adding them to the FIFO.

I believe in this way the message structure can be flexible enough to accommodate several message types, but it provides a uniform interface for data to be transmitted. This should also simplify the decoding efforts on the ground to analyze the output data.
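A minimal sketch of that 'marked message' scheme, with made-up names (NUM_MSG_TYPES, fifo_put()) standing in for the real ones:

#include <stdint.h>
#include <stdbool.h>

#define NUM_MSG_TYPES 8              /* arbitrary for the sketch */

void fifo_put(const uint8_t *data, uint8_t len);   /* assumed TX FIFO API */

struct tx_msg {
    volatile bool ready;             /* set by the producer when data is complete */
    uint8_t       len;
    uint8_t       data[256];         /* one length byte => at most 256 bytes */
};

static struct tx_msg tx_table[NUM_MSG_TYPES];

/* Single output point: walks the table and queues whatever is marked. */
void tx_monitor(void)
{
    for (int t = 0; t < NUM_MSG_TYPES; t++) {
        if (tx_table[t].ready) {
            fifo_put(tx_table[t].data, tx_table[t].len);
            tx_table[t].ready = false;
        }
    }
}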

Uhm, here I think I lost you on the PopState. By definition of an FSM, the state of the program (i.e. all the control variables) is only a function of the FSM state it is currently in. If you now introduce the possibility of an FSM stack, it means that you can move from one state to any other state, without any conditional check. I'm trying to picture it in my mind but I am miserably failing, unless I think of a graph with an arc from every node to every other node - but then how would you distinguish a transition based on an 'event' from one using the PopState?

On the contrary, I liked the idea of the 'lookup table'; I am a *huge* fan of LUTs - correctly implemented, they scale incredibly well (at least in my experience).

The idea of not having the function change the state, but letting the FSM loop over the list of events and dispatch them according to the table, is quite new to me, but I think I followed along and rather liked it. It seems to me an 'event driven' FSM, instead of what I used to see as a 'flow driven' FSM - a sequence of states which may have branches and loops according to conditions.
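A hedged sketch of that kind of table-driven, event-dispatched FSM (the states, events and handler names are invented for illustration):

enum state { ST_IDLE, ST_RX, ST_EXEC, NUM_STATES };
enum event { EV_NONE, EV_BYTE, EV_MSG_DONE, EV_TIMEOUT, NUM_EVENTS };

typedef void handler_fn(void);

/* Application handlers and event source, assumed to exist elsewhere. */
extern handler_fn start_rx, store_byte, run_cmd, reset_rx, send_reply;
extern enum event get_next_event(void);

struct transition {
    handler_fn *action;                           /* NULL means "ignore" */
    enum state  next;
};

/* One row per current state, one column per event. */
static const struct transition fsm[NUM_STATES][NUM_EVENTS] = {
    [ST_IDLE][EV_BYTE]     = { start_rx,   ST_RX   },
    [ST_RX]  [EV_BYTE]     = { store_byte, ST_RX   },
    [ST_RX]  [EV_MSG_DONE] = { run_cmd,    ST_EXEC },
    [ST_RX]  [EV_TIMEOUT]  = { reset_rx,   ST_IDLE },
    [ST_EXEC][EV_MSG_DONE] = { send_reply, ST_IDLE },
};

void fsm_loop(void)
{
    enum state s = ST_IDLE;
    for (;;) {
        enum event e = get_next_event();
        const struct transition *t = &fsm[s][e];
        if (t->action) {                          /* undefined pairs are ignored */
            t->action();
            s = t->next;                          /* only the table changes state */
        }
    }
}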

If I got it right, the 'next state' does not depend on the 'current state', but just on the 'next event'. The FSM will go to the processing function to handle the event any time the event shows up. But how do you guarantee there's no 'starvation'? An event GENERATOR may never get the chance to push its event into the ThisHoldsTheEventsForTheFSM variable, effectively being ignored. Maybe I missed something.

Do you by chance have any example of the structure you just mentioned?

More than skinny, it was messy! Every command was treated with no common interface. Every command has a length at the beginning and a CRC at the end, a type, and a few parameters according to the type. Length, CRC and type could have been checked in a single place, with the parameters processed as a function of the type. Instead, every function related to a type was checking length and CRC on its own.

That's correct. Of course there's no documentation available to understand what the control should look like and what functions are available in the FPGA, therefore we are reading the VHDL code to extract that information... painful!

Let me rephrase: there's no need for pacing, since transmitter and receiver are both set to 19200 baud. To be more blunt, I have no idea why they needed handshaking here :-/

The UART is set to 19200 baud, while the 256 byte limit comes from the format of the command/reply, which has a single byte for the length. Since there cannot be a 'reply' without a 'command' (it's a handshake), that scheme effectively limits the amount of transmitted data to 256 bytes per command, hence relying on the master's ability to keep sending commands.

Indeed, for other reasons the master sits on a CAN bus and is itself a slave of a higher level computer which is issuing the commands and performing lots of other tasks, so there are lots of other factors which eventually reduce the bandwidth on the serial port.

The scheme I have in mind does not have any handshake in place: whenever the data from the camera is ready, it will be sent to the serial port. On the master side there will be somebody emptying the serial port FIFO, to avoid loss of data. The 4 KB FIFO on the master side gives some leeway to retrieve it.

This mechanism is effectively what is used on the GPS we have onboard, which continuously sends data; the control only changes the configuration to allow several types of data to flow.

That is correct. Well, I'll try again, just for the sake of clarity. Our instrument is not pointed; it is a solid state telescope that looks at the sky. Given that it can reconstruct the direction of an incoming photon relative to its azimuth, a star tracker is needed to determine the absolute pointing direction at the time the photon was detected.

To do that we shoot pictures of the sky, compress them, and send them down to the ground with all the rest of the data. On the ground there is an algorithm that looks at the pictures and tries to calculate the absolute position based on a stellar database. The compression algorithm is pretty 'simple':

- find the coordinates of the pixel above a threshold

- find the prime neighbors

- reconstruct a cluster of neighbors (mean and sigma)

- select only the N brightest clusters.

The N parameter here is therefore crucial, because it affects the compression factor.

Since the algorithm used on the ground for pointing reconstruction works better with a higher N, there's a tradeoff. Limiting the bandwidth with a poorly sized format seems shortsighted to me and may affect the quality of the results (smaller N => worse pointing reconstruction).
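As a hedged sketch of the per-cluster step in the list above (the intensity-weighted mean and sigma that the centroiding relies on); the pixel layout here is a made-up placeholder:

#include <math.h>
#include <stddef.h>

struct pixel   { int x, y; double value; };       /* hypothetical layout */
struct cluster { double sum, mean_x, mean_y, sigma_x, sigma_y; };

/* Intensity-weighted centroid and spread of one cluster of neighboring
 * above-threshold pixels; the weighting is what gives sub-pixel
 * ("hyperacuity") resolution of the star position. */
struct cluster summarize(const struct pixel *p, size_t n)
{
    struct cluster c = {0};
    double sx = 0, sy = 0, sxx = 0, syy = 0;

    for (size_t i = 0; i < n; i++) {
        c.sum += p[i].value;
        sx    += p[i].value * p[i].x;
        sy    += p[i].value * p[i].y;
        sxx   += p[i].value * p[i].x * p[i].x;
        syy   += p[i].value * p[i].y * p[i].y;
    }
    if (c.sum <= 0.0)
        return c;                                  /* empty or dark cluster */
    c.mean_x  = sx / c.sum;
    c.mean_y  = sy / c.sum;
    c.sigma_x = sqrt(sxx / c.sum - c.mean_x * c.mean_x);
    c.sigma_y = sqrt(syy / c.sum - c.mean_y * c.mean_y);
    return c;
}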

With a little effort I can assign an ID to each function and have a list of standard logs (started/stopped/running/timestamp) which can be organized in a lookup table.

I personally don't much like spending time on a verbose description of the process, since most of the time the description is either wrong (which is bad) or misleading (which is even worse). Since I have to search for the string in the code to understand where I am, I'd rather search for a number than a poorly structured comment.

Actually, something very similar to what you suggested is used by one of our main developers for the core computer of the experiment; I respect the choice, but don't quite share the output format :-)

This is what I call built-in testing. Again, coming from the hardware side, to me it's an essential facility to build into the application, so that it can be enabled/disabled at runtime without the need for any special device (jtag/debugger) or any special compilation flag (DEBUG).

I like to have in the code the same functions that I run when I test it, with the possibility of turning them on/off with very little effort. We have a library for our ground control software which provides exactly this capability: we can simply enable the printf of all our telecommands while we execute them, and we have the possibility of holding them for execution in case they do not match criteria.

People often make non-systematic backups of their software/hardware/work, but very few keep using the backup copy as they continue - until the day comes when not only is the work lost, but the backup copy is unusable because an option was forgotten!

I understand your point though; bad specs are worse than no specs at all. I would at least try to write them for myself, but I would be speaking in the wilderness if I shared this in my group. We - unfortunately - do not work that way :-/

We have a non written rule here:

- ALL CAPS : defines

- all small : the rest

- leading underscore: typedefs

On top of that, program names are ALL CAPS if the program is graphical, all small if it has a textual user interface. By all means we avoid mixing cases.

[...]

Do you have any example of such a framework? I cannot think of anything except an RTOS, which in my mind would be overly complicated for such a project. On top of that, a great deal depends on the support you may receive w.r.t. the specific architecture used.

The core computer of the experiment is a PowerPC 750, which is supported by a variety of RTOSes, and indeed it has been decided to use eCos on it; on the contrary, IMO the ADSP21020 is a dinosaur which was never popular enough to attract great support.

I would be more than happy to evaluate this possibility, but to be honest I wouldn't even know where to start.

[...]

Indeed. Well, I have a suspicion that the configuration performed by the loader is still in place and the one the main program performs is sitting on top of a preexisting one, thereby adding confusion. I would first need to make sure the main is in charge of the configuration of the ISR, with no leftovers.

Correct. Likely the hardware design allows that: the FPGA sequencer can run with a dedicated memory buffer to store CCD data while the previous image is being processed. The DSP receives an interrupt or reads a register to check when the picture is ready; at that point it can switch the image buffer for the next picture while the other memory buffer is being used by the sequencer.
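Something like the following ping-pong arrangement, sketched with hypothetical helper names (frame_ready(), give_buffer_to_sequencer(), process_frame()) and an arbitrary frame size:

#include <stdint.h>

#define FRAME_WORDS (512 * 512)                  /* arbitrary for the sketch */

int  frame_ready(void);                          /* interrupt flag or status register */
void give_buffer_to_sequencer(uint16_t *buf);
void process_frame(const uint16_t *buf);

static uint16_t frame[2][FRAME_WORDS];

void image_loop(void)
{
    int dsp_side = 0;                            /* buffer the DSP will process */
    give_buffer_to_sequencer(frame[1 - dsp_side]);

    for (;;) {
        if (frame_ready()) {
            dsp_side = 1 - dsp_side;             /* new image is in the other buffer */
            give_buffer_to_sequencer(frame[1 - dsp_side]);
            process_frame(frame[dsp_side]);      /* compress while the next exposure runs */
        }
    }
}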

That is exactly why I am not such a fan of dynamic memory allocation. I allocate what I need at the beginning and try to avoid any dynamic allocation along the way. That is also why, if I had to allocate memory, I would never use malloc-type functions directly, but my_own_alloc, which would also take care of tracing.
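A minimal sketch of that wrapper idea; log_alloc() is a placeholder for whatever tracing output the system has:

#include <stdlib.h>
#include <stddef.h>

void log_alloc(const char *who, size_t n, void *p);   /* assumed tracing hook */

static size_t bytes_requested;                         /* cumulative, for tracing */

/* Thin wrapper: same semantics as malloc(), plus a trace record of who
 * asked for how much and what they got back. */
void *my_own_alloc(size_t n, const char *who)
{
    void *p = malloc(n);
    if (p)
        bytes_requested += n;
    log_alloc(who, n, p);
    return p;
}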

I used to program fuse-logic FPGAs: one shot, 100 bucks. The development process was such that I would never program a chip without running all the test cases in a back-annotated simulation (~12 hour run). I am pretty sure the level of attention I paid to my VHDL was 100 times higher than what I have when compiling (it takes virtually no time and the 'temptation' to just try it out is high).

There's a good section in the "Mythical Man Month" that talks about the volatility of the media on which the software has to run.

Reply to
Alessandro Basili

The C-standard requires default initialization of static data to 0.

Doing explicit static data initialization to zero is directly harmful in virtual memory systems.

In a virtual memory system the initial "clearing" is usually implemented by the OS as Demand Zero Pages (or whatever your OS might call them). These pages appear in the virtual address map, but in neither physical memory nor the paging file.

A read reference within such a page will actually create the page in physical memory. As long as the page remains 'clean', it can also be deleted from physical memory without storing it to the page file, since it can be recreated at will whenever there is a reference to it.

However, if you explicitly initialize your static pages at startup, all those pages need to be loaded into physical memory at once (potentially throwing out rarely used pages from other tasks).

Worst of all, explicitly writing anything (including zero) into such a page will make it 'dirty', potentially requiring backing store in the page file.
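A small illustration of the difference (the size is arbitrary):

#include <string.h>

static unsigned char table[4u * 1024 * 1024];    /* BSS: demand-zero, costs nothing until touched */

void bad_startup(void)
{
    /* Redundant per the C standard, and on a virtual-memory OS it
     * touches - and dirties - every page of the array at startup. */
    memset(table, 0, sizeof table);
}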

Thus, explicit initialization to zero is not required in embedded systems thanks to the C standard, and in virtual memory systems it would be directly harmful.

Reply to
upsidedown

I can turn the question around here: will the watchdog reset my hardware with 100% certainty? I don't see much of a difference in terms of complexity between the two. Actually, an accidental bite from the dog worries me more than an accidental reset command (which could be protected against, and recovered from).

I understand your point. The idea is to minimize the time the application can go crazy and cause harm. But I can follow that only if a non-running application does not pose any hazard either.

In our case the idea is to take pictures. Now the main program is stuck and we are not taking pictures anymore; then the watchdog kicks in and we end up in a mode of limited functionality, where we don't take pictures either.

If this star tracker was - but likely is not - used for attitude control, I bet the system would have been designed in a totally different way.

We actually have both mechanisms on board. I personally prefer the former - man in the loop - approach and try to design everything accordingly. If a piece of equipment or hardware is in danger I would never call on the software to take an action (I have never seen a software interlock work - but I may have limited experience); rather, I design the hardware to fail over or handle the anomaly in an orderly way.

Moreover, the "insane" program may still be partially working, in which case I would like to still communicate with it rather than have the watchdog give it a bite.

Sure, pointless.

Now I recall a subsystem in our experiment which would have gracefully turned off itself if a watchdog kicked in. Unfortunately the subsystem never made it to fly :-)

Reply to
Alessandro Basili

Even though I may agree with you that breaking large modules into small ones may improve reliability, I cannot back up my opinion with anything other than some personal experience and observations. It would be interesting to see what kind of proof can give grounds to this approach.

Could you specify what "real application size" means? Is there a threshold above which an application becomes "real"? Shouldn't the programming paradigm you choose be part of your design from the very beginning?

At what point, in most state-machine-based applications, does system timing start to become a problem? I presume there was a moment when the application timing was not a problem, and as the program "grew up" the problem started appearing. Don't you think this is a scalability problem rather than a paradigm problem?

Surely, if the design was supposed to be scalable but was not actually designed to be scalable, then it's no surprise the timing falls apart. On the contrary, if the design is scalable (within certain limits, of course), then the timing should not be a problem.

Reply to
Alessandro Basili

Explicit initialising to zero is also harmful (to code size and speed) in embedded systems. As Hans-Bernhard pointed out, initial clearing on startup is much more efficient than explicit initialisation.

Reply to
David Brown

Signedness of char is not normally an issue when storing characters - but it /is/ an issue when using them as 8-bit (or more, if you have a painful DSP architecture with 16-bit or 32-bit chars) numbers.

The only way to stay sane, and write clear code that will work, is to make no assumptions about the signedness of char on your particular target/compiler/compiler-option combination. "char" is a type for holding characters. Small integers are of types "uint8_t" and "int8_t".
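A small example of the trap:

#include <stdint.h>

/* With a signed 8-bit char, values 0x80..0xFF are negative, so this
 * comparison is never true if char is signed, but works as intended
 * if char is unsigned - the classic portability bug. */
int high_bit_set_fragile(char c)
{
    return c >= 0x80;
}

int high_bit_set_portable(uint8_t c)
{
    return c >= 0x80;              /* well defined on every target */
}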

Reply to
David Brown


The Software Development community could use some standard terminology. Walter uses the term "component" to decompose a "module", where I have typically used the term "component" to mean a cluster of related modules. I tend to equate "module" approximately with "class" from the OO and UML world.

Not saying I am right and Walter is wrong, just saying that a more standard terminology would help discussions.

Reply to
Marco

Because they're built from the same source package, in a single go.

Can't happen. Either the binutils build was configured to support a format, or it wasn't. There's no such thing as ld knowing about a format, but not ar. They both use the same bfd library to handle object file formats.

Reply to
Hans-Bernhard Bröker

terminology would help discussions.

I fundamentally agree with Marco's comment. This has been a long-standing problem in computer science. I used to laugh, 30 years ago, about conferences where the participants went to the conference, exchanged vocabulary, and then had a conversation.

Walter..

Reply to
Walter Banks

There is an old online book on fuzzy logic on Byte Craft's website

formatting link

It has a small chapter (pp. 13-17) on software reliability that details the basic principles and the effects.

I am out of the office at the moment and need to get copies of some old papers to post to Byte Craft's website. Essentially, what is involved is that when larger modules are broken down into well-isolated parts, the overall reliability goes up for two reasons.

1) The number of series terms is lower, because each new, smaller module is potentially associated only with those program parts that actually use it. The gain comes from isolating the individual parts of the program from all code except those parts that directly affect the current outcome. One of the series terms that is missed in most real-time systems is the impact of execution timing.

2) The reliability of the individual component is higher than that of its bigger original module - in most cases impressively more reliable. There is a simple exercise in the online book quoted above about splitting a module with a reliability of 1 into two equal parts, creating two new but overall functionally equivalent independent modules with a combined system reliability of 4. What happens in reality is that most of this gain in system reliability is actually achieved in practice.
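For reference (this is the textbook series-reliability argument, not a quote from the book above): if a path through the program depends on n parts that must all work, the combined reliability is

R_{\text{sys}} = \prod_{i=1}^{n} R_i

so trimming the number of series terms a given path actually depends on raises that path's reliability - e.g. 0.99^10 ≈ 0.904 versus 0.99^5 ≈ 0.951.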

Regards,

Walter Banks Byte Craft Limited

Reply to
Walter Banks
