Debugging RTOS problems in embedded systems

ssubbarayan · 2007-07-02T05:41:57+00:00

Dear All,

Most of us involved with RTOS projects in embedded systems encounter, at one point or another, problems pertaining to real-time situations:

1) Deadlocks
2) Race conditions
3) Problems related to re-entrancy
4) Problems related to priority settings

What are the general debugging techniques used to uncover problems like these? I have earlier worked with Wind River Tornado and VxWorks, where a tool called WindView shows graphically the status of tasks as well as some timing analysis. One can get an idea, if not an exact one, of what went wrong.

But some OSes are proprietary, and others are custom OSes provided by chip vendors (as is the case in my current product). These OSes don't have any tools like WindView for debugging. I would like to know what can be done to debug them. Clearly, adding printf won't help in such situations.

I would also like to learn to build tools of my own to debug such situations. Can any of you provide suggestions or point me to a link?

Looking forward to your replies, and thanks in advance,

Regards,
s.subbarayan

P.S.: I have crossposted the same query to attract more input.

Hans-Bernhard Bröker 19 years ago

In other words, you say you solved the problem by handing it over to somebody else.

Grant Edwards 19 years ago

"You're very clever, young man, very clever," said the old lady. "But it's turtles all the way down!"

Grant Edwards, grante at visi.com
Yow! I represent a sardine!!

Grant Edwards 19 years ago

Event objects/queues? Messages? Signals? Can you say "shared resources"? Very good! I knew you could.

They're _all_still_shared_resources_.

Grant Edwards, grante at visi.com
Yow! Just imagine you're entering a state-of-the-art CAR WASH!!

Boudewijn Dijkstra 19 years ago

Then perhaps I didn't understand completely. I thought you meant to say that your scheduler didn't behave as you expected intuitively.

Perhaps a better term in lieu of my 'to share' is 'to share directly'. A task can use a resource without knowing or caring how many others are using it, and therefore without the task knowing whether it is 'sharing' the resource. This is the case when the resource is encapsulated by a service task with a thread-safe communication interface.

You can get water from a tap without sharing the tap directly.

Made with Opera's revolutionary e-mail program: http://www.opera.com/mail/

Boudewijn Dijkstra 19 years ago

And that somebody has to do it right only once. And then everybody can taste the fruit. What's wrong with that?

Made with Opera's revolutionary e-mail program: http://www.opera.com/mail/

speedplane 19 years ago

Wow Vladmir... that's a pretty bold statement. Maybe if you're writing software for something trivial all you need is an LED switch, but if it's anything more than the most trivial project you're wasting your time looking at LEDs. When you have dozens of different processes communicating different messages over different buses with different protocols, toggling an LED just won't do. Take a look at Green Hills Software's PathAnalyzer. It takes WindView to a higher level. Instead of displaying just system events and context switches, it can display every function executed. You can clearly see when a task comes in and executes unsafe code.

speedplane 19 years ago

There are three ways threads/applications can share data: messages (as you mention), shared memory (which you forbid), and transactions (which only really exist in databases). Both messages and shared memory can easily suffer from deadlocks and race conditions, although shared memory makes it easier to screw up. Transactions are the way of the future; unfortunately, they are the distant future.

CBFalconer 19 years ago

... snip ...

In other words you are withdrawing your earlier inaccurate statements, I presume. It would be simpler to just admit it. I will admit a possibility of language difficulties.

cbfalconer at maineline dot net

Grant Edwards 19 years ago

There's nothing wrong with that. It's actually rather a good idea to use tested and proven libraries to do resource sharing.

Just don't pretend it's not resource sharing.

Grant Edwards, grante at visi.com
Yow! And then we could sit on the hoods of cars at stop lights!

42Bastian Schick 19 years ago

Sure, there are always resources shared between tasks, at least the CPU and its registers.

The PI (priority inversion) problem often arises if a resource (memory, peripheral) can be controlled by more than one task (by way of a mutex/semaphore).

Using a dedicated task to control the resource minimizes the risk of a PI because the resource itself has some kind of memory.

This could all be seen as a kind of priority inheritance, with the advantage that the resource itself, rather than the OS, decides.

I am new to RTOS design (only 8 years), and in all this time I have worked with direct message-passing OSes and have never heard of a PI problem from any customer.

42Bastian Do not email to bastian42@yahoo.com, it's a spam-only account :-) Use @monlynx.de instead !

Boudewijn Dijkstra 19 years ago

Everybody has language difficulties. And I thought I had already admitted to having misused a word to mean something it does not mean. But I could have been more explicit.

Made with Opera's revolutionary e-mail program: http://www.opera.com/mail/

Ed Prochak 19 years ago

Shared memory may be a special case in that there may be some different understandings of what we all mean. In many cases however I disagree that it will

The variations:

1) A common memory pool, such as the heap used in C programming (via malloc() calls).
2) A common memory area at a fixed address, used by multiple applications:
   a) with a simple interface (e.g. a circular buffer between 1 producer and 1 consumer)
   b) with a complex interface (e.g. a buffer between multiple producers and consumers)
3) A memory segment shared between different CPUs (dual-ported RAM or addressable RAM on a common backplane bus).

Taking each in turn:

1) Since there is a common monitor routine (i.e., malloc), there are no deadlocks and race conditions. They are handled in malloc().
2.a) There are simple algorithms for circular buffers. This is easy to do.
2.b) Here, you need to define a protocol. Yes, that may be subject to failures if not designed carefully. I think this is the case you are thinking about.
3) This is similar to, and a little more complex than, case 2.b, and may even require hardware assistance. But it just needs careful design.
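As an illustration of case 2.a, here is a minimal sketch of a lock-free circular buffer for exactly one producer and one consumer. It is not code from anyone in this thread. The names (ring_buffer, rb_init, rb_put, rb_get, RB_CAPACITY) are hypothetical, and it assumes a toolchain with C11 <stdatomic.h>, which not every embedded compiler provides. The idea is that each index has a single writer, so no mutex is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_CAPACITY 8u  /* hypothetical size; must be a power of two */
static_assert((RB_CAPACITY & (RB_CAPACITY - 1u)) == 0u,
              "RB_CAPACITY must be a power of two");

typedef struct {
    uint8_t       data[RB_CAPACITY];
    atomic_size_t head;  /* free-running count, written only by the producer */
    atomic_size_t tail;  /* free-running count, written only by the consumer */
} ring_buffer;

void rb_init(ring_buffer *rb)
{
    atomic_init(&rb->head, 0);
    atomic_init(&rb->tail, 0);
}

/* Producer side only. Returns false if the buffer is full. */
bool rb_put(ring_buffer *rb, uint8_t byte)
{
    size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    if (head - tail == RB_CAPACITY)
        return false;
    rb->data[head & (RB_CAPACITY - 1u)] = byte;
    /* Publish the byte before the consumer can see the new head. */
    atomic_store_explicit(&rb->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side only. Returns false if the buffer is empty. */
bool rb_get(ring_buffer *rb, uint8_t *out)
{
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = rb->data[tail & (RB_CAPACITY - 1u)];
    /* Release the slot only after the byte has been read out. */
    atomic_store_explicit(&rb->tail, tail + 1u, memory_order_release);
    return true;
}
```

The sketch stops being safe as soon as a second producer or a second consumer appears, which is the step from case 2.a to case 2.b.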

Message passing has no need to "suffer from deadlocks and race conditions" in a well designed interface. Message passing can be centralized (most OSs support some method) so the applications do not deal with any locking. Then, above the message passing, you just build an application-specific protocol. (For example, do all messages sent to a given application get a reply sent back?) Such message passing systems are little different from using IP (Internet Protocol).

So where are your future transactional interfaces? In the OS or the application layer? Who builds them? Presumably they are more complex than a common memory library, so what makes them more reliable?

Are they more reliable just because you are not involved in using or implementing them? Sorry, I could not resist a little poke at you. 8^)

Ed Prochak

speedplane 19 years ago

Message passing protocols can very easily have deadlocks and race conditions. I have to deal with them all the time. One simple type of message is sending a signal to another thread/application. If the protocol is complex and the threads/apps are sending signals to each other all the time, then you have to be prepared to handle many different types of ordering between the two threads.

> Passing messages have no need to "suffer from deadlocks and race conditions" in a well designed interface

I think the key to your statement is "well designed interface". Designing a good interface for a complex system is by no means easy. In fact it may be one of the hardest parts of the entire project.

Transactions are commonly used in databases to do processing on the database atomically. There are a couple steps to a transaction in a database:

1) The hardware and/or software creates a small copy of the database which it is going to modify.
2) It then does all of the processing on that mini-database.
3) Once it is done, it checks whether any other transaction modified the memory in that mini-database.
4.a) If no one modified the original memory, it commits its work and the transaction completes.
4.b) If another transaction modified that memory, it discards its work and, depending on the implementation, either goes back to 1 or returns some error.
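The steps above can be sketched, in a very reduced form, as an optimistic update of a single shared word using a C11 compare-and-swap. This is only an illustration under stated assumptions, not transactional memory as the poster describes it: the names (update_fn, optimistic_update, add_ten, add_ten_with_conflict, account) are hypothetical, the "database" is one int, and the check in step 3 compares values, so a change that is later undone would go unnoticed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical callback: computes a new value from a private snapshot. */
typedef int (*update_fn)(int snapshot);

/* Returns the number of attempts needed to commit, or 0 if it gave up. */
int optimistic_update(atomic_int *shared, update_fn body, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        int snapshot = atomic_load(shared);   /* step 1: take a private copy */
        int result   = body(snapshot);        /* step 2: work on the copy    */
        /* steps 3 and 4.a: commit only if the shared value is unchanged */
        if (atomic_compare_exchange_strong(shared, &snapshot, result))
            return attempt;
        /* step 4.b: discard the result and go back to step 1 */
    }
    return 0;  /* step 4.b, error variant: report failure to the caller */
}

/* Demonstration data, all hypothetical. */
static atomic_int account;
static int interfering_calls;

int add_ten(int snapshot) { return snapshot + 10; }

/* Simulates another transaction committing in the middle of the first
 * attempt, which forces exactly one rollback and retry. */
int add_ten_with_conflict(int snapshot)
{
    if (interfering_calls++ == 0)
        atomic_fetch_add(&account, 1);
    return snapshot + 10;
}
```

With add_ten the update commits on the first attempt. With add_ten_with_conflict the first attempt is discarded and the second commits on top of the interfering change, so neither update is lost.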

The main advantage of transactions is that they are really easy to program and it is virtually impossible to deadlock them or cause race conditions. The main disadvantage is that they are very hard to implement in non-database systems and will probably require specialized hardware.

As I said, this is still pretty far off. But the work is pretty interesting and may one day replace threads/mutex/semaphores.

For further reading, there is a Java-like language that supports transactions (the link was not preserved in this archive).

keinanen 19 years ago

As long as you have a clear view of who owns a particular piece of data (i.e. who is allowed to modify it), there should not be a great problem.

This was a major issue with FORTRAN and COBOL and with some stupid C-library functions, but in general, in most stack based languages, this should not be an issue, unless you really do some really stupid things.

With any sensible division of labour between tasks, the priority setting should be obvious.

Absolutely.

If you know that there are going to be problems debugging your program, you should really spend more time in the design phase.

Even then, you should plan how to debug the system, e.g. by placing some critical information into a static location so that any debugger can find it.
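One way to act on this advice, sketched here under assumptions, is a small event trace kept in a file-scope array, so that it has a fixed symbol in the map file and any debugger or memory dump can read it after the fact. The names (trace_entry, g_trace, g_trace_next, trace_log, TRACE_DEPTH) and the field layout are hypothetical, and the timestamp is assumed to come from some free-running timer the caller reads. The slot counter uses C11 atomics; on a toolchain without them, the usual substitute is to disable interrupts around the slot reservation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TRACE_DEPTH 16u  /* hypothetical depth; must be a power of two */
static_assert((TRACE_DEPTH & (TRACE_DEPTH - 1u)) == 0u,
              "TRACE_DEPTH must be a power of two");

typedef struct {
    uint32_t timestamp;  /* e.g. a free-running timer value (assumption) */
    uint16_t task_id;    /* application-defined task number */
    uint16_t event;      /* application-defined event code */
} trace_entry;

/* File scope and fixed size: the linker gives these a fixed address. */
volatile trace_entry g_trace[TRACE_DEPTH];
atomic_uint          g_trace_next;  /* total number of events logged so far */

/* Cheap enough to call at task switches, semaphore takes, ISR entry, etc. */
void trace_log(uint32_t timestamp, uint16_t task_id, uint16_t event)
{
    /* Reserve a slot atomically so concurrent callers never share one;
     * the oldest entry is overwritten once the buffer has wrapped. */
    unsigned slot = atomic_fetch_add(&g_trace_next, 1u) & (TRACE_DEPTH - 1u);
    g_trace[slot].timestamp = timestamp;
    g_trace[slot].task_id   = task_id;
    g_trace[slot].event     = event;
}
```

After a hang, g_trace_next modulo TRACE_DEPTH tells you where the oldest entry is, and reading the array in order from there gives the last TRACE_DEPTH events leading up to the problem.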

BTW, the first time I heard about "priority inversion" was with the Mars Rover story. However, I had been working with realtime multitasking systems for more than a decade without that kind of problem.

Paul

Boudewijn Dijkstra 19 years ago

Are you talking about synchronous (e.g. Unix signals) or asynchronous (queued) messages?

And it should be a hard part, as a lot of other things become easy when the interface is done right.

This description of transactions looks a lot like lock-free data exchange. This also involves read-copy-update, but no rollback.

Made with Opera's revolutionary e-mail program: http://www.opera.com/mail/

Colin Paul Gloster 19 years ago

In news:snipped-for-privacy@k79g2000hse.googlegroups.com, timestamped Fri, 06 Jul 2007 13:33:57 -0700, speedplane posted (in reply to Ed Prochak):

Do I misunderstand how transactions differ from cache coherence protocols for distributed systems without databases?

"[..]

The main advantage to transactions is that they are really easy to program and its virtually impossible to deadlock them or cause race conditions. [..]

[..]"

What is 4.b) described as?

Regards, Colin Paul Gloster

speedplane 19 years ago

Indeed transactions and some types of cache coherence protocols start to look very similar. The main difference is in the step 4.b. That step is typically called "rollback". The main difference between these systems and current threading systems is that this transaction interface is exposed to the programmer. They can use these transaction primitives to parallelize their code. With cache-coherence everything is hidden from the programmer which makes it very hard for the hardware developer to find ways to parallelize their code.

Ed Prochak 19 years ago

Well I certainly was thinking queued messages.

And the issue of being "done well" is to use the message passing features of the OS. Even small microkernels usually have that feature. Why reinvent such fundamental functionality?

Copy the whole DB? In hardware? What CPU are you using?

So how did the database suddenly become smaller?

And how does it do that without some locking mechanism?

The other solution is to commit if the user commands it. If there is another transaction, it is working on its own copy of the data, so there is no reason not to write out the changes.

What you describe is one solution to transactions and deadlocking. Oracle, for example, calls it multiversioning. (Well, to be fair, theirs copies the unchanged data, allows the writes to happen, and discards the copy of the old data on commit.)

You silly boy. Even the holy Oracle DBMS can get these errors. Race conditions are easy:

task A: update tableX set xcounter=xcounter + 1;
task B: update tableX set xcounter=xcounter + 1;
task A: commit;
task B: commit;

Is the net effect an increase of 2 or 1?
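The question can be made concrete with a toy model in C. This sketch models only the copy-then-write scheme discussed in this thread, with each transaction working on its own private copy; it makes no claim about how Oracle or any other real DBMS resolves the two updates. All names (txn, txn_begin, txn_increment, txn_commit_blind, txn_commit_checked) are hypothetical, and the interleaving is written out by hand rather than produced by real tasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy transaction over a single integer "row". */
typedef struct {
    int snapshot;  /* value seen when the transaction began */
    int pending;   /* private working copy */
} txn;

void txn_begin(txn *t, const int *row)
{
    assert(t != NULL && row != NULL);
    t->snapshot = *row;
    t->pending  = *row;
}

void txn_increment(txn *t)
{
    t->pending += 1;
}

/* Commit without checking for intervening writers: last writer wins. */
void txn_commit_blind(const txn *t, int *row)
{
    *row = t->pending;
}

/* Commit only if the row still holds the value seen at txn_begin;
 * otherwise refuse, and the caller must begin again (a rollback). */
bool txn_commit_checked(const txn *t, int *row)
{
    if (*row != t->snapshot)
        return false;
    *row = t->pending;
    return true;
}
```

If A and B both begin, both increment, and both commit blindly, the row ends up increased by 1: B overwrites A's result with a value computed from the same old snapshot. With the checked commit, B's commit is refused, and after B begins again and retries the net increase is 2.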

Deadlocks are harder, but still possible, in such a transaction design.

I do not know what special hardware would be needed. Oracle and other DBMS products using this approach seem to run just fine on current hardware. What opcode would you want to support this, speedplane?

speedplane 19 years ago

Jeez... I hate these newsgroups. Not only is everyone condescending, they're also wrong. I'm not going to explain this any further. If you are interested in transactional programming please read the literature:

Here's some info on a possible hardware implementation. It goes into detail about the copy stage (part 2) and the checking stage (part 3). A quick answer to part 3 is that they use a broadcasting system.

L. Hammond, V. Wong, M. Chen, B. Carlstrom, J. Davis, B. Hertzberg, M. Prabhu, H. Wijaya, C. Kozyrakis, and K. Olukotun. "Transactional Memory Coherence and Consistency." International Symposium on Computer Architecture (ISCA), 2004.

If you're more interested in how this affects performance and programmability, read about the ATOMOS transactional programming language.

B. Carlstrom, A. McDonald, H. Chafi, J. Chung, C. Minh, C. Kozyrakis, and K. Olukotun. "The ATOMOS Transactional Programming Language." Proceedings of the Conference on Programming Language Design and Implementation (PLDI), June 2006.

Those two papers will explain the basic concepts of transactional programming, and you'll soon realize that it is different from Oracle database transactions. Indeed, special hardware is not needed, but transactional programming may make designing massively parallel processors more feasible.

CBFalconer 19 years ago

... snip ...

Maybe you should consider improving your descriptions. The above collision can cause a fault, and meets your earlier specified criteria.

cbfalconer at maineline dot net
