Arm development systems

My comment was directed at Don wanting to use three threads to move data from point A to point B

-Lasse

Reply to
langwadt

Real blocking, i.e. with no timeout, is a recipe for a flaky product. So one has to maintain state and keep general track of stuff anyway.

Once you become sufficiently paranoid about deadlock and timing holes, the main remaining difficulty with multithread is debugging it. (I've been writing multithreaded apps since 1992, and got my first SMP machine in about 1996, an early IBM Intellistation with dual Pentium Pros.)

If you have a good debugger, one that can at least bring up a source window for each thread at each breakpoint, you can find stuff pretty readily. Otherwise it's just iterated code reading and printf(). I've never used multithread in embedded code, though--this is actually my first foray into MCUs bigger than a PIC or an ATMega.

In embedded code for those little MCUs, I usually put timing-sensitive stuff in a timer ISR and do the housekeeping loop thing in main(). That way I'm always within a cycle or two of having deterministic timing. With the ARM, I haven't figured out which method is best yet--I'm usually doing finely-interleaved control and data acq, so DMA isn't very suitable.
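
Schematically, the split is something like this (just a sketch -- the vector hookup and the helper names are made up and target-specific):

    /* Timing-critical work in the timer ISR, everything else in main(). */
    volatile unsigned int tick;          /* free-running time base        */
    volatile unsigned char sample_due;   /* "take a reading now" flag     */

    extern void init_hw(void), take_reading(void);
    extern void poll_uart(void), poll_keys(void);   /* placeholders       */

    void timer_isr(void)                 /* attach to the timer vector    */
    {
        tick++;
        sample_due = 1;                  /* only the deterministic part   */
    }

    int main(void)
    {
        init_hw();
        for (;;) {                       /* housekeeping loop             */
            if (sample_due) {
                sample_due = 0;
                take_reading();          /* paced by the ISR              */
            }
            poll_uart();                 /* non-critical chores           */
            poll_keys();
        }
    }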

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

No. I was saying I never came across a C compiler where the 'char' type is the same as an 'unsigned char' type.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

Even different gcc ports have different defaults, e.g. iirc the avr and ppc versions are unsigned by default. I always add the appropriate compiler flag to the makefile--nail the problem to the floor and move on.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

On a sunny day (Thu, 09 Feb 2012 13:28:51 GMT) it happened snipped-for-privacy@puntnl.niks (Nico Coesel) wrote in :

OK, that I can understand. Thanks.

Reply to
Jan Panteltje

Have you seen this one?

Looks interesting for small embedded projects.

It even might be self-supporting.

regards, Gerhard

Reply to
Gerhard Hoffmann

Looks like fun, but it's never going to fit a handheld form factor or run on two AA batteries. I'm trying to fit almost-almost-everything inside the processor, so it'll be a 100-pin LQFP on a really small board, probably with the analog stuff on the back and an aluminized paper shield.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

Think about what you've said: blocking WITH a timeout is OK. There are two ways of doing this: one is to spin on a timer while effecting a non-blocking call; the other is to use a blocking call and let SOMETHING ELSE (i.e., the OS) effectively spin on that timer, RELEASING your block when the timer expires.

In the first case (Nico's case as well as the example in the PDF I posted), the task assumes responsibility for deciding when to give up on the spin-wait. Presumably, it can consult a timer maintained by "someone" (including itself) to determine when enough time has been burned waiting.

In the second case, you have a mechanism to cleanly release the blocking state. As such, a CONTROL THREAD (thread1 in my example) can just as easily release that block (in addition to a timeout *or* in PLACE of a timeout).

This allows an *algorithm* to determine when a blocking thread is released. Time, user killing task, phase of moon, etc. And, *how* that algorithm goes about this is nicely documented by the API. Not ad hoc measures ("Let's set a flag and have the spinning task examine the flag to see if it should stop spinning" "No, why don't we just ZERO the timer that the task is already waiting on and get the 'release' FOR FREE?" "What if we just disable the task and restart it?" etc.)

OS's are all about moving commonly used mechanisms into a formal structure that can be exploited by tasks. Forcing EACH task to assume the details of these mechanisms themselves (e.g., tracking time, *implementing* timeouts, etc.) makes extra work for them and represents opportunities for error to creep in.
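
To make the contrast concrete (a sketch only -- event_wait() and friends are hypothetical MTOS calls, not any particular product's API):

    #define LIMIT_TICKS 30                  /* arbitrary timeout             */
    #define EVT_OK 0

    extern unsigned long read_timer(void);  /* free-running tick counter     */
    extern int data_ready(void), read_data(void);
    extern int event_wait(void *event, unsigned long ticks); /* hypothetical
                                               MTOS call: EVT_OK, a timeout
                                               code, or "released by a
                                               control thread"               */
    extern void *data_event;

    /* (a) Hand-rolled spin: the task owns both policy and mechanism. */
    int get_data_polled(void)
    {
        unsigned long start = read_timer();
        while (!data_ready()) {
            if (read_timer() - start > LIMIT_TICKS)
                return -1;                  /* timeout logic lives HERE...   */
            /* ...along with the power-fail checks, abort flags, etc.,
               repeated in every loop like this one.                         */
        }
        return read_data();
    }

    /* (b) OS-provided blocking with timeout: the task states only policy. */
    int get_data_blocking(void)
    {
        if (event_wait(data_event, LIMIT_TICKS) != EVT_OK)
            return -1;                      /* timed out OR released early
                                               by a control thread           */
        return read_data();
    }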

You write *any* program with debugging in mind. E.g., to debug the file copy example I posed, I would debug each thread independently in a "friendly environment" (a PC, ICE, mainframe, etc.). The read and write can be redirected to real files -- or the "console". The role of the missing worker thread (consumer or producer) can be faked -- with something that appears to remove or insert data from/to the buffer. Etc.

Implement black boxes ("flight recorders") and push status into them with an "#if DEBUG" enabled macro (this allows them to be removed later, if you don't want to ship the product with that). Being internal mechanisms (i.e., no costly I/O), they have minimal impact on performance. And, they can easily be sized to cover more or less event-time.
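
A bare-bones version of such a black box (sketch; sizes, names and event codes are arbitrary):

    #if DEBUG
    #define LOG_SIZE 256                         /* entries; power of two   */
    struct log_entry { unsigned short event; unsigned short value; };
    static struct log_entry black_box[LOG_SIZE];
    static volatile unsigned log_head;           /* wraps; newest 256 kept  */

    #define LOG(e,v) do {                                        \
            black_box[log_head & (LOG_SIZE - 1)].event = (e);    \
            black_box[log_head & (LOG_SIZE - 1)].value = (v);    \
            log_head++;                                          \
        } while (0)
    #else
    #define LOG(e,v) do { } while (0)            /* compiles away to ship   */
    #endif

    /* usage:  LOG(EV_TASK_START, task_id);  -- the EV_* codes are yours    */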

I have a set of debug macros that allow me to switch between this sort of internal black box *or* spit "progress reports" out a virtual serial port (when using an emulator, I arrange for that serial port to display color-coded messages on the console):

  task1:  started
  task5:  waiting on file
  task3:  terminating
  task1:  allocated 4332 byte buffer at 0x3008
  task1:  spawning slave task 'slave1'
  slave1: started
  task2:  got timer event
  task5:  wrote 238 bytes

So, as I bring tasks on-line, I can see that they appear to work ("Hey, where's task6? Did I forget to start it?"). Then, as I get confidence with them, I can just turn off their individual progress reporting, leaving only the tasks of interest to me.

With the multitasking framework, I can have a task that actually snapshots black boxes and passes their contents to me *while* the system is running. "It's just another task. No big deal."

The problem with that approach is things tend to creep into the ISRs until the temporal size of the ISR becomes an "issue". I like ISR's to be "slicker'n snot" -- do the absolute minimum that *could* be done (which is often different from the minimum that your implementation NEEDS to be done) and get the hell out of there.

E.g., one of the nastiest tasks to write is a barcode decoder that processes "live video" (i.e., a pulse train). You can't ignore it (unless you have a user interface that only allows barcodes to be scanned at certain times: push this button, then scan barcode) so it's a live input that can steal resources from you at ANY time.

The time between edges (which are the significant events in that video stream) can be VERY short. E.g., a 0.007" nominal bar width scanned at 100 ips (which is within the manual scanning range of your GRANDMOTHER) means events happening at 70us intervals (assuming no ink bleed -- which could make a space narrower).

Not only do you have interrupts at that frequency, but you also have to *process* that data at a comparable average rate -- otherwise data comes in faster than you can process/discard it!

When I process barcodes, ALL the ISR does is take a snapshot of the system timer (to mark "when the edge was noticed") as well as a snapshot of the "how-long-has-this-ISR-been-pending" timer, if the hardware supports it (this lets me LATER offset the recorded "when the edge was noticed" timestamp to determine "when the edge OCCURRED"). Then, set the ISR to look for the *opposite* edge and you're done.

I don't waste clock cycles computing the "time since last edge" (which is what the algorithm will ultimately want) because some background task can do that OUTSIDE the interrupt context -- as long as I provide it with the timestamps on which to operate.
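
In code, the edge ISR reduces to something like this (sketch; the timer reads and the FIFO are placeholders for whatever your hardware/firmware provides):

    extern unsigned read_free_timer(void);       /* free-running timestamp   */
    extern unsigned read_irq_pending_time(void); /* "how long has this IRQ
                                                    been pending", if the
                                                    hardware supports it     */
    extern void fifo_put(unsigned stamp);        /* timestamp FIFO           */
    extern void select_opposite_edge(void);

    void edge_isr(void)
    {
        unsigned noticed  = read_free_timer();
        unsigned occurred = noticed - read_irq_pending_time();
        fifo_put(occurred);                      /* no arithmetic beyond this */
        select_opposite_edge();                  /* now watch the other edge  */
    }

    /* A background task later computes  width = stamp[n] - stamp[n-1]
       and classifies bars/spaces -- outside the interrupt context.          */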

In the late 70's, I designed using the "foreground-background" style. It was *always* hard adding new features (when you think about user interfaces and multiple concurrent "machine/mechanism" activities). Trying to get timely responses from multiple *competing* activities/requirements leads you to more brittle solutions. To "hard" RT instead of a more robust "soft" RT approach. (i.e., "If I *don't* get this done in time, the product is broken. I need a faster CPU. I need another ISR." etc.)

No doubt your algorithms are driven off a clock. ISR trips, you do what needs to be done *then*, that *instant*, and then hope the background figures out what you need for "next time".

If the background can't RELIABLY keep up (because you underestimated the resources required for some other aspect of the product that is operating concurrently), you move more of that background activity into the ISR. This leaves less time for the background... which means less gets done, increasing the tendency to NEED to move more into the ISR, etc.

[In the barcode example, a user could INTENTIONALLY run a barcode label across the scanner at very high rates of speed *continuously* -- like shaking a can of paint. The system would grind to a halt as resources were shifted to recording and processing those edges. When the user's arm got tired (it takes very little time to tire when you are doing that sort of thing), the system just picked up where it left off.]

I worked on a LORAN-C position plotter in ~1980. Because of resource limitations (back then, 2716's were $50/each... having *6* of them -- a whole 12KB! -- in one product was extravagant!), you couldn't come up with sophisticated control solutions. (IIRC, we had 2x128 bytes of RAM and used much of that for the floating point computations required to project the hyperbolic grid geometry onto an oblate Earth model to convert to lat-lon.)

So, things ended up in ISRs for all of the above reasons -- you couldn't control the X&Y stepping motors for the pen recorder from a background task at independent, fast enough rates. So, you moved the decision making (reduced to Bresenham's) INTO the ISR that actually steps the motors. You don't want the motors to drive the mechanism into the limits, so you have that ISR watch the end-of-travel sensors. And, since those could be faulty (salt water on a ship's bridge), you have to track where you *think* the mechanism might be ("Gee, the mechanism is only 20,000 steps wide and I've already moved 25,000 steps without seeing the other side! Am I just grinding gears? Has a belt broken?").
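
The per-step decision is just Bresenham's error accumulator plus those sanity checks -- roughly this, assuming dx >= dy and the move parameters loaded by the background before the move starts (names are made up):

    static long dx, dy, err, steps_taken;   /* (re)loaded before each move   */

    extern int  limit_switch_hit(void);
    extern void fault_stop(void), step_x(void), step_y(void);

    void step_isr(void)                     /* one motor step per interrupt  */
    {
        if (limit_switch_hit() || steps_taken > dx + 100) {
            fault_stop();                   /* ~100-step slop, then assume
                                               grinding gears / broken belt  */
            return;
        }
        step_x();
        err += dy;
        if (2 * err >= dx) {                /* time for the diagonal step    */
            err -= dx;
            step_y();
        }
        steps_taken++;
    }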

You can't decode keystrokes in the background (too sluggish), so you have the keypad scanning ISR do the key decoding as well. While you're scanning keys, you might as well do the display multiplexing. And, of course, you have to periodically obtain the current LORAN coordinates so you can figure out where you *are* (usually done every 10 GRI's). Etc. [Recall, we're talking about a 2 or 3MHz CPU, i.e., memory cycle times of about a microsecond :> ]

It just doesn't make sense to saddle yourself with imposing and maintaining all of this "mechanism" at a time when resources are so plentiful (in all but HUGE volume products -- mice, keyboards, etc.)

If your control decisions are tied to your *immediate* observations, you have to ensure that you can process the data in time to make that decision. (Or, can have a "safe", default decision that works "well enough" -- less than ideal control)

When designing control loops, I try to decouple the sampling process from the control/actuate process. If they must run in lock step, I arrange for the sampling function (DMA, ISR, etc.) to update the control -- but N sample periods "later". (i.e., read sense4 and output control3; process sense4 to compute control4; queue control4 to wait for ISR; read sense5 and output control4; etc.). I.e., it looks like a fixed, predictable lag in the control.

If I can't guarantee that I will have enough time in EVERY "sample interval" to compute the next control value, (to accommodate momentary overload by something EVEN MORE IMPORTANT), then I might increase the fixed delay to *two* sample periods with the expectation that I might not be able to get the first control value computed in the first interval... BUT, I will be able to get it AND the next computed in the interval that follows!

(of course, this all depends on the process being controlled, field characteristics, etc.)
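
In outline, the one-sample pipeline looks like this (sketch; the sense/control plumbing is hypothetical):

    volatile int  sense_latest;          /* written by the ISR                 */
    volatile int  control_pending;       /* computed last period, output now   */
    volatile char sample_flag;

    extern int  read_adc(void);
    extern void write_dac(int value);
    extern int  compute_control(int sense);

    void sample_isr(void)                /* fixed-rate sample/actuate point    */
    {
        write_dac(control_pending);      /* control[n-1] goes out at tick n    */
        sense_latest = read_adc();       /* sense[n] captured with no jitter   */
        sample_flag  = 1;
    }

    void control_task(void)              /* has a whole sample period (or two,
                                            if you widen the pipeline) to run  */
    {
        if (!sample_flag) return;
        sample_flag = 0;
        control_pending = compute_control(sense_latest);
    }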

Anyway...

You can use an ISR to drive those "events": "time to make the donuts... NOW!!!". *Or*, you can have an event driven preemptive multitasking system that virtualizes that ISR. (many MTOS's claim to be preemptive but are really only pseudo-preemptive)

I.e., the timer ISR can signal a timer event. This can "make ready" all tasks waiting on that event. *You* decide which of these is "most important" (priority value) and the OS activates it. The time (latency) from ISR to task activation is fixed (if you design the control structures correctly), regardless of how many tasks are waiting.

Whether you update the control variable in that task *or* move the actual: output(control_value); into an ISR and use the timer event to restart the "compute next control value" task is a question of how much jitter you can tolerate in that "setting" time. (i.e., if you can tolerate *none*, drive it with DMA! That's how I like to drive high performance stepping motor designs where you need to impose an acceleration profile)

I am convinced that this sort of approach GREATLY simplifies design and maintenance. I suspect I could write the LORAN plotter application, from scratch, in far less time without changing the algorithms (assuming I had the extra RAM to support the control structures) just by NOT having to manually juggle so many issues at the same time. ("We need to mark an X on the chart at this point. Tell the motors to stop tracking the vessel's motion. Lift the pen. Move over a little. Drop the pen. Move diagonally. Lift. Move. Drop. Move. Lift. Go back to where we were. Let the motors resume tracking the vessel's motion. -- ALL in the ISR!") Every new feature was a new challenge. :<

(In its defense, it was the state of the art, at the time)

Reply to
Don Y

You don't use spinlocks for individual tasks on a uniprocessor, at least I don't. (A state-machine design with one big overall loop doing housekeeping is the ticket.)

When there's an operating system available, you use blocking calls with timeout parameters, e.g. select(). When there isn't, you poll once, set the relevant interrupt enable, and go back to the housekeeping routine. In the ISR, you set a flag to say that, for instance, there's a character available in the UART register. Tasks in the housekeeping loop check to see if there's a flag set that gives them a job to do, and if not, return immediately, so that loop spins fast.

Or the main loop can consist of a function--basically a thread scheduler on a diet--that looks at the flags, manages the state transitions, and calls the relevant service functions. When you have to go that far, a lightweight RTOS starts looking pretty good, though I've never used one myself.
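
That "scheduler on a diet" can be as small as this (sketch; the flags and service functions are invented):

    extern volatile unsigned char rx_line_ready, tick_10ms, adc_done;
    extern void parse_command(void), update_leds(void), process_sample(void);

    struct job {
        volatile unsigned char *flag;    /* set by an ISR (or another job)     */
        void (*service)(void);
    };

    static struct job jobs[] = {
        { &rx_line_ready, parse_command  },
        { &tick_10ms,     update_leds    },
        { &adc_done,      process_sample },
    };

    void dispatch(void)                  /* called from the main loop          */
    {
        unsigned i;
        for (i = 0; i < sizeof jobs / sizeof jobs[0]; i++)
            if (*jobs[i].flag) {
                *jobs[i].flag = 0;
                jobs[i].service();       /* runs to completion; no stacks saved */
            }
    }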

If there's a way of making the call return unsuccessfully, that's functionally the same. You have to have error recovery someplace, and that requires saving state.

Anyone who's worried about ad hoc stuff should swear off embedded programming. ;)

Sure. At least above some threshold of complexity. For my stuff, the reason I'm using a MCU at all is that I need to interleave control and data acq, which means maintaining control of the interrupts to attain timing coherence. That's hard with a RTOS, no?

Good luck debugging clusterized simulators thread-by-thread. My parallel FDTD code would execute half a time step and then croak. If you have threads doing essentially unrelated things, so that you have N programs running concurrently, that can work fine.

For events that you're expecting. Hitting a memory corruption bug isn't that friendly, unless you have a mechanism like mudflap that maintains a copy of the call stack.

Sure. Probably most of us who have done serious multiprocessor work have those.

That can be useful. In my simulator code, each daughter process has a bidirectional socket connection to a supervisor process (which is also doing its own computation). That way if something goes wrong in thread M running on host N, the overall supervisor process on the front-end host gets all the info and can enforce an orderly shutdown of the whole multi-host run.

Yup. For serial, I usually just cram the character into a circular buffer, set a flag, and exit. You have to update the overall state machine, but I like to do that with a wrapper function around all of the housekeeping routines, so there's no code duplication.
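
i.e., roughly this (sketch; the UART read is hardware-specific and the buffer size is arbitrary):

    #define RXBUF_SIZE 64                          /* power of two            */
    static volatile unsigned char rxbuf[RXBUF_SIZE];
    static volatile unsigned char rx_head, rx_tail;
    static volatile unsigned char rx_flag;         /* "something arrived"     */

    extern unsigned char uart_read_data(void);     /* placeholder             */

    void uart_rx_isr(void)
    {
        rxbuf[rx_head++ & (RXBUF_SIZE - 1)] = uart_read_data();
        rx_flag = 1;                               /* housekeeping picks it up */
    }

    int uart_getc(void)                            /* called from the loop    */
    {
        if (rx_tail == rx_head)
            return -1;                             /* buffer empty            */
        return rxbuf[rx_tail++ & (RXBUF_SIZE - 1)];
    }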

Sounds like a good approach.

For something like that, I'd probably want to use multiple processors. Fast unis are harder to manage and make the gizmo more brittle and harder to expand.

Normally the background produces a list of things for the ISR to do, so when it does one, it just increments a pointer and then exits.

Eventually your approach will run out of resources too. Threads are not a silver bullet, especially when you have to contend with a brain-dead thread scheduler like Linux's. (*)

Okay, but that isn't an argument in favour of multiple threads, just lean ISRs.

Must have been a _slow_ housekeeping loop. Mine typically loop at kilohertz rates, sometimes faster. Of course I'm not using a 1702.

If I had all that foofaraw to worry about, I might well use an RTOS too. But that's not enough to justify your blanket trashing of all housekeeping loops.

Normally not, but sometimes I may need to wait for a polygon edge to cross a laser beam, or deal with an explosion of junk data caused by a beam glinting off a bolt head, or that kind of stuff. Usually a moderate-size circular buffer is fine.

That's where a separate small processor can make a lot of sense.

As I said, I think everyone agrees that there's some threshold above which having an OS makes sense. I just don't agree that the threshold is as low as you claim.

Cheers

Phil Hobbs

(*) My simulator runs on both Windows and Linux clusters. It's about 20% faster on Windows because in Linux there's no way to express the fact that one thread is the most important in the process, except to make all threads real-time and therefore bring the machine to its knees. Ergo you can't control the communications latency between hosts, ergo they spend a bunch of time waiting for the next host over to get its act together. I'd be quite happy for all the compute threads to run at a niceness of +19 and the comms threads at +18, but nooooo, Czar Linus doesn't think I can be trusted with such power. Brain dead. (See, I get to rant a bit too.) ;)
--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

Most of our products have a main run-forever loop and a periodic interrupt service routine, and sometimes a serial interrupt. The main loop just does whatever it can, with some things running in response to a period-interrupt flag, and some (like a command parser) when the serial IRQ sets a flag to indicate that a buffer is ready. The main loop can also have a table-driven dispatcher near the end, to run some state machines more or less often than others. IRQs can do that, too.

No program states are saved anywhere because nothing is ever suspended. Everything just blasts through state machines.

It's nice to have a few counters that the periodic IRQ ticks down to zero. The main loop can use these for various handy things.
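
e.g., something like this (names made up):

    #define NTIMERS 4
    volatile unsigned int downcount[NTIMERS];   /* one per pending delay     */

    void periodic_isr(void)                     /* say, 1 kHz                */
    {
        unsigned i;
        for (i = 0; i < NTIMERS; i++)
            if (downcount[i])
                downcount[i]--;                 /* stops at zero             */
    }

    /* main loop usage:
         downcount[0] = 500;                     start a 500 ms delay
         ...
         if (downcount[0] == 0) blink_led();     time's up                   */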

The problem with an RTOS is the task switching time and the need to manage a bunch of stacks. And the very tools available can actually complicate system design.

Even a dinky ARM, like an LPC1754, can run a pretty hairy ISR at 100 KHz.

--

John Larkin, President
Highland Technology, Inc

jlarkin at highlandtechnology dot com
http://www.highlandtechnology.com

Precision electronic instrumentation
Picosecond-resolution Digital Delay and Pulse generators
Custom laser controllers
Photonics and fiberoptic TTL data links
VME thermocouple, LVDT, synchro acquisition and simulation
Reply to
John Larkin

The point I was making addresses Nico's (and the PDF's) "non-blocking" approach. You test and wait -- whether that wait is an OS-provided service or your implicit "return to mainline code" (go do something else while "spinning", and come back and TEST again, later).

*You* are doing the testing, explicitly. And, *you* are deciding when to give up -- by examining an "elapsed time", etc.

This is a common activity. A common *service* that an OS can provide!

So, each timeout requires an ISR? Let an OS virtualize that for you. HUNDREDS of timeouts from a single "service". "Tickle me in 23.5 ms" "Unblock this function call if it hasn't completed in 30ms", etc.

Again, this makes *you* do all the testing. If, instead, you tell an OS, "wake me when this THING (event) happens", then *you* don't have to code for it.

You not only need a mechanism for *reporting* the "error exit", but you also need a mechanism for pulling attention (execution) away from whatever statement(s) is doing the actual block (whether it is waiting for an IRQ, an event flag, etc.)

That's the appeal of working in a structured environment. There are "established ways" of doing these things. You *don't* have to think about "tricks" to coax the implementation to do something for which you recently discovered a need. "Gee, the file copy hangs indefinitely. How do I *abort* it?" or, "Crap! It's an endless STREAM that I've told it to copy. I surely don't want to wait around for THAT!"

Depends on what guarantees you want from the RTOS (note that I've only been talking about MTOS's, so far). Instead of picking hardware resources to drive your algorithms (ISR's, etc.), you pick appropriate VIRTUAL resources for the same thing. The RTOS gives you guarantees -- just like the silicon gives you guarantees -- about how long you will have to wait before your code is activated in a particular set of conditions (e.g., the hardware says IRQ latency is X *once* your IRQ is enabled and of sufficient priority and the executing instruction has completed execution, etc.)

You have to design with decoupling in mind. I.e., the file copy example will work regardless of which thread you choose to "single step". It just works very SLOWLY! :> (like debugging a processor design at DC).

One of the projects I am working on currently has dozens of processors coupled loosely or tightly. Before I started the design (both the hardware and software ends of it), I thought about how I would *economically* be able to debug algorithms where one part of the algorithm was executing on one CPU and another was executing on another CPU both under the control of a *third* CPU, etc.

(It is *dizzying* to sit between three workstations talking to three different processes on three different nodes running three different pieces of code ... and trying to keep track of what's happening, where! But, the code's expectations of timeliness have been crafted so that I can do this without breaking it)

Of course! But the black box gives you a cheap way of seeing how far you got -- and logging values of interest -- without seriously impacting performance (which could alter the correctness of an RT algorithm).

It's the equivalent of attaching a logic analyzer to trace "whatever". But, since you can execute *code* to decide what (and when!) to store, you aren't as constrained as you are with a passive LA approach.

My current OS lets external agents "control" tasks via mechanisms that I (or the task) can pass to those agents. So, I can look inside a running task and watch it execute, kill it, free its resources, etc.

(of course, this is a protected interface)

For (traditional) serial ports, I have a variety of strategies based on the performance level required and resources available to me.

Outgoing data is placed in a shared TxFIFO. That action implicitly (or explicitly) notifies a housekeeping task that is enabled whenever the transmitter is idle. This task primes the transmitter with the "oldest" character enqueued (e.g., I/O need not be character at a time) and enables the TxEmpty interrupt. Once completed, this task dies (it will be reenabled when/if the transmit FIFO ever empties).

Thereafter, each TxIRQ pulls a character from the TxFIFO and places it in the transmitter. So, the transmitter runs at the full data rate. When the TxFIFO empties (i.e., a TxIRQ finds nothing in the buffer), the TxIRQ is disabled and the transmitter housekeeping task is reenabled (so it can watch for an appropriate "write()")

If I have lower performance requirements, the TxIRQ is not used and a background task feeds the transmitter directly (obviously limiting how quickly data can be transmitted).

Hardware handshaking and software flow control complicate this process. In some cases, this is embedded *in* the ISR -- which complicates the ISR and makes things more brittle -- while other cases can move this to a background task (e.g., "Ooops! Buffer is filling up. Let's signal the other end of the link to stop sending!"). Break generation and other synchronous control signalling options present other opportunities for variation (e.g., do you *queue* these as "commands" that the ISR interprets: "generate a long BREAK at this point in the data stream" or stall the ISR at the appropriate point and move that functionality into a background task, etc.)

Reception is handled similarly. ISR receives an incoming character and places it in a shared (but separate) RxFIFO -- along with the status codes (overrun, parity, etc.). The RxIRQ always watches for RxFIFO overrun -- which it reports as yet another "status code" tied to the character that caused it. Pacing is handled in the same variety of ways above.

Any task wanting to *consume* a received character waits on the RxFIFO and removes as many characters as it needs -- blocking if need be (and requested).

I.e., the serial device looks like a genuine "service" that the system manages. (In some designs, there is actually a cdevsw sitting above this access, etc., and a task open()ing such a device can cause other devices wanting it to automatically wait)
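
Stripped to essentials, the transmit side is (sketch; the FIFO, UART and task calls are placeholders for whatever your environment provides, and atomicity of the flag/FIFO handling is elided):

    extern void fifo_put_block(const char *buf, unsigned len);
    extern int  fifo_get(void);                    /* -1 when empty           */
    extern void uart_write_data(int c);
    extern void uart_enable_tx_irq(void), uart_disable_tx_irq(void);
    extern void wake_tx_housekeeper(void), sleep_until_woken(void);

    static volatile unsigned char tx_idle = 1;

    void serial_write(const char *buf, unsigned len)   /* any task            */
    {
        fifo_put_block(buf, len);
        if (tx_idle)
            wake_tx_housekeeper();                 /* matters only if Tx idle */
    }

    void tx_housekeeper(void)                      /* runs while Tx is idle   */
    {
        for (;;) {
            int c = fifo_get();
            if (c < 0) { sleep_until_woken(); continue; }
            tx_idle = 0;
            uart_write_data(c);                    /* prime the transmitter   */
            uart_enable_tx_irq();
            sleep_until_woken();                   /* the ISR owns it now     */
        }
    }

    void tx_isr(void)                              /* TxEmpty interrupt       */
    {
        int c = fifo_get();
        if (c >= 0)
            uart_write_data(c);                    /* full-rate output        */
        else {
            uart_disable_tx_irq();                 /* FIFO drained            */
            tx_idle = 1;
            wake_tx_housekeeper();
        }
    }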

I begged for a second processor on the first such machine. There were subsystems that ran continuously eating up huge parts of the CPU (data acquisition). They seemed ideal for "spinning off". But, we had a philosophy of pushing the processor into overload instead of adding hardware.

In one instance, I had 8 oscillators that I had to sample (count) continuously (moving things into the frequency domain is a favorite trick for trading off low data rate data acquisition). The oscillators ran at a nominal ~100KHz. I had configured 8 *16b* counters to accumulate these counts. And, had to *justify* this over using 8 *8b* counters and "counting the overflows in software".

Remember, this is in the generation of the 2-3MHz 8b CPUs. I think the carry out ISRs would have eaten up ~50% of the CPU by themselves!

(Can you spell "Resource Starved"? :> )

Of course! I'm an advocate of multitasking for COMPLEXITY REDUCTION. Make it easier to get the implementation *right*.

But you extend the same philosophy upwards into the higher levels of the application. I.e., cut the application into little pieces that are easy to "get right" instead of trying to push them all together into a monolithic block of code.

E.g., note the barcode example in the PDF I posted. Several *trivial* (fast, lean, easy to get right!) tasks in addition to the ISR *just* to "decode" a barcode label. And that doesn't assign any *meaning* to it (just makes it available to the application as a "barcode" event)

Yes. Because damn near EVERYTHING was happening in ISR's! If you can't get around to "polling" (housekeeping loop) the keypad at a decent frequency, you won't "see" key closures (membrane keypad, so you can't count on a key being "held down").

I'm not "trashing all housekeeping loops". Rather, I'm praising the benefits of using more structured programming environments. I could grow my own vegetables and raise my own beef. Or, I could go to the grocery and *buy* those things. What do I want to spend my (limited) time doing? How much risk do I want to take on (what happens if the few cattle get sick and die? do I go hungry??)

The executive I outlined in that PDF was used for 100KB+ binaries. WRITTEN IN ASM! Granted, it was more than 25 years ago (the first pass of the article bears a 1987 date and it *followed* the actual use of the technique) but that doesn't mean there weren't better ways of doing things at that time. Folks would argue about whether to use 0315 or 0xCD -- I would ask, "What's wrong with 'Call'?"

(Amusing that one of our products hosted a BASIC interpreter... *under* that crippled multitasking framework! Talk about making life hard for yourself... :< )

I look at it as, "When would you be willing to GIVE UP the MUL opcode? Or JSR/BAL? etc." I.e., they all exist to make life easier. None are *required*. In an industry (software) where everything is so expensive and quality is often dubious, it seems silly NOT to do everything that you can to improve your outcome!

(esp when so much of this can be reused/leveraged for other projects!)

Of course, YMMV! ;-)

Reply to
Don Y

You just crossed the line from fandom to irrationality. Of course not, silly.

Only if you do it your way. The normal approach doesn't need that--you just poll and move on.

But that's a manufactured problem. If you don't wait, you don't have to worry. Just keep going, and the next timer tick or UART interrupt will let you know what to do.

AFAIK it only gives you _maximum_ wait times. For timing coherence, wait times that are too short are just as bad.

How would you do that in the simulator? It works only in a restricted problem domain.

Try a few hundred processors.

Modern MCU development systems do that for you already. There are silicon resources in the ARM for all of that--no coding required.

Sure. But you can just as easily do that in a housekeeping loop, like everybody else does.

Okay, sure, lots of words describing what everybody does in one way or another.

I think that's a dumb philosophy. CPU power is dirt cheap, programming resources and (especially) field failures aren't.

I used to sum linear and logarithmic detectors so I could get more dynamic range out of a 10-bit ADC. Back in the day you had to do things like that, but we're talking about 2012.

That just hides the interactions and makes them harder to figure out when they occur. A housekeeping loop that's too slow is blatantly obvious, but latent race conditions that only show up when the system is stressed are not obvious. Obvious is good. Obvious is worth a lot. (Did I mention that obvious problems are better than non-obvious ones?)

Your fanboy style is unbalanced enough that you could have fooled me.

Right, but again, this is 2012.

That's up to you, of course. To me, it looks like the case of the guy that only had a hammer.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

At what cost? The guru-level programmers I know avoid multi-threading like the plague. There are too many pitfalls, especially when novice programmers need to be able to work on the code as well. At some point you will want to move on in the company and not be held back by ancient projects only you understand.

Like I said before, an RTOS can be useful if you have tasks which take very long to complete and need to be interrupted to process other tasks. One can still argue whether the long tasks could or should be divided into shorter tasks so they can be serialized.

In a lot of cases (signal processing for example) there are only two tasks: signal processing and I/O handling (reading switches, user input). In such cases I run the signal processing task entirely inside an interrupt and the rest from main. This basically gives me two independent threads without any OS overhead.
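
Roughly (sketch, names invented):

    volatile short last_output;          /* shared between the two "threads"  */
    volatile unsigned char new_block;

    extern short read_adc(void);
    extern short run_filter(short x);    /* all the DSP lives here            */
    extern void  write_dac(short y);
    extern void  init_hw(void), scan_switches(void), update_display(short y);

    void adc_isr(void)                   /* fires at the sample rate          */
    {
        short x = read_adc();
        last_output = run_filter(x);     /* signal processing "thread"        */
        write_dac(last_output);
        new_block = 1;
    }

    int main(void)
    {
        init_hw();
        for (;;) {                       /* I/O handling "thread"             */
            scan_switches();
            if (new_block) {
                new_block = 0;
                update_display(last_output);
            }
        }
    }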

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

That might be a bit strong, at least on PCs. Nobody wants to go back to the pre-WinNT days of single-threaded GUIs, when a badly behaved user interface could effectively freeze up the machine. (Are you listening, Mozilla???)

And there are lots of other reasons for multithreading. For instance, I have a serial comms class that I've used for 15 years, that has a high-priority thread encapsulated inside it to handle the hardware interaction and make it look like a UART with an infinitely deep buffer. (Doesn't work in Linux, of course, on account of that stupid thread scheduler.) And in the multicore world, there's no other way of harnessing that performance in a single process.

That has all the instability problems of normal threading, though, with none of the guarantees--you usually don't have a separate stack in the ISR, for instance, and functions called from an ISR have to be reentrant.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

That's more like a design flaw. I'm talking about having more than one thread in an application. Sometimes you can't avoid having more than one thread.

Linux has several schedulers these days. Ever tried to use a different one?

Actually on ARM7TDMI you have separate stacks for normal IRQ, fast IRQ, exceptions and your application. On Cortex you can choose to have a separate IRQ stack.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

Hi Phil,

You said:

"you poll once, set the relevant interrupt enable, and go back to the housekeeping routine"

This begged the question, "So what happens when some OTHER aspect of your product requires a CONCURRENT timeout?" E.g., you're blinking a light (1 sec on, 1 sec off), waiting for user input (with an "accept default if not overridden in X seconds"), and implementing a timeout on a disk access. Do you "set the relevant interrupt enable" for each of those three independent activities?

"Of course not, silly!"

Such an approach would lead to serious constraints on your design *or* having to deal with unavailable resources ("Sorry, I can't blink the light right now. Please try again, later.")

My question is based on actual experience. I've encountered systems where a hardware timer *was* used for all timing related "events" -- but not in the "typical" manner (of virtualizing "software timers").

Instead, the timer was programmed to expire at the nearest deadline/event, and the timer ISR then reloaded the (hardware) timer from a delta queue. Much *finer* control of time (though often unnecessary) but at a higher cost.
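
The delta queue itself is nothing exotic: each entry holds only the ticks *beyond* its predecessor's expiry, so the ISR only ever touches the head (sketch; insertion and locking elided, names made up):

    struct dq_entry {
        unsigned delta;                  /* ticks after the previous entry    */
        void (*expire)(void);            /* event to fire                     */
        struct dq_entry *next;
    };
    static struct dq_entry *dq_head;     /* soonest deadline first            */

    extern void load_hw_timer(unsigned ticks);

    void deadline_isr(void)              /* hardware timer reached the head   */
    {
        struct dq_entry *e = dq_head;
        dq_head = e->next;
        e->expire();
        while (dq_head && dq_head->delta == 0) {   /* coincident deadlines    */
            e = dq_head;
            dq_head = e->next;
            e->expire();
        }
        if (dq_head)
            load_hw_timer(dq_head->delta);         /* arm for the next event  */
    }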

What makes "polling" the "normal approach"? Do you really think your cell phone polls everything that "might be/not-be ready"?

*You* have to implement policy and mechanism. You should only have to worry about *policy*. If you are implementing mechanism in each individual instance, you carry a higher cost for that. Did you remember to check to see if power is failing so you can abort the wait *early*? Where *else* do I need to inject the code to do that?

The minimum is ZERO from the time of the triggering event. You happily live with that same zero when the triggering event is a hardware interrupt. So, what's the difference if that same zero comes from a VIRTUAL interrupt?

You wouldn't set up your interrupt to trigger some unpredictable amount of time before the time at which you *wanted* to begin your activity. So, don't have the virtual interrupt trigger before that time and you don't have a problem!

The problem with the "virtual machine" that multitasking environments provide is that the timescales are different. You can (reasonably) guarantee that an ISR will be invoked within microseconds of the actual "hardware event" that tugged on the IRQ line. You can see the (small number) of other such "activities" that might be competing for the "foreground".

But, in the MTOS environment, those competing entities can be many more and many more complex. Instead of a handful of "interrupt users", you can have scores of "tasking users". If you've not considered how those tasks co-operate, it is hard to predict what might be competing for specific resources (e.g., the CPU) at specific times.

This is akin to not keeping track of which IRQ's might be active (and at which priorities) at any given time.

Work in an environment with scores of (active) IRQ's and you face a similar problem.

[But that doesn't mean it can't be managed. That's why interrupts AND tasks have "priorities"]

File copy? (assuming no disk) Replace write() and read() with dummy routines that move bytes to/from a large FIFO (or the console, etc.). Watch the contents of that FIFO as the program executes. Or, break execution and have a look at periodic/random intervals.

It depends on the nature of the dummy data that you feed in. E.g, if your read() merely grabs the current time of day (to the second) and returns that as the data read from the "disk", just look through the FIFO and verify that the most recent "writes" resemble the time displayed by the clock on your wall.
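
e.g. (sketch; format_time_of_day() is a made-up helper):

    /* Test doubles for the copy threads: the producer "reads" the wall clock,
       the consumer "writes" into a big scratch FIFO you can inspect at any
       breakpoint (or dump via the black box).                               */

    static char     scratch[4096];
    static unsigned scratch_in;

    extern unsigned format_time_of_day(char *buf, unsigned len); /* "12:34:56\n" */

    int fake_read(char *buf, unsigned len)         /* stands in for read()    */
    {
        return (int)format_time_of_day(buf, len);
    }

    int fake_write(const char *buf, unsigned len)  /* stands in for write()   */
    {
        unsigned i;
        for (i = 0; i < len; i++)
            scratch[scratch_in++ % sizeof scratch] = buf[i];
        return (int)len;
    }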

That depends on the type of application hosted on those processors. E.g., you would debug a symmetric (or nearly symmetric) algorithm different than one in which each processor was doing something specific/different.

E.g., when the home automation system is complete, here, there will be upwards of 70 processors running at any given time. Most, small little "motes" (CPU's are cheap!). If I notice that the system is failing to announce "arrivals" properly (i.e., someone just rang the front doorbell), I would *not* watch the mote that monitors the doorbell "button" to notice the contact closure. Nor would I watch that closure being debounced, recognized by the daemon charged with watching it. Nor any of the various mechanisms that would result in its transfer to the "automation controller".

I wouldn't watch the daemons in the controller recognize that incoming event. Nor trace the execution of the DBMS as it determines how to respond/notify that event. Nor, the speech synthesizer gluing together diphones to tell me what's just happened.

I wouldn't look at the steps that the localizer undertakes to figure out where I *am* (physically) in the house. Nor would I watch the audio be packetized for transmission *to* "me".

[Nor any of the other little boxes that have to participate in this process]

Instead, I'd monitor the log (black box) at the controller while pressing the doorbell. "Ah! It recognized the incoming event. So, I know the sensing mote, network fabric, etc. on that side of the application appears to be functioning. Now, check the black box for the localizer to see where it *thinks* I'm located. Hmmm, that explains the problem; it thinks I'm in the back yard and has routed the announcement to the transmitting mote that services the back yard! Obviously too far away from my current location for my earpiece to pick it up! I guess I'll need to focus on the localizer subsystem..."

When I designed the system, I considered how difficult it would be to debug a "process" (in the abstract sense) that was distributed over multiple, physically separate, nodes. I know I (personally) can't juggle more than three "contexts" effectively in my head. So, I discouraged functionality from involving more than three devices AT A TIME.

Yes, this goes back to earlier x86 implementations, etc. But, using those requires you to either run the application at reduced speed *or* limit how much "history" you can track.

The former isn't always possible (without moving effort into a "good simulation"). The latter requires you to be able to identify a suitable trigger that will be proximate to the activity of interest.

The black box approach lets you litter your code with "log writes", decide how many resources you want to devote to the caching of those writes, and then let the code "just run". You can push lots of extra detail into the black box that you can later chose to ignore. Or, selectively disable reports that are not of interest to remove some of that clutter (and effectively increase the size of the FIFO).

In resource starved environments, you can change the macros that implement this so that they simply push bytes to a fixed address (even if that address is already "allocated" to something) and monitor that address (with a logic analyzer or an external "WOM").

If you move it to a background activity, then you run the risk of the UART exhausting its available data (buffer) when the comm rate exceeds the rate at which your "loop" can get around to re-servicing it. It also means you have to dick with the interrupt mask more often (since it is the entity that enforces atomic access to the shared resource, in that case)

When it's NOT your name on the building, you usually don't have much choice! :> Even folks coming from engineering backgrounds seem to *quickly* forget the sort of issues that affect design -- once they move into managerial positions. This is especially true of new technologies (putting processors *into* consumer products was a relatively novel idea in the 70's)

How does it "hide" them? You know where a given task's inputs come from (you know where the characters that you receive from your interrupt driven UART come from!). Why would you suddenly become "undisciplined" in tracking the relationships of tasks? Don't you know what the various (physical) processors in a multiprocessing application are doing? The relationships between their data?

No. Use what you *need* to get the job done. But, don't use less than you *can* thinking you're saving something!

You wouldn't use small signal transistors when you could use an op amp to solve the same problem (unless you could reduce the problem to a few discretes *and* had significant cost pressure on you to do so).

You wouldn't use fixed BINARY point (i.e., not just natural integers) math when you could use floating point (unless you could reduce the problem to whole integers and had significant resource pressure to do so).

People seem to shy away from OS's in embedded applications more often than they should. I haven't been able to figure out if this is due to a lack of experience, being at the mercy of an OS vendor or just "not wanting to move out of their comfort zone".

I've had to "fix" too many products over the years that someone "threw together" with a pile of spaghetti code absent any real structure that a more formal environment could have provided. Almost always the foreground-background split. It works fine -- until it doesn't. Until the customer wants a feature that it didn't anticipate. Until the load on the processor is increased and the solution fails to scale.

Many years ago, I had to recover some sources for a product that the owner had "misplaced". So, I was faced with a reverse engineering task *and* a "product enhancement" task.

The enhancement was trivial, conceptually. Of course, clients see *everything* as trivial and expect that to be reflected in what they will have to *pay*! :>

The product itself wasn't complex. Something that I could have knocked off "in no time".

I bid a comfortable number for the documentation recovery, and the "enhancements". I was spot on regarding how much effort it would take to reverse engineer the code. And, the effort it would require for me to document that code (remember, there are no names for variables, functions, entry points, etc.).

But, the code that was revealed was *so* poorly written that the enhancements proved to be painful to implement. Since the product was already designed and deployed, I couldn't change the resources available to the application (damn near every *byte* of ROM and RAM was spoken for. And changing the processor would have been pure fantasy!)

So, I was stuck with a bad implementation that I now had to "fit in". What had looked like a walk in the park turned into a project from hell.

"*Will* I be able to make this work in this framework??"

(thankfully, it was only a few months)

You glossed over the example. Imagine how *you* would add a "scripting language" to the "one big loop" design approach. Of course, you *can* do it -- as evidenced by the fact that we ran a BASIC interpreter on this bizarre platform *alongside* (concurrent with) the other tasks that ran the instrument.

But, the design of that interpreter was needlessly (?) complicated by the crippled environment. I.e., it had to EXPLICITLY relinquish control of the processor (preserving NOTHING on the stack -- even a record of "what it was doing at the time") routinely to ensure the other tasks weren't delayed in their processing.

Do you interpret a single scripted statement? How long will that take? Do you constrain the user so that he can't write "expensive" statements? No transcendentals in expressions, etc.? Do you interpret *part* of a statement and (manually) keep track of where you were in that process? ("OK, I have computed the cosine of the angle. Next time I get a chance, I'll compute 2*a*b. Then, the time after that, I'll work on squaring a. And squaring b, the time after that! Then, put it all together and tackle the square root. *Then* I can advance to the next statement in the script...")

When you have a "virtual machine" (afforded by a multitasking environment), you concentrate on the problem you are solving (interpreting the script) and NOT the mechanism of "sharing the processor".

When I'm driving *nails*, I surely don't want to heft a LARGE ROCK if I've got a hammer nearby! ;)

Let's just agree to disagree. If your solution works for you, great. Mine works for me. Let's just each avoid *maintaining* the other's projects! :>

Reply to
Don Y

Me too. If you have only one thread in a GUI app, then if the program gets blocked on e.g. a slow network socket, the GUI becomes unresponsive. Having a separate thread for the GUI is a very great help to good application design.

AFAIK there's the original heavyweight one, where a thread is actually a separate process, and the lightweight pthreads one. Do you know of one where I can have realtime and normal threads in one process, or even set different priorities for different threads in a user process? The pthreads docs imply that you can, but you actually can't--none of the priority or thread scheduling options (round robin, priority, etc) actually works for a user application, despite what the docs say.

I'd (reluctantly) be willing to run the simulator as root, so as to be able to have realtime threads, but as soon as you make one thread real-time, they're all real-time, *even the compute-bound ones*. Having compute-bound real-time threads makes the computer totally unresponsive to user input.

All I want is to be able to express the notion that some threads don't need as fast response as others. I can do that on every OS except Linux.

Fair enough, that helps a lot. They still have to be reentrant, though.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

There's no such thing as a concurrent timeout on a real life uniprocessor, regardless of whether you're using an RTOS or not. You just have to adjust. That's why it's silly.

Your average ARM has, like, 8 separate timers. It's pretty unlikely that you're going to run out.

I don't mean sitting there checking its email every 10 microseconds. I mean "you go look, and then if there's nothing to do, set the interrupt enable and move on".

This is embedded code we're talking about, not a big-iron accounting system. "Policy" and "mechanism" are concepts imported from bigger iron.

It isn't _constant_, which makes it entirely useless for my purposes. Try running an FFT on data taken with nondeterministic timing, and you'll see what I mean. If I control the interrupts, my maximum timing uncertainty is equal to the execution time difference between the longest and shortest instructions, i.e. 1 or 2 processor cycles at worst, and I can use three lines of assembler to iron that out if I really need to.

You obviously don't do a lot of data acq.

Microseconds might as well be years. Even with a 1 kHz signal, 1-us jitter will limit the noise floor to -60 dBc or worse.

Again, we're in violent agreement that RTOSes are useful for certain things, especially once the task becomes very complex. You just want to use it for everything, which I think is insane.

No, not file copy, fine grained interaction. Sim threads have to exchange data thousands of times in even a short run. You can't debug that your way. No way, no how.

Still a dumb philosophy, then and now. Repeating the mistake doesn't improve it.

The example I gave was a latent race condition. Those are all over the place in embedded systems, even in well-designed code--the buffer of a UART is an example. You have to get there before it fills up. Change the system parameters, e.g. the baud rate, and it craters. That's a lot harder to find in threaded code. That's a simple example, but there are lots and lots of others. Housekeeping loops make that stuff a *lot* easier to spot.

> I'm not "trashing all housekeeping loops".

So your original data copying example that started this whole exchange couldn't safely be done without an RTOS? Give me a break. Fanboyism, pure and simple.

You're nuts. Of course I would--transistors are a great deal more predictable. I do discrete designs just about every day.

Of course I would. Accounting s/w has worked in units of $0.01 forever.

You don't think it's a bad thing to be dependent on vendors? I sure do. People go out of business, get acquired by rivals and killed off, and so on and so forth. Look at the EDA world.

So it was crappy code. So what? All RTOS code is picture-perfect?

I'm not particularly interested in the example, since it illustrates what we already agree on--namely that above some complexity threshold, in the absence of time coherence requirements, RTOSes can be useful. But that's not what we disagree about. There's no way I'm going to try putting a scripting language in my little handheld measuring gizmo.

Fair enough.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510
845-480-2058

hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

On a sunny day (Fri, 10 Feb 2012 14:32:42 -0500) it happened Phil Hobbs wrote in :

It is often a must.

What I often do is have some live widgets, for example a VU meter, or a scope, or signal strength, or some graphics display. The real acquisition runs in one thread, and sets memory locations and flags that the GUI thread then uses to update the display when it can.

The exception is if you can have non blocking for example ioctls(), in that case the GUI thread can acquire data itself.

This uses a GUI thread, an internet thread, an audio thread, and a color thread (to send UDP color control data):

formatting link
The GUI thread just updates the GUI with data made available from the other threads.

This one only has the GUI thread, and uses non-blocking ioctls()

formatting link
The signal level meter and the pie signal to noise display are alive in real time.

I use xforms to program these things, fast, and small code size.

For things approaching this complexity level maybe the best is just run Linux on a small board. Then you can program in C and use all your existing libraries and code. Add an FPGA (maybe with Linux in it) to do the time critical things.

Especially if you have the space and only expect to sell a few, then the hardware cost is not that much of an issue relative to the development cost in hours.

Well, my 2 cycles worth.

Reply to
Jan Panteltje

A whole lot depends on the environment you are working with. Sometimes quick and clean (including overlapping buffer areas) beats out fully safe and controllable. In hard real time environments, fully thought out and documented trade-offs are involved.

?-)

Reply to
josephkk
