The number of systems manufactured is irrelevant; other aspects are more important, particularly time-to-market and item cost.
You use hardware (e.g., ASICs) if:
- the latency between stimulus and response is shorter than can be guaranteed with software
- the time interval within which a response is required is shorter than can be guaranteed by software (where "guaranteed by software" includes all the uncertainty introduced by hardware, e.g., cache misses and, for more complex processors, TLB misses)
In most cases, the hardware is "armed" by software to do something when a hardware event occurs. The "arming" is still an HRT requirement, but it can occur within a larger time window.
For example in a cellular base station controller I implemented, frames had to be sent once every
*Someone* claimed excess capacity was OK -- even desirable! I pointed out that it is not (we don't all design nuclear power plant controllers that number in the scores with 10 digit price tags over a 50 year timespan -- talk about "high volume"... NOT!).
Does a PDA need RT capabilities? Can it just push a bit at a time out the IR link to a printer to print a spreadsheet? No timing dependencies in those protocols? Does it have dedicated hardware that interfaces to the host PC via USB and just presents complete "file images" that it clocks into FLASH at its leisure? Does it scan the touchpad without regard for time and *hopefully* deduce that you drew a 5 and not an S?
OK, well, maybe PDAs don't qualify as "cheap".
I guess cheap cell phones might not, either!
What about a UPS? Do you think it just switches to battery power whenever it gets around to it? And, detects loss of power by monitoring some DC voltage level on a pin (vs. watching zero crossings -- "Hmmm... I wonder when I should expect that next zero crossing to come along")? Do you think it generates a 32Hz waveform at some times and 79Hz at other times? Do you think it reports to the host (USB or serial) a bit at a time (and hopes the host can sort out what it intends, in the absence of signal timing)?
What about a setback thermostat? Is the time that it displays something you *hope* is correct? "Yeah, I would like you to turn the temperature UP at 7AM so the house is warm when I climb out of bed. But, if you've missed a few deadlines and have, thus, lost time, then I guess 9AM would be OK, too!"
What about a mouse? Do you think the quadrature detectors
*hope* they see the right signals to determine the proper direction of motion? And, that the BT radio sorta-kinda decides when to hop to the next frequency whenever it feels like it? Or, the USB interface works whenever the mouse decides it wants to put stuff on the signal pair?
[Note, we're now in the sub $30 retail market. I.e., DM+DL well below $10]
What about an electronic *toaster*? Or, toaster oven?
Or, a DVD *drive* (not a "player")? Hell, its user interface is a light and a button!
Of course, it's unlikely that they use "heady" techniques like RMA for scheduling. They probably treat everything as HRT and DON'T CARE if they *often* miss deadlines. But, that doesn't make them NON-RT designs!
If those ASICs are mask-programmed MCUs, you're probably right! (kinda hard to imagine they would put an FPGA in a device like these -- since so many of them are end-user REPROGRAMMABLE -- and MCUs are *so* much cheaper and OTS!!)
OTOH, if they are genuine ASICs, then that would be a validation of my claim that excess resources are shunned -- do you think they design ASICs with UNUSED counters, gates, pin drivers, etc.? "Hey, let's put some extra silicon in here so we can be glad we've got some to spare!"
Of course, my opinion is a bit biased as I've taken apart lots of various types of devices -- just to see what's inside! :> Try it sometime! You'll be amused at what you find (e.g., unencapsulated die, DIPs on single-sided boards, trim pots (ick!), etc.)
- when the functionality is well defined and unlikely to require change (e.g., a dimmer in an electronic light switch)
- when the "processing" required exceeds the throughput of an affordable processor (e.g., digitizing raw video -- definitely HRT!)
Note that on "cheap consumer devices" (not my words) you often don't have cache, MMU, etc. I.e., you can actually "count cycles" (or have a development tool do it for you) and get a deterministic value.
On a cruder scale, a UART fills a similar role for a similar reason (you wanna twiddle an I/O *pin* to implement a 115Kbaud serial port? What about a pokey 9600 baud??)
To illustrate the pressure on hardware costs...
I had a manager once try to twist my arm to pare down a group of 16b counters to *8* bit counters to save $2 on a $300 (DM+DL) device. Even though the processor would then spend 15% of real time (no hyphen) doing:
If buffering is allowed then as long as the throughput is higher than the frame rate, there's not a hard deadline for encoding a specific frame. Some frames or subframes can take longer than others.
If some subframe is very complex and encoding takes too long and it can't be mitigated by buffering, the encoder fails and there is an artifact in the video. Cheap consumer video stuff has encoding artifacts all the time, though I don't know if this is the reason. From the observation that consumers don't complain too much about occasional artifacts though, it sounds like encoding is an SRT problem.
It looks to me like (some) HRT processors go even further and have no pipelines in addition to no caches, making timing of each instruction completely deterministic.
I've fooled around with an HRT language (Atom) which has no "if" statement (something that selects between two blocks of code based on a condition, and executes just one of the blocks). It instead has a "mux" statement, which unconditionally executes both blocks, giving results r1 and r2. It then uses the condition to select one of the two results and discard the other. That way the statement takes the same amount of time regardless of the condition.
Video processing is quite often an SRT issue. In the simplest case, a missed deadline can be handled by repeating the previous frame once and no one will notice. Only after consecutive missed deadlines will this become evident (frozen frames).
In MPEG, a missed (bidirectional) B-frame is no real issue. Missing I or P frames in decoding will cause artifacts for the duration of the GOP sequence (typically 0.5 s). On the encoding side, losing a P-frame generation is no big deal; you can just accumulate the differences to the next P-frame (slightly more jerkiness). Failing to generate the I-frame will cause artifacts for the next GOP (less than 1 s).
I was trying to draw attention to the "unlikely to change" issue. I.e., even during development (as turning the crank on another iteration can be expensive -- in terms of dollars and calendar time). E.g., tens of kilobucks and a month or more. (not the sort of approach you want to take when the marketeering guys are likely to come in and say, "Why don't we make it *blink* as the intensity level is changing?" :> )
Yeah, I guess so. I was specifically thinking of a configurable (dot clock, frame geometry) video digitizer I worked with that digitized video at dot clocks of up to 200MHz (back in the 90's... unlikely you're going to do that "in software" even if you had flash converters that fast! Assuming you *could* implement the sampling PLL with enough precision "in software" :> )
One approach would be to ignore the effect of the cache and assume all references are misses. I.e., if your scheduling, etc., assumes worst-case times for each I/D fetch/store AND GUARANTEES TIMELINESS, then if accesses happen to complete a bit quicker, on average, you err on the "early" side. Use any "gains" to increase the likelihood of SRT deadlines being met on time (vs. tardy) or even "early".
My day to scrounge around the discards. Always a tricky balance bringing home toys and risking the ire of SWMBO! Maybe I'll find an electric wheelchair that I can instrument!
No, you still have a hard deadline. Just that you can release the task for the next frame, early. (and, "throughput" is "throughput" regardless of how it may be skewed in time -- if the processor can process X pels per second and the video stream has X+1, sooner or later the processor will drop a pel... fail to "see" it, entirely)
(I wasn't talking about "encoding" video. Rather, *digitizing* it -- moving from the analog domain to the digital)
I have two solutions to that problem:
- don't have a SWMBO (daughter doesn't count :)
- have a house that is *full*. Then you know if you get anything new you have to achieve the impossible: throw something else out. Great way of saving money :)
This could be a 1000 to 1 slowdown on a big x86, or maybe even more. Assume L1, L2, and L3 caches all miss on the different levels of page table pages as well as the address itself, plus the TLB misses.
Ah, but how would you determine what degree of pessimism to apply? IIRC the i960 allowed you to execute code then lock down its cache so that it wouldn't change and would therefore be repeatable.
Even on an i486 with its minimal cache you could see 10:1 variations.
And if you have to be that pessimistic, why pay for all that extra power and cost for the cache and OOO hardware - which is, by definition, unnecessary!
No daughter(s) -- that I *know* about! -- so that's not a problem (though I imagine they *could* be!)
I've already been hard at work trying to "lighten the load". Too much stuff accumulated over the years and not enough time
*left* to make use of it all! :> But, finding "good homes" for everything (instead of The Dump) means it's a fairly complex problem to shed weight! :<
Ha! No. I'd like to be able to take advantage of the rest of the home automation/instrumentation to enhance "mobility" within the home/edifice. I.e., get the rider from point A to point B without requiring the rider to finely control the motion of the chair.
A year ago, I was offered a nice, small chair. But, it was way too fast! 6 MPH! Put that in a home and you'd have to replace all the walls before you got the control algorithms anywhere near workable! :< (would have made a great little outdoor vehicle, though!)
[Apparently, they have controllers in them that can be used to tweak the acceleration/velocity profile. But, I'm not sure how much actual control is possible. Perhaps just chopping the battery supply to the motor and letting the inductance of the windings shape the output?]
If you can do this, then put key ISRs or oft-used parts of the RTOS in those cache lines.
I design with very few HRT tasks -- and even fewer whose deadlines absolutely *can't* be missed. So, this gives you assurance that the HRT works (always) and all the acceleration brings the SRT load along for free/cheap.
[The key is avoiding HRT as much as possible and learning to optimize the performance of the SRT tasks -- so that they are always "as good as possible". In that case, you can put a "dial" on the system oscillator and dial the level of performance that you are willing to pay for (since the SRT tasks are those that have variable worth!)]
Disable cache and verify that the HRT requirements are met.
Budget 100% for HRT. If there is some unused capacity in this case, then you can safely assume that some other non-RT or SRT tasks could be executed in the "spare" time. This also gives a pessimistic guess of how much capacity is available. With cache enabled, the HRT loading will decrease, leaving more time for non-RT and SRT processing.
Locking down areas in cache or virtual memory can easily have adverse effects, especially in low-level caches, where fully associative mapping is not available. A locked line might alias a frequently used application cache line, causing cache misses for any memory references to that area.
Frequent interrupts, OOO, long pipelines and a huge cache hierarchy do not match very well. The interrupt causes some kind of (mini) context switch, requiring some of the processor state to be saved. In addition to flushing OOO state and FIFOs, at least some registers need to be saved, or another register set must be activated. After exiting the ISR, the pipelines must be reloaded, possibly with additional cache misses.
For instance, when handling a large number of serial lines (8-32 lines/PCI card), there is not much point in trying to run with character-level interrupts on big x86 processors. In practice, all the cards are scanned for input and output data in every system clock interrupt, which might occur every 1 ms or every 10 ms. This is not so bad for full-duplex traffic, such as TCP/IP over PPP, but the throughput drops drastically when half-duplex protocols are used, especially with a 10 ms poll rate.
IMO, trying to do low level time critical operations with a big general purpose processor is not very productive.
Any unused time remaining after the HRT tasks have been handled is a bonus for SRT.
One should remember that in most RT systems, there is a well-specified _constant_ amount of work to be done every second. If the system is "overspecified" so that the CPU duty cycle is only 50%, and you then drop the CPU clock frequency to one half, the duty cycle will increase to 100%; the energy consumption will remain the same, as will the heat generated!
The only way you are going to save energy consumption and heat generation, is that it may be possible to reduce the operating voltage with a lower clock frequency. This can be quite significant, since the active state power consumption is proportional to the square of the operating voltage.
But without lowering the operating voltage simultaneously, just dropping the clock speed does not help much in a typical situation.
Have you ever said -- or heard anyone say -- "I did [this] SINCE it was easier than doing [that]"?
This is a common practice. Yet utterly and completely WRONG. (I'll leave it to you to search for the proper definition) Has the dictionary changed the formal definition of "since" to align itself with this POPULAR misusage? How often have *you* misused the word?
People CLAIM you can't use dynamic memory allocation in RT systems. And, design entire certification methodologies predicated on this fact. Which is obviously *false* (the fact that people who use dynamic memory allocation often don't do so properly is a different issue!).
Zilog claimed the Z80 had 256 8-bit I/O ports. In fact its (documented) I/O space was 64Kx8 or 256x16 or many other valid -- though possibly esoteric -- manipulations thereof. Common belief (I recall arguing this point decades ago) is that 256x8 is Gospel. Nor has Zilog fixed (after the fact) their documentation to make this "feature" more prominent. Does the fact that the documentation doesn't agree with the documented reality make the reality something else?
If you polled the population of programmers regarding the definition of "real-time", wanna bet THE MAJORITY would say something equivalent to "real fast"? I.e., would fail to mention the word "deadline"? So, those of us who use the term in that MINORITY SANCTIONED manner are "not really concerned with communicating"?
Note that the outline he presents decouples the relative worth of working towards a task's "goal" from the deadline for achieving that goal (i.e., saying some tasks are worth continued effort EVEN AFTER THEIR DEADLINE HAS PASSED -- SRT -- while others are foolish to pursue after that event)
[This seems a worthwhile way to model that aspect of all temporal tasks -- is it worth continuing work on my tax filing even after April 15th has passed? is it worth continuing to compute the trajectory for the incoming warhead AFTER it has exploded?]
Then, deciding what criteria you are going to use to determine optimum "timeliness" (that aspect that makes RT different from nonRT). INCLUDING THE REAL POSSIBILITY OF ACCEPTING MISSED DEADLINES -- WHETHER HARD OR SOFT!
[This also maps to reality. If you miss the deadline of intercepting the first incoming warhead, is your system still considered as meeting its functional capabilities? If you claim missing that one HARD deadline means you are broken, then do you have a SECONDARY specification that covers how you operate *while* you are broken: "The system MUST meet ALL its hard deadlines" (Oh, and when it *doesn't*, then it must do ...) (And when it can't do *that*, it must ...) How do you systematically and provably design something that meets this squishy non-specification?]
And, separately, determining what the *consequences* of each of these issues might be...
The methodology and taxonomy that he sets out can always be
*crippled* to include your subset of RT (or HRT). But, your taxonomy prevents other, LESS BRITTLE solutions from being created.
You've claimed he's an academic (not my opinion but you're entitled to yours). Perhaps you would care to engage him in a discussion of your beliefs? Assuming he's an academic, he may feel motivated and patient enough to "educate" you as to why your view is too narrowly defined (for you to turn your design methodology into a science instead of a superstition).
*I* have no desire to waste any more effort on the subject with you. I'm not an academic. In fact, I'm one of your *competitors*! As I see it, I have a noticeable edge in terms of technical capabilities with *this* approach -- not subjected to the same ARTIFICIAL constraints you choose to impose on such systems. (I can always redefine the *parameters* under my taxonomy to emulate yours!)
I'll leave you the last word. And, google and the USENET archives can record how you choose to respond (and others can think about the arguments presented here to judge what makes sense). Think carefully how you want to "go on the record" ;-)
Exactly. "Things can only get *better*" (this is actually a little lie -- but, can almost always be ignored, in practice.)
Or, treat the cache as a resource that you can deploy selectively to improve particular aspects of your implementation. This is akin to NOT using floating point in certain tasks to eliminate the (often asynchronously implemented) overhead of the added (possibly deferred) extra context that needs to be saved/restored.
Or, *only* allowing floating point resources to be used in certain tasks so the FP context need *not* be saved/restored to allow that resource to be used in other tasks.
[Repeat for any other "expensive" resource access to which could significantly change the performance of the system -- easy to test/verify, etc.]
E.g., I am trying to develop a common hardware/software platform that I can apply to 21 (?) different "designs/products" (because I don't have the time or resources to develop 21 completely *different* designs/products!). Do I optimize each final design to make best use of the resources (time/space/etc.)? Or, do I optimize the
*core* portion of the design (RTOS, network stack, VM, VMM, etc.) so the optimization applies across "products" -- even if this means some product might be sub-optimal? (what happens when a different app is loaded on that product? do I then re-optimize??)
I'm greedy in how aggressively I "fit" a design to its hardware platform. But, I'm not obsessive -- take the big wins and don't sweat the little details.
Bringing it back to this issue, if hardwiring the cache lines to ISR's gives you some measurable increase in PREDICTABLE performance, it might not be the best you could achieve (given infinite time to tune) but at least it's *an* improvement and you can conceptually evaluate how further changes to the system (ISRs and else) are likely to be "received" -- without undertaking that "infinite tuning" again.
It can also change the order that tasks get scheduled (because the completion times of some tasks are altered more than others). Or, interact with other resources in non-obvious ways.
Some processors have IRQs designed for "streamlined" ISRs. E.g., the SA's FIRQ has an incredibly low overhead (but also means you can't do quite as much without ADDING overhead)
Older processors tended to make the user more aware of the cost of the context switch and offered hacks to allow easier exploitation (e.g., the Z80's EXX and EX AF; PUL/PSH on the 6809, etc.). But, then again, they only risked the cost of a short pipeline and the portion of the state preserved.
Agreed. Don't treat it as HRT since it almost always *isn't* (folks just want to treat it that way because it makes it easier to think about the consequences! :> )
And, if your goal (i.e., mine) is to map damn near everything into the SRT domain, then you have much more flexibility in how you "solve" the problem (application). There's almost always some way to handle a missed deadline -- it just usually takes more
*thinking* about the solution!
E.g., my "network speakers" are largely HRT. If the next audio packet isn't here before I need it, there will be an audible artifact (dropout, click, etc.). I can't force the server to give it to me when I need it (though it has been designed with that explicit goal in mind!). Nor can I prevent "something" from interfering with the network transmission (noise from a nearby fluorescent light's starter coupling to the network cable and corrupting a packet on its way to *this* device -- though possibly not others!).
So, to say the system is broke because it misses a hard deadline (hard: not worth pursuing once it has passed) is silly. Chances are it will ALWAYS be broke (because you can't control the entire environment!).
A naive implementation would deal with this by putting a large buffer on the device so there was a longer time interval in which a packet could be "retried", etc. (bigger buffer == bigger cost) Ah, but now the server has to be "ahead" of the speaker (client) in terms of "REAL (chronological) time". It has to deliver audio packets long before the speaker needs to reproduce them! I.e., greatly increased latency (imagine speaker is reproducing audio that accompanies a video presentation -- now we have to artificially delay the video to ensure the audio will be in sync with it!). (i.e., MORE cost -- but, at least it's not in the "network speaker", eh? :> )
And, you're *still* at the mercy of the proverbial shit hitting the fan: your overly large buffer not being enough to overcome a prolonged anomaly in the system! (what happens if the server is momentarily overloaded? Or, do you over-specify the server's hardware so this "can't happen"?? See where this ends up going?) I.e., for all that extra cost, you're still BRITTLE!
My initial implementation had the client request a packet that didn't appear in a timely fashion (the server just pushes packets to clients for each "subscription", normally; a client needn't sit there constantly requesting packets -- wasted bandwidth and processing in the clients AND the server!).
But, that meant I had to move up the "rerequest" deadline so there would be enough time to get the reply to this rerequest before it was actually needed. And, meant the server had to deal with all this extra *incoming* traffic -- which would further hinder its ability to handle its primary *outgoing* role! (imagine a dozen clients all clamoring for dropped packets... and, that effort causing other packets to miss their *local* deadlines, etc.)
Second iteration, the server designated a "backup" client for each client. I.e., some other client that was getting the same feed (or, that it could command to accept the feed). If a client failed to receive a packet, it would contact its backup (buddy) -- in the hope that the backup client had the packet. This kept those requests from flooding the *one* server that was trying to deal with all these clients.
[I've since fine-tuned this protocol so there is even less overhead -- since overhead effectively moves a deadline closer (or entices you to increase a buffer's depth)]
Point of (long) example is you think of ways to react to your *expectation* that some deadlines will be missed.
[There is also deterministic handling of the case where "too many" deadlines are missed: you don't want to shut off the speaker because it missed *a* deadline (brittle). Nor do you want it "stuttering" as it meets some, then misses some, then meets some, then...]
When the world is SRT, you afford yourself these extra possibilities to improve performance WITHOUT adding resources (buffer memory, latency, etc.). *But*, you then have to assume the responsibility for dealing with these situations -- instead of just saying "system is broken".
Note that with SRT designs, you implicitly acknowledge performance can *vary*. I.e., a system can become more heavily utilized in the short term and shed some capability, accuracy, etc. -- yet regain it "later" when the cause for the "heavier load" has disappeared. Because you can reevaluate your ability to meet a missed deadline instead of UNCONDITIONALLY dismissing or enforcing it!
E.g., I've designed systems that could be pushed beyond 100% utilization, inherently shed responsibilities that they
*couldn't* meet (this isn't as important as that), then resumed them once the short term load returned to normal. All the while, continuing to meet their stated design requirements (even as performance appeared to suffer in the short term).
For example, you'd much prefer your ABS brakes to work (HRT) than your ignition firing (also HRT) to continue "optimally"! Yet, once you pulled out of the skid, you'd like to regain the same fuel economy that you had prior to entering it! (and, do so without having to add excess capacity for these infrequent sorts of events)
[You want to BEND instead of BREAK -- flexible, not brittle]
Sorry, I wasn't meaning that you did this, in fact. (Though you can also idle a processor that isn't needed for anything "now")
Rather, I was illustrating that performance now becomes something you can tweak to fit your resources. E.g., instead of an X MHz processor that costs $Y, you can spec one that is only capable of operating at Z MHz for a cost of $W.
E.g., in my immediate case, I have several different "products/designs" that I would REALLY like to share a common hardware and software base. There is *big* value in this! At the same time, I don't want the needs of the "most demanding" device to determine the *cost* of the LEAST demanding!
So, I'd like to be able to spec different grade parts (same family or base part number) for the "same" (hardware) design as befitting the needs of the device that will actually "infect" that board. Having to make provisions for external memory, for example, means the core design has to be compromised to allow for that extra real estate, power consumption, pin utilization, etc.
[If I was a *business* approaching this, I would care much less about these issues. But, if I want others to be able to reproduce my efforts *economically*, the more I can do to make things "the same", the better the result for them!]