Computation effort specification

Hi,

Posted a variant of this in c.realtime but the cavernous emptiness there is deafening! :>

I'm looking for "device-independent" units of measurement to specify computational requirements of tasks. I.e., to perform task X, you need Y "(standardized) operations". From this, I'd like to be able to make scheduling decisions. I.e., task X must be done by deadline T and requires Y operations to be performed in that time. As such, if I don't have the capability to perform those Y operations in the time allotted, it is foolish to consider scheduling task X (if X is HRT) *or* the expected value of the missed deadline will be V (if X is SRT).

Said another way, how would you tell "The Hardware Guy" how big/fast a processor you require for a given "job"? Or, convince your employer that the resources that have been allocated to your job are insufficient??

Now, repeat this exercise in finer-grained increments and at RUN-time, instead of development time!

Reply to
Don Y

First: I don't think there's any precise way to do this, because it's going to depend heavily on a bunch of factors that vary, among them:

-- processor core

-- coding style

-- optimizer efficiency

-- peripheral speeds (if they're needed)

-- care and feeding required by peripherals

ROUGHLY, then, you may be able to reduce a "purely processor" task to Dhrystones (or whatever they're called), or perhaps Dhrystones + floating point ops. Operations that depend heavily on the peripheral are going to be harder to quantify.

Said your other way: I would go out and buy a demo board with a candidate processor on it, code up some test code (I would purposely make it somewhat inefficient, because code always ends up bigger and slower in the end product than when you start), and I would measure the execution time with an oscilloscope. Once you have a rough idea of how much slower or faster you are than needed you can take better aim with the next one.
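To make that concrete, here's roughly the shape of the test harness I mean -- a bare-bones sketch, with made-up HAL names (gpio_set/gpio_clear, TEST_PIN) standing in for whatever your demo board's support library actually calls them:

#include <stdint.h>

#define TEST_PIN  7               /* hypothetical spare output pin        */

extern void gpio_set(int pin);    /* drive pin high -- board-specific     */
extern void gpio_clear(int pin);  /* drive pin low  -- board-specific     */

volatile uint32_t sink;           /* keeps the optimizer from deleting it */

static void candidate_workload(void)
{
    /* deliberately "inefficient" stand-in for the real task */
    uint32_t acc = 0;
    for (uint32_t i = 0; i < 10000u; i++)
        acc += i * i;
    sink = acc;
}

void timing_probe(void)
{
    gpio_set(TEST_PIN);           /* scope channel triggers here          */
    candidate_workload();
    gpio_clear(TEST_PIN);         /* pulse width = execution time         */
}

The pulse width on the scope is your execution time; pad it generously before you believe it.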

Given the prices of most demo boards these days, there's a good chance that if your manager objects to the price of hardware you can say "we just spent more of the company's money talking about this than the demo board costs".

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com
Reply to
Tim Wescott

Of course! You can elaborate this list in other dimensions as well, accounting for variations in data types, implementations, sizes, etc. And, it's rare that code relies solely on itself: any services with which it interacts, "external events", etc. all factor into the mix.

But, there still is some basic amount of "work" that is being done. E.g., digging a 3ft x 3ft x 3ft hole is the same task regardless of whether you are using a spoon, shovel or backhoe.

Peripherals are all hidden behind services. So, the cost of using them is not a part of the equation. Dhrystone/Whetstone metrics give you a rough idea of "effort". Ditto VAX MIPS, etc. You can never get a fine-grained definition of the load unless you further qualify all the myriad details of the particular implementation (e.g., does using shorts over longs lead to increased/decreased efficiency? Cost of unaligned references? etc.)

If you have a static load/application running on a *particular* box, this assessment is relatively easy. OTOH, if you have a variety of loads/apps running on a variety of boxes, it becomes more complex.

E.g., should you run the weekly payroll program on the slow "accounting" computer -- even if it takes the better part of an evening to do all of the processing there? Or, schedule it on the "engineering" mainframe which will do the job in an hour -- *but* you may have to wait until 8PM to "get time" on the machine?

What if an engineer needs to suspend your job in order to address some higher priority need of his department (it *is* their machine! :> )? Will your job end up getting done in time? What if you opt to play it safe and move the balance of the job back to *your* accounting machine -- will you be able to finish the remaining work in time?

Now: How do you make those decisions? How do you know how much work is required for your task? How do you quantify the amount that has been "done"? And how do you evaluate the *time* required to complete the balance on any of the "compute resources" at your disposal?

Reply to
Don Y

Well, I think you missed my point, or I missed yours.

If you want to play number games, then play number games and ignore anything I'm saying. If you want concrete answers to your questions, then you need to trim your questions until they're answerable in concrete ways.

I simply do not believe that your problem is separable in the way that you want it to be. To use your own metaphor, a cubical hole 3 feet on a side is always the same volume, but the effort differs depending on whether you're digging in soft soil or granite, and if it's dry sand then the problem of making a cubical hole changes drastically.

Hence my suggestion for a concrete way to communicate to management and hardware whether a particular task can be accomplished with a particular processor.

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com
Reply to
Tim Wescott

From my initial post: I'm looking for "device-independent" units of measurement to specify computational requirements of tasks. I.e., to perform task X, you need Y "(standardized) operations". Breaking that down further:

- "device independant" because the system consists of many nodes having different capabilities/resource allocations. A "given workload" (forget what that is, for the moment) might require T units of time to execute on node 1 with CPU-type C running at an effective "rate" R. That exact same workload, dispatched to node 2, instead, may take T/2 units of time with CPU-type C running at an effective "rate" 2*R. Or, 2*T when dispatched to node 3 with CPU-type D running at an effective "rate" R/2.

- "units of measurement" can be anything that is expressive enough to address a wide variety of computational tasks. Dhrystones reflect integer operations; Whetstones more akin to FLOPS; etc. But, it can be anything that is repeatable and applicable.

- "specify computational requirements" is an a posteriori effort. An algorithm is coded and then *quantified*. The analogy I made of the hardware guy and picking a processor was meant to illustrate how we do this all the time -- fit an application to a hardware implementation.

From this, I'd like to be able to make scheduling decisions. I.e., task X must be done by deadline T and requires Y operations to be performed in that time. As such, if I don't have the capability to perform those Y operations in the time allotted, it is foolish to consider scheduling task X (if X is HRT) *or* the expected value of the missed deadline will be V (if X is SRT).

- "scheduling decisions" indicate which of a suite of tasks are chosen to execute at a given time based on a set of numerical criteria. As above, each task has been "measured"... quantified. So, the scheduler knows how much "work" is required to do the job (e.g., rate X for time T or rate 2X for time T/2, etc.). There are also memory requirements but those are considerably easier to quantify.

- As this is a real-time system, there are timeliness constraints as well. A task may not have the luxury of taking 2*T time units to execute if "now + 2T > deadline" (assuming HRT wherein the task *must* complete before its deadline, else don't bother starting it!).

So, the scheduler and workload manager must look at the requirements of each task "made ready" to execute to determine where and when it is best to dispatch the task. Dispatching it on a node that has insufficient AVAILABLE memory (note that other tasks are executing on that node so the memory complement isn't 100% available for this new task) is foolhardy; it will eventually abend and need to be restarted (somewhere).

Dispatching it on a node with rate "X (available)" is acceptable if the (HRT) deadline can be met at that "rate". Or, if the projected lateness doesn't devalue the task unacceptably (SRT).
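To show the *shape* of that decision (not my actual implementation -- the types, units, and names below are purely illustrative), the admission check amounts to something like:

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t work_units;     /* "standardized operations" still to do   */
    uint64_t mem_bytes;      /* working-set requirement                 */
    uint64_t deadline_us;    /* absolute deadline                       */
    bool     hard;           /* HRT: don't bother starting if late      */
} task_req;

typedef struct {
    uint64_t avail_rate;     /* standardized ops/second *after* existing
                                commitments                             */
    uint64_t avail_mem;      /* bytes not yet spoken for                */
} node_state;

/* Can this node finish the task before its deadline? */
bool admit(const task_req *t, const node_state *n, uint64_t now_us)
{
    if (t->mem_bytes > n->avail_mem)
        return false;                    /* would eventually abend      */

    if (n->avail_rate == 0)
        return false;

    uint64_t finish_us = now_us + (t->work_units * 1000000u) / n->avail_rate;

    if (finish_us > t->deadline_us)
        return !t->hard;                 /* HRT: reject; SRT: caller
                                            weighs the devalued result  */
    return true;
}

Everything interesting is hidden in "work_units" and "avail_rate" -- which is exactly the units-of-measurement question I'm asking.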

A hole in clay/granite wouldn't be the same WORKLOAD as a hole in sand/mud. Just like finding the first 5 primes exceeding 3 is considerably different from finding the first 5 primes exceeding 2430823409224485734 (or any "arbitrary number").

As I said (above), each task is quantified. Just like the ISRs in your designs, the UI code, etc. You *know*, at release time, how much "effort" is required to service that ISR, keystroke, etc. How you currently express that effort in your existing designs is probably in units of *time* -- i.e., you assume the "rate" is whatever it is for your current implementation. If your employer indicated he wanted to save a few pennies and move to a slower processor (or slower memory, etc.), you could give a shirt-cuff estimate of the sorts of problems you would *expect* to encounter.

If you were charged with porting an existing, measurable application to another hardware platform (vendor goes out of business, fab burns down, etc.), you would consciously and subconsciously be looking at prospective replacement processors in terms of their "capabilities" wrt your current "reference design": "This processor has a different instruction set than the previous one. And, operates on 2N bit words instead of N. And the clock is F/2 instead of F. But, all in all, it seems like it is *roughly* equivalent in terms of horsepower -- operations per unit time. So, I'm reasonably confident that the existing algorithms can be ported to this new architecture/platform and still achieve the same performance that the original product achieved (i.e., we won't find ISR's too long to get serviced in the time allotted; user actions too sluggish to be considered responsive; etc.)"

No (organic) "management" involved. Instead, it's data fed to algorithms that make these decisions dynamically at run-time. Deciding how best (if at all possible!) to match the existing computational demands on the hardware currently available (or that can be brought on-line). But, you need some sort of units of measurement for every "resource" that a task employs in order to make those decisions!

It's comparable to grid scheduling but at a finer grain and with an RT framework. (grids tend to just make "best efforts" with no timeliness guarantees; they also tend to have fixed resource sets -- no ability to bring additional resources on-line, on demand.)

Reply to
Don Y

That's a ridiculous analogy. The energy/effort inputs to those 3 tasks are very different despite the end results being the same.

Big O analysis 8-)

Seriously, there is no good way to estimate "work" ... in most situations there are too many variables - too many conditional paths to consider.

The only reliable way is to measure "interesting" metrics under realistic conditions (inputs, loads, etc.), and then use those measurements to improve scheduling later.

There are systems that collect execution statistics and use them for optimization: e.g., trace based OS schedulers that schedule by (estimated) # of instructions to be executed, "hotspot" JIT compilers that optimize functions (compile units) differently depending on frequency of execution, etc.
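As a rough illustration only (the names and the 1/8 smoothing factor are made up), the bookkeeping can be as simple as an exponential moving average of measured runtimes per task per node type:

#include <stdint.h>

typedef struct {
    uint64_t ema_us;     /* smoothed runtime estimate, microseconds */
    uint32_t samples;    /* how many measurements stand behind it   */
} exec_metric;

/* alpha = 1/8: new measurements nudge the estimate gradually */
void record_runtime(exec_metric *m, uint64_t measured_us)
{
    if (m->samples == 0)
        m->ema_us = measured_us;              /* seed the estimate  */
    else
        m->ema_us = m->ema_us - (m->ema_us >> 3) + (measured_us >> 3);
    m->samples++;
}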

Checkpoints. All the world is a (sub)transaction.

You don't unless all your resources are identical, or unless (as above) you have existing execution metrics from each of the possibilities.

George

Reply to
George Neuner

Hi George,

Caught up, yet? BTW, I saw a news article that your *snow* has ...

There are inefficiencies in *how* the fixed mass is moved but the mass is the same and the displacement is (or can be) the same.

A 32b multiply on a machine that has 32b registers and 32b ALU is more efficient than on an 8b machine with 8b registers. But, you still end up with an N-bit result.
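Purely as an illustration (not production code), the low 32 bits of that multiply, done the way an 8-bit machine with only an 8x8 multiplier has to do it, look like this -- the answer is identical, the effort is not:

#include <stdint.h>

/* low 32 bits of a*b, built from 8-bit partial products */
uint32_t mul32_via_8bit(uint32_t a, uint32_t b)
{
    uint8_t  ab[4], bb[4];
    uint32_t result = 0;

    for (int i = 0; i < 4; i++) {            /* split into 8-bit limbs */
        ab[i] = (uint8_t)(a >> (8 * i));
        bb[i] = (uint8_t)(b >> (8 * i));
    }

    for (int i = 0; i < 4; i++) {
        for (int j = 0; i + j < 4; j++) {    /* only products that land
                                                in the low 32 bits     */
            uint16_t p = (uint16_t)ab[i] * bb[j];
            result += (uint32_t)p << (8 * (i + j));
        }
    }
    return result;                           /* == (uint32_t)(a * b)   */
}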

Yet, we each "magically" decide that our "finished products" *will* work... just because we've watched them work "many times" in the lab!

And, we base resource decisions for subsequent projects largely on past experiences: this "feels" like a job of complexity similar to product Q; product Q required A, B and C so that's a good starting point for our estimation of what *this* project will require.

Isn't that what I said: "specify computational requirements" is an a posteriori effort. An algorithm is coded and then *quantified*.

Consider how RT scheduling decisions are made. Aside from decisions based solely on time of deadline (e.g., EDF), how do you determine "Earliest Finish Time", "Least Slack Time", etc. A PRIORI?

Or, extent of a processor reservation to put in place before beginning execution of said task?
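The *arithmetic* is trivial -- e.g., a least-slack-time test is just the sketch below (illustrative names only; it presumes you already have the two numbers the whole question hinges on: remaining work in some standardized unit, and the rate at which the candidate node retires that unit):

#include <stdint.h>

/* slack = time until deadline minus time still needed to finish */
int64_t slack_us(uint64_t now_us, uint64_t deadline_us,
                 uint64_t remaining_work,   /* standardized units left */
                 uint64_t node_rate)        /* units per second        */
{
    int64_t needed_us = (int64_t)((remaining_work * 1000000u) / node_rate);
    return (int64_t)deadline_us - (int64_t)now_us - needed_us;
}

A least-slack-time scheduler runs the ready task with the smallest slack; negative slack means the deadline is already lost. Getting "remaining_work" and "node_rate" is the hard part.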

One of the advantages of my implementation is that I don't need an explicit checkpoint. Assuming a job hasn't crashed, I just pause it and *move* it, intact.

But, you still have no way of knowing if checkpoint X represents 1/3, 1/2, or 3/4 of the "work" involved. I.e., how much *more* effort remains to finish the job. I.e., how to schedule the *balance* of the job.

I've asked for units by which to tabulate those. E.g., most folks only know *their* code on *their* hardware platform in *their* mix of other jobs "seems to execute in the time allotted". I.e., rarely even putting numbers on how quickly events are recognized, processed, etc. (if it's not an ISR, the framework looks too fuzzy to instrument, so don't bother).

If you have 12% of a processor that is capable of performing "N units of work per second" available, how long do you expect a task that requires "M work-seconds" to take to execute?
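(Naively, if a "work-second" means N units' worth of work, the answer is M / 0.12, i.e. about 8.3 * M seconds -- a 3 work-second task needs roughly 25 seconds -- *provided* that 12% arrives as a steady rate and not in bursts that straddle the deadline. That proviso is precisely the kind of detail the units have to capture.)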

I want to be able to give the workload scheduler criteria by which it can make *an* optimal decision for this particular job in this particular execution environment. Do I bring another node on-line to add compute resources to the system? If so, which one (given that they can have different resource complements and costs)? How do I shuffle existing jobs around (keeping in mind that there is a cost to move a job) to take advantage of the existing -- or *new* -- resources? Does it make sense to even *start* a job -- or allow some other job to continue consuming resources -- in light of a known deficiency (why *add* a competitor to the mix if it won't be able to meet its deadlines -- and, possibly compromise the ability of other tasks to continue to meet theirs?)

The difference between, e.g., a grid scheduler and my approach is that I don't have a NoW (network of workstations) that I can dedicate to individual "jobs" but, rather, have to shoehorn jobs onto nodes that are already executing other jobs. E.g., a node may have X% capacity available, now... and Y% capacity available at some later point in time. To the scheduler, one node appears to be a "different" node than the other (like running two identical workstations at different effective clock rates).

--

I'll have to followup via email on your project status.  Off to 
MD, presently (cripes, one thing about getting older is you 
replace *one* "medical professional's" name in your address 
book with *many*!) 

[I guess that's fine -- until you replace *those* with that of 
*one* mortician!  :> ]
Reply to
Don Y

Still digging out, but getting there. 3 weeks lost to jury duty is a *long* time.

Had our first (and only so far) 90 degree day on Sunday. Promptly fell back to low 80s.

And you know how reliable those estimates are: 85% of all projects run long and go over budget. ~30% of all projects are abandoned before they are finished.

Because it's almost impossible to do the reverse. The cost - in energy, in time, in needed resources (ALUs, FPUs, etc.) - depends on the target and on the coding itself, assuming the general case in which there are multiple codings that produce the same result.

You *do* know that if the checkpoint encodes it. Checkpoints are not only restart points, they also are meta data about the state of the program.

- checkpoint A before loop

- loop from 1 to 100,000,000 : checkpoint L(n) after loop iteration n

- checkpoint B after loop

You know if you reach B, you've passed the huge loop. You know if you're at L(n), you're somewhere in the loop.

This can be applied even just for monitoring without the restart semantics. Think about a rally race: you know (approximately) where any car is on the course by the last checkpoint it passed.
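A minimal monitoring-only sketch (the names are invented; a real checkpoint would also carry restart state):

#include <stdint.h>

typedef struct {
    const char *label;      /* "A", "L", "B", ...                  */
    uint64_t    iteration;  /* n, for checkpoints inside the loop  */
    uint64_t    total;      /* 100,000,000 for the loop above      */
} checkpoint;

volatile checkpoint last_cp;    /* a monitor polls this asynchronously */

void mark(const char *label, uint64_t n, uint64_t total)
{
    last_cp.label     = label;
    last_cp.iteration = n;
    last_cp.total     = total;
}

A monitor that sees label "L" estimates the job is iteration/total of the way through the loop; label "B" means the loop is done - the rally-checkpoint idea, in code.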

You can't answer that without knowing how many "work seconds" the processor does at 100% utilization. Consider processor A is a 1st generation Pentium MMX and processor B is a Haswell i7 Xeon. Vastly different capability, virtually impossible to compare meaningfully because there are SO many differences.

You need execution metrics from all the _types_ (and _speeds_ if different) of processors. Then you look at the loads and decide if some processor in your "working" set can handle the new job.
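For illustration only - the types and fields are invented - the dispatch-time use of those metrics can be as dumb as a table lookup:

#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *cpu_type;    /* e.g. "slow accounting box"           */
    uint64_t    measured_us; /* runtime observed on an unloaded unit */
} job_metric;

/* Return the index of the first processor type whose *measured*
 * runtime fits the deadline; -1 if none can.  A real scheduler would
 * also weigh current load, memory, and the cost of moving the job.  */
int pick_node(const job_metric *m, size_t n_types,
              uint64_t now_us, uint64_t deadline_us)
{
    for (size_t i = 0; i < n_types; i++) {
        if (now_us + m[i].measured_us <= deadline_us)
            return (int)i;
    }
    return -1;
}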

Load isn't easily equatable to clock speed - and I would hesitate to think about it in that way. Unless you really *are* varying clock speed, I would just consider load as "start delay".

If you have a run scoreboard that indicates a currently loaded machine is due to free up shortly, and it can meet your new job deadline (if it really frees up), then certainly you can consider that. However, that would be a quite advanced scheduler ... and you still need real execution metrics to make it work.

George

Reply to
George Neuner

Better you than me! :> In fact, if I get served for a GJ, I'll recommend you in my place!! (no need to thank me!)

80? What's that? I think last time we saw 80 (for a high) was back in January. I think our nightly lows are probably close to that! :-/

Watch your mail for the balance. Trying to take advantage of the humidity to tweak some of the other biscotti Rx's -- so it will be a long night! But, first, have to run fetch the other liqueurs before the packy closes! Probably should grab another dozen eggs, too! :-/

Reply to
Don Y
