OO example

Hi,

OK, to put the "C++" vs. "C" discussion in more realistic terms... here's a concrete example of a recent application that I *thought* would be perfect for a C++ (i.e., a more formal OO model) implementation:

The goal was a Rubin-esque (tee hee hee... play on words!) gesture recognizer. As "gesture" is a highly overloaded term (trust me :> ), I'll explain for folks who aren't familiar with Rubin's work.

The recognizer attempts to match 2-dimensional "drawings" to prestored "templates". Folks who have used PDAs with things like Jot and Graffiti will understand. I.e., if the user draws something that "looks like a '2'" (based on templates), the recognizer will produce a '2' in response. Likewise, if the user draws something that looks like an 'L', the recognizer will produce an 'L'.

Rubin's approach was to identify "features" in the gesture. Then, to weight the value of each of these features given the "glossary" of valid gestures in such a way that you could *score* an arbitrary input "drawing" to produce an indication of what "gesture" it most closely matches (if any).

This is called an on-line recognizer in that it needs to "watch" the actual drawing being made. Or, more specifically, it needs to see a time-ordered set of (x,y) pairs from which it can deduce the appropriate "features".

[contrast this with an offline recognizer which just looks at the resulting 2 dimensional *image* -- like the post office trying to recognize the ZIP code on your mail without watching you *draw* each of the digits]

The "features" that are measured seem to be somewhat ad-hoc. For example, you can record the total amount of "ink" in the gesture (i.e., the total distance traveled by the "pen"). Or, the aspect ratio of the overall image (width / height). Or, the "initial trajectory" (the absolute angle of the first "stroke" in the gesture). Or, the "curviness" of the gesture (the sum of the absolute value of each angular displacement between the "microstrokes" in the gesture).

There is no "right" set of features. The features chosen depend on the character set you are trying to support, the resources you have available to process (in real time), and the robustness you desire in the algorithm.

This means that it is typically foolish to come up with *one* set of features and expect them to apply universally to any set of gestures (characters).

[apologies for playing loose with terms, here]

So, this suggests that an implementation that can *build* a recognizer at *run* time can be a big win. If, for example, the application is only expecting a small subset of gestures at some particular point in its execution (e.g., perhaps it is expecting the user to enter a three digit *number* so "letters" would be meaningless!), then it could rebuild the recognizer on-the-fly to support *just* those gestures and the features that best "recognize" them.

With this in mind, you can create a base class called "Feature" which all derived features inherit from. It would include methods to instantiate itself, update its current "value" (based on another "Point" being provided to it from the "gesture input device"), report its "value" (in *some* sort of units suitable for weighing in the overall recognition algorithm), etc.

A "Recognizer" would then consist of some collection of Features and a wrapper method that passes each "Point" from the input device to each of them for evaluation.

A list of "Gestures" would be given to the Recognizer to represent the class of valid templates to be examined. Eventually, Recognizer.resolve() would be invoked (implicitly or explicitly) to cast its judgment -- by conferring with its constituent Features -- as to the identity of the Gesture presented to it.
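
In rough C++ terms, the skeleton might look something like this (a minimal sketch only -- the class names, the members and the use of double for value() are illustrative, not my actual code):

// Minimal sketch of the structure described above; names and types illustrative.
#include <cmath>
#include <memory>
#include <vector>

struct Point { int x, y; };                  // one sample from the gesture input device

class Feature {
public:
    virtual ~Feature() = default;
    virtual void reset() = 0;                // start accumulating a new gesture
    virtual void update(const Point& p) = 0; // fold in the next sampled Point
    virtual double value() const = 0;        // current value, in "scoreable" units
};

class TotalInk : public Feature {            // total distance traveled by the "pen"
public:
    void reset() override { ink_ = 0; have_prev_ = false; }
    void update(const Point& p) override {
        if (have_prev_) {
            double dx = p.x - prev_.x, dy = p.y - prev_.y;
            ink_ += std::sqrt(dx * dx + dy * dy);
        }
        prev_ = p;
        have_prev_ = true;
    }
    double value() const override { return ink_; }
private:
    double ink_ = 0;
    Point  prev_{};
    bool   have_prev_ = false;
};

class Recognizer {
public:
    void add(std::unique_ptr<Feature> f) { features_.push_back(std::move(f)); }
    void feed(const Point& p) {              // wrapper: hand each Point to every Feature
        for (auto& f : features_) f->update(p);
    }
    // resolve() would weigh each Feature's value() against the Gesture templates
private:
    std::vector<std::unique_ptr<Feature>> features_;
};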

Sounds simple. *And* elegant! (lots of details omitted)

The problem is, there is just *way* too much overhead in this sort of approach.

For example, almost every feature would like to see deltaX and deltaY -- perhaps in addition to X and Y -- for each successive Point. The distance Feature would use them to compute Euclidean distance -- sqrt(deltaX^2 + deltaY^2). The curviness Feature would use them -- along with *remembered* copies of the previous deltas (stored, perhaps, as a "Vector") to compute the angle between these two "Vectors" which can then be accumulated in the "Curviness accumulator". etc.

So, instead of passing Points to each Feature.evaluate() method, you might want to pass the Point and the Vector (from the previous Point).

But, some Features may rely on the values of other Features for their computation. How much do you pass to each evaluate() method? How much does this bastardize the encapsulation that you want to exploit in the language?

The *clean* way of doing this is to have each Feature compute everything it needs *internally*. No dependencies. No ordering of operations/evaluation, etc. If every Feature needs the deltaX,deltaY Vector (from previous Point), then every Feature computes it, etc.
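
E.g. (continuing the sketch above, still illustrative), a "clean", self-contained Curviness Feature ends up redoing the very subtraction that TotalInk just performed for the same Point:

// Curviness done the "clean" (self-contained) way: it recomputes deltaX/deltaY
// itself and keeps its own copy of the previous delta "Vector" so it can
// accumulate the turn angle.  (Reuses Feature and Point from the sketch above.)
#include <cmath>

class Curviness : public Feature {
public:
    void reset() override { total_ = 0; n_ = 0; }
    void update(const Point& p) override {
        if (n_ >= 1) {
            double dx = p.x - prev_.x, dy = p.y - prev_.y;    // duplicated work
            if (n_ >= 2) {
                // angle between the previous delta and this one, via cross/dot
                double ang = std::atan2(prev_dx_ * dy - prev_dy_ * dx,
                                        prev_dx_ * dx + prev_dy_ * dy);
                total_ += std::fabs(ang);
            }
            prev_dx_ = dx;
            prev_dy_ = dy;
        }
        prev_ = p;
        ++n_;
    }
    double value() const override { return total_; }
private:
    Point  prev_{};
    double prev_dx_ = 0, prev_dy_ = 0, total_ = 0;
    int    n_ = 0;
};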

But, that gets incredibly expensive. If you are running the loop at ~500Hz and doing all this redundant math and extra calling (method invocation) overhead, you quickly run out of CPU.

So, you start cheating. E.g., you don't compute true Euclidean distance but, rather, you compute distance^2 (i.e. to save the sqrt() invocation). Or, you fall back to fixed point math to avoid the cost of float/double (keep in mind, many of the deltas are very small, here. You may have a dX of 1 and a dY of 12. Or, a dX of 2 and a dY of 3.)

In each case, you either add complexity to the algorithm to compensate for the overhead *or* you cause the algorithm to deviate from "ideal" (how does computing the sum of distance^2 compare, realistically, to summing distance itself?)

A non-OO approach wins (performance) because it doesn't try to put all this structure in place. But, it is just as fragile for *other* reasons.

The C++ approach is much more elegant (given infinite resources) as you can freely overload operators (e.g., if I opt to use a 10.6 fixed point representation for Points, I can overload my arithmetic operators so that they recognize this format and ensure the resulting "values" are renormalized to that 10.6 format -- or whatever!)

And, I can overload the value() member so that its results can be linearly weighted in some reasonable manner to come up with a "score" for the gesture. The code *reads* much cleaner (e.g., you can say A + B instead of add(A,B) everywhere). And, it is easier to see what is going on because all the machinery is buried, exposing the algorithm itself.
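
E.g., a 10.6 type might be sketched along these lines (just an illustration of the idea; the exact widths and renormalization rules are whatever you pick):

// Sketch of a 10.6 fixed-point type with overloaded arithmetic, so the
// renormalization is hidden and the algorithm reads as plain "A + B", "A * B".
#include <cstdint>

class Fix10_6 {
public:
    static constexpr int SHIFT = 6;                   // 6 fractional bits
    constexpr Fix10_6() : raw_(0) {}
    static constexpr Fix10_6 fromInt(int v) { return Fix10_6(int16_t(v << SHIFT)); }
    constexpr int16_t raw() const { return raw_; }

    friend constexpr Fix10_6 operator+(Fix10_6 a, Fix10_6 b)
    { return Fix10_6(int16_t(a.raw_ + b.raw_)); }               // same scale: no renorm
    friend constexpr Fix10_6 operator-(Fix10_6 a, Fix10_6 b)
    { return Fix10_6(int16_t(a.raw_ - b.raw_)); }
    friend constexpr Fix10_6 operator*(Fix10_6 a, Fix10_6 b)
    { return Fix10_6(int16_t((int32_t(a.raw_) * b.raw_) >> SHIFT)); }  // renormalize to 10.6

private:
    explicit constexpr Fix10_6(int16_t r) : raw_(r) {}
    int16_t raw_;                                      // 10 integer bits . 6 fraction bits
};

// usage:  Fix10_6 a = Fix10_6::fromInt(3), b = Fix10_6::fromInt(2);
//         Fix10_6 c = a * b;                          // c.raw() == 6 << 6, i.e. "6.0"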

I suspect it would be much easier for someone to maintain code in that form than to have to worry about the little details that accompany the "C"-style implementation.

Unfortunately, real world constraints make the elegant approach unrealistic.

Reply to
D Yuniskis

This has absolutely *nothing* to do with C++ and OO design.

As a counterexample, I once worked on a bar code reader where there were individual software modules that decoded each type of barcode recognised. I.e. there was a module to recognise Code 2of5 barcodes, a separate one (which I wrote) to recognise Interleaved 2of5 barcodes, yet other modules for Code39, EANx etc etc.

In all, at the time I worked on the software, there were around 9 or 10 different barcodes supported, each of which had its own decoding module.

The software scheme was exactly as you describe :

Each decode module was given the input stream (a list of the times between light and dark transitions on the input sensor) in real time as the data became available. Each module attempted a decode as far as the data would currently allow and, as the individual modules decided the barcode was not for them, they removed themselves from the list, as with your gesture recognition modules.

This system worked perfectly, in real time, for hand-swiped barcodes with light-dark transition times down to around 50 microseconds or so.

Would it have scaled to a system with several hundred or several thousand different barcode schemes ? Not a chance ! The architecture you describe is not suitable, and for the reasons you state : it is very inefficient for each module to duplicate the work of the others. You pay a price for modularity when implemented in this fashion.

But what does that have to do with 500 MHz processors, OOD and C++ ?

Jack. The system I described was designed and written back in the 1980s in 8051 assembler language on a machine running at around a half a million instructions per second. At that point in time OOD was but a glint in researchers' eyes and certainly had not made it into any 8051 assemblers which I am aware of.

Your preposterously long, meandering, tedious and ultimately pointless post amounts to a statement that there are modular designs which can only be used where efficiency is not an issue. Duh.

--
Boo
Reply to
Boo

I'm sorry you think so. I, however, think it is a *perfect* example of an *ideal* OO approach to a problem that could benefit *greatly* from OO techniques. The question is whether it can exploit the formal structure that many OO languages impose (vs. just an ad hoc OO design)

"Features" are ideal objects to encapsulate. It allows each feature to decide what is important to itself and how best (including "how most efficiently") to compute and represent the "value" of that feature in that instance. Any "state" that a Feature requires is preserved *within* the Feature object itself (e.g., Initial Trajectory needs to stop looking at data after the first N Points have been examined -- hence the term INITIAL trajectory).

"Gestures" (templates/criteria for categorizing Feature sets corresponding to a particular "drawing") similarly allow the details of a particular "drawing" to be bundled together in a form that hides them from other "drawing definitions". Note that a Gesture might be an active *or* a passive object, depending on the overall design of the Recognizer. In addition to formal definitions (and REdefinitions) for themselves (templates), Gestures may need/want to track their frequency of use, other Gestures with which they are often ambiguated, etc.

"Sets of Gestures" can be dynamic in nature -- treating them as collections which can be built on-the-fly by the application. E.g., if, at a particular stage in the application's execution, it can only legally recognize gestures for the digits 1 through 3, "ENTER", "CANCEL" and "?HELP?", then only these Gestures need be included in the "current gesture set".

From that set of gestures, the Gestures themselves can indicate which Features are pertinent to their recognition. From this, a Feature Set can be created -- dynamically.

A Recognizer can then examine the Feature set and determine how to order the computations within it so that all dependencies are satisfied.
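
In outline, that "build" step amounts to something like this (a sketch with invented names; the dependency ordering is just a topological sort):

// Sketch of building a Recognizer at run time from the currently-valid Gestures.
// Each Gesture lists the Features pertinent to it; each Feature lists what it
// depends on; the builder collects the union and orders it dependencies-first.
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

struct FeatureSpec {
    std::string name;
    std::vector<std::string> needs;      // names of Computations/Features it relies on
};

struct GestureSpec {
    std::string name;
    std::vector<std::string> features;   // Features pertinent to recognizing it
};

// Returns the needed Feature/Computation names, dependency-ordered (DFS topo sort).
std::vector<std::string>
buildFeatureSet(const std::vector<GestureSpec>& activeGestures,
                const std::map<std::string, FeatureSpec>& catalog)
{
    std::vector<std::string> ordered;
    std::set<std::string> done;

    std::function<void(const std::string&)> visit = [&](const std::string& name) {
        if (done.count(name)) return;
        done.insert(name);                       // (cycle detection elided)
        for (const auto& dep : catalog.at(name).needs)   // a missing name throws here:
            visit(dep);                                  // the "unresolvable dependency" case
        ordered.push_back(name);
    };

    for (const auto& g : activeGestures)
        for (const auto& f : g.features)
            visit(f);
    return ordered;
}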

I, for example, have a glossary of over 100 Gestures to call on. And, any subset of these may be active at any given time. The only realistic way of handling this is to treat each gesture as a distinct object and its requirements, equally so. And, to create a Recognizer dynamically at run time to handle that particular subset of gestures.

E.g., the only other alternative is to assume *all* gestures are valid, process *all* of the features that might be required for *any* of the gestures and then decide which results you can throw away. This isn't very smart engineering. It also won't work -- if '0' and '1' are the only "valid" gestures at a given point in the application, the Recognizer may happen to decide that the "circle" that the user has drawn is an 'O' (oh) instead of a '0' (zero). If you later rule out the 'O' (oh) as being "unaccepted in this context", then the Recognizer appears not to have recognized *anything*!

Yes, you can rank all of the gestures: maybe 'O' (oh) is at 0.95, and 'circle' is at 0.80 and '0' (zero) is at 0.75 [N.B. the differences between them might be attributable to Aspect Ratio of the overall Gesture -- 'circle's being rounder than zeros and oh's], but, then you have added more complexity to the algorithm as you now have to rank all of the gestures, discount those that are "not acceptable" and *then* make your decision. [You can, of course, discount unacceptable gestures before ranking and processing... but, you'll still be doing ALL of the Feature processing that all of the potential gestures might require]

I implemented this in C instead of C++ for the reasons outlined in my "pointlessly complex and absurdly wordy examply [sic]" but, apparently, *you* missed the point. My implementation *is* OO -- since doing it any other way (see above) would be clumsy and inefficient.

Instead of ranting about it, you might sit down and actually *try* to structure an algorithm to do this in a non-OO approach. Then, think about how unwieldy that approach will be to maintain AT RUN TIME. And, remember, smart algorithms *learn*. The time for dumb algorithms was decades ago! (the *device* should be trained, not the USER!)

So, you have to be able to account for the Recognizer adapting itself to the "gesture style" of the user and modifying itself so that it learns which Features are more/less important for each individual Gesture on a per-user basis.

And, do it without devoting most of the processor to this banal task! How much of your processor would you devote to debouncing keys on a keyboard?? :-/

However, the costs of the C++ implementation (specifically, those costs associated with running the Recognizer on each incoming data Point) made it too expensive. The equivalent algorithm in C meets the runtime goals.

But, there is no free lunch. The tradeoff, in this case, was vastly more complex "setup" requirements to build a recognizer. You can't just take advantage of the mechanisms that the *language* (e.g., C++) makes available for the encapsulation of objects to piece together feature sets, gesture sets, recognizers, etc. Instead, you have to do this manually. You have to build the appropriate scaffolding, hang the desired code fragments and data structures on it and then wrap it in something executable.

Furthermore, you can't easily debug it because you've just pasted together a bunch of "bits" (literally? :> ). You can't rely on the same level of support from the debugger to show you what your code is trying to do at any given stage because the "source" doesn't really exist for this "module".

As a throwback to the other thread that inspired this one, it goes to the nature of how well a (future) developer can support the codebase in each (C vs C++) type of implementation (spaghetti code is not an option). As in that other thread, I contend that both have huge pitfalls and a "clear winner" is a tough engineering decision to make.

I *think* it does. I'm sorry that you can't see that distinction. Perhaps you can think of a non-OO way of approaching the problem?

Not really the same sort of design. You're just running one set of "data points" through N *procedural* implementations and waiting for one of them to "hit". I'm trying to run a set of data points through a *varying* set of "Feature calculators" which, in turn, are used to drive another set of "Gesture calculators" to recognize a particular "drawing". And, the context in which this occurs varies -- making the results *now* independent from the results a few seconds from now (when the application has advanced to another stage with different input constraints).

Gestures never remove themselves from consideration -- until the "drawing" is complete and the Recognizer decides who the winner is. [N.B. There is a second level recognizer that further refines this decision -- in a much more computationally expensive way -- but that is intended just to improve accuracy/robustness]

Presumably, your barcodes wouldn't, either. The barcode readers I've designed only discard bars (white or black) that they think can't apply to "them". I.e., they discard the oldest portion of the list of bar widths (times, in your example) that they know can't be included in a valid "label" that would satisfy their format requirements. The more recent bars remain active candidates until: all of them are ruled inappropriate (at which time, the particular barcode decoder remains active but has a "valid input queue" containing 0 bars); or, some decoder "claims" the "label" as successfully decoded (which essentially flushes all of the individual queues for each decoder. Note that this is really just resetting a pointer for each "Decoder" as there is only *one* queue that *all* decoders operate on).

The difference is, when decoding barcodes, you don't dynamically reconfigure the decoder to handle different codes out of a list of scores.

For example, there is nothing to be gained (in most decoders) by saying "only recognize symbols 'startA', odd digits and '2' in ABC Codabar; lowercase alphas, up to and including 'p', in Code128; and the digits '1', '3' and '8' in 3-of-9". Because, each *code* has the same processing requirements (i.e., get transition times, compute bar widths, normalize for processing latency, determine module size from bar widths, decompose bar sequence into modules, match modules against predefined module sequences, recognize symbol, advance to next symbol, verify framing symbols (if req'd), compute and verify checksum (if req'd) then signal success)

Even between different codes, much of the processing can be shared among different Decoders. E.g., all of the preprocessing of bar widths, computing equivalent module size, etc. (ABC Codabar doesn't fit this mold as nicely as the other codes. Nor would PostNet or PDF417 -- for obvious reasons! :> )

OTOH, the types of calculations that must be performed to recognize Gestures (i.e., the Features that are important for particular Gestures) varies with the gestures that you are interested in. For example, if you are trying to differentiate '1' from '0', then Curviness and Total Ink might be good candidates -- '1' has much less (normalized) total ink than '0' and '0' has a much higher Curviness than '1'! OTOH, the Bounding Box Feature might be meaningless (a '1' drawn at a slant can have the same overall Bounding Box as a '0').

Since the goal of the Recognizer is to be efficient *and* reliable, it doesn't want to waste time making computations that won't materially contribute to accuracy. (e.g., Curviness is expensive! If your current set of valid gestures is 'horizontal stroke', 'vertical stroke' and 'diagonal stroke', then Curviness doesn't buy you anything but it *costs* a lot to compute!)

This is rather easy to decide at run-time; you look at the coefficients for the various Features and how they apply to the Gestures that are "currently valid" and, from that, determine which have *any* impact on the outcome. And, since this can be done when the recognizer is dynamically configured/built -- i.e., not during "recognition" -- it moves the cost of making those decisions out of the time critical region!
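
I.e., something along these lines (a sketch only; "Weights" stands in for whatever scoring coefficients you actually use):

// Sketch of the configuration-time pruning: a Feature whose coefficient is
// (essentially) identical across every currently-valid Gesture can't change
// the ranking, so the Recognizer simply doesn't bother computing it.
// Assumes every Gesture carries a weight for every Feature.
#include <cmath>
#include <map>
#include <set>
#include <string>
#include <vector>

using Weights = std::map<std::string, double>;     // Feature name -> coefficient

std::set<std::string>
usefulFeatures(const std::vector<Weights>& activeGestureWeights, double eps = 1e-9)
{
    std::set<std::string> keep;
    if (activeGestureWeights.empty()) return keep;

    for (const auto& entry : activeGestureWeights.front()) {
        const std::string& feature = entry.first;
        double w0 = entry.second;
        for (const auto& g : activeGestureWeights) {
            if (std::fabs(g.at(feature) - w0) > eps) {  // it discriminates somewhere
                keep.insert(feature);
                break;
            }
        }
    }
    return keep;                                        // everything else is dead weight
}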

I don't believe there are similar criteria that can be applied to decoding barcodes (i.e., things that can be done to reconfigure the entire *process* outside of the actual decoding of data). Well, maybe if you want to include 2D codes in your list of potential codes and then *discount* them in a particular "reconfiguration" then you could change the overall structure of the decoding process.

So, while you might have cause (WITHIN an application?) to discount certain codes, the processing that you do will remain unchanged. Being able to add or subtract codes from your list of valid codes AT THIS INSTANT isn't the same sort of optimization that can be available to gesture decoding. Being able to add/subtract individual *characters* from within a particular code borders on trivial!

By contrast, an OO *structure* to a gesture recognizer can yield *huge* wins at runtime. Consider that a 300ms latency is almost useless in user interface design. And, for gestures -- which can be issued (or, at least *initiated*) in a small fraction of that time -- low latencies can make a huge difference between what the user considers usable vs. annoying.

For example, if the user starts to reissue a gesture because you were too slow to acknowledge his previous gesture (think about it... if you fail to recognize a gesture, you typically provide *no* feedback so the user assumes that a "lack of response" means "try again, I didn't recognize that") then you could potentially end up having to ACT ON *two* gestures from the user which is probably NOT what he wanted. As a result, the user grumbles and issues some sort of "undo". And, here it is "undo YOUR mistake, not *mine*!"

In most barcode applications (that I have been involved with), the "usage process" tends to minimize this possibility -- either because there are specific places in the process where barcodes are recognized (e.g., read barcode, perform action, read next barcode, etc.) *or* because the time required to initiate another barcode swipe is much longer than the few hundred ms we're talking about here. (I think even a keyboard user has a hard time deciding to restrike a key in that sort of time interval)

Most commercial barcode readers deliberately impose a delay to prevent a barcode from being rescanned until it has completely left the field of view and reentered it (vibrating mirrors). Wands require lots of effort to swipe a second label (or the *same* label) in short proximity to the preceding one (I've never been able to "overflow" a barcode reader that I designed by rapidly *shaking* the label in front of the reading element. Your arms just get too tired!)

Yeah, I think *most* of us have done barcode decoders. I recall 0.007" minimum bar widths (before accounting for ink bleed) at 100 ips to be a big yawn back in the 70's. Nowadays, that's almost laughable (even for small processors)

No, the *architecture* is PERFECTLY suitable! Using C++ is the problem (because of the method invocation overheads and the encapsulation that it promotes).

But that's the point! OO doesn't *mandate* that you "implement it in this fashion". It is only C++ (in this example) that makes this requirement.

(Think about it. It should be fairly obvious how you can keep an OO design and abandon C++'s "structural requirements". If you can't make that intuitive leap, see the description at the end)

Who said anything about 500MHz processors? (And I think I have already addressed the OO and C++ issues)

(sigh) Apparently you failed to see the issue. Or, perhaps you are stuck thinking *inside* the box :-(

To *try* to extend the analogy to a reconfigurable barcode reader, imagine supporting PDF417, PostNet, Code128, UPC, etc. in a single "barcode reader" design (this is commonplace, today).

*And*, imagine your application dynamically changes which codes will be recognizable at any given point in time. *And*, imagine the cost of decoding a code -- of arbitrary length (e.g., Codabar with continuations, so you can't just statically allocate a giant buffer to hold *everything* while you leisurely process it) -- is substantial (in terms of time and resources). *And* that decoding barcodes is just a tiny aspect of your product (like debouncing keystrokes from a keyboard) -- unless you are building a standalone reader and have *all* the system's resources available to you (I never have had that luxury). *And*, that the user expects a valid decode to be the equivalent of "pushing a button on a keyboard" (i.e., your *keyboard* is a set of barcode labels, nothing more! EVERYTHING is done by scanning barcodes -- including typing in your name, password, etc.).

My current barcode reader doesn't handle PDF417 or PostNet (nor any other 2D code). So, it doesn't have to address those completely "foreign" processing concepts :-/ And, I've found that in all but *tiny* environments, it just isn't worth the trouble to reconfigure the reader to allow/disallow certain codes (easier to tag the results and discard results that you don't like -- but, that's because supporting more than one code usually increases the cost enough that supporting N codes isn't considerably more expensive)

Now, it makes sense to treat the individual Decoders as objects. Each decoder has attributes that the "Reader" (for want of a better name) can examine to determine how to configure itself. So, if you are only using linear codes (as in my case), it restricts itself to processing simple bars and spaces.

One *shared* portion of the Reader accepts transition times from the front end, normalizes for processing latencies and computes bar/space widths (I may fail to strictly adhere to this bar/space terminology and, instead, refer to them as simply "bars" :< ).

It queries the "currently supported Decoders" (at configuration time, not scan time!) to determine the range of possible barcode "widths" (number of bars) accepted by each of them along with the number of "modules" that they expect in a character (symbol).

Then, it processes the raw bar widths in varying size groups of bars (depending on the characteristics of the codes supported... typically 7 - 9 intervals) and, furthermore, divides the total width of those N(i) bars into the appropriate sized module.

Note that it is more efficient to do most of these computations "at the same time" as the algorithm need not re-fetch data that it already has "in its hand". E.g., I can compute the total width of bars 3 thru 9 (assuming a 7 bar symbol width) by taking the computed width of bars 1 thru 7, adding bars 8 and 9 and subtracting bars 1 and 2 (even bars are spaces, by convention, so, you drop a bar and its following space to advance the algorithm to the "next" possible start of a label) instead of adding bars 3 thru 9 directly.
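
I.e., something like (a sketch, names invented):

// Sketch of the incremental total: to advance a 7-bar window by one bar and its
// following space, add the two widths that just entered and subtract the two
// that just left, instead of re-summing all seven.
#include <cstddef>
#include <vector>

long nextWindowTotal(long prevTotal,                  // total of widths[start-2 .. start+nbars-3]
                     const std::vector<long>& widths,
                     std::size_t start,               // first bar of the *new* window
                     std::size_t nbars = 7)
{
    return prevTotal
         + widths[start + nbars - 2] + widths[start + nbars - 1]   // the two new bars
         - widths[start - 2]         - widths[start - 1];          // the two dropped bars
}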

It then makes these module sizes (for each particular "number of bars in the codes supported") available for the Decoders to access.

Each of the Decoders looks at the computed module size for the next set of potential bars/spaces (e.g., 1 thru 7, 3 thru 9, etc. for a 7 bar code) and applies it to the actual bar widths to "digitize" the bars. Then, looks them up and decides if a valid symbol has been encountered. And, if a valid symbol can be added to the list of symbols that *this* decoder has already processed, does the resulting "candidate label" comply with the rest of the rules for a label in that Code?

If not, the Decoder abandons some number of bars/spaces at the head of the raw bar width queue (typically, one bar and the following space) and looks at the module size computed for the next group of *7* bars (as made available by the preprocessing done in the Reader's data acquisition and normalization).

When a Decoder can make no further progress, it stalls. At that time, it may have marked some number of "raw bar widths" as expendable.

Other Decoders run and do the same sort of thing. A Decoder that expects *9* bars in a symbol looks at groups of 9 bars/spaces at a time (out of the same "raw bar width" queue) and the computed "module size" for that "group of 9" (which will probably be different than the module size computed for the "groups of 7" bars that are included therein).

Note that each Decoder could conceivably run in its own thread. This simplifies writing the code but wastes resources that need not be spent on something that can easily be stalled at well defined points in its execution!

When all Decoders have taken their turn analyzing the data, the Reader looks to see if any of the "raw bar width" data is disposable (by checking to see if its reference count has dropped to 0) and, if so, discards it and reuses that space in the queue (circular FIFO) for new bar widths arriving from the front end. Note that the feeding of raw bar widths into the FIFO need not be coordinated with the examination of existing data *in* the FIFO. In my case, I have a separate producer that feeds the FIFO just by grabbing a mutex controlling the FIFO. Everything is fine unless the consumer takes too long (in which case, the producer inserts an "insanely wide bar" in the data stream so that the Decoders all "abort" when they encounter it)

Or, until a Label has been successfully decoded (at which time, the buffer can be flushed UP TO AND INCLUDING THE LAST BAR/SPACE IN THE RECOGNIZED LABEL). And, the Reader restarted.
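
The FIFO bookkeeping boils down to something like this (a sketch; the mutex, the producer and the "insanely wide bar" abort marker are elided, and the names are invented):

// Sketch of the shared raw-bar-width FIFO: one producer, several Decoder
// consumers.  Each slot carries a reference count; once every Decoder that
// could still use a bar has released it, the slot can be reused.
#include <array>
#include <cstddef>

struct BarSlot {
    long width;      // normalized bar/space width
    int  refs;       // how many Decoders still consider this bar "live"
};

template <std::size_t N>
class BarFifo {
public:
    bool push(long width, int numDecoders) {       // producer: a new bar/space width
        if (count_ == N) return false;             // full: caller injects the "abort" bar
        slots_[(head_ + count_) % N] = BarSlot{ width, numDecoders };
        ++count_;
        return true;
    }
    void release(std::size_t slot) { --slots_[slot].refs; }  // a Decoder rules this bar out
    void reclaim() {                               // Reader: reuse fully-released slots
        while (count_ > 0 && slots_[head_].refs == 0) {
            head_ = (head_ + 1) % N;
            --count_;
        }
    }
private:
    std::array<BarSlot, N> slots_{};
    std::size_t head_ = 0, count_ = 0;
};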

*This* is an OO design! If you look at the *details* of the implementation (as outlined here -- much elided), each Code is an object. Its attributes include number of bars, list of valid symbols, number of modules, etc. Each Decoder processes a particular Code.

Unlike *your* example where you feed raw bar widths to each decoder and hope for the best, this one interleaves the *methods* of each Decoder and caches results so that the next Decoder's methods need not recompute values that have been previously computed by some *other* Decoder. (see following)

Beginning *after* the raw bar widths have been computed and placed in this (circular) FIFO, *your* approach (presumably) tries to present the entire FIFO to each Decoder "subroutine". Or, perhaps, presents one bar at a time to each. Same difference. Each of your decoders looks at N consecutive bars (N being defined by the characteristics of that decoder), computes the overall length, divides by the number of modules, (assuming it has N complete bars to work with) and then tries to "digitize" the actual bar widths. From there, it tries to promote each digitized group of bars to a valid symbol. And, a sequence of valid symbols to a valid *label*. Depending on how your application is structured, it stores its current "decoding state" so that it can resume where it left off when additional bars become available.

*My* scheme doesn't treat the Decoder as a monolithic "subroutine". Rather, the Reader invokes the "NumberOfBars()" member of each Decoder AT CONFIGURATION TIME (i.e., when you aren't "pressed for time!"). It then queries each Decoder's "NumberOfModules()" member. From this, it builds a list of unique tuples: (numberofbars, numberofmodules). I.e., no duplication.

Then, the *Reader* (*not* the Decoder "subroutines") goes through the queue of raw bar widths and totals 'numberofbars' consecutive bars and divides that total by 'numberofmodules'. For each needed combination of bars and modules as defined by the tuples. This computed "modulesize" is then associated with each corresponding (tuple, setofcontiguousbars).
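
I.e., roughly this (a sketch; the "Decoder" interface is boiled down to just the two queries mentioned, names invented):

// Sketch of the configuration-time step: ask every active Decoder how many
// bars and modules it works with, and keep only the unique combinations so the
// Reader computes each module size exactly once per group of bars.
#include <set>
#include <utility>
#include <vector>

class Decoder {
public:
    virtual ~Decoder() = default;
    virtual int NumberOfBars() const = 0;      // bars per symbol for this Code
    virtual int NumberOfModules() const = 0;   // modules per symbol for this Code
};

std::set<std::pair<int, int>>                  // unique (numberofbars, numberofmodules)
configure(const std::vector<Decoder*>& active)
{
    std::set<std::pair<int, int>> tuples;
    for (const Decoder* d : active)
        tuples.insert({ d->NumberOfBars(), d->NumberOfModules() });
    return tuples;                             // the Reader iterates these at scan time
}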

[My Reader also computes other parameters that contribute to the reliability of the decode; I have a very high first pass read rate *and* a very low error rate! e.g., the effective scan rate for each symbol in the label since dramatic changes in scan rate from one symbol to the next can suggest that the conclusions are false -- people can't change the speed that they move their arms *that* quickly; and, a vibrating mirror scanner can't do that at *all*! :> ]

This eliminates the overhead of having each Decoder process the same set of bars over and over again. (e.g., Decoder1 sums bars 1 through 7, computes module size. Decoder2 sums bars 1 through 7, computes module size, etc.) It also lets the Reader exploit computational efficiencies in computing the *next* module size for the next group of bars *or* for the next *larger* group of bars (e.g., computing the module size for bars 1 - 7 is a subset of the problem of computing the module size for bars 1 - 9 as this latter set *includes* the previous set).

If you disallow reconfiguration at runtime (as I mentioned earlier) lots of complexity falls out of this! E.g., you *know*, a priori, all of the (numberofbars, numberofmodules) tuples and can hard code them along with the data structures for storing the intermediate results.

But, the design is *still* OO. I just chose a different set of methods and a different order of applying them (interleaved instead of sequentially) to gain efficiencies at runtime.

The Gesture Recognizer is the same sort of problem. *Except* the calculations performed for each Feature are very different from other Features. (barcode Decoders are all, basically, the same fundamental algorithm -- although I have to handle ABC Codabar as a special case) *And*, the most significant aspect is that you *want* to be able to reconfigure the Recognizer at runtime because:

- the valid set of gestures changes from second to second

- restricting your processing to only those "valid" gestures decreases the costs geometrically.

If you look at the description of my barcode reader, you *should* be able to make some educated guesses about how a gesture recognizer can be done in C. But, it becomes terribly oppressive when you are trying to debug and maintain it -- the code gets "smarter than you" real quick! :-( ("Why the heck is it looking at *that* piece of data?? Ohhhh... *now* I see!")
Reply to
D Yuniskis

This depends on your programming skills :-) If you need to calculate deltaX/deltaY multiple times, of course you would use a Size-object and calculate it only once (bad example, maybe not for deltas, but for more complex things, like vector angles). This could be attached to the time sequence objects. Maybe it would need only a little more memory than a compact, but messy, C implementation, and not be much slower.

--
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Reply to
Frank Buss

Yes. But, the problem is, deltaX/deltaY may be *embedded* in the calculations of a particular Feature. I.e., not a "first class object" that you would typically "export" for others to use. And, it may be an expensive computation. (e.g., atan(deltaY/deltaX)).

But, more significantly, it may not be a computation that you need to make at all!!

If, for example, the current set of Gestures are all "straight line segments in varying orientations" (e.g., stroke up, stroke down, stroke left, stroke right), you can look at their "definitions" and see that they all "rely" on Curviness (sum of absolute angles between successive sampled data Points) to the same extent. I.e., Curviness will not help you differentiate one "stroke" from another "stroke".

So, why bother to compute it at all?? It's expensive and it doesn't change the results of the recognition algorithm! It's just wasted overhead!

On the other hand, some computation (buried within a Feature) may be useful to some *other* Feature's computation. So, if both of those Features are "active" in a particular Recognizer configuration, you would ideally like to be able to exploit the results of one for use by the other. You need to be able to reach into the one Feature and *grab* the computation's result (as a "friend", of sorts).

[Of course, this means using some skill when crafting the Feature implementations so that these potential intermediate results *are* accessible/exportable]

It's quite frustrating because the OO structure really is a win. But, depending on implementation language, poses different challenges in realizing a working design. :<

Again, the trick is knowing, a priori, what computations make sense to perform "always". E.g., deltaX and deltaY seem to be used a *lot* in the Features that I have defined. So, there is a strong incentive to making those available to all Features -- even though the savings brought about by their presence is laughable (you save a subtraction and a couple of data fetches).

My goal has been to spend lots of CPU cycles setting up a good algorithm (Recognizer) in the hope that the continually running algorithm (there's nothing that says the input device will ever be inactive so you have to plan on 100% duty cycle) will make up for this "excess" in the inner loop(s).

I'm *sure* there is a clever way to do it in C++. I just haven't stumbled upon the "right insight", yet!

Reply to
D Yuniskis

You are insane.

--
Boo
Reply to
Boo

Ad hominem attacks: the last refuge of the truly incompetent!

;-)

Reply to
D Yuniskis

Your last statement is true of anything. Engines would be lighter and more powerful if only we could find frictionless, refractory and infinitely strong materials to build them out of. Ditto electric motors, if only we could get room-temperature superconductors and magnets so strong they bend light (infinitely strong materials with infinite permeability would be nice, too).

Your complaint embeds the answer, if only you'd look at it -- some features depend on others, so find a way to structure your objects such that a feature links in to all the other features it needs, in such a way that its computation only needs to happen once per iteration instead of getting duplicated all over the map. This sounds like a typical thing that one does with OO: find a linkage like this, then wrap it in a manner that lets you use the compiler to enforce the correct rules of usage rather than depending on the programmer to avoid mysterious failures at run-time.

The _nice_ thing about doing this in OO, with an OO language, is that if you know the necessary interface from the get-go then you can do it the "easy slow" way first to demonstrate that it really _does_ work, then work toward the "fast hard" way incrementally (hopefully) without breaking the whole edifice in the process.

--
www.wescottdesign.com
Reply to
Tim Wescott

You may find inspiration in the "blackboard design pattern" to do this sort of thing. Blackboards are an AI concept, where different pieces of code post information onto a blackboard. Based on what is there, requests for information refinement are made.
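
The core of a blackboard is tiny -- something like this toy sketch (names invented):

// Toy sketch of a blackboard: computations post their results under a name;
// any Feature can read them, and a missing entry is the cue to request that
// the value be computed (the "refinement" part).
#include <map>
#include <string>

class Blackboard {
public:
    void post(const std::string& key, double value) { facts_[key] = value; }
    bool read(const std::string& key, double& out) const {
        auto it = facts_.find(key);
        if (it == facts_.end()) return false;      // not there (yet): ask for it
        out = it->second;
        return true;
    }
private:
    std::map<std::string, double> facts_;
};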

Groetjes Albert

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
Reply to
Albert van der Horst

Though a downside of this is that the computational cost of overloaded operators may not be as apparent to someone who comes along later as it would be if the operation were accomplished by an "unfamiliar" function for that type -- the thinking being that something obviously custom is more likely to get attention as to its potential cost than something seemingly pedestrian like "+".

Reply to
Chris Stratton

Yes. This was the point I made in the other OO thread. Using languages like C++ *can* be a great win -- *if* you have good personnel designing *and* maintaining. I contend that C++ is far less forgiving of "personnel deficiencies" than C (?)

Reply to
D Yuniskis

Sometimes you really *can* use the elegant solution. I could, for example, if I devoted two or three times the resources to this task! But, that impacts cost, size, power, etc. It's just not the *right* place to spend money... (e.g., debouncing keys -- *that* analogy is almost correct in that it shows that you have to keep scanning the keyboard and debouncing keys AT ALL TIMES even if the code isn't waiting for keyboard input)

Well, (to be pedantic) I'd have to create smaller objects called "Computations". It is *these* that should be shared. Features might use one or more of these in their formulation.

For example, deltaX and deltaY are (trivial) computations that many Features use as building blocks -- call them the components of a delta "Vector" (object). The magnitude (length) of that Vector is also useful to some Features. As is its *direction*, to others. Other Features might use some combination(s) of these, etc. Or, variants of computations that some other Feature performs (e.g., TotalAngle sums the signed individual angles between every pair of "vectors" whereas Curviness might sum the absolute value of that).
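
E.g., a shared "delta" Computation might be sketched like this (names invented; it is also the once-per-sample caching Frank and Tim describe):

// Sketch of a shared "Computation": the delta Vector between successive Points
// is computed at most once per incoming sample, no matter how many Features
// ask for it.
#include <cmath>

struct Point  { int x, y; };
struct Vector {
    double dx, dy;
    double magnitude() const { return std::sqrt(dx * dx + dy * dy); }
    double direction() const { return std::atan2(dy, dx); }
};

class DeltaComputation {
public:
    void newSample(const Point& p) {        // called exactly once per incoming Point
        prev_ = cur_;
        cur_  = p;
        ++sample_;
    }
    const Vector& delta() {                 // any Feature may call this; the actual
        if (computedFor_ != sample_) {      // subtraction happens at most once per sample
            delta_ = { double(cur_.x - prev_.x), double(cur_.y - prev_.y) };
            computedFor_ = sample_;
        }
        return delta_;                      // (meaningless until two Points have been seen)
    }
private:
    Point  prev_{}, cur_{};
    Vector delta_{};
    long   sample_ = 0, computedFor_ = -1;
};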

If Features are tiny (complexity wise) enough to make the overhead of a formal OO language "expensive", then these Computations are even more so. :<

My C implementation pieces together "computations" (lowercase 'c' there to signify these aren't formal objects) to avoid the overhead that would otherwise be present.

Trying to tackle this in a formal OO language yields worse than just "C++ overhead". Note that the binding is *really* delayed in this case to *run*-time: a static executable has all linkage resolved at compile time; a "dynamic" executable has all linkage resolved at "load time"; *this* executable delays binding until the *executable* decides how things should be bound (resolved).

It would be *great* if each "Object" could just declare its dependencies and they were automatically resolved. That's what much of the C scaffolding that I have does. In my (current) approach, it is possible for the developer to have dependencies that can *not* be resolved at this late binding. While the software can detect this as it tries to build a Recognizer, there's not much the software can *do* about it! (i.e., if you want to recognize Gestures x, y and z and they require processing Features a, b and c using Computations l, m and n --> then l, m and n had better exist *somewhere*!)

I'm not sure if a compiler/linkage editor could truly protect a developer when it comes to this, though. It would have to know about all of the Objects (Computations, Features, whatever) so that it could make a dry run at resolving all *potential* names "at build time". If one were to develop a Feature and just declare some dependency on an "external" object, there's no guarantee that those other objects would be available at run-time/load-time. I.e., *something* would have to be able to catch an exception thrown signaling "referenced object missing" (that's what my C executable does. It is acceptable since the mechanism to build the Recognizer has to return a result to its caller -- part of that result, naturally, is an indication of whether or not the Recognizer was able to be built!)

I.e., you would have to essentially *try* to statically link everything together even though they *won't* ever be statically linked.
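
For contrast, the sort of run-time check I'm describing amounts to roughly the following (a sketch with invented names; my actual C code differs):

// Sketch of the run-time check: the Recognizer builder walks the declared
// dependencies and, instead of an unhandled "referenced object missing" at
// recognition time, hands its caller a result saying whether the Recognizer
// could be built and what was missing.
#include <map>
#include <set>
#include <string>
#include <vector>

struct BuildResult {
    bool ok = true;
    std::vector<std::string> missing;      // dependencies that could not be resolved
};

BuildResult resolveDependencies(
        const std::vector<std::string>& wanted,
        const std::map<std::string, std::vector<std::string>>& registry)  // name -> its deps
{
    BuildResult result;
    std::set<std::string> seen;
    std::vector<std::string> todo = wanted;

    while (!todo.empty()) {
        std::string name = todo.back();
        todo.pop_back();
        if (!seen.insert(name).second) continue;        // already handled

        auto it = registry.find(name);
        if (it == registry.end()) {                     // l, m or n doesn't exist anywhere!
            result.ok = false;
            result.missing.push_back(name);
            continue;
        }
        for (const auto& dep : it->second)
            todo.push_back(dep);
    }
    return result;                                      // caller decides what to do about it
}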

I first wrote the Recognizer in C++ -- encapsulating each Feature in the "obvious" way. I verified the design, validity of my algorithms, etc. But, quickly noticed the overhead that was being spent invoking all of these methods and recomputing things that had been computed by other Features, etc. At the same time, the idea of dynamically reconfiguring the Recognizer struck me as a *huge* win. So, I was stuck trying to solve two competing problems (typical engineering scenario :> ). Neither one particularly straightforward.

(note that this doesn't even address the issues of how you do the actual computations efficiently!)

So, I'm stuck with a C implementation that *works* -- but is going to be a real nightmare for anyone to modify, later. I think if I could settle for a static reconfiguration (i.e., at compile time :< ) then I could easily get the performance I need just with clever use of macros (i.e., have each computation encased in a macro and, preface that macro with a test to see if a macro by that name already exists; if so, replace this instance of the macro with /* */ )
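
I.e., one way to get that effect (an untested sketch of the idea; it only "works" because every Feature's fragment gets pasted into the same inner loop):

// Sketch of the compile-time trick: each shared computation lives in a macro,
// and an include-guard-style test lets only the FIRST feature fragment that
// asks for it actually emit the code; later copies compile away.
#include <cstdio>

#define COMPUTE_DELTAS() \
    dx = pt[i].x - pt[i-1].x; \
    dy = pt[i].y - pt[i-1].y;

struct Pt { int x, y; };

int main() {
    Pt pt[] = { {0,0}, {1,2}, {3,3}, {4,7} };
    long dx = 0, dy = 0, ink2 = 0, proxy = 0;

    for (unsigned i = 1; i < sizeof pt / sizeof pt[0]; ++i) {
        /* ---- Distance feature fragment ---- */
#ifndef DELTAS_DONE
#define DELTAS_DONE
        COMPUTE_DELTAS()            /* first user: the subtractions are emitted */
#endif
        ink2 += dx * dx + dy * dy;

        /* ---- another feature fragment needing the same deltas ---- */
#ifndef DELTAS_DONE
#define DELTAS_DONE
        COMPUTE_DELTAS()            /* already defined above: this copy drops out */
#endif
        proxy += dx + dy;           /* stand-in for a second consumer of the deltas */
    }
    std::printf("ink^2=%ld proxy=%ld\n", ink2, proxy);
    return 0;
}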

Or, maybe just add another processor devoted to this silly task... I just need to "think smarter"! ;-)

Reply to
D Yuniskis
