Hi,
I have a process that emits element_t's at some rate. The number that will be emitted is not known a priori. And, will vary with each invocation.
Another process accumulates these element_t's into an array (list?). When the final element_t has been delivered, this second process will be responsible for determining if the accumulated array matches one or more *const* element_t arrays "hardcoded" into it's implementation.
[That gives a fair bit of leeway as to *how* they are encoded!]*Or*, any of a number of arrays that have been dynamically created over time.
[Preference is given to the dynamic arrays over the const ones]I want the second process to deliver a result of the form: {NO_MATCH, PARTIAL_MATCH, EXACT_MATCH} along with a pointer to the const/dynamic array that the input best/first fits.
Assume you have something *like*: result_t compare(element_t input, element_t pattern) that returns: {IS_BEFORE, IS_COINCIDENT, IS_AFTER} to give an ordering to the two element_t's that it examines (i.e., "input result_t pattern")
[E.g., if these were character arrays, then "ABC" would be an exact match if "ABC" was encountered in one of the const/dynamic selections; it would be a partial match if "ABCD" was encountered in the selections; and NO match if "AFRH" was (the only) selection available.]Assume most invocations will result in some form of "match" (this can have an impact on the choice of algorithm)
You *could* accumulate element_t's as they are emitted and, later, process them as a completed entity -- instead of handling them one at a time (pay me now; pay me later). But, that means assuming the cost of a (dynamic!) structure *just* to accumulate them (in addition to any other structures that the algorithm requires/builds)!
And, of course, handling them as they are emitted can give some performance advantages (e.g., take an early FAIL exit in some cases; and, probably, ALWAYS get a final result quicker -- from the time of the first process's termination)
The choice of encoding scheme for the const/dynamic selections is flexible -- trade space for processing time, etc. (this leaves a LOT of options on the table!)
What I am most concerned with is the time from the delivery of the "final" element_t to the declaration of a result. And, the resources consumed getting to that point (e.g., walking a tree requires accumulating state to know how to climb back *out* when you chase a failed branch!).
Assume this runs on a 32b architecture (e.g., pointers cost you
32b). And, element_t's will definitely be larger than char's (I think I can limit them to 32b -- worst case, *element_t's at the expense of an indirection)As the "dynamic" selections are accumulated at run-time, the costs of any algorithm that imposes a specific structure on them should be taken into consideration (esp because writable store is more expensive -- and in shorter supply -- than R/O store!). E.g., merging them into a sorted list, tree/trie, "compiled" representation, etc.
For folks who shudder at abstractions, pretending element_t's are char's allows you to consider the problem as one of "comparing strings" -- against a fixed dictionary *and* a dynamically maintained dictionary. Except, of course, the cost of compares is considerably higher!
[This is the approach I am currently taking as I chase down applicable algorithms. I've made a first pass through Knuth and have now started looking a bit further afield for ideas (e.g., compiled regex's)]Thx,
--don