PIC based Data Logger - Need slick search algorithm - any ideas ?

Hi

I'm hoping someone out there might have a good idea how I might solve my dilema.

I have a PIC18F4620 and am using it as a data logger. Storing logged data in an external SPI bus connected Serial Flash memory. The serial Flash is 16Mbits or 2Mbyte.

The Serial Flash is to be used as a circular buffer and end users will request the logged data by Record number or by timestamp. (Timestamp is DWORD [4byte] seconds since 1990)

i.e. return all records from timestamp A to Timestamp B or all records from record A to Record B.

Any way I need a really slick way of searching the logged data records by record number or timestamp and then return the data as quickly as possible.

If any one has any ideas I'd really appreciate them.

many thanks

Anbeyon

Reply to
Anbeyon
Loading thread data ...

On Tue, 13 Nov 2007 10:08:44 -0800, I said, "Pick a card, any card" and Anbeyon instead replied:

You need much more RAM than the PIC has to do it quickly. You could use a reserved portion of the Flash but to sort even 200,000 random records will involve a very high number of rewrites in that Flash causing its early demise. Try some static RAM.

-- Ray

Reply to
Ray Haddad

Are the records fixed length? Lets say they're variable length. The format could be this:

start some string \n some string \n ... end

Start and end are pointers stored in flash. You have to be careful about the order of writing to the flash so that things don't get corrupted during power outage:

write start first

write new string

write end last

The data will be valid if the power goes out between any of the steps above.

The data is all ASCII including the timestamp and record number so that you can easily find the record boundaries by looking at the newline character \n.

The records just wrap when the end of the flash is reached. When end hits start, start is advanced to the beginning of the next record so that you always have complete records.

You can always dump the entire log by reading from start to end.

So lets say you are looking for record with timestamp 'key'. The records are in order, so use a binary search for O(log2(N)) lookup speed:

low = 0 high = end - start old = -1

loop: mid = (high-low)/2 goto_start_of_record(mid) if mid == old goto done old = mid if get_timestamp(mid) >= key // Will find first record of set with same key high = mid else low = mid

done: while mid != end - start && get_timestamp(mid) == key print_record(mid) goto_start_of_next_record(mid)

Goto_start_of_record searches backwards through the log to find the first character of the record. It stops at NUL or start.

Goto_start_of_next_record searches forwards through the log to find the first character of the next record or end. It could be combined with print_record.

--
/*  jhallen@world.std.com AB1GO */                        /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
Reply to
Joseph H Allen

--
/*  jhallen@world.std.com AB1GO */                        /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
Reply to
Joseph H Allen

Firstly, how would you measure "as quickly as possible"? We need a number to be really helpful.

Maybe I'm missing something but exploiting some known properties of record number and timestamp will give you fast indexing *if* you use fixed length records. You obviously are maintaining your head and tail pointers in RAM and even possibly in external flash memory (you'll need wear leveling if you maintain your pointers in flash) so the pointers provide a frame from which you can calculate where the records can be found.

Knowing record size is fixed, record numbers are sequential, and record numbers increment by one, you can easily calculate where you expect any record to be located. That's the simple case.

The more difficult case is searching by timestamp. I'm guessing records are not necessarily evenly spaced in time. Knowing timestamps are sequential, you can do a binary search to find the timestamp (or timestamp range) you're searching for. Depending on record size, that's definitely no more than 22 search branches before you find your target (16M, 24 bits, minus timestamp size, 2 bits = 22 branches). That's freakin' quick! :)

If the binary search method meets the "quickly" requirement, a binary search by record number would suffice as well.

The only way that I can think of to improve the read speed is to cache blocks of serial flash into RAM so you don't need to get it again once you found it. Classic access time vs RAM tradeoff.

JJS

Reply to
John Speth

First of all if you know

1/ Maximum memory size

2/ Records are fixed length

You don't need to store record number *IF* your timestamp is always valid.

If using circular storage method you can find highest and lowest record number by binary search on Timestamp (effectively record number in manner of speaking).

Keeping records as fixed length and INTEGER sub-multiple of Flash page size WILL help you as no doubt you will have to erase pages before writing new records. Otherwise you have to handle partial records across Flash page boundaries where half of the record is *erased*.

Use the erased state to determine full/erased pages (which you will need to keep track of). WHEN unit gets powered up do sanity check at power up. Hopefully a Flash page erase time is LESS than the amount of buffering space you have for records waiting to be written.

Binary search in a circular buffer for lowest and highest Tiemstamp gives first and last record.

As buffer is circular (with possible gap), with fixed size records it becomes easy to list record A to record B.

With Timestamp it becomes a case of read highest and lowest (you will probably be keeping a record of this anyway for updates and parameter checks) and work on you binary search from there to find start and end timestamp record numbers.

If you are logging at regular intervals Timestamp A to Timestamp B becomes even easier when you know the first and last records' time stamps.

That way you are maintaining very little information in RAM!

You only need to read the timestamp of the record when doing searches.

-- Paul Carpenter | snipped-for-privacy@pcserviceselectronics.co.uk PC Services Timing Diagram Font GNU H8 For those web sites you hate

Reply to
Paul Carpenter

Thanks every one for you comments and thoughts.

Anbeyon

Reply to
Anbeyon

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.