Nice MCU for small jobs

IIRC that's part of why linked-list insertion poops out compared to vector resize-and-insert for large data sets; the naive linked list is not particularly cache-friendly.

Reply to
bitrex

Binary search time-complexity is O(1) best-case and O(log n) worst-case; a hash table is O(1) best-case and O(n) worst-case.

A naive binary search implementation can certainly beat a pathological hash table implementation on average; one would hope most hash tables aren't pathological, though, and binary search is only O(1) on average in pathological use-cases of binary search.

Reply to
bitrex

The reason a hash table is faster is more-or-less because you already did the work of the search and stored it, encoded in the hash table. You trade off space usage to save time. "Searching" for something is easy if you already know where it is.

Reply to
bitrex

When I go to the library to find a novel I want, here's what I do. I mentally split the library floor space in half and check every book in the first half one by one. This only takes approximately twelve days. If it's not in the first half I....fuuuuuk!!!! :( :(

Reply to
bitrex

One algorithm that seems to combine the best behaviours and avoid the worst case behaviours is the skiplist algorithm.
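
A minimal search-only sketch of the idea (insertion and the random level assignment are omitted for brevity; the node names and maximum level are illustrative, not from any particular library):

#include <stddef.h>

#define MAX_LEVEL 8

struct slnode {
    int key;
    struct slnode *fwd[MAX_LEVEL];   /* fwd[0] is the ordinary linked list */
};

struct skiplist {
    int level;                       /* highest level currently in use */
    struct slnode head;              /* sentinel; its key is never examined */
};

/* Returns the node with the given key, or NULL if absent.
   Expected O(log n) when levels are assigned randomly on insert. */
struct slnode *sl_search(struct skiplist *sl, int key)
{
    struct slnode *p = &sl->head;
    int i;

    for (i = sl->level - 1; i >= 0; i--) {
        while (p->fwd[i] != NULL && p->fwd[i]->key < key)
            p = p->fwd[i];           /* move right on this level */
    }                                /* then drop one level down */
    p = p->fwd[0];
    return (p != NULL && p->key == key) ? p : NULL;
}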

Reply to
Tom Gardner

bitrex wrote

OK, yes, exactly.

Just looked up where I used hashing. I did actually use hashing for message IDs in my newsreader, to stop cross-posted repeats from being listed again and again, and also in my videotext / ceefax program for storing pages. Both applications are not really that time-critical and the data sets are small.
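
A rough sketch of that kind of duplicate suppression, assuming a hypothetical seen-table; the names and table size here are invented, not taken from the actual newsreader:

/* Sketch: suppress cross-posted repeats by hashing the Message-ID and
   remembering which IDs were already listed. */
#include <stdlib.h>
#include <string.h>

#define SEEN_HASH_SIZE 1021          /* small prime, plenty for one session */

struct seen {
    char *msgid;
    struct seen *next;
};

static struct seen *seentab[SEEN_HASH_SIZE];

static unsigned hash_msgid(const char *s)
{
    unsigned h = 0;
    while (*s)
        h += (unsigned char)*s++;    /* simple summing hash, mod table size */
    return h % SEEN_HASH_SIZE;
}

/* Returns 1 the first time a Message-ID is seen, 0 on every repeat. */
int first_time_seen(const char *msgid)
{
    unsigned i = hash_msgid(msgid);
    struct seen *p;

    for (p = seentab[i]; p != NULL; p = p->next)
        if (strcmp(p->msgid, msgid) == 0)
            return 0;                /* already listed once */

    p = malloc(sizeof *p);
    if (p == NULL)
        return 1;                    /* out of memory: just list it */
    p->msgid = strdup(msgid);
    p->next  = seentab[i];
    seentab[i] = p;
    return 1;
}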

Reply to
<698839253X6D445TD

Big O is really only useful for analysis and comparisons on very large data sets, where for example your data set is >> CPU cache size on desktop architectures. It says little about what's actually better in practice for small data sets, where the set-up/tear-down overhead of the algorithm and the execution time of things like branches become on the order of the time you spend actually carrying out the full search.

For "small" data sets sometimes just linear search beats everything else hands-down. What the specific uses cases it will are and what the precise definition of "small" is for a particular architecture endlessly debated; x86 is so complex it's often anyone's guess, you have to profile to see.

as usual the answer to "what's the best way to..." is "it depends"

Reply to
bitrex

For example, if your arch of choice has truly exceptional branch prediction, binary search can be faster even in cases where an on-paper analysis says it shouldn't be, just cuz a binary search can be written where it's a lot easier for the branch predictor to suss out the taken branch than in a naive linear search or hash table lookup.
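
One concrete way to write it that way (a sketch, not a measurement): arrange the inner loop so the data-dependent comparison can compile to a conditional move, leaving only the easily predicted loop-count branch:

/* Binary search written so the data-dependent decision is a select
   (cmov on x86) rather than a 50/50-taken branch.  Array must be sorted.
   Returns the index of the first element >= key (n if none). */
#include <stddef.h>

size_t lower_bound(const int *a, size_t n, int key)
{
    const int *base = a;
    size_t len = n;

    if (n == 0)
        return 0;

    while (len > 1) {
        size_t half = len / 2;
        /* the compiler can turn this into a conditional move: no
           data-dependent branch to mispredict */
        base = (base[half - 1] < key) ? base + half : base;
        len -= half;
    }
    return (size_t)(base - a) + (base[0] < key);
}

Whether this actually wins over the plain branching version depends on the data size and the core; as above, you have to profile to see.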

Reply to
bitrex

I have been doing that for the financial sector for 15 years, even crazier volumes. Cutting edge technology, partially invented by myself. My background in lock-free algorithms and low-level programming was the enabler here. We were also pioneering the GPU-based read-mostly databases -- throughput was stellar, but latency killed the entire idea. It should have been done on FPGA, but the management failed to recognize this opportunity. Many brilliant people were involved, perhaps the most advanced stuff I've ever been doing in my professional career, by several orders of magnitude.

Best regards, Piotr

Reply to
Piotr Wyderski

On Tuesday, 4 September 2018 at 03:35:04 UTC+2, Lasse Langwadt Christensen wrote:

I like the STM32F051 32kB devices; they come in at below 32 cents in volume (ST slogan: 32 bits for 32 cents).

4 times faster than the LPC802, double the IO, much faster ADC, better DAC. Should I continue?

Cheers

Klaus

Reply to
ellenhultmann

On Sep 5, 2018, snipped-for-privacy@nospam.org wrote (in article ):

On average, hashing is way faster than anything else for large datasets, but must be used correctly, and the pathological cases avoided or addressed. Most importantly, the hash array must be large enough that collisions are reasonably rare. Each hash array location contains either null (no data here) or a pointer to a short linked list of data items.

To load the array, take every data item in turn and generate the hash index from the item's key, whatever that may be. This process is called hashing. Any hash algorithm that spreads the data items more or less uniformly over the hash index array will do. A very common approach is to divide the key by a prime number (this being the length of the hash array as well), and use the remainder as the index (address) into the hash index array. The quotient is discarded. There are lots of hashing algorithms in the literature, but keep it simple.

At the just computed location in the hash index array, the value will either be null - no data value hashed to that location, or there will be a pointer to a small data item containing the full hash key and the data being saved (or a pointer to the data, if the data is large). In either case, add the new data item to the list of data items at that hash location.

To use the now-loaded array, go through the same index address computation as used when loading the array, find the short list at that location in the table, and look through it for a match. If the database is large enough, things may be arranged to allow the list at a hash-table location to be searched by successive approximation, but usually it's cheaper to just traverse the list linearly, because if the table is large enough, very few of these lists are longer than two or three items.
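
A minimal sketch of the scheme described above, assuming integer keys and the divide-by-a-prime indexing; the names and table size are invented:

/* Chained hash table: an index array of HASH_SIZE (a prime) entries,
   each either NULL or the head of a short linked list. */
#include <stdlib.h>

#define HASH_SIZE 10007              /* prime; also the divisor */

struct item {
    unsigned long key;               /* full key kept for comparison */
    void *data;
    struct item *next;
};

static struct item *hashtab[HASH_SIZE];

static unsigned long hash_index(unsigned long key)
{
    return key % HASH_SIZE;          /* remainder is the index; quotient discarded */
}

/* Load: compute the index from the key and push onto that slot's list. */
void ht_insert(unsigned long key, void *data)
{
    unsigned long i = hash_index(key);
    struct item *p = malloc(sizeof *p);

    if (p == NULL)
        return;
    p->key  = key;
    p->data = data;
    p->next = hashtab[i];            /* a NULL slot is just an empty list */
    hashtab[i] = p;
}

/* Use: same index computation, then a short linear scan of the list. */
void *ht_lookup(unsigned long key)
{
    struct item *p;

    for (p = hashtab[hash_index(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return p->data;
    return NULL;                     /* not present */
}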

Hash tables in 2D are common for searching 2D maps of what is close to what geographically.

For a table of ten million IPv4 addresses, one would take the 32-bit address (without punctuation), divide by 10000019, and use the remainder to index the hash table. Use a Poisson distribution to determine the fraction of cells with 0, 1, 2, 3, ... entries.
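
With roughly one item per cell (lambda = 10,000,000 / 10,000,019, about 1), the Poisson term e^-lambda * lambda^k / k! gives the expected fraction of cells holding k entries; a quick check, using this example's figures:

/* Expected fraction of hash cells holding k entries, Poisson(lambda),
   lambda = items / cells.  For lambda ~ 1 this gives roughly
   37% empty, 37% with one entry, 18% with two, 6% with three. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double lambda = 10000000.0 / 10000019.0;  /* items per cell */
    double p = exp(-lambda);                  /* P(0 entries) */
    int k;

    for (k = 0; k <= 5; k++) {
        printf("P(%d entries) = %.4f\n", k, p);
        p *= lambda / (k + 1);                /* next Poisson term */
    }
    return 0;
}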

Spellcheckers also use a large hash table, one with one-bit table entries - if such a word exists (is spelled that way), the bit will be set. Collisions are ignored - the bit remains set. If the hash table is large enough, false positives (misspelled word marked as correct) will be rare.
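
A sketch of such a one-bit table (effectively a single-hash Bloom filter); the table size and hash function here are illustrative only:

/* One-bit-per-entry hash table for a spell checker: setting a bit marks
   "some word hashes here"; collisions only ever cause false positives. */
#define BITS        (1UL << 24)               /* 16M bits = 2 MB table */
static unsigned char bittab[BITS / 8];

static unsigned long word_hash(const char *w)  /* any decent mixer will do */
{
    unsigned long h = 5381;
    while (*w)
        h = h * 33 + (unsigned char)*w++;      /* djb2-style string hash */
    return h % BITS;
}

void mark_correct(const char *word)            /* build pass: dictionary words */
{
    unsigned long b = word_hash(word);
    bittab[b / 8] |= (unsigned char)(1u << (b % 8));
}

int looks_correct(const char *word)            /* check pass: 0 = surely wrong */
{
    unsigned long b = word_hash(word);
    return (bittab[b / 8] >> (b % 8)) & 1;     /* 1 may be a false positive */
}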

Joe Gwinn

Reply to
Joseph Gwinn

Why don't you just start a thread about your favorite color?

Reply to
bloggs.fredbloggs.fred

Because he talks about electronics and not the weather in Japan?

Reply to
Gerhard Hoffmann

You also have to consider the database usage: is it search only, or is there also frequent insertion of new records? A well-balanced binary tree is fast, but if there are a lot of new insertions you can easily end up in pathological cases with bad search times, and inserting new records may require a lot of heavy operations to keep the tree well balanced.

Reply to
upsidedown

How is that going to work? Hashes don't support in-order access.

Reply to
Jasen Betts

Joseph Gwinn wrote

Yes, I've done that, in several places in this newsreader, here for the groups list (there are thousands of groups on some servers, at least there were in the old days):

int hash(s)     /* form hash value for string s */
char *s;
{
    int hashval;

    for(hashval = 0; *s != '\0';)
        hashval += *s++;    /* sum of ascii value of characters divided by tablesize */

    return(hashval % GROUPS_HASH_SIZE);
}

Those groups, like for example sci.electronics.design, are in a linked list.

struct group *lookup_group(char *name)
{
    struct group *pa;

    if(debug_flag)
    {
        fprintf(stdout, "lookup_group(): arg name=%s\n", name);
    }

/*
    fprintf(stdout,\
    "lookup_group(): looking up name=%s at grouptab=%d loc=%ld\n",\
    name, hash(name), grouptab[hash(name)]);
*/

    for(pa = grouptab[hash(name)]; pa != 0; pa = pa -> nxtentr)
    {
        if(strcmp(pa -> name, name) == 0) return pa;    /* found sequence entry */
    }

    return 0;    /* not found */
}    /* end function lookup_group */

etc etc...

int walk_grouplist_and_display(struct newsgroup *pa)
{
    char sformatstr[4];
    /*char *ptr;*/

    /* Note: filter has already been applied,
       only allowed groups are in this list. */

    /* This routine prints out the tree recursively:
       walk left node, print value of this node, and walk right node */
    .....

Indeed for this specific purpose (newsgroups) a binary search makes no sense I think.

That could work, but testing shows that in this case, with the IP addresses in incremental order (a pre-processed list), it takes only log(n), or basically, what was it I tested? Far less, less than 12 iterations anyway, and those are only a number compare. Hashing makes no sense here I think. If you can write hashing code that can do that faster I would like to see it. Since ip_to_country has to run in real time, faster than a bad actor can hit the server, it needed to be _really_ fast.
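
Presumably something along these lines, assuming the pre-processed list is an array of range starts sorted ascending with a country code per entry (the structure and names are guesses, not the actual ip_to_country code):

/* Find the last range whose start is <= ip in a sorted array.
   About log2(n) iterations of a single integer compare. */
#include <stddef.h>
#include <stdint.h>

struct iprange {
    uint32_t start;                  /* first address of the range */
    const char *country;             /* "US", "DE", ... */
};

/* ranges[] sorted ascending by start; n > 0. */
const char *ip_to_country(const struct iprange *ranges, size_t n, uint32_t ip)
{
    size_t lo = 0, hi = n;           /* search the half-open interval [lo, hi) */

    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (ranges[mid].start <= ip)
            lo = mid;                /* ip is at or past this range start */
        else
            hi = mid;
    }
    return (ranges[lo].start <= ip) ? ranges[lo].country : "??";
}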

I think I got that working.

Yes there it makes perfect sense.

Reply to
<698839253X6D445TD

Yes, a binary search is O(log n) on average (as well as worst-case), and it is easy to achieve. Hash algorithms need to be tuned to the application, with hash tables of a size to suit the data size and hash functions that give good distributions. Done well, you achieve very close to O(1) on average.
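
As one example of a cheap hash function with a good distribution for byte-string keys, FNV-1a is often a reasonable default (the usage note below is illustrative, not a recommendation for any particular table size):

/* FNV-1a: a simple, well-distributed hash for byte strings.  Take the
   result modulo the table size (ideally sized so the load factor stays
   around 1 or below) to get the slot index. */
#include <stdint.h>
#include <stddef.h>

uint32_t fnv1a(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;        /* FNV offset basis */

    while (len--) {
        h ^= *p++;
        h *= 16777619u;              /* FNV prime */
    }
    return h;
}

/* e.g. slot = fnv1a(name, strlen(name)) % TABLE_SIZE; */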

Reply to
David Brown

On Sep 6, 2018, Jasen Betts wrote (in article ):

Unlike a line, there is no concept of numerical order in a plane. The metric of "near" is instead Euclidean distance.

So, given a 2D hash index array, one finds the X and Y index values of the cell into which the current item falls, and searches that cell and also the eight abutting neighbor cells. And maybe the sixteen cells that abut the eight direct abutters. X and Y are the hashed longitude and latitude respectively, or a local planar map of same.

Joe Gwinn
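
A sketch of that cell-plus-neighbours scan, with invented names and grid size; one ring of neighbours suffices when the cell size is at least the search radius:

/* 2D spatial hash: bucket points by cell, then probe the cell a query
   point falls in plus its eight neighbours. */
#include <stddef.h>

#define GRID_W  1024                 /* cells per axis (illustrative) */
#define GRID_H  1024

struct point {
    double x, y;
    struct point *next;              /* chaining within a cell */
};

static struct point *cells[GRID_W * GRID_H];
static double cell_size = 1.0;       /* >= the "near" distance of interest */

static size_t cell_index(long cx, long cy)
{
    /* wrap into the grid; wrapping keeps the sketch short */
    cx = ((cx % GRID_W) + GRID_W) % GRID_W;
    cy = ((cy % GRID_H) + GRID_H) % GRID_H;
    return (size_t)cy * GRID_W + (size_t)cx;
}

void insert_point(struct point *p)
{
    size_t i = cell_index((long)(p->x / cell_size), (long)(p->y / cell_size));
    p->next = cells[i];
    cells[i] = p;
}

/* Visit every stored point in the 3x3 block of cells around (x, y);
   the caller still checks true Euclidean distance on each candidate. */
void for_each_near(double x, double y, void (*visit)(struct point *))
{
    long cx = (long)(x / cell_size), cy = (long)(y / cell_size);
    long dx, dy;

    for (dy = -1; dy <= 1; dy++)
        for (dx = -1; dx <= 1; dx++) {
            struct point *p;
            for (p = cells[cell_index(cx + dx, cy + dy)]; p; p = p->next)
                visit(p);
        }
}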

Reply to
Joseph Gwinn

The dev boards came in, and they're pretty slick. For the $27 you get not only the MCU and so on, but two shields, one for capacitive touch and one full of buttons and switches and jumpers and stuff, for use with the programmable logic unit. The PLU is a small CPLD with 26 5-input LUTs that you can program in Verilog. Super handy where hardware timing is needed, which it usually is in instruments.

The other thing I really like about the LPC804 is that it has this giant pin multiplexer, so that you can put just about any peripheral function on any GPIO pin, which is a huge time saver compared to the assing around you have to do on other MCUs to try to get the best possible combination of limited pinout choices. You can mix and match inputs and outputs, e.g. you can put a capture input and a UART TXD on the same pin if you like. Some combinations are silly, but some are potentially useful. Probably the big MUX works a lot better at 15 MHz than at 200.

They basically give the dev board away, but (I imagine) charge just enough that hobbyists aren't so tempted to use it instead of an Arduino or RasPi.

Fun.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs 
Principal Consultant 
ElectroOptical Innovations LLC / Hobbs ElectroOptics 
Optics, Electro-optics, Photonics, Analog Electronics 
Briarcliff Manor NY 10510 

http://electrooptical.net 
https://hobbs-eo.com
Reply to
Phil Hobbs
