Just wanted to delve into the groups accumulated knowledge. I've got an app where I must find the index of the maximum of 40 or so (unsorted, of course)20-bit numbers. --the specifics aren't that important, just helps me think. It's pretty time critical, with throughput more important than latency, unless latency becomes absurd, and not area critical.
The "brute force" solution of cycling through the 40 numbers and doing a compare-store on each is pretty slow for FPGA implementation, and cascading comparators is pretty ugly.
I did some googling and literature searching and came up with plenty of CS-style results, but very few hardware results.
This was one...which was seemed the FPGA equivalent of a very clever IEE paper entitled "Efficient Parallel Pipelineable VLSI architechture for finding the maximum binary number" by F. Daneshgaran and K. Yao. which solved the problem using tristates.
The bitwise parallel-max approach is pretty attactive and reduces the clock cycles to the bitwidth, e.g. 20 (though I guess you could look at m-bits at a time, but it becomes more complex) but I think it will get somewhat hung up on the "check for no candidates" logic.
I was wondering if there were any other "clever" approaches around to this sort of thing from an FPGA synthesis viewpoint? I'll be the first to admit to it being a while since I've looked at an algorithms book, so feel free to club me over the head with any idiocies.
--Josh Model Asst. Technical Staff MIT Lincoln Laboratory