Max and Argmax across 1,000 unsigned 10-bit numbers

Hello,

I recently asked for advice on performing bit-matrix multiplication (bitwise AND followed by a population count), with one matrix stored onboard an FPGA in block RAMs and the other (the first operand) streaming in row by row.
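
For reference, the per-column operation described above (AND, then population count) can be modelled in a few lines of Python by packing each 1000-bit row or column into an integer. This is a behavioural sketch for checking hardware results, not an FPGA implementation; the names `bit_dot` and `bit_matmul_row` are hypothetical.

```python
import random

N_BITS = 1000  # length of each input row and each stored column


def bit_dot(row: int, col: int) -> int:
    """Dot product of two bit vectors packed into ints: bitwise AND, then popcount."""
    return bin(row & col).count("1")


def bit_matmul_row(row: int, columns: list[int]) -> list[int]:
    """Multiply one streamed input row by every stored column."""
    return [bit_dot(row, c) for c in columns]


if __name__ == "__main__":
    rng = random.Random(0)
    columns = [rng.getrandbits(N_BITS) for _ in range(8)]  # stand-in for the onboard matrix
    row = rng.getrandbits(N_BITS)                          # one streamed input row
    print(bit_matmul_row(row, columns))
```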

It was suggested that one of the latest PCIe cards could deliver dot-product throughput limited only by the PCIe input rate of 16 Gbps for PCIe x8. The adders would be 80 bits wide (80 bits arrive per cycle at 200 MHz over PCIe x8), and each column of the onboard matrix would be stored across 80 block RAMs.
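
The rate figures are consistent: 80 bits per cycle x 200 MHz = 16 Gbps. A 1000-bit row spans 1000 / 80 = 12.5 words, so it occupies 13 cycles, with the last word carrying only 40 valid bits. The model below assumes (hypothetically) that the row arrives least-significant word first and that each column is sliced into matching 80-bit words across its block RAMs. Each column's adder then adds one 80-input popcount per cycle into an accumulator that never exceeds 1000, so 10 bits suffice. The names `chunks` and `streamed_dot` are illustrative only.

```python
N_BITS = 1000      # bits per input row
CHUNK = 80         # bits delivered per PCIe clock (from the post)
CLOCK_HZ = 200e6   # 200 MHz clock (from the post)


def chunks(vec: int, n_bits: int = N_BITS, width: int = CHUNK):
    """Yield successive width-bit slices of vec, least-significant slice first."""
    mask = (1 << width) - 1
    for offset in range(0, n_bits, width):
        yield (vec >> offset) & mask


def streamed_dot(row: int, col: int) -> int:
    """Accumulate per-chunk AND+popcount, as one column's adder would each cycle."""
    acc = 0  # fits in 10 bits: the total never exceeds N_BITS = 1000
    for r, c in zip(chunks(row), chunks(col)):
        acc += bin(r & c).count("1")  # one 80-input popcount per cycle
    return acc


cycles_per_row = -(-N_BITS // CHUNK)  # ceiling division: 13 cycles per row
throughput_bps = CHUNK * CLOCK_HZ     # 16e9 bits per second
```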

My question is: how much more difficult does the problem become if, for each input vector, I must also find the maximum dot product and which column produced it? This operation must be performed for every input row, yielding one max and one argmax per 1000 input bits. The input vector is 1000 bits long, so it finishes arriving over PCIe after about 13 cycles (1000 / 80 = 12.5). Is there a fast enough way to take the argmax of 1000 10-bit numbers (one per column's dot product)? How much FPGA area would the argmax cost compared with the column adders (which are probably 80 bits wide for each column)?
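
One standard way to compute max and argmax together is a binary comparator tree: each node compares two 10-bit values and forwards the larger one along with its column index. For 1000 inputs this takes 999 compare-and-select nodes arranged in 10 levels (ceil(log2 1000) = 10). If every level is registered, the tree accepts a new set of 1000 sums every clock, well above the one result per 13 cycles the input rate requires, so a narrower tree time-shared over those cycles would also work. Each node is a 10-bit comparator plus multiplexers for a 10-bit value and an index of up to 10 bits, which is likely small next to each column's 80-input popcount, though the real area depends on the device and synthesis. The Python model below is a behavioural sketch; the tie-breaking rule (lower column index wins) and the name `argmax_tree` are assumptions.

```python
def argmax_tree(values: list[int]) -> tuple[int, int, int, int]:
    """Pairwise (value, index) reduction, one tree level per pipeline stage.

    Returns (max_value, argmax_index, levels, comparators).
    Ties go to the lower index: the left input always holds lower indices,
    and it wins on >=.
    """
    level = [(v, i) for i, v in enumerate(values)]
    levels = 0
    comparators = 0
    while len(level) > 1:
        nxt = []
        for j in range(0, len(level) - 1, 2):
            a, b = level[j], level[j + 1]
            nxt.append(a if a[0] >= b[0] else b)  # one compare-and-select node
            comparators += 1
        if len(level) % 2:  # an odd leftover passes straight to the next level
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0][0], level[0][1], levels, comparators
```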

Thanks for your help. I want to make sure that the max and argmax functions will not be a limiting factor in the design of the bit-matrix multiplier.

Also, thanks for so many helpful comments that have gotten me to this level of understanding of the problem.

- AndrewF
