Optimized bitcounting on FPGA

Hi,

I am currently working on a circuit which has to perform Hamming distance computation between large bit vectors (>500 bits).

I was surprised not to find much information on how to implement this type of operation *efficiently* on FPGA technology.

So far I have been investigating two approaches (combining tables and counter for the bitcouting part). I observed that the choice of table size (3 or 4 address bits) had a significant impact (20%) on the area cost of the operator.

I feel that there are many subtle trade-offs in such implementations, and I was wondering if anybody had been looking at this problem (most of the articles I stumbled accross dealt with the correcting code issue, rather than focusing on the Hamming distance realization in itself).

Thanks in advances for the input.

Regards,

Steven

Reply to
Steven Derrien
Loading thread data ...

Most Modern FPGAs have block RAM that can be used for look-ups. Use these as larger LUTs to get partial counts, and then sum up the partials. If you can afford a multiplied clock, you can time multiplex segments of your input vector into a smaller number of RAMs followed by accumulators.

Reply to
Ray Andraka

Thank you, Ray. Let me just add that a dual-ported RAM can then be used as two separate look-up tables, by addressing the two ports independently, using common data. This is obvious once you think about it, and it cuts the number of BRAMs in half... Peter Alfke

Reply to
Peter Alfke

There was a lengthy discussion about bitcounting in the FPGA forum on mikrocontroller.net:

formatting link
formatting link

Reply to
Andreas Schwarz

You find the background in theoretical text books on parallel computing. Prefix sum is a fundamental sub problem and a lot has been written about the implementation of sums and prefix sums on various architecturs including networks of operators. The results translate directly to FPGAs. (Counting bits reduces to summing numbers, albeit small numbers).

Kolja Sulimma

Reply to
comp.arch.fpga

Hi Ray,

That's indeed what I did, except that BRAM are somewhat overkill for the purpose.

So far, the best approach I came up with was to store in a LUT the bitcount of a 3 bit wide vector (after xoring the 2 operand for Hamming), and use an adder tree ot obtain the Hamming distance value.

Using larger tables (for vectors of 4 and more bits) ends up in a signifcant area penalty (~20%).

Steven

Ray Andraka a écrit :

Reply to
Steven Derrien

mikrocontroller.net:http://www.mikrocontroller.net/topic/34231#newhttp://www.mikrocontroller.net/topic/34250#new

For the English-only readers: There is nothing of value in that German discussion, just repetitive trivialities. You don't miss anything if you cannot read it. Peter Alfke

Reply to
Peter Alfke

You mean two LUTs for the 1's and 2's bit of the sum? That is the first level of a carry save adder tree.

For a true carry save adder tree you add the 1's to form a second set of 1's and 2's, add the 2's to form a set of 2's and 4's. Repeat as needed.

-- glen

Reply to
glen herrmannsfeldt

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.