Optimized bitcounting on FPGA

Oct 04, 2007 7 Replies

Rate this thread:

Steven Derrien 18 years ago

Hi,

I am currently working on a circuit which has to perform Hamming distance computation between large bit vectors (>500 bits).

I was surprised not to find much information on how to implement this type of operation *efficiently* on FPGA technology.

So far I have been investigating two approaches (combining tables and counter for the bitcouting part). I observed that the choice of table size (3 or 4 address bits) had a significant impact (20%) on the area cost of the operator.

I feel that there are many subtle trade-offs in such implementations, and I was wondering if anybody had been looking at this problem (most of the articles I stumbled accross dealt with the correcting code issue, rather than focusing on the Hamming distance realization in itself).

Thanks in advances for the input.

Regards,

Steven

Vote

Ray Andraka 18 years ago

Most Modern FPGAs have block RAM that can be used for look-ups. Use these as larger LUTs to get partial counts, and then sum up the partials. If you can afford a multiplied clock, you can time multiplex segments of your input vector into a smaller number of RAMs followed by accumulators.

Vote

Peter Alfke 18 years ago

Thank you, Ray. Let me just add that a dual-ported RAM can then be used as two separate look-up tables, by addressing the two ports independently, using common data. This is obvious once you think about it, and it cuts the number of BRAMs in half... Peter Alfke

Vote

Andreas Schwarz 18 years ago

There was a lengthy discussion about bitcounting in the FPGA forum on mikrocontroller.net:

formatting link

Vote

comp.arch.fpga 18 years ago

You find the background in theoretical text books on parallel computing. Prefix sum is a fundamental sub problem and a lot has been written about the implementation of sums and prefix sums on various architecturs including networks of operators. The results translate directly to FPGAs. (Counting bits reduces to summing numbers, albeit small numbers).

Kolja Sulimma

Vote

Steven Derrien 18 years ago

Hi Ray,

That's indeed what I did, except that BRAM are somewhat overkill for the purpose.

So far, the best approach I came up with was to store in a LUT the bitcount of a 3 bit wide vector (after xoring the 2 operand for Hamming), and use an adder tree ot obtain the Hamming distance value.

Using larger tables (for vectors of 4 and more bits) ends up in a signifcant area penalty (~20%).

Steven

Ray Andraka a écrit :

Vote

Peter Alfke 18 years ago

mikrocontroller.net:http://www.mikrocontroller.net/topic/34231#newhttp://www.mikrocontroller.net/topic/34250#new

For the English-only readers: There is nothing of value in that German discussion, just repetitive trivialities. You don't miss anything if you cannot read it. Peter Alfke

Vote

glen herrmannsfeldt 18 years ago

You mean two LUTs for the 1's and 2's bit of the sum? That is the first level of a carry save adder tree.

For a true carry save adder tree you add the 1's to form a second set of 1's and 2's, add the 2's to form a set of 2's and 4's. Repeat as needed.

-- glen

Vote

Join the Discussion

Have something to add? Share your thoughts — no account required.

Didn't find your answer?

Ask the community — no account required