algorithm: square operation

Dec 15, 2004 6 Replies

Rate this thread:

fwj_733 21 years ago

There is a lot of square operations in my FPGA projects (using Xilinx VitexE). Now, I complete them by multiple, but this method is slice consuming and slow. As we know, square operation has many characters, which multiple operation of any random numberic don' have, then could we utilize these features to calculate square operations more economical and more fast? Thanks for all advice. Best Regards

Vote

Ben Jackson 21 years ago

There may be shortcuts if you need a series of squares, for example. You can compute that as

sq = 0; for (i = 1; i < n; ++i) { sq = sq + (i

Vote

Ray Andraka 21 years ago

Now, I complete them by multiple, but this method is slice consuming and slow. As we know, square operation has many characters, which multiple operation of any random numberic don' have, then could we utilize these features to calculate square operations more economical and more fast? Thanks for all advice. Best Regards

You really need to give us more information. Are the squares sequential (eg squares of an arithmetic progression of values)? How many clock cycles per square are available? How many bits? Do you have BRAM available? Do you have area restrictions? etc. As with many other problems, there are many ways to approach this.

--Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

Vote

fwj_733 21 years ago

Sorry for not giving sufficient info. The squares works parallel, that is, one clock cycle per square. Bits width of Operand are about 10~20. VirtexE do have BRAM, I use Virtex600E, and the remaining maximum capacity for square operation is 128 Kb. I hope square of a 20bit number can be completed by less than 30 VirtexE slices, and work above 70MHz. I still wonder, for a multiplier written as: use IEEE.std_logic_arith; ... C

Vote

Arash Salarian 21 years ago

Generally modern synthesize tools generate a very high quality result for basic operatinos like add, multiply. Both Xilinx and Altera have already highly optimized macros for these operations (remember LPMs?) and usually they are hand optimized and very efficient. The synthesize tool selects from a library of these pre-made macros for these operations. Also the output generated for C

Vote

fwj_733 21 years ago

Thank you for your reply. I made a test, two instance, one calculate A*B, the other calculate A*A, oprand is 20 bit. I found they are almost the same result. After map, both occupy 215 slices (VirtexE). And only small difference in the time delay. ISE translate the square operation to multiplier.

Vote

Jeff Cunningham 21 years ago

I think there was a post a year or two ago about doing an efficient square operation by splitting the number in half such that a N bit square could be done with some N bit lookups and some additions instead of a 2N table lookup. Such that a 10 bit square only required a 10 bit lookup table instead of 20 bit. I think it went something like this:

Take a 10 bit number to be squared and represent it as a+32b where a and b are 5 bit numbers. (a+32b)(a+32b) = a^2 + 64ab+1024b^2. Now you have to do 3 of the smaller table lookup mpys and some shifting and adding to get the result.

Vote

Join the Discussion

Have something to add? Share your thoughts — no account required.

Didn't find your answer?

Ask the community — no account required