algorithm: square operation

- fwj_733

Wed, Dec 15, 2004 2:04 AM

There are a lot of square operations in my FPGA projects (using Xilinx VirtexE). Currently I implement them with multipliers, but this approach consumes many slices and is slow. As we know, squaring has many properties that a multiplication of two arbitrary numbers doesn't have. Could we exploit these properties to compute squares more economically and faster? Thanks for any advice. Best regards

- Ben Jackson

Wed, Dec 15, 2004 3:47 AM

There may be shortcuts if, for example, you need a series of squares. You can compute that as

sq = 0; for (i = 1; i < n; ++i) { sq = sq + (i

- Ray Andraka

Wed, Dec 15, 2004 2:26 PM

You really need to give us more information. Are the squares sequential (e.g., squares of an arithmetic progression of values)? How many clock cycles per square are available? How many bits? Do you have BRAM available? Do you have area restrictions, etc.? As with many other problems, there are many ways to approach this.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

- fwj_733

Wed, Dec 15, 2004 3:05 PM

Sorry for not giving enough information. The squares run in parallel, that is, one square per clock cycle. Operand widths are about 10 to 20 bits. VirtexE does have BRAM: I use a Virtex600E, and at most 128 Kb remains available for the square operations. I hope a 20-bit square can be done in fewer than 30 VirtexE slices while running above 70 MHz. I still wonder, for a multiplier written as: use IEEE.std_logic_arith; ... C

- Arash Salarian

Wed, Dec 15, 2004 3:59 PM

Modern synthesis tools generally produce very high-quality results for basic operations like addition and multiplication. Both Xilinx and Altera already have highly optimized macros for these operations (remember LPMs?), and these are usually hand-optimized and very efficient. The synthesis tool selects from a library of these pre-made macros for these operations. Also the output generated for C

- fwj_733

Thu, Dec 16, 2004 1:11 AM

Thank you for your reply. I ran a test with two instances, one computing A*B and the other computing A*A, with 20-bit operands. The results were almost the same: after map, both occupy 215 slices (VirtexE), with only a small difference in timing delay. ISE translates the square operation into a multiplier.

- Jeff Cunningham

Thu, Dec 16, 2004 2:04 PM

I think there was a post a year or two ago about computing squares efficiently by splitting the number in half, so that an N-bit square could be done with a few N-bit lookups and some additions instead of a 2N-bit table lookup; for example, a 10-bit square needed only 10-bit lookup tables instead of a 20-bit one. I think it went something like this:

Take a 10-bit number to be squared and represent it as a + 32b, where a and b are 5-bit numbers. Then (a + 32b)^2 = a^2 + 64ab + 1024b^2. Now you do three of the smaller table-lookup multiplies (a^2, ab, and b^2) plus some shifting and adding to get the result.