RE: can you please post a summary of your findings to the group?

-----Original Message-----
From: Arrigo Benedetti [mailto: snipped-for-privacy@bologna.vision.caltech.edu]
Sent: 02 July 2003 15:06
To: Aziz AhmedSaid
Subject: Re: Does anyone know about hardware implementations of the SVD ?

Hi Aziz,

can you please post a summary of your findings to the group? I'm very interested in computing the SVD in hardware myself.

Best,

-Arrigo

--------------------------------------------

I have implemented the Brent, Luk and Van Loan SVD systolic array described in the following paper:

R.P. Brent, F.T. Luk, and C. Van Loan, "Computation of singular value decomposition using mesh-connected processors," J. VLSI Comput. Syst., vol. 1, no. 3, pp. 242-270, 1985.

This systolic array computes the SVD of a square N*N matrix in O(N log N) time using (N/2)^2 processors. The architecture does not compute singular vectors and is not suitable for (relatively) large matrices, because it uses too many processors (that is the price of the speed). It also suffers from inefficiency, because each processor works for only a third of the time.
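For readers who have not seen the method, here is a rough software sketch of the two-sided (Kogbetliantz-style) Jacobi iteration that the array parallelizes: each step diagonalizes one 2*2 block with a left and a right rotation. This is only an illustration in Python, not the hardware design. The function names, the sequential row-by-row pair ordering and the angle formulas are my own assumptions (the array itself uses a parallel ordering), and like the original array it produces singular values only.

```python
import math

def two_sided_rotation(a, b, c, d):
    """Rotations that diagonalize the 2x2 block [[a, b], [c, d]].

    Returns (cl, sl, cr, sr) such that
    [[cl, sl], [-sl, cl]] * [[a, b], [c, d]] * [[cr, sr], [-sr, cr]]
    is diagonal.
    """
    # Step 1: a left rotation by psi makes the block symmetric.
    psi = math.atan2(c - b, a + d)
    cs, sn = math.cos(psi), math.sin(psi)
    p = cs * a + sn * c
    q = cs * b + sn * d
    r = -sn * b + cs * d
    # Step 2: an ordinary Jacobi rotation by theta diagonalizes [[p, q], [q, r]].
    theta = 0.0 if q == 0.0 else 0.5 * math.atan2(2.0 * q, r - p)
    # Keep |theta| <= pi/4; shifting by pi/2 leaves tan(2*theta) unchanged.
    if theta > math.pi / 4:
        theta -= math.pi / 2
    elif theta < -math.pi / 4:
        theta += math.pi / 2
    left = psi - theta  # combined left rotation angle
    return math.cos(left), math.sin(left), math.cos(theta), math.sin(theta)

def jacobi_sweep(A):
    """One sequential sweep over all index pairs (i, j); A is modified in place."""
    n = len(A)
    for i in range(n - 1):
        for j in range(i + 1, n):
            cl, sl, cr, sr = two_sided_rotation(A[i][i], A[i][j], A[j][i], A[j][j])
            for k in range(n):  # left rotation acts on rows i and j
                x, y = A[i][k], A[j][k]
                A[i][k], A[j][k] = cl * x + sl * y, -sl * x + cl * y
            for k in range(n):  # right rotation acts on columns i and j
                x, y = A[k][i], A[k][j]
                A[k][i], A[k][j] = cr * x - sr * y, sr * x + cr * y

def off_norm(A):
    """Frobenius norm of the off-diagonal part, a simple convergence measure."""
    n = len(A)
    return math.sqrt(sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
```

One would call jacobi_sweep(A) repeatedly until off_norm(A) is small enough; the singular values are then the absolute values of the diagonal entries of A.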

What I did was first make some modifications to improve the efficiency, then adapt the array to compute singular vectors, and finally implement it in Handel-C, a high-level hardware design language. The result was a higher efficiency (more than double), a reduced computation time (divided by three), and completely parameterized code that can be used for any matrix size, word length and FPGA target.

Example:

Target: Xilinx XC2000e, speed grade 6
Matrix size: 8*8
Word length: 16
Area: 99 %
Speed: 84 MHz
Clock cycles per sweep: 3430 (for an 8*8 matrix, 3 or 4 sweeps are enough)
Efficiency: 77.6 %
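To put the figures above in terms of wall-clock time, here is a small back-of-the-envelope calculation in Python. It assumes the total time is simply sweeps times clock cycles per sweep divided by the clock frequency, ignoring any I/O or setup overhead; the helper name svd_time_us is made up for this illustration.

```python
def svd_time_us(cycles_per_sweep, sweeps, clock_mhz):
    """Estimated run time in microseconds (cycles divided by MHz gives us)."""
    return cycles_per_sweep * sweeps / clock_mhz

# 8*8 example: 3430 cycles per sweep at 84 MHz, 3 or 4 sweeps.
for sweeps in (3, 4):
    print(sweeps, "sweeps:", round(svd_time_us(3430, sweeps, 84.0), 1), "us")
# prints "3 sweeps: 122.5 us" and "4 sweeps: 163.3 us"
```

Under that assumption, an 8*8 SVD takes roughly 120 to 165 microseconds on this target.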