FPGA multiplier


sutejok 19 years ago

from the Xilinx Virtex4 spec:

· XtremeDSP™ Slice

- 18x18, two's complement, signed Multiplier

- Optional pipeline stages

- Built-In Accumulator (48-bits) & Adder/Subtracter

I'm not too familiar with DSP on FPGAs. What does it mean when it says 18x18 multiplier? Is it a hardware multiplier? Is there anywhere I can get information on these multipliers and how to use them? Something specific to Virtex4 would be nice.

thx tejo


Peter Alfke 19 years ago


Austin Lesea 19 years ago

Tejo,

formatting link

Yes, the 18x18 multiplier/accumulator is a hardened block, so performing this function in it takes from 8 to 20 times less power than performing it in the general logic of the FPGA (LUTs, flip-flops, interconnect, etc.).

The above guide details use of the V4 for "extreme" DSP uses.
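To make "18x18, two's complement, signed multiplier" with a 48-bit accumulator concrete, here is a minimal behavioural sketch in Python. It only models the arithmetic; the names `to_signed`, `mac`, and `dot_product` are made up for illustration and are not Xilinx primitives or anything from the guide.

```python
# Behavioural model of a signed 18x18 multiply-accumulate.
# Illustrative only: these names are hypothetical, not Xilinx primitives.
OPERAND_BITS = 18   # each multiplier input is an 18-bit two's complement value
ACC_BITS = 48       # accumulator width quoted in the Virtex4 feature list


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc, a, b):
    """One multiply-accumulate step: acc + a*b, wrapped to 48 bits."""
    a = to_signed(a, OPERAND_BITS)
    b = to_signed(b, OPERAND_BITS)
    return to_signed(acc + a * b, ACC_BITS)


def dot_product(xs, ys):
    """Sum of products, one MAC step per pair (e.g. one FIR filter output)."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

Each input covers -131072 to 131071, so a single product fits in 36 bits, and the 48-bit accumulator leaves 12 bits of headroom for summing many products before it wraps.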

FPGAs are useful for tasks that DSP processors are too slow for. Otherwise, DSP processors are generally far easier to use and better suited for DSP. For example, a video conference processor, where multiple streams must be encoded, decoded, and combined along with all the audio processing, is one such task where an FPGA would excel in cost, power, and performance.

formatting link

Austin

Nico Coesel 19 years ago

I have my doubts about cost and performance. Developing such a device would probably take so much time that an ASIC would be just as cost-effective and use even less power. An older PC is already capable of doing these functions, with a development time that can be expressed in days, not years.

Even when dealing with loads of video streams at high resolutions, it is more cost-effective, reliable, and flexible to use PCs on a fast network to do the processing rather than building a custom FPGA/ASIC solution.

Reply to nico@nctdevpuntnl (punt=.) Businesses and shops can be found at www.adresboekje.nl


Austin Lesea 19 years ago

Nico,

We (Peter and I) have decided to avoid any marketing discussions.

On technical subjects, at least I can (usually) post something useful.

You decide when, where, and how to use Xilinx FPGAs.

I am happy if you use them at all, for whatever you find profitable.

The website speaks for itself: there are lots of customer testimonials for those who wish to read them. You decide.

Perhaps others can post why they use our FPGAs for extreme DSP?

Austin


David Ashley 19 years ago

Probably there are other constraints. Do you need 1,000 of these? Is it a mass market product? Does it need to fit in a rack? In a unit the size of a pack of cigarettes?

Getting something working in a lab in just any old manner, perhaps for getting funding, is one thing. Getting to a fieldable solution...

There is no way one can make a blanket statement about the best solution without knowing all the requirements.

In my case I worked on a project that had to encode live NTSC video to MPEG-2 I-frames with a minimum of latency. The solution ended up being DSP based: Blackfin DSPs, 4 video/audio inputs on a PCI card. It was a big project. In the end we were limited by the performance of the DSPs. The frame size ended up being 352x240 at 60 Hz. There was just no way we could ever do 720x240 -- had to downsample in X.
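As a rough illustration of what the downsampling bought, the raw pixel rates of the two frame sizes can be compared. This is a sketch only: it counts pixels per second and ignores chroma format, blanking, and encoder overhead, and `pixel_rate` is a made-up helper name.

```python
def pixel_rate(width, height, rate_hz):
    """Raw pixels per second for a given frame size and refresh rate."""
    return width * height * rate_hz


achieved = pixel_rate(352, 240, 60)   # the size the DSPs could sustain
wanted = pixel_rate(720, 240, 60)     # the size that was out of reach

print(achieved)            # 5068800
print(wanted)              # 10368000
print(wanted / achieved)   # a little over 2x the work per second
```

So full horizontal resolution would have roughly doubled the pixel throughput the DSPs had to handle.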

Now if we'd opted for an FPGA, if it was big enough we'd have had lots of options to improve performance. We could have licensed some existing IP, modified it to suit. It would have been a bigger unknown since none of us had direct FPGA experience, but we did have low level programming experience. Going an FPGA route might have been a better investment in the long run...

Use the right technology/solution for the task at hand. No one size fits all.

-Dave

David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architecture


David Ashley 19 years ago

One more thing occurred to me. With the Analog Devices Blackfin DSP approach we found out the hard way that memory bandwidth was a severely limiting factor. The DSP had a small amount of on-chip memory that ran at the CORE clock frequency. The SYSTEM clock was a fraction of that, say 1/5th or 1/6th. Accessing external SDRAM took something like 6 SCLOCKS (system clocks), which at a 1/6 ratio translates to 36 CORE clocks. That's a *long* time in DSP space. The SDRAM controller never did burst accesses, as I recall.
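The arithmetic behind that 36-clock figure can be sketched as follows. The numbers are the approximate ones recalled above ("something like"), not datasheet values, and the function names are made up for illustration.

```python
# Approximate figures from the post above, not datasheet values.
CORE_PER_SCLK = 6      # SYSTEM clock assumed to be 1/6 of the CORE clock
SCLKS_PER_ACCESS = 6   # system clocks per external SDRAM access


def core_clocks_per_access(sclks=SCLKS_PER_ACCESS, ratio=CORE_PER_SCLK):
    """CORE clocks spent waiting on one external SDRAM access."""
    return sclks * ratio


def access_time_ns(core_mhz, sclks=SCLKS_PER_ACCESS, ratio=CORE_PER_SCLK):
    """Wall-clock cost of one access, in nanoseconds, at a given CORE clock."""
    return core_clocks_per_access(sclks, ratio) * 1000.0 / core_mhz


print(core_clocks_per_access())   # 36
print(access_time_ns(600))        # 60.0
```

At a 600 MHz CORE clock that works out to about 60 ns per access, during which the core could have executed 36 cycles of work out of on-chip memory.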

What that means is you can't effectively do anything unless you use the on-chip fast memory, which operates at the CORE clock frequency (600 MHz to 750 MHz, for example). But that was a limited resource. And there was no way to improve the SDRAM controller; that was part of the chip. So the resolution *couldn't* improve: we didn't have the memory for it, and no amount of optimization would work.

With an FPGA, on the other hand, one could have modified the external memory controller to burst out a whole 64-byte chunk of memory, or burst one into a BRAM. Then operate in the BRAM, then write it back out. Doing burst accesses would really speed things up, and the memory I/O could go on in parallel with other processing. In short, there would be almost unlimited ability to optimize, as needed.
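A back-of-the-envelope sketch of why bursting helps, under loudly stated assumptions: a 2-byte-wide memory bus, 36 CORE clocks for every non-burst access, and one extra beat per system clock (6 CORE clocks) once a burst is under way. None of these figures come from a datasheet, and the function names are hypothetical.

```python
def single_access_cost(chunk_bytes, bus_bytes, access_clocks):
    """Total clocks when every bus-width transfer pays the full access latency."""
    transfers = -(-chunk_bytes // bus_bytes)   # ceiling division
    return transfers * access_clocks


def burst_cost(chunk_bytes, bus_bytes, first_access_clocks, beat_clocks):
    """Total clocks when only the first transfer pays the full latency."""
    transfers = -(-chunk_bytes // bus_bytes)
    return first_access_clocks + (transfers - 1) * beat_clocks


# Assumed numbers: 64-byte chunk, 2-byte bus, 36-clock access, 6-clock beat.
print(single_access_cost(64, 2, 36))   # 1152
print(burst_cost(64, 2, 36, 6))        # 222
```

With these assumed numbers the burst moves the same 64 bytes in roughly a fifth of the time, and in an FPGA that transfer could also overlap with processing of the previous chunk.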

With the fixed-cpu DSP approach, we found out the limits the hard way. Then we had to reduce our expectations. In the end it was OK, but it might have been a disaster.

-Dave

David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architecture


Ben Jackson 19 years ago

The TOC of that document refers to instantiation templates on p58/60 while the PDF seems to go from 56..63 directly. Trouble with my pdf reader??

Ben Jackson AD7GD http://www.ben.com/


Rob 19 years ago

Dave,

Funny you should mention this. We came across the same issue, but we found out the limitations of the memory interface before we decided to spin boards. We ultimately went with an FPGA. You can't beat an FPGA for parallel processing. In our particular application an FPGA running at 67 MHz outperformed the Blackfin running at 500 MHz, all because of the FPGA's inherent power of parallel processing.
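For a feel of how much parallelism that implies, here is a crude break-even estimate. It assumes, purely for illustration, that the DSP completes a fixed number of useful operations per clock (one by default) and ignores memory stalls entirely; `break_even_parallelism` is a made-up name.

```python
import math


def break_even_parallelism(fpga_mhz, dsp_mhz, dsp_ops_per_clock=1):
    """Parallel operations per FPGA clock needed to match the DSP's op rate."""
    return math.ceil(dsp_mhz * dsp_ops_per_clock / fpga_mhz)


print(break_even_parallelism(67, 500))   # 8
```

Under that assumption, eight operations in flight per clock is enough for the 67 MHz design to keep up, which is a small number by FPGA standards.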

Take care, Rob


Rob 19 years ago


David Ashley 19 years ago

No argument here. In this particular project we outsourced the hardware design, but did all the software in house. We had limited time to review the hardware designer's choice for chips -- he did some digging and we were content to trust his instincts. Getting to the point where we would have studied the datasheet in terms of memory bandwidth...there's just no way we would have invested the time in that then. There would have been an element of faith that AD would have designed their memory controller efficiently.

Realistically their SDRAM controller seems to be just an afterthought. I can't really see how it would be *that* hard to add in bursting...and it would completely solve the bottlenecks...

Note: AD = Analog Devices, and this discussion concerns their Blackfin line of DSP products. Not really FPGAs. :)

-Dave

David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architecture
