Distributed Arithmetic

Hello everybody, I am new to the group and have a question. Right now I am doing my final-semester training, and my project is "Design and Implementation of an IP Core for a Generic FIR Filter Using Distributed Arithmetic". I have already written the code, simulated it and tested it on a Xilinx Virtex XCV1000. The results agree with those of the DA MATLAB module I designed. I have compared the synthesis report of my core with that of the Xilinx CORE Generator DA FIR V9.0. The area usage is about the same, but my core's maximum frequency is almost half. Another stark difference in the synthesis reports is that my core's LUT synthesizes into a Block RAM, while the Xilinx core uses nothing like that. So my questions are:

  1. What does the Xilinx core use for storing the LUT contents?
  2. What can I do to optimize speed?

Please send in your replies, as time is running out fast. Regards, Nimay Shah
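For reference, the "LUT" in a DA filter holds every possible sum of a subset of the coefficients, addressed by one bit taken from each tap's sample. Below is a minimal Python model of that idea, assuming K taps, B-bit two's-complement samples and one bit processed per clock, LSB first. The function names (build_da_lut, da_dot_product, fir_da) are hypothetical; this is not the code of Nimay's core or of the Xilinx core, just a bit-exact reference of the kind the MATLAB module above provides.

```python
from typing import List


def build_da_lut(coeffs: List[int]) -> List[int]:
    """Precompute all 2**K partial sums of the K coefficients.

    Entry `addr` holds the sum of coeffs[i] for every i whose bit is set in addr.
    """
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def da_dot_product(coeffs: List[int], samples: List[int], bits: int) -> int:
    """sum(coeffs[i] * samples[i]) computed with distributed arithmetic.

    Samples are `bits`-wide two's-complement integers. One table lookup is
    made per bit position (bit-serial, LSB first); the partial sum for the
    sign bit (MSB) is subtracted instead of added.
    """
    assert len(coeffs) == len(samples)
    lut = build_da_lut(coeffs)
    mask = (1 << bits) - 1
    acc = 0
    for n in range(bits):
        # The address is bit n of every sample, one address bit per tap.
        addr = 0
        for i, x in enumerate(samples):
            addr |= (((x & mask) >> n) & 1) << i
        partial = lut[addr] << n  # shift-and-add accumulation
        acc += -partial if n == bits - 1 else partial
    return acc


def fir_da(coeffs: List[int], signal: List[int], bits: int) -> List[int]:
    """Direct-form FIR filter whose multiply-accumulate is done with DA."""
    taps = [0] * len(coeffs)  # delay line, taps[0] is the newest sample
    out = []
    for x in signal:
        taps = [x] + taps[:-1]
        out.append(da_dot_product(coeffs, taps, bits))
    return out


print(da_dot_product([3, -1, 4, 2], [5, -7, 2, -8], bits=8))  # 14
```

The table has 2**K entries, so its size doubles with every extra tap. That is why DA filters are commonly split into several small tables (for example 4 taps each) whose outputs are added, and why the size of the table and where it is stored matter so much for area and speed.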
nimayshah

Also, I forgot to mention: the HDL is Verilog and the synthesis tool is XST. Could the frequency problem be due to my using both clock edges for different processes?

nimayshah

If you use both clock edges, any path between the two clock domains has less than half a clock period available (since it must also accommodate any deviation of the duty cycle from 50%). That's not a smart design decision if you want to achieve high performance. Peter Alfke
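Peter's point can be put in numbers. The Python below is a rough sketch that lumps clock-to-out, logic, routing and setup into a single path delay and ignores clock skew and jitter; the function names are hypothetical and the 8 ns delay is an arbitrary example, not a figure from Nimay's reports.

```python
def half_cycle_budget_ns(freq_mhz: float, duty_cycle: float) -> float:
    """Worst-case time available to a path launched on one clock edge and
    captured on the opposite edge.

    duty_cycle is the fraction of the period the clock is high (0 < d < 1).
    A rising-to-falling path gets the high time, a falling-to-rising path
    gets the low time; the shorter of the two limits the design.
    """
    period_ns = 1000.0 / freq_mhz
    high_ns = period_ns * duty_cycle
    low_ns = period_ns - high_ns
    return min(high_ns, low_ns)


def max_freq_mhz(path_delay_ns: float, duty_cycle: float, both_edges: bool) -> float:
    """Highest clock frequency at which a register-to-register path with the
    given total delay still meets timing (skew and jitter ignored)."""
    if both_edges:
        # The path must fit in the shorter clock phase: period * min(d, 1 - d).
        usable_fraction = min(duty_cycle, 1.0 - duty_cycle)
    else:
        # Same-edge path: the full period is available.
        usable_fraction = 1.0
    return 1000.0 * usable_fraction / path_delay_ns


print(max_freq_mhz(8.0, 0.5, both_edges=False))  # 125.0
print(max_freq_mhz(8.0, 0.5, both_edges=True))   # 62.5
```

For the same 8 ns path, using a single edge doubles the achievable frequency at a 50% duty cycle, and any duty-cycle error makes the two-edge case worse still. A factor of two of this kind could account for a core running at about half the speed, though only the timing report can show where the critical path actually is.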

Peter Alfke

Thank you Peter, but do you have any idea about the Block RAM issue? That is more important to me right now. Nimay

nimayshah

Hi Nimay, here's an answer to 2):

Block RAMs (BRAMs) are slower than CLB-based RAMs. Check the BRAM clock-to-output time in the data sheet (Tcko, or something like that). So, use two BRAMs and interleave between them. You could use both ports, but I guess you're using one port for dynamic loading? Cheers, Syms.
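One possible reading of Symon's suggestion: keep two identical copies of the table, issue lookups to them on alternate cycles and select the result with a multiplexer, so each Block RAM's output has two clock periods to arrive instead of one. The Python model below is a sketch of that scheduling only; the function names and delay numbers are hypothetical, and in a real design the two-cycle paths would also have to be declared to the timing tools as multicycle paths before the analysis would credit them.

```python
from typing import List, Tuple


def ping_pong_lookup(table: List[int], addresses: List[int]) -> List[Tuple[int, int, int]]:
    """Model two identical copies of `table` (bank 0 and bank 1) read on
    alternate cycles.

    Each result is (cycle_ready, bank, value): an address issued on cycle c
    goes to bank c % 2 and is captured on cycle c + 2, so each bank gets two
    clock periods to deliver its data.
    """
    banks = (list(table), list(table))  # two copies of the same LUT contents
    out = []
    for cycle, addr in enumerate(addresses):
        bank = cycle % 2  # alternate between the two copies
        out.append((cycle + 2, bank, banks[bank][addr]))
    return out


def min_period_ns(t_bram_ns: float, t_fabric_ns: float, interleaved: bool) -> float:
    """Rough minimum clock period when the BRAM read path (clock-to-out plus
    routing and setup) competes with the rest of the logic.

    With interleaving the BRAM path is treated as a two-cycle path, but the
    remaining logic (t_fabric_ns) must still fit in a single cycle.
    """
    bram_budget = t_bram_ns / 2 if interleaved else t_bram_ns
    return max(bram_budget, t_fabric_ns)


print(min_period_ns(6.0, 4.0, interleaved=False))  # 6.0
print(min_period_ns(6.0, 4.0, interleaved=True))   # 4.0
```

The rest of the datapath still has one cycle, so interleaving only helps if the BRAM read really is the critical path.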

Symon

Are you using 'LUT' here as a general abbreviation for Look Up Table, i.e., where you store coefficients etc.?

In FPGA parlance a LUT is a (usually) 4-input, single-output combinatorial logic module.
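A 4-input LUT is simply a 16-bit truth table, and a DA table for four taps also has exactly 16 entries. Such a table can therefore be held in one LUT per output bit (a distributed ROM) with no Block RAM at all, which is one plausible answer to question 1, though nobody in this thread confirms how the Xilinx core actually does it. The Python sketch below, with hypothetical function names, splits a 4-tap table into per-bit truth tables and reads it back through a LUT model.

```python
from typing import List


def lut4(init: int, a0: int, a1: int, a2: int, a3: int) -> int:
    """Model of a 4-input, 1-output FPGA LUT: `init` is a 16-bit truth table
    and the four inputs select one of its bits."""
    index = a0 | (a1 << 1) | (a2 << 2) | (a3 << 3)
    return (init >> index) & 1


def da_table_as_lut4_inits(coeffs: List[int], width: int) -> List[int]:
    """Split a 4-tap DA table (16 partial sums, each stored as `width`-bit
    two's complement) into `width` truth tables, one LUT4 per output bit."""
    assert len(coeffs) == 4
    mask = (1 << width) - 1
    table = [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1) & mask
             for addr in range(16)]
    inits = []
    for bit in range(width):
        init = 0
        for addr in range(16):
            # Bit `addr` of this LUT's truth table = bit `bit` of table[addr].
            init |= ((table[addr] >> bit) & 1) << addr
        inits.append(init)
    return inits


def read_lut4_rom(inits: List[int], addr: int, width: int) -> int:
    """Read one entry back by driving every LUT4 with the same 4-bit address,
    then sign-extend the `width`-bit word."""
    bits = [(addr >> i) & 1 for i in range(4)]
    word = sum(lut4(init, *bits) << b for b, init in enumerate(inits))
    return word - (1 << width) if word >> (width - 1) else word


inits = da_table_as_lut4_inits([3, -1, 4, 2], width=8)
print(read_lut4_rom(inits, 0b1011, width=8))  # 3 + (-1) + 2 = 4
```

As a sanity check on the LUT model: an INIT of 0x8000 sets only entry 15, so lut4(0x8000, ...) behaves as a 4-input AND gate.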

If you want to store hard-assigned values in the FPGA, you can use the flip-flops; these can be initialised to specific values at power-up. They will operate much faster than Block RAMs, but will use a lot more of the FPGA fabric. What logic usage do the P&R reports show for the two cores?

The speed your design runs at can also come down to the way you've structured the design: the amount of combinatorial logic between registers, etc. This is the sort of thing a more experienced designer will take into account when doing the initial hardware architecture design. (It's a Hardware Description Language, remember, not software.)
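Nial's point about combinatorial logic between registers can be made concrete: the clock period is set by the slowest register-to-register stage plus the register overhead, so adding pipeline registers shortens the period at the cost of extra latency. In the Python sketch below the function name, the block delays and the 1 ns register overhead are invented for illustration, and routing is assumed to be folded into each block's delay.

```python
from typing import List, Sequence


def pipelined_period_ns(logic_delays_ns: Sequence[float],
                        register_after: Sequence[int],
                        t_reg_ns: float) -> float:
    """Minimum clock period of a chain of combinatorial blocks.

    logic_delays_ns: delay of each block in the chain, in order.
    register_after:  indices of blocks that are followed by a pipeline register.
    t_reg_ns:        register overhead (clock-to-out + setup) per stage.
    The slowest register-to-register stage sets the period.
    """
    stages: List[float] = []
    current = 0.0
    for i, d in enumerate(logic_delays_ns):
        current += d
        if i in register_after:
            stages.append(current)  # a register closes this stage
            current = 0.0
    if current > 0.0 or not stages:
        stages.append(current)  # logic after the last pipeline register
    return max(stages) + t_reg_ns


# Hypothetical chain: table read, adder, shift/add, accumulator.
blocks = [2.0, 3.0, 2.5, 1.5]
print(pipelined_period_ns(blocks, [], 1.0))         # 10.0 (no pipelining)
print(pipelined_period_ns(blocks, [1], 1.0))        # 6.0
print(pipelined_period_ns(blocks, [0, 1, 2], 1.0))  # 4.0
```

Each extra register stage adds a clock of latency, which a FIR filter can usually tolerate in exchange for a higher clock rate.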

Sorry if this is teaching you to suck eggs, but it's not clear exactly what you mean in your question.

Nial

Nial Stewart

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.