Area optimization (optimizing DSP48E usage)

I am trying to map, place & route a large design on a Xilinx Virtex-6 FPGA:
Target Device  : xc6vlx550t
Target Package : ff1759
Target Speed   : -2

My mapping process fails with the following errors:

ERROR:Pack:2310 - Too many comps of type "DSP48E1" found to fit this device.
ERROR:Pack:2860 - The number of logical carry chain blocks exceeds the capacity for the target device. This design requires 100940 slices
   but only has 85920 slices available that allow carry chains.
ERROR:Map:237 - The design is too large to fit the device.  Please check the Design Summary section to see which resource requirement for
   your design exceeds the resources available in the device. Note that the number of slices reported may not be reflected accurately as
   their packing might not have been completed.

When I inspect the Mapping report file, I see:
Interim Summary
Slice Logic Utilization:
  Number of Slice Registers:               460,088 out of 687,360   66%
    Number used as Flip Flops:             399,848
    Number used as Latches:                      0
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:           60,240
  Number of Slice LUTs:                    388,284 out of 343,680  112% (OVERMAPPED)
    Number used as logic:                  384,856 out of 343,680  111% (OVERMAPPED)
      Number using O6 output only:         311,180
      Number using O5 output only:          10,716
      Number using O5 and O6:               62,960
      Number used as ROM:                        0
    Number used as Memory:                     114 out of  99,200    1%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:           114
        Number using O6 output only:           114
        Number using O5 output only:             0
        Number using O5 and O6:                  0
    Number used exclusively as route-thrus:  3,314
      Number with same-slice register load:      0
      Number with same-slice carry load:     3,313
      Number with other load:                    1

Slice Logic Distribution:
  Number of LUT Flip Flop pairs used:      584,470
    Number with an unused Flip Flop:       125,987 out of 584,470   21%
    Number with an unused LUT:             196,186 out of 584,470   33%
    Number of fully used LUT-FF pairs:     262,297 out of 584,470   44%
    Number of unique control sets:             233
    Number of slice register sites lost
      to control set restrictions:             854 out of 687,360    1%

  Number of DSP48E1s:                        4,800 out of     864  555% (OVERMAPPED)


I did a quick calculation of design resource usage, LUTs versus DSP48E1s, from the Xilinx Coregen GUI:
1. Multiplier1 uses 86 LUTs vs 1 DSP48E1. The design uses Multiplier1 x96, so I am looking at either 96 DSP48E1s or 8,256 LUTs.
2. Multiplier2 uses 142 LUTs vs 1 DSP48E1. The design uses Multiplier2 x4704, so I am looking at either 4,704 DSP48E1s or 667,968 LUTs.
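
For anyone who wants to check these figures, here is a minimal Python sketch of the same arithmetic. It uses only the numbers quoted in this post (the Coregen per-multiplier estimates and the device totals from the map report); the variable names are made up for illustration, and the Coregen LUT counts are estimates rather than post-map results.

```python
# Figures quoted in the post (Coregen estimates and the map report).
LUTS_PER_MULT1, COUNT_MULT1 = 86, 96
LUTS_PER_MULT2, COUNT_MULT2 = 142, 4704
DSP_AVAILABLE = 864       # DSP48E1s available according to the map report
LUT_AVAILABLE = 343_680   # slice LUTs available according to the map report

# Option A: every multiplier in a DSP48E1 (one DSP48E1 each).
dsp_needed = COUNT_MULT1 + COUNT_MULT2

# Option B: every multiplier built from LUTs.
lut_needed = COUNT_MULT1 * LUTS_PER_MULT1 + COUNT_MULT2 * LUTS_PER_MULT2

print(f"all-DSP: {dsp_needed} DSP48E1s, {dsp_needed / DSP_AVAILABLE:.1%} of the device")
print(f"all-LUT: {lut_needed} LUTs, {lut_needed / LUT_AVAILABLE:.1%} of the device")
```

Both totals exceed what the device offers (4,800 DSP48E1s against 864, and 676,224 LUTs against 343,680), so neither pure option fits on its own.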

I tried different options to synthesize my design using LUTs and using DSPs. Before I partition my design, I just wanted to check with everyone here how the multipliers can be split between DSP48E1s and LUTs. The current mapping report indicates that all the multipliers were mapped to DSPs, hence 4800 DSPs.

1. How can XST or the mapping process split the multipliers between DSPs and slice logic? Is this possible with some constraint?

2. The multiplier cores are currently set to optimize for area rather than speed, and I have used the "use Mults" option. If I set the "use LUTs" option, will XST and the mapping process split the multipliers between LUTs and DSPs?

Thanks in advance !!!

Re: Area optimization (optimizing DSP48E usage)
If you're instantiating a LUT-based multiplier core, you can't expect the tools to turn it into a DSP48-based multiplier during implementation. This is something that you need to do at the RTL level: instantiate as many DSP48-based multiplier cores as you can, and leave the rest as LUT-based ones.

You may use the VHDL "generate" statement to selectively instantiate DSP48-based or LUT-based multiplier cores.
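
The selection itself comes down to comparing each instance index against a DSP budget, which is the kind of condition an if-generate inside a for-generate loop could test. Below is a rough Python sketch of that bookkeeping using the figures from the original post. The function names are hypothetical, and giving the DSP48E1s to Multiplier2 first (because it costs more LUTs per instance) is an assumption, not something stated in the thread.

```python
def split_multipliers(count, dsp_budget):
    """Return (n_dsp, n_lut): instances 0..n_dsp-1 get a DSP48E1, the rest use LUTs."""
    n_dsp = min(count, max(dsp_budget, 0))
    return n_dsp, count - n_dsp


def use_dsp(index, n_dsp):
    """Per-instance choice: the comparison a generate condition would make."""
    return index < n_dsp


DSP_AVAILABLE = 864  # DSP48E1s available according to the map report

# Assumption: spend DSP48E1s on Multiplier2 first (142 LUTs each vs 86 for Multiplier1).
m2_dsp, m2_lut = split_multipliers(4704, DSP_AVAILABLE)
m1_dsp, m1_lut = split_multipliers(96, DSP_AVAILABLE - m2_dsp)

# LUT cost of the multipliers left in fabric, using the Coregen estimates.
lut_cost = m2_lut * 142 + m1_lut * 86

print(f"Multiplier2: {m2_dsp} in DSP48E1s, {m2_lut} in LUTs")
print(f"Multiplier1: {m1_dsp} in DSP48E1s, {m1_lut} in LUTs")
print(f"LUTs needed for the LUT-based multipliers: {lut_cost}")
```

With these numbers the LUT-based remainder alone comes to 553,536 LUTs against 343,680 on the device, so a generate-based split removes the DSP48E1 overage but would not by itself make the design fit.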

