Area optimization (optimizing DSP48E usage)

I am trying to map, place and route a large design on a Xilinx Virtex-6 FPGA:

Target Device  : xc6vlx550t
Target Package : ff1759
Target Speed   : -2

My mapping process fails with the following errors:

ERROR:Pack:2310 - Too many comps of type "DSP48E1" found to fit this device.

ERROR:Pack:2860 - The number of logical carry chain blocks exceeds the capacity for the target device. This design requires 100940 slices but only has 85920 slices available that allow carry chains.

ERROR:Map:237 - The design is too large to fit the device. Please check the Design Summary section to see which resource requirement for your design exceeds the resources available in the device. Note that the number of slices reported may not be reflected accurately as their packing might not have been completed.

When I inspect the mapping report file, I see:

Interim Summary
---------------
Slice Logic Utilization:
  Number of Slice Registers:               460,088 out of 687,360   66%
    Number used as Flip Flops:             399,848
    Number used as Latches:                      0
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:           60,240
  Number of Slice LUTs:                    388,284 out of 343,680  112% (OVERMAPPED)
    Number used as logic:                  384,856 out of 343,680  111% (OVERMAPPED)
      Number using O6 output only:         311,180
      Number using O5 output only:          10,716
      Number using O5 and O6:               62,960
      Number used as ROM:                        0
    Number used as Memory:                     114 out of  99,200    1%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:           114
        Number using O6 output only:           114
        Number using O5 output only:             0
        Number using O5 and O6:                  0
    Number used exclusively as route-thrus:  3,314
      Number with same-slice register load:      0
      Number with same-slice carry load:     3,313
      Number with other load:                    1

Slice Logic Distribution:
  Number of LUT Flip Flop pairs used:      584,470
    Number with an unused Flip Flop:       125,987 out of 584,470   21%
    Number with an unused LUT:             196,186 out of 584,470   33%
    Number of fully used LUT-FF pairs:     262,297 out of 584,470   44%
    Number of unique control sets:             233
    Number of slice register sites lost to control set restrictions: 854 out of 687,360 1%

Also:

  Number of DSP48E1s:                        4,800 out of     864  555% (OVERMAPPED)

-----------------------------------------------------------------------
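To put the overmapped figures in absolute terms, here is a minimal Python sketch that recomputes the shortfall from the numbers in the report above. The helper name `overmap` is made up for illustration, and the integer truncation of the percentage is an assumption that happens to match the 112% and 555% the report prints.

```python
# Figures copied from the interim MAP summary above: (used, available).
usage = {
    "Slice LUTs": (388_284, 343_680),
    "DSP48E1s": (4_800, 864),
}


def overmap(used, available):
    """Return (shortfall, utilization percent truncated to an integer).

    Hypothetical helper for illustration; truncation is assumed because it
    reproduces the percentages shown in the report.
    """
    return used - available, used * 100 // available


for name, (used, available) in usage.items():
    shortfall, pct = overmap(used, available)
    print(f"{name}: {pct}% used, {shortfall:,} over capacity")
```

By this count the design is short by 44,604 LUTs and 3,936 DSP48E1s at the same time, with every multiplier already mapped to a DSP.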

I did a quick calculation of design resource usage (LUTs versus DSP48E1s) using the estimates from the Xilinx Coregen GUI:

Multiplier1 uses 86 LUTs vs 1 DSP48E1. The design uses Multiplier1 x96. So I am looking at either 96 DSP48E1s or 8256 LUTs.
Multiplier2 uses 142 LUTs vs 1 DSP48E1. The design uses Multiplier2 x4704. So I am looking at either 4704 DSP48E1s or 667968 LUTs.
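The same arithmetic can be extended to a mixed DSP/LUT split. The sketch below is only a back-of-the-envelope budget in Python, assuming the Coregen estimates above hold per instance and that each multiplier takes exactly one DSP48E1. The greedy rule (spend DSPs on the most LUT-expensive multiplier first) and the names `split_budget` and `DSP_AVAILABLE` are my own assumptions for illustration, not something XST or MAP is known to do.

```python
DSP_AVAILABLE = 864  # DSP48E1 count reported for the xc6vlx550t in the MAP summary

# (name, LUTs per instance when built in fabric, instance count), from Coregen
multipliers = [
    ("Multiplier1", 86, 96),
    ("Multiplier2", 142, 4704),
]


def split_budget(mults, dsp_available):
    """Greedy split: assign DSPs to the most LUT-expensive multipliers first.

    Hypothetical helper; assumes one DSP48E1 per multiplier instance.
    """
    remaining = dsp_available
    plan = {}
    for name, luts_each, count in sorted(mults, key=lambda m: m[1], reverse=True):
        in_dsp = min(count, remaining)
        remaining -= in_dsp
        in_fabric = count - in_dsp
        plan[name] = {
            "dsp": in_dsp,
            "fabric_instances": in_fabric,
            "luts": in_fabric * luts_each,
        }
    return plan


plan = split_budget(multipliers, DSP_AVAILABLE)
total_luts = sum(entry["luts"] for entry in plan.values())

for name, entry in plan.items():
    print(name, entry)
print("LUTs needed for fabric multipliers:", total_luts)
```

Under these assumptions, filling all 864 DSP48E1s with Multiplier2 still leaves 3,840 Multiplier2 and 96 Multiplier1 instances in fabric, or 553,536 LUTs for the multipliers alone, against 343,680 LUTs on the device.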

I tried different options to synthesize my design using LUTs and using DSPs. Before I partition my design, I wanted to check with everyone here how the multipliers can be split between DSP48E1s and LUTs. The current mapping report indicates that all the multipliers were mapped to DSPs, hence 4800 DSPs.

How can the XST tool or the mapping process partition the multipliers so that they use both DSPs and slice logic? Is this possible with some constraint?

The multiplier cores are currently set for Area optimization rather than Speed optimization, and I have used the "use Mults" option. If I set the "use LUTs" option instead, will XST and the mapping process partition the multiplier usage between LUTs and DSPs?

Thanks in advance !!!