How to speed up the critical path (Xilinx)

Hi there,

I would appreciate some suggestions on how to start making my design faster. My design is a processor (18-bit datapath), and the critical path looks like this:

  1. Instruction register (containing number of register)
  2. Register file (distributed RAM)
  3. Mux (2-way, selects either register or RAM)
  4. Mult18x18 within the ALU
  5. Mux (ALU output selector)
  6. Register file (distributed RAM)

Target is a Spartan3 speed grade 4. I ran PAR at highest effort.

    Delay type         Delay(ns)  Logical Resource(s)
    ----------------------------  -------------------
    Tiockiq               0.259   EX_Instr_adr1_1
    net (fanout=18)       2.114   EX_Instr_adr1
    Tilo                  0.608   regs_a10_Mram_RAM_inst_ramx_0.F
    net (fanout=2)        0.693   EX_Regs1do
    Tilo                  0.608   data1mux_Mmux_q_Result1
    net (fanout=5)        2.617   EX_Data1
    Tmult                 3.493   alu_Mmult_prod_inst_mult_0
    net (fanout=1)        2.378   alu_prod
    Tilo                  0.550   alu_result16
    net (fanout=3)        1.061   EX_Data3
    Tds                   0.519   regs_a14_Mram_RAM_inst_ramx_0.F
    ----------------------------  ---------------------------
    Total                14.900ns (6.037ns logic, 8.863ns route)
                                  (40.5% logic, 59.5% route)
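As a quick sanity check on the report, the arithmetic can be reproduced in a few lines of Python. This is only a sketch: the numbers are copied from the report above, the variable names are made up for illustration, and the clock-rate figure simply takes the reported 14.900 ns as the whole clock period.

```python
# Delays in ns, copied from the timing report above.
logic_delays = [
    ("Tiockiq", 0.259),
    ("Tilo", 0.608),
    ("Tilo", 0.608),
    ("Tmult", 3.493),
    ("Tilo", 0.550),
    ("Tds", 0.519),
]

# (net name, fanout, delay in ns), also copied from the report.
net_delays = [
    ("EX_Instr_adr1", 18, 2.114),
    ("EX_Regs1do", 2, 0.693),
    ("EX_Data1", 5, 2.617),
    ("alu_prod", 1, 2.378),
    ("EX_Data3", 3, 1.061),
]

logic_ns = sum(delay for _, delay in logic_delays)
route_ns = sum(delay for _, _, delay in net_delays)
total_ns = logic_ns + route_ns

# Assumption: the reported path delay is the full clock period.
fmax_mhz = 1000.0 / total_ns

print(f"logic {logic_ns:.3f} ns ({100 * logic_ns / total_ns:.1f}%)")
print(f"route {route_ns:.3f} ns ({100 * route_ns / total_ns:.1f}%)")
print(f"total {total_ns:.3f} ns, about {fmax_mhz:.1f} MHz")

# List the nets from slowest to fastest, with their fanout.
for name, fanout, delay in sorted(net_delays, key=lambda n: -n[2]):
    print(f"{name:15s} fanout={fanout:2d} {delay:.3f} ns")
```

Sorting the nets this way makes it easy to see which ones carry most of the routing delay and to compare that against their fanout.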

Now how could I start improving the design? I don't want to split this up into two cycles (because instruction level parallelism is low and I need one result to compute the next).

I notice that the net delay of the instruction register is quite high. Does this have to do with the fanout? Fanout is 18 (because the value is used as an address to 18 parallel distributed RAM LUTs). I've heard of duplicated registers. Would that help? And then, how would I achieve it? Automatically through a setting? Manually? Is there an elegant way to do it?

Another thing I've heard about is RLOC constraints. I never dared try them so far. Do you think I could improve the design, and by how much?

Of course, I highly appreciate any (other?) suggestions on how to speed up my design. I might also consider changing the architecture, if it doesn't mean I have to change the whole concept of my processor.

Also, I am looking for good literature on FPGA implementation.

Thanks in advance! K.B.

Reply to
starbugs

What I forgot to say:

  1. In case you wonder what the multiplexer (3rd thing in the path) is for: operands can come either from registers or from a RAM (pipelined with a register before the mux), that's why I need the mux here.

  2. The scarce resource in my design is block RAM, so it doesn't matter if it uses more area as long as that makes it faster. For some reason, if I set the mapper's optimization goal to "speed" instead of "area", the design gets even _slower_.

Reply to
starbugs

(...)

Yes, your problem is fanout. Duplicating registers could be the solution, but it has two main drawbacks:

- It usually uses more FPGA registers

- It may only move the fanout problem to the routing *before* the register. You could fix this problem only to create a new one.

Try experimenting with the synthesis/implementation options. For instance, you could try setting the maximum fanout to 10. In general, don't try to optimise by hand; let the Xilinx tools try to do it.

Set optimization goal to speed, use timing constraints... try everything if possible.

BTW, I would register the multiplier output, call it "productResultRegister", and use this register as just another register of the register file. This way, the instruction set might grow a little, but timing might improve. (Test before making radical changes!)

Reply to
Zara

One option is pipelining: inserting registers in the datapath can raise the working frequency, but it increases the data delay (latency). The other is parallelism, if the FPGA has enough resources.

Reply to
zqhpnp

KB, how about adding another multiplier so that you can eliminate the first mux? Have one multiplier for each source, then make the second mux one port wider to accommodate the extra multiplier result. The Xilinx CLB muxes are expandable without adding too much extra delay. HTH, Syms.

Reply to
Symon

How much faster? Logic optimization tricks won't gain you more than ... let's say 10% (provided that you already use retiming techniques and that they give the best possible result).
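To put that 10% in perspective, here is a rough Python sketch of what it would mean for the 14.900 ns path reported earlier in the thread. The 10% is only the ballpark guess from above, and since it is not stated whether it applies to the path delay or to the clock frequency, both readings are shown; the variable names are purely illustrative.

```python
# Critical-path delay from the timing report earlier in the thread.
current_ns = 14.900
current_mhz = 1000.0 / current_ns

gain = 0.10  # ballpark figure from the post above, not a measured value

# Reading 1: the critical-path delay shrinks by 10%.
shorter_path_ns = current_ns * (1.0 - gain)
shorter_path_mhz = 1000.0 / shorter_path_ns

# Reading 2: the clock frequency rises by 10%.
faster_clock_mhz = current_mhz * (1.0 + gain)
faster_clock_ns = 1000.0 / faster_clock_mhz

print(f"now:            {current_ns:.3f} ns, {current_mhz:.1f} MHz")
print(f"10% less delay: {shorter_path_ns:.3f} ns, {shorter_path_mhz:.1f} MHz")
print(f"10% more clock: {faster_clock_ns:.3f} ns, {faster_clock_mhz:.1f} MHz")
```

Either way the result stays in the low-to-mid 70s of MHz, so larger gains would have to come from structural changes such as those suggested in the other replies.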

Eric

Reply to
Eric DELAGE
