3 input adder in Spartan 3E

Hi,

I have to design a 3-input adder, i.e. D = A + B + C, in a Spartan 3E. The addition has to finish within one 153.6 MHz clock cycle. When I run PAR, I get timing violations. Can anybody give me some advice on how to implement this design? (I can't move to another device because of cost.) Thanks very much.

best regards

skyworld

You are probably way off timing, depending on the bit width of your words. You could use a pipelined adder if you can stand the extra latency. If you need the add completed in a single stage for, let's say, 16-bit words, you're not going to hit 150 MHz on that device, but you could go faster by using a 3-2 compressor and then adding the final two terms with a carry look-ahead adder.

---Matthew Hicks

How large are the vectors you're adding?

Have you looked at the timing analysis of your path to sanity-check the timing violations? If you have long routing delays, that can be accommodated with relative placements to keep the register-to-adder and adder-to-adder routing delays down. Getting on and off the carry chain may limit you at this speed.

You may have to ask yourself if your requirement is *really* three adds in one cycle. What do you do with the data after the clock? Comparing to a constant, for instance, would allow you to take the difference of the constant with A in the first cycle and compare to a B+C result rather than a direct comparison to A+B+C. If the logic generating any two vectors is simple enough, the values could be generated and the difference taken before the first register.
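The rearrangement described above can be checked with a small behavioural model. This is only a sketch in Python rather than HDL, and it assumes an equality test against a constant K on wrap-around words. The 16-bit `WIDTH` and the names `direct_compare` and `split_compare` are made up for illustration; the thread does not give them. An ordered comparison (less-than, greater-than) would need extra guard bits and is not covered here.

```python
WIDTH = 16                  # assumed word width, not stated in the thread
MASK = (1 << WIDTH) - 1


def direct_compare(a, b, c, k):
    """Single-cycle form: three-input add followed by a compare with K."""
    return ((a + b + c) & MASK) == (k & MASK)


def split_compare(a, b, c, k):
    """Rearranged form: K - A and B + C are independent two-input
    operations, so neither one waits on the other's carry chain."""
    k_minus_a = (k - a) & MASK
    b_plus_c = (b + c) & MASK
    return k_minus_a == b_plus_c
```

Both functions give the same answer for any inputs because A + B + C = K and B + C = K - A are the same statement modulo 2^WIDTH.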

I do know that a divider that needed to completely daisy-chain four 14-bit add/subtract stages per cycle was only happy at 66 MHz with some RLOC constraints. Your situation is a little better since you're not MSB carry-out to LSB, but not much. The time to get on and off that carry chain may swamp your results.

- John_H

Hi John,

In fact, this design is for a sigma-delta transmission modulator. I need a filter to transform 10 bit parallel input data (15.36MHz) to one bit output stream (153.6MHz). The filter is composed of three adder stages. Each adder has three inputs, i.e. the D = A + B + C I mentioned. Every adder has to finish A + B + C within one 153.6 MHz clock cycle so that the 153.6 Mbps data stream works. I have tried a pipelined adder, but it failed for this structure. So I am looking for a "fast adder algorithm" or "fast three-input adder algorithm" that can be implemented in a Spartan 3E and runs fast enough. Thanks very much.

skyworld

Would it be possible to build enough 3-input adders to produce enough data to meet the 153.6 Mb/s requirement, and operate them in a multiplexed fashion?

---Matthew Hicks

Still no answer to my questions: What is the size of the vectors? Have you sanity-checked the timing report for long routing? Is there something you can do before or after this "I have to have it now" cycle?

It's not obvious to me that you can't simply move some things around. Where do the three values come from? Are they each the results of previous 3-value adders? So terribly often, the problem can be repartitioned without compromising the system requirements. If you isolate your problem to a 3-value adder, you won't achieve your goals. If you expand your problem to the stages before and after, or to the system level, you can make this work. You just probably can't make a 3-value adder work.

And if you do answer my questions or provide more details, you might also include the speed grade of the device you're targeting.

- John_H

As mentioned, if you're not feeding back your outputs into your filter, then you probably can pipeline this. Can you let us know why this doesn't work?

if rising_edge(clock) then A_plus_B

Reply to
Symon

Why can't you pipeline like always @(posedge clk) begin AB

Reply to
Uwe Bonnes

In the OP's second post he did say: "I need a filter to transform 10 bit parallel input data (15.36MHz) to one bit output stream (153.6MHz)." So I assumed he meant that 10 bits of parallel data are input, then multiplied and added, and then the ten-bit result is shifted out 1 bit at a time.

-Dave Pollum

Then he would only need to add at 15.36 MHz, which should be pretty easy to do.

---Matthew Hicks

Hi Gentlemen,

Sorry for giving you an incomplete question.

Well, the design could be described as:

always @ (posedge clk) begin a1

Reply to
skyworld

skyworld,

I thought this was a 10-bit modulator, so why 32 bits? You cannot get two 32-bit adders at 153.6 MHz in your device for a speed grade you didn't mention.

Can you generate adders in a previous step to provide in11+in21 (should this be in11+in12?), in21+in22, and in31+in32?

You can add another step to the pipeline and perform the 2-input additions in separate cycles. If you ABSOLUTELY cannot accept the added 6 ns latency, tell us how the inNN vectors are generated. Perhaps there's a way to get the in21+in22 style result in the previous cycle where the inNN registers are generated. Sometimes simple logic can be pulled into an adder.
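The extra pipeline step can be pictured with a cycle-by-cycle model. The sketch below is Python, not synthesizable HDL. The class `PipelinedAdd3` and its register names are hypothetical, and it assumes the sum is not fed back into its own inputs within the same cycle, which is the condition raised earlier in the thread.

```python
MASK = (1 << 32) - 1    # 32-bit words, the width mentioned in this post


class PipelinedAdd3:
    """Behavioural model of D = A + B + C split over two register stages."""

    def __init__(self):
        self.bc_reg = 0   # stage 1 register: B + C
        self.a_reg = 0    # stage 1 register: A, delayed to stay aligned
        self.d_reg = 0    # stage 2 register: final sum

    def clock(self, a, b, c):
        """Model one rising clock edge and return D as seen after it."""
        # Every register loads from pre-edge values, like non-blocking
        # assignments in HDL, so stage 2 is computed before stage 1
        # is overwritten.
        self.d_reg = (self.a_reg + self.bc_reg) & MASK
        self.bc_reg = (b + c) & MASK
        self.a_reg = a & MASK
        return self.d_reg
```

Each clock period now contains only one two-input add. The price is that the sum of the values presented on one call appears on the following call, i.e. one cycle later than a single-stage adder would deliver it.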

Also, what do you do with the a1, a2, and a3 values? You may similarly be able to push the accumulator into the next cycle and use the result combinatorially if your post-add logic is simple.

When you can't reach speed, either redistribute the code or rethink your system needs to allow more pipelining.

- John_H

Use a pair of Carry Save adders.

A Carry Save adder takes three inputs, and produces two outputs, a partial sum, P, and a vector of carry bits, C. It's actually just an array of independent full adders, so calculation time is independent of the data width. To get the full sum add P to C using a normal adder.
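As a rough illustration of that description, here is a bit-level model in Python (a sketch, not HDL). The 32-bit `WIDTH` and the function names `carry_save_add` and `add3` are assumptions made for the example.

```python
WIDTH = 32                  # assumed word width
MASK = (1 << WIDTH) - 1


def carry_save_add(a, b, c):
    """3:2 compression: return (P, C) with P + C == a + b + c mod 2**WIDTH.

    Every bit position is an independent full adder, so no carry
    ripples along the word here.
    """
    p = (a ^ b ^ c) & MASK                        # per-bit sum output
    carry = ((a & b) | (a & c) | (b & c)) << 1    # per-bit carry, weight 2
    return p, carry & MASK


def add3(a, b, c):
    """Full three-input sum: one compression, then one ordinary add."""
    p, carry = carry_save_add(a, b, c)
    return (p + carry) & MASK    # the only carry-propagating step
```

For example, `carry_save_add(1, 1, 1)` gives `(1, 2)`, and adding the two halves gives 3.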

Take two of these in series to add four inputs to two outputs. Send the outputs back to the inputs again and accumulate on each clock cycle. It only needs two four-input LUTs, one per output, so it's as fast as you like. Add up the total sum from the output latch whenever you need to.
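A behavioural sketch of that accumulator, again in Python and with hypothetical names (`csa`, `CarrySaveAccumulator`, a 32-bit `WIDTH`), might look like this. It only checks the arithmetic and says nothing about timing or LUT usage on the real device.

```python
WIDTH = 32                  # assumed word width
MASK = (1 << WIDTH) - 1


def csa(a, b, c):
    """3:2 compressor: P + C == a + b + c mod 2**WIDTH."""
    p = (a ^ b ^ c) & MASK
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return p, carry


class CarrySaveAccumulator:
    """Keeps the running sum as a (P, C) pair instead of a single word."""

    def __init__(self):
        self.p = 0
        self.c = 0

    def clock(self, x, y):
        """One clock: fold two new inputs into the stored pair.

        Four values (p, c, x, y) go through two compressor levels and
        come out as two, with no carry-propagating add in the loop.
        """
        p1, c1 = csa(self.p, self.c, x & MASK)
        self.p, self.c = csa(p1, c1, y & MASK)

    def total(self):
        """Resolve the pair with an ordinary add, only when needed."""
        return (self.p + self.c) & MASK
```

The stored pair always satisfies P + C = (sum of all inputs so far) modulo 2^WIDTH, so `total()` can be called on any cycle, or computed by a separate adder outside the feedback loop.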

Chain several sets of adders in a row, and compute the total sum of each using separate adders (Carry Lookahead or otherwise) in parallel.

Reply to
MGT78000
