addsubs on FPGA

Hi,

I have a question about RTL design for addsub-based implementations.

I have heard that addsubs are not preferred on FPGAs because they produce worse area and timing QoR. Is that true? Is resource sharing not preferred in general on FPGAs?

However, when I try the very simple addsub design shown below, I see no difference. Maybe with small examples the difference in implementation is not evident, which is why I wanted to ask a broader audience.

The reasoning and cases for both 'yes' and 'no' would help in understanding the cause.

Thanks Vipin

module addsub(a, b, oper, res);
  input        oper;
  input  [7:0] a;
  input  [7:0] b;
  output [7:0] res;
  reg    [7:0] res;

  // oper = 0 selects addition, oper = 1 selects subtraction
  always @(a or b or oper)
  begin
    if (oper == 1'b0)
      res = a + b;
    else
      res = a - b;
  end
endmodule

Reply to
sh.vipin

I first got interested in FPGA addition and subtraction in the XC4000 days. The XC4000 has a special carry logic that may or may not do this operation. The carry logic changed completely between the XC4000 series and later series, though.

In the pre-IC days, it was common to build logic, called ALU, which can implement add, subtract, and some bitwise logic operations using an optimal number of transistors or gates. Similar logic went into TTL.

Well, one possible implementation is adder and subtractor, followed by mux to select. But modern logic optimization tools should be able to do better. You could also write:

res = a + (oper ? b:-b);

which may or may not fit the FPGA better. (Seems to me closer to the way that the carry logic works, though.)
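A minimal sketch of that single-adder idea, assuming the original post's port names and polarity (oper = 0 adds, oper = 1 subtracts); the module name addsub_xor is made up for illustration. It uses the two's-complement identity a - b = a + ~b + 1, so the subtract case inverts b and feeds a 1 into the carry-in:

```verilog
module addsub_xor (
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire       oper,   // 0: a + b, 1: a - b (polarity of the original post)
    output wire [7:0] res
);
    // Conditionally invert b: when oper = 1 this yields ~b, otherwise b.
    wire [7:0] b_x = b ^ {8{oper}};

    // One 8-bit adder; oper doubles as the carry-in that supplies the "+ 1".
    assign res = a + b_x + oper;
endmodule
```

Whether a given tool builds anything different from the if/else version is something to check in the synthesis report rather than assume.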

If you want optimal LUT use, or minimal delay, then you need to look more carefully at what it is doing. Otherwise, the logic minimization will apply to the whole system, such that it may or may not matter.

-- glen

Reply to
glen herrmannsfeldt

Such statements often heard about preferences in FPGAs are not always applicable to all manufacturers' FPGAs or even all of the same manufacturer's FPGA families. What might not have worked well months or years ago may not be an issue today with another FPGA family. Your tests seem to show it works fine for your target FPGA and tools. Different synthesis tools (including different versions of the same tool) may also affect the results.

On a slightly different issue, IMHO, creating a design where an adder and/or subtractor is a separate module to be instantiated makes the larger project's code less readable and understandable, unless you are specifically trying to re-use a given adder or subtractor's implementation (not just the code) to save utilization on the project.

Don't borrow trouble unless you have to. Write the RTL so that you can understand the function it has to perform (not the way you'd design the hardware) first, then see if that meets your performance/utilization requirements (not your personal desire to make the "best" implementation). You'd be amazed what a good synthesis tool can do these days. The folks that have to maintain your design (which may be yourself in 6 weeks/months/years) will thank you for it.

Andy

Reply to
jonesandy

The above has an ambiguous carry out depending on how the -b is implemented.

If -b is implemented as ~b+1 then for subtract res = a + ~b + 1 which makes the carry out the result of the +1 increment and not the addition. A simple test case is when a and b are 0.

If the -b is a true -b then res = 0, Carry = 0
If the -b is ~b+1 then res = 0, Carry = 1

Might be better to restate the above as

res = (oper ? b:-b) + a;

which doesn't have this ambiguity.

I run into this a lot writing code generators for compilers.
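The a = b = 0 case can be tried in a simulator with a small sketch like the one below. The module and signal names are made up for illustration, and the result is widened to 9 bits so that a carry bit exists to look at:

```verilog
module carry_demo;
    reg  [7:0] a, b;

    // Negate first (truncated to 8 bits), then add with a 9-bit result.
    wire [7:0] neg_b   = -b;
    wire [8:0] sum_neg = {1'b0, a} + {1'b0, neg_b};

    // Invert, then add with the +1 folded into the same 9-bit addition.
    wire [8:0] sum_inv = {1'b0, a} + {1'b0, ~b} + 9'd1;

    initial begin
        a = 8'd0;
        b = 8'd0;
        #1;
        $display("a + (-b)   : carry=%b res=%h", sum_neg[8], sum_neg[7:0]);
        $display("a + ~b + 1 : carry=%b res=%h", sum_inv[8], sum_inv[7:0]);
    end
endmodule
```

With both inputs at zero, the first form leaves bit 8 clear and the second sets it, while the low 8 bits are zero in both cases.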

w..

Reply to
Walter Banks

Yes. As I noted, there was a big change after the XC4000.

Hmm. Hard to say, but in the ones I work on, it is more readable as a separate module. But it might be that the OP was using this to show the question, and not actually code that way.

As far as I know, the tools first flatten the netlist, so it doesn't change the result at all.

It has always seemed to me that people who knew how to design hardware, knew about gates and such, wrote better HDL. That is, not think of it as writing software (like C), but as wiring up gates.

But yes, as with software, write for readability.

There are cases where the performance goal is "as fast as possible." In this case, compare the logic against the logic of a fixed adder. If it is the same speed, then use it. If it is a lot slower, then see why it is slow. Another possibility is to pipeline the complement stage before an adder.
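A rough sketch of the pipelining idea, assuming a clocked wrapper around the original 8-bit addsub with the same polarity (oper = 0 adds, oper = 1 subtracts). The module name, the clk port and the register names are all assumptions made for illustration. The first stage only does the conditional complement; the second stage is then a plain adder with carry-in:

```verilog
module addsub_pipe (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire       oper,   // 0: a + b, 1: a - b
    output reg  [7:0] res
);
    reg [7:0] a_q;     // a delayed one cycle to stay aligned with b_x_q
    reg [7:0] b_x_q;   // b, or ~b when subtracting
    reg       cin_q;   // carry-in: 1 when subtracting

    always @(posedge clk) begin
        // Stage 1: conditional complement only.
        a_q   <= a;
        b_x_q <= b ^ {8{oper}};
        cin_q <= oper;

        // Stage 2: plain adder with carry-in, fed from the stage-1 registers.
        res   <= a_q + b_x_q + cin_q;
    end
endmodule
```

The result appears two clock edges after the inputs are presented, so this only helps when the extra cycle of latency is acceptable.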

-- glen

Reply to
glen herrmannsfeldt

Interesting, but since res, a and b are all the same size (in bits), in this Verilog statement, there is no observable carry out, so there is no ambiguity.

If res were bigger than a and b, then I'm not sure what it would do (but I'm sure it's defined somewhere). I use VHDL.

Andy

Reply to
jonesandy

I'm almost the opposite. I see RTL written by very experienced HW (not HDL) designers, and it often reads like a netlist. Might as well have coded it in edif and saved the cost of a synthesis license.

It's not their fault. We don't spend time teaching HDL designers how a synthesis tool analyzes their code, and why it infers a register, a latch(!), a RAM, or combinatorial gates. We teach all these cook-book approaches to designing FPGAs and ASICs using the same primitive functions they used with schematics.

We are sequential thinkers, not parallel thinkers. Therefore, it is best that we describe the desired behavior (on a clock cycle basis) in a sequential context (an always block or process), and let the synthesis tool infer parallelism where it is possible (they're excellent at that). Use functions and procedures to break out subsets of sequential behaviors. Instead of thinking in registers (circuit elements), think in clock cycles of delay (behavior). The registers are going to get shuffled around by retiming/pipelining optimizations anyway. The clock cycle delays will still be there. Just be careful around asynchronous inputs!

Of course, when the functionality is so complex that it cannot be easily expressed in a single sequential context, then it must be broken up into separately instantiated parallel contexts (entities or modules), each including their own detailed behavior in a sequential context.

My point is, we can understand (and therefore express and maintain) more complex behavior when it is conveyed in a sequential context. Imagine a casserole recipe written in concurrent statements.

In my professional experience, such cases are pretty rare. But fun when they happen.

Especially if oper and b are both available early!

andy

Reply to
jonesandy

I would have to look up the rule if I were actually doing it, but yes, Verilog knows about carry if the register is wide enough, and it is supposed to ignore the carry if there aren't more bits.

I have found some synthesis tools that complain about the loss of the carry. Unlike most programming languages, Verilog looks at the size of the destination (left side of assignment).

Well, I usually write continuous assignment, not behavioral assignment. I believe the rules are the same, but I am not sure about that.
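A small simulation sketch of the sizing rule, with made-up module and signal names. The same expression a + b is evaluated at the width of the widest operand or destination, so a 9-bit destination keeps the carry out and an 8-bit one drops it:

```verilog
module width_demo;
    reg  [7:0] a, b;

    wire [7:0] sum8 = a + b;   // evaluated at 8 bits: carry out is discarded
    wire [8:0] sum9 = a + b;   // evaluated at 9 bits: bit 8 holds the carry out

    initial begin
        a = 8'hFF;
        b = 8'h01;
        #1;
        $display("sum8=%h sum9=%h", sum8, sum9);
    end
endmodule
```

With these inputs, sum8 wraps to zero while sum9 has bit 8 set. The sizing happens during expression evaluation, so a procedural assignment to a reg of the same width should behave the same way.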

Does VHDL have something like the Verilog continuous assignment?

-- glen

Reply to
glen herrmannsfeldt

Yes, VHDL has concurrent assignment statements in several forms: direct, conditional and selected (like a case statement on the RHS), as well as concurrent procedure calls.

It is difficult to describe an iterative behavior, such as priority encoding or "counting ones," with concurrent statements; these are much easier with sequential statements.
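In Verilog terms (the language used elsewhere in this thread), a sequential description of "counting ones" might look like the sketch below. The module name and the 8-bit width are arbitrary choices for illustration. The loop is unrolled by the synthesis tool into combinational logic:

```verilog
module count_ones8 (
    input  wire [7:0] din,
    output reg  [3:0] count   // 0 to 8
);
    integer i;

    always @* begin
        count = 4'd0;
        // Walk the bits in order and add each one to the running total.
        for (i = 0; i < 8; i = i + 1)
            count = count + din[i];
    end
endmodule
```

A priority encoder can be written in the same style, with the loop overwriting an index each time it finds a set bit.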

Andy

Reply to
jonesandy

OK, but HDL is inherently parallel, and, more and more, so is software programming, as multicore systems become more popular.

I believe that C programmers, and other high-level language programmers, who know how to write assembler code tend to write better HLL code. They don't have to think about the generated code for each statement, but still know which constructs generate better code.

Some time ago, I was designing systolic arrays with the goal of at most two levels of logic (two LUTs) between registers.

But registers are what make systolic arrays work, so there really isn't any ignoring them.

A systolic array is a long array, hundreds to thousands of stages, of fairly simple unit cells.

Mostly, I don't have anything against behavioral HDL, but am less sure about people who want to write HDL in C.

If you are building a factory to produce thousands of them a day, then you probably have to consider it in parallel. For home cooking, though, serial usually works.

(snip)

-- glen

Reply to
glen herrmannsfeldt

(snip, I wrote)

Verilog has the conditional operator (?:) like C and Java.

Not so hard, as I think I have done both of them.

The usual implementation of counting ones is a carry save adder tree. It isn't so hard to write, but, yes, the usual tools generate them pretty well.

Well, once I needed a ones counter that would generate zero, one, two, three, or more than three from a 40-bit input, and with one pipeline stage. I wrote the logic for an 8-bit version, used five of those, a register stage, and then enough logic to combine the results.

Counting up to 8 bits is about as easy with and without a loop.
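A behavioral sketch of that structure, not the hand-written logic described above. The module names, the encoding (4 meaning "more than three") and the exact placement of the register are assumptions made for illustration:

```verilog
// Saturating count for one 8-bit slice: 0, 1, 2, 3, or 4 for "more than three".
module ones_sat8 (
    input  wire [7:0] din,
    output reg  [2:0] cnt
);
    integer i;

    always @* begin
        cnt = 3'd0;
        for (i = 0; i < 8; i = i + 1)
            if (din[i] && cnt != 3'd4)
                cnt = cnt + 3'd1;
    end
endmodule

module ones_sat40 (
    input  wire        clk,
    input  wire [39:0] din,
    output reg  [2:0]  cnt     // 0 to 3 exact, 4 = more than three
);
    wire [14:0] slice_cnt;     // five 3-bit slice counts
    reg  [14:0] slice_cnt_q;   // the single pipeline register stage

    genvar g;
    generate
        for (g = 0; g < 5; g = g + 1) begin : slice
            ones_sat8 u_cnt (
                .din (din[8*g +: 8]),
                .cnt (slice_cnt[3*g +: 3])
            );
        end
    endgenerate

    always @(posedge clk)
        slice_cnt_q <= slice_cnt;

    // Combine after the register: five values of at most 4 sum to at most 20.
    integer   k;
    reg [4:0] total;

    always @* begin
        total = 5'd0;
        for (k = 0; k < 5; k = k + 1)
            total = total + slice_cnt_q[3*k +: 3];
        cnt = (total > 5'd4) ? 3'd4 : total[2:0];
    end
endmodule
```

Saturating inside each slice keeps the per-slice result at 3 bits, which is what keeps the combining logic after the register small.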

-- glen

Reply to
glen herrmannsfeldt
