How the synthesizer actually works.

Hi guys, to learn how the synthesizer behaves, I wrote logic to add four vectors in three different ways, and I got different results from the synthesizers (I used both ISE and Synplify). These are the three approaches I tried:

1.***************************************************************************************************
module add(
  input clk,
  input [1:0] a,b,
  input d,
  input [63:0] c,
  output reg [64:0] out
);

reg [1:0] in1,in2;
reg in4;
reg [63:0] in3;
wire [64:0] result;

always @ (posedge clk)
begin
  {in1,in2,in3,in4}= {a,b,c,d};
  out= result;
end

assign result= in1+in2+in3+in4;

endmodule

2.***************************************************************************************************
module add1(
  input clk,
  input [1:0] a,b,
  input d,
  input [63:0] c,
  output reg [64:0] out
);

reg [1:0] in1,in2;
reg in4;
reg [63:0] in3;
wire [64:0] temp;
wire [64:0] temp2;
wire [64:0] result;

always @ (posedge clk)
begin
  {in1,in2,in3,in4}= {a,b,c,d};
  out= result;
end

assign temp= in1+in2;
assign temp2= temp+in4;
assign result= temp2+in3;

endmodule

3.***************************************************************************************************
module add2(
  input clk,
  input d,
  input [1:0] a,b,
  input [63:0] c,
  output reg [64:0] out
);

reg [1:0] in1,in2;
reg in4;
reg [63:0] in3;
reg [64:0] result;

always @ (posedge clk)
begin
  {in1,in2,in3,in4}= {a,b,c,d};
  out= result;
end

always @ (*)
begin
  case({in4,in1,in2})
    5'b00000: result= in3+3'b000;    5'b00001: result= in3+3'b001;
    5'b00010: result= in3+3'b010;    5'b00011: result= in3+3'b011;
    5'b00100: result= in3+3'b001;    5'b00101: result= in3+3'b010;
    5'b00110: result= in3+3'b011;    5'b00111: result= in3+3'b100;
    5'b01000: result= in3+3'b010;    5'b01001: result= in3+3'b011;
    5'b01010: result= in3+3'b100;    5'b01011: result= in3+3'b101;
    5'b01100: result= in3+3'b011;    5'b01101: result= in3+3'b100;
    5'b01110: result= in3+3'b101;    5'b01111: result= in3+3'b110;
    5'b10000: result= in3+3'b001;    5'b10001: result= in3+3'b010;
    5'b10010: result= in3+3'b011;    5'b10011: result= in3+3'b100;
    5'b10100: result= in3+3'b010;    5'b10101: result= in3+3'b011;
    5'b10110: result= in3+3'b100;    5'b10111: result= in3+3'b101;
    5'b11000: result= in3+3'b011;    5'b11001: result= in3+3'b100;
    5'b11010: result= in3+3'b101;    5'b11011: result= in3+3'b110;
    5'b11100: result= in3+3'b100;    5'b11101: result= in3+3'b101;
    5'b11110: result= in3+3'b110;    5'b11111: result= in3+3'b111;
  endcase
end

endmodule

And the results for these from ISE are:

1.***************************************************************************************************
Selected Device : 4vlx15sf363-12

Number of Slices:             105 out of   6144    1%
Number of Slice Flip Flops:   134 out of  12288    1%
Number of 4 input LUTs:       128 out of  12288    1%
Number of bonded IOBs:        135 out of    240   56%
Number of GCLKs:                1 out of     32    3%

Minimum period: 5.212ns (Maximum Frequency: 191.872MHz)
Minimum input arrival time before clock: 1.445ns
Maximum output required time after clock: 3.921ns
Maximum combinational path delay: No path found

2.***************************************************************************************************
Selected Device : 4vlx15sf363-12

Number of Slices:              76 out of   6144    1%
Number of Slice Flip Flops:   134 out of  12288    1%
Number of 4 input LUTs:        72 out of  12288    0%
Number of bonded IOBs:        135 out of    240   56%
Number of GCLKs:                1 out of     32    3%

Minimum period: 4.793ns (Maximum Frequency: 208.616MHz)
Minimum input arrival time before clock: 1.445ns
Maximum output required time after clock: 3.921ns
Maximum combinational path delay: No path found

3.***************************************************************************************************
Selected Device : 4vlx15sf363-12

Number of Slices:             712 out of   6144   11%
Number of Slice Flip Flops:   135 out of  12288    1%
Number of 4 input LUTs:      1329 out of  12288   10%
Number of bonded IOBs:        135 out of    240   56%
Number of GCLKs:                1 out of     32    3%

Minimum period: 6.377ns (Maximum Frequency: 156.803MHz)
Minimum input arrival time before clock: 1.459ns
Maximum output required time after clock: 3.921ns
Maximum combinational path delay: No path found

And the results from Synplify are:

1.***************************************************************************************************
Mapping to part: xc4vlx15sf363-10
Cell usage:
FD        134 uses
GND       1 use
MUXCY     1 use
MUXCY_L   127 uses
XORCY     128 uses
LUT1      125 uses
LUT2      4 uses

Mapping Summary:
Total LUTs: 129 (1%)

                 Requested   Estimated   Requested   Estimated              Clock      Clock
Starting Clock   Frequency   Frequency   Period      Period      Slack      Type       Group
---------------------------------------------------------------------------------------------------------
add|clk          1.0 MHz     143.8 MHz   1000.000    6.952       993.048    inferred   Inferred_clkgroup_0
=========================================================================================================

2.***************************************************************************************************
Mapping to part: xc4vlx15sf363-10
Cell usage:
FD        134 uses
GND       1 use
MUXCY     1 use
MUXCY_L   127 uses
XORCY     128 uses
LUT1      125 uses
LUT2      4 uses

Mapping Summary:
Total LUTs: 129 (1%)

                 Requested   Estimated   Requested   Estimated              Clock      Clock
Starting Clock   Frequency   Frequency   Period      Period      Slack      Type       Group
---------------------------------------------------------------------------------------------------------
add1|clk         1.0 MHz     143.8 MHz   1000.000    6.952       993.048    inferred   Inferred_clkgroup_0
=========================================================================================================

3.***************************************************************************************************
Mapping to part: xc4vlx15sf363-10
Cell usage:
FD        134 uses
GND       1 use
MULT_AND  2 uses
MUXCY     1 use
MUXCY_L   63 uses
VCC       1 use
XORCY     63 uses
LUT1      61 uses
LUT2      1 use
LUT3      3 uses
LUT4      2 uses

Global Clock Buffers: 1 of 32 (3%)

                 Requested   Estimated   Requested   Estimated              Clock      Clock
Starting Clock   Frequency   Frequency   Period      Period      Slack      Type       Group
---------------------------------------------------------------------------------------------------------
add2|clk         1.0 MHz     139.9 MHz   1000.000    7.146       992.854    inferred   Inferred_clkgroup_0

Can anyone please help me understand why I am getting this much difference in the results, and what the right approach to writing HDL is to get the most optimized result? Thanks in advance.

Reply to
subint

Top posting to avoid the 4 pages of original post, included at the end...

The synthesizers appear generally not to be smart enough to group the additions by size first. We can't ask them to do all the work, but it would have been nice to get better results without the extra nudge. The nudge? In Synplify it's called a syn_keep, and there's a similar attribute in XST (though I know not what it's called).

Specific knowledge of the hardware can be used to get the "best" results. Since we know 4-input LUTs are available in the Virtex-4 (larger in the Virtex-5), it would be "most" efficient to add in1 and in2 with LUTs, then add that result to in3 and have in4 be a carry-in to this last add.

While "temp" values are great for readability, there's no guaranteed behavior when those values are combinatorial with no directives attached; the synthesizer will look at the overall combinatorial "logic cone" feeding the output reg.

To get your "best" result, use a 3-bit temp variable

(* syn_keep=1 *) wire [2:0] temp = in1+in2; // Verilog2k attribute syntax

and add this to the 64-bit vector with a carry-in

out [...]
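The code line above is cut off in the archived post, so the exact expression John wrote is not known. As a rough sketch only, assuming the port and register names from the original add modules, the suggestion might look like the following. The module name add3 and the use of non-blocking assignments are choices made here for illustration, not something taken from the post.

```verilog
// Sketch only: the module name add3 is hypothetical, and the final
// expression is a guess at what the truncated post intended.
module add3(
  input             clk,
  input      [1:0]  a, b,
  input             d,
  input      [63:0] c,
  output reg [64:0] out
);

  reg [1:0]  in1, in2;
  reg        in4;
  reg [63:0] in3;

  // 2-bit + 2-bit is at most 6, so 3 bits hold the sum without overflow.
  // syn_keep asks Synplify to preserve this net rather than merge it
  // into the wider adder.
  (* syn_keep = 1 *) wire [2:0] temp = in1 + in2;

  always @(posedge clk) begin
    {in1, in2, in3, in4} <= {a, b, c, d};
    // One wide addition: in4 is a single bit, so a synthesizer can map
    // it onto the carry-in of the 64-bit carry chain.
    out <= in3 + temp + in4;
  end

endmodule
```

Whether in4 really ends up on the carry-in depends on the tool, so it is worth checking the cell usage report (MUXCY/XORCY counts) after synthesis.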


Reply to
John_H


Some speculation....

In the add2 architecture, I suspect that XST is implementing 8 (64 bit + "3" bit) adders and muxing out the result, where Synplicity may have figured out to mux out the 3-bit vector and then add it to the 64-bit input. This would fit with the much larger footprint given by XST for add2. BTW, I own stock in Synplicity ;,)

It may be that the width of temp and temp2 in add1 is adding noise to the statistics. The statistics after Map in the P&R tools may give better numbers and give apples-to-apples comparisons between the synthesizers.

-Newman

Reply to
Newman



Subin, you are thinking out loud now. Keep it up and never let it go; only this way can you learn new things. Anyway, about your doubt: we humans can see that what you are doing in the algorithm is the same in all three cases, but the machines cannot. Your first two cases are at least comparable, and we need to think about why they differ. As John suggested, it may be due to the blocking assignments. In the first case the synthesizer can view it as four adds and can pump all the available optimization into it. In the second case we are forcing the synthesizer to treat it as three separate add operations, and it may then be unable to apply all of its optimization techniques. As John suggested, try using non-blocking assignments; that may free the synthesizer to treat the problem as a single operation.

But the third one is simply a different thing. It is actually a decoder: it decodes the {in4,in2,in1} variable, then some logic forms the operand to add to in3, plus of course an adder to add that operand to in3. I think you will get the same result with the following code:

case ({in4,in2,in1})
  op = value based on the different cases
  ...
endcase
result = in3 + op;
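A sketch of the decode-then-add idea described above, written out in full. It assumes the same ports and registers as add2 and keeps that module's select order {in4,in1,in2}; the module name add2_decode and the 3-bit op register are hypothetical. The case items are grouped by the value they select, which matches the table in add2 (the value is simply in4+in1+in2). Non-blocking assignments are used in the clocked block as a choice made here. Whether XST then produces a smaller result is not something the thread confirms.

```verilog
// Hypothetical rewrite of add2: decode a 3-bit operand, then add once.
module add2_decode(
  input             clk,
  input             d,
  input      [1:0]  a, b,
  input      [63:0] c,
  output reg [64:0] out
);

  reg [1:0]  in1, in2;
  reg        in4;
  reg [63:0] in3;
  reg [2:0]  op;       // decoded operand, 0..7
  reg [64:0] result;

  always @(posedge clk) begin
    {in1, in2, in3, in4} <= {a, b, c, d};
    out <= result;
  end

  always @(*) begin
    // Select bits are written as in4_in1_in2 for readability.
    case ({in4, in1, in2})
      5'b0_00_00:
        op = 3'd0;
      5'b0_00_01, 5'b0_01_00, 5'b1_00_00:
        op = 3'd1;
      5'b0_00_10, 5'b0_01_01, 5'b0_10_00, 5'b1_00_01, 5'b1_01_00:
        op = 3'd2;
      5'b0_00_11, 5'b0_01_10, 5'b0_10_01, 5'b0_11_00,
      5'b1_00_10, 5'b1_01_01, 5'b1_10_00:
        op = 3'd3;
      5'b0_01_11, 5'b0_10_10, 5'b0_11_01,
      5'b1_00_11, 5'b1_01_10, 5'b1_10_01, 5'b1_11_00:
        op = 3'd4;
      5'b0_10_11, 5'b0_11_10, 5'b1_01_11, 5'b1_10_10, 5'b1_11_01:
        op = 3'd5;
      5'b0_11_11, 5'b1_10_11, 5'b1_11_10:
        op = 3'd6;
      default:            // only 5'b1_11_11 remains
        op = 3'd7;
    endcase

    // A single 64-bit addition instead of one addition per case item.
    result = in3 + op;
  end

endmodule
```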

Reply to
vssumesh

result = in3 + op;

It looks like Synplicity thinks methods 1 and 2 are identical. I would think that a variation of method 2 should be able to get skinnied down a bit in the area of XORCY, LUT1, and so on. I was wondering if the map routine during the implementation phase would trim some of these out. Interesting academic exercise. For practical purposes, I think John had it about right for how to group things together. It would be interesting to see what the resource usage would be for CoreGen components structurally connected together, as stated by John.

-Newman

Reply to
Newman

Subin, I did not notice your last question: what should be the real approach to writing HDL to get the most optimized result? How can one suggest a general method or guideline for coding? I think we can split it into two separate classes.

1) General functionality, such as addition, multiplication, muxing, etc. I think here we should code it as directly as you did in the first example. I hope all the synthesizers have algorithms to deal with that, so there is no point in creating something like the 2nd or 3rd coding style, which does not look good in the HDL itself.

2) Unconventional functionality, like what we implemented in the source formatin switching logic. We know what to do, but no machine can translate it directly into optimized hardware. So we also think about it, find a way to implement it, and code it that way.

I think that when we are coding something, we need to differentiate between these two classes.
Reply to
vssumesh

Thanks for all of the replies. John, you are completely right: it was the grouping of the adders that made the difference. But how? Why is out=a+b+c+d not equal to out = (a+b+c)+d;

Yes, I intentionally added those input registers. This is the method I follow to generate the worst path using the Synplify tool. Blocking versus non-blocking assignments makes no difference in any of my three codes. But with the "syn_keep" you suggested in the second one, I am getting the "best" result in both Synplify and ISE. Just changing the temp size to 3 was enough to produce the "best" result in ISE (without the KEEP constraints).


Reply to
subint

Hi Sumesh, the second method is giving me the "best" result. Grouping the small adders together and adding that sum to the bigger one actually reduces the hardware. But I am surprised at how it is implemented without the grouping.

Regards,
subint

Reply to
subint

Are you getting the same result after PAR as well? Let me think. OK, put it this way.

64 bit + 1 bit needs one 64-bit carry propagation network. Adding 1 bit to that result needs one more 64-bit carry propagation network, and so on. But suppose instead: 1 bit + 1 bit needs only a two-bit carry propagation network, and 64 bit + 2 bit needs a 63-bit carry propagation network. So the second one is more efficient. I am neglecting the small additions, since they are only 1- or 2-bit additions.

Anyway, I had not appreciated the power of this grouping, nor had I read John_H's first reply carefully. Sorry.

So I think that in a single chained addition the evaluation is from left to right. Your case (in1+in2+in3+in4) becomes ((2 bit + 2 bit) + 64 bit) + 1 bit, which needs two 64-bit carry chains. Try changing the order to in1 + in2 + in4 + in3; and don't forget to test in3 + in4 + in2 + in1; as well. I think that one will give the maximum value. Please test it and let me know; as you know, it has been more than two months since I last touched ISE. One more thing: where are you using these concepts?
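To make the left-to-right reading above concrete, here is a small sketch with the implied parentheses written out. The module and port names (add_order, as_written, small_first) are made up for illustration. Both outputs compute the same value; whether a given synthesizer builds different hardware for them is exactly what this thread is testing, so treat the comments as the poster's reasoning rather than a guaranteed result.

```verilog
// Hypothetical module for illustration; names are not from the thread.
module add_order(
  input  [1:0]  in1, in2,
  input         in4,
  input  [63:0] in3,
  output [64:0] as_written,
  output [64:0] small_first
);

  // in1+in2+in3+in4 associates left to right, so the 64-bit operand
  // enters at the second addition and in4 is added to a wide result.
  assign as_written  = ((in1 + in2) + in3) + in4;

  // Reordered as suggested: the narrow operands are summed first and
  // the 64-bit operand enters only at the last addition.
  assign small_first = ((in1 + in2) + in4) + in3;

endmodule
```

Comparing the LUT and carry-chain cell counts reported for each output is one way to see whether the tool honours the written order or re-optimizes the whole logic cone.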
Reply to
vssumesh

You appear to be responding to posts and not asking questions so the ability for others to read your message is less important, perhaps.

Do you want others to actually read what you say? I read through your previous post and had sincere trouble following along due to the abbreviations and lack of sentence/paragraph structure. Since you're not asking a question here, I'm just not reading the post.

If you don't care if your messages are read, you don't need to do anything at all. If you'd like to be part of the grand conversation, you will get more people to see what you're saying if you stick to a good written style. Scanning this message that I didn't read, it looks like there are fewer texting-style abbreviations. Great start. Avoid all the dots in your thoughts trailing off and instead use solid sentence structure and formatted paragraphs and your message will be inherently more readable.

I appreciate the interaction from most of the folks on this board (I only have one author on my kill list at this point - so much nicer that way) and would like to see the conversation open and not ignored.

I'm just making a recommendation here, no demands. Your posts today are simply the most difficult to read in the last several months.

Otherwise, thanks for the contributions.

Reply to
John_H

The grouping was once (back in the early 90s, at least by some tools) specifically order-dependent. Since the language became a standard and more synthesizers got better optimizations, the order of operations and the implied grouping with parentheses no longer make the impact on synthesis one might hope in trying to optimize the code.

Since the synthesizers believe they can do a better job by looking at the entire logic cone, the synthesis results *should* be the same independent of order. The arithmetic elements are one example where it appears the synthesizer optimizations are "a little behind" where we'd want them to be. I *often* have syn_keep attributes around the adders in my code to make sure the proper "minimum" amount of logic goes into my adder and the register-to-register flow doesn't get broken up improperly.

Because the synthesis is based on the logic cone and not the way the equations are grouped, the use of parentheses or additional temp wires will often affect the result little. Some, but little.

Knowing the silicon and checking slow or "large" simulation results (with the timing analysis and area reports, respectively) you can often find where the synthesis "goes wrong" and focus on the syn_keeps to rein the synthesizer back in. Often if you don't hit a time or area problem, these sub-optimal results are fine: the results are the same with just a little more power wasted in a part that still isn't full.

I like to optimize things so I can reuse my code later in different target systems and still get decent performance so I see a bunch of the missteps from synthesis. Luckily I haven't spent much time outside the one synthesizer so I haven't had to deal with the different moods of different tools.

Some days it can feel like a challenge, but who are we to step down from a challenge?

Reply to
John_H

Hi John, sorry for the trouble. I also wanted to be part of this discussion, as I have been part of this group for the last two years. This group has helped me learn a lot about FPGA-based hardware modeling. Sorry for the badly formatted message I sent. My sentence structure may sometimes be wrong, as I am not that strong in English. What happened last time is that I was a little excited to see my old colleague writing something on this group. It reminded me of the old times when we both actively participated in these discussions. That is what happened; sorry for that. Anyway, my only intention is to be a part of these discussions. Sorry for everything else, and thanks for your suggestion.

Regards,
Sumesh V S

Reply to
vssumesh

And *thank you* for becoming an active part of this board.

- John_H

Reply to
John_H
