Counting ones

Peter's: solution for 48 bits: 2 BlockRAMs, 16 LUTs BlockRAM tcko + 2 levels carry chain John's solution for 30 bits: Resources: 46 LUTs (giving him 1 LUT per CYAdd bit) Delay: 2 levels LUTs + 3 levels carry chain John_H's solution for 31 bits (mine, at end): Resources: 41 LUTs Delay: 3 levels LUTs + 2 levels carry chain

See original post for John's earlier comments as well as my own. In John's form of ASCII art is the 3-input LUT form of a 31 bit adder that won't work for 32 bits wihtout adding a bit of extra junk but is a more common need.

- John_H

___

0-| F | ___ 0-| A |-1-----1-| |-2-----+ 0-|_3_|-0-----0-| * |-1---+ | ___ +---1-| |-0-+ | | 0-| F | | +-0-|___| | | | 0-| A |-1-+ | | | | | 0-|_3_|-0---+ 0 | | | ___ 0-----------------+ | | +-2-| C | ___ | +---1-| Y |-3-------+ 0-| F | ___ +-----0-| A |-2-----+ | 0-| A |-1-----1-| |-2-------2-| d |-1---+ | | 0-|_3_|-0-----0-| * |-1-------1-| d |-0-+ | | | ___ +---1-| |-0-------0-|___| | | | | 0-| F | | +-0-|___| | | | | | 0-| A |-1-+ | | 0 | | | | 0-|_3_|-0---+ 0 | | | | | 0-----------------+ | | | | | 0---------------------------------+ | | | | ___ | | | | 0-| F | ___ | | | | 0-| A |-1-----1-| |-2-----+ | | | | 0-|_3_|-0-----0-| * |-1---+ | | | | | ___ +---1-| |-0-+ | | | | | | ___ 0-| F | | +-0-|___| | | | | | | +-3-| | 0-| A |-1-+ | | | | | | | +---2-| C |-4 0-|_3_|-0---+ 0 | | | ___ | +-----1-| Y |-3 0-----------------+ | | +-2-| C | +-------0-| A |-2 ___ | +---1-| Y |-3---------3-| d |-1 0-| F | ___ +-----0-| A |-2---------2-| d |-0 0-| A |-1-----1-| |-2-------2-| d |-1---------1-| | 0-|_3_|-0-----0-| * |-1-------1-| d |-0---------0-|___| ___ +---1-| |-0-------0-|___| | 0-| F | | +-0-|___| | 0 0-| A |-1-+ | | 0 | 0-|_3_|-0---+ 0 | | 0-----------------+ | | 0---------------------------------+ | 0---------------------------------------------------+

LUT count: 16 12 8 5 = 41

  • The 2bit+2bit+1bit LUTS (0-7 result) are implemented as 3 levelsof logic implemented with 3-input LUTs.
Reply to
John_H
Loading thread data ...

...oops. Mine was 45 LUTs. 16 luts in the second stage, not 12. Delays stand as indicated

Reply to
John_H

Reply to
Peter Alfke

Correction/amendment to (30,5) counter (6 CLBs): ___

0-| F | 0-| A |-1-------+ 0-|_3_|-0-----+ | ___ | | ___ 0-| F | | +-1-| F | 0-| A |-1-----|---1-| A |-2-----------+ 0-|_3_|-0-+ +-|---1-|_3_|-1---------+ | ___ | | | ___ | | ___ 0-| F | +-|-|---0-| F | | +--2-| | 0-| A |-1---+ +---0-| A |-1--+ +----1-| C | 0-|_3_|-0---------0-|_3_|-0--|-----------0-| Y | | ___ | A | +--1-| F | | d |-3----+ +---1-| A |--2-| d |-2---+| | +-1-|_3_|--1-| |-1--+|| | | +-0-|___|-0-+||| | | '0'--+ | |||| | | 0 |||| 0---------------------------|-|--------------+ |||| ___ | | |||| 0-| F | | | |||| 0-| A |-1-------+ | | |||| 0-|_3_|-0-----+ | | | |||| ___ | | ___ | | |||| 0-| F | | +-1-| F | | | |||| 0-| A |-1-----|---1-| A |-2-|-|-------+ |||| ___ 0-|_3_|-0-+ +-|---1-|_3_|-1-+ | | |||| | | ___ | | | ___ | | |||+-3-| | 0-| F | +-|-|---0-| F | | | ||+--2-| C | 0-| A |-1---+ +---0-| A |-1---|-----+ | |+---1-| Y |-4 0-|_3_|-0---------0-|_3_|-0---|---+ | | +----0-| A |-3 ___ | | | | | d |-2 0-| F | | | | | | d |-1 0-| A |-1-------+ | | | | +----3-| |-0 0-|_3_|-0-----+ | | | | | |+---2-| | ___ | | ___ | | | | ||+--1-| | 0-| F | | +-1-| F | | | | | ___ |||+-0-|___| 0-| A |-1-----|---1-| A |-2---|-+ | | +--2-| C |-3-+||| | 0-|_3_|-0-+ +-|---1-|_3_|-1---+ | | +----1-| Y |-2--+|| 0 ___ | | | ___ | +------0-| A |-1---+| | 0-| F | +-|-|---0-| F | +--------2-| d |-0----+ | 0-| A |-1---+ +---0-| A |-1--------------1-| d | | 0-|_3_|-0---------0-|_3_|-0--------------0-|___| | | | 0 | 0--------------------------------------------+ | 0-----------------------------------------------------------+

LUT Count: 18 + 12 + 2 + 10 + 6 = 48

48 LUTs = 6 CLBs Pipeline to taste...
Reply to
John

My final answer (until someone asks an interesting variation on the question...) OP asked about VHDL implementation, John_H and I gave graphs, so the code below is illustration of how to build the graphs in VHDL, for both the other John's tree (simple tree), and mine (modified tree), for 30 bits. Some observations on synthesis results using XST:

LUT utilization (from map report)... Simple - 44 LUTs Modified - 43 LUTs

This is a trivial difference. Because the code for the simple tree is, well, simpler, it is preferred. Smart folk should be able to genericize it so that the tree structure is built automatically for arbitrary input size, by a synthesis tool that accepts unconstrained bus inputs and outputs. (Does XST do this yet?)

Timing (from par report smallest Spartan3)... Pad to Pad delay... Simple - 17+ ns Modified - 17+ ns Clock period (with the FFs at the end commented back in, so as to take the pad delay out of the equation)... Simple - 10+ ns ( 6 levels ) Modified - 10+ ns ( 6 levels )

No pin assignment or other attempts to map logic used.

Timing can be pushed around a little by playing with map/par options, but not necessarily the way you'd expect (who called it 'pushing the rope'? Great metaphor!) The keep attribute, trying to force the tool, may backfire.

So there it is. I like this one, because it is such a neat example of how even the simplest of operations ( Counting ones!!?!?! ) may not be so simple. You need to be aware of what the compiler actually does with a behavioural model, and how to push it, and how not to push it.

Regards, John p.s. thanks to Dan C., local X-FAE, for helping me get this thing through XST. Like the OP, I'm not a big VHDL guy either (just don't like the wordiness/aesthetics of it), I am liking Handel-C more and more.

p.p.s the code (un-simulated, you get what you pay for):

library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity TestAddBits_Top is generic (UseFA3 : boolean := FALSE ); Port ( Clock: in std_logic; InBits : in std_logic_vector(29 downto 0); SumOut : out std_logic_vector(4 downto 0)); end TestAddBits_Top;

architecture VirtexII of TestAddBits_Top is subtype TwoBit is std_logic_vector(1 downto 0); subtype ThreeBit is std_logic_vector(2 downto 0); subtype FourBit is std_logic_vector(3 downto 0); type Leaf is array (0 to 8) of TwoBit; type Level1 is array (0 to 3) of ThreeBit; type Level2 is array (0 to 1) of FourBit; type Level1A is array (0 to 5) of TwoBit; signal LeafSum: Leaf ; signal L1Sum: Level1; signal L2Sum: Level2; signal L1SumA: Level1A; signal FA3: std_logic_vector(1 downto 0); signal SumBits: std_logic_vector(4 downto 0); signal Bits: std_logic_vector(29 downto 0);

--attribute keep : string;

--attribute keep of LeafSum: signal is "true";

--attribute keep of L1Sum: signal is "true";

--attribute keep of L2Sum: signal is "true"; begin

-- Simple adder tree to count 30 bits. Uses 44 LUTs T1: if ( UseFA3 = FALSE ) generate addleaf: for I in 0 to 7 generate -- Add 8 leafs LeafSum(I) Delay: 2 levels LUTs + 3 levels carry chain

Reply to
John

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.