Multiple addition(2)

Regarding my original message about multiple additions, I forgot to add that I'm currently using a Virtex-II Pro 2vp100 device.

Thanks

Reply to
blackduck


Hi BlackDuck,

Is pipelining an option?

Always adding two 32- to 38-bit numbers (depending on the stage within your pipeline) leads to 7 pipeline stages to add 256 numbers!

At 100 MHz, one result on every clock with an initial latency of 7 clocks should be possible in a Virtex-II Pro...

Cheers Jochen

Reply to
Jochen

Thanks Jochen,

Well, I can use the arrangement you propose to speed up the process, which solves part of the problem. As for the pipeline, I am not sure it will work, since the design uses 250 filters, which produce 250 different values that have to be added to generate a single value. As you said, they can be added two at a time in 7 stages to produce the final sum, but since I know very little about pipelining I cannot imagine how to implement it. Could you explain a little what the implementation would look like, please?

Reply to
blackduck

Well, you build an adder that adds two numbers, and you put 125 of these adders in parallel to add your 250 numbers. At the next level you use 63 adders to add the 125 results of the first level, and you carry on like that until only one result is left. So in the end you will have 125 + 63 + 32 + 16 + 8 + 4 + 2 + 1 = 251 adders. Don't forget that the output of each level is 1 bit wider than its input.
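For anyone wondering what such a tree could look like in VHDL, here is a minimal sketch of the structure described above. It is not a tested design. It assumes 250 unsigned 32-bit operands packed into one flat vector (the width is only a guess based on the 32-bit figure mentioned earlier in the thread), pads the operand list with zeros up to 256, and registers every level. The entity name `adder_tree`, the generics and the port names are all made up for illustration. To keep the code short, every level uses the full output width instead of growing by one bit per level.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder_tree is
  generic (
    N_IN   : positive := 250;  -- number of operands
    W_IN   : positive := 32;   -- operand width (assumed, not from the OP)
    LEVELS : positive := 8     -- tree depth; must satisfy 2**LEVELS >= N_IN
  );
  port (
    clk     : in  std_logic;
    data_in : in  std_logic_vector(N_IN*W_IN - 1 downto 0);
    sum_out : out std_logic_vector(W_IN + LEVELS - 1 downto 0)
  );
end entity adder_tree;

architecture rtl of adder_tree is
  constant N_PAD : positive := 2**LEVELS;     -- operand count after padding
  constant W_OUT : positive := W_IN + LEVELS; -- one extra bit per level
  subtype word_t is unsigned(W_OUT - 1 downto 0);
  type level_t is array (0 to N_PAD - 1) of word_t;
  type tree_t  is array (0 to LEVELS) of level_t;
  signal tree : tree_t := (others => (others => (others => '0')));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Level 0: register the operands, zero-extended to the output width.
      for i in 0 to N_IN - 1 loop
        tree(0)(i) <= resize(unsigned(data_in((i + 1)*W_IN - 1 downto i*W_IN)), W_OUT);
      end loop;
      -- Padding entries are held at zero so they do not change the sum.
      for i in N_IN to N_PAD - 1 loop
        tree(0)(i) <= (others => '0');
      end loop;
      -- Levels 1..LEVELS: each entry is the registered sum of two entries
      -- from the previous level, so every level is one pipeline stage.
      for lvl in 1 to LEVELS loop
        for i in 0 to N_PAD / 2**lvl - 1 loop
          tree(lvl)(i) <= tree(lvl - 1)(2*i) + tree(lvl - 1)(2*i + 1);
        end loop;
      end loop;
    end if;
  end process;

  sum_out <= std_logic_vector(tree(LEVELS)(0));
end architecture rtl;
```

With these defaults the result appears LEVELS + 1 = 9 clocks after the operands are presented (one input register plus eight adder levels), and a new set of operands can be accepted on every clock. The adders that only ever see padding zeros, and the unused upper bits in the early levels, are constants that a synthesis tool would normally be expected to trim, but that is worth checking in the synthesis report.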

But another thing: your inputs don't all arrive in the same clock cycle (or does the Virtex-II Pro have 8000 input pins?), so you could build an accumulator. Imagine that one data value arrives on each clock cycle: you add each new value to the accumulator until you have added all 250 values (= 250 clock cycles).

"blackduck" wrote in message news: snipped-for-privacy@l41g2000cwc.googlegroups.com...

Reply to
KCL
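The accumulator suggested in the post above could look roughly like the following. This is only a sketch under the assumption that one unsigned 32-bit value arrives per clock, flagged by a `data_valid` strobe. The entity name `accumulate_250`, the strobe signals and the widths are hypothetical and not taken from the OP's design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity accumulate_250 is
  generic (
    N_IN  : positive := 250;  -- number of values per sum
    W_IN  : positive := 32;   -- input width (assumed)
    W_ACC : positive := 40    -- W_IN + 8 bits of growth for 250 values
  );
  port (
    clk        : in  std_logic;
    rst        : in  std_logic;  -- synchronous reset
    data_in    : in  std_logic_vector(W_IN - 1 downto 0);
    data_valid : in  std_logic;  -- high when data_in carries a new value
    sum_out    : out std_logic_vector(W_ACC - 1 downto 0) := (others => '0');
    sum_valid  : out std_logic := '0'  -- one-clock pulse per finished sum
  );
end entity accumulate_250;

architecture rtl of accumulate_250 is
  signal acc   : unsigned(W_ACC - 1 downto 0) := (others => '0');
  signal count : natural range 0 to N_IN - 1 := 0;
begin
  process (clk)
    variable next_acc : unsigned(W_ACC - 1 downto 0);
  begin
    if rising_edge(clk) then
      sum_valid <= '0';
      if rst = '1' then
        acc   <= (others => '0');
        count <= 0;
      elsif data_valid = '1' then
        next_acc := acc + resize(unsigned(data_in), W_ACC);
        if count = N_IN - 1 then
          -- Last value of the block: publish the sum and start over.
          sum_out   <= std_logic_vector(next_acc);
          sum_valid <= '1';
          acc       <= (others => '0');
          count     <= 0;
        else
          acc   <= next_acc;
          count <= count + 1;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

This needs a single adder but delivers one sum every 250 valid clocks, so it only fits if the data really does arrive serially.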

Might be worth looking at the filters as well - what are they? If they're an FIR filter array then is it possible to do some of the additions in the MAC array?

Like you I suspect that there is some data folding somewhere - at the moment this device has an 80 Giga samples/second input - should be able to make the coffee as well...

Reply to
Stephen Maudsley

Hi KCL,

The design receives only 250 bits as inputs, which are the incoming impulses to the 250 lowpass filters. Each filter produces an output, and these outputs have to be added to get the total response. The total response is then sent to a comparator, which indicates whether it is greater than some value. So in the first clock cycle the comparator has zero as its input from the filters; in the second clock cycle the comparator receives the total response from the filters and compares it against a value; and this process should be repeated every clock cycle. The device is able to run at 300 MHz, but computing the total response is of course slower than this (plus internal delays), so I am trying to make this process as fast as possible. The design has 250 bits as inputs and a 1-bit output.

Sorry if it initially sounded like an 8000-input design.

Thanks

Reply to
blackduck

"Stephen Maudsley" wrote in message news: snipped-for-privacy@assayer.co.uk...


Or maybe it's just a miscommunication. Maybe the OP just needs an FIR with 250 taps running at 300 MHz. OK, 300 MHz is quite fast, but an FIR can be nicely pipelined, so that each MAC unit just needs to add 2 numbers within 1 clock cycle.
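A minimal sketch of that pipelined FIR idea, written in the transposed form where every tap keeps a registered partial sum and adds just one product to its neighbour's partial sum per clock. It assumes a single FIR with signed samples and constant coefficients, which may well not match the OP's design. The entity name `fir_transposed`, the widths and the all-ones placeholder coefficients are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_transposed is
  generic (
    N_TAPS : positive := 250;
    W_IN   : positive := 16;  -- sample width (assumed)
    W_COEF : positive := 16;  -- coefficient width (assumed)
    W_ACC  : positive := 40   -- must be >= W_IN + W_COEF + 8 for 250 taps
  );
  port (
    clk   : in  std_logic;
    x_in  : in  std_logic_vector(W_IN - 1 downto 0);
    y_out : out std_logic_vector(W_ACC - 1 downto 0)
  );
end entity fir_transposed;

architecture rtl of fir_transposed is
  type coef_t is array (0 to N_TAPS - 1) of signed(W_COEF - 1 downto 0);
  type acc_t  is array (0 to N_TAPS - 1) of signed(W_ACC - 1 downto 0);
  -- Placeholder coefficients (all ones); replace with the real filter taps.
  constant COEF : coef_t := (others => to_signed(1, W_COEF));
  signal acc : acc_t := (others => (others => '0'));
begin
  process (clk)
    variable prod : signed(W_IN + W_COEF - 1 downto 0);
  begin
    if rising_edge(clk) then
      for k in 0 to N_TAPS - 1 loop
        prod := signed(x_in) * COEF(k);
        if k = N_TAPS - 1 then
          -- The last tap has no neighbour to accumulate from.
          acc(k) <= resize(prod, W_ACC);
        else
          -- One multiply and one two-operand add per tap per clock.
          acc(k) <= resize(prod, W_ACC) + acc(k + 1);
        end if;
      end loop;
    end if;
  end process;

  y_out <= std_logic_vector(acc(0));
end architecture rtl;
```

Each register `acc(k)` sits between two adders, so the critical path is one multiplier plus one adder no matter how many taps there are. The price is that the input `x_in` fans out to all 250 multipliers, which may itself need registering or duplication at high clock rates.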

Regards Falk

Reply to
Falk Brunner

(snip)

The source wasn't mentioned, but it makes sense that it would be an FIR filter. If there are already 250 multipliers then the adders shouldn't be much more work. A 250 stage pipeline should be simpler than the pipelined adder tree to add 250 numbers.

Though the first use of carry-save adders I knew about was pipelined multipliers. It is possible that they are the outputs from the multiplier before they are added together.

Also, it may be that 250 cycle latency is too long.

More information about the system would help.

-- glen

Reply to
glen herrmannsfeldt

So in fact you want to know the number of bits that are at level '1'? If that is what you want, try:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Uncomment the following lines to use the declarations that are
-- provided for instantiating Xilinx primitive components.
--library UNISIM;
--use UNISIM.VComponents.all;

entity essai is
  Port ( clk      : in  std_logic;
         rst      : in  std_logic;
         data_in  : in  std_logic_vector(249 downto 0);
         data_out : out std_logic_vector(7 downto 0) );
end essai;

ARCHITECTURE Behavioral OF essai IS

  signal data_in_reg   : unsigned(255 downto 0);
  signal result_level1 : unsigned(255 downto 0);
  signal result_level2 : unsigned(191 downto 0);
  signal result_level3 : unsigned(127 downto 0);
  signal result_level4 : unsigned(79 downto 0);
  signal result_level5 : unsigned(47 downto 0);
  signal result_level6 : unsigned(27 downto 0);
  signal result_level7 : unsigned(15 downto 0);

  signal resultat : std_logic_vector(8 downto 0);

begin

  process(clk)
  begin
    if rising_edge(clk) then
      data_in_reg [...]
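Since the rest of the listing is cut off, here is a separate and simpler sketch of the same idea. It is not a reconstruction of the code above. It counts the '1' bits of a 250-bit input in two registered stages (partial counts over groups of 25 bits, then the sum of the partial counts) and compares the total against a threshold in a third stage. The entity name `popcount_cmp`, the group size and the `THRESH` value are assumptions for illustration, and the 8-bit counters assume at most 255 input bits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity popcount_cmp is
  generic (
    N_BITS : positive := 250;  -- number of 1-bit inputs (must be <= 255)
    GROUP  : positive := 25;   -- bits counted per first-stage group (assumed)
    THRESH : natural  := 125   -- comparator threshold (hypothetical value)
  );
  port (
    clk       : in  std_logic;
    data_in   : in  std_logic_vector(N_BITS - 1 downto 0);
    count_out : out std_logic_vector(7 downto 0);
    above     : out std_logic  -- '1' when the count exceeds THRESH
  );
end entity popcount_cmp;

architecture rtl of popcount_cmp is
  constant N_GROUPS : positive := (N_BITS + GROUP - 1) / GROUP;
  type partial_t is array (0 to N_GROUPS - 1) of unsigned(7 downto 0);
  signal partial : partial_t := (others => (others => '0'));
  signal total   : unsigned(7 downto 0) := (others => '0');
begin
  process (clk)
    variable cnt : unsigned(7 downto 0);
    variable sum : unsigned(7 downto 0);
  begin
    if rising_edge(clk) then
      -- Stage 1: count the ones inside each group of GROUP bits.
      for g in 0 to N_GROUPS - 1 loop
        cnt := (others => '0');
        for b in g*GROUP to g*GROUP + GROUP - 1 loop
          if b < N_BITS then
            if data_in(b) = '1' then
              cnt := cnt + 1;
            end if;
          end if;
        end loop;
        partial(g) <= cnt;
      end loop;

      -- Stage 2: add the registered partial counts from the previous clock.
      sum := (others => '0');
      for g in 0 to N_GROUPS - 1 loop
        sum := sum + partial(g);
      end loop;
      total <= sum;

      -- Stage 3: compare the registered total against the threshold.
      if total > THRESH then
        above <= '1';
      else
        above <= '0';
      end if;
    end if;
  end process;

  count_out <= std_logic_vector(total);
end architecture rtl;
```

`count_out` lags the input by two clocks and `above` by three, with a new result on every clock. Whether two counting stages are enough to meet timing at the target clock rate would have to be checked after place and route; if not, the groups can be made smaller or the second stage split into a short adder tree.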

Reply to
KCL

You bypass one of the results from the 125 adders to the 32 adders directly. If you have 63 adders at that stage, one of them will do a pointless "add to zero" operation.
--
	Sander

+++ Out of cheese error +++
Reply to
Sander Vesik
