accumulator (again)

- jmariano

Mon, Jul 2, 2012 11:20 PM

Dear All,

I'm not an expert in VHDL; I'm just a curious person trying to solve a research problem with an FPGA.

I'm using a 32-bit accumulator in an IP, as part of a SoC project with a MicroBlaze, implemented on a Digilent Spartan-3 SKB (the FPGA is a Xilinx XC3S200). The code is included at the end of this message. The input is a 32-bit signed integer coded in two's complement and the output is also a 32-bit signed integer. What I would like the accumulator to do is to accumulate synchronously with the rising edge of clk when enb=1 and hold the result stable at the output when enb=0 (enb is an asynchronous signal generated elsewhere in the system).

But it does not work in this way, it behaves in a strange manner...

Sometimes I get the expected results, but often I get strange values (large when they should be small, often negative instead of positive, etc.). If I look at the binary representation of the output, it looks as if the output didn't have time to sum and propagate to the output again. In fact, the post place and route simulation shows that when the enb signal goes to 0, the output stays in an undetermined condition (you know, red line with XXXX).

I'm guessing I'm making a very basic mistake that has something to do with the timing of the enb signal, but after 3 days banging my head against the wall, all I have is a monumental headache.

Can some kind soul help me with this?

jmariano

================

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity int_accum is
  port (clk : in  std_logic;
        clr : in  std_logic;
        enb : in  std_logic;
        d   : in  std_logic_vector(31 downto 0);
        ovf : out std_logic;  -- overflow
        q   : out std_logic_vector(31 downto 0));
end int_accum;

architecture archi of int_accum is

  signal tmp : signed(32 downto 0);

begin

  process(clk, clr)
  begin
    if (clr = '1') then
      tmp <= (others => '0');
    elsif (rising_edge(clk)) then
      if (enb = '1') then
        -- The result of the adder will be on 33 bits to keep the carry
        tmp

- Ed McGettigan

Tue, Jul 3, 2012 12:19 AM

This is the key to your problem:

You can't take an asynchronous signal into multiple (32 in this case) registers in a synchronous domain and expect it to work reliably. You need to first synchronize the asynchronous input to the synchronous clock domain before you can use it.

Ed McGettigan

-- Xilinx Inc.

- Tim Wescott

Tue, Jul 3, 2012 5:24 AM

Which means that you should latch enb in a register, with the same clock that you're using to twiddle your accumulator, and use the output of that register as your enable signal.

Paranoid logic designers will have a string of two or three registers to avoid metastability, but I've been told that's not necessary. (I'm not much of a logic designer).

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com

- Brian Drummond

Tue, Jul 3, 2012 2:31 PM

You have one likely answer from Ed and Tim : unless you KNOW that the input signals "enb" and "d" are already synchronous with "clk" you MUST synchronise them.

But there is another problem:

tmp

- Ed McGettigan

Tue, Jul 3, 2012 9:45 PM

It isn't just the paranoid logic designer; it should be every logic designer.

A single register only partially solves the problem of an asynchronous input with multiple register destinations, but it does not solve the very real metastability problem. At least two registers should be used to ensure that the metastability condition has resolved, and with increasing clock frequency and finer process nodes, using three or more stages may be necessary.
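As a rough illustration of the two-register arrangement described here, a minimal synchronizer might look like the sketch below. The entity name `sync_2ff` and the port names `async_in` and `sync_out` are made up for this example and do not come from the original design; only `sync_out` would then be used as the enable for the accumulator registers.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage synchronizer for a single-bit asynchronous input.
entity sync_2ff is
  port (
    clk      : in  std_logic;  -- destination clock (the accumulator clock)
    async_in : in  std_logic;  -- asynchronous signal, e.g. enb
    sync_out : out std_logic   -- version of the signal that is safe to use in the clk domain
  );
end entity sync_2ff;

architecture rtl of sync_2ff is
  signal meta   : std_logic := '0';  -- first stage, may go metastable
  signal stable : std_logic := '0';  -- second stage, has had time to settle
begin
  process (clk)
  begin
    if rising_edge(clk) then
      meta   <= async_in;  -- only this one flip-flop sees the asynchronous edge
      stable <= meta;      -- the first stage gets most of a clock period to resolve
    end if;
  end process;

  sync_out <= stable;
end architecture rtl;
```

A third stage can be added the same way if more settling time is wanted. The two flip-flops should be placed close together so that routing delay does not eat into the settling time.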

Ed McGettigan

-- Xilinx Inc.

- rickman

Wed, Jul 4, 2012 7:49 PM

Hi Ed. The way it was explained to me, I believe by Peter Alfke, is that what really resolves metastability is the slack time in a register-to-register path. Over the years, FPGA process improvements have resulted in FFs which only need a couple of ns to resolve metastability to 1 in a million operation years or something like that (I don't remember the metric, but it was good enough for anything I do). It doesn't matter that you have logic in that path; you just need those few ns in every part of the path. In theory, even if you use multiple registers with no logic, what really matters is the slack time in the path, and that is not guaranteed even with no logic. So the design protocol should be to ensure that the paths from the input register to all subsequent registers have sufficient slack time.

Do you remember how much time that needs to be? I want to say 2 ns, but it might be more like 5 ns, I just can't recall. Of course it depends on your clock rates, but I believe Peter picked some more aggressive speeds like 100 MHz for his example.

Rick

- jmariano

Thu, Jul 5, 2012 11:44 AM

Dear All,

Thank you very much for your input and sorry for the late reply. It is really great to be able to get the opinion of such experts, especially since, at my current location and in a radius of some 200 km, I must be the only person working with FPGA and VHDL! I'm also glad that the discussion has evolved to levels of complexity far beyond my knowledge.

I was hoping that by now I would be able to say that the thing was working as expected but, unfortunately, no.

I've synchronized the enable signal, as suggested by Ed and Tim, using 3 FFs (I'm not paranoid, I just have room). Also, following Brian's suggestions, I've cleaned up the code regarding type conversions. All this has allowed me to isolate the remaining source of error, thank you very much.

Here's the full story: I'm implementing a gated integrator, as part of a boxcar averager. This is the standard noise reduction technique used in nuclear magnetic resonance (NMR). This is research, not a commercial product! The module gets its data from four 8-bit ADCs at 5 MHz (adc0, adc90, adc180, adc270) and accumulates while enb=1. enb is generated in a different module. The module does this:

1 - generates the acquisition clock (adc_clk) by division by 10 of the S3-SKB 50 MHz main clock
2 - generates the accumulation clock (acc_clk) by inverting adc_clk. In this way, there is a delay of 100 ns from the moment the ADCs receive the rising edge of the clock to the moment when the data gets registered at the output.
3 - converts the data from the ADCs to excess 128 (bipolar ADC) and extends to 32-bit signed
4 - calculates u = adc0-adc180 and v = adc90-adc270. u and v go through a switch and emerge as r and i, to be delivered to 2 identical accumulators.

Of course, 3 and 4 must occur in less than 100 ns.

The switch unit is very simple: It has a control signal, s[1:0] that comes from a different module, and the following table: 00 -> r=u, i=v; 01 -> r=v, i=-u; 10 -> r=-v, i=u; 11 -> r=-v, i=-u. The s signal is generated in a different clock domain and is stable 500 us before the enb. enb has a typical duration of 10 us. The code is at the end of this message.
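For reference, the table above can be written as a small clocked process, so that r and i only change on a clock edge. This is a sketch under assumptions, not the code from the post: the entity name `switch_sketch`, the single clock `clk`, and the choice to register the outputs are all made up for illustration, and the rows are copied exactly as listed in the table.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical registered version of the switch table.
entity switch_sketch is
  port (
    clk  : in  std_logic;
    s    : in  std_logic_vector(1 downto 0);  -- assumed already synchronized to clk
    u, v : in  signed(31 downto 0);
    r, i : out signed(31 downto 0)
  );
end entity switch_sketch;

architecture rtl of switch_sketch is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      case s is
        when "00"   => r <= u;   i <= v;
        when "01"   => r <= v;   i <= -u;
        when "10"   => r <= -v;  i <= u;
        when others => r <= -v;  i <= -u;  -- "11", as listed in the post
      end case;
    end if;
  end process;
end architecture rtl;
```

Because u and v are differences of 8-bit samples extended to 32 bits, the negations cannot overflow.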

I continue to get errors, especially when the input values are close to zero, which means that the result is changing from, say, FFFFFFFF to 00000001, so lots of bits change.

I have (I think!) traced the source of error to the switch_unit because, if I tie the s signal to a fixed value, 11 for example, the unit works well, but if I connect it to a real s signal, I get errors. So I thought this must be because the real s is noisy and r and i change during the acquisition period (1mm ns), so I have synchronized s with acc_clk, but the problem persists. What is more strange is that, if I do s

- rickman

Thu, Jul 5, 2012 5:12 PM

I'm not real clear on your description of your design, but if you are really generating clocks from the 50 MHz, I recommend that inside the FPGA you instead use a single clock and generate clock enables for the various functions. When you use multiple clocks in a circuit you have to do extra work for every signal that crosses a clock domain. Could that be your problem?
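A minimal sketch of the single-clock approach, assuming a 50 MHz system clock and a 5 MHz processing rate: instead of a derived clock, a counter produces a one-cycle enable pulse every ten clocks. The entity name `ce_div10` and its ports are invented for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical clock-enable generator: one pulse every 10 cycles of clk.
entity ce_div10 is
  port (
    clk : in  std_logic;  -- 50 MHz system clock
    rst : in  std_logic;  -- synchronous reset
    ce  : out std_logic   -- high for one clk cycle out of every ten
  );
end entity ce_div10;

architecture rtl of ce_div10 is
  signal count : integer range 0 to 9 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        count <= 0;
        ce    <= '0';
      elsif count = 9 then
        count <= 0;
        ce    <= '1';   -- enable pulse at the 5 MHz rate
      else
        count <= count + 1;
        ce    <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

Downstream logic then stays on clk and tests ce inside its rising_edge(clk) branch, so no signal ever crosses between clock domains inside the FPGA.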

I don't see anything in your original post about simulation. Do you simulate your modules? I highly recommend that you write a test bench for each and every module you code. You may think this takes too much time, but I believe it pays off in the end with shorter integration time.

Rick

- Ed McGettigan

Thu, Jul 5, 2012 6:03 PM

I'm glad to see that one of my 5-6 attempts to post was finally accepted by Google. I have got to switch to something else.

Peter Alfke's publications on metastability definitely fall into the seminal category, but you must be careful when extrapolating the original data to the latest technology nodes, circuits and design requirements. There are two major factors that impact the metastability equations: the tau, or metastability decay rate, and the settling time.

The tau value is an inherent characteristic of the circuit and technology node, and for a long time the expectation was that it would decrease with each generation, but this has stopped being true.

The settling time, Ts, is dependent on the design and is under the user's control. Ts is a function of the destination clock frequency and the timing slack between registers. If you have a 100 MHz clock frequency, but you use up 9.5 ns to get to the destination, your slack is only 500 ps. Adding register stages allows for maximum use of the clock period, increasing the settling time, and for each stage it increases again.

Ed McGettigan

-- Xilinx Inc.

- jmariano

Thu, Jul 5, 2012 6:04 PM

Hi Rick, thanks for your help.

Yes, I generate a 5 MHz clock inside the module from the main 50 MHz clock by simple division by 10 because I need a 5 MHz ADC clock. I can't use a clock enable because the AD9058 ADC does not have an enable input, just a clock.

What is the extra work? I have no idea! Synchronization?

Sorry about that, I did, in fact, simulate each module and the top entity. The behavioral simulation gives the expected results; the post place and route simulation gives the same errors that I could not understand, but I'll run the simulations again and post the results here.

jmariano

- Ed McGettigan

Thu, Jul 5, 2012 6:09 PM

The good news here is that you have a simulation that shows the same behavior as the hardware. Looking at these simulation runs should tell you exactly what the problem is. I don't think that anyone here will be able to do the same with the full source code for the design.

Ed McGettigan

-- Xilinx Inc.

- rickman

Thu, Jul 5, 2012 9:16 PM

The info I am referring to are posts that were made here and pertained to the "current" generation of some six or eight years ago. At that time Peter made the point that the "tau" as you call it, had gotten so fast that the impact was negligible for all but the most stringent designs and only a small amount of slack time is needed.

A quick search found these two posts about V2Pro devices. I assume your newer devices are at least as good as 10 year old technology. Note that Peter makes a point that the capture window T0, which is a product in the formula, is not an important parameter. Tau is in the exponent (in ratio with Tslack) in the formula and so makes a much larger contribution to the result. The same is true for the two clock frequencies: they are just products in the formula and so don't make huge changes to the MTBF.
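For readers who have not seen it written out, the formula being described is usually quoted in the following form. The symbol names here are a notational assumption chosen to match the wording above, not taken from the Xilinx posts.

```latex
\mathrm{MTBF} \;=\; \frac{e^{\,T_{\mathrm{slack}}/\tau}}{T_0 \, f_{\mathrm{clk}} \, f_{\mathrm{data}}}
```

Here $T_{\mathrm{slack}}$ is the settling time available before the next register samples, $\tau$ is the resolution time constant of the flip-flop, $T_0$ is the capture window, and $f_{\mathrm{clk}}$ and $f_{\mathrm{data}}$ are the sampling clock rate and the rate of asynchronous input transitions. Doubling $T_0$ or either frequency only halves the MTBF, while each additional $\tau$ of slack multiplies it by $e \approx 2.7$.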

So it seems like not much would have changed in 10 years in how a designer should deal with metastability. Leaving 2 ns of slack time in the first register to register path should make literally all designs extremely robust regardless of how many registers are receiving the first register output or if there is logic in the path. Just make sure there is 2 ns slack time and your designs should be good for many, many years!

Rick

====================================
Peter Alfke comp.arch.fpga Oct 10 2002, 8:40 pm

You mentioned metastability, and that caught my attention.

Metastability is a reality, but it (and the fear of it) is highly overrated. We recently tested Virtex-IIPro flip-flops, made on 130 nm technology. You might call that cutting edge technology, but not exotic. When a 330 MHz clock synchronized a ~50 MHz input, there was a 200 ps extra metastable delay ( causing a clock-to-out + short routing + set-up total of 1.5 ns) once every second. That translates into a metastable capture window that has a width of 3 ns divided by 100 million ( since we looked at both edges of the 50 MHz signal). So the window for a 200 ps extra delay is 0.03 femtoseconds. If you can tolerate 500 ps more, the MTBF increases 100 000 times, and the capture window gets that much smaller. Metastability is a real, but highly overrated problem.

Peter Alfke, Xilinx Applications
====================================

====================================
Peter Alfke comp.arch.fpga Oct 15 2002, 1:11 pm

Here are the K2 values for Virtex-IIPro:

CLB @1.50V: K2 = 27.2, i.e. 1/K2 = tau = 36.8 picoseconds
CLB @1.35V: K2 = 23.3, i.e. 1/K2 = tau = 42.9 picoseconds
CLB @1.65V: K2 = 35.7, i.e. 1/K2 = tau = 28.0 picoseconds

IOB @1.50V: K2 = 24.4, i.e. 1/K2 = tau = 41.0 picoseconds
IOB @1.35V: K2 = 19.24, i.e. 1/K2 = tau = 52.0 picoseconds
IOB @1.65V: K2 = 44.05, i.e. 1/K2 = tau = 22.7 picoseconds

For each extra 100 ps of acceptable metastable delay, the MTBF increases by a factor 10.3 for CLB @ 1.35 V, or a factor 6.85 for IOB @ 1.35 V. Much better values, of course, at nominal or high Vcc.
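These factors follow from the tau values listed above, assuming the usual exponential dependence of MTBF on the extra settling time $\Delta t$ (the factor is $e^{\Delta t/\tau}$). The arithmetic below is a check added for illustration, not part of Peter's post.

```latex
\text{CLB @ 1.35 V:}\quad e^{100\,\mathrm{ps}/42.9\,\mathrm{ps}} = e^{2.331} \approx 10.3
\qquad
\text{IOB @ 1.35 V:}\quad e^{100\,\mathrm{ps}/52.0\,\mathrm{ps}} = e^{1.923} \approx 6.84
```

Both agree with the quoted factors of 10.3 and 6.85 to within rounding of tau.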

Klick on

formatting link

in early November.

Here is the worst-case data point:

50 MHz asynchronous data rate, 330 MHz clock, single-stage synchronizer in IOB, Vcc = 1.35 V: clock-to-Q + short routing + set-up time + metastable delay exceeds clock period once per 30,000 years.

At nominal Vcc: once per 100 million years.

At a 250 MHz clock rate, delay exceeds clock period less often than once per billion years.

Peter Alfke, Xilinx Applications
====================================

- langwadt

Thu, Jul 5, 2012 9:48 PM

you could just have a state machine running at 50MHz that grabs data and sets/clears the clock

which I guess is partly what you have in your divide by 10
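One possible shape for such a state machine is sketched below, assuming one 8-bit ADC channel and a 50 MHz system clock. The entity name `adc_capture`, the port names, and the choice of capture phase are assumptions made for illustration; the actual capture point has to come from the ADC data sheet timing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 10-phase sequencer: drives the ADC clock from a register
-- and captures the ADC data on the 50 MHz clock.
entity adc_capture is
  port (
    clk      : in  std_logic;                     -- 50 MHz system clock
    adc_data : in  std_logic_vector(7 downto 0);  -- raw ADC output pins
    adc_clk  : out std_logic;                     -- 5 MHz clock sent to the ADC
    sample   : out std_logic_vector(7 downto 0);  -- ADC data registered on clk
    valid    : out std_logic                      -- one-cycle strobe with each new sample
  );
end entity adc_capture;

architecture rtl of adc_capture is
  signal phase : integer range 0 to 9 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      valid <= '0';                 -- default, overridden in phase 5
      case phase is
        when 0 =>
          adc_clk <= '1';           -- rising edge seen by the ADC
        when 5 =>
          adc_clk <= '0';           -- falling edge, 100 ns later
          sample  <= adc_data;      -- capture point (assumed, check the data sheet)
          valid   <= '1';
        when others =>
          null;
      end case;

      if phase = 9 then
        phase <= 0;
      else
        phase <= phase + 1;
      end if;
    end if;
  end process;
end architecture rtl;
```

The ADC still receives a real 5 MHz clock, but inside the FPGA everything runs on the 50 MHz clock and `valid` acts as the enable for the rest of the pipeline.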

-Lasse

- Brian Drummond

Fri, Jul 6, 2012 9:43 AM

That's OK.

But you need to register the AD9058 outputs, inside the FPGA, to your internal 50MHz clock. I would also register the S input and the U,V outputs from the switch. (In fact I would make the switch a synch process with only "clk" in its sensitivity list - it will effectively register the switch outputs for you)

All these can be combined into a single synchronous process.

-- assuming u,v,r,i,adcnn are all signed!
process(clk)
begin
  if rising_edge(clk) then
    -- First pipe stage... synchronise the inputs
    if adc_enable then -- 10 MHz, when ADC is stable
      adc0_int

- Hal Murray

Sat, Jul 7, 2012 2:00 AM

Ahh, but are they paranoid enough?

The key is settling time.

In the old days of TTL chips, a pair of FFs (with no logic in between) got you settling time of as much logic as the worst case delay for the rest of the system. In practice, that was enough.

With FPGAs, routing is important. A pair of FFs close together is probably good enough. If you put them on opposite sides of a big chip, the routing delays may match the long path of the logic delays and eat up all of your slack time.

Have any FPGA vendors published recent metastability info? (Many thanks to Peter Alfke for all his good work in this area.)

I'm not a silicon wizard. Is it reasonable to simulate this stuff? I'd like to know worst case rather than typicals. It should be possible to do something like verify simulations with lab typicals and then use simulations to find the numbers for the nasty corners.

--
These are my opinions.  I hate spam.

- rickman

Sun, Jul 8, 2012 10:38 PM

I'm not sure what you would want to simulate. Metastability is probabilistic. For a given length of settling time there is some probability of it happening. Increasing the settling time reduces the probability, but it will never be zero, meaning there is no maximum length of time it takes for the output of a metastable FF to settle.

Is that what you are asking?

Rick

- glen herrmannsfeldt

Mon, Jul 9, 2012 5:25 AM

Ed McGettigan wrote: (snip)

(snip)

Metastability can be a problem, but often the problem is clocking multiple FFs off the same clock edge, with different delays on either the clock or data. (The chance of the delays being exactly equal is close to zero.) The two effects are different.

Note, for example, the common FIFO implementation using a gray code counter (or binary to gray code converter). That avoids the clock edge problem, as either value will work correctly.
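For illustration, the binary to Gray conversion mentioned here is a single XOR with a shifted copy. The entity name `bin2gray` and the generic `WIDTH` are made up for this sketch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical combinational binary-to-Gray converter.
entity bin2gray is
  generic (WIDTH : positive := 4);
  port (
    bin  : in  unsigned(WIDTH - 1 downto 0);
    gray : out unsigned(WIDTH - 1 downto 0)
  );
end entity bin2gray;

architecture rtl of bin2gray is
begin
  -- Each Gray bit is the XOR of a binary bit and its more significant
  -- neighbour, so consecutive counts differ in exactly one bit.
  gray <= bin xor shift_right(bin, 1);
end architecture rtl;
```

Since only one bit changes per count, a register in the other clock domain that samples during a transition sees either the old or the new value, never a mixture. In a real FIFO the Gray value would normally be registered in its own domain before it is sampled by the other one.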

Metastability is a different problem, but one that also occurs when using asynchronous input values.

-- glen

- glen herrmannsfeldt

Mon, Jul 9, 2012 5:38 AM

rickman wrote: (snip)

(snip)

I suppose that is true, but really it shouldn't be a problem. It is usual for many systems to clock as fast as you can, consistent with the critical path delay. As metastability is exponential, even a slightly shorter delay is usually enough to make enough difference in the exponent.

That assumes that there is a FF to FF path that is faster than the FF logic FF path. I believe that is usual for FPGAs, but if you manage to get a critical path with only one LUT, then I am not so sure. But that is pretty hard in most real systems.

I would expect most systems to have at least a 10% margin. That is, the clock period is at least 10% longer than the critical path delay. Probably closer to 20%, but maybe 10%. So, with a 10ns clock there might be only 1ns slack. Assuming some delay, say 1ns minimum from FF to FF, that has nine times the slack, and that is in an exponent.

-- glen

- glen herrmannsfeldt

Mon, Jul 9, 2012 5:42 AM

(snip)

(big snip)

I believe that most FPGA families have FFs with clock enable.

Be sure that you are writing your logic in such a way that the tools figure that out. In most cases, I believe that means not writing it as a gated clock. Write it as FF's with enable.

(I know how to write it in verilog but not VHDL.)
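In VHDL, the usual way to describe a flip-flop with enable is an if inside the rising_edge branch, with only the clock in the sensitivity list. The accumulator below is a sketch along those lines. The entity name `accum_ce`, the port `ce`, and the synchronous clear are assumptions for this example (the accumulator posted at the start of the thread used an asynchronous clr), and overflow detection is left out for brevity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical accumulator written so the tools can infer clock-enable FFs.
entity accum_ce is
  port (
    clk : in  std_logic;
    clr : in  std_logic;                      -- synchronous clear
    ce  : in  std_logic;                      -- clock enable, already synchronous to clk
    d   : in  std_logic_vector(31 downto 0);  -- two's complement input
    q   : out std_logic_vector(31 downto 0)   -- running sum
  );
end entity accum_ce;

architecture rtl of accum_ce is
  signal acc : signed(31 downto 0) := (others => '0');
begin
  process (clk)  -- clk only: no gated clock, no asynchronous inputs
  begin
    if rising_edge(clk) then
      if clr = '1' then
        acc <= (others => '0');
      elsif ce = '1' then
        acc <= acc + signed(d);  -- wraps on overflow
      end if;                    -- otherwise acc holds its value
    end if;
  end process;

  q <= std_logic_vector(acc);
end architecture rtl;
```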

-- glen

- glen herrmannsfeldt

Mon, Jul 9, 2012 6:12 AM

(snip)

That is a good question. I usually assume that they won't have a long route, but that might not be a good assumption.

Some time ago, I was working on a small design in a very large FPGA. Expanding to fill the available space, things were very far apart. (And, as I had so much space, I put three FFs in to synchronize, but with long enough routes even that could fail.)

As I noted previously, though, often the problem isn't metastability but multiple FFs on the same asynchronous clock. Different problem.

-- glen