FPGA one-shot

John Larkin

Thu, Dec 14, 2017 3:43 AM

I have an async signal, call it TRIG, inside a Zynq 7020.

At the rising edge of TRIG, I want to make an async one-shot. It will leave the chip as RX and reset some outboard ECL logic. Anything from, say, 2 ns to 10 ns width would work.
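Here is a minimal behavioral sketch of that one-shot. It assumes the common flip-flop arrangement: D tied high, TRIG on the clock input, and RX fed back through some delay path to the asynchronous clear. The class `OneShotPath` and all delay values are hypothetical placeholders for illustration, not Zynq datasheet numbers. The model also assumes that the feedback path delays rising and falling edges equally.

```python
from dataclasses import dataclass


@dataclass
class OneShotPath:
    """Delays of a flip-flop one-shot, in ns (hypothetical placeholder values).

    Assumed circuit: D tied high, TRIG drives the clock, Q is RX, and RX is
    routed through some delay path back to the flip-flop's async clear.
    """
    t_clk_to_q: float   # TRIG rising edge -> RX high
    t_feedback: float   # RX -> clear input (routing plus any delay element)
    t_clr_to_q: float   # clear asserted -> RX low


def rx_edges(trig_ns: float, p: OneShotPath) -> tuple[float, float]:
    """Return (rise, fall) times of RX for a TRIG rising edge at trig_ns."""
    rise = trig_ns + p.t_clk_to_q
    fall = rise + p.t_feedback + p.t_clr_to_q
    return rise, fall


def clear_pulse_ns(p: OneShotPath) -> float:
    """Width of the self-timed clear pulse: it stays asserted until the falling
    RX edge has travelled back around the feedback path (equal rise/fall
    delays assumed)."""
    return p.t_clr_to_q + p.t_feedback


def width_in_spec(p: OneShotPath, lo_ns: float = 2.0, hi_ns: float = 10.0) -> bool:
    """True if the RX pulse width falls inside the acceptable window."""
    rise, fall = rx_edges(0.0, p)
    return lo_ns <= fall - rise <= hi_ns


if __name__ == "__main__":
    path = OneShotPath(t_clk_to_q=0.5, t_feedback=3.0, t_clr_to_q=0.6)
    print(rx_edges(10.0, path), width_in_spec(path), clear_pulse_ns(path))
```

You should track the clear pulse width separately, because a flip-flop's clear input has a minimum pulse width in the datasheet. In this arrangement, the clear stays asserted only until the falling RX edge gets back around the loop.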

The board is built, and we can't easily add more connections to the FPGA or hack in glue logic. Well, it would be ugly.

Here are some ideas:

[link]

We could play with i/o cell slew rates and drive strengths to tune the timing. And use as many delay stages (circuit B) as we like... there are tons of unused balls.

Or maybe use some internal delay path, if we can find one that is reasonably repeatable.

The compiler will probably let us do circuit A or B without whining much, but might object to the third one.

I grew up on async hairball logic, so this seems reasonable to me, but my FPGA guys are horrified. We don't want to spin up a 250 MHz PLL here and do it synchronously, for various reasons.

An internal passive pullup resistor charging an i/o pin capacitance would be fun, but I don't think we could make a short enough blip.
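A simple RC model shows why. The values below are hypothetical placeholders, not Zynq figures: a 10 kΩ pullup, 5 pF of pin capacitance, a 3.3 V bank, and a mid-supply threshold. The time to charge to threshold is t = RC * ln(V / (V - Vth)). With these numbers, that comes to about 35 ns, well outside the 2 to 10 ns window. A larger pullup resistance only makes it slower.

```python
import math


def rc_rise_time_s(r_ohm: float, c_farad: float, v_supply: float, v_threshold: float) -> float:
    """Time for an RC node starting at 0 V to charge up to v_threshold.

    v(t) = v_supply * (1 - exp(-t / RC))  =>  t = RC * ln(v_supply / (v_supply - v_threshold))
    """
    if not 0.0 < v_threshold < v_supply:
        raise ValueError("threshold must lie between 0 V and the supply")
    tau = r_ohm * c_farad
    return tau * math.log(v_supply / (v_supply - v_threshold))


if __name__ == "__main__":
    # Hypothetical values only; check the real pullup and pin capacitance specs.
    t = rc_rise_time_s(10e3, 5e-12, 3.3, 1.65)
    print(f"{t * 1e9:.1f} ns")
```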

Any other ideas or comments?

--

John Larkin         Highland Technology, Inc 

lunatic fringe electronics

John Larkin

Thu, Dec 14, 2017 4:02 AM

I could do all sorts of stuff if I could access some FPGA balls, but I can't. I brought out some test points, but we used them for another hack!

And kluging parts onto the board is second-best to doing something invisible like this.


Richard Damon

Thu, Dec 14, 2017 4:29 AM

The big issue with trying to do this sort of stuff asynchronously in an FPGA is that timing is tremendously dependent on routing and placement, including at times things that you really can't control at the typical programming level. It is quite different from the old days of using discrete chips, where the gates were the slow part and the wires were mostly negligible. Inside the FPGA, the wires are a significant part of the timing, and the tools and chip designs go to a LOT of work to make synchronous stuff work.

My guess is that the most repeatable method would be to build a delay line by instancing a buffer, with synthesis directives that keep the tools from optimizing it away, and forcing its placement at one end of the FPGA. Then have that drive a similar buffer that you have forced to the other side of the chip, and so on, back and forth, until you have built up enough delay.

rickman

Thu, Dec 14, 2017 7:10 AM

John Larkin wrote on 12/13/2017 10:43 PM:

The variation in delay in an FPGA for any given route isn't all that bad, about the same as with regular logic. I assume Xilinx still has a manual chip-editing tool. It will give you the delays of routes, so you can do a run of the chip and manually reroute the one delay path to get your time delay. There are ways to force placement of the FF with attributes in your source code, so as long as the routes you need are not used, it would be a simple matter to hand-route the same path each time. Getting a 2 ns minimum pulse width shouldn't be hard at all.

You seem to be thinking you can tune the loop with I/O pad delays, but that will still require manual work in the chip editor to make adjustments each time you get a different route on the delay path and so need different I/O pad slew rates.

One other thought is to use some number of LUTs as the main delay element; there are ways to force the use of such a component in HDL source. By constraining the placement to cells that are in the same logic block, you will get consistent route delays, and the routing variation should go away. I believe it is the inter-logic-cell routes that have lots of variation.

The main reason why your FPGA guys are reacting in horror is because they know what a royal PITA it will be to learn the tools well enough to make all this happen.

--

Rick C 

Viewed the eclipse at Wintercrest Farms, 
on the centerline of totality since 1998

Allan Herriman

Thu, Dec 14, 2017 10:49 AM

"Some internal delay mechanism" on your block diagram could be an IDELAY (or IODELAY), which gives you a calibrated delay that will be independent of PVT. Of course, it's independent of PVT because you give it (actually an IDELAY_CTRL) a reference clock at 200MHz (or some other, higher frequencies). Max delay is a few ns, delay resolution is some tens of ps, and jitter is some tens of ps as well.
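Here is a small calculator for the IDELAY numbers. It assumes the 7-series IDELAYE2 behaviour as I understand it from UG471: 32 tap settings (0 to 31) and a tap resolution of 1/(32 x 2 x F_REF), where F_REF is the reference clock. It ignores the fixed insertion delay at tap 0, so check the datasheet before relying on the absolute numbers. At 200 MHz, this gives 78.125 ps per tap and about 2.4 ns of adjustable range. That is consistent with the "few ns" and "tens of ps" above.

```python
N_TAPS = 32  # assumed IDELAYE2 tap count (settings 0..31); verify in UG471


def tap_resolution_ps(f_ref_hz: float) -> float:
    """Per-tap delay, assuming resolution = 1 / (32 * 2 * F_REF)."""
    return 1e12 / (N_TAPS * 2 * f_ref_hz)


def taps_for_delay(target_ps: float, f_ref_hz: float = 200e6) -> int:
    """Nearest tap setting for an added delay of target_ps.

    Excludes the fixed insertion delay of the IDELAY itself.
    """
    taps = round(target_ps / tap_resolution_ps(f_ref_hz))
    if not 0 <= taps <= N_TAPS - 1:
        raise ValueError(f"{target_ps} ps is outside the tap range at {f_ref_hz / 1e6:g} MHz")
    return taps


if __name__ == "__main__":
    res = tap_resolution_ps(200e6)
    print(f"resolution {res:.3f} ps, max added delay {res * (N_TAPS - 1):.0f} ps")
    print("taps for 1 ns:", taps_for_delay(1000.0))
```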

I recently ran some ring oscillator experiments in the same FPGA family. I used LUTs as delay elements and when coded (in VHDL) such that all elements were in the same slice and the routing was all in the local switchbox, I measured a frequency of 945MHz for a 3 element ring. That should give you some idea of the achievable delays.
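You can turn that measurement into a per-stage delay. In a ring with an odd number of inversions, one output period is two trips around the loop, so t_pd = 1/(2 x N x f). Below is a sketch that uses Allan's 945 MHz, 3-element figures as the example input. It gives roughly 176 ps per LUT, including local routing, so a 2 ns pulse is on the order of 11 such stages.

```python
def stage_delay_ps(f_osc_hz: float, n_stages: int) -> float:
    """Average per-stage delay (element plus its routing) of a ring oscillator.

    Assumes the ring has an odd number of inversions in total, so one output
    period is two trips around the ring: f = 1 / (2 * N * t_pd).
    """
    if n_stages < 1 or f_osc_hz <= 0:
        raise ValueError("need at least one stage and a positive frequency")
    return 1e12 / (2 * n_stages * f_osc_hz)


if __name__ == "__main__":
    t_pd = stage_delay_ps(945e6, 3)  # the measured 3-element LUT ring
    print(f"{t_pd:.0f} ps per stage; a 2 ns pulse is about {2000 / t_pd:.0f} stage delays")
```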

The placement and routing is quite easy to control from your favourite HDL, once you know how. This is important to get right as otherwise the results will not be repeatable.

Watch the minimum pulse widths on the FF clear input. This will be specified in the datasheet somewhere.

I'm guessing you want an IOB FF rather than a CLB flip flop though. The IOB FF are described in the "SelectIO Logic Resources" chapter of UG471. They should be as fast as the internal FF. Maybe faster, as they are designed for superior metastability resolution.

[Assuming the trig input is on a pin.] You also have the option of using special IO clocking resources to get the clock from a pin to the FF clock input with much less delay / delay variation / jitter than you would get through the global clock networks. (These are the clocking resources that are used for DDR3 data clocks, etc. so they have to have low, predictable delays.) This will only work if you put the trig input on the correct pin (as not all pins can be used as clock inputs this way), but hey, of course you picked that up at the schematic design review.

Allan

Allan Herriman

Thu, Dec 14, 2017 11:39 AM

This Xilinx forum thread gives some examples of placement and routing in VHDL:

[link]

Regards, Allan

Theo Markettos

Thu, Dec 14, 2017 11:44 AM

Is this a mature product, or one which is likely to see frequent updates?

That may direct your strategy. If it's mature, it might be feasible to use the ECO tools to manually add cells to an existing design.

If it's still in flux, you probably need to understand how to direct the tool that this piece of logic needs special treatment and should be constructed like so. This means it will persist over respins of the rest of the logic. You will likely still need to verify over a number of respins that it does in fact persist, given that it's hard to get this right.

Theo

krw

Thu, Dec 14, 2017 2:12 PM

It's higher in FPGAs since the wires are longer (higher capacitance), though distance between gates may (or may not) be similar. The wires in an FPGA are "fixed" length, where they are only as long as needed in an ASIC. There is also a lot of capacitance from all of the muxes (pass gates) hanging off the wire. The higher magnitude will mean a higher variation, too.

A very poor way of doing things but it may be the only way to make such a kludge.

Not to mention the maintenance of this kludge for the life of the product.

rickman

Thu, Dec 14, 2017 3:02 PM

Something like this was done in test equipment from a major manufacturer. They needed to mux a clock, and the delay through the chip needed to be minimal. I don't recall if the LUT was hand-placed or not, but the routing was done by hand in the chip editor. I found out about it because we had to touch the chip. My boss was the guy who had originally done this, and he had not documented a single thing about it. He gave a demonstration of how to do the hand mod to a few of us, and that was how he passed the torch: by oral tradition. lol


rickman

Thu, Dec 14, 2017 3:18 PM

When you say "routing", it doesn't appear to deal with the actual routing. He does mention that the attributes assign specific I/Os on the LUTs and so which pin is connected to which is determined. But the routing interconnects still need to be wired up in the chip editor I believe.


krw

Thu, Dec 14, 2017 3:35 PM

What a wonderful way to run a company. I assume they're no longer in business?

John Larkin

Thu, Dec 14, 2017 4:54 PM

The virtue of putting most of the delay into i/o cells is that they will behave the same independent of compiles. And the slew/drive strength params can be set without any hand routing or fighting the tools to do something they don't want to do. We can hang timing constraints on the presumably short runs to the dflop to keep that uncertainty low.

Rob here suggested that an adder/carry chain might have a more consistent internal delay (it's a fixed structure) than routing delays, which might change every compile. Maybe a MAC?

I'll ask my folks to add a couple of experiments to the next compile. We are iterating the design to add and test features once or twice a week. This thing is maybe 20x as complex as our average design, which is our excuse for not thinking out everything in advance.


rickman

Thu, Dec 14, 2017 5:39 PM

John Larkin wrote on 12/14/2017 11:54 AM:

There is nothing inherent in the IOBs that makes their delay more consistent than any other internal logic component. The issue is the routing *changing* when you recompile the design. *That* will give widely varying timing unless you lock the placement of each component so there is a direct route available between them. I have not studied the routing flow of the newer families of Xilinx devices. However, I believe that if you stay within a local block of logic, the routing resources have very direct paths, so the timing will not change appreciably on different passes. This location constraint can be relative, so it simply puts the logic in the same block but allows the block to "float" anywhere in the device. If you also want to minimize the delay from the leading edge of the trigger pulse, you need to further constrain the logic to be adjacent to the IOB. That placement should again be able to use dedicated routing for adjacent blocks.

The adder carry chain has a defined delay just like any other block of logic, but the delays are very short per bit, I recall around 200 ps, but may be less in the newer devices. Again, no advantage other than being able to customize the pulse width with very high resolution which you appear to have indicated is not important.
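Here is a sketch of the stage-count arithmetic for any chain of identical delay elements: carry bits, LUTs, or placed buffers like those suggested earlier in the thread. The 200 ps per bit is rickman's recollection, used here only as a placeholder. The sketch does not model routing into and out of the chain, or PVT variation.

```python
import math


def elements_for_width(target_ps: float, per_element_ps: float) -> int:
    """Number of identical delay elements whose summed delay first reaches target_ps."""
    if per_element_ps <= 0:
        raise ValueError("per-element delay must be positive")
    return math.ceil(target_ps / per_element_ps)


if __name__ == "__main__":
    per_bit = 200.0  # recalled carry-chain delay per bit; newer parts may be faster
    for width_ps in (2000.0, 10000.0):
        print(width_ps, "ps ->", elements_for_width(width_ps, per_bit), "bits")
```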

The way to reduce variation in the pulse width is to use logic-block local routing, which will have much less potential variation. In all cases, you will have PVT variation in timing that you can't do anything about. It is not clear whether the delay time (trigger to leading edge of pulse) is important. If it is, you will likely still need to deal with widely varying routing delays between blocks (IOB and logic). The delay from the trigger input to the FF can be constrained, I believe, although I've never done this. The routing from the FF to the output should then be optimized through placement.


rickman

Thu, Dec 14, 2017 5:50 PM

Ever hear of TTC? I interviewed with TTC but came on board at Acterna. TTC merged with Dynatech and WWG, taking on nearly a billion in debt. That debt ultimately sank the company, not anything about their designs. I believe TTC was very well respected in the test equipment world. I don't know who bought the pieces.


John Larkin

Thu, Dec 14, 2017 6:31 PM

IDELAY sounds ideal for setting my pulse width, because it's calibrated and tweakable. We'll look into that.

I did a ring oscillator to measure chip temperature, on a part that didn't have an internal sensor.

[link]

--

John Larkin         Highland Technology, Inc 
picosecond timing   precision measurement  

jlarkin att highlandtechnology dott com 
http://www.highlandtechnology.com

rickman

Thu, Dec 14, 2017 7:06 PM

John Larkin wrote on 12/14/2017 1:31 PM:

It still has to be routed to the logic. So you haven't worked around the problem of wildly variable routing delays which add to your pulse width.

As usual John is trying to work with things he doesn't understand by applying methods he has used on totally unrelated designs. Sit down and draw a block diagram showing not just the delay you wish to control, but the delays on every bit of wire in the design. Then maybe the picture will emerge.
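One way to follow that advice is to build a min/max delay budget for the path that sets the pulse width. The sketch below assumes the flip-flop-with-feedback arrangement. The segment names and numbers are hypothetical; in practice they would come from the timing report. Some delays shift both edges equally: TRIG-to-clock, clock-to-Q, and the output pad. Those drop out of the width, so only the feedback loop is listed.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    min_ns: float
    max_ns: float


def width_range_ns(segments: list[Segment]) -> tuple[float, float]:
    """Best- and worst-case pulse width: segments are in series, so min and max delays add."""
    return sum(s.min_ns for s in segments), sum(s.max_ns for s in segments)


def fits(segments: list[Segment], lo_ns: float = 2.0, hi_ns: float = 10.0) -> bool:
    """True if even the extreme corners stay inside the acceptable width window."""
    w_min, w_max = width_range_ns(segments)
    return lo_ns <= w_min and w_max <= hi_ns


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    pulse_path = [
        Segment("route RX -> delay element", 0.2, 1.5),
        Segment("delay element", 2.0, 2.1),
        Segment("route delay element -> FF clear", 0.2, 1.5),
        Segment("FF clear -> Q", 0.4, 0.7),
    ]
    w_min, w_max = width_range_ns(pulse_path)
    print(f"width {w_min:.1f} to {w_max:.1f} ns, in spec: {fits(pulse_path)}")
    # List segments by how much they contribute to the width spread.
    for s in sorted(pulse_path, key=lambda s: s.max_ns - s.min_ns, reverse=True):
        print(f"  {s.name}: spread {s.max_ns - s.min_ns:.1f} ns")
```

Sorting by spread makes it easy to see which segments, typically the routes, dominate the uncertainty.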


Jon Elson

Thu, Dec 14, 2017 7:49 PM

This signal is generated within the FPGA and sent out? And, you just want to stretch it? Make a counter with a few bits, set it to zero when TRIG occurs, and count up at the available clock rate, and generate RX. When the counter reaches the max, turn off RX and don't increment the counter again.
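Here is a cycle-level sketch of that counter stretcher. It assumes TRIG has already been synchronized to the clock, for example with a two-flop synchronizer (not shown). `stretch` is a hypothetical name. RX edges land on clock edges, so the width is quantized to the clock period, and the TRIG-to-RX delay jitters by up to one period. That is the trade-off behind the original post's reluctance to run a fast clock here.

```python
def stretch(trig: list[int], n_cycles: int) -> list[int]:
    """Cycle-by-cycle model of a counter pulse stretcher.

    trig holds one synchronized TRIG sample per clock cycle; the result holds
    RX for each cycle. The counter parks at n_cycles when idle, is cleared by
    a TRIG rising edge, and counts up until it reaches n_cycles. RX is high
    whenever the counter is below n_cycles. (RX is shown in the same cycle the
    edge is sampled; real hardware would add a register stage of latency.)
    """
    count = n_cycles          # idle: counter parked at its maximum
    prev = 0
    rx = []
    for t in trig:
        if t and not prev:    # rising edge of TRIG: restart the count
            count = 0
        elif count < n_cycles:
            count += 1        # keep counting until the maximum is reached
        rx.append(1 if count < n_cycles else 0)
        prev = t
    return rx


if __name__ == "__main__":
    print(stretch([0, 1, 1, 0, 0, 0, 0], 3))  # -> [0, 1, 1, 1, 0, 0, 0]
```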

This is so simple, I must be misunderstanding what you want to do.

Jon

krw

Fri, Dec 15, 2017 2:59 AM

No. I think I know why.

Makes complete sense. Knowledge locked up in someone's head => company in pieces.

rickman

Fri, Dec 15, 2017 5:56 AM

You didn't read a word I wrote. The company failed because, like so many in the dot-com bubble, they didn't see it was a bubble and borrowed a shit-ton of money to expand. Then, when the bubble burst, they couldn't pay the debt. It had nothing to do with the dick-head I had for a boss.

The dick-head came from the company's history developing as a startup and having loose engineering management principles. So he didn't know any better because they never taught him any better. But the company was a technical success. Ask anyone who works in telecom if they've ever used a T-BERD. It is highly regarded test equipment from what I've heard and very widely used.


Allan Herriman

Fri, Dec 15, 2017 10:21 AM


There is no manual step needed. Once you lock the pins, the routing will be fixed (to an extent).

I haven't done manual routing on an FPGA since the '90s.

I haven't done manual placement for a few months. Even then, it was all in the form of relative placements in HDL, so the tools still have the ability to move the entire macro around on the die.

Allan