reset strategy FPGA Igloo

I'm not familiar with automotive use. I would think they would have been a much bigger company if there had been much automotive use of their parts. Just one design win is 500,000 units a year in that market.

I sort of lump space, avionic and defense into the same category since they have similar requirements and are all much smaller than commercial markets. Perhaps we can call this generically "high rel" and also include the small commercial market segment. High markup, low volume...

That is the crux of the problem. While some brands of FPGAs do a reasonable job of detecting power on and resetting the entire device, it sounds like the Actel doesn't bother since... well, since it doesn't need to as long as it can rely on the user to reset it.

I'm not clear on this. It sounds like you are using the timer in *place* of the lock signal from the PLL. My point was to condition the lock signal from the PLL. But since you say below that you don't know the state of the PLL lock signal at power up, I suppose this won't work. It would be a simple matter to test it though. Or you can contact the factory. Since there is no configuration process, I would expect the logic of the PLL to be ready and working on power up.

The fact that you are checking the upper bits prevents any pruning of the counter. I think your approach is to help assure that the counter has been reset before it will count down. I would not think this was workable. Either the counter is reset properly or the circuit can malfunction by lockup, possibly with global_rst = '0'. Also be aware that you are really only using 5 check bits since the sixth is used to flag end of timeout.

If this is a high rel application, I would not want to rely on a five bit checksum to control a reset. You say "guarantee" and "random", but I see the possibility that the counter starts up in the done state with reset never having been asserted, 1 in 64 chance. If you check all the bits in the counter for the done state it is still a 1 in 4096 chance of malfunctioning without ever producing a reset out.

If you can't rely on the PLL lock signal to be asserted at power up, I don't think you can rely on *any* logic in the FPGA to compensate for this. Why not investigate and find out if the PLL signal will work as a power on reset? Contact the factory and/or test it yourself on the bench. Bring the LOCK signal out to a pin and scope it while powering up the unit.

I think you need a reset signal for the device, period. Even if you manage to supply a reset to all the FFs in your logic, is there nothing else on the chip that requires a reset like the PLL itself? What other circuits are on the device other than the user configurable logic? Does the data sheet talk about a requirement for the reset signal?

The PLL issue and the idea of using the I/O pin to generate a reset are both issues I would contact the factory about. If I were in your shoes, I would push hard to have the module disqualified since the FPGA can not be assured to have been reset. That is the part that is insanity!

--

Rick
Reply to
rickman

Wow - can't keep up - lots of replies here. The GSR from Xilinx isn't the end-all solution that Xilinx touts it as. The release of the global GSR is completely asynchronous. It doesn't really matter much whether the router has trouble routing this signal with low skew. It's asynchronous - Murphy's law says that one FF is going to see the inactive edge of reset on one clock edge and the next FF is going to see it on the following clock cycle. Low skew lowers the likelihood of the event, but it can happen nonetheless and should be accounted for in your design.

I've gone into this in the Xilinx forums some, but you've got to be careful on using that GSR...

Regards,

Mark

Reply to
Mark Curry

Hi Rick,

On 16/10/2013 20:26, rickman wrote: []

The idea is to use the 'lock' signal *and* the timer. The lock signal should be, at a certain point, signalling an 'unlock' condition. This condition, even if not reliable, allows the following logic:

  1. use 'not lock' to set the global reset
  2. start counting when 'lock' reports 'pll locked' (unreliable) *and* the upper part of the count is correctly set to a certain pattern
  3. count up to the maximum lock time specified in the datasheet
  4. release the global reset

If 'lock' chatters at any stage of its life (which I suppose only happens while locking), it does not matter: it will only mean the counter restarts counting.
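
Just to make the concept concrete, here is a minimal VHDL sketch of the scheme above. The entity name, signal names, counter width and timeout value are placeholders of mine, not the real design, and instead of checking an upper-bit pattern it simply clears the counter with 'not lock', which achieves the same intent of guaranteeing the counter starts from a known state.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity por_gen is
  generic (
    -- timeout in clock cycles, >= maximum PLL lock time from the datasheet
    TIMEOUT : natural := 4095
  );
  port (
    clk        : in  std_logic;  -- clock available while the PLL is locking
    pll_lock   : in  std_logic;  -- PLL 'lock' output, may chatter while locking
    global_rst : out std_logic   -- active-high global reset
  );
end entity por_gen;

architecture rtl of por_gen is
  signal count : unsigned(11 downto 0);
begin
  process (clk, pll_lock)
  begin
    if pll_lock = '0' then
      -- 1. 'not lock' asynchronously asserts the reset and clears the counter,
      --    so chatter on 'lock' just makes the timeout start over
      count      <= (others => '0');
      global_rst <= '1';
    elsif rising_edge(clk) then
      if count < TIMEOUT then
        -- 2./3. count while 'lock' stays asserted, up to the datasheet lock time
        count      <= count + 1;
        global_rst <= '1';
      else
        -- 4. release the global reset
        global_rst <= '0';
      end if;
    end if;
  end process;
end architecture rtl;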

I have to start from the assumption that my PLL lock signal is, at some stage, signalling a PLL not in lock. If the lock signal wrongly reports a PLL not in lock, that is not an issue; it is an issue only if it wrongly reports a PLL in lock.

If the counter *is not* reset properly, then a system lockup is at least better than logic that runs without a global reset.

Correct, I did not really care how many bits I'm using in the example, it was just to show the concept.

You have a point and, yes, I was too lazy to write the example in the proper way.

The investigation might be an extremely tedious process. Under which conditions should I verify the behavior? Our temperature range is -40 to +80 °C; should I run the test across all of it? Should the test be performed in thermal vacuum (the application will run in low Earth orbit)? Verifying that the lock signal is correctly reporting a lock is not such a straightforward path.

But I agree that relying on the fact that my pll lock signal is at least reporting 'not locked' at a certain stage is still an assumption.

[]

PLL does not require an external reset.

There are some RAM and flash blocks. Not much indeed.

Nope. It is up to the user to take care of meeting timing constraints.

That is a good hint and I'm actually doing it. I'll try to report back here at least to share the information with you all, somebody else might find it useful.

I agree in principle, but pushing hard is not always beneficial and, being only a newcomer in the project, I guess I do not have the critical mass to push that hard.

Reply to
alb

Hi Alb,

On 18/10/2013 10:19, alb wrote: ..

yes!

yes!

I suspect you haven't had your CDR yet, but the first thing that was discussed when I was working on satellites was the reset/POR circuitry. I worked on OBCs during the WIRE mission (1999), and hence reset, supply rise time, unused JTAG pins etc. were hot topics.

Regards, Hans


Reply to
HT-Lab

I don't think the answer to that question will be left to chance! (Nor its near equivalent, an answer in a usenet posting)

Reply to
Tom Gardner

On Friday, 11 October 2013 23:46:33 UTC+2, alb wrote:

That's because Actel tends to provide global resources on all families that can be used either for reset, for clock, or just for a high-fanout net like an enable. There are some slight differences from family to family: some fuse-based families have dedicated "clock-only" global resources, while for the flash-based ones I used there was no difference between using them for clock or reset.

bye Thomas

Reply to
Thomas Stanka

Hi Hans,

On 18/10/2013 18:59, HT-Lab wrote: []

A full temperature test can be done at the subsystem level (board at minimum), since board layout may affect your results. The idea of routing the 'lock' signal out to a pin and monitoring it during a temperature scan is quite painful, and I may still miss aspects of the characterization (like power variations) which are even more difficult to test unless a special test board is built.

Those are the main reasons why I would not rely on such tests unless the manufacturer or some other group has already done an intensive analysis and test campaign on the device. As a small group we cannot afford the cost and time it takes to run such campaigns.

At the system level we are certainly performing ESS and TVT in order to meet requirements, but those tests are not the right place to verify such details.

The above comments apply even more in this case. The FPGA itself is only one part of a much larger system, which has to have appropriate thermal paths in order to be tested in a TVT chamber. Routing the lock signal out of the FPGA and out of the subsystem to make it visible in a TVT chamber is certainly out of the question and, considering the cost of a TVT, rather infeasible.

I'm not aware of any CDR done on this component since it is a payload subsystem. Unfortunately (or fortunately [1]) research institutes are not always strict enough within their hierarchical organizations to demand a PDR and CDR for their subsystems. We certainly went through CDR for our main interfaces with the hosting spacecraft (power, data, mechanics, harness).

[1] Working without such a structure has some benefits up to a point, since it is less bureaucratic and much more pragmatic. This unstructured approach does not scale well, though, and for large projects what's at risk is not only the budget but also mission success.
Reply to
alb

I agree with you, since I'm neither a believer nor a good gambler! :-)

Let me say though, considering that a TVT for such a payload (only 30 kg) is in the 40 k$ range, I would never even dream of verifying my lock signal under such conditions.

For each problem there's an appropriate environment where things need to be verified. In a TVT I cannot be worried about an 'and' gate working properly. Moreover, the observability of your unit under test is so limited in a TVT that you can only verify that your thermal calculations were accurate and that every component works within the specified temperature range.

We typically run full functional tests (all possible mode configurations and external stimuli variations) during the TVT, but we are certainly not looking at an FPGA pin on a scope.

At the system level it is good to add as many embedded diagnostics (DFT) as possible to enhance observability and to allow issues to be anticipated and/or diagnosed early in the process. These features are certainly neither free of cost nor without problems themselves.

Reply to
alb

Hi Thomas,

On 20/10/2013 23:47, Thomas Stanka wrote: []

Apparently there's a 'clkint' buffer which is used to route global nets, and 'buf/bufd' for high-fanout nets. I haven't yet found what fanout value I should consider reasonable before inserting a dedicated 'buf/bufd', but the reset line certainly has a high fanout.
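
In case the tools don't promote the reset to a global net by themselves, one way is to instantiate the buffer by hand. Below is a sketch assuming the usual Microsemi/Actel 'CLKINT' macro declared as a component with ports A and Y; the wrapper entity and signal names are mine.

library ieee;
use ieee.std_logic_1164.all;

entity rst_buf is
  port (
    rst_local  : in  std_logic;  -- locally generated reset
    rst_global : out std_logic   -- reset driven onto a global (clkint) net
  );
end entity rst_buf;

architecture rtl of rst_buf is
  -- CLKINT drives an internal signal onto a low-skew global routing resource
  component CLKINT
    port (A : in std_logic; Y : out std_logic);
  end component;
begin
  u_clkint : CLKINT port map (A => rst_local, Y => rst_global);
end architecture rtl;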

Funnily enough, the P&R fails to meet timing requirements on some reset-to-clock paths even though the reset is removed synchronously.

Reply to
alb

On Tuesday, 22 October 2013 10:45:42 UTC+2, alb wrote:

I'm not familiar with the IGLOO family, but I guess the allowed max fanout might be 10-20. If the fanout exceeds this value, a buf can be inserted to build a normal buffer tree. Clkint is for global resources like clocks, but also resets or enables with a fanout of several hundred.

If you are not able to meet timing, that very likely means you have a buf tree with too high a fanout/depth instead of a global resource.

regards Thomas

Reply to
Thomas Stanka

I won't argue that for a moment. Xilinx has exactly the same problem as the Actel devices. GSR is not a solution; you need to locally resync any reset to the clock. Further, you need to design your circuit so that each reset section still works if it is not released from reset on the same clock as the other sections. But the problem is compounded if there is no GSR at all. Then you have to use other routing resources to spread the reset signal. But I guess it is six of one vs. half a dozen of the other. With a GSR the resources have already been gobbled up by the GSR net before you even run the router.
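
For reference, a minimal sketch of that local resync, one instance per clock domain, with the reset asserted asynchronously and released synchronously. The entity and signal names are mine, not from any vendor library.

library ieee;
use ieee.std_logic_1164.all;

entity reset_sync is
  port (
    clk       : in  std_logic;
    arst_n    : in  std_logic;   -- asynchronous reset source (GSR, POR, pin...), active low
    rst_n_out : out std_logic    -- reset for this clock domain, released synchronously
  );
end entity reset_sync;

architecture rtl of reset_sync is
  signal sync : std_logic_vector(1 downto 0);
begin
  process (clk, arst_n)
  begin
    if arst_n = '0' then
      sync <= (others => '0');   -- assert immediately, asynchronously
    elsif rising_edge(clk) then
      sync(0) <= '1';            -- shift in '1's...
      sync(1) <= sync(0);        -- ...so the release happens after two clean clock edges
    end if;
  end process;

  rst_n_out <= sync(1);
end architecture rtl;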

--

Rick
Reply to
rickman

Just a comment on timing analysis. We were doing a retrofit of an existing hardware design using an Altera Flex 10K part, IIRC. The tool was MAX+PLUS II. The company I worked for had identified a problem with the tool that allowed a design to pass timing analysis and fail on the bench. We decided it was clearly a timing issue because of the temperature sensitivity: warm it up and it fails, cool it down and it passes. It was not sensitive to which chip was used (other than small differences in the threshold temperature). Design changes modified it only slightly. We figured it was poor timing estimation of a heavily loaded net, but we were never able to prove that. Altera was no help for this problem, sticking their head in the sand since they would be dropping support for the tool in another year.

Just a caution that passing static timing analysis is no indication that the design is actually meeting timing.

--

Rick
Reply to
rickman

Nasty nasty nasty.

So, how do you[1] convince yourselves and your customers that each individual chip actually /is/ working with a reasonable margin?

No, I don't expect a neat easy response.

[1] the impersonal pronoun, since "how does one" sounds too stilted to the modern ear/brain combination :)
Reply to
Tom Gardner

Can "one" ever assure "one's" customers that "one's" designs are entirely bug free? I have never been able to do that with *any* design. Why would FPGAs be any different?

My statement above may be a bit strong. Surely the static timing analysis tool is intended to verify timing. But it *can* be wrong, that is my point.

--

Rick
Reply to
rickman

Of course, but that's a bog-standard and therefore uninteresting point. But it is the margin (or lack of it) that is the interesting question.

So, how do you assess the margin?

If there's a problem, how do you positively determine the cause is an internal margin problem? (As opposed to merely presuming)

If the problem was the static timing analysis of the chip's internals, then I'm concerned because internal points are somewhat difficult to directly observe. External timing is a different issue.

Reply to
Tom Gardner

Yes, I agree. The symptoms were, first, the erratic nature of the failure with respect to routing; then, it would go away when the chip was cooled down; thirdly, some chips were consistently more sensitive than others; and finally, it was also sensitive to Vcc. This all points to timing. We couldn't prove it, but we acted on that assumption and wrote some timing analysis tools ourselves. We eventually got a route that passed timing at elevated temperature and low Vcc and shipped the product.

Ever since, I have not trusted the tools 100%. But then, like I said, this was a product that was being replaced by Quartus in less than a year, so Altera wouldn't put any effort into working on the problem, even to see if it really existed.

You can draw your own conclusions.

--

Rick
Reply to
rickman

Software geeks have been fighting compiler bugs for a long long time.

The thing that makes FPGA timing bugs so nasty is that you can't reasonably check the output. With a compiler, you can look at the instructions it produces. With a PCB router, you can eyeball the gerbers.

Many years ago, I made a list of all the possible places that could cause a problem on a board I was working on. At the high level, there were things like bugs in the board design, bugs in the individual PCB or in the assembly of the board, bugs in the firmware or FPGA or ... bugs in the driver.

Mixed in with those were things like bugs in the tools (there are a lot of them): the board layout tools and their libraries, the assembler for the firmware (which we had written), the FPGA tools; plus bugs in the data sheets and bugs in my reading of the data sheets.

--
These are my opinions.  I hate spam.
Reply to
Hal Murray

It does not take 'software geeks' to fight bugs. It takes a development and verification process with a level of complexity that is certainly beyond anything 'software geeks' could conceive alone. The process is also full of constraint-driven compromises, and of pitfalls as well. For further reading refer to [1].

I'm not sure what makes you think that you cannot 'reasonably' check the output. A synthesis tool provides a netlist, and the netlist is verifiable. A P&R tool provides a bitstream, which is verifiable.

The problem is that, unfortunately, since the tools are proprietary software with non-standardized output formats, it is difficult for the *end user* to check. But the developers of the tools can certainly check at whatever level of complexity they want; it is all a matter of 'pain vs. gain'.

In the open source software world (without wandering even further into the 'libre' software world) there's a level of peer review that is orders of magnitude higher than in proprietary software, and that is why open software is - by far - less buggy than proprietary software.

Now try to convince any EDA company to release their source code...

Not knowing the instruction set is exactly the same as not knowing the bitstream format for an FPGA. Having said that, even assuming you know what the instruction set looks like, there's still a lot of work to do to 'reasonably' verify the tool.

Also be aware that the level of 'reasonableness' is something companies have clearly in mind: considering that complex bugs are difficult to find (meaning they cost the company money), they decide what level of 'reasonableness' to aim for according to their market.

Complex designs need a verification plan. No amount of eyeballing can help you. With a verification plan you can minimize the amount of time you spend on the bench debugging, but it's all a matter of how much risk you want to accept.

[...]

Every other bug you referred to has its root in this last one. Every spec, at each level of the design flow, might be misinterpreted, since there's no process, AFAIK, that can verify the correct interpretation of a requirement.

Al

[1] An Assessment of Space Shuttle Flight Software Development Processes, National Research Council, 1993.
Reply to
alb

Hi all, here is feedback from the FAEs at Microsemi concerning the power-up reset; please see my comments inline if you are interested.

[]

According to the FAE it is possible to configure the internal weak pull-up resistor in the pin configuration and benefit from the same mechanism described in the app note I was referring to, therefore *without* the need for an additional external pull-up resistor.

Al

Reply to
alb

The app note goes into great detail about the timing of VCC and VCCI. In this discussion I believe they are talking about the input from the IBUF (RST_p) when they say, "The I/Os are tristated and the core logic detects '1' on the inputs from the boundary scan register (BSR)." It is not clear what sets the value in the BSR. It is also not clear how this determines the value of the RST_p signal.

Do you understand this portion of the reset design?

This entire circuit seems to depend on VCC reaching "its functional voltage level" before VCCI. Do you know that this is true for your board?

It would be good to have a dialog with the person who wrote the app note, but they don't say who this is. Much of the language usage would indicate it is someone for whom English is a second language and so might not be easy to converse with.

--

Rick
Reply to
rickman
