Spirit on Mars

- Pablo Bleyer

Thu, Jan 22, 2004 5:23 AM

"Austin Lesea" wrote in message news:bup6dl$ snipped-for-privacy@cliff.xsj.xilinx.com...

Interesting. Under what conditions, and with what kind of tests, are Xilinx rad-hard FPGAs tested for irradiation? Datasheets usually give only very condensed information about radiation parameters...

Regards.

- Austin Lesea

Thu, Jan 22, 2004 6:56 PM

Lest anyone spread rumors,

Spirit used a 4K QPRO part to fire the squibs for the parachute, the inflatable bag, etc. on the lander.

The rover has Virtex 1000's in the wheels for position/motor control.

The fact that Spirit has stopped sending useful data back to NASA is a terrible thing, but we got them there and rolled them onto the surface. Other folks' parts are supposed to do the communicating with Earth. Who volunteers to let us know whose components are used for that? What processor did they use?

Hope those folks figure it out, as it is a tragedy for everyone to lose the ability to gain knowledge of our solar system.

By the way, the self-checks on the FPGAs after the solar flare that destroyed that Japanese satellite's electronics showed that the FPGAs were undamaged and had not suffered at all from the flare (we can take many rads of radiation and not be affected at all).

The reason for the failure of other parts could be that they are NOT FPGAs. FPGAs are manufactured in huge volumes, and all of them are tested for latch-up under irradiation as part of qualification. Many SRAMs and other products do not have the volume to afford such testing, and in fact recent shrinks of common parts are known to latch up from a single event and destroy themselves.

Austin

- Jake Janovetz

Fri, Jan 23, 2004 1:12 AM

You're saying that FPGAs enjoy higher volumes than SRAMs? Interesting...

- Uwe Bonnes

Fri, Jan 23, 2004 10:18 AM

Austin Lesea wrote:
: Lest anyone spread rumors,

: Spirit used a 4K QPRO part for the squibs that fired for the parachute,
: the inflatable bag, etc for the Lander.

The Moessbauer Spectrometer has a Quicklogic QL30XX...

If I remember right, the APX has an Altera ...

Bye

--
Uwe Bonnes                bon@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

- Austin Lesea

Fri, Jan 23, 2004 3:47 PM

Jake,

No, you are correct about SRAMs; I am wrong there. My only point is that not all SRAMs can be tested in the beam at LANSCE, and when they do get around to it, there have been some spectacular single-event latch-up problems (i.e., instant destruction of the device).

Austin

> You're saying that FPGAs enjoy higher volumes than SRAMs? Interesting...

- Austin Lesea

Fri, Jan 23, 2004 3:48 PM

Uwe,

Thank you. Very interesting. This is the kind of info that is useful to know.

Austin

> : Lest anyone spread rumors,

- Nicholas C. Weaver

Fri, Jan 23, 2004 4:08 PM

I seriously doubt the paranoid EEs at JPL would allow any device which couldn't stand the radiation load, and wasn't batch-tested for the radiation load, onto the rover.

My personal bet would be software fault or a nontransient hardware fault in non-memory.

--
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

- Austin Lesea

Fri, Jan 23, 2004 4:21 PM

Pablo,

Thanks for the opportunity to let us brag a bit:

Test

We test all technologies as part of qualification in the proton beam at UC Davis, and then in the neutron beam at LANSCE (the industry only gets a few days a year to do tests at LANSCE, which is the only facility in the world with a HESS spectrum neutron beam).

Hot, Hot, Hot!

We cannot take them home after the tests until they "cool off", as they are radioactive after spending so much time in the beam. Oh, and none of them ever suffer any damage: they power on and meet all specs after hundreds and hundreds of rads.

Real Tests

We also do atmospheric testing. We call this the "Rosetta" experiments, as they are intended to help us decipher the meaning of the LANSCE tests which are, after all, just an arbitrary test that has only a correlation to real performance.

Sea Level, 5100 Feet, 12,500 Feet, 13,200 Feet

We have 100 2VP50's here in San Jose, 100 2V6000's here in San Jose, 100 2V6000's in Albuquerque NM, 100 2V6000's on White Mountain, California (outside of Bishop, Ca), 100 3s1500's due to go online soon here in San Jose, and 100 2VP30's also here in San Jose. Another 110 2V6000's go to Mauna Kea Hawaii next week to the Caltech Submillimeter Observatory.

All of them are monitored every 2 hours for any single cosmic-ray-induced upset.

Analysis

This is a standard procedure, and we are the ONLY company that actually KNOWS how our parts are affected by cosmic neutron showers, alpha particles, etc in REAL applications from sea level to 60,000 feet (I can't talk about the programs we have for mil/aerospace until you sign an NDA).

Competitors

Other companies out there are in a state of "blissful ignorance" and when (not if) they start to have customers complain of failures, they will be saying, "gee, we don't see anything (because we can't), must be something you are doing." Why can't they see anything when a customer complains?

Xilinx Advanced Technology

Our advanced readback and internal access configuration port allows us to actually check all memory cell states to see if anything anywhere has flipped. We can then locate the exact cell that flipped (i.e., LUT, BRAM, config latch, etc.) and from that know what the susceptibility of each one really is. We can identify whether that bit is used in the customer's design, and what it does. Because we have had to do this for the military/aerospace community for years, we are able to do it for everyone else who may suspect that they have soft errors.
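Locating a flipped cell comes down to comparing each readback frame against the same frame of the original ("golden") configuration and reporting every differing bit. The sketch below assumes configuration memory is modeled as a list of equal-length byte strings ("frames"); that layout and the name `locate_upsets` are hypothetical and do not reflect the real Virtex frame format or any Xilinx tool:

```python
from typing import List, Tuple


def locate_upsets(golden: List[bytes], readback: List[bytes]) -> List[Tuple[int, int]]:
    """Return (frame_index, bit_offset) for every bit that differs.

    Hypothetical layout: a list of equal-length frames, where bit
    offset 0 is the least significant bit of byte 0 in the frame.
    """
    if len(golden) != len(readback):
        raise ValueError("frame count mismatch")
    upsets = []
    for f, (g, r) in enumerate(zip(golden, readback)):
        if len(g) != len(r):
            raise ValueError(f"frame {f} length mismatch")
        for byte_idx, (gb, rb) in enumerate(zip(g, r)):
            diff = gb ^ rb  # 1 bits mark flipped cells
            bit = 0
            while diff:
                if diff & 1:
                    upsets.append((f, byte_idx * 8 + bit))
                diff >>= 1
                bit += 1
    return upsets
```

Mapping a (frame, bit) pair back to a LUT, BRAM cell or config latch, and deciding whether the design actually uses it, needs device-specific bitstream knowledge that this sketch does not attempt.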

Reality

Customers are unlikely to see the problem as anything but an annoying background return rate of "no problem found", as powering down and up, or reconfiguring, makes the "problem" go away!

At least we have been working on this for 5+ years, have patents pending, and are making improvements and understanding exactly how things happen (upsets do happen... most people are totally unaware of this fact).

How We Assure Reliability

In addition to design techniques in silicon, we also have application design techniques to reduce the probability of a soft error causing a failure to 0 (i.e., Spirit and Opportunity, not to mention the hundreds of military and aerospace applications we "fly" in).

We are presenting papers at conferences (MAPLD 2003, for example) detailing our results for .5u, .35u, .22u, .18u, .15u, .13u and 90 nanometer. If interested, email me directly, and I will send you the MAPLD ppt presentation.

Call or Write for More Information

Or better yet, contact your Xilinx FAE for a full technical presentation!

Austin

PS: many have asked me if the information I present here is unique (proprietary) in any way. It is not. All information posted is published already (ie in the public domain). It is just that I do see all Xilinx press releases, and see all marketing communications, so I am aware of what we can (and are able to) post.


- Jan Panteltje

Fri, Jan 23, 2004 9:34 PM

On a sunny day (Thu, 22 Jan 2004 10:56:52 -0800) it happened Austin Lesea wrote in :

It is obvious whodidit:


- jim granville

Fri, Jan 23, 2004 11:37 PM

How much of this is non-invasive - ie can be done with the device operating, and how much needs it to be halted/paused ?

-jg

- Austin Lesea

Sat, Jan 24, 2004 12:34 AM

Jim,

It is true that in the past, readback while operating caused some issues (pre-Virtex). And even today, LUTRAM is disturbed by readback if it is being used at the same time (address is changing).

With Virtex II, II Pro, Spartan 3 we can readback all memory excepting LUTRAM/SRL16 while it is operating and not disturb the static values, nor disturb the customer design. In the next generation we have also fixed that, and allow for readback while LUTRAM/SRL16 are operating.

Now, of course, using readback on LUTRAM/SRL16 and BRAM doesn't help you, as you do not know what it is supposed to be (as it just might be changing).

But you can readback the rest of the config memory. That covers 90% of the static cells (excluding BRAM which has a parity bit that can be used to see when the BRAM has been upset). BRAM is also more than 8 times less likely to be upset than a static config latch (from actual data in Rosetta).
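The BRAM parity idea can be sketched as: compute a parity bit when a byte is written, store it alongside the data, and recompute it on every check. This assumes one even-parity bit per byte held in the BRAM's extra parity bits, with generation and checking done by user logic; the class and method names are hypothetical, and the memory here is only a software model:

```python
from typing import List


def parity_bit(byte: int) -> int:
    """Even parity: 1 if the byte holds an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") & 1


class ParityProtectedRam:
    """Hypothetical model of a byte-wide RAM with one parity bit per byte."""

    def __init__(self, depth: int):
        self.data = [0] * depth
        self.parity = [0] * depth

    def write(self, addr: int, value: int) -> None:
        self.data[addr] = value & 0xFF
        self.parity[addr] = parity_bit(value)  # parity computed at write time

    def check(self, addr: int) -> bool:
        """True if the stored parity still matches the stored data."""
        return parity_bit(self.data[addr]) == self.parity[addr]

    def scan(self) -> List[int]:
        """Addresses whose parity no longer matches (suspected upsets)."""
        return [a for a in range(len(self.data)) if not self.check(a)]
```

A single parity bit detects any odd number of flipped bits in a byte but cannot say which bit flipped, and two flips in the same byte go unnoticed.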

In fact, you can in your design check that all static config bits are unchanged by simply reading all of them back and comparing a checksum with one calculated when first powered on.
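A minimal sketch of the power-on checksum approach: take a CRC-32 over all static frames at power-on, then repeat the readback periodically and compare. It assumes the same hypothetical list-of-frames model, a hypothetical `read_frames` callable standing in for the actual readback path, and a `skip` set of frame indices holding dynamic content (LUTRAM/SRL16/BRAM) that must not be compared:

```python
import zlib
from typing import AbstractSet, Callable, List


def static_checksum(frames: List[bytes], skip: AbstractSet[int] = frozenset()) -> int:
    """CRC-32 over every frame whose index is not listed in `skip`."""
    crc = 0
    for idx, frame in enumerate(frames):
        if idx in skip:
            continue  # dynamic contents: expected to change, so not checked
        crc = zlib.crc32(frame, crc)
    return crc


class ConfigMonitor:
    """Records a reference checksum at power-on and compares against it later."""

    def __init__(self, read_frames: Callable[[], List[bytes]],
                 skip: AbstractSet[int] = frozenset()):
        self.read_frames = read_frames  # hypothetical readback hook
        self.skip = skip
        self.reference = static_checksum(read_frames(), skip)

    def unchanged(self) -> bool:
        return static_checksum(self.read_frames(), self.skip) == self.reference
```

A CRC-32 is guaranteed to catch any single flipped bit, which is the common case for a soft error. It only says *that* something changed, not where.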

Or, you can imagine making an error check AND correcting "crawler" that goes around and fixes any soft errors that might have occurred before they affect the customer design.....
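One way to build such a correcting crawler, assuming no golden copy is kept, is to store a small per-frame signature (the XOR of the 1-based positions of all set bits, plus overall parity) at configuration time. A single flip at position p changes the XOR by exactly p and flips the parity, so p can be recovered and the bit flipped back. This is a Hamming-style single-error-correct, double-error-detect scheme chosen for illustration; it is not a method Xilinx has described here, and the names are hypothetical:

```python
from typing import List, Tuple


def frame_signature(frame: bytes) -> Tuple[int, int]:
    """(XOR of 1-based positions of set bits, overall parity) for one frame."""
    syndrome = 0
    parity = 0
    for byte_idx, byte in enumerate(frame):
        for bit in range(8):
            if (byte >> bit) & 1:
                syndrome ^= byte_idx * 8 + bit + 1
                parity ^= 1
    return syndrome, parity


def crawl(frames: List[bytearray], signatures: List[Tuple[int, int]]) -> List[str]:
    """One pass over all frames: fix single-bit upsets, flag anything worse."""
    log = []
    for f, frame in enumerate(frames):
        syn, par = frame_signature(frame)
        ref_syn, ref_par = signatures[f]
        if (syn, par) == (ref_syn, ref_par):
            continue  # frame is clean
        pos = syn ^ ref_syn  # 1-based position of the flipped bit, if only one flipped
        if par != ref_par and 1 <= pos <= len(frame) * 8:
            bit = pos - 1
            frame[bit // 8] ^= 1 << (bit % 8)  # flip it back
            log.append(f"frame {f}: corrected bit {bit}")
        else:
            log.append(f"frame {f}: uncorrectable upset")
    return log
```

Double flips leave the parity unchanged and are reported as uncorrectable; three or more flips in one frame may be miscorrected, so frames should be small enough that the crawler revisits each one long before multiple upsets accumulate.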

Spirit and Opportunity use an even simpler method: they continually re-program all static memory cells (skipping LUTRAM/SRL16, and BRAM if necessary) while operating. Of course, they also used full triple modular redundancy with voting. That level of reliability is only required for applications that absolutely must NOT ever fail!
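Both techniques are simple to model. Blind scrubbing rewrites every static frame from a golden copy without checking it first, and a TMR voter takes the bitwise two-out-of-three majority of three redundant copies. The sketch assumes the hypothetical list-of-frames model with a `skip` set for dynamic frames; it illustrates the idea only and is not the flight implementation:

```python
from typing import AbstractSet, List


def scrub_pass(config: List[bytearray], golden: List[bytes],
               skip: AbstractSet[int] = frozenset()) -> int:
    """Rewrite every static frame from the golden copy; return frames written."""
    count = 0
    for idx, frame in enumerate(golden):
        if idx in skip:
            continue  # LUTRAM/SRL16 (and possibly BRAM) frames are left alone
        config[idx][:] = frame  # overwrite whether or not it was upset
        count += 1
    return count


def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority, as a TMR voter computes it."""
    return (a & b) | (a & c) | (b & c)
```

Scrubbing limits how long an upset can persist; TMR masks the effect of an upset in one copy in the meantime, so a failure needs two copies to be hit before the next scrub pass.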

If you can integrate these self-test and monitoring features with the rest of the system, you can even check BRAM and LUTRAM by stopping the processing at strategic moments.

Austin

- Ray Andraka

Sat, Jan 24, 2004 12:44 AM

--

--Ray Andraka, P.E. President, the Andraka Consulting Group, Inc.

401/884-7930 Fax 401/884-7950 email snipped-for-privacy@andraka.com


"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

- Peter Alfke

Sat, Jan 24, 2004 2:10 AM

Let me answer this in more general terms: anything that is stable can be reliably included in the readback, i.e. configuration bits, LUTs, flip-flops that happen not to change, etc.

For stuff that toggles, it is even c...

- Ray Andraka

Sat, Jan 24, 2004 4:52 AM

Peter,

My understanding is that there is a readback bug for SRL16/CLB RAM that causes corruption of the user circuit if the column is read while those elements are being clocked in Virtex, VirtexE, and possibly VirtexII. As I recall, it was found right about the time VirtexII was going into tape-out, and it didn't sound like it would be fixed for VirtexII. We had to avoid readback during operation of columns with SRL16s in them on one of my customer's projects that was destined for space. The work-around included clustering the SRL16s into specific columns and using a different approach to discover configuration faults in those columns. My paper "A Low Complexity Method for Detecting Configuration Upset in SRAM Based FPGAs" goes into a little more detail (the paper is available on my website for download at no charge) on the problem and on the technique I used to still be able to use the SRL16s without compromising upset detection. As I recall, there was also an issue reading BRAM columns while they were being clocked, because the readback circuits shared some of the user circuit in Virtex. IIRC, that one was corrected in VirtexII.

The caveat to your statement, then, is that configuration bits can be read back while the circuit is operating, except for SRL16s, CLB RAM and (in Virtex) block RAM, as reading columns containing those elements while they are being clocked can corrupt the state of the user circuit.
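The column-clustering workaround amounts to working out which columns hold clocked SRL16/CLB RAM (or, on Virtex, BRAM) and leaving every frame in those columns out of the live readback. A minimal sketch, assuming a hypothetical placement map from cell name to column and a hypothetical frame-to-column table; neither reflects real Xilinx tool output:

```python
from typing import Dict, List, Set


def hazardous_columns(placement: Dict[str, int], dynamic_cells: Set[str]) -> Set[int]:
    """Columns containing any cell that is clocked SRL16/CLB RAM (or BRAM)."""
    return {placement[cell] for cell in dynamic_cells}


def readback_plan(frame_column: List[int], unsafe: Set[int]) -> List[int]:
    """Indices of frames that can be read back while the design is running."""
    return [f for f, col in enumerate(frame_column) if col not in unsafe]
```

Clustering all SRL16s into as few columns as possible keeps the excluded set small, so most of the configuration stays covered by live readback and only the clustered columns need a separate check.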


- jim granville

Sat, Jan 24, 2004 11:48 AM

Sounds like that one, if true, needs a fix.... ( and an alert -> someone might think they are increasing reliability by reading back all nodes, and ignoring the 'might change' ones ) -

So this fixed 'next generation' is not Spartan 3, but whatever is coming after that?

If that BRAM was storing constants/lookup info (read only), then I can see a need to verify the table is actually still correct?

-jg

- rk

Sat, Jan 24, 2004 2:23 PM

[ snip ]

Here's the presentation, on-line (no paper available that I know of):

"NSEU Sensitivity of SRAM-based FPGAs" Joe Fabula, Austin Lesea, Carl Carmichael, and Saar Drimer Xilinx Corp.


--
rk, Just an OldEngineer
"For a successful technology, reality must take precedence over public 
relations, for nature cannot be fooled."
-- R. Feynman, Appendix F.

- ram

Sun, Jan 25, 2004 5:21 AM

That's an interesting graphic to cool out the topic. Ram

- Jason

Mon, Jan 26, 2004 4:51 AM

Pablo - they undergo the standard tests... TID per spec 1019.5, SEL and SEU (Heavy Ion and Proton - Static and Dynamic). All of the specifications (from krads to cross sections to energies, etc.) are listed in the datasheet. For more detailed info, test results are published by the Xilinx Single Event Effects Consortium, which includes major aerospace companies, at the typical industry conferences - NSREC, RADECS and MAPLD mainly.

Thanks,
Jason
Xilinx Mil/Aero Group

- Thomas Stanka

Mon, Jan 26, 2004 1:38 PM

What about 100 krad? I'm just curious. In our company we have problems using devices that are not qualified for at least 30 krad.

This seems a bit overconfident. Actually, I didn't know about your effort, but I do know the effort Actel puts into its devices. They provide quite sufficient analyses, in addition to the analyses space companies do on Actel FPGAs themselves.

I'm sure you wouldn't name names, but have you _ever_ tried the hotline of another FPGA vendor? Tell us a bit about your experience. I'm very satisfied with the way Actel reacts to complaints.

bye Thomas

- Austin Lesea

Mon, Jan 26, 2004 4:35 PM

Jim,

See below,

Austin

Yes.

If BRAM or LUTRAM is storing constants, then you may include it in the readback verify.
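In other words, the set of frames excluded from readback verification should contain only memories that the design writes at run time. A minimal sketch of building that exclusion set, assuming each memory's frame indices are known; `MemoryRegion` and its fields are hypothetical names used for illustration:

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class MemoryRegion:
    name: str
    frames: List[int]  # hypothetical frame indices holding this memory
    read_only: bool    # True if the design never writes it (constants, lookup tables)


def frames_to_skip(regions: List[MemoryRegion]) -> Set[int]:
    """Only memories written at run time are excluded from the readback verify."""
    skip: Set[int] = set()
    for region in regions:
        if not region.read_only:
            skip.update(region.frames)
    return skip
```

Read-only tables then fall under the same checksum or compare as the rest of the static configuration, which answers the concern about verifying that a lookup table is still correct.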