Intel plans to tackle cosmic ray threat

Dear All, Austin in particular,
I saw this and thought of you!
Cheers, Syms.
http://news.bbc.co.uk/1/hi/technology/7335322.stm



Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
Symon,

Well, Cypress, Xilinx, IBM, and many others have made it no secret that
neutrons at sea level are causing upsets, and we have done something
about it (and presented the papers, and shown our results).

Intel has also been working very quietly on this, with much less press.

I suggest that if you are not thinking about single event effects, you
should be, and you should demand that your vendor show you proof of their
design efforts in this regard.

Virtex 5 is (as of today) 144 FIT/Mb for the config bits, with a 95%
confidence interval from 100 to 200 FIT/Mb.  This is from our 400
devices located on mountain tops in France (31.029 Giga-bit-years of
test time, 35 events).
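
A rough way to sanity-check a number like that (a sketch only, not the
actual methodology; in particular it ignores the derating from the
accelerated mountain-top flux back to the New York City sea-level
reference, so it lands near, but not exactly on, the quoted figures):

# Convert observed upsets and exposure into FIT/Mb with an exact
# (chi-squared) Poisson 95% confidence interval.
from scipy.stats import chi2

events = 35                  # observed upsets
exposure_gb_years = 31.029   # Giga-bit-years of test time
hours_per_year = 8766

# 1 Gb-year = 1e3 Mb-years = 1e3 * hours_per_year Mb-hours
exposure_mb_hours = exposure_gb_years * 1e3 * hours_per_year

# FIT/Mb = failures per 1e9 device-hours, per Mb
fit_per_mb = events / exposure_mb_hours * 1e9

lo = chi2.ppf(0.025, 2 * events) / 2
hi = chi2.ppf(0.975, 2 * (events + 1)) / 2
print(f"mean  : {fit_per_mb:.0f} FIT/Mb")
print(f"95% CI: {lo / exposure_mb_hours * 1e9:.0f}"
      f" to {hi / exposure_mb_hours * 1e9:.0f} FIT/Mb")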

Compare this to a 65nm ASSP or ASIC, which is at least 1000 FIT/Mb or
1000 FIT/million gates(!).  Do nothing, and it gets worse.  Do
something, and it gets back to where it should be.  These numbers are from
the SELSE II conference a few years back; the industry numbers are
really a lot worse, but no one will admit it.

There is a reason why Xilinx FPGA devices are finding their way into
many high availability and high reliability applications: we are the
only choice -- there is no competition whatsoever.

Austin

Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
Hi Austin,
I wondered what your thoughts were on their patent where "The cosmic ray
detector [built into the device] is therefore designed to spot when rays
have caused interference and then tell the chip to repeat the command." ? I
guess in an FPGA it could trigger a readback to ensure the device was still
correctly configured and/or issue a user logic reset.
Cheers, Syms.



Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
Symon,

First of all, there is no such thing as a single particle detector.

Secondly, detecting the current spike (from a strike) requires a very
complex circuit, itself subject to spikes (I know, we designed them for
the USAF...)

Thirdly, Intel has done far more than this, and deserves better PR.

Perhaps they should fire the PR firm?

Austin


Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
And,

Yes, in S3A, S3AN, S3D, V4, V5 we are able to either reconfigure on
detection of an upset, notify the user (and they decide what to do), or
in V4 and V5, correct the flipped bit without having to reconfigure (or
even go to the config flash/prom).
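
A conceptual software model of the detect-and-repair idea (only an
illustration; the real mechanism lives in the configuration logic and its
ECC, not in a host-side loop, and the frame size below is a placeholder):

# Model the configuration RAM as frames, keep a golden copy, and scrub:
# compare each frame and rewrite any frame that has picked up a flipped bit.
import random

FRAMES, BITS_PER_FRAME = 1024, 1312      # arbitrary sizes for the model

golden = [random.getrandbits(BITS_PER_FRAME) for _ in range(FRAMES)]
config = list(golden)                    # the "live" configuration memory

def inject_upset():
    """Flip one random configuration bit, as a neutron strike would."""
    f = random.randrange(FRAMES)
    config[f] ^= 1 << random.randrange(BITS_PER_FRAME)

def scrub():
    """Read back every frame; repair any frame that disagrees with golden."""
    repaired = 0
    for f in range(FRAMES):
        if config[f] != golden[f]:
            config[f] = golden[f]        # correct the flipped bit in place
            repaired += 1
    return repaired

inject_upset()
print("frames repaired:", scrub())       # -> 1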

Basically, our road show details how the user needs to decide what to do,
and at what level, in order to meet their availability and reliability
numbers.

Mitigation is part hardware, part system architecture, and part
software.  Depending on what you are doing, and how long you can
tolerate being "off-line" there are different solutions.

They are:
-just reconfigure, start fresh
-just fix the bit flip, continue on (as a flip does nothing 90% of the
time, and seldom causes anything to 'crash')
-fix the bit flip and reset or go back to a check point/known states
-use dual redundancy, and check for agreement (if a fault is not
tolerated -- like in banking or accounting); repeat if there is no agreement
-use full triple modular redundancy (when it must be correct, and 100%
available), also scrub to fix bits that may flip so flips are not
allowed to accumulate

All methods are used by our customers, and they all work.  We have
reference designs and support for these models.  And they can be tested
by reconfiguring to flip bits while operating. One heck of a lot cheaper
than using a proton beam, or neutron beam .... and more complete (we
have folks who flip each bit, one by one, and prove their system meets
its requirements).
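
To make the bit-by-bit idea concrete, here is a toy fault-injection loop
(the "design" and its four-bit "bitstream" are stand-ins; a real campaign
flips configuration bits through partial reconfiguration while the
hardware runs, but the bookkeeping is the same -- flip a bit, run the
checks, restore, move on):

# Exhaustive single-bit fault injection against a toy "design".
def design_output(cfg_bits, a, b, c):
    # Stand-in design: a majority gate; bit 2 of the "bitstream" enables
    # one input path, modelling a configuration bit that actually matters.
    if not cfg_bits[2]:
        a = 0
    return (a & b) | (a & c) | (b & c)

GOLDEN_CFG = [1, 1, 1, 1]                  # placeholder 4-bit "bitstream"
VECTORS = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
expected = [design_output(GOLDEN_CFG, *v) for v in VECTORS]

critical = []
for i in range(len(GOLDEN_CFG)):
    cfg = list(GOLDEN_CFG)
    cfg[i] ^= 1                             # inject a single-bit upset
    if [design_output(cfg, *v) for v in VECTORS] != expected:
        critical.append(i)

print("bits whose upset breaks the design:", critical)   # -> [2]

Note how only one of the four bits matters here, which is the same effect
as the "a flip does nothing 90% of the time" observation above.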

Austin

Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
Austin posted:
|------------------------------------------------------------------------|
|"[..]                                                                   |
|                                                                        |
|[..] they can be tested                                                 |
|by reconfiguring to flip bits while operating. One heck of a lot cheaper|
|than using a proton beam, or neutron beam .... and more complete (we    |
|have folks who flip each bit, one by one, and prove their system meets  |
|its requirements)."                                                     |
|------------------------------------------------------------------------|

Logical testing will not match checking whether real radiation respects
your model of the system. One transient can defeat the outcome of clocked
triply modularly redundant voters.

Sincerely,
Colin Paul Gloster,
unemployed and cold

Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
Colin,

It is a question of completeness.

Logically going through every bit is 100% functionally complete.

Sitting in a proton beam is "waiting for Godot" -- how long must you
wait to check enough bits to achieve the required coverage?

It becomes a matter of "too many dollars to keep the lights on."
(Beam testing is horribly power hungry, and very expensive, e.g. TSL is
$250K for a session, not including the airplane tickets, hotel rooms,
people, rental cars...).

Additional system testing in a beam is highly desired, but the goals are
not for functional completeness, but to cover whatever might have been
missed by flipping 100%, one by one, every configuration bit.

XTMR Tool(tm) software can not be broken by a single radiative event,
nor by a single bit flip (as verified by NASA, JPL, CERN, etc....).

Our flow triplicates the voters, so that every feedback path gets a full
TMR.  A failure in a voter is "voted" out by the other two voters.
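
A toy model of that arrangement (a sketch of the idea only, not the
actual XTMR netlist transform): three copies of the logic, three voters,
each copy's feedback taken from its own voter, so a hit on any one voter
is flushed out on the next cycle.

# TMR with triplicated voters, modelled in Python: an upset in any single
# element (one logic copy or one voter) is outvoted on the next clock.
def maj(a, b, c):
    return (a & b) | (a & c) | (b & c)

def cycle(state, upset=None):
    """One clock: each copy computes its next state (here, a toggling bit),
    then each of the three voters votes; 'upset' flips one element."""
    logic = [s ^ 1 for s in state]
    if upset in ("logic0", "logic1", "logic2"):
        logic[int(upset[-1])] ^= 1
    voters = [maj(*logic)] * 3
    if upset in ("voter0", "voter1", "voter2"):
        voters[int(upset[-1])] ^= 1
    return voters                        # fed back as the next state

state = [0, 0, 0]
state = cycle(state, upset="voter1")     # strike one voter
state = cycle(state)                     # next clean cycle
print(state)                             # -> [0, 0, 0]: resynchronised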

That is why we have so many designers using this flow:

  it   just   works.

Austin


Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
(I had already emailed this to Austin in response to an email which
he sent me, but I have just noticed that he posted the email to
Usenet as well, so for the benefit of those who did not see my
private response, I post it now.)

Austin,

I trust that you are sincere, but you would not be
the first person to work in aerospace who is mistaken and who
is utterly convinced that he is not mistaken and whose confidence
is understandably bolstered by many positive, genuine experiences
of overcoming would-be faults from radiation.

Generalization is a problem.

Radiation is just a detail. Even without radiation, you can not
prove absolute safety. Can you prove at the 100% confidence
level that your finite upper bound on metastability is valid?
99.999999999% confidence from empirical measurements is
inadequate if you want to claim that a problem is impossible.
Can you disprove the claim of quantum mechanics that any
component has an infinitesimal (i.e. > 0%, and therefore necessary
to cover in any claim that something is perfect) probability
of being spontaneously teleported to some galaxy we never heard of?
Was hysteresis of unknown parameters overlooked in the curve fitting
which was used in SPICE? Is the physics of deep submicron processes
understood well enough?
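
For concreteness, the usual synchronizer model gives
MTBF = exp(t_r / tau) / (T0 * f_clk * f_data), which grows exponentially
with the allowed settling time t_r but never becomes infinite. The
constants below are made-up placeholders, not any vendor's
characterization data:

# Metastability MTBF under the standard exponential-settling model.
import math

tau    = 50e-12    # resolution time constant [s] (placeholder)
T0     = 1e-9      # metastability window constant [s] (placeholder)
f_clk  = 200e6     # sampling clock [Hz]
f_data = 10e6      # asynchronous data toggle rate [Hz]

for t_r in (1e-9, 2e-9, 4e-9):
    mtbf = math.exp(t_r / tau) / (T0 * f_clk * f_data)
    print(f"t_r = {t_r * 1e9:.0f} ns  ->  MTBF = {mtbf:.3e} s")

The numbers can be driven astronomically high, but never to certainty.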

As others have done, I congratulate you and Xilinx for many
fine posts to newsgroups. I thank you for responding (though I
would have been content with you responding on comp.arch.fpga,
and if you used email to avoid publicly embarrassing me, thank
you), but I am still displeased that Xilinx did not answer
a challenge I made in a similar discussion, in a thread to which
Anne L. Atkins and Dr. John Williams also posted in 2007 (or
maybe 2006). (I compose these posts and emails at home, and as
trying to find a way to pay for food is a major objective while
I have limited networked time, it is not worth my while to give
you an exact reference; you can easily search for it yourself.)

Austin emailed:
|------------------------------------------------------------------|
|"It is a question of completeness.                                |
|                                                                  |
|Logically going through every bit, is 100% functionally complete."|
|------------------------------------------------------------------|

Logic is theoretical whereas the devices are actually subjected
to physics. A VHDL simulator can not replace SPICE for
electromagnetic compatibility issues, SPICE can not replace
empirical experience, and extrapolating empirical experience
to untried conditions can work, but it can also fail.

Similar points had been admitted in the book Thomas Kropf (editor),
"Formal Hardware Verification: Methods and Systems in Comparison",
Springer, 1997; in the final sentence of Section 5.3 of the book
He Jifeng, C. A. R. Hoare, Jonathan Bowen, "Provably Correct Systems:
Modelling of Communication Languages and Design of Optimized
Compilers", 1994; in Section 12.1, "What Are Formal Methods?", of the
book Jim Woodcock and Martin Loomes, "Software Engineering
Mathematics: Formal Methods Demystified", 1988; on Page 181 (though
oddly enough, almost the opposite was argued on Page 180) of the book
Fenton and Hill, "Systems Construction and Analysis: A Mathematical
and Logical Framework", 1993; and Dr. Fleuriot (who had been involved
in collision detection issues for aeronautics) of the University
of Edinburgh said to me in a personal conversation on January 24th,
2008 "[..] there's no such thing as one hundred per cent guarantees
[..]".

In an even more impressive triumph of missing the point than
Fenton's and Hill's Pages 180 and 181, Zerksis D. Umrigar,
Vijay Pitchumani, "Formal Verification of a Real-Time
Hardware Design", Design Automation Conference 1983 contains:
"[..] If there are no errors, inconsistencies or ambiguities
in the specifications, and no errors in the correctness proof,
then a successful proof enables one to be totally confident
that the design will function as desired. [..]"

|---------------------------------------------------------------------|
|"Sitting in a proton beam is "waiting for Godot" -- how long must you|
|wait to check enough bits to achieve the required coverage?"         |
|---------------------------------------------------------------------|

True. (Though actually there are somewhat usable techniques for
aiming at desired locations in a device.)

An even more important problem with a radiation source than what
you have raised is whether it is even similar enough to what will
bombard the device in the field. This is similar to I.Q. tests:
their goal is to measure intelligence but they can not do so;
instead they measure one's ability to do well in those tests,
and though intelligent people tend to do well in those tests,
someone who has been practising them will get improved marks
without actually becoming more intelligent.

A paper in which it is shown that one radiation source can not
be relied upon to be a perfect proxy for another is
Jamie S. Laird, Toshio Hirao, Shinobu Onoda, Hisayoshi Itoh,
and Allan Johnston, "Comparison of Above Bandgap Laser and
MeV Ion Induced Single Event Transients in High-Speed Si
Photonic Devices", "IEEE Transactions on Nuclear Science",
December 2006. A minor discrepancy would probably not be
important, but in one device it could make all the difference.
Do not make unjustified generalizations.

Even if the relevance of the radiation is not in doubt, it
can be very difficult to make measurements, as mentioned in
Thomas L. Turflinger, "Single-Event Effects in Analog and
Mixed-Signal Integrated Circuits", "IEEE Transactions on
Nuclear Science", April 1996.

|---------------------------------------------------------------------|
|"It becomes a matter of "too many dollars to keep the lights on."    |
|(Beam testing is horribly power hungry, and very expensive, eg TSL is|
|$250K for a session, not including the airplane tickets, hotel rooms,|
|people, rental cars...)."                                            |
|---------------------------------------------------------------------|

Omnisys cut costs by using a source in a hospital. As mentioned above,
that might not always be good enough, but in that case it was.

Anyhow, in a field in which spending $2000-$10000 for four megabytes
of radhard memory is not a problem, testing with radiation is not
merely a useless luxury.

|-------------------------------------------------------------------------|
|"Additional system testing in a beam is highly desired, but the goals are|
|not for functional completeness, but to cover whatever might have been   |
|missed by flipping 100%, one by one, every configuration bit.            |
|                                                                         |
|XTMR Tool(tm) software can not be broken by a single radiative event,    |
|nor by a single bit flip (as verified by NASA, JPL, CERN, etc....)."     |
|-------------------------------------------------------------------------|

Would that be the same NASA which failed to pay attention to established
schedulability analysis techniques for a rover for Mars and which lost
a probe in 1973 intended for Venus as a result of being satisfied with
a decimal point instead of a comma?

Prof. William H. Sanders boasted on April 27th, 2006 at 12:04 that his
group convinced NASA JPL that his group solved NASA JPL's supposedly
insoluble fault-tolerant spaceborne computer problem posed in 1992. He
showed his supposed solution and as it was not perfect and it did not
seem that he was going to admit this without being forced to, I
challenged him, so he admitted at 12:28 that it was not perfectly
solved because of "[..] the classic problem in fault-tolerant computing
of who checks the checker?"

Scott Hensley of NASA said on June 4th, 2007 that his Europa
TopoMapper proposal has still not been approved after fifteen years,
partially due to the much worse Jovian radiation. If NASA is convinced
that the techniques you use are sufficient, then why is this
proposal still not approved? (I recently noticed that the European
Space Agency is planning a mission to Jupiter. I do not know whether
this is similar to the French space agency's example of ignoring
common lore and sending doomed hardware into space, or whether the
European Space Agency has actually overcome a serious obstacle.)

Would that be the same European center for nuclear research
which is partially responsible for the paper Agostinelli, et al.,
"GEANT4---a simulation toolkit", "Nuclear Instruments and Methods
in Physics Research A", 506 (2003) in which it is claimed on Page
252: "[..] It has been created exploiting [..] object-oriented
technology [..]" despite being distributed with functions
containing copied and pasted statements instead of common
statements isolated in a shared function?

That is the same European center for nuclear research whose papers
did not predict that physical effects would be observed at
particular times of day and not at others due to systematic effects
of a locomotive influencing particles' trajectories before they
realized that they should look at a railway timetable in order to
determine when a train would not be around to disrupt an experiment.
I doubt that XTMR Tool(TM) was as much help in that case as you
might have thought.

|-------------------------------------------------------------------------|
|"Our flow triplicates the voters, so that every feedback path gets a full|
|TMR. A failure in a voter is "voted" out by the other two voters."       |
|-------------------------------------------------------------------------|

TMR can help a lot. It does. It does not work for everything. Your
marketing is similar to many inadequate MAPLD papers.

If the probability of an upset for any gate is equal to the probability
of an upset for any other gate, then
winning_result <= majority_of(voter1, voter2, voter3);
can work if just one single-event upset hits any of voter1, voter2,
or voter3. It does not work if it hits winning_result.

Even so, TMR can be less risky than not copying if for example
the data in the voters are hours' worth of data which could have
been corrupted. It is true that in the relatively short amount of
time used to transfer the hours' worth of data to winning_result
that winning_result could get zapped, but it is not very likely
so it might not be as dangerous as never bothering to vote on
the accumulated readings. Of course, this is not safer than
performing the vote immediately instead of waiting hours, but in
that case winning_result and all but one of the voters can be
unnecessary.
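
To put rough numbers on that trade-off (p below is an arbitrary
illustrative per-element upset probability, not a measured one): a
triplicated section needs two of its three copies hit before the vote can
fail, while a lone final element such as winning_result fails on a single
hit, so for small p the unprotected element dominates.

# Failure probability over one exposure interval, per-element hit prob p.
p = 1e-6                                  # illustrative value only

p_tmr_section  = 3 * p**2 * (1 - p) + p**3   # >= 2 of 3 copies hit
p_single_voter = p                           # one hit kills winning_result

print(f"TMR section : {p_tmr_section:.3e}")   # ~3e-12
print(f"lone voter  : {p_single_voter:.3e}")  # 1e-06, dominates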

Please understand that in one of the less bad MAPLD papers on
Klabs.org, the reason for placing voter1 into one FPGA susceptible to
errors and voter2 into a different FPGA of the same quality
and voter3 into yet another identical FPGA and winning_result
into a less susceptible FPGA (probably an antifuse one) is
that John von Neumann's (Janos Louis Neumann's) T.M.R. works
better if the checker is not error-prone (but of course, this
provides an incentive to not bother with T.M.R. at all by
using less susceptible technology throughout).

It has been empirically shown in papers such as J. Benedetto,
P. Eaton, K. Avery, D. Mavis, M. Gadlage, T. Turflinger,
Paul E. Dodd, and G. Vizkelethyd, "Heavy Ion-Induced Digital
Single-Event Transients in Deep Submicron Processes", "IEEE
Transactions on Nuclear Science", December 2004 and
Matthew J. Gadlage, Paul H. Eaton, Joseph M. Benedetto, and
Thomas L. Turflinger, "Comparison of Heavy Ion and Proton
Induced Combinatorial and Sequential Logic Error Rates in a
Deep Submicron Process", "IEEE Transactions on Nuclear
Science", December 2005 that it is better to not ignore
reality just because strategies used to work for particular
technologies.

Please explain what happens if all your voters are fed the
same clock pulse and if a single-event latchup is caused
by a single-event transient hitting this clock at an
unfortunate moment?

|-------------------------------------------------------|
|"That is why we have so many designers using this flow:|
|                                                       |
|  it just works."                                      |
|-------------------------------------------------------|

Wrong.

Yours sincerely,
Colin Paul Gloster

P.S. Though I cited good I.E.E.E. papers, the I.E.E.E. has
also published inadequate items related to this topic.

Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)



Boy, I saw that text, too, and really wondered about how reliable such a
procedure would be.  If the state of flip-flops or dynamic memories is
altered, repeating the previous instruction would be worthless.  There is
SO much more area in high-end CPUs devoted to memory and much less to
logic functions that I would expect memory corruption to be the most
probable fault.

Jon


Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
Symon,

Well, that employee should be fired:  that is the stupidest thing I have
ever read.

It isn't even science -- detecting neutrons! Pure BS!  A neutron is an
uncharged particle that goes through 10 meters of concrete before it
gets stopped.  Detecting one is just......stupid.....idiotic.....

(breathe in, breathe out.)

Their PR folks are probably going nuts on this one!

Was that April 1 dateline?

Anyway, Intel is pretty savvy, and they are not standing still.  If you
use their parts, you need to request their Soft Error Effects roadshow.

It is only given under NDA, so although I know it exists, and I suspect
I know what is in it, I have never seen it.

I have seen IBM's "show" and they certainly have their act together.  As
do we.  IBM's "show" is under NDA, however, so I can't say anything
about its contents.

Our roadshow is available by request from your local friendly FAE, and
no NDA is required (why would we hide that we are the best?).

Remember:  per the JEDEC89A standard, there are three ways to
characterize soft error effects.  Be sure to ask which ones were used,
and their degree of confidence.

If they won't share this with you (under NDA), then they are hiding
something, something very very bad.

Austin

Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)

Austin,
Are you talking about the link I posted? I didn't see any reference to
neutrons, am I missing something? Also, if what you say is true, that
neutrons whizz through 10 meters of concrete, aren't you gonna be incredibly
unlucky to get a direct neutron hit on a 45nm transistor? (BTW., A cursory
web search would suggest some kind of boron based detector, which kinda
makes sense as boron is used to absorb thermal neutrons in nuclear reactors.
http://en.wikipedia.org/wiki/Neutron_detection )
My rudimentary knowledge of cosmic rays is that they are not neutrons but
mainly protons (and a few alpha and beta particles). I would expect them to
be more detectable.
Whatever, I'm confused now...
Cheers, Syms.



Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
At sea level,

93% of particles from the cosmic ray shower are neutrons, and 7% are
protons (see JEDEC89A).

There are 12.9 per square cm, every hour, passing through everything
(for New York City, up to 25X more on mountain tops, 300X at 40K feet,
less at the equator, 10X at the poles...).
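
Scaling a sea-level upset rate to those environments is then just
multiplication by the relative flux. A quick sketch using the rough
multipliers above and the Virtex-5 figure from earlier in the thread (the
50 Mb configuration size is an arbitrary example, not any particular
part):

# Scale a New York City sea-level FIT rate by relative neutron flux.
sea_level_fit_per_mb = 144        # from the Virtex-5 data above
config_mb = 50                    # example device size (placeholder)
hours_per_year = 8766

relative_flux = {
    "NYC sea level": 1,
    "mountain top": 25,
    "40,000 feet": 300,
}

for place, mult in relative_flux.items():
    fit = sea_level_fit_per_mb * config_mb * mult   # failures per 1e9 hours
    upsets_per_year = fit * hours_per_year / 1e9
    print(f"{place:14s}: {upsets_per_year:7.2f} upsets per device-year")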

There are also electrons, muons, pions, and a host of more exotic stuff,
but those either don't matter (do not affect anything), or they are
absorbed quickly, or decay (even a lone neutron decays in 11 minutes!).

So, like I said, that is the dumbest PR I have read.  It gets the first
prize for ignorance about soft error effects.

Some Real Science:

http://www.xilinx.com/support/documentation/white_papers/wp286.pdf

Austin

Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
Aha, thanks! Now I think I get most of it. It would seem that the cosmic
rays, which are charged particles, hurtle into the earth from all
directions. They are made of protons mainly, with some alpha and beta
particles. The earth's magnetic field means that there are more at the poles
than at the equator. The cosmic rays are charged and so interact with the
atmosphere a lot, and so very few reach the earth's surface. However, these
energetic collisions in the atmosphere produce showers of neutrons. These
uncharged particles don't interact with the atmosphere nearly as much as the
cosmic rays, so can reach the surface more easily.

Ok, here's another question. As the uncharged neutrons don't interact with
much, indeed you say they can go through 10 metres of concrete, I can't see
why the highly interactive remaining protons aren't the real danger, even
though they only comprise 7% of the total, not the 93% neutrons? Maybe none
of the original protons reach the surface, but the 7% protons are produced
by secondary neutron collisions?

Sorry to bombard you with questions!

Regards, Syms.



Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)



The protons interact VERY strongly, due to the charge.  As most
electronics is housed in something, the housing usually stops the
protons, although there will be Gamma radiation when they hit, and that
can penetrate the housing.  If you put a bare photodiode outside on a
dark night and reverse-biased it, you could pick up these interactions
easily with an oscilloscope.  With a little digging into the physics,
you could discriminate alpha hits from protons, etc.  Of course, cosmic
ray showers deliver so much "stuff" that you'd just see big pulses
without being able to pick out the fundamental particles.

Oh, one other aspect is "stopping distance".  Very energetic charged
particles zing through stuff with minimal energy deposited into the
material, until enough energy has been shed, then they interact and stop
suddenly.  So, the very high energy primary particles are not much
trouble; it is when they either lose energy by travelling through
something or create secondary particles that the energy is low enough to
create ions.

So, the protons are not likely to ever make it into the silicon
directly.  Secondary Alphas and lots of Gammas will be bouncing around,
and those could deliver energy to the chip.

Jon


Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)
Symon,

The cosmic rays are ions: iron, gold, xenon, carbon, basically anything
and everything.  Yes, there are lots of protons, but they do not have
enough energy to cause problems.  More light ions (like carbon), fewer
heavy ions (like gold).

But, iron, with too few electrons, traveling at 90% the speed of light,
now there is a particle!

When one of these "heavy ions" strikes the upper atmosphere, say a
nitrogen molecule, all hell breaks loose and you get all sorts of
products (Even CERN has nothing on a cosmic ray--high energy physics
used mountain top sites before the cyclotron!).  Since neutrons have no
charge and go right through most things (as most mass is empty space),
the neutrons predominate at the earth's surface.

Beam neutrons at a block of iron, or aluminum or copper, and you will
get radioactive iron, aluminum, or copper (excess neutrons will
eventually be released if they have created an unstable isotope).  This
is why lead on the surface is more radioactive than lead at the bottom
of the sea.

The ions are deflected by the earth's magnetic field, but once an ion
strikes, the resulting neutrons are unaffected by the field.

The direction is predominantly "up" (towards the sky): the flux falls
off away from the vertical because neutrons arriving at oblique angles
are absorbed by more atmosphere.

No neutrons come from "down" unless you are standing on lead, uranium,
or in the basement in Minnesota (Radon).

The neutron hits the silicon lattice.

The silicon "spallates" (splits the atom) and releases an alpha particle
(a helium atom, minus the electrons: two protons, two neutrons).

The alpha particle has charge, and it upsets the source drain region
(due to deposited charge, actually leaves a trail of 'holes' and
electrons which quickly recombine, in less than 30 ps).

The neutron may also just "ping" the silicon lattice, and cause the
silicon dioxide molecule to be dislocated from the lattice, or just
vibrate.  In either case, charge is also released.

A good history lesson (and some physics):

http://www.research.ibm.com/journal/rd40-1.html

specifically:

http://www.research.ibm.com/journal/rd/401/tang.html

If you can stomach the physics....

Austin

Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)


That would seem to suggest the semiconductor dies should always be
oriented in the vertical plane - substantial reduction in cross-sectional
exposure, plus anything that does hit it might affect a longer "scratch"
of circuitry rather than just a point?   Assuming, I guess, that the
interaction isn't entirely confined to discrete points along the flight
path by quantum effects.

Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)

Hmm, because it's a flux, isn't the only thing that matters the volume of
the die? Edge-on to the main direction means there's less exposed area, but
much more depth to travel through for the particles that do hit. Is it true
that the particles per unit volume remain the same no matter what the
orientation is? Interesting...
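
(A quick check of that intuition for a roughly vertical flux and a thin
die: the projected area shrinks edge-on, the path length grows by the same
factor, and their product is just the die volume, so the expected
interaction count is orientation-independent. The dimensions and mean free
path below are arbitrary placeholders.)

# Expected interactions ~ flux * projected_area * (path / mean_free_path)
#                       = flux * volume / mean_free_path  (orientation-free)
flux = 13.0                     # particles / cm^2 / hour (sea-level ballpark)
mfp  = 100.0                    # mean free path in silicon, cm (placeholder)
w, h, t = 1.5, 1.5, 0.07        # die width, height, thickness in cm

def interactions_per_hour(projected_area_cm2, path_cm):
    return flux * projected_area_cm2 * (path_cm / mfp)

face_on = interactions_per_hour(w * h, t)   # flux hits the big face
edge_on = interactions_per_hour(h * t, w)   # flux hits the thin edge

print(f"face-on: {face_on:.6f} per hour")
print(f"edge-on: {edge_on:.6f} per hour")   # identical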

Perhaps upside down is the best orientation so the lead in the solder stops
some stuff. :-) Oh yeah, damn you RoHS!

Cheers, Syms.



Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)


    I forget where I saw the information; it may have been in a briefing
from some of our rad-hard experts in Manassas (BAE Systems, proud producer
of the RAD750 PowerPC and RAD6000 processors). There are some
direction-dependent effects that are just being recognized and dealt with as
circuit dimensions shrink. I think the V4 is seeing some of these issues
crop up...the "single event upsets" are no longer confined to a single
circuit element especially if the rays come in from an oblique angle. The
stream of charges/holes created by a particle floods multiple cells IIRC.
Not very many rad-tolerant designs deal with this concept yet (correlated
upsets in adjacent bits of the logic), though with smart floorplanning a
design can probably mitigate this. Austin's comment about deliberately
flipping one bit at a time and verifying performance does go out the window
when you throw this curveball into the mix. Since the RAD750 is currently
fabricated at 150nm (soon 120nm) this effect isn't important (yet), but when
you look at 65nm circuits (1/4 the surface area per logic element) this
effect is becoming noticeable. Of course the performance you get from these
denser circuits is why we keep plugging away at making it work. Both Xilinx
and BAE Systems can share credit for the Mars Rover's endurance (a RAD6000
is the main computer, and I think Austin described several smaller Xilinx
parts in critical subsystems).
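
(A sketch of how the single-bit injection campaign could be extended to
the correlated case, and of why floorplanning helps: the toy check below
asks whether any two physically adjacent cells hold two different copies
of the same triplicated signal, which is exactly the pattern a single
oblique strike could corrupt. The grid and placements are made up purely
for illustration.)

# Does any adjacent cell pair hold two different TMR copies of a signal?
from itertools import combinations

# placement[(x, y)] = which TMR copy (0/1/2) of the signal lives there
tight  = {(0, 0): 0, (0, 1): 1, (0, 2): 2}     # copies packed together
spread = {(0, 0): 0, (5, 5): 1, (10, 0): 2}    # copies kept far apart

def vulnerable_to_adjacent_double_hit(placement):
    for (x, y), (u, v) in combinations(placement, 2):
        adjacent = abs(x - u) + abs(y - v) == 1     # Manhattan neighbours
        if adjacent and placement[(x, y)] != placement[(u, v)]:
            return True       # one oblique strike could flip two copies
    return False

print("tight layout vulnerable :", vulnerable_to_adjacent_double_hit(tight))
print("spread layout vulnerable:", vulnerable_to_adjacent_double_hit(spread))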

Comments, Austin? I'm looking at this second-hand so I'll defer to your
obvious focus on this area.

Dr. Marty Ryba



Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)

Expecting quality in a PR document seems to be the triumph of hope
over experience?

These things start in the depths of a company, we assume largely
accurate. Then that company's media liaison/managers work on it.

Then the PR firm 'works' on it and finally the publishing media's
editors have a go.

Like Chinese whispers, any semblance to the original is pure
coincidence!  ;)

-jg


Re: Intel plans to tackle cosmic ray threat (actually they have been working on it for at least five years...austin)


Right, having worked with a neutron detector array, detecting them is
REALLY hard, and not something easily done on a chip.  However, most
neutrons pass through chips easily with no interaction, and so can be
ignored.  What you have to detect is if the neutron was CAPTURED, and
deposited energy in (or very near) the active circuitry.  That will
release some energy (could be charged particles, could be Gamma rays)
that could affect the active circuitry.  The gammas could be detected
from a distance, but they can be quite directional and local, so
detecting them could be tough, too.

Really!  Just detecting a neutron or Alpha hit could be difficult,
although detecting a cosmic ray shower is a lot easier, as the shower of
charged particles greatly increases your probability of detection on a
small detector device (probably just a diode).  But, then, the REAL
problem is how to CORRECT any malfunction that may have occurred.
Reducing the probability of corruption, as Austin describes Xilinx has
done, seems the most reliable and provable scheme.  Proving you can
correct corruption from a hit anywhere on a chip, while running ANY
program, at any time, seems like fantasy.

Jon

