Re: The Manifest Destiny of Computer Architectures

> At every computer conference I attend, I see numerous papers that show
> how to incrementally increase the capabilities of present products,
> plus a paper or two about some aspect of distant future processors.
> There is a sort of consistency among these papers that, taken
> together, creates an image of the manifest destiny of processors that
> are VERY different from present-day processors and networks. I am
> interested in that image, and I suspect that others here may also be
> interested.

I am reading in comp.arch.fpga, but comp.arch readers may have different ideas.

> Here is the sort of image that I see emerging. Perhaps you have your
> own very different vision?
>
> 1. Processors would be able to automatically reconfigure around their
> defects with such great facility that reject components will be nearly
> eliminated. This would make it possible to build processors without
> any practical limits to complexity. Several papers have been presented
> explaining how this could be done with Genetic Algorithm (GA)
> approaches. Initial reconfiguring would be done at manufacture, but
> power-on reconfiguring would adapt to on-shelf and in-service
> failures. Processors with large numbers of defects would be sold as
> lesser performing processors.

Reminds me of stories about Russian processors that came with a 'bad instruction' list the way disk drives (used to) come with a bad blocks list.

If you follow such conferences, you necessarily get far-out ideas. But if you look at the actual processors in use today, they are not so different from 40 years ago. Bigger and faster, yes, but otherwise not that different.

> 2. An operating system would distribute the work as tasks, with each
> task having input and output vectors. Any task that fails to
> successfully complete would be re-executed on other sections of the
> processor while diagnostics identify the problem in the failed
> section, which would then be reconfigured around the new defect. This
> would allow systems to keep running and continue producing correct
> results, despite run-time failures.

I suppose there are some problems that could work that way. A web browser updating multiple windows on a page could farm out each to a different task. But many computational problems don't divide up that way.
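The retry scheme in point 2 above is easy to sketch. This is only a toy model, not anyone's actual proposal: all the names are mine, and the "sections" are plain integers standing in for regions of a reconfigurable processor.

```python
def run_with_retry(task, inputs, sections, bad_sections):
    """Try the task on each processor section in turn; a section in
    bad_sections simulates a run-time failure and is flagged for
    diagnostics and reconfiguration before we move on."""
    flagged = []
    for section in sections:
        if section in bad_sections:   # simulated hardware fault
            flagged.append(section)   # would trigger diagnostics here
            continue
        return task(inputs), flagged  # success on a good section
    raise RuntimeError("task failed on every section")

# A task with explicit input and output vectors, as the post describes.
square_all = lambda xs: [x * x for x in xs]

result, flagged = run_with_retry(square_all, [1, 2, 3],
                                 sections=[0, 1, 2], bad_sections=[0])
print(result, flagged)   # [1, 4, 9] [0]
```

Note that this only works because the task is a pure function of its input vector; a task with hidden side effects could not simply be re-run elsewhere, which is one reason many problems don't divide up this way.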

> 3. Memory would be integral to the CPU, and would be in the form of
> thousands (or millions) of small memory banks that would eliminate the
> memory bus bottleneck. Switched memory buses could quickly move blocks
> of data around.
>
> 4. The processor would be organized as a small (2-4) number of CPUs,
> each having a large number of sub-processors capable of dynamic
> reconfiguration to specialize in the computation at hand. That
> reconfiguration would be capable of the extensive data-chaining needed
> to execute complex loops as single instructions, and do so in just a
> few machine cycles, after suitable setup. Sub-processors would
> probably be reconfigurable for either SIMD or MIMD operation.

Very few problems divide up that way. For those that do, static reconfiguration is usually the best choice. Dynamic reconfiguration is fun, but most often doesn't seem to work well with real problems.

> 5. The system would probably use asynchronous logic extensively, not
> only for its asynchronous capabilities, but also for its inherent
> ability to automatically recognize its own malfunctions and trigger
> reconfiguration.
>
> 6. A new language with APL-like semantics would allow programmers to
> state their wishes at a high enough level for compilers to determine
> the low-level method of execution that best matches the particular
> hardware that is available to execute it.

APL hasn't been popular over the years, and it could have done most of this for a long time. On the other hand, you might look at the ZPL language. Not as high-level, but maybe more practical.

> 7. There are other items on this list, but they aren't as easy to
> explain, and they may not be essential to achieve the manifest destiny
> of processors.
>
> Note that the Billions of dollars now spent on developing GPU-based
> and large network-based processors, along with the software to run on
> them, will have been WASTED as soon as Manifest Destiny processors
> become available. Further, the personnel who fail to quickly make the
> transition to Manifest Destiny processors will probably become
> permanently unemployed, as has happened at various past points of
> major architectural inflection.

Consider that direct descendants of the 35-year-old Z80 are still very popular, among other places in many calculators and controllers. New developments might be used for certain problems, but the old problems can be handled just fine with older processors.

For many years now, the economy of scale of people buying faster processors to browse the web or run spreadsheets has supplied the computational sciences (computational physics, computational chemistry, and computational biology) with cheap, fast machines, machines that wouldn't have had sufficient economy of scale without those other uses. The whole idea behind GPU processors is that the economy of scale of building graphics engines for gamers can also be used for computational science.

> Apparently the only conference around with a sufficiently broad
> interest and attendance to host discussions at this level is
> WORLDCOMP. This would provide a peer reviewed avenue of legitimation
> for Manifest Destiny research. I have talked with Hamid, the General
> Chairman, about hosting these discussions, and he is OK with it,
> providing that I can drum up enough interest. So, I need to determine
> the level of interest out there in a more distant future of computing
> that lies beyond just the next product.

Consider the latest deviation from traditional processor design, the VLIW Itanium. VLIW has been around for years, and never did very well. Some thought its time had come, but it is sinking just like the similarly named boat.

> Conferences aside, please email me or post your level of interest, and
> please pass this on to any others you know who might be interested.

-- glen

Reply to
glen herrmannsfeldt

I think I mentioned this problem a year or so ago, but have new data. We previously had problems with whiskers shorting adjacent pins on some boards that have a Xilinx XC9572-15TQG100C part. These whiskers were lying flat on the board, so their origin was not completely clear.

Now, I have some boards that were reflow soldered some months ago, and were only finished now. On inspection of the CPLD, evidence of tin whisker growth is obvious. I think EVERY chip has whisker growth on at least one pin! This is quite a concern, as this equipment may have a 20 year operating life.

There are 12 other fine-pitch parts on this board, and none of those show signs of the whiskers.

I reported the first occurrence to Xilinx at the time, including microphotographs, and they basically blew me off, saying it was obviously my process. We are using tin-lead solder paste on tin-lead plated boards.

Does anyone have any idea why we are experiencing this, or what can be done to prevent these chips from developing shorts over time?

Thanks,

Jon

Reply to
Jon Elson

Jon Elson wrote: ...

Please put the pictures on the web ...

-- Uwe Bonnes snipped-for-privacy@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt

--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

Reply to
Uwe Bonnes

Actually, if you look back at "far out research" from years ago, I think that even though the machines we use are "like the ones from back then" from a programming point of view, they are also in some ways "like the far-out ideas from back then".

E.g. the "Processor In Memory" still hasn't happened, but current CPUs have a boat load of on-chip memory. So I think the way to predict the future is to take those far-out ideas and try to see "how will future engineers manage to use such techniques while still running x86/ARM code". After all, experience shows that the part that's harder to change is the software, despite its name.

Stefan

Reply to
Stefan Monnier

They are really crummy, and show the "old" problem, some whisker-like strands that lie across the board. This new condition is different, and shows REALLY typical-looking tin whiskers that are growing out of the bends of the gull-wing leads on these Xilinx QFP parts. The last time I tried photographing this, I got very mediocre results; the stereo zoom microscope setup we have is optimized for hand rework of parts, and the light level decreases as you increase magnification. So, although I can see what is going on quite clearly, I doubt the pictures would be very definitive. But, I have NO doubt, whatsoever, that what I am seeing NOW matches the published tin whisker photos that are ubiquitous on the web.

What has me worried is these are essentially new boards, just going through testing before sending out to researchers who will be using them for a number of years. If I saw this amount of whisker growth in the six months these boards have been in storage after reflow, it may indicate a LOT of problems in the future. It has definitely gotten me worried!

(As for posting this as a reply to another thread, my first post as a new thread was rejected by some news server, but I could not discern the reason for the rejection.)

Jon

Reply to
Jon Elson

My main objection to the original poster's projection of the logical future for processors is that reconfigurability comes with a large amount of overhead. So if leaving reconfigurability out improves speed by a factor of 3, say, the reconfigurable version won't be popular.

However, that's only true if the reconfigurability is fine-grained, as on an FPGA. On something like IBM's recent PowerPC chip with 18 CPUs, where one can be marked as bad, one is used for supervision, and 16 do the work, there is almost no overhead.

So, just as larger caches are the present-day form of memory on the chip, coarse-grained configurability will be the way to increase yields, if not the way to progress to that old idea of wafer-scale integration. (That was, of course, back in the days of three-inch wafers. Fitting an eight-inch wafer into a convenient consumer package, let alone dealing with its heat dissipation, hardly bears thinking about.)
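The case for coarse-grained sparing can be put in numbers with a simple binomial model. This is only a back-of-envelope sketch: the 95% per-core yield is an assumed figure, and defects are assumed independent across cores.

```python
from math import comb

def yield_with_spares(cores, spares, p_core_good):
    """Probability a die is sellable when up to `spares` of its
    `cores` cores may be defective (independent-defect model)."""
    return sum(comb(cores, k)
               * (1 - p_core_good) ** k
               * p_core_good ** (cores - k)
               for k in range(spares + 1))

p = 0.95  # assumed probability that a single core is defect-free
print(round(p ** 18, 3))                      # no spares: all 18 must work
print(round(yield_with_spares(18, 1, p), 3))  # one core may be marked bad
```

With these assumed numbers, tolerating a single bad core roughly doubles the fraction of sellable dies (about 0.40 versus about 0.77), which is why one spare core costs so little relative to what it buys.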

John Savard

Reply to
Quadibloc

Interesting location; suggests stress helps?

Are these on both bends, and on the compression or the tension part of each?

Were these manually or reflow soldered? At lead-based or lead-free temperatures? Post-cleaned or not?

-jg

Reply to
Jim Granville

Defect density is hardly a limiting factor. Thermal and I/O are, both also being packaging and substrate issues. Also, it would introduce pain if different chips with the same part number, revision level, and date code had different performance. Probably no fun for the guys in the testing department, either.

I'm reminded of a friend of mine who worked on binary code rehosting tools for Clipper. He'd rant and rave about all the hardware bugs being hidden by the assembler. When I told him that I learned from this newsgroup that yield was being enhanced by zapping individual bad cache lines to make them permanently invalid, he just laughed.
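The cache-line-zapping trick is easy to model: a line fused off at test time simply never holds data, and every access that maps to it goes straight to memory. A toy direct-mapped sketch (all names are mine, not any vendor's mechanism):

```python
class ZappableCache:
    """Toy direct-mapped cache where individual lines can be
    permanently invalidated ('zapped') to route around defects."""
    def __init__(self, num_lines):
        self.lines = [None] * num_lines   # each entry: (tag, value)
        self.zapped = set()               # fused-off defective lines
        self.misses = 0

    def read(self, addr, backing):
        i = addr % len(self.lines)
        if i in self.zapped:              # defective line: always a
            self.misses += 1              # miss, fetch from memory
            return backing[addr]
        entry = self.lines[i]
        if entry is None or entry[0] != addr:
            self.misses += 1              # normal miss: fill the line
            self.lines[i] = (addr, backing[addr])
        return self.lines[i][1]

mem = {0: 10, 1: 20}
c = ZappableCache(4)
c.zapped.add(0)                 # line 0 failed at wafer test; zap it
c.read(0, mem); c.read(0, mem)  # maps to the zapped line: both miss
c.read(1, mem); c.read(1, mem)  # healthy line: one miss, then a hit
print(c.misses)                 # 3
```

The part still produces correct results; it just runs a little slower on addresses that map to the dead line, which is exactly why it can be sold instead of scrapped.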

Reply to
Mark Thorson

8051 and PIC architecture hardware and software engineers are still gainfully employed, perhaps more now than ever before. Maybe he was referring to the 6502?

Reply to
Mark Thorson

Oh, sure it does. Just have four of them on the top of the box, put it in the kitchen, and call it a stove.

Reply to
Mark Thorson

This is a meaning of the term "asynchronous" with which I was previously unfamiliar.

Its status as the leading write-only language has been taken over by Perl; despite the claims of its proponents, it never was exceptionally useful for scientific calculations or anything else. Also, its model is a fair match to the computers that were being fantasised about in the 1980s, rather than the 2000s.

I am very much in favour of people doing serious future thinking, but it would have to be a lot better-informed and hard-headed, and preferably more radical, than this.

For example, there are people starting to think about genuinely unreliable computation, of the sort where you just have to live with ALL parts being unreliable. After all, we all use such a computer every day ....

Regards, Nick Maclaren.

Reply to
nmm1

My guess is that you'll need to look at the temperature profile of the soldering process. I'd get some lead-free soldering experts to look at the problem.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

(snip)

It does seem a little unusual. Asynchronous logic, sometimes also known as self-timed logic, has been around for years. Some is described in:

formatting link

I suppose I believe that some failure modes could be detected and a corrective action initiated.

-- glen

Reply to
glen herrmannsfeldt

This is a known part of the whisker problem, if you read the literature. Compressive stress will produce the whiskers, tensile stress suppresses the whisker growth. Funny, these sort of look like they are coming from the outer side of the bend, but there may be initial tensile stress there that then becomes compressive when the reflow relaxes strains in the lead.

It SEEMS that these are mostly showing up on the back of the first bend up from the PC board, which would be the second bend out from the package body, facing toward the package.

These are lead-free parts, soldered with 63/37 Tin/Lead solder paste (Shenmao solder paste distributed by Manncorp), and reflowed onto Tin/Lead plated 6-layer PC boards with a peak reflow temp, actually measured on the board, of about 235 C. (My batch reflow oven has a thermocouple that I poke into a through-hole on one of the boards near the center to monitor actual substrate temperature.) But then, as I am still getting my solder stencil technology figured out, I had to do a bunch of rework to remove solder shorts.

THEN, the boards were stored for about 6 months, and now when I inspected them, I see the whisker growth. The boards were not cleaned after reflow, but were cleaned just before this inspection.

Thanks for the comments and questions!

Jon

Reply to
Jon Elson

My feeling on this is that the whiskers have been growing over the 6 months of storage, and that whisker growth is not possible during the reflow. I believe all the other parts on the board are ALSO lead-free, and the whiskers are ONLY showing up on this ONE part. And, we use other Xilinx parts where it is NOT showing up. ONLY on the 100-lead QFP, but not on 44- or 144-lead parts.

Searching the literature, I have NOT found anyone who says temperature profile has ANY effect on whisker growth. Alloys, stresses in the tin plating, thickness of the tin plating, purity (or lack of) in the Tin, storage conditions (humidity and thermal cycling) have all been implicated in affecting the rate or prevalence of the whisker growth. But, I have never seen a paper that mentions the reflow temp profile. If you have a reference, I'd like to read it.

Thanks,

Jon

Reply to
Jon Elson

Simply use common sense and knowledge: the whiskers grow because of stresses in the crystal structure of tin. If you (nearly) melt a metal you'll alter the crystal structure*. Humidity and temperature may accelerate growth of whiskers, but only if the initial stress is in the crystal structure. That's why I'm pointing to the temperature profile of the soldering process as the source of the problem.

You should check whether the whiskers also grow on parts which have not been soldered yet. I bet they don't otherwise Xilinx would have a really big problem on their hands.

* Think about hardening steel by cooling it very fast after it has been heated close to the melting point.
--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel


Using pure Sn lead finishes in standard Pb soldering profiles is a no-no.

You need to run your profile at the higher, Pb-free process temperatures. Make sure all of your materials (board, other components, flux, paste, etc.) can handle the higher temperatures. If you do not get up to the Pb-free process temperatures, the Sn finish is not annealed properly, and thus stresses between plating and base metal are not relieved properly, causing an increase in Sn-whisker growth rate.

Andy

Reply to
Andy

Xilinx has knowledge base articles where they specifically say that you can safely use their lead-free parts in standard Sn/Pb solder processes without change to the process. Or, at least that is how I read what they said there.

Jon

Reply to
Jon Elson

Ah. Do your power gaming while waiting for supper to be ready. But silicon carbide has too many defects to run chips at that temperature yet...

John Savard

Reply to
Quadibloc

GreenArrays is already selling processor arrays, the biggest being 144 processors. Each processor has its own on-chip memory and fast communication channels with the others. They can be configured individually. The supplied language is colorForth. I do not know anything more about this, just what I picked up reading comp.lang.forth, but evaluation boards are available.

Ken Young

Reply to
kenney
