for all those who believe in ASICs....

Nothing could be farther from the truth.

Sun's strength was its SPARC processor line, which allowed it to grow as a high-end systems company without being an MS/Intel clone.

Had Xilinx embraced RC as a systems company, it would have leveraged its strengths into a high-dollar market. I believe that is still possible with Xilinx before its core patents expire. Or, in spite of Xilinx, using an A-Team competitor with an aggressive technology plan.

I have several different roadmaps that I've been developing over the last several years. Today is the right time to start a new tech industry, as we are just on the back side of a very deep tech slump that should progress into a boom cycle. All the core technologies (operating systems, processors, FPGAs/CPLDs) are mature products that have been incrementally refined for two decades.

The time is ripe to innovate hard, as we did between 1978 and 1987, and in the process use strong vision to take the industry to the next level.

Reply to
fpga_toys

So your first thought was "right", laced with sarcasm. Let's examine this a bit. Xilinx produces about as many board-level products as Apple Computer did in 1984, including an outstanding entry into the system-level market with the ATX form factor ML310 motherboard. Its "only" problem is its cost of ownership, coupled with marginal software support to actually use it as a reconfigurable computer -- that is, a nearly complete lack of software support for the applications developer community, focusing instead on embedded markets. Considering the Xilinx ROI for those boards, and Apple's ROI for its board development, there are questions to consider.

Consider that Apple produced 1 million systems with the Macintosh 128, 512, and Plus designs, with a price tag of over $2K each, in the three years from introduction. That nearly doubled the company's sales in just over a year, to just under $2B for 1985 and 1986, while redefining the personal computer industry. Apple did this with one of the most aggressive cost-of-ownership designs in the computer industry at the time, while aggressively creating a developer network that produced some 800 application programs that "sold" the Macintosh for Apple.


Everything on the ML310 board, except the VP30, can be found on a commodity $50 retail ATX motherboard. The value that Xilinx failed to capitalize on was taking reconfigurable computing mainstream by aggressively pricing this product and creating a large reconfigurable computing developer network to produce applications for the platform. Having Xilinx FPGA-literate developers, and lots of them, would easily push Xilinx's chip sales and market share into explosive growth.

Missing at Xilinx were system-level product architects and a management team visionary enough to build and capitalize on a systems-level growth market. To do so would require Xilinx doing an about-face on its software product licensing and embracing open software in a very different way.

I believe that by leading the reconfigurable computing market Xilinx could easily take 5-10% of the global computing market share, just as Apple has for the last two decades. For fiscal 2005 Apple generated revenue of $13.93 billion, ten times what Xilinx makes concentrating on chip sales.

With its patent protection for the cash cows expiring, and a very likely boom in offshore competition in the commodity FPGA market, I believe that Xilinx needs to seriously pick up some vision about its future. While it has some major-volume design-ins for the US auto industry, it has also created a huge introductory market for offshore fabs to produce FPGAs for foreign auto producers that will follow the US lead.

So, while the Xilinx staff here are critical of offered "advice" because they are so successful, that "success" does have other measures. There is another view that Xilinx doesn't need advice; it needs a completely different "vision" in its management team to create new and larger markets as FPGAs go commodity and offshore competitors chip away at the cash cows. The $155M structured ASIC market is peanuts compared to the possibilities as a systems-level company.

With an aggressive cost-of-ownership strategy, Xilinx could push ML310-like motherboards into the market at pretty substantial volumes, along with fully packaged retail systems later. Deep seeding of the educational and open source developer communities would result in a rapid expansion of RC-literate programmers, and applications, creating an RC market that has a very likely chance of securing 5-10% of the computer market inside a few years. RC established on Xilinx product would set a de facto binary standard (and resulting market share) that would be hard to erase for a long time - the Intel effect. Or one of the A-Team companies, or new offshore entrants, can establish that standard first.

Reply to
fpga_toys

The flip side of that is that tester time is so expensive that a die can only spend a short time on it, so intermittent failures don't get caught. I've been on the system end of the problem most of my life, doing operating systems, drivers, and diagnostics combined with hardware design. There was a reason we did burn-in with tests specifically targeting intermittents, and heavy qualification in an environmental chamber.

On the systems side, we provided burn-in support that directly augmented board-level ATE testing.

With a systems guy's view, I can only wonder why die-level and package-level ATE is required. Design for test has to be one of the most critical requirements in the process, from die to system. It would seem that the ATE function should probably be replicated on the wafer at fab, tightly integrated with the application die's I/O pads -- AND with critical internal logic not visible from the pads. The wafer in this environment would have surface metalization on one edge for power and a network interface. The ATE function on the wafer would include power control for each die, an on-wafer JTAG chain, plus an on-wafer gigabit network, each bused to a set of pads on one keyed edge of the wafer. The wafers would be racked into a test/burn-in fixture that would provide power and network connectivity to the wafers (thousands of them) with an external test controller and logger. Each wafer's on-silicon ATE system would then test each die (possibly hundreds in parallel) and leave a test report for the external logger.
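One way to picture the external controller side of that fixture is the following sketch (Python; the OnWaferATE class merely simulates the on-wafer self-test logic, and every name and interface here is hypothetical, standing in for the per-wafer network link):

```python
# Sketch of the external test controller/logger for racked wafers.
# In a real fixture each wafer would be reached over its keyed-edge
# gigabit network interface; here it is simulated in memory.

class OnWaferATE:
    """Stand-in for the self-test logic fabricated on each wafer."""
    def __init__(self, wafer_id, die_results):
        self.wafer_id = wafer_id
        self._results = die_results  # {die_index: "pass" | "fail"}

    def run_self_test(self):
        # On real silicon this would power each die, walk the on-wafer
        # JTAG chain, and exercise hundreds of dies in parallel.
        return dict(self._results)

def burn_in_rack(wafers):
    """Collect per-die test reports from every wafer in the rack."""
    log = {}
    for ate in wafers:
        log[ate.wafer_id] = ate.run_self_test()
    return log

# Simulated rack of two wafers.
rack = [
    OnWaferATE("W001", {0: "pass", 1: "fail", 2: "pass"}),
    OnWaferATE("W002", {0: "pass", 1: "pass"}),
]
log = burn_in_rack(rack)
passing = sum(1 for r in log.values() for v in r.values() if v == "pass")
```

The controller only powers, sequences, and logs; all the actual stimulus generation lives on the wafer, which is the point of the proposal.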

This can be done by the fab as they build up a multi-project wafer, or by the fab's customers for single-customer wafers.

Xilinx is such a large company, it would seem that designing a standard on-wafer ATE function to free themselves of expensive die-probing ATE would be a huge priority. It would also allow entire batches of wafers to be tested across a full range of environmental conditions over a significant period of time, to better isolate intermittents coupled to particular environmental ranges.

Similarly, a subset of that on each die would facilitate package-level testing, both after packaging and after customer board mounting.

There is nothing special about the ATE's test interfaces that cannot be implemented on wafer with some good design-for-test strategies.

One of the biggest problems in any industry .... "we have always done it that way".

Reply to
fpga_toys

Actually, the concept of managed defects becomes even stronger economically with lower yields, as the ratio of valuable recovered defect product to discards gets higher. Instead of discarding 80-90% of the product, you add that yield to your revenue. Only untestable dies (those with power rail shorts, failed JTAG interfaces, failed configuration paths, etc.) are discarded. And even some of those can be recovered with additional design-for-failure strategies.
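A quick worked example of that economics (all yield figures below are invented for illustration, not actual process data):

```python
# Hypothetical numbers: with a 20% perfect-die yield, recovering the
# defective-but-testable dies multiplies the sellable volume per wafer.
wafer_dies = 100
perfect_yield = 0.20   # fraction of dies with zero defects
untestable = 0.05      # power shorts, dead JTAG, dead config path

perfect = wafer_dies * perfect_yield                          # 20 dies
recoverable = wafer_dies * (1 - perfect_yield - untestable)   # 75 dies
sellable = perfect + recoverable                              # 95 dies

volume_gain = sellable / perfect                              # 4.75x
```

The lower the perfect yield, the larger `volume_gain` becomes, which is the sense in which managed defects get *stronger* economically as yields fall.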

Every system I've worked with has spare cycles ... doing testing/scrubbing in the idle loop is always a possibility.

Yep.

As I've posted elsewhere in this thread, increasing their revenue by 10-20x in a few years, and their long-term market share substantially. Or, just staying in business.

Or, as in the disk drive market ... just assume every device has defects and design for it.

Designing for defect detection and management changes the entire view of their process and ATE .... as I posted just a few minutes ago, why isn't this integrated on wafer, instead of continuing to do it with ATE as has been done for several decades?

I can see where the cost of ATE could make the difference between being in business or not. Designing for test in a very different, wafer-oriented way is something I see as critical.

For some applications, sure .... for system-level RC applications it's completely trivial (and necessary) in the grand scheme of things. A very, very minor amount of software.

Consider that a new system could be brought up using triple redundancy and run live in that reduced-capacity configuration for its first few hundred/thousand hours, then back off to single redundancy for checking, and after the part is well qualified run only with background idle-loop testing and scrubbing.
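The triple-redundancy phase of that bring-up schedule amounts to majority voting across three copies of each result; a minimal sketch (hypothetical, not any shipping framework):

```python
def tmr_vote(a, b, c):
    """Majority vote over three redundant results.

    Returns the voted value plus a flag saying whether any copy
    disagreed -- a disagreement during bring-up is the signal that
    a latent defect or intermittent is present in one copy.
    """
    if a == b or a == c:
        return a, not (a == b == c)
    return b, True  # b == c outvotes a; disagreement present

# During the first few hundred hours, every computation runs three
# ways; a single bad copy is outvoted and logged, not fatal.
result, disagreed = tmr_vote(42, 42, 99)
```

Once the disagreement rate drops to zero over a qualification window, the system can fall back to the lighter idle-loop testing and scrubbing the post describes.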

Using the racked-wafer strategy, it is very viable to hold wafers in test for 72 hours or more before they are cut and packaged. The same strategy then extends to the package after it's brought up in a system.

Designing for test, good test, coupled with defect management should only increase yields, lower costs, and benefit both the manufacturer and the customer long term.

Reply to
fpga_toys

Very true. Having been responsible for both factory-level burn-in testing and field testing, intermittents are by far the toughest nut to crack, as they seldom show up at the ATE station.

Good software-based diagnostics generally attempt isolation to a component set, which in the FPGA sense would include searching for the specific resource set that fails. I generally see design for test as using ATE for screening of dangerous hard failures (power faults) and completely dead devices.

Production in an RC world ... no problem. Production for an embedded design that is not defect-aware is a complete nightmare. Designing for test and designing for defect management, I believe, are not optional ... even for embedded.

Seems that it can be completely transparent with very modest effort. The parts all have non-volatile storage for configuration. If the defect list is stored with the bitstream, then the installation process to that storage just needs to read the defect list out before erasing it, and merge the defect list into the new bitstream as the part is linked (placed and routed) for that system. With a system-level design based on design for test and design for defect management, the costs are ALWAYS in favor of defect management, as it increases yields at the manufacturer and extends life in the field by making the system tolerant of intermittents that escape ATE and of life-induced failures like migration effects.
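A minimal sketch of that install-time merge (Python standing in for the installer; the config-storage layout, the resource names, and the place_and_route hook are all hypothetical stubs, not any real tool's API):

```python
# Sketch: merge the part's stored defect list into a new bitstream
# at installation time.  The config storage is modeled as a dict;
# place_and_route() stands in for a defect-aware linker/PAR.

def place_and_route(netlist, avoid_resources):
    # Stub: a real defect-aware PAR would skip the listed resources
    # when placing and routing the netlist.
    return {"netlist": netlist, "avoided": sorted(avoid_resources)}

def install_bitstream(config_storage, netlist):
    # 1. Read the defect list out before erasing the old image.
    defects = set(config_storage.get("defect_list", []))
    # 2. Re-link (place and route) around the known-bad resources.
    bitstream = place_and_route(netlist, defects)
    # 3. Write the new image with the defect list preserved beside it.
    config_storage.clear()
    config_storage["bitstream"] = bitstream
    config_storage["defect_list"] = sorted(defects)
    return config_storage

storage = {"bitstream": "old", "defect_list": ["SLICE_X4Y7", "SLICE_X9Y2"]}
storage = install_bitstream(storage, netlist="new_design")
```

The point of the sketch is step ordering: the defect list must be captured before the erase, and written back with every new image, so it survives every field update of the part.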

In RC that is not a problem ... it's handled by design. For embedded designs, that is a different problem.

Reply to
fpga_toys

But is your employer?

Reply to
fpga_toys

Your turn.

Reply to
fpga_toys

I was told once they have some aversion to becoming a systems company, along with some NIH factors that might make a new kid on the block a little unwelcome if waving a $3-5B business plan in the air.

I have at times been looking for an RC startup as a senior architect and/or CTO, plus considering seeking funding based on my own work. I'd still like to build the multi-petaflop system I proposed to several firms last year, using a large number of XC4VLX200's and RM9000's. Then spin a few wafers with a different programmable architecture to push past an exaflop by decade's end. In the short term I have a few student boards to build, and my proof-of-concept work to finish.

I've already said more here than I would have planned, but maybe that's good, as Xilinx's competitors have something to consider about doing this business right. Peter can keep pushing, and I might even level their playing field a little more. They might even want to shut me up by giving me the briefcase full of XC4VLX200's to do the proof of concept machine right, so I can go sell petaflop RC super computers with Xilinx defect managed parts instead of A-Team parts. Or maybe there is an A-Team that is really interested in becoming a $5B company this decade.

Reply to
fpga_toys

Which reconfigurable FPGAs would those be with the non-volatile bitstreams? I'm not aware of any. Posts like these really make me wonder whether you've done any actual FPGA design. They instead indicate to me that perhaps it has all been back-of-the-envelope concept-stage stuff with little if any carry-through to a completed design (which is fine, but it has to be at least tempered somewhat with actual experience garnered from those who have been there). In particular, your concerns about power dissipation being stated on the data sheet, your claims of high performance using HLLs without getting into hardware description, your complaints about tool licensing while not seeming to understand the existing tool flow very well, the handwaving in the current discussion to convince us that defect mapping is economically viable for FPGAs, and now this assertion that all the parts have non-volatile storage, sure make it sound like you don't have the hands-on experience with FPGAs you'd like us to believe you have.

What are you doing differently in the RC design, then? From my perspective, the only ways to tolerate changes in the PAR solution and still make timing are either to leave a considerable amount of excess performance margin (i.e., not running the parts at the high-performance/high-density corner), or to spend an inordinate amount of time looking for a suitable PAR solution for each defect map, regardless of how coarse the map might be.

From your previous posts regarding open tools and use of HLLs, I suspect it is more on the leaving-lots-of-performance-on-the-table side of things. In my own experience, the advantage offered by FPGAs is rapidly eroded when you don't take advantage of the available performance. However, you also had a thread a while back where you were overly concerned about thermal management of FPGAs, claiming that your RC designs could potentially trigger a mini China-syndrome event in your box. If you are leaving enough margin in the design so that it is tolerant to fortuitous routing changes to work around unique defects, then I sincerely doubt you are going to run into the runaway thermal problems you were concerned with. I've got a number of very full designs in modern parts (V2P, V4) clocked at 250-400 MHz that function well within the thermal spec with at most a passive heatsink and modest airflow. Virtually none of those designs would tolerate a quick reroute to avoid a defect on a critical route path without going through an extensive reroute of signals in that region, and that is assuming there were the necessary hooks in the tools to mark routes as 'do not use' (I am not aware of any such hooks for routing, only for placement).

Still, I'd like to hear what you have to say. If nothing else, it has sparked an interesting conversation. Having done some work in the RC area, and having done a large number of FPGA designs over the last decade (my 12-year-old business is exclusively FPGA design, with a heavy emphasis on high-performance DSP applications), most of which are pushing the performance envelope of the FPGAs, I am understandably very skeptical about your chance of achieving all your stated goals, even if you did get everything you've complained about not having so far.

Show me that my intuition is wrong.

Reply to
Ray Andraka

I think John meant to store the info in the config flash memory, thus the read-erase-replace steps. .. but you STILL have to get this info into the FIRST design somehow....

-jg

Reply to
Jim Granville

What are XC18V04's? Magic ROMs? What are the platform flash parts? Magic ROMs? They are CERTAINLY non-volatile every time I've checked.

In fact, nonvolatile includes disks, optical, and just about any other medium that doesn't go poof when you turn the power off.

and now this assertion that all the parts

OK, Wizard God of FPGAs ... just how do you configure your FPGAs without having some form of non-volatile storage handy? Whatever the configuration bitstream source is, if it is reprogrammable ... i.e., ignoring 17xx PROMs ... you can store the defect list there.

UNDERSTAND?

Now, the insults are NOT -- I REPEAT NOT - being civil.

With RC there is an operating system, complete with disk based filesystem. The intent is to do fast (VERY FAST) place and route on the fly.

You are finally getting warm. Several times in this forum I have discussed what I call "clock binning", where the FPGA accelerator board has several fixed clocks arranged as integer powers. The dynamic runtime linker (very fast place and route) places, routes, and assigns the next slowest clock that matches the code block just linked. The concept is to use the fastest available clock at which the code block meets timing, NOT to change the clocks to fit the code.
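That clock-binning step can be sketched as follows (the board clock values and the bin_clock helper are illustrative assumptions, not any shipping board's spec):

```python
# Clock binning: the board provides a few fixed clocks in integer-power
# steps; the runtime linker picks the fastest one the just-linked code
# block meets timing at, rather than re-timing the code to fit a clock.

BOARD_CLOCKS_MHZ = [400, 200, 100, 50, 25]  # fastest first

def bin_clock(fmax_mhz):
    """Return the fastest board clock not exceeding the block's fmax
    as reported by timing analysis."""
    for clk in BOARD_CLOCKS_MHZ:
        if clk <= fmax_mhz:
            return clk
    raise ValueError("code block too slow for the slowest board clock")

# A block whose timing analysis reports fmax = 180 MHz lands in the
# 100 MHz bin; one reporting 410 MHz gets the full 400 MHz clock.
```

The trade is deliberate: a block binned at 100 MHz leaves up to a 2x clock margin on the table in exchange for never waiting on a timing-closure loop at link time.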

Certainly ... it may not be hardware optimized to the picosecond. Some will be, but that is a different problem. Shall we discuss every project you have done in 12 years as though it were the SAME problem with identical requirements? I think not. So why do you do that for me?

In my own experience, the advantage offered by FPGAs is

The performance gains are measured against single-threaded CPUs with serial memory systems. The gains come from high degrees of parallelism in the FPGA. Giving up a little of the best-case performance is NOT a problem. AND if it were, for a large dedicated application, then by all means use traditional PAR and fit the best-case clock to the code body.

This is a completely different problem set than that particular question was addressing. That problem case was about hand-packed serial-parallel MACs doing Red-Black ordered simulations with kernel sizes between 80-200 LUTs, tiled in tight, running at the best-case clock rate. 97% active logic. VERY high transition rates. About the only thing worse would be purposefully toggling everything.

A COMPLETELY DIFFERENT PROBLEM is compiling arbitrary C code and executing it with a compile, link, and go strategy. An example is a student iteratively testing a piece of code in an edit, compile, and run sequence. In that case, getting the netlist bound to a reasonable set of LUTs quickly and running the test is much more important than extracting the last bit of performance from it.

Like it or not .... that is what we mean by using the FPGA to EXECUTE netlists. We are not designing highly optimized hardware. The FPGA is simply a CPU -- a very parallel CPU.

First, you have taken and merged several different concepts, as though they were somehow the same problem .... from various posting topics over the last several months.

Surely we can distort anything you might want to present by taking your posts out of context and arguing them in the worst possible combination against you.

Let's try - ONE topic, one discussion.

Seems that you have made up your mind. As you have been openly insulting and mocking ... have a good day. When you are really interested, maybe we can have a respectful discussion. You are pretty clueless today.

Reply to
fpga_toys

Thanks Jim ... that is EXACTLY what I did say. It doesn't matter if the configuration storage is on an 18V04, a platform flash part, or a disk drive.

Reply to
fpga_toys

John, last time I checked, FPGAs did not get delivered from Xilinx with the config prom. Sure, you can store a defect map on the config prom, or on your disk drive, or battery backed sram or whatever, but the point is that defect map has to get into your system somehow. Earlier in this thread you were asking/begging Xilinx to provide the defect map, even if just to one of 16 quadrants for each non-zero-defect part delivered. That leads to the administration nightmare I was talking about.

In the absence of a defect map provided by Xilinx (which you were lobbying hard for a few days ago), the only other option is for the end user to run a large set of test configurations on each device while in system to map the defects. Writing that set of test configurations requires knowledge of the device at a level of detail that is not available publicly, or getting hold of the Xilinx test configurations and expanding on them to obtain fault isolation. I'm not sure you realize the number of routing permutations that need to be run just to get fault coverage of all the routing, switchboxes, LUTs, etc. in the device, much less achieve fault isolation. Your posts regarding that seem to support this observation.

Now see, that is the fly in the ointment. The piece that is missing is the "very fast place and route". There is and has been a lot of research into improving place and route, but the fact of the matter is that getting performance that will make the FPGA compete favorably against a microprocessor is going to require a time to completion that is orders of magnitude faster than what we have now, without giving up much in the way of performance. Sure, I can slow a clock down (by bin steps or using a programmable clock) to match the clock to the timing analysis for the current design, but that doesn't help you much for many real-world problems where you have a set time to complete the task. (Yes, I know that many RC apps are not explicitly time constrained, but they do have to finish enough ahead of other approaches to make them economically justifiable.) Remember also that the RC FPGA starts out with a sizable handicap against a microprocessor: the time to load a configuration, plus, if the configuration is generated on the fly, the time to perform place and route. Once that hurdle is crossed, you still need enough of a performance boost over the microprocessor to amortize that set-up cost over the processing interval to come out ahead. Obviously, you gain from the parallelism in the FPGA, but if you don't also mind the performance angle, it is quite easy to wind up with designs that can only be clocked at a few tens of MHz, and that often use up so much area that you don't have room for enough parallelism to make up for the much lower clock rate.
So that puts the dynamically configured RC in a box, where problems that aren't repetitive and complex enough to overcome the PAR and configuration times are better done on a microprocessor, and problems that take long enough to make the PAR time insignificant may be better served by a more optimized design than what has been discussed, and we're talking not only about PAR results, but also architecturally optimizing the design to get the highest clock rates and density. In my experience, FPGAs can do roughly 100x the performance of similar generation microprocessors, give or take an order of magnitude depending on the exact application and provided the FPGA design is done well. It is very easy to lose the advantage by sub-optimal design. If I had a dollar for every time I've gotten remarks that 100x performance is not possible, or that so and so did an FPGA design expecting only 10x and it turned out slower than a microprocessor because it wouldn't meet timing etc, I'd be retired.
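The break-even point in that amortization argument is easy to put numbers on (every figure below is invented purely for illustration):

```python
# When does an RC FPGA beat the CPU once PAR + configuration time
# is charged against it?  Hypothetical figures:
par_time_s = 60.0         # on-the-fly place and route
config_time_s = 0.1       # bitstream load
cpu_per_item_s = 1.0e-3   # CPU time per work item
fpga_per_item_s = 1.0e-5  # FPGA time per item (100x faster here)

setup = par_time_s + config_time_s

# Break-even: setup + n*fpga = n*cpu  ->  n = setup / (cpu - fpga)
n_breakeven = setup / (cpu_per_item_s - fpga_per_item_s)
# With these numbers, roughly 60,700 items must be processed before
# the FPGA pulls ahead of the microprocessor at all.
```

This is the "box": jobs shorter than `n_breakeven` items belong on the microprocessor, and jobs long enough to dwarf the setup cost may justify a hand-optimized design instead of a quick on-the-fly one.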

I guess I owe you an apology for merging your separate projects. I was under the impression (and glancing back over your posts still can interpret it this way) that these different topics were all addressing facets of the same RC project. I assumed (apparently erroneously) that this was all towards the same RC system. I also apologize for the insults, as I didn't mean to insult you or mock you, rather I was trying to point out that, taking all your posts together that I thought you were trying to hit all the corners of the design space at once, and at the same time do it on the cheap with defect ridden parts. I am still not convinced you aren't trying to hit everything at once....you know that old good, fast, cheap, pick any two thing. Rereading my post, I see that I let my tone get out of hand, and for that I ask your forgiveness.

In any event, truly dynamic RC remains a tough nut to crack because of the PAR and configuration time issues. By adding the desire to use defect-ridden parts, you are only making an already tough job much harder. I respectfully suggest you try first to get the system together using perfect FPGAs, as I believe you will find you already have an enormous task in front of you between the HLL-to-gates work, the need for fast PAR, partitioning the problem over multiple FPGAs and between FPGAs and software, making a usable user interface and libraries, etc., without exponentially compounding the problem by throwing defect tolerance into the mix. Baby steps are necessary to get through something as complex as this.

fpga snipped-for-privacy@yahoo.com wrote:

Reply to
Ray Andraka

How does an FPGA compare with something like the Cell processor?

I'd have thought that for reconfigurable computing, something like an array of Cells, with FPGA bridge fabric, would be a more productive target for RC. FPGAs are great at distributed fabric, but not that good at memory bandwidth, especially at bandwidth/$. DSP tasks can target FPGAs OK, because the datasets are relatively small. Wasn't it Seymour Cray who found that I/O and memory bandwidths were the key, not raw CPU grunt?

-jg

Reply to
Jim Granville

Since NOTHING exists today, I've offered several IDEAS, including the board manufacturer taking responsibility for the testing and passing the results to the end user .... as well as being able to do the testing at the end user's site using a variety of options, including triple redundancy and scrubbing. Multiple ideas have been presented to provide options and room for discussion. Maybe you missed that.

Not discussed was a proposal that the FPGA vendor could provide maybe subquadrant level defect bin sorting .... which could be transmitted via markings on the package, or by order selection, or even by using 4 balls on the package to specify the subquadrant.

For someone interested in finding solutions, there is generally the intellectual capacity to connect the dots and finish a proposal with alternate ideas.

For someone being obstructionist, there are no end to the objections that can be raised.

I'm not sure you realize

I'm not sure that you understand: where there is a will, it certainly can and will be done. After all, when it comes to routers for FPGAs there are many independent implementations .... it's not a Christ-delivered-on-the-mount technology for software guys to do these things.

Ray, the problem is that you clearly have lost sight that sometimes the expensive and critical resource to optimize for is people. Sometimes it's the machine.

Ray .... stop lecturing ... I understand, and you are worried about YOUR problems here, and you clearly lack the mind-reading ability to understand everything about where I am coming from or going. 

There is a set of problems, very similar to DSP filters, which are VERY parallel and scale very nicely in FPGAs. For those problems, FPGAs are a couple of orders of magnitude faster. Others, that are truly sequential with limited parallelism, are much better done on a traditional ISA. It's useful to mate an FPGA system with a complementary traditional CPU. This was true in each of the prototypes I built in the first couple of years of my research. More recently I've also looked at FPGA-centric designs for a different class of problems.

So? What's the point .... most of these applications run for hours, even days. I would like a future-generation FPGA that has parallel, memory-like access to the configuration space with high bandwidth ... that does not exist today, and I've said so.

You are lecturing again, totally clueless about the issues I've considered over the last 5 years, the architectures I've explored, the applications I find interesting, or even what my long-term intent is. There are a lot of things I will not discuss without a purchase order and under NDA.

So, what's your point? Don't think I've gone down that path? .... there is a big reason I want ADB and the related interfaces that were done for JHDLBits and several other university projects. Your obsession with "highest clock rates" leaves you totally blind to other tradeoffs.

With hand layout, I've done certain very small test kernels which, replicated to fill a dozen 2V6000's, pull three orders of magnitude over the reference SMP cluster for some important applications I wish to target ... you don't get to a design that can reach petaflops by being conservative, which is my goal. I've used live tests on the DIni boards to confirm the basic processing rate and data transfers between packages for a number of benchmarks and test kernels, and they seem to scale at this point. I've also done similar numbers with a 2V6000 array. Later this year my goal is to get a few hundred LX200's and see if the scaling predictions are where I expect.

So, I agree, or I wouldn't be doing this.

Accepted. And I do have nearly six different competitive market requirements to either fill concurrently or with overlapping solutions. It is six projects at a time at this point, and will later settle into several clearly defined roles/solutions.

It's there in educational project form .... getting the IP released, or redoing it, is a necessary part of optimizing the human element for programming and testing. Production is another set of problems and solutions.

Actually, I do not believe so. I'm 75% systems software engineer and 25% hardware designer, and very good at problem definition and architecture issues. I've spent 35 years knocking off man-year-plus software projects by myself in 3-4 months, and 5-8 man-year projects with a small team of 5-7 in similar time frames, with a VERY strong KISS discipline.

I see defect parts as a gold mine that brings volumes up and prices down, to make RC systems very competitive for general work, as well as for highly optimized work where they will shine big time.

I'm used to designing for defect management ... in disks, in memories, and do not see this as ANY concern.

I've built several, and have several more ready to fab.

FpgaC I've been using for just over 2-1/2 years, even with its current faults, which impact density by between 2-20%. Enough to know where it needs to go, and to have that roadmap in place. There is a slowly growing user base and developer group for that project. The project will mature during 2006, in some ways I've yet to talk about.

This is a deal breaker, and why I've put my head up after a couple of years and started pushing when JHDLBits with ADB was not released. There is similar code in several other sources that will take more work. I have a good handle on that.

I've done systems-level design for 35 years ... operating systems, drivers, diagnostics, hardware design, and large applications. I do everything with baby steps and KISS, but by tackling the tough problems as early in a design as possible for risk management.

Again ... defect management may be scary to you, because of how it impacts YOUR projects; in this project it is NOT a problem. Reserving defect resources is very similar to having the same resources already allocated. OK?

Reply to
fpga_toys

A traditional multiprocessor shares one or more moderately wide DRAM systems, which are inherently sequential from a performance perspective, even when shared/interleaved. Caches for some applications create N memory systems, but can also become an even worse bottleneck.

The basic building block in FPGAs is lots of 16x1 memories with a FF ... with FULL PARALLELISM. The trick is to avoid serialization through FSMs and bulk memories (such as BRAM and external memories), which are serial.
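A rough worked comparison makes the point (the device count and clock rates below are invented, purely illustrative, not any specific part's numbers):

```python
# One shared DRAM port vs. thousands of 16x1 LUT RAMs read in parallel.
dram_width_bits = 64
dram_mhz = 400
dram_bw_gbps = dram_width_bits * dram_mhz * 1e6 / 1e9   # 25.6 Gb/s

lut_rams = 50_000   # hypothetical count of 16x1 distributed RAMs
fabric_mhz = 200    # fabric clock, half the DRAM clock
# Each LUT RAM delivers 1 bit per fabric cycle, all simultaneously.
fabric_bw_gbps = lut_rams * 1 * fabric_mhz * 1e6 / 1e9  # 10,000 Gb/s

ratio = fabric_bw_gbps / dram_bw_gbps
```

Even at half the clock rate, the aggregate bandwidth of the distributed memories dwarfs the single serial port, which is the parallelism the post is describing; the trick, as stated, is keeping access patterns that actually hit all those memories at once.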

DSP and numerical applications are very similar data flow problems. Ditto for certain classes of streaming problems, including wire speed network servers.

Reply to
fpga_toys

Hi John,

"Not discussed was a proposal that the FPGA vendor could provide maybe subquadrant level defect bin sorting .... which could be transmitted via markings on the package, or by order selection, or even by using 4 balls on the package to specify the subquadrant."

This is not a new idea. The catch is that binning them on a quadrant basis is more complicated than it looks. In order to bin them you need to identify the defect with much higher accuracy than normal testing: you want to know the position (that's easy) and the nature of the defect, because not all defects will map nicely onto this method. Try to imagine a defect on the global routing net, which can disable more than one quadrant; or an IO defect on a device with an IO ring (older devices); or an interruption on a long line which crosses from one quadrant to another; or a defective DCM. The defect map becomes quite complicated, and that is only the tip of the iceberg. In order to qualify such a part, the time spent on a tester would be many times greater, not to mention the engineering effort to find out more about a particular defect ($$$), just to end up with a part which is "limping". In that same tester time you could test many times more good parts that we can sell as OK, and use the engineering time to improve test coverage instead. (Not to mention the burden of managing parts with different good quadrants, each of which becomes a new part from the logistical point of view: marking, packing, storing, etc.)

Aurash

PS. I apologize for my English I'm not a native speaker/writer.

Reply to
Aurelian Lazarut

Why test at die level at all? Economics: packaging costs money. Why test at package level at all? Full testing at wafer sort isn't realistic, and die damage during packaging happens.

Something quite like this was tried. Some very good reasons not to do it were found, the hard way. "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." (Douglas Adams)

Power supply measurement requires an ammeter per power supply per die, or some way to switch an ammeter between measurement points, like relays. I'd love to hear your plan.

Some things can't be implemented on wafers. Disk drives, relays, precision resistors, ...

-- Phil Hays

Reply to
Phil Hays

Aurelian Lazarut wrote:

Hi Aurash,

No apologies necessary with me, I've dealt with too many other cultures where it's me that has the problem.

Several comments:

1) it comes down to which resources statistically fail most frequently. Given that the majority of the die's resources are less global in nature, I would assume (right or wrong) that the majority of defects are too.

2) I agree tester time is a valuable resource. I also believe there are better ways to do detailed problem isolation without that cost. Go/No-Go is one process, fault isolation another.

3) where there is a will, and good engineers, there is a way. Where the will is lacking, or skill is lacking, there are always good reasons not to try.

4) If Xilinx doesn't want that business, I would certainly be happy to discuss purchasing the scrap product for a better price than crushing it.

Reply to
fpga_toys

And for some, such a damned-if-you-do, damned-if-you-don't is a perfectly good excuse to do nothing. Life isn't perfect. I find finding solutions more valuable than finding restrictions and excuses.

One of the most remarkable forms of success is meeting the difficult challenges that failures offer. The cost of chipping away at this problem could be relatively small: one or two engineers for a few years, adding a very small complexity increment to production die. When success materializes, the savings are substantial.

It always comes down to V=IR, and there are plenty of designs/products that do current sensing well, even if an external reference standard is required. Maybe one of the ATE functions is to calibrate on-die standards and pass that to the rack manager.
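A minimal sketch of the V=IR point: with a known shunt resistor in a supply rail, the measured voltage drop gives the rail current directly, with no ammeter in the loop. The 10 milliohm shunt value below is an assumption for illustration only.

```c
/* Ohm's law current sense: I = V / R.
 * v_drop_volts  -- voltage measured across the shunt
 * r_shunt_ohms  -- known (calibrated) shunt resistance */
static double shunt_current_amps(double v_drop_volts, double r_shunt_ohms)
{
    return v_drop_volts / r_shunt_ohms;
}
```

For example, a 25 mV drop across an assumed 0.010-ohm shunt implies 2.5 A on that rail; calibrating the shunt value against an external standard is the part ATE could handle.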

None of which are needed on die for self testing.

Reply to
fpga_toys
