Xilinx padding LC numbers, how do you feel about it?

fpga snipped-for-privacy@yahoo.com wrote:

Umm yes, I disagree. I've got several designs that are 90%+ packed and are clocked at rates well beyond typical design clock rates. Many of these have active cooling, but it is certainly possible to fill up and use a device.

You won't get the density and performance without handcrafting to meet both the density and performance limits. The typical user is going to run into place and route issues before he even gets close to the high density, high clock rate corner. I don't care if it is an RC application or not, you just don't get into that corner unless you do a considerable amount of handcrafting on the design. BTW, the handcrafting also helps tremendously with power, as a significant percentage of the power is dissipated in the routing rather than in the logic. If you keep the routing short and the design is maximally pipelined (which stops glitch propagation), the power dissipated by the routing can be kept relatively small. You can look at the designs on my gallery page for some older examples of designs that are dense and running near the limits of clock rate. My point is, you'll probably need to derate for shortcomings of an RTL synthesis tool flow before derating for a full device, and as for derating for a full device, the power dissipation is so dependent on the design and PAR solution that there is no way to accurately predict it.

Worst case numbers are totally meaningless in this scenario as well. You could generate a design that purposely toggles every FF and intentionally congests the routing by forcing poor placement, and have something that could easily melt the balls right off. With proper cooling and attention to I/O switching, you have a shot at making it actually work in silicon. So where do you set the max based on circuit configuration? The answer is you can't. Instead, the best one can do is give the thermal characteristics of the package/die combo and the maximum die temperature, and provide the tools to allow someone to simulate this if they are concerned about it (like I said, the simulation is also meaningless unless it accurately models the data in the operational design). If you are using the spreadsheet estimator, you may be setting it up wrong to get meaningful answers. The routing complexity knobs have a lot of influence over the result, and are difficult to set in a meaningful way. For a floorplanned design, setting those to low complexity often still gets power estimates that are 2-4x higher than what is measured on the board. For a design with poor placement and multiple levels of logic, I've seen the estimator come in with much less margin. Again, for data designs, you will rarely see greater than a 15% average toggle rate, and as I said, that is a function of the number system more than of the design itself. Anything much above that can be considered a high toggle rate, not a modest toggle rate as you propose.
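To illustrate why the toggle rate assumption dominates any estimate, here is a minimal back-of-the-envelope sketch. Every constant in it is an assumption chosen for illustration, not a datasheet or measured value; the only point is that dynamic power scales linearly with whatever average toggle rate you plug in.

    # Dynamic power, very roughly: P = alpha * C * V^2 * f summed over switching nodes.
    # All constants below are illustrative assumptions, NOT datasheet values.
    V = 1.2            # core voltage, volts (assumed)
    f = 200e6          # clock frequency, Hz (assumed)
    n_nodes = 50000    # flip-flop outputs plus the routing they drive (assumed)
    c_node = 1e-12     # effective switched capacitance per node, incl. routing (assumed)

    def dynamic_power(toggle_rate):
        """Average dynamic power in watts for a given average toggle rate."""
        return toggle_rate * n_nodes * c_node * V**2 * f

    for alpha in (0.15, 0.30, 0.50):
        print(f"toggle rate {alpha:.0%}: {dynamic_power(alpha):.1f} W")

Double the assumed toggle rate and the estimate doubles, which is why the toggle-rate and routing-complexity knobs swing the spreadsheet result so widely.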

A reminder too, it doesn't take many watts to make a chip without a heatsink feel hot to the touch. 5 watts is enough to burn your finger if the heat isn't dissipated. The same chip can handle 30 watts or more with a decent heatsink without excessive effort spent on cooling. The fact you burnt your fingers on an FPGA without a heatsink tells you very little about how close you were to the design corners for the FPGA, nor does the fact a device with a heatsink got warm to the touch. All it tells you is that in the first case you probably didn't have adequate cooling for the design, and in the second case not even that (a chip cooled to 105F will feel warm to the touch, even though the silicon will run at nearly double that, 85C or about 185F, without any derating).
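To put rough numbers on that, a minimal junction-temperature check of the form Tj = Ta + P * theta_ja, with the thermal resistances being illustrative guesses rather than any particular package's figures:

    # Junction temperature check: Tj = Ta + P * theta_ja.
    # The theta_ja values are illustrative guesses, not any package's spec.
    T_AMBIENT = 40.0        # deg C inside the box (assumed)
    T_JUNCTION_MAX = 85.0   # commercial die limit, deg C (typical figure)

    def junction_temp(power_w, theta_ja):
        """Estimated die temperature for a given power (W) and
        junction-to-ambient thermal resistance (deg C per W)."""
        return T_AMBIENT + power_w * theta_ja

    print(junction_temp(5.0, 10.0))   # bare package, ~10 C/W assumed -> 90 C, over the limit
    print(junction_temp(30.0, 1.5))   # heatsink + airflow, ~1.5 C/W assumed -> 85 C, workable

Same silicon, same limit; all that changed is theta_ja, which is the point about the heatsink above.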

Reply to
Ray Andraka

Yes, it is better to train designers to check thermal levels - after all, everyone's car has a temperature gauge, so even the real HW novice can relate to that!

There could be a case for multiple thermal diodes in an FPGA die, to avoid missing a hot area - or some way to 'route relative' to the sense diode? :) Does anyone do that now?

That's correct. But apart from the thermal measurement, about all you could do is something like a transistor SOAR (Safe Operating ARea) polygon - and that will have so many caveats, it might confuse more than it helps.

I would push for thermal verify - if you are doing RC designs, then simply demand a PCB that HAS thermal management!

Not entirely - so long as a usable portion of the fabric runs at Fmax, then that is a valid number.

The bottom line is thermal: so why not just focus on that ? - more accurate than any predictive clock usage based modeling, which will be far more 'cockpit error' prone.

I also like the idea of freq-tracking cells that use ring oscillators to give Vcc/temp/process auto-correcting values, and allow you to clock _really_ at the safe limits, but I have not seen that deployed yet.

Do this, and the leading RC boards would quickly become VERY thermally efficient, as that gives the best performance.

-jg

Reply to
Jim Granville
[ ... ]

Looking in from the sidelines, it seems to me that quite a bit of this conversation is taking place more or less at cross-purposes.

First of all, I think "derating" is a poor term -- though I tend to agree that they might be able to provide more useful numbers. One possibility might be to more or less directly specify the heat output from the chip (as a whole) per million (or whatever) transitions per second. This might give a better idea about trade-offs between faster clocks vs. more gates. Unfortunately, it has a substantial problem (that's been alluded to elsethread): it's basically dealing with the power consumed by logic, not by routing, so in any given design it might be off by a fairly large factor. I can believe that it could be reasonably useful for things like product selection though -- if you're planning to encrypt at a rate of X gigabytes per second (for example) it's fairly easy to figure a rough idea of the number of bit transitions involved and see if you're at least in the right ballpark. This wouldn't tell you that a design _will_ work, but it'd at least let you separate things that stand a reasonable chance from those that don't.
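As a concrete version of that ballpark check, with every figure invented for illustration (including the watts-per-million-transitions number, which is the sort of spec being proposed above rather than anything actually published):

    # Ballpark: transitions per second implied by an encryption throughput target.
    # Every figure here is an assumption for illustration, not measured data.
    throughput_bytes = 1e9      # target: 1 GB/s through the cipher (assumed)
    pipeline_stages = 10        # register stages each bit passes through (assumed)
    toggle_fraction = 0.5       # fraction of those bits that actually flip (assumed)

    mtps = throughput_bytes * 8 * pipeline_stages * toggle_fraction / 1e6
    watts_per_mtps = 1e-4       # hypothetical spec: watts per 10^6 transitions/s
    print(f"{mtps:.0f} Mtransitions/s -> roughly {mtps * watts_per_mtps:.0f} W of logic power")

As noted, this captures only the logic side and ignores routing, so it is good for weeding out hopeless device choices, not for proving a design will work.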

Second, in terms of providing a general-purpose computing resource, I don't think anything Xilinx (or anybody else) can provide in a datasheet is going to mean a whole lot. If you're providing a product for end users (instead of engineers) you need to make it foolproof. Nothing in the datasheet is going to do that for you.

Whether Xilinx should provide a circuit like that on-chip (e.g. like most CPUs now have) is open to some question -- it would likely add a more or less fixed amount to the product price. An amount that would hardly be noticeable in a big Virtex would be utterly prohibitive on a small Spartan. Perhaps this would be a reasonable feature to add on the next generation of Virtex chips though...

--
    Later,
    Jerry.
Reply to
Jerry Coffin

Thanks for making my point. The Xilinx product chips + ISE is unable to route designs which have a high usage level, which I believe is because it both lacks routing resources and P&R needs improvement. You are probably in the better quarter percent of engineers that might have the experience to beat P&R regularly ... but for the rest of us mortals the product isn't usable as you get close to 100% packing.

For RC uses I have a laundry list of things that are wrong with P&R, some of which are WHY you can not get high density designs routed with ISE. P&R fails to pack FFs with the LUT that has its input term ... choosing instead to use another LUT as a pass-thru that is several CLBs away. Given a netlist that is obviously a 6x15 mesh from the routing, it tends to place the parts in an arc around the center of the chip instead ... and a number of other observations that say its costing algorithms have a very different goal, and fail because of it for some designs.

The problem is that P&R is not optional, and they will not release the documentation for an open source implementation which could be tuned to other applications, like RC. So until P&R can automatically route the same dense designs you pull off, I say the product chips+ISE isn't usable for dense designs.

The reason they get away with it today is that for hardware design there is a VERY strong incentive to buy up ... purchase a larger device, just to make sure that future changes will fit. So many designs will always have the headroom, and pressure on Xilinx to improve P&R for high density routing is relatively low, as few designs will cross above 95% use.

With RC there is a completely different goal, and that is to use the entire chip, in fact, all of every chip in an FPGA processor array. High density designs are the norm with RC, and half device designs will be relatively rare.

Reply to
fpga_toys

Worse yet, RC will tend to use the largest chips available, or the largest chip with a reasonable cost/performance. Buying up is not an option.

In this arena, we are talking about fitting designs to half a million or more on-chip resources (LUTs, FFs, MUXes, etc.) -- and for multichip RC platforms, target environments with easily 20M LUTs or more. This is so far past hand routing, it's beyond even suggesting.

Automated tools are necessary to partition, place, route and optimize these large multichip projects ... P&R isn't even close to the right tool. Wanting a vendor to open up their tool chain to allow open source P&R will at some point not be just a request, or an option; it will become mandatory for implementing dynamically loaded, incrementally placed and routed designs with libraries. The vendors that recognize that, and can produce large chips, will in the end own this high end commodity market.

Being able to compile, load and go with 20M LUT netlists in a few seconds is what is necessary ... five days of P&R is not an option.

Reply to
fpga_toys

The issue is not the devices; the current devices are actually over-routed. That is, there is more than enough routing in them. You can prove this to some degree by setting your timing constraints low. If you can accept poor timing, it is difficult to make an unroutable design. The router does have issues that have to do with trading quality of results for faster time to completion. You can still get a lot of mileage out of doing a bit of floorplanning. The problem is that in its haste, the router puts down routes working to connect everything up, and then works to shorten any routes that don't meet timing. Unfortunately, that ends up in a lot of circuitous routes that in turn congest the routing resources to the point where the router is tripping over itself. There used to be a delay based clean-up that ripped up and re-routed to minimize propagation delays rather than just reduce them to meet timing. That was removed in an effort to reduce overall compile time.

I do not do hand-routing. I avoid it like the plague because it is outside of the normal tool flow, is difficult to embed in the design, is extremely time consuming, and is perishable, meaning that if you have to run PAR again, your hand placement also has to be redone. I do use floorplanning extensively, both embedded in the code for frequently used components, and in the UCF for less used components and placement of upper levels in the hierarchy. The point is, the tools will route a high density design just fine. They often need a helping hand to get a decent placement (although they will generally work for placement too if timing isn't critical).

As far as placing LUTs away from the flip-flops, that occurs in just a few specific cases:

1) You have more than one level of LUT between flip-flops. The first-level LUT connected to the flip-flop D input will get put with the flip-flop. The second level obviously cannot go into the LUT occupied by the first. The placer doesn't do a very good job with second-level LUTs, so they tend to be placed too far away. The easiest way to work around this is with deeper pipelining to ensure you have only 4-input logic functions between flip-flops. You can also put placement constraints on both LUTs to force placement in adjacent slices (the placer will generally not put them next to each other otherwise).

2) Your combinatorial logic drives more than one flip-flop.

3) The LUT is driving the reset input to the flip-flop. The reset input is not directly connected to the LUT in the same slice, so the placer generally doesn't put it there. Resets sourced from combinatorial terms are also generally feeding more than one flip-flop, so you also run into case 2.

4) The LUT is driving a clock enable input to the flip-flop. Same deal as the reset.

5) You have logic between the carry chain and the flip-flop that does not map into the XORCY. This is usually a mux or gating function.

Off hand, I can't think of any other scenarios that cause the LUT to be physically separated from the flip-flop.

RC is going to give you disappointing results if you insist on defining the designs with an RTL description. Synthesis and PAR without physical constraints is too slow, and the results are not generally going to get you into the high density, high performance corner. RC has a much better shot of working if it uses a library of hand-crafted components that it simply stitches together. Those components will need to be designed by someone far more familiar with the FPGA architecture and high performance/density design for FPGAs than the typical user you described.

My point is, it isn't so much the tools, nor the devices that prevent a high density high performance design. It is the lack of expertise of the user. FPGAs are certainly not unique in this regard. Opening up the bitstream is not going to change this one iota.

Aiming for 100% device use is also a faulty goal. Quite the contrary, RC should be targeting less than 50% utilization for any one overlay so that it can hit a reasonable performance target with a minimum of effort. What is the point of performing an operation 2x as fast (100% utilization vs 50%) if it takes 10x the time to prepare to do it, especially when the time to prepare is already orders of magnitude longer than the execution time? As long as 50% utilization provides a good performance advantage over not using RC, it is still a large win.

Reply to
Ray Andraka

you are not the only one that is suggesting that derating Xilinx parts

50% is the minimum rational target for an RTL based RC system on those platforms. I don't think this is acceptable long term, and it is very hard to justify. That Xilinx actively prevents alternative P&R and bit stream tools from improving on this simply means they are not interested in a better fit for their product line ... i.e., go away, we don't care about that market.

Thanks for clearly expressing this.

Reply to
fpga_toys

Why is it so hard to justify? One could argue that you can't use 100% of a microprocessor either. Any given instruction leaves part of the microprocessor idle: it is impossible to use all of the features all of the time. Why should you have different expectations of an FPGA? As I pointed out before, you can get to the high utilization and high performance corner, but you are not likely to get there while also pushing the fast-to-compile and easy-to-use buttons. FPGA design, at the core, is digital logic design no matter how many fancy tools you throw at it.

The fact of the matter is that FPGAs offer many more degrees of freedom in the design than what is offered by a conventional computer. The added degrees of freedom make them very powerful for efficient processing, but it also means that the design space has more things to trade-off to get to a particular corner of the design space (not to mention more ways to approach a particular problem which makes it harder for automated construction). Getting into the density and performance corners requires more design effort. No amount of wishing is going to change that fact. Designing to hit the high performance and high density corners is possible, but it isn't likely to happen when trying to also stay in the minimal effort and fast time to compile corners.

If you want to call that de-rating the FPGA, that's your prerogative. I don't see it as de-rating the FPGA, as the FPGA can and does meet the performance and density you are seeking, but at the price of design effort. That is not de-rating, that is weighing the design trade-offs. If you want to play in a particular corner, you need to make the concessions to get you there. You can't cover all the corners at once, and that certainly isn't unique to FPGA design. Which do you want more: fast compile times, ease of use, performance, density? You can reasonably get two of these; any more than that is not going to get you into the corner.

Reply to
Ray Andraka

First, apples and oranges and cow pucky comparison. It's not about leaving unused resources idle, it's about not idling used resources, which is exactly the problem here. Good compilers get well inside of 99.x% efficiency for code-to-hardware fit in terms of the application for most architectures. Even poor compilers tend to get better than a 90% fit. When it's only possible to get a 50% or less fit, by your standards, in an FPGA for the primary execution path netlist, that is a HUGE derate. Most good compilers pack pipelines with very, very, very few wasted cycles for nearly 100% hardware efficiency for the application. The goal is to reach similarly efficient pipeline packing on FPGAs, and waste few if any resources in the process. I agree with your argument that the existing Xilinx FPGAs and tools will not yield close to 100%, and we need to derate the expectation that we carry forward from traditional instruction set pipeline packing experience. You are the Xilinx expert here, and if you claim less than 50% packing efficiency with the Xilinx product ... I'm not going to stand here and argue with you about that.

I will argue that, given better integration to a different place and route tool, such as that contained in JHDLbits, FpgaC can do significantly better than "less than 50% efficiency/utilization" of LUT/FF based resources for a large number of unrolled loop applications, such as finite difference modeling, RC5 code cracking, and other dense unrolled loop pipelines which are common in the industry as threaded/MPI/PVM multiprocessor applications.

Reply to
fpga_toys

I fail to understand how, as an FPGA application, "reconfigurable computing" is somehow different (more resource intensive, more "high performance," whatever) from an "application-specific" FPGA design.

After all, every FPGA engineer wants the best of everything: lowest-cost part (which implies smallest/least amount of logic/best use of resources), lowest power dissipation and of course the least amount of engineering effort to meet those goals!

-a

Reply to
Andy Peters

In two ways ... it will be one to three orders of magnitude larger than a hand-written hardware design, and as such will not normally benefit from the low-level optimization, placement, packing, and routing enhancements that a typical hardware design will get. It WILL depend on the tools to do a better than average fit automatically.

Second, the life of an RC design will frequently be very short, and will evolve. Hardware designs tend to live "forever" from a typical software perspective ... as such, hardware designs have a very strong incentive to invest manual labor up front to get the best fit to hardware for cost management ... that labor cost will then be amortized over the life of many units. A typical RC program will have a total life span a fraction of that, and is not likely to be frozen in time with large scale hardware shipments, so there is no large incentive to invest effort toward optimizing a particular RC application on hardware. There is EVERY incentive to optimize the tools to do a better job at hardware fit, as those tools will have a long life and the effort is amortized over MANY RC applications.

The same argument, by analogy: very few people invest the effort to hand-optimize assembly language for applications ... but it is worth putting effort into optimizing the tool chain ... compilers, etc. ... until diminishing returns are reached.

Other than that, gates are gates.

Reply to
fpga_toys

I think you missed my point.

The FPGA is not a processor, and its design is circuit design, not a one dimensional sequencing of instructions. As I mentioned an FPGA offers many more degrees of freedom. That translates directly into more design 'corners'. It is not possible to simultaneously hit all of them. That's the price of freedom.

Have you used JHDL? The higher densities you suggest do take more work to achieve, and come about through using placed macros. The same can be done using the Xilinx tools by building a hierarchical library of placed macros. There is nothing inherent in the tools preventing this, but as I have said, it takes more design effort to get there. There is no magic bullet regardless of who developed the tools.

BTW, the efficiency I was referring to on a uP is the utilization of the gates. Each instruction only uses a fraction of the logic in the uP; the rest is idled. I only brought that up in an attempt to level the field between the two. You really can't compare them.

Reply to
Ray Andraka

I did understand, and objected to this assertion. I'm half EE and half computer science, and have worked both fields for 35 years.

The way to get a huge bang for the buck with RC is to generate hundreds, or thousands, of mini state-machine-driven dedicated function processors, much in the same way as you build pipelines of dedicated function processing elements for DSP on FPGAs with distributed arithmetic. You drop them into the FPGA as a mesh fabric for 2D algorithms, as a pipeline for 1D algorithms, and even as a flattened 3D mesh if necessary. Highly regular connectivity, short routing, and locally optimal placement are all not just possible, but highly likely, with the right tools. Specialized array processing applications are likely targets. Most of the demos I've worked with fill an FPGA, and have a VERY small number of idle LUTs. It's been painful to get PAR to do the right thing, thus my frustration with the existing tools, and a growing understanding of why they are what they are, and why that is wrong for where we are heading and how our usage is fundamentally different from the current tool chains' design strategies.

FPGAs are a poor fit for non-specialized functions which are largely sequential and lack any dominant computational kernel, with a few exceptions.

Other strong fits are wire-speed network stacks and applications which can be pipelined, and similar application structures with streaming data, which offer high degrees of parallelism from the pipelined streaming data. There are a few other cases, most of which also exploit either replication or pipelining with streaming data. Anywhere we can disconnect from sequential memory operations to distributed LUT/FF memory with well defined operations in parallel is a good fit. I suspect we will find others as well, as our experience builds with this technology.

Additions planned for FpgaC over the next year all target functionality to support applications with this profile. The distributed arithmetic to support finite difference kernels and matrix operations is high on my list, for the same reasons it has been hugely successful in the DSP community. If you review the feature requests in the fpgac project on SourceForge, you will find the start of a laundry list of things we need to address over the next year to reach long term goals. Those goals include automatic partitioning of applications between netlists and more traditional processors or virtual machines (p-code or Java VM like) to get the best resource utilization for parallel sections of code and sequential, non-pipelinable code sections. This is a very large project to be incrementally implemented over a period of maybe as long as several years. Some of it is well-defined application of traditional practice to FPGA computing; other parts of it are in effect research projects, as we break new ground to address problems with the technology, problems with mapping existing practice to this technology, and just new ground where new uses or new implementations fundamentally require a change in existing practice for FPGA computing.

For the same reasons that it's futile to invest much energy into assembly language program optimizations, we view hand optimization of placement and netlists as a very poor practice. That same energy invested into incremental improvements in the tool chain, from compiler to bitstream generation tools, will yield long term benefits and push the state of the art until diminishing returns is reached at high utilization and high degrees of overall efficiency. It's not going to happen overnight, and we don't have all the answers today, but I'm certain that none of the road blocks visible today are long term problems.

Reply to
fpga_toys

This has been an interesting side-thread, but more for the pushing-the-ceiling aspects than for whether 100% fabric at 100% speed is really possible, or not.

So that indicates that 100% fabric usage is not an impossible task ?

Most software suppliers I work with will strive to improve their tools, if given specific and clear examples of: a) what detail is sub-optimal, and why; b) how that can be improved, without impact on other users/usage.

Remember, those that write these tools, do NOT actually use them, so feedback from users that 'push' the tools, is very important.

A user might think their application area is too small, or too specialised, but tool flow improvements can ripple across many application areas, and also raise the average practical frequencies- and that can get the vendors very interested.

It seems one 'stub' of an opensource project could be to handle this Xilinx-User interface. Rather than try and rebuild their tools from the ground-up, why not work with them to improve the tools ? Yes, this is done a small detail at a time.

Anyone doing an RC-FPGA array, should have strong thermal management, even up to the copper heat pipe schemes of the extreme gamers ! :)

-jg

Reply to
Jim Granville

So, I have a question regarding the assertion that the datasheet LUT count is "important information."

Why? What is the real-world utility of that number? Do some people know up-front how many LUTs their design is gonna use, and then they just need an accurate LUT count to pick the best FPGA?

The problem with programmable logic datasheets is that it's almost impossible to put any real, relevant information on them. A FEW things - number of SerDes transceivers, maybe user pin count, marginally available block RAM - are kinda useful, but for everything else, don't you really need to just do the design and see what device the tools tell you to use?

I've written a couple of articles on/around this topic. My favorite was:

formatting link

and people also seem to like:

formatting link

Kevin

Reply to
Kevin Morris

Kevin,

If I have learned anything from this thread, it is that there are folks for whom the "actual number" is a religious matter.

Once the "actual number" is revealed in its glory, there are even a further class of folks who still doubt: there is still some nefarious untruth, or hidden secret that is intentionally not being revealed by the evil FPGA vendor. The Da Vinci Bit Code, if you will.

They weave a web of untruths and half whispers with convenient coincidences to explain their troubles.

Alas, all I can do is deal with those whose questions I can actually answer.

Peter and I have decided that having the actual numbers is good, so we go to battle to do just that.

No need to thank us.

Thanks for your support,

Austin

Reply to
austin

It's great for migrating an existing design; if, say, you want to know how many 3S1600E's it takes to replace a 2VP70 without reverse engineering the marketing gate inflation factor for each datasheet.

I also find it handy for a ballpark comparison between devices and vendors, bearing in mind differences in the turbo-bonus feature count.

If you get really down-and-dirty with the parts, it's great to know things like how many N-bit cascaded add/sub stages will fit up/down and across a given chip, so you can floorplan up front; or, just to realize when the synthesizer has 'optimized' something to be twice as big as it should be.
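As a sketch of that kind of up-front arithmetic, with an invented slice-array geometry standing in for a real device (the assumption of two adder bits per slice matches the common 4-LUT slice with a carry chain, but check the actual part):

    # How many N-bit add/sub stages fit on a hypothetical part, assuming two
    # adder bits per slice along a vertical carry chain (typical 4-LUT slice).
    # The slice-array geometry is invented for illustration, not a real device.
    rows, cols = 96, 64            # slice rows and columns (hypothetical)
    bits_per_slice = 2             # two LUT/FF pairs per slice (assumed)
    n = 32                         # adder width in bits

    slices_per_adder = n // bits_per_slice         # 16 slices stacked vertically
    adders_per_column = rows // slices_per_adder   # 6 fit top to bottom
    print(adders_per_column * cols, "32-bit adders in the array")   # 384

The same arithmetic, run against real row/column counts, is what tells you whether a wide datapath folds nicely across the chip or wraps awkwardly.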

Here are some old ramblings on the subject of device LUT aspect ratio, where the concern is not just how many LUTs but how they are arranged:

formatting link

Brian

p.s. I remembered your 'tango' article about 5 seconds after I hit the post button on my last message, or I'd have included that link, too

Reply to
Brian Davis

You may not know how large your designs are until you finish them, but there are many users who both have reusable IP with known sizes and need to actually *plan* their work rather than just letting it happen. Even if you don't know the size before you start, you can estimate it. I have never done a design without some form of estimate before I started. Kinda like the software guys estimating SLOCs. So why would I want to work with numbers that are already off by 12.5% when estimating a design?
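If the quoted "logic cell" number really is just the LUT count scaled up by that 12.5%, backing it out for planning purposes is a one-liner; the 1.125 factor below is inferred from the 12.5% figure above, not quoted from any vendor document:

    # Strip the assumed 12.5% inflation off a quoted "logic cell" count to get
    # a planning number comparable to raw LUT counts.
    INFLATION = 1.125   # assumed: quoted logic cells = LUTs * 1.125

    def luts_from_logic_cells(logic_cells):
        return round(logic_cells / INFLATION)

    print(luts_from_logic_cells(33192))   # hypothetical quoted figure -> 29504 LUTs

Trivial, yes, but it is exactly the sort of extra step the inflated number forces on every comparison.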

In reality, I decided to ask what others thought about the inflation factor because it ticked me off. I was comparing X to A and I couldn't just look at the data sheet; I had to do the calculations to get the accurate numbers for X. I am tired of vendors making more work for me with no point other than boosting the ego of some marketeer. I was curious about what others thought. Some don't mind it, others seem to hate it like I do.

I have posted to this newsgroup about the inflation factor before and the response from the X guys was far less than supportive. Although many seemed to feel the inflated numbers are not appropriate for a data sheet, the only favorable response from X was to get the footnote added to the new data sheets so that you know they are making it up. But it seems that now the X guys have seen the light and are 100% behind truth in data sheets! Hazzah!!!

Actually, it is good for X that we have a separate department for FPGAs under software. I am designing the board and would likely use a Cyclone II because of the simpler power supply required. But the FPGA guy is a former FAE and wants to use X, so we'll use X. This is such a simple design it doesn't matter much which way we go. Heck, we could do without the FPGA if we didn't want to use a DSP. We are doing CVSD on voice and they already have the code for a DSP. The DSP doesn't have UARTs, so we are adding an FPGA for some UART interfaces. If we were willing to port the CVSD code we could do it all in an MCU with 6 UARTs and a small CPLD since one interface is Manchester encoded. Or we could drop the DSP and do it all in the FPGA.

The really ironic part is that everyone on the project is acting like this job is even remotely difficult. I had forgotten what it was like to work for a defense contractor, the only hard part is figuring out what the real requirements are... anyone have some spare work to throw my way? I need something to keep my mind active :-)

Reply to
rickman

Xilinx's omissions, and the quest for that data, are not a cultish movement.

You still haven't explained just what the max currents for the various power and ground rails are. Or the max power the chip can actually safely dissipate without violating the die limits or seriously impacting the life of the die due to migration problems.

There are a lot more undocumented things to discuss after these basics, like the dynamic power for LUTs, FFs, local routing, long routing, BRAMs and the like, to get a handle, at the front end of design partitioning, on what the power requirements will be.

These are serious engineering issues for anyone that wishes to actually USE a significant portion of the chip's resources, rather than expect that 85% of the die will be idle.

Not even on-die measurement is enough without this data, as we can easily create a hot spot away from the diode that exceeds die temp specs by a factor of two or more.

This is not a religious debate ... this is real engineering up front for worst case loads, not tinkering in the lab with misguided retrofit cooling.

Reply to
fpga_toys

I certainly don't think it's impossible for some designs, it is however "difficult" with present tools.

Something of a chicken-and-egg problem. Major changes to the PAR design are VERY LIKELY to dribble out slowly over 5 years to avoid a major upset for the existing customer base.

Yep, in each of the large system design proposals I have made for the last year, that is a milled 1/4" copper plate heat sink in direct contact with the FPGA array, connected integrally with a chilled water heat exchanger. You just can not handle the heat density of a large RC array with air. Austin may think that needing the max parameters for these devices is a joke, but when you propose to put several thousand of them in a 1m cube, your customers actually demand thermal and power data as diligent engineering. While the lack of these specs for the Xilinx product may sit well with some, it's a critical road block for serious hit-the-ceiling-hard designs. Putting a significant fraction of a megawatt into a desktop box isn't child's play, or even a hobby project; it starts with a lot of engineering before the software design is even a dream.

Reply to
fpga_toys
