Custom CPU Designs

Tom Gardner 6 years ago

Here's a real-life example which has subtle problems.

Consider a token ring network in the presence of failures which split and reconnect the ring.

Your task is to have exactly one token circulating: not zero, not two. Prove that your solution meets that criterion.

jim.brakefield 6 years ago

It is fairly common for FPGA vendors to sell subsets of a physical chip by disabling a portion of the chip (for example effectively cutting the chip in half or disabling a processor core). Look at the number of configuration bits.

upsidedown 6 years ago

With Beckhoff bus terminals, you can stack a number of simple I/O modules together. A module could be as simple as one with 2 digital inputs or another with 2 digital outputs, or it could be a more complicated I/O module.

At the end of the stack you just attach a fieldbus module. This could be e.g. Modbus, CAN bus or in your case an EtherCAT module. You can change e.g. from CAN to EtherCAT by simply replacing the fieldbus module at the end. No need to disassemble the I/O module stack.

Even this example shows the problem of interfacing only a few I/O bits to a higher-level system. It doesn't make sense to build a fieldbus interface if you just need, say, 2 digital inputs. In this case, the Beckhoff bus terminal stack acts as a concentrator, so that discrete signals from a larger area are wired into the module stack.

While logically the EtherCAT protocol would allow nodes to effectively handle only a few digital I/O on each node, it is not economically practical. Of course, if EtherCAT I/O controllers could be made into 8 to 14 pin chips (2 x power, 2 x Ethernet, 2-4 digital I/O pins), the situation would be different. But you would still need two magnetics.

Would one really want to have a large number of such stations all around a plant, each exchanging only a few bits? One could use some hierarchical system instead, but then the expected advantage of EtherCAT is lost.

EtherCAT has the same reliability issues as 10Base2 and 10Base5 coaxial Ethernet, with a large number of connections on a single bus.

David Brown 6 years ago

Yes.

It does make sense in the automation world. A key point is that the bits get added or dropped where you want them dropped.

An EtherCAT slave would take more than one virtual core on an XMOS - probably 3-6, I would guess, depending on the features you want.

But if you only want a few bits of data you'd use a simple EtherCAT peripheral with digital IO, not a microcontroller at all.

EtherCAT is always logically a ring, and you can (if you want) have both ends connected back to the master. That means you can break the ring - either by accident, or while changing the live network - and things carry on as before.
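
The redundancy argument can be illustrated with a minimal sketch. It assumes a deliberately simplified model that only tracks cable connectivity; it does not model EtherCAT frame processing or port loop-back. The function name `reachable_slaves` and the link numbering are hypothetical, chosen only for this illustration.

```python
def reachable_slaves(num_slaves: int, broken_links: set[int], redundant: bool) -> set[int]:
    """Return the slaves the master can still reach.

    Assumed model (not taken from the EtherCAT specification):
    slaves are numbered 0..num_slaves-1 in cable order.
    Link 0 joins the master's primary port to slave 0,
    link i joins slave i-1 to slave i,
    link num_slaves joins the last slave back to the master's
    second port and is only used when `redundant` is True.
    """
    reached = set()

    # Walk outwards from the primary port until a broken link is met.
    for slave in range(num_slaves):
        if slave in broken_links:
            break
        reached.add(slave)

    # With the far end cabled back to the master, also walk inwards
    # from the second port.
    if redundant:
        for slave in range(num_slaves - 1, -1, -1):
            if (slave + 1) in broken_links:
                break
            reached.add(slave)

    return reached


# One break, far end connected back to the master: nothing is lost.
print(sorted(reachable_slaves(5, {2}, redundant=True)))    # [0, 1, 2, 3, 4]
# The same break without the return cable: everything downstream is lost.
print(sorted(reachable_slaves(5, {2}, redundant=False)))   # [0, 1]
# Two breaks: the slaves between them are cut off.
print(sorted(reachable_slaves(5, {1, 3}, redundant=True))) # [0, 3, 4]
```

In this model a single break never isolates a slave when both ends go back to the master; it takes two breaks to cut off the section between them.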

Different solutions work best in different circumstances. EtherCAT is not for everyone.

David Brown 6 years ago

Yes, that may well be the way to do it. (I'd guess you could split up sections a bit more than that, especially if you are willing to relax the timing specifications for routing a little.) But even with the suggested half-disabling, it could be worth it if your yields are low. Suppose that 30% of your 50 kLUT chips have a fault - that means 70% can be sold. 70% of the remaining ones - about 21% of all dies - can then be sold as 25 kLUT devices. These are "free".
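
The yield arithmetic above can be checked with a short sketch. The 30% fault rate and the 70% salvage rate are the figures from this post; the function name `sellable_fractions` and the assumption that a faulty die is either salvageable as a half-size part or scrapped are illustrative only, not a description of any vendor's actual process.

```python
def sellable_fractions(fault_rate: float, salvage_rate: float) -> tuple[float, float]:
    """Split all dies into full-size and half-size sellable fractions.

    fault_rate   -- fraction of dies with at least one fault
    salvage_rate -- fraction of the faulty dies that still work as a
                    half-size part (hypothetical parameter)
    """
    full_size = 1.0 - fault_rate            # fault-free dies
    half_size = fault_rate * salvage_rate   # faulty dies sold at half capacity
    return full_size, half_size


full, half = sellable_fractions(fault_rate=0.30, salvage_rate=0.70)
scrapped = 1.0 - full - half
print(f"50 kLUT: {full:.0%}, 25 kLUT: {half:.0%}, scrapped: {scrapped:.0%}")
# 50 kLUT: 70%, 25 kLUT: 21%, scrapped: 9%
```

Under these assumptions, salvaging cuts the scrapped fraction from 30% of all dies to about 9%.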

All big IC designs are made with a view to minimising the waste due to production faults, because faults are not uncommon with big chips that push the limits for production. Multi-core CPUs are regularly made with more cores, and sold as fewer core parts where faulty cores are disabled. The same applies to memory of all types. And I know that Altera certainly used to have an option to buy pre-programmed devices to fit your design - these were cheaper because they could use dies that had faults which did not affect your particular design.

Grant Edwards 6 years ago

What makes you think the multi-core EtherCAT slave is exchanging only a few bits? The ones with multi-core processors are typically I/O hubs that can handle many hundreds of bits. You asked what the point is of using a multi-core processor in an EtherCAT slave. I told you why people design them that way: because they need the CPU power to handle other protocols simultaneously or to do things like image processing.

Generally, the multi-core EtherCAT slave _is_ part of a hierarchical system. For example the EtherCAT slave might be an IO-Link master with 8 attached IO-Link sensors, each of which can handle 32 bytes of input and 32 bytes of output.

You seem to be arguing against using a multi-core processor in an EtherCAT slave that does nothing other than handle a few bits of DIO.

Nobody does that.

Nobody is proposing that.

You were worried the entire network was susceptible to single-point connector failure. With a ring, it's not: you'd need a two-point failure to lose comms.

upsidedown 6 years ago

My main point was that if you are going to transfer a significant number of bytes per node (say at least 10 bytes), why use EtherCAT in the first place?

You could then use any standard garden-variety RS-422/485 or 10/100/1000BaseT hardware with some standard protocol, even Modbus RTU/UDP/TCP.

If the node complexity justifies using xCore, then it is most likely going to transfer a lot of data to the outside world.

The EtherCAT+xCore combination doesn't make much sense, but EtherCAT alone or xCore alone can be quite competitive in their own niches.

Rick C 6 years ago

I'm trying to explain that they don't test the chips to "bin" them and sell them according to their capacity. They simply design a die to have X capacity that is also sold as Y capacity. The dies are tested according to how they want to sell them, and if they don't pass they are trashed, whichever size they were being tested for. Apparently they don't find it worthwhile to test and retest.

I think on most devices if you have a failure rate high enough to make binning worthwhile you have process problems that need to be addressed.

I was told they were cheaper because the testing time is shorter and test time is a significant portion of the cost of making and verifying the chip. Just considering the routing, imagine how many times they have to reconfigure the device to exercise every routing segment.

The largest chips in any FPGA line may have significant failure rates, but for the bread and butter products they don't have a low enough yield to worry with how many die are rejected due to testing failures.

The real reason they use the same die for more than one product is because the cost of the mask sets is so high. They make more money selling a die at half capacity rather than making two different designs.

Rick C. --+-+ Get 1,000 miles of free Supercharging --+-+ Tesla referral code - https://ts.la/richard11209

Grant Edwards 6 years ago

Because that's what the rest of the plant is using. Not all EtherCAT nodes are identical. Many may only be exchanging a few bits. Some need to do more. The nodes that need to do more may need more processing power.

That requires a whole new cabling infrastructure.

Grant

Grant Edwards 6 years ago

QoriQ?

Wow. That name is stunningly, amazingly bad. Do silicon vendors send people to some specialized school where they learn to come up with the most awful product line names possible?

Grant

Dimiter_Popoff 6 years ago

They had that "digital DNA" before, not much better :-). Someone in their marketing may think they are in the business of selling soap or chocolate... Of course it does not matter much, how many of us would pay attention to the marketing name when choosing a platform - and their products are really good. OTOH I am not sure to what extent the likes of us here have much to say in big corporations when it comes to platform selection so things like that may have cost them.... or made them profit, I would not bet much on which of the two.

Dimiter

Grant Edwards 6 years ago

I remember being at an Embedded Systems Conference during the "Digital DNA" campaign and seeing that phrase on T-shirts, tote bags, ID badge lanyards, etc. I even sat through a "Digital DNA" video presentation at one point during the conference. Neither I nor anybody I talked to had the faintest idea what "Digital DNA" was supposed to mean or whether it referred to anything concrete or not.

Grant

David Brown 6 years ago

I know that this is done with some devices, certainly. For one of Atmel's AVR devices, the sole difference between the 64K version and the 32K version was the text printed on the package.

(Long ago we used to use a microcontroller that had 8K of OTP memory. Then we discovered that the 32K version was significantly cheaper. This was because the 8K version was made by producing a 32K version and then running an extra step to program 24K of the memory to zeros.)

I am not privy to the testing or binning procedures for FPGAs. Your suggestions sound perfectly reasonable to me. The suggestion that they use binning for some parts is also perfectly reasonable, and I know it is done on some other big chips. But I have no idea which is used for FPGAs.

Some devices /do/ have high failure rates - particularly in early stages of development or for low volume parts.

That also sounds reasonable. It is not the explanation I heard, but I have no way to judge which system might be used. (Or maybe it's a combination, or maybe it has changed, or varies for different parts or different manufacturers.) There is little point in guessing.

Rick C 6 years ago

Did they not disable the extra memory in some way? Even if most of the chips work over the full 64K the possibility of a problem from the chip not being fully tested might be enough of a deterrent that people won't buy the 32K part to use as 64K.

I'm just telling you what I was told by the FPGA company representatives who used to post in c.a.fpga years ago. One was particularly argumentative and the company got them to stop. Another was Peter Alfke who was an FPGA industry icon.

I don't know of any low volume FPGAs or MCUs other than perhaps very old end-of-life product. Yield issues hugely impact the cost of the final product because you not only pay for the bad dies, but also for all the testing time to show the bad dies are bad. That's probably why they don't retest for the "slower" or "smaller" bins: the chances of failing that test are probably much higher and so even more costly per good unit.

Rick C. --++- Get 1,000 miles of free Supercharging --++- Tesla referral code - https://ts.la/richard11209

Theo 6 years ago

MIPS is... complicated:

formatting link

I would not choose it for a new design at this point.

Theo

Rick C 6 years ago

Yeah, but why exactly? Is the concern that there won't be future updates/upgrades to the architecture? Tool support will wane? With the various parties involved dropping like flies, it seems like it would be a choice where no one would have an interest in bothering to collect royalties.

But then what do I know? This might set you up to where five years from now you end up having to pay not only royalties, but penalties.

Rick C. --+++ Get 1,000 miles of free Supercharging --+++ Tesla referral code - https://ts.la/richard11209

George Neuner 6 years ago

They get names from the same source as do pharmaceuticals.

David Brown 6 years ago

In this particular case, no the memory was not disabled - but it (so I heard) was fully tested. Basically, the 64K parts were more popular (due to a few big customers), and it was cheaper for the company to make and test more of them than to have a different setup for the 32K parts. It was perhaps only a temporary measure - one hears stories, but never full details.

Fair enough.

I believe that for some parts there are spot-checks on dies - they take a few of the parts and test them for speed, temperature, power, etc., and use that information for binning all parts on the wafer.

But again, I have no idea if that applies to any particular part.

Theo 6 years ago

If the IP situation is unclear, companies may hold back in case someone litigious ends up as the owner of the IP (see SCO).

It appears there is only one remaining Linux maintainer, LLVM has no maintenance, FreeBSD is likely to drop MIPS in 2022. Most end users are unlikely to put in the effort to keep the toolchains current.

Theo
