TILE64 embedded multicore processors - Guy Macon

Nick Maclaren wrote:

Someone making misleading claims about microprocessors? I am shocked - shocked, I tell you. What next? Musicians using drugs? Politicians making promises they can't keep? :)

Found another interesting web page:

Near Speed-of-Light On-Chip Electrical Interconnects

formatting link

See Figure 1.4 and section 2.2.1: Resistance-Capacitance (RC) Lines: Single, Continuous Across-Chip Interconnects

--
Guy Macon
Reply to
Guy Macon

The mistake is that you are calculating the wrong thing; the speed of light isn't the only factor that affects the propagation delay of a wire across a chip.

BTW: if you are interested in the answer, why did you set the follow-up to alt.dev.null?

Reply to
Patrick de Zeester

At this point in time they are probably more interested in investors than clients.

Reply to
Patrick de Zeester

Look at the equations for an RLC transmission line and then let R be >> wL.
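For reference, a standard textbook sketch of that limit (per-unit-length R, L, G, C; nothing here is specific to any process):

  \frac{\partial V}{\partial x} = -(R + j\omega L)\,I , \qquad
  \frac{\partial I}{\partial x} = -(G + j\omega C)\,V , \qquad
  \gamma = \sqrt{(R + j\omega L)(G + j\omega C)} .

With R >> wL and G ~ 0 this reduces to \gamma \approx \sqrt{j\omega R C}: the wire obeys a diffusion equation rather than a wave equation, and the delay to the Vdd/2 threshold grows roughly as 0.4 R C l^2 with length l (the standard distributed-RC result), not as l/c. That quadratic growth in length is why long on-chip wires get broken up with repeaters.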

Chip wires are not transmission lines like those on cards or boards. I can't be any clearer than that.

Remember that the switching threshold is Vdd/2 in general.

Reply to
Del Cecchi

The closest analogue would be really long telephone lines (circuit length greater than 5 miles). Google for Oliver Heaviside and the telegrapher's equation - and why loading coils are used on long lines.

That is the key point: with digital electronics we want the input to the gate to swing a good fraction of Vdd, as the gate isn't capable of doing much in the way of signal processing. Even if it were possible to implement better signal processing, that processing would most likely slow the gate.

Getting back to the example of long phone lines: an RC transmission line is dispersive, so the velocity of propagation varies with frequency. With the exception of the clock, the signals have widely varying frequency components, which defeats any attempt to overcome the attenuation of the distributed RC low-pass filter.
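Heaviside's fix is worth writing down (a standard result, same per-unit-length R, L, G, C as above): the line is distortionless when

  \frac{R}{L} = \frac{G}{C} ,

in which case the attenuation \alpha = \sqrt{RG} and the velocity v = 1/\sqrt{LC} are both frequency-independent, so pulses keep their shape. Loading coils approach the condition by raising L; deliberately raising the leakage G also works in principle, at the price of extra attenuation.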

Things may change if someone figures out how to use carbon nanotubes as interconnects, as they have substantially higher conductivity than a similarly sized copper conductor. I'm not going to hold my breath waiting for this to happen.

- Erik

P.S. A good reference is "High-Speed Digital Design: A Handbook of Black Magic" by Johnson and Graham.

Reply to
erik magnuson

Fascinating! They do appear to have a lot in common.

Some especially interesting web pages I found while searching:

The Telegrapher's Equation

formatting link

Transmission-Line Equations

formatting link

Telegrapher's Equations

formatting link

US Patent 154185 (1925): Electrical Wave Transmission

formatting link

Thanks! I will order it.

I am also beginning to see an interesting pattern: paper after paper saying that semiconductor researcher Y has developed scheme X that is supposedly better than the usual repeater scheme, but somehow none of them end up on pages saying that scheme X has become the standard way of doing things when designing ICs. Here is one example:

---------------------------------------------------------------
Surfliner: a distortionless electrical signaling scheme for
speed-of-light on-chip communications
Hongyu Chen; Rui Shi; Chung-Kuan Cheng; Harris, D.M.
Computer Design: VLSI in Computers and Processors, 2005. ICCD 2005.
---------------------------------------------------------------
Summary: We present a novel scheme to implement distortionless
transmission lines for on-chip electrical signaling. By introducing
intentional leakage conductance between the wires of a differential
pair, the distortionless transmission line eliminates dispersion
caused by the resistive nature of on-chip wires and achieves speed
of light transmission. We show that it is feasible to construct
distortionless transmission lines with a conventional silicon
process. Simulation results show that using 65nm technology, the
proposed scheme can achieve 15Gbits/s bandwidth over a 20mm on-chip
serial link without any equalization. This approach offers a six
times improvement in delay and 85% reduction in power consumption
over a conventional RC wire with repeated buffers.
---------------------------------------------------------------

--
Guy Macon
Reply to
Guy Macon

This being the same Agarwal who said that floating point is superfluous and irrelevant recently at HPEC at Lincoln Labs. Of course, he said that to a bunch of embedded systems people who live and breathe things like signal and image analysis... for whom floating point is a requirement, not a feature.

BTW: Internals of how the TILE64 works can be gleaned by looking at how its predecessor, RAW, worked. The two are not particularly different, evidently.

-scooter

Reply to
Scott Michel

In article , Scott Michel writes:
|> This being the same Agarwal who said that floating point is
|> superfluous and irrelevant recently at HPEC at Lincoln Labs. Of
|> course, he said that to a bunch of embedded systems people who live
|> and breathe things like signal and image analysis... for whom
|> floating point is a requirement, not a feature.

Oh, really? Those are classic requirements for fixed-point, which was heavily used until floating-point became so standard and cheap that it was easier to change. Indeed, there was a time when floating-point was never used for such purposes!

But how many people nowadays know even that you CAN do numerical programming (and analysis) in fixed-point? :-(

Regards, Nick Maclaren.

Reply to
Nick Maclaren

I wouldn't go quite as far as that, but there was a lot of signal and image analysis done before floating point got cheap enough to be a reasonable replacement for fixed point math. Having done a bit of both, I certainly wouldn't want to go back to the old ways, so "superfluous and irrelevant" is only true if one is willing to do things the hard way just to make the life of the chip designer easier. :)

My big problem with the Tilera TILE64 hype is the obviously bogus claim that they are shipping product. I asked a Fortune 500 client I do consulting work for whether they might have an interest, and they approved my buying a development board and writing up a review / recommendation. Tilera didn't reply. That's a sure sign that they are still at the stage where they are looking for investors so that someday they can ship a product. If they had an actual product to sell they would have been all over me. A sham, really. I really wanted to study the TILE64, and my client would have bought a boatload of them if they had turned out to be a good fit for the application. :(

It does look that way. Alas, they appear to have abandoned the main idea that caused them to pick the name RAW; "Raw exposes its interconnect, I/O, memory and computational elements to the compiler. This exposure allows the software system to allocate resources and coordinate data flow within the chip in an application- specific manner." Instead they have decided to write about how great the TILE64 is without ever revealing the details about *how* it works. This may mean that they not only have nothing ready to ship, but don't even have a completed design...

Here is info about RAW: [

formatting link
]

--
Guy Macon
Reply to
Guy Macon

In article , Guy Macon writes:
|> I wouldn't go quite as far as that, but there was a lot of signal
|> and image analysis done before floating point got cheap enough to
|> be a reasonable replacement for fixed point math. Having done a
|> bit of both, I certainly wouldn't want to go back to the old ways,
|> so "superfluous and irrelevant" is only true if one is willing to
|> do things the hard way just to make the life of the chip designer
|> easier. :)

Not JUST that! While I have done only a little of it, accuracy and bounds analysis is often quite a lot easier in fixed-point, and that is important if you are trying to build an almost perfect system. I agree that floating-point makes it a lot easier to get results, but it does make it harder to prove that your program won't fail, whatever the input.
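A trivial illustration of why (a minimal sketch, assuming a Q15 format; q15_t and q15_mul are names made up for this example, not from any particular library): the worst-case rounding error of a fixed-point multiply is a fixed absolute quantity, independent of the operands, so the error bound for a whole expression is just a sum of known constants.

#include <stdint.h>

typedef int16_t q15_t;              /* value = raw / 2^15, in [-1, 1) */

/* Round-to-nearest Q15 multiply.  The absolute error is at most
   2^-16 whatever the operands.  Assumes arithmetic right shift of
   negative values, which holds on essentially all embedded
   compilers. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* exact Q30 product */
    p = (p + (1 << 14)) >> 15;             /* round to nearest  */
    if (p > INT16_MAX) p = INT16_MAX;      /* saturate -1 * -1  */
    return (q15_t)p;
}

Contrast that with floating-point, where the rounding error is relative to an intermediate result you may not have tight bounds on.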

Of course, there are SOME cases where accuracy and bounds analysis is easier in floating-point, and there might even be a few cases where it is easier in IEEE 754 than in traditional models. I don't know of any, but I am not denying their possibility.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Good point.

Which would you prefer in your anti-lock brakes? I would tend to prefer fixed point...

--
Guy Macon
Reply to
Guy Macon

Scaled, fixed-point arithmetic is still useful today. Transcendental functions are an order of magnitude or more faster in fixed-point for some of the smaller micros.
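As a concrete illustration, here is the classic table-plus-linear-interpolation sine (a minimal sketch; the Q15 scaling, table size, and the "binary angle" convention where 0x10000 is one full turn are choices made for this example, not from any particular library):

#include <stdint.h>

/* First-quadrant sine table in Q15 (32767 ~ 1.0), one entry per
   5.625 degrees, padded so idx+1 is always a valid index. */
static const int16_t sin_tab[18] = {
        0,  3212,  6393,  9512, 12539, 15446, 18204, 20787,
    23170, 25329, 27245, 28898, 30273, 31356, 32137, 32609,
    32767, 32767
};

int16_t q15_sin(uint16_t a)          /* a: 0x0000..0xFFFF = 0..360 deg */
{
    int neg = 0;
    if (a & 0x8000) { neg = 1; a -= 0x8000; }  /* sin(x+180) = -sin(x) */
    if (a & 0x4000) a = 0x8000 - a;            /* sin(180-x) = sin(x)  */
    unsigned idx  = a >> 10;                   /* table segment, 0..16 */
    unsigned frac = a & 0x3FF;                 /* position in segment  */
    int32_t  s = sin_tab[idx];
    s += ((int32_t)(sin_tab[idx + 1] - s) * (int32_t)frac) >> 10;
    return (int16_t)(neg ? -s : s);
}

No division, one small multiply, and the worst-case interpolation error is about 0.1% of full scale - plenty for most control work, and a far cry from the cost of a double-precision sin() on a small micro.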

Reply to
Everett M. Greene

Agreed. We wrote a fixed-point transcendental library that is very useful on some very small 8-bit micros, as part of C compiler support for ISO/IEC 18037. It was amazing how much of the implementation depended on 40-year-old software technology. The trade-off seems to be lower dynamic range and resolution controllable by selecting data types, in exchange for fast execution time.

We are starting to see many potential applications. Control systems and instrumentation to name two.

Walter Banks.

Reply to
Walter Banks

Walter, got any size/speed metrics on that, and how does it compare with the more classic IEEE floating-point libraries?

-jg

Reply to
Jim Granville

In article , Jim Granville writes:
|> Got any size/speed metrics on that, and how does it compare with
|> the more classic IEEE floating-point libraries?

There are few, if any, classic IEEE 754 floating point libraries.

No, I am not joking. Almost all libraries use a hacked IEEE 754 model, because the full one is such a pain, and doesn't make a great deal of sense for many functions.

And the answer is "it depends". It can be negligible, or it can be huge, often depending critically on how accurately you want to calculate the values at the extreme ranges. sin(1.0e300) or pow(1+1.0e-15,1.0e15), for example.
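To see why the second example is nasty, note that (1 + \varepsilon)^{1/\varepsilon} \to e, so

  \mathrm{pow}(1 + 10^{-15},\, 10^{15}) \approx e \approx 2.71828\ldots ,

while the condition number of pow(x, y) with respect to x is y: any relative error \delta absorbed into the base multiplies the result by roughly \exp(10^{15}\,\delta). With \delta around half an ulp of 1.0 in double precision (about 1.1 x 10^-16), that is already an error of up to roughly ten percent unless the library carries extra precision internally. sin(1.0e300) is similar: the argument reduction modulo pi has to be carried out to hundreds of bits.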

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Those of us who still use Forth for our mainstay environment know it's faster with fractions.

--
Reply to
Paul E. Bennett

Do you have any suggestions for meaningful metrics or meaningful benchmarks?

The quick answer is that adds and subs take the same execution time as integers of the same size and are done inline. Multiply and divide are very similar in execution time to integers of the same size. We implemented the fixed-point mul and div libraries separately from the integer libraries because we want different bytes from the results: the result of multiplying two fracts is in the MS byte(s), while for two ints it is in the LS byte(s). With separate libraries we could use different implementation algorithms and optimize for size and cycles.
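In C terms, the byte-selection point looks like this (a minimal sketch with 8-bit types; the helper names are made up for illustration):

#include <stdint.h>

/* An 8 x 8 multiply produces a 16-bit product; which byte you keep
   depends on what the operands represent. */

/* Integers: the result is in the LS byte (assuming no overflow). */
static int8_t int8_mul(int8_t a, int8_t b)
{
    int16_t p = (int16_t)a * (int16_t)b;
    return (int8_t)p;                      /* low byte */
}

/* Q7 fracts (values in [-1, 1)): the product is Q14, so the useful
   bits land in the MS byte after a 7-bit shift.  A real library
   would also saturate the -1 * -1 case. */
static int8_t q7_mul(int8_t a, int8_t b)
{
    int16_t p = (int16_t)a * (int16_t)b;   /* Q14 product */
    return (int8_t)(p >> 7);               /* high byte   */
}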

The transcendental library was an interesting project. As soon as I understood that the goal of fixed point is precision rather than dynamic range, the algorithmic focus changed. (I spent a day looking for the cause of a test suite failure that turned out to be the compiler's internal math being done in IEEE 754 with less than the 32-bit resolution needed for an 8.24 accum.) Some things just work better in fixed point; sin and cos, for example.

One thing I have found is a big difference in fixed-point functionality as soon as the compiler gets involved. The source code starts looking familiar. Porting applications from IEEE 754 float is easier than I expected: in many well-implemented applications you just select the appropriate fixed-point type for each variable and recompile.

Nick's comment is relevant. Our IEEE 754 support is somewhere between a minimalist and a full implementation for the 8-bit micros we mostly support. (A trade-off between execution time and application requirements.)

Walter Banks..

Reply to
Walter Banks

In article , Walter Banks writes:
|> Some things just work better in fixed point; sin and cos, for
|> example.

Grin :-) Modern Computing Methods, anyone? My first computing book.

|> One thing I have found is a big difference in fixed-point
|> functionality as soon as the compiler gets involved. The source
|> code starts looking familiar. Porting applications from IEEE 754
|> float is easier than I expected: in many well-implemented
|> applications you just select the appropriate fixed-point type for
|> each variable and recompile.

Yup. Clean numeric code is very robust. Floating-point, fixed-point, fixed-slash, whatever.

|> > There are few, if any, classic IEEE 754 floating point libraries.
|>
|> Nick's comment is relevant. Our IEEE 754 support is somewhere
|> between a minimalist and a full implementation for the 8-bit micros
|> we mostly support. (A trade-off between execution time and
|> application requirements.)

Quite. You and virtually every Tier 1 vendor. The only system that I know of that is full gung-ho IEEE 754 (and C99) is Sun ONE Studio 9 or later on Solaris 10 or later. I have heard rumours of others but, like the Elephant's Graveyard, the closer you get to them the less definite the rumours are.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Thanks - the ones below are a start - simple is good on a uC, though you could also do a benchmark or two. Something like a polynomial sensor-curve correction could be a still-small but practical uC benchmark; see the sketch below.
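Something like this, perhaps (a sketch only: the coefficients are made up, and it assumes a compiler with ISO/IEC TR 18037 <stdfix.h> support):

#include <stdfix.h>

/* Third-order sensor-curve correction by Horner's rule: three
   fixed-point multiply-adds per sample, small enough for any micro
   and easy to check against a high-precision calculator. */
static const accum c3 = -0.0031k, c2 = 0.0214k,
                   c1 =  1.0125k, c0 = -0.0840k;

accum correct(accum x)
{
    return ((c3 * x + c2) * x + c1) * x + c0;
}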

I found this on the web, after similar frustrations....

formatting link
A nice, easy-to-use, (very) high-precision calculator, so you know the 'real' answer to a calculation.

Any examples of how 'fixed' the point is?

Reply to
Jim Granville

We implemented 8-, 16-, and 24-bit fracts and 8.8, 8.16, and 8.24 accums (ISO/IEC 18037), plus 16.8 and 16.16. The primary purpose is 18037 support. A year from now we will know how important 16.16 is. We implemented _Accum and _Fract with short and long modifiers, and defined these with size-specific types as well.

_Accum a, b, c;    // this is an 8.16 accum

. . . .

a = b + 6.736;

c = (a * 0.654) + (b * 0.274) + 25;

w..

Reply to
Walter Banks
