Re: Intel details future Larrabee graphics chip

I think the "it all always happens" thinking of analog circuit design is more like FPGA design than most programming languages.

Reply to
MooseFET

In article , MooseFET writes:
|> I think the "it all always happens" thinking of analog circuit design
|> is more like FPGA design than most programming languages.

Because most are Von Neumann! Prolog is much closer to analogue computers.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Look at virtual memory systems from the 60s ... CP40 (on a 360/40 with custom virtual memory hardware) and CP67 (on a 360/67, which came standard with virtual memory). The size of real storage and the relative page-miss latency to the paging drum (in processor cycles) is comparable to modern processor caches and their relative cache-miss latency to memory. Somewhat related earlier post in this thread:

formatting link
Future architectures

Besides the paging algorithm work mentioned above, as an undergraduate in the 60s I had also done a lot of scheduling algorithm and other performance related work (all of it shipping in the CP67 product). In the (simplified) morph from CP67 to VM370 ... a lot of that work was dropped.

I had moved a lot of the work (that had been dropped in the morph) to VM370 and made it available in internally distributed systems ... some recent posts with references:

formatting link
Error handling for system calls
formatting link
Yet another squirrel question

When the Future System project failed

formatting link

there was something of a mad rush to get stuff back into the 370 product pipeline (which had been neglected ... on some assumptions that Future System would replace 370). This was possibly some of the motivation to pick up & release much of the stuff that I had been doing (during the Future System period). Some recent references:

formatting link
Yet another squirrel question
formatting link
Unbelievable Patent for JCL

One of the features that I had added while moving a lot of my stuff from CP67 to VM370 ... was some scheduling cache optimization (with the increasing use of caches on 370 processors). Nominally, the system ran enabled for (asynchronous) I/O interrupts ... which can put a lot of downside pressure on the cache hit ratio. The scheduler would look at the relative I/O interrupt rate ... and change from generally enabled for I/O interrupts to mostly disabled, with periodic checks for pending I/O interrupts. This traded off cache-hit performance against I/O service latency.
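As an aside, here is a minimal C sketch of that policy, purely illustrative: the threshold, the polling interval and the helper names (io_irq_rate, drain_pending_io, etc.) are my assumptions, and the actual CP67/VM370 code was of course System/370 assembler, not C.

#define RATE_THRESHOLD 500    /* interrupts/sec above which polling wins (made-up figure) */
#define POLL_INTERVAL  2000   /* microseconds between explicit checks (made-up figure) */

/* Hypothetical platform hooks standing in for the real supervisor services. */
extern unsigned io_irq_rate(void);       /* recent asynchronous I/O interrupt rate */
extern void enable_io_interrupts(void);
extern void disable_io_interrupts(void);
extern void drain_pending_io(void);      /* check for and handle any pending interrupts */
extern void sleep_us(unsigned us);

void io_interrupt_policy(void)
{
    for (;;) {
        if (io_irq_rate() < RATE_THRESHOLD) {
            /* Low interrupt rate: stay enabled; I/O latency matters more
               than the occasional cache disturbance. */
            enable_io_interrupts();
        } else {
            /* High interrupt rate: run mostly disabled so dispatched work
               keeps its cache footprint, and drain interrupts in batches. */
            disable_io_interrupts();
            drain_pending_io();
        }
        sleep_us(POLL_INTERVAL);
    }
}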

For other topic drift ... there was a survey of a number of operations during the summer of '81 (which included some KL10 and VAX systems). This post has some excerpts from that survey (with some comments comparing time-sharing on CP67 and some KL10 systems):

formatting link
Other posts with other excerpts from that survey:
formatting link
~lynn/2006n.html#56
formatting link

Other past posts mentioning scheduling/performance work:

formatting link
Other past posts mentioning paging algorithm work:
formatting link

--
40+yrs virtualization experience (since Jan68), online at home since Mar70
Reply to
Anne & Lynn Wheeler

With long mantissas, like 100+ bits for 128-bit fp, the mantissa shift operation has to involve multiple integer registers, while using mod 256 allows byte moves (or on some architectures like x86, unaligned loads) to skip the shifts entirely.
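To make the contrast concrete, here is a small C sketch (my own, not Terje's code, and assuming a little-endian 128-bit mantissa): aligning operands in a binary-radix format needs a cross-word shift, while a radix-256 exponent makes every alignment a whole-byte move.

#include <stdint.h>
#include <string.h>

/* Binary radix: shifting a 128-bit mantissa right by n bits needs
   bit shuffling across all four 32-bit words (m[0] is least significant). */
static void shift_right_bits(uint32_t m[4], unsigned n)
{
    unsigned w = n / 32, b = n % 32;
    uint32_t r[4] = {0, 0, 0, 0};
    for (unsigned i = 0; i + w < 4; ++i) {
        r[i] = m[i + w] >> b;
        if (b != 0 && i + w + 1 < 4)
            r[i] |= m[i + w + 1] << (32 - b);
    }
    memcpy(m, r, sizeof r);
}

/* Radix-256 exponent: the shift count is always a whole number of bytes,
   so alignment is a plain byte move (m[0] is the least significant byte). */
static void shift_right_bytes(uint8_t m[16], unsigned nbytes)
{
    if (nbytes >= 16) { memset(m, 0, 16); return; }
    memmove(m, m + nbytes, 16 - nbytes);
    memset(m + 16 - nbytes, 0, nbytes);
}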

Terje

--
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

Indeed. But if you use a non-IEEE format most of the time, why not use it all the time? Using IEEE on 8-bit micros seems like overkill... Have you seen Steve Wozniak's amazingly compact 6502 FP emulation code? That's a reasonable format - with base-256 it would likely be faster still.

Note you can use XCH in the above code to get it down to 9 instructions.

Wilco

Reply to
Wilco Dijkstra

In the ARM (and some other processors), byte permutations and unaligned loads are achieved with shifts, so this "advantage" is quite processor dependent. Well, I guess that x86 and the general ability to do unaligned loads probably shifts the numbers towards byte ops as time goes on.
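As a rough C illustration of that trade-off (my own sketch, with invented function names): where unaligned loads are supported the value comes back in one access, and where they are not it is assembled from bytes with shifts, which is exactly where the byte-move advantage evaporates.

#include <stdint.h>
#include <string.h>

/* On hardware with unaligned access (e.g. x86) this is a single load;
   memcpy is the portable way to express it. */
static uint32_t load_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Without unaligned access the same value is rebuilt from individual
   bytes with shifts and ORs (little-endian byte order assumed). */
static uint32_t load_u32_shifted(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}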

Cheers,

--
Andrew
Reply to
Andrew Reilly

I vaguely remember being quite impressed with Bill Gates' five-byte floating point format used in TRS-80 Z-80 Basic. It seemed quite reasonable, under the circumstances. Both of those pre-date IEEE FP of course, so compatibility wasn't a concern.

Perhaps I was just easily impressed back then :-)

Cheers,

--
Andrew
Reply to
Andrew Reilly

Somebody else wrote the spec.

I have used up to 128-bit integers on an 8-bit machine. The numbers were really only good to 48 bits, but I needed to do some math without increasing the dirt (accumulated error in the low bits).
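For anyone curious what that looks like, here is a hedged C sketch of extended-precision addition the way an 8-bit machine does it, one byte and one carry at a time (my illustration, not the original code):

#include <stdint.h>

#define NBYTES 16   /* 128 bits, stored least-significant byte first */

/* a += b; returns the final carry out. This mirrors what a loop around
   an 8-bit ADC (add-with-carry) instruction does on a small micro. */
static uint8_t add128(uint8_t a[NBYTES], const uint8_t b[NBYTES])
{
    uint8_t carry = 0;
    for (int i = 0; i < NBYTES; ++i) {
        uint16_t sum = (uint16_t)a[i] + b[i] + carry;  /* widen so the carry is visible */
        a[i]  = (uint8_t)sum;
        carry = (uint8_t)(sum >> 8);
    }
    return carry;
}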


I've only seen it briefly. A 6502 is a very quick processor.

I don't see how the XCH can be used without ending up with the value rolled around by one byte. Since we may or may not go through this code, this doesn't seem like a very good option. I guess if I have plenty of code space, I could copy the following routines in 4 different versions.

Reply to
MooseFET

Then you have something wrong in your coding style. Normally, if some portability is taken into consideration during design, porting takes a few days to get first impressions. Usually what is needed is some tweaking of attributes in inferred memories, a new clocking scheme (PLL etc.) and maybe some IO pin instantiations if DDR or very high speed signaling is used. For some higher-level IP (transceivers, PCI Express etc.) more effort is needed, but that can be handled by portability layers.

With a third-party synthesizer, the scripts need only minor changes. And even with the integrated ones, some editor tweaking can usually get most of the scripts converted. STA is the most painful thing to convert; that will take some time. But that is only loosely part of the code, and the Synopsys SDC format is gaining ground in FPGA tools as well.

It is quite normal to code the FPGA to support many different chips, keep the vendors fighting on price to the end, and select the one that is cheapest ;)

--Kim

Reply to
Kim Enkovaara

I wasn't referring to just my code when I said "seen".

Do you use the assignment of the "Z" value to cause a tri-state? Quartus doesn't compile them.

We were talking of the tools that are free or low cost that come from the makers of the chips.

I really don't care which chip is the cheapest. In the market I'm in, nearly all electronic designs are a "cost is no object" sort of thing. The first 3 things on the list are reliability. Then come low power and light weight.

Reply to
MooseFET

When you assign "Z" to external pin - Quartus compiles it very well, thank you. On the other hand, internal tristate nodes are not supported by Altera architecture - how would you expect Quartus to synthesize them in this case? Infer muxes?

Reply to
already5chosen

Actually, MaxplusII had support for converting internal tri-states to muxes. Was that feature removed in Quartus?

Reply to
Muzaffer Kal

We never used Altera-provided VHDL or Verilog synthesis on MaxplusII - only Altera's AHDL or 3rd-party synthesis for standard HDLs. So I have no idea how Altera's synthesis on MaxplusII handled 'Z' - AHDL simply doesn't have it. On the other hand, on QuartusII we never used 3rd-party VHDL synthesis tools, since Altera's own are good enough. For me, that proves the VHDL front end was completely redone during the M->Q transition. I vaguely remember reading (on c.a.fpga?) that they bought a front end from somebody else. Does the new synthesis infer muxes for internal 3-state nodes? I don't know; nobody here has tried to code this way.

Reply to
already5chosen

Portability is also highly dependent on special features and IP blocks used. Some can be quite difficult to port. Older synthesizers didn't infer complex logic functions well so people relied on IP cores for simple things. There are loads of reasons designs may be difficult to port, though it's not often necessary (or considered during design).

You may think it's normal, but I've not seen it actually done. The projects I've been involved with are heavily weighted towards engineering costs. The cost of the FPGA, while pretty impressive on the face of it, is peanuts compared to the total system cost.

--
Keith
Reply to
krw


Does it have a way to direct the synthesizer to infer muxes? The last time I ran into this (I no longer code 'Z's for anything other than I/Os) was moons ago when I was using Synplify. Synplify could be directed to infer tristates or muxes.

Usually false economics.

Do you really care about the cost of the tools then? BTW, so far I've been in the same boat (cost is no object) but it seems that I'll have to try on the other boot soon.

--
Keith

Reply to
krw

Perhaps you have read it. That is one strange format though. Essentially using the sign bit in combination with the mantissa to squeeze one more bit of resolution out.

Reply to
JosephKK

The compiler itself does not need to define one; the language itself allows you to define one. IIRC, IEEE 754 also requires definition of the 64-bit form. Aren't the exponent offsets fun as well?
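For example, the binary32 bias is 127 and the binary64 bias is 1023; a short C snippet (mine, just for illustration) pulls the biased exponent field out of a double and removes the offset:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    double   d = 6.5;                /* 1.625 * 2^2 */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* reinterpret the double's bit pattern */

    unsigned biased   = (unsigned)((bits >> 52) & 0x7FF);  /* 11-bit exponent field */
    int      true_exp = (int)biased - 1023;                /* remove the offset */

    printf("stored exponent field = %u, true exponent = %d\n", biased, true_exp);
    return 0;   /* prints 1025 and 2 */
}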

Reply to
JosephKK

On Aug 29, 8:05 pm, JosephKK wrote: [... IEEE 754 ...] [...]

I had seen the implied-bit idea elsewhere. The idea of using up codes to specify things like NaN, +INF and -INF is something that I have found myself disagreeing with. I have never liked "in-band signaling". That is the only part of the standard I found to be strange.

I had to convert some all-positive fixed-point numbers to IEEE, so that is the only part I really spent much time on.
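A hedged sketch of that kind of conversion, in C rather than whatever was actually used, and assuming an unsigned 16.16 fixed-point layout with simple truncation (both the layout and the function name are my assumptions):

#include <stdint.h>

static uint32_t fix16_16_to_ieee754(uint32_t fx)
{
    if (fx == 0)
        return 0;                       /* +0.0 */

    /* Find the most significant set bit to normalise the value. */
    int msb = 31;
    while (!(fx & (1u << msb)))
        --msb;

    /* Value = fx * 2^-16, so the unbiased exponent is msb - 16. */
    int exponent = msb - 16;
    uint32_t biased = (uint32_t)(exponent + 127);   /* binary32 exponent offset */

    /* Align 23 mantissa bits below the implied leading 1 (truncating, no rounding). */
    uint32_t mant;
    if (msb > 23)
        mant = (fx >> (msb - 23)) & 0x7FFFFF;
    else
        mant = (fx << (23 - msb)) & 0x7FFFFF;

    return (biased << 23) | mant;       /* sign bit stays 0: all-positive input */
}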

Reply to
MooseFET

Have you done it? I am surprised if they have fixed it. I had to recode to use tri() to get the tri-state pins to work. I had defined a component with the tristate pins that I connected directly to the pins of the chip. The compiler would choke on it. The same code compiled on Cypress's "Warp" and produced exactly the tristated pin I expected.

I didn't do this, but yes, a compiler could easily turn it into something like this to send to the fitter:

Y = X1 & EN1 # X2 & EN2 # X3 & EN3;

There is no need to make a full mux, because there should only be one assignment of a non-Z value at a time, and thus only one EN# would be true.

Reply to
MooseFET

On Aug 29, 8:03 am, krw wrote: [....]

When the total sales of something are likely to be 10 units, a 100K tool just for that project does make cost an object, but this isn't the only, nor really my biggest, problem with them.

Imagine the product has been in production for 10 years and somebody discovers a (gasp) bug. You now have to fire up that old tool and fix the bug. With rented software you can't just do that; this means there will be an urge by management not to fix the bug, and a delay in fixing it.

Imagine that the vendor for a chip is standing in front of you with a chip that he claims will implement your logic equations. You have your equations and he has his chip. Until you have compiled them and put them into the chip, there is no proof that the chip will really do the job. Needing to pay a lot of money just to find out it won't work isn't something I like to do.

Reply to
MooseFET
