RAM <-> Processor

Hi, I'm wondering about the ever-increasing power requirements for processing, especially graphics-related.

It seems the move is towards parallelising the processing, such as dual/quad-core CPUs and parallel pixel fill in graphics engines.

Which seems reasonable.

But the way they do it is to have all the RAM in one place and all the processing in another, which leaves a remarkable bottleneck in between.

The incredibly high speed of the interface must be power hungry.

Transputers spring to mind, but these needed external RAM. Would it not be possible to integrate the RAM and processing on the same chip?

Some MCUs I'm using have 32k of RAM. It's so much more efficient to use internal RAM than to go via an external bus.

It would need a lot more RAM for graphics processing, although I'm using less than this to bitmap-drive a 1/4 VGA LCD. Say 100 16-bit micros with 64k of internal video RAM each might be quite clever.

If you could put them all onto just a few dies you'd have a fair amount of redundancy.

Some FPGAs have 1M gates, I've noticed; if this could be translated into RAM area it would be interesting.

PC graphics cards have 128MB or more; much of this I assume is for textures, which might be difficult to distribute.

Colin =^.^=

Reply to
colin

To the degree that a graphics application is parallelizable, you might see some benefits of lots of small, local memories and lots of processors... though how these are then coupled back to a video output is another issue. For example, fractals often fit this requirement. But for general game use, if that is your concern, things haven't been designed that way and it may be difficult to arrange.
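Fractals are a good illustration of why they fit: every pixel depends only on its own coordinates, with no shared state. A minimal sketch (names and tile geometry are made up for illustration) of a Mandelbrot escape-time kernel, where each processor+RAM block could own one tile:

```python
def mandel_iters(cx, cy, max_iter=50):
    # Escape-time count for one pixel. Each pixel depends only on its
    # own coordinates -- no shared state -- so pixels can be farmed out
    # to independent processor+local-memory blocks.
    zx = zy = 0.0
    for i in range(max_iter):
        zx, zy = zx * zx - zy * zy + cx, 2 * zx * zy + cy
        if zx * zx + zy * zy > 4.0:
            return i
    return max_iter

# One "tile" of the image: in the scheme discussed, each small
# processor would compute its tile entirely in its local RAM.
tile = [[mandel_iters(-2.0 + x * 0.1, -1.0 + y * 0.1) for x in range(4)]
        for y in range(2)]
```

Combining the finished tiles back into a single video stream is exactly the coupling problem mentioned above.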

On the question of textures, I remember a time when Intel was introducing the AGP interface (it was new, back then), which would operate very much like a fast PCI but with special relaxations on the way data is transported so that transfers could be streamlined. The idea was to remove the need for so much memory on the graphics boards and to rely instead upon system memory for the textures (a use that would then compete with the operating system's use of memory). I don't think this ever really took off the way Intel intended; it instead just became a fast graphics bus without in any way relaxing the add-on memories on graphics boards.

There is tremendous market pressure available to provide the financial drive for a successful business, available to anyone who can do things better/cheaper than is already being done. But any such initiative will have to deeply understand the needs of those who are already paying for the high end and how to achieve those results at a lower price and/or provide additional features that people are willing to pay for, at a price they are willing to pay.

Jon

Reply to
Jonathan Kirwan

Thanks. Getting the video output probably isn't that difficult; you could have a tree of multiplexers. For PC games graphics you're probably right that any advance needs to be compatible with existing graphics engines, which would rule this out to a large extent. But for small consoles, which are running quite hot these days and finding it hard to squeeze the processing power in, a distributed approach could be used in a start-from-scratch design.
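A toy sketch of the "tree of multiplexers" idea, assuming each micro owns one rectangular tile of the framebuffer (the function names, tile sizes, and power-of-two tile count are all assumptions for illustration):

```python
def mux(sel, a, b):
    # A single 2:1 multiplexer -- the leaf of the tree.
    return b if sel else a

def scanout_pixel(x, y, tiles, tile_w, tile_h, tiles_per_row):
    # Route the pixel from whichever micro's local RAM owns (x, y)
    # through a binary tree of 2:1 muxes. Assumes len(tiles) is a
    # power of two; real hardware would do this combinationally
    # during scan-out.
    tx, ty = x // tile_w, y // tile_h
    idx = ty * tiles_per_row + tx          # which tile owns this pixel
    outputs = [t[y % tile_h][x % tile_w] for t in tiles]
    level = 0
    while len(outputs) > 1:
        bit = (idx >> level) & 1           # select bit for this tree level
        outputs = [mux(bit, outputs[i], outputs[i + 1])
                   for i in range(0, len(outputs), 2)]
        level += 1
    return outputs[0]
```

With N tiles the tree is only log2(N) mux levels deep, which is why fanning dozens of small local framebuffers into one video output need not be a bottleneck.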

It needn't be restricted to just pixel filling either; the 3D framework processing etc. could be done in closer proximity, thereby easing another bottleneck, between main CPU and graphics CPU. The CPU makers seem to have admitted that significant performance increases require parallelism, but this never seems to have taken off; it's far more difficult to write software to take advantage of it, but maybe it will happen if there's more pressure to do so.

You could just have a system full of CPU+RAM blocks, adding more as you need to, just like adding more RAM to a normal PC.

Transputers never really took off, which always disappointed me; there's far more to something catching on than just being a very good idea.

I was more wondering if there was a technical reason, like the difference in process layers between CPU logic and dynamic/static RAM etc.

It would be nice to play around with the possibilities and see if the energy cost of performance comes out much better.

Colin =^.^=

Reply to
colin

You have 'something' here. Not long ago I attended a Linuxfest at which a rep from the computer branch of a large industrial company told a crowd of mostly young computer users that a symmetrical array of 1024 P3s outperformed the mighty "Cray" on speed of calculations. But to write the software you need brains, and this may be the stumbling block.

Have fun.

Stanislaw Slack user from Ulladulla.

Reply to
Stanislaw Flatto

You can get FPGAs with many Mb of memory internally *as well as* the sea of logic gates.

Look at say the Xilinx XC4VFX140 as an example:

formatting link
It has almost 10Mbit of 500MHz block RAM, 1Mbit of distributed RAM, DSP slices, and a PowerPC 405 processor in addition to all the general-purpose and specific-use logic.

Dave.

Reply to
David L. Jones

Wow, that sounds cool, I'll have a look at that :)

Maybe I'll throw a few dozen of them together and see what they can do...

Colin =^.^=

Reply to
colin

Hello Colin,

This chip is as expensive as a complete PC. The PC will outperform the XC4VFX140 by a factor of 10 to 100 when it comes to pure 64-bit floating-point math. Only if you have a very specific task in mind where you can use parallel fixed-point multipliers or other logic will such an IC bring you an advantage.

Best regards, Helmut

Reply to
Helmut Sennewald

Hi Helmut, yes they are very expensive; I found one from RS at £3000. I think any test will have to be done virtually. It might be a strain on a PC to simulate a bunch of these; I think even SWCAD might struggle here.

However, I'm interested in what can be achieved, not having a specific goal in mind.

I don't think the sort of processing I was talking about would involve much 64-bit FP math, although I don't know in detail things such as graphics processing or video codecs; however, pixel fill seems to be the most demanding on the RAM-processing bottleneck.

It's this bottleneck that I was interested in exploring. The way more processors keep appearing on the same CPU chip just puts more strain on it.

A chip which has integrated RAM and processing would obviously be a good thing to look at.

An FPGA is very flexible but can't compete with custom chips; it would only be useful to show the potential.

If a solution was found that had the potential to be mass-produced and used in place of ordinary RAM it would be rather interesting; however, for now I would think it will stay just an interesting idea, for the same reasons transputers have.

It's a chicken-and-egg situation: only if it catches on in a big way will it become mass-produced enough to be cheap, and only if it's cheap will it catch on.

The reason this came to my interest is that I had to replace a 450W PSU for my PC; I was advised that if I wanted to upgrade my PC I might need a bigger power supply. Design improvement is sort of my interest.

I do use my PC for some games; it relieves some of the stress of trying to design high-performance stuff, lol.

Colin =^.^=

Reply to
colin

[....]

What you really want is a way around the bottleneck, not a way to explore it. Perhaps even a way to completely bypass the town the bottleneck lives in would be worth exploring. I will suggest one idea below. I doubt it is the best, but I think it will help to show what I mean:

The video to the display will be RGB and sync information. Within the processor, it travels as (RGB) and (XYZ), where the Z component represents the distance back from the face of the CRT the object is to appear to be. The display information progresses through a pipeline like this:

              ------         ------                ------       ---------
RGB=0    ---->!      !------>!      !--.. etc ..-->!      !---->!         !
              ! Unit !  RGB  ! Unit !              ! Unit !     ! Display !
XY Z=inf ---->!  1   !------>!  2   !--.. etc ..-->!  N   !---->! screen  !
              !      !  XYZ  !      !              !      !     !         !
Angle    ---->!      !------>!      !--.. etc ..-->!      !      ---------
              !      ! ANGLE !      !              !      !
               ------         ------                ------

Each unit examines the Z value on its input; if the input Z is greater than the Z of the portion of its object at this X,Y location, it replaces the RGB and Z values with its own.

The angle information travels as precalculated sin() and cos() etc. values. It also contains the point-of-view information. Each unit constantly has to figure out what the object it is in charge of would look like from the viewer's point of view.

The main CPU of the system would no longer have to make up the pixels of what is to be seen but would instead have the job of working out the motions of the objects on the screen and updating the display units. This would happen over some sort of bus.


Reply to
MooseFET
