Graphics card accelerator vs. FPGA: which is better for the following task?

Dear all,

I guess this is a ray-tracing problem... But I need to do this task at the highest possible speed/throughput. Here is my problem:

Suppose I am given 25 rays and I am given a 3D cube and all parameters of these rays and cube are given...

I need to compute the length of the segment of each ray that intersects this cube, as fast as possible. If a ray falls completely outside the cube, the output is 0; otherwise it is the length.

I heard there are some very good graphics cards with accelerators... and I heard the bus bandwidth can be as high as 500MHz... I am not sure whether they have good acceleration functions for my task?

I am also thinking of doing this with an FPGA hooked onto an Intel PC with Linux... I don't know the details, but I guess it uses PCI or some other bus to interact with the CPU and serves as a coprocessor...

I want to know which method is better?

Considering that after solving this throughput problem, the next bottleneck will be the 1GB of memory that I need... I wonder if the graphics card has 1GB of cache/memory inside it? Since a lot of the time it needs to do triple-buffering, I guess... it should have a huge high-speed memory, right?

I also don't know how the maximum processing speed of a high-end graphics card compares with a high-end FPGA implementation.

Can anybody give me some comments/suggestions/advice/hints/pointers on this?

Thanks a lot,

-Walalal


Hi!

- Are the edges of the cube parallel to the coordinate axes?

- Is there anything special about the cube (size, orientation, rotation, location, etc.)?

- In what format are the rays and the cube defined?

Possibly no. First, you would have to get data *back* from the accelerator which is something they are not designed for. As someone said to me once, they operate in a 'write and forget' mode. Second, none of the accelerators I know of do ray-tracing. However your question is not a complete ray-trace problem, so you might be able to tweak the functions of an accelerator to give you your answer.


That's a possibility. You can find PCI FPGA prototyping cards for this purpose.

BTW, if you need to process 1GB of data (assume that's the total amount of traffic) you would need at least 7.75 seconds just to transfer the data over a 33MHz, 32-bit PCI bus, not counting other PCI traffic and other issues. If that's too slow, you would need a) a 66MHz, b) a 64-bit, or c) a PCI-X bus, and of course a PC that supports these.
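The figure above can be roughly checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the theoretical peak rate (one full-width transfer per clock, no protocol overhead), takes 1 GB as 2^30 bytes, and uses nominal clocks; the 133MHz PCI-X entry is an assumption added for comparison. The exact result shifts a little depending on how you define 1 GB and which clock you plug in, so expect the same ballpark rather than exactly 7.75.

```python
def transfer_seconds(total_bytes, clock_hz, bus_bits):
    """Best-case seconds to move total_bytes, assuming one full-width transfer per clock."""
    bytes_per_second = clock_hz * bus_bits / 8
    return total_bytes / bytes_per_second


ONE_GB = 2**30  # assumption: 1 GB = 2^30 bytes

# Nominal clocks and widths; sustained rates on real systems are lower.
BUSES = {
    "PCI 33 MHz / 32 bit": (33e6, 32),
    "PCI 66 MHz / 32 bit": (66e6, 32),
    "PCI 66 MHz / 64 bit": (66e6, 64),
    "PCI-X 133 MHz / 64 bit": (133e6, 64),  # assumed clock, for comparison only
}

for name, (clock_hz, bus_bits) in BUSES.items():
    seconds = transfer_seconds(ONE_GB, clock_hz, bus_bits)
    print(f"{name}: {seconds:.2f} s")
```

These are lower bounds; real transfers share the bus and rarely reach the peak rate.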

Depends on many things:

- Speed requirement (as fast as possible is not enough)

- Numerical Precision

- Price concerns

- Project deadlines

- More precise description of the problem (see above)


None I've heard of.

Huge means 128-256MB nowadays.

Impossible to answer in general.


What I would suggest is to write a software-only solution for your problem. That would give you, or anyone else, a pretty good definition of the problem. After this, you will probably be able to ask much better questions.

Regards, Andras Tantos


Hi, Andras,

Thank you very much for your answer!

I guess the first thing I need to make clear to myself is the essence of this problem. Is it a ray-tracing problem or a collision-detection problem?

I need to identify the name of the problem first; then I can go out and search for similar application cases...

Can you help me with that?

Thanks a lot,

-Walala


Hi!

I would think your problem is an intersection problem, but only you can find out the true nature of your problem.

Andras


Let the cube be given by three normal vectors n1 to n3 and six points p1 to p6 on the six planes. (Actually you can use the same point multiple times.) Assume your rays start in the origin and are given by a vector r of length 1. Then the intersection with the first plane happens at a distance d1 = (n1.p1)/(n1.r) = (n1.p1) * (1/(n1.r)), and likewise for the other five planes.

Then you order the planes according to d. If the ray does not cross the three front planes first, the cube is missed; otherwise the difference between the fourth and the third distance is the length of the intersection.

So for each ray you get three divisions, four multiplications and a couple of min/max cells. (Many more optimizations are possible due to symmetries.)
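For anyone who wants a software reference first, here is a minimal Python sketch of the plane-distance idea above. It is not a definitive implementation: instead of sorting all six distances it takes the min and max per pair of parallel faces (often called the slab method), which should give the same entry and exit distances. The function names, the grouping of the six points into three pairs, and the handling of rays parallel to a face are my assumptions.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def chord_length(r, normals, points):
    """Length of the part of the ray d*r (d >= 0) that lies inside the box.

    r        unit direction; the ray starts at the origin
    normals  (n1, n2, n3), one normal per pair of parallel faces
    points   ((p1, p2), (p3, p4), (p5, p6)), one point on each face of a pair
    """
    entry, exit_ = 0.0, float("inf")  # d >= 0 because the ray starts at the origin
    for n, (pa, pb) in zip(normals, points):
        a, b = dot(n, pa), dot(n, pb)
        denom = dot(n, r)
        if denom == 0.0:
            # Ray is parallel to this pair of faces (assumed handling):
            # it can only hit the box if the origin lies between them.
            if min(a, b) > 0.0 or max(a, b) < 0.0:
                return 0.0
            continue
        da, db = a / denom, b / denom  # d = (n.p) / (n.r)
        entry = max(entry, min(da, db))  # latest entry into any pair
        exit_ = min(exit_, max(da, db))  # earliest exit from any pair
    return max(0.0, exit_ - entry)


# Example: axis-aligned box spanning x in [1, 3], y and z in [-1, 1].
normals = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
points = (((1, 0, 0), (3, 0, 0)),
          ((0, -1, 0), (0, 1, 0)),
          ((0, 0, -1), (0, 0, 1)))
print(chord_length((1, 0, 0), normals, points))  # ray along +x, passes through the box
print(chord_length((0, 1, 0), normals, points))  # ray along +y, misses the box
```

Because r has length 1, the difference of the two distances is directly the length of the segment inside the cube.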

With integers you should be able to do that in a small Spartan-III in a pipeline a lot faster than you can get data into the chip.

With floating point numbers it should still be very fast in an FPGA, but the design gets a lot more complicated and larger.

Have fun,

Kolja Sulimma


Graphics cards use AGP, now at x8, which gives about 2.1 GB/sec of bandwidth. Since this is MUCH faster than the PCI bus, they handle huge amounts of data a lot better.

BUT suppose someone builds an AGP x8 board with a fast FPGA. Then you suddenly have reduced the data transfer bottleneck.

Suppose you use a Xilinx Pro, or add a PowerPC as an option together with monitor out circuits. Then you will get quite an interesting board that could be used to run the X server for Unix/Linux :-)

But you can still buy PCI graphic boards...

/RogerL

--
Roger Larsson
Skellefteå
Sweden

Thanks a lot, Kolja,

Very informative... I need to digest your answer...

-Walala
