Hi,
I am trying to decide which of these two technologies to invest a good amount of time into, and I have not found much of use on the net - there are lots of figures, but comparing apples to oranges is not much use :(
What I am looking to do is probably a typical use for SSE2: basically image processing and compression (DCT, motion estimation, others). The problem is finding comparisons between SSE2 and high-bandwidth FPGA implementations of this sort of operation (e.g. an FPGA hanging off HyperTransport or a DDR DIMM interface).
If I have to put the FPGA onto a PCI bus, I very much doubt it could possibly compete, but if it were possible to integrate it into the Northbridge, it may be a whole new ball game.
I've run the testbench for Intel's SSE libraries, and it seems to be pretty stable at ~370 clocks/pixel when doing a 720x480 '422' baseline JPEG at 75% quality. This gives roughly 0.036 secs per image (about 28 images/sec) at 3.6GHz. What I'm trying to figure out is (assuming I had the data bandwidth via a low-latency connection) whether I'd be able to better that using an FPGA. Any takers ?
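For what it's worth, the arithmetic behind those SSE2 figures can be sketched as below. It's only a back-of-envelope check using the numbers quoted above; the function names are made up for illustration and nothing here comes from Intel's libraries.

```python
# Back-of-envelope check of the SSE2 figures quoted above.
# Function names are hypothetical, chosen only for this sketch.

def seconds_per_frame(width, height, clocks_per_pixel, clock_hz):
    """Time to encode one frame at a fixed cost in clocks per pixel."""
    return width * height * clocks_per_pixel / clock_hz

def pixels_per_second(clocks_per_pixel, clock_hz):
    """Sustained pixel throughput at a fixed cost in clocks per pixel."""
    return clock_hz / clocks_per_pixel

# 720x480 frame, ~370 clocks/pixel, 3.6 GHz P4
t = seconds_per_frame(720, 480, 370, 3.6e9)
rate = pixels_per_second(370, 3.6e9)

print(f"{t:.4f} s/frame, {1 / t:.1f} frames/s, {rate / 1e6:.2f} Mpixels/s")
```

So the number an FPGA has to beat is a bit under 10 Mpixels/sec for the whole JPEG chain, not just the DCT.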
I can find figures for parts of the JPEG algorithm (e.g. Xilinx have an 8x8 DCT that will push 140 Mpixels/sec if pipelined), which is a large part of the problem, but I'd like to know whether the RLE/Huffman encoding takes significant time as well. (I'm aware it could/should be pipelined after the DCT - something the P4 can't do - I'm just not sure how much extra overhead it would be.)
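To put the question in numbers: in a fully pipelined design the sustained rate is set by the slowest stage, not by the sum of the stage times. A rough sketch of that comparison follows. Only the DCT rate and the SSE2 clocks/pixel come from the figures above; the entropy-coder rate is a pure placeholder I have no data for, and I'm assuming the Xilinx "Mpixels/sec" counts pixels the same way the SSE2 testbench does (for '422' data there are two samples per pixel, so it may not).

```python
# Throughput of a pipelined encoder vs. the SSE2 baseline.
# HYPOTHETICAL_ENTROPY_RATE is a placeholder, NOT a measured figure.

SSE2_RATE = 3.6e9 / 370            # pixels/s, from ~370 clocks/pixel at 3.6 GHz
DCT_RATE = 140e6                   # pixels/s, quoted pipelined 8x8 DCT figure
HYPOTHETICAL_ENTROPY_RATE = 50e6   # pixels/s, made-up RLE/Huffman stage rate

def pipeline_rate(*stage_rates):
    """A pipeline can only sustain the rate of its slowest stage."""
    return min(stage_rates)

fpga_rate = pipeline_rate(DCT_RATE, HYPOTHETICAL_ENTROPY_RATE)

# Headroom the DCT alone has over the SSE2 software path
dct_headroom = DCT_RATE / SSE2_RATE

# Speedup of the whole (assumed) pipeline over SSE2
fpga_speedup = fpga_rate / SSE2_RATE

print(f"DCT headroom: {dct_headroom:.1f}x, pipeline speedup: {fpga_speedup:.1f}x")
```

In other words, the DCT on its own has something like 14x headroom over the P4, so the FPGA only loses if the RLE/Huffman stage (or the bus) can't sustain about 10 Mpixels/sec.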
So, anyone got any figures for a best-case FPGA implementation ? They'd be much appreciated :)
John