Most engineers still don't realize the processing power of the FPGA. True, today's processors operate at high clock frequencies, but people don't truly understand how much of that raw speed is lost to processor overhead. I created an FPGA design that processes real-time 1280-pixel, 32-bit camera scans to identify the leading edge of incoming documents, determine the skew angle, and rotate the image in real time while the image was still being scanned. This process originally required 8 high-speed DSPs with significant propagation delay and large amounts of memory. I have also converted many software processes to the FPGA environment. Each has shown a dramatic improvement in speed that no processor could come close to matching.
Here is a short list of problems your software program may be experiencing.
1. Roughly 25% to 50% of your speed is lost to opcode fetch and memory access. This number varies depending on CPU cache efficiency.
2. Depending on the program's size and memory access pattern, you could be forcing excessive cache flushes and reloads, which can reduce speed by another 10% to 50%.
3. You must then contend with program efficiency: is the program optimized for speed?
4. Last is the efficiency of your compiler: is it making extensive use of library calls, or generating inline code?
When a program is converted to hardware, you eliminate items 1, 2, and 4. You are left with program efficiency, that is, how well the program is translated to hardware.
The first thing to do is reorder the statements in the program. Section the program into stages and identify the loop/repetition structure. Each stage depends on the results of the previous stage. This will usually lead to each stage containing multiple computations. These can be executed in parallel in the FPGA, and many equations can be evaluated in just one FPGA clock cycle. After reordering the statements for hardware conversion, recompile the program and run it to ensure it still functions correctly. This is now your basic template for conversion.
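As a rough illustration of the reordering step, here is a small Python sketch. The arithmetic, the function names (`original`, `stage1`, `stage2`, `stage3`, `pipelined`) and the three-stage split are all made up for the example and are not taken from any real design. It shows a loop body sectioned into stages whose statements are independent of each other, then a simple software model of pipeline registers in which every stage advances once per simulated clock tick.

```python
def original(samples):
    """Straight-line software version of a hypothetical loop body."""
    out = []
    for x in samples:
        a = x * 3
        b = x + 7
        c = a - b
        d = a + b
        out.append(c * d)
    return out


# Reordered version: statements inside one stage do not depend on each
# other, so in hardware they could all be evaluated in the same clock cycle.
def stage1(x):
    return x * 3, x + 7      # a and b depend only on the input


def stage2(a, b):
    return a - b, a + b      # c and d depend only on stage 1 results


def stage3(c, d):
    return c * d             # final result depends only on stage 2 results


def pipelined(samples):
    """Software model of a 3-stage pipeline with registers r1 and r2."""
    r1 = None                # register between stage 1 and stage 2
    r2 = None                # register between stage 2 and stage 3
    out = []
    # Two extra empty ticks flush the last samples through the pipeline.
    for x in list(samples) + [None, None]:
        # All stages read the register values held at the start of the tick.
        if r2 is not None:
            out.append(stage3(*r2))
        next_r2 = stage2(*r1) if r1 is not None else None
        next_r1 = stage1(x) if x is not None else None
        # Registers update together, like a clock edge.
        r1, r2 = next_r1, next_r2
    return out
```

Running both versions on the same input and comparing the results is the "recompile and run" check described above. Once the pipeline is full, the model produces one result per tick even though each result takes three ticks to compute.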
If the program uses more data than the FPGA can hold, you will need to create a multi-channel DMA controller with cache. This controller provides access to each stage that needs external memory.
Second, when using decimal (fractional) calculations, determine the maximum decimal error you can accept. Floating point offers many advantages, but it slows down computation, consumes a large number of resources, and adds to the overall complexity of the hardware. You should use fixed-point computation if possible.
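To make the fixed-point idea concrete, here is a minimal Python sketch. The 16 fractional bits and the helper names (`to_fixed`, `to_float`, `fixed_mul`, `max_quantization_error`) are assumptions chosen for illustration only. Values are stored as integers scaled by 2^16, and the worst-case rounding error when converting a real number is half of one least significant bit.

```python
FRAC_BITS = 16               # assumed number of fractional bits
SCALE = 1 << FRAC_BITS       # 2**16


def to_fixed(x):
    """Convert a real number to a scaled integer, rounding to nearest."""
    return int(round(x * SCALE))


def to_float(q):
    """Convert a scaled integer back to a real number."""
    return q / SCALE


def fixed_mul(a, b):
    """Multiply two fixed-point values.

    The raw product carries 2 * FRAC_BITS fractional bits, so it is
    shifted right to return to FRAC_BITS (this truncates toward
    negative infinity rather than rounding).
    """
    return (a * b) >> FRAC_BITS


def max_quantization_error(frac_bits):
    """Worst-case conversion error with round-to-nearest: half an LSB."""
    return 2.0 ** -(frac_bits + 1)


# Example: 0.1 is not exactly representable, but the error stays in bounds.
err = abs(to_float(to_fixed(0.1)) - 0.1)
within_bound = err <= max_quantization_error(FRAC_BITS)
```

In hardware the scaling is free, since it is only a matter of where you consider the binary point to be. The right shift in `fixed_mul` is just a selection of which product bits to keep.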
Fixed-point decimal accuracy (fractional portion only, not including integer size):
1 byte = 2 decimal places
2 bytes = 4 decimal places
3 bytes = 7 decimal places
4 bytes = 9 decimal places
5 bytes = 12 decimal places
6 bytes = 14 decimal places
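The figures in this list match what you get by taking the number of fractional bits, multiplying by log10(2), and rounding down. That formula is my reading of the list, not something stated in it, and the function name `decimal_places` is made up for the example. A short Python check:

```python
import math


def decimal_places(num_bytes):
    """Whole decimal digits resolved by num_bytes of fractional bits."""
    frac_bits = 8 * num_bytes
    return math.floor(frac_bits * math.log10(2))


table = [decimal_places(n) for n in range(1, 7)]
# table -> [2, 4, 7, 9, 12, 14]
```

The same function can be used to size the fractional part for a required accuracy that is not in the list.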
Xilinx devices have built-in multipliers (18x18), which are fast and save hardware. Division is possible but uses more resources.
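Operands wider than 18 bits have to be split across several multipliers. The Python sketch below shows one common way to do that, under the assumption that each multiplier takes signed 18-bit inputs, so an unsigned 16-bit slice fits comfortably. The names `mult18` and `mul32` are hypothetical and only model the arithmetic, not any vendor primitive.

```python
MASK16 = 0xFFFF


def mult18(a, b):
    """Model of one 18x18 signed multiplier; rejects out-of-range inputs."""
    lo, hi = -(1 << 17), (1 << 17)
    if not (lo <= a < hi and lo <= b < hi):
        raise ValueError("operand does not fit in 18 signed bits")
    return a * b


def mul32(x, y):
    """Unsigned 32x32 multiply built from four 16x16 partial products."""
    xl, xh = x & MASK16, (x >> 16) & MASK16
    yl, yh = y & MASK16, (y >> 16) & MASK16
    low = mult18(xl, yl)                       # weight 2**0
    mid = mult18(xl, yh) + mult18(xh, yl)      # weight 2**16
    high = mult18(xh, yh)                      # weight 2**32
    return low + (mid << 16) + (high << 32)
```

The four partial products are independent, so in hardware they can be computed in parallel in one stage and summed in the next.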
A Spartan-3 can do the work. The Virtex-II and Virtex-4 are faster and offer more memory and multipliers, plus embedded CPUs to help with other tasks. A DSP does provide advantages over a standard processor, but it does not compare to the raw power of the FPGA.