I was wondering if anyone has experience using FPGA-based CPUs together with surrounding logic to perform iterative algorithms. For instance, to implement more complex computer vision algorithms in an embedded system, we may wish to use the parallelism of an FPGA to compute multiple parts of a 2D convolution or matrix operation in parallel. While the FPGA may be able to handle the number-crunching requirements of a given algorithm, it seems ill-suited to the iterative (often non-systolic) control flow of many advanced image processing algorithms. Many of these algorithms seem too complex to implement solely in FPGA logic.
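To make the "number crunching" part concrete, here is a minimal pure-Python reference model of a 2D convolution (valid region only, kernel flipped). It is only a sketch, and the name `convolve2d` is my own. The point is that each output pixel depends only on a small input window, so the inner multiply-accumulate loops are the part an FPGA could unroll into parallel hardware, typically fed by line buffers.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution (kernel flipped) on lists of lists.

    Each output pixel depends only on a kh x kw input window, so all output
    pixels are independent; this is the parallelism an FPGA can exploit.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            # Inner multiply-accumulate: candidates for unrolling in hardware.
            for j in range(kh):
                for i in range(kw):
                    acc += image[y + j][x + i] * kernel[kh - 1 - j][kw - 1 - i]
            row.append(acc)
        out.append(row)
    return out
```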
I was thinking of the case where an FPGA is connected directly to a video source and data flows into the system at a fixed rate. We may wish to process this data at several scales, iteratively searching from the low scales up to the higher ones until we find features of interest in the video stream. We might then mark those features by altering pixels in their local neighborhood.
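A rough software model of that multi-scale search, under loud assumptions: the pyramid is built by 2x2 averaging, and the "detector" is just a hypothetical per-pixel threshold standing in for a real feature score. The coarsest level is searched exhaustively; each finer level only examines the 2x2 children of candidates that survived the level above.

```python
def downsample(image):
    """Halve resolution by averaging each 2x2 block (odd edge rows/cols dropped)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1] +
              image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def build_pyramid(image, levels):
    """Return [full_res, half_res, quarter_res, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def coarse_to_fine(pyramid, threshold):
    """Coarse-to-fine search; the threshold test is a hypothetical detector."""
    coarsest = pyramid[-1]
    candidates = [(y, x) for y in range(len(coarsest))
                  for x in range(len(coarsest[0]))
                  if coarsest[y][x] >= threshold]
    # Walk from the second-coarsest level down to full resolution.
    for level in reversed(pyramid[:-1]):
        refined = []
        for (y, x) in candidates:
            for dy in (0, 1):
                for dx in (0, 1):
                    cy, cx = 2 * y + dy, 2 * x + dx
                    if cy < len(level) and cx < len(level[0]) and level[cy][cx] >= threshold:
                        refined.append((cy, cx))
        candidates = refined
    return candidates
```

Pruning at coarse levels can miss small features that averaging washes out, which is one reason the iterate/stop decisions end up data-dependent.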
We may need to iteratively process multiple scales of image data while buffering the original video frame in off-FPGA DRAM, since there will not be enough on-FPGA block RAM (BRAM) to store full images. Once we find the region of interest, we may then retrieve the original frame, mark it, and send it out as output video. Face detection is a good example of this kind of process.
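For the marking step, here is a minimal sketch of mapping a box found at a coarse pyramid level back to full-resolution coordinates (assuming each level halves the resolution) and drawing its outline into the buffered original frame. The names `roi_to_full_res` and `mark_box` are hypothetical; in the real system the frame would live in off-FPGA DRAM rather than in a Python list.

```python
def roi_to_full_res(y, x, h, w, level):
    """Map a (y, x, h, w) box found at pyramid `level` (0 = full res) to full res.

    Assumes every pyramid level halves the resolution of the one below it.
    """
    s = 2 ** level
    return y * s, x * s, h * s, w * s


def mark_box(frame, y, x, h, w, value=255):
    """Draw a 1-pixel rectangle outline into `frame` in place, clipped to its bounds."""
    fh, fw = len(frame), len(frame[0])
    for yy in range(y, y + h):
        for xx in range(x, x + w):
            on_edge = yy in (y, y + h - 1) or xx in (x, x + w - 1)
            if on_edge and 0 <= yy < fh and 0 <= xx < fw:
                frame[yy][xx] = value
    return frame
```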
It seems to me that the iterative nature of these algorithms calls for a combination of CPU and FPGA logic: the FPGA handles the number crunching and parallel data paths, while the CPU decides when to iterate, when to stop, and, in general, what to do next based on the results of the FPGA's number crunching. The CPU could be a soft core built from programmable logic or a separate processor placed off-FPGA.
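One way to picture that split is a CPU-side control loop that issues jobs to the fabric and decides, from the results, whether to descend to the next finer scale or stop. In this sketch the FPGA is replaced by a software stand-in; `SimulatedAccelerator`, its `run` method and `score_fn` are hypothetical and not any vendor's API. A real system might replace `run` with writes to control registers (or a DMA descriptor) followed by an interrupt or a poll.

```python
class SimulatedAccelerator:
    """Software stand-in for the FPGA datapath (hypothetical, not a vendor API)."""

    def __init__(self, score_fn):
        self.score_fn = score_fn  # placeholder for the real detector logic
        self.levels_run = []      # records which pyramid levels the CPU requested

    def run(self, level, regions):
        """Return the (y, x, h, w) regions at `level` whose score passes."""
        self.levels_run.append(level)
        return [r for r in regions if self.score_fn(level, r)]


def control_loop(accel, coarsest_level, frame_h, frame_w):
    """CPU-side decisions: what to request next, and when to stop."""
    s = 2 ** coarsest_level
    regions = [(0, 0, frame_h // s, frame_w // s)]  # whole frame at coarsest level
    for level in range(coarsest_level, -1, -1):
        hits = accel.run(level, regions)
        if not hits:
            return []        # nothing found: skip all finer levels
        if level == 0:
            return hits      # full-resolution detections
        # Each hit covers a window twice as large at the next finer level.
        regions = [(2 * y, 2 * x, 2 * h, 2 * w) for (y, x, h, w) in hits]
    return []
```

The main design question this exposes is job granularity: larger jobs amortize the CPU/FPGA round trip, while smaller ones let the CPU prune earlier.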
Does anyone have experience with this kind of thing, or know where I might find more information about optimal ways of coupling heterogeneous processors?
I am aware of Altera's C2H compiler but have not used it, and I don't know how effectively it combines FPGA and CPU resources. I might be in the market to hire a consultant who is knowledgeable in this area.