I am doing some embedded video processing: I store an incoming frame of video and then, based on calculations in another part of the system, warp that buffered frame. When the frame goes into the buffer (an off-FPGA SDRAM chip), it is simply written one pixel at a time in row-major order.
The problem is that I will not be reading it back in that order. I may want to do an arbitrary image rotation, which means the first pixel I want to access is not necessarily the first one I put in the buffer; it might actually be the last one. If I do full-page reads, or even burst reads, I get a bunch of pixels that I do not need to determine the output pixel value. If I just do single reads, I waste a bunch of clock cycles setting up the SDRAM: telling it which row to activate and which column to read from, and then, after the read is done, issuing the precharge command to close the row. This is highly inefficient. It takes 5, maybe 10 clock cycles just to retrieve one pixel value.
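To make the cost concrete, here is a rough Python model of the access pattern, not my actual hardware. The frame size (640x480), the 512 columns per SDRAM row, one pixel per column address, and a single bank with only one open row are all assumptions chosen for illustration, and the function names are made up. It counts how many times a new SDRAM row has to be activated when the row-major buffer is read in write order versus column by column, as a 90-degree rotation would require.

```python
# Illustrative model only: all sizes below are assumed, not from real hardware.
WIDTH, HEIGHT = 640, 480   # assumed frame dimensions in pixels
COLS_PER_ROW = 512         # assumed pixels per SDRAM row (one per column address)


def row_major_address(x, y, width):
    """Linear buffer address of pixel (x, y) when written in row-major order."""
    return y * width + x


def count_row_activations(coords, width, cols_per_row):
    """Count SDRAM row activations for a sequence of pixel reads.

    Models a single bank with one open row: every time the next pixel
    lives in a different SDRAM row, a precharge/activate is needed.
    """
    activations = 0
    open_row = None
    for x, y in coords:
        sdram_row = row_major_address(x, y, width) // cols_per_row
        if sdram_row != open_row:
            activations += 1
            open_row = sdram_row
    return activations


# Read order that matches the write order (left to right, top to bottom).
sequential = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
# Read order needed for a 90-degree rotation (top to bottom, column by column).
rotated = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)]

seq_acts = count_row_activations(sequential, WIDTH, COLS_PER_ROW)
rot_acts = count_row_activations(rotated, WIDTH, COLS_PER_ROW)

print("sequential read activations:", seq_acts)
print("rotated read activations:   ", rot_acts)
```

Under these assumptions the sequential read opens each SDRAM row once, while the rotated read has to open a new row for every single pixel, because vertically adjacent pixels are a full frame width apart in memory and that stride is larger than one SDRAM row.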
Does anyone know a good way to organize a frame buffer so that it is friendlier (and more efficient) for nonsequential access, like the kind needed to warp the input image via some linear or nonlinear transformation?