Systolic array architectures are commonly used in image/video compression hardware blocks (e.g. convolution filters, motion estimation, etc.). My loose understanding is that this is because they reuse data efficiently and thus reduce memory accesses compared with, say, a custom-designed, high-throughput single processing element. Would this generally be considered the principal benefit, and are there other benefits?
I have read that they are considered "I/O bandwidth efficient"; I guess that's just another way of saying what I've outlined above?
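To make the data-reuse idea concrete, here is a rough Python sketch of what I have in mind, using a 1-D FIR filter as a stand-in for a convolution block. It is only a behavioural toy, not cycle-accurate and not any real design: the function names are mine, and I assume the serial PE has no local sample storage while the coefficients sit in registers in both cases.

```python
def fir_serial(samples, coeffs):
    """Single PE, no local sample storage: every tap refetches its sample."""
    taps = len(coeffs)
    reads = 0
    out = []
    for n in range(taps - 1, len(samples)):
        acc = 0
        for k in range(taps):
            acc += coeffs[k] * samples[n - k]  # one memory read per tap
            reads += 1
        out.append(acc)
    return out, reads


def fir_systolic(samples, coeffs):
    """Chain of PEs: each sample is fetched once, then handed PE to PE."""
    taps = len(coeffs)
    regs = [0] * taps  # one sample register per PE (assumed)
    reads = 0
    out = []
    for n, x in enumerate(samples):
        reads += 1              # single memory read per input sample
        regs = [x] + regs[:-1]  # neighbours pass samples along the chain
        if n >= taps - 1:       # only emit once the chain is full
            out.append(sum(c * r for c, r in zip(coeffs, regs)))
    return out, reads


samples = list(range(1, 11))  # 10 input samples
coeffs = [1, 2, 3]            # 3 taps
serial_out, serial_reads = fir_serial(samples, coeffs)
systolic_out, systolic_reads = fir_systolic(samples, coeffs)
print(serial_reads, systolic_reads)
```

Both versions produce the same outputs, but in this toy the serial one reads memory roughly (number of taps) times as often, which I take to be what "I/O bandwidth efficient" is getting at.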
Are there ever scenarios where the area and switching overhead of a systolic array would warrant a less bandwidth-efficient, more serial approach - or is that just plain ridiculous to consider? For example, could you hope to trade less switching in the datapath for increased switching in the memory accesses and still achieve an overall reduction in switching?
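One crude way I can frame that trade-off is a back-of-envelope model for a 1-D FIR filter, sketched below. Everything in it is an assumption of mine: `e_mem` (cost of one memory read) and `e_reg` (cost of one register-to-register shift in the array) are hypothetical, unitless numbers, and the multiply-accumulate cost is left out because it is the same in both cases.

```python
def switching_energy(num_samples, taps, e_mem, e_reg):
    """Toy switching-cost model; e_mem and e_reg are hypothetical units.

    Serial PE: refetches every sample for every tap of every output.
    Systolic array: fetches each sample once, then shifts it through
    'taps' registers.
    """
    outputs = num_samples - taps + 1
    serial = outputs * taps * e_mem
    systolic = num_samples * e_mem + num_samples * taps * e_reg
    return serial, systolic


# Memory read assumed 10x the cost of a register shift.
costly_mem = switching_energy(1000, 8, e_mem=10.0, e_reg=1.0)

# Memory read assumed no more costly than a register shift.
cheap_mem = switching_energy(1000, 8, e_mem=1.0, e_reg=1.0)

print(costly_mem, cheap_mem)
```

In this model the serial approach only comes out ahead when a memory read costs about the same as a register shift (roughly e_mem/e_reg below taps/(taps - 1)), which seems unlikely for anything but a very small local buffer. I don't know how well that holds up against real designs, hence the question.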
Looking forward to any comments/flames/rants :-)