codejk,
What kind of floating-point number is it? IEEE754? Single or double precision?
Are those the actual dimensions of your matrices?
It looks to me like you have 9 multiplications, 6 additions and 2 divisions to do every 80 ns. That's 17 floating-point operations per 80 ns, which I work out to be 212.5 MFLOPS.
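As a quick sanity check of that figure, here is a small sketch. The operation counts are the ones from your description, and `required_mflops` is just a hypothetical helper name:

```python
def required_mflops(ops_per_result: int, period_ns: float) -> float:
    """Sustained throughput needed to finish ops_per_result operations
    every period_ns nanoseconds, in MFLOPS."""
    # ops per ns is 1e9 ops/s; multiplying by 1000 converts that to 1e6 ops/s (MFLOPS)
    return ops_per_result * 1000 / period_ns

ops = 9 + 6 + 2  # multiplies + adds + divides per result
print(required_mflops(ops, 80))  # 212.5
```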
According to:
formatting link
The best performance you're going to get from an FPU-enabled MicroBlaze running on a V4 is 33 MFLOPS (clocked at 200 MHz). I work out that about 22-23 MFLOPS is the most you would get for your particular instruction mix. Reduce that again for the lower performance of the Spartan-3, and you're a long way off what you need.
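One way to arrive at a figure in that range is to weight each operation by how long it occupies the processor. The sketch below is only an estimate under loudly stated assumptions: roughly 6 cycles per add or multiply (what 33 MFLOPS at 200 MHz implies), an assumed 30 cycles per divide, and no overlap between operations or load/store overhead. These cycle counts are illustrative, not datasheet values.

```python
# ASSUMED cycle costs, for illustration only (not taken from a Xilinx datasheet):
# ~6 cycles for add/mul is what 33 MFLOPS at 200 MHz implies;
# 30 cycles for a divide is a guess at a slower iterative divider.
ASSUMED_CYCLES = {"mul": 6, "add": 6, "div": 30}

def mix_mflops(op_counts: dict, cycles_per_op: dict, clock_mhz: float) -> float:
    """Average MFLOPS for an instruction mix, assuming each operation blocks
    the processor for its full cycle count with no overlap."""
    total_ops = sum(op_counts.values())
    total_cycles = sum(n * cycles_per_op[op] for op, n in op_counts.items())
    # (ops per cycle) * (cycles per microsecond) = ops per microsecond = MFLOPS
    return total_ops * clock_mhz / total_cycles

mix = {"mul": 9, "add": 6, "div": 2}
print(f"{mix_mflops(mix, ASSUMED_CYCLES, 200):.1f} MFLOPS")  # 22.7 MFLOPS
```

The two divides alone take 60 of the 150 cycles per result under these assumptions, which is what drags the average well below the 33 MFLOPS peak.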
So an FPU-enabled MicroBlaze is a good suggestion, but once you factor in the 'one result every 80 ns' constraint, it unfortunately isn't viable.
All is not lost, though: even Spartan devices these days can reach well into the GFLOPS range of floating-point performance. To get there, you need to take advantage of the devices' large logic real estate and high data throughput.
What I would suggest is floating-point cores: pipelined, and instantiated several times over. With these you could easily reach the performance you need.
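To get a feel for how many instances you might need, here is a rough sizing sketch. It assumes each core is fully pipelined (accepts a new operand pair every clock) and that operations can be time-multiplexed onto a core within each 80 ns window; the clock frequencies used are hypothetical, and data dependencies, pipeline latency and routing are ignored.

```python
import math

def cores_needed(op_counts: dict, period_ns: float, clock_mhz: float) -> dict:
    """Number of fully pipelined cores of each type needed to sustain
    one result every period_ns, assuming one issue per core per clock."""
    # Whole clock cycles available per result period (partial cycles are unusable)
    slots = math.floor(period_ns * clock_mhz / 1000)
    if slots < 1:
        raise ValueError("clock too slow to issue even one operation per period")
    # Each core can absorb `slots` operations per period
    return {op: math.ceil(n / slots) for op, n in op_counts.items()}

mix = {"mul": 9, "add": 6, "div": 2}
print(cores_needed(mix, 80, 100))   # {'mul': 2, 'add': 1, 'div': 1}
print(cores_needed(mix, 80, 12.5))  # {'mul': 9, 'add': 6, 'div': 2}
```

At a hypothetical 100 MHz, each core gets 8 issue slots per 80 ns, so a handful of cores covers the whole mix; at 12.5 MHz you would need one core per operation, i.e. a fully parallel datapath.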
You would have to pay for the IP, but once you had it you could implement your design with minimal headaches.
Of course, if you don't have the cash to fork out for this, you'll have no choice but to look at another way of doing it, fixed point for example...
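For illustration, here is a minimal sketch of what fixed point means in practice: multiplies and divides become integer operations plus shifts, which map directly onto FPGA multipliers and simple dividers. The Q16.16 format is a hypothetical choice; the right number of integer and fractional bits depends on the range and precision of your data. Python integers don't overflow, so this does not model the wider intermediate register (64 bits for Q16.16) that the hardware product would need.

```python
FRAC_BITS = 16          # hypothetical Q16.16 format: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0

def to_fixed(x: float) -> int:
    return int(round(x * ONE))

def to_float(q: int) -> float:
    return q / ONE

def fx_mul(a: int, b: int) -> int:
    # The raw product has 32 fractional bits; shift back down to 16.
    # In hardware the intermediate product needs 64 bits.
    return (a * b) >> FRAC_BITS

def fx_div(a: int, b: int) -> int:
    # Pre-shift the numerator so the quotient keeps 16 fractional bits.
    # Note: Python's // floors; a hardware divider typically truncates toward zero.
    return (a << FRAC_BITS) // b

p = fx_mul(to_fixed(1.5), to_fixed(2.0))
q = fx_div(to_fixed(3.0), to_fixed(2.0))
print(to_float(p), to_float(q))  # 3.0 1.5
```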
Tell us how you get on,
Robin