Hi,
See below.
The multicycle instructions always take multiple cycles. This is due to the pipeline of MicroBlaze. MicroBlaze has only 3 pipeline stages: Instruction Fetch (IF), Operand Fetch (OF) and Execution (EX).
Thanks for the explanation.
The current MicroBlaze is a good tradeoff between area and performance.
Sure.
The 950 LUT figure includes the basic features, with no caches or debug. The caches are quite cheap on LUTs, around 50 LUTs for the instruction cache. The cost is that BRAM is needed to implement the caches.
Does "basic features" include the h/w divider? I've been trying to reproduce the quoted Dhrystone figures on the simulator, and only get 0.63 MIPS/MHz without it. If I add it, I can get 0.77.
To get 0.8 MIPS/MHz, you need to enable the HW divider. The size of the HW divider is around 60-80 LUTs. I can't remember exactly, but the implementation is a basic shift-compare design which only needs a compare block and a shift block. The divide takes 35 clock cycles: 2 clock cycles to set up the operands, 32 clock cycles for the division and 1 clock cycle to write the result.
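For anyone who wants to see what a shift-compare divider does step by step, here is a minimal software sketch. It is an assumption about the general technique (unsigned restoring division, one quotient bit per step), not the actual MicroBlaze implementation; the function and constant names are hypothetical, and the cycle constants simply restate the 2 + 32 + 1 breakdown given above.

```python
# Hypothetical model of a shift-compare (restoring) divider.
# Not the real MicroBlaze logic; unsigned operands only.

SETUP_CYCLES = 2       # operand setup, as quoted in the post
ITERATION_CYCLES = 32  # one quotient bit per cycle for a 32-bit divide
WRITEBACK_CYCLES = 1   # writing the result


def shift_compare_divide(dividend, divisor, width=32):
    """Return (quotient, remainder) using one shift and one compare per bit."""
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    remainder = 0
    quotient = 0
    for bit in range(width - 1, -1, -1):
        # Shift block: bring the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Compare block: if the divisor fits, subtract it and set the quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


def divide_latency():
    """Total latency in clock cycles under the breakdown quoted above."""
    return SETUP_CYCLES + ITERATION_CYCLES + WRITEBACK_CYCLES
```

The loop always runs for the full operand width, which matches a fixed latency that does not depend on the operand values.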
It seems strange that on the Web page
the Spartan 3 is rated at 0.8 and the Spartan II is rated at 0.65, yet they are both listed as requiring the same number of logic cells. I would presume that either the performance figure for the Spartan II is too low, or the number of logic cells required by the Spartan 3 and Virtex II to achieve the quoted figure is actually higher.
The difference is that S3 and VII have embedded multipliers, so MicroBlaze will have a HW multiplier, while the S2 doesn't, so multiplication is done in SW (which takes many more clock cycles).
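To illustrate why a software multiply costs many more cycles, here is a rough sketch of a generic shift-and-add routine. This is an assumption for illustration only: it is not the routine the MicroBlaze toolchain actually links in, and the name soft_multiply is hypothetical. Each loop iteration stands for several instructions (test, add, two shifts, branch), so a full 32-bit operand can cost well over a hundred cycles.

```python
# Hypothetical shift-and-add multiply, counting loop iterations.
# Not the actual MicroBlaze runtime library routine.

def soft_multiply(a, b, width=32):
    """Return (product modulo 2**width, number of loop iterations)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    iterations = 0
    while b:
        if b & 1:
            # Add the shifted multiplicand when the current multiplier bit is set.
            product = (product + a) & mask
        a = (a << 1) & mask  # next power-of-two multiple of the multiplicand
        b >>= 1              # move on to the next multiplier bit
        iterations += 1
    return product, iterations
```

The iteration count depends on the position of the highest set bit in the multiplier, so the cost in software varies with the data.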
Incidentally, I've been trying to get the Dhrystone numbers for NIOS as well. Can anybody clarify whether their instruction set simulator is cycle accurate? If it is, the figures appear to be 0.64 for a 32-bit implementation and 0.15 for a 16-bit implementation, but I have a feeling that these should be lower.
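For reference, this is roughly how a simulator cycle count turns into a MIPS/MHz figure. It is a sketch that assumes the usual Dhrystone convention of 1757 Dhrystones per second for 1 MIPS (the VAX 11/780 reference score) and assumes the simulator's cycle count per Dhrystone iteration is accurate; the function names are hypothetical.

```python
# Hypothetical helpers for converting cycle counts to Dhrystone MIPS/MHz.
# Assumes the conventional VAX 11/780 reference of 1757 Dhrystones/s = 1 MIPS.

VAX_DHRYSTONES_PER_SECOND = 1757


def dmips_per_mhz(cycles_per_iteration):
    """Dhrystone MIPS per MHz for a given cycle count per benchmark iteration."""
    return 1_000_000 / (VAX_DHRYSTONES_PER_SECOND * cycles_per_iteration)


def cycles_for(dmips_mhz):
    """Cycles per Dhrystone iteration implied by a given MIPS/MHz figure."""
    return 1_000_000 / (VAX_DHRYSTONES_PER_SECOND * dmips_mhz)
```

Under these assumptions, 0.63 MIPS/MHz corresponds to roughly 903 cycles per Dhrystone iteration, so a simulator that miscounts stalls by even a few percent will shift the second decimal place.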
Cheers, JonB