There have been some previous threads concerning problems with the OCM bus, like these two:
-- "Sharing BRAM between Xilinx PowerPC's (on data-OCM ports)"
- also titled ""Sharing BRAM between Xilinx PowerPC's (on data-OCM ports)"
To echo what has been discussed in those threads, our research team has experienced several issues with the OCM bus. Specifically, we have been using BRAMs connected to the OCM bus through the EDK-provided OCM controller version 3.00.a on the Virtex 2 Pro family. While these issues did not manifest themselves on the XC2VP30 part (on the XUP board), they did appear on the XC2VP70 part we used for this project.
Bug 1
=====
PowerPC 405 pseudo-code:

    store @ address A with data B to OCM BRAM
    couple of ALU ops (setting up load/store base addresses in registers)
    store @ address C with data D to OCM BRAM
    some ALU ops (setting up load/store base addresses in registers)
    store @ address E to PLB, to our PLB pcore (300 cycles - TCC cache in our design)
    load @ address A, returns data D from OCM BRAM
Issue: In this scenario, the OCM load returns the data from the last OCM store, regardless of the address that was written and read. For some reason, the OCM bus and/or BRAM controller buffers/caches the last value stored. We have only observed this buffering when the 300-cycle PLB store is present; a shorter PLB store does not reproduce the effect.
Workaround: We inserted a sync instruction after the PLB store.
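For illustration, here is a minimal C sketch of that workaround using GCC-style inline assembly. The base addresses, offsets, and data values are hypothetical placeholders, not the ones from our design:

    #include <stdint.h>

    /* Hypothetical base addresses; substitute the ranges from your own
     * address map. */
    #define OCM_BASE  ((volatile uint32_t *)0x80000000)
    #define PLB_PCORE ((volatile uint32_t *)0xC0000000)

    void bug1_workaround(void)
    {
        OCM_BASE[0] = 0xB;          /* store data B to OCM address A       */
        OCM_BASE[4] = 0xD;          /* store data D to OCM address C       */
        PLB_PCORE[0] = 0xE;         /* long-latency store to our PLB pcore */
        __asm__ volatile ("sync");  /* workaround: let the PLB store
                                       complete before the next OCM access */
        uint32_t a = OCM_BASE[0];   /* now reliably returns data B         */
        (void)a;
    }

Without the sync, the load on the last line is where we saw data D come back instead of data B.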
Bug 2
=====
Issue: An access to an address within the range assigned to the OCM BRAM would sometimes appear on the PLB bus. Unfortunately, since there is no slave device on the PLB bus within that address range, we get a PLB bus error. We discovered this problem using ChipScope.
Workaround: We could not find a software-based workaround as we did for Bug 1, so we moved the OCM BRAMs to the PLB bus.

Bug 3
=====
Pseudo-code:

    loop_start:
        data = load from A            # port A of BRAM connected to OCM controller
        if (data == flag) goto exit   # write is done through port B, connected to our pcore
        goto loop_start
    exit:
Issue: Sometimes the processor is stuck in this loop forever, even though the BRAM entry at address A has been written with the flag by our pcore through the BRAM's other port (user_switch in Figure 4a in our paper).
Workaround: We implemented a high-level software time-out mechanism: after the processor has looped for a certain period of time, it sends a retry notification to the flag-setter. This solution works reliably.
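A minimal C sketch of that time-out loop follows. FLAG_ADDR, FLAG_VALUE, TIMEOUT_TICKS, and retry_notify() are hypothetical names standing in for our design-specific flag word, timeout period, and notification mechanism; the busy-wait counter is a stand-in for a real timer:

    #include <stdint.h>

    #define FLAG_ADDR     ((volatile uint32_t *)0x80000100) /* written by pcore via port B */
    #define FLAG_VALUE    0x1u
    #define TIMEOUT_TICKS 100000u

    extern void retry_notify(void);  /* asks the flag-setter to write again */

    void wait_for_flag(void)
    {
        for (;;) {
            for (uint32_t t = 0; t < TIMEOUT_TICKS; t++) {
                if (*FLAG_ADDR == FLAG_VALUE)
                    return;          /* pcore's write became visible */
            }
            retry_notify();          /* timed out: request a retry   */
        }
    }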
Summary: Eventually, we removed all OCM buses from our design and migrated the BRAMs to the PLB buses. Using PLB BRAMs eliminated all three bugs without any of the software workarounds we had previously developed. While we cannot comment on the OCM interface of the Virtex 4 or Virtex 5 families, we recommend against using OCM BRAMs on the Virtex 2 Pro family (especially on XC2VP70 parts).
Performance Note: We initially chose the OCM bus because of its lower latency compared to the PLB. To compare the latencies of BRAMs on the two buses, we wrote the following test code in PowerPC assembly:
    turn on instruction cache (use PLB BRAM to store code)  # loop code fits inside the PowerPC cache
    set loop iterator register                              # 32k iterations
    set base address
    start PowerPC timer
loop_start:
    load
    ...
    load                  # 100th load
    bdnz loop_start       # decrement loop iterator and branch to loop_start if it is not zero
    read PowerPC timer

cycles per load = timed cycles / (number of iterations * 100)
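As a rough C equivalent, here is a sketch of the same measurement. It is illustrative only: ITERS and LOADS_PER_ITER mirror the assembly, we assume the PowerPC time base ticks once per CPU clock (check your clocking), and the inner C loop adds branch overhead that the hand-unrolled assembly avoids:

    #include <stdint.h>

    /* Read the lower PowerPC time base register. */
    static inline uint32_t read_tbl(void)
    {
        uint32_t t;
        __asm__ volatile ("mftb %0" : "=r" (t));
        return t;
    }

    #define ITERS          32768u  /* 32k iterations, as in the assembly */
    #define LOADS_PER_ITER 100u    /* loads per iteration                */

    uint32_t cycles_per_load(volatile uint32_t *base)
    {
        uint32_t start = read_tbl();
        for (uint32_t i = 0; i < ITERS; i++)
            for (uint32_t j = 0; j < LOADS_PER_ITER; j++)
                (void)base[j];                     /* the loads being timed */
        uint32_t elapsed = read_tbl() - start;
        return elapsed / (ITERS * LOADS_PER_ITER); /* cycles per load */
    }

The volatile pointer keeps the compiler from optimizing the loads away.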
The following are the results for the OCM and PLB BRAM controllers:

    OCM load  =  2 cycles    OCM store = 2 cycles
    PLB load  = 11 cycles    PLB store = 8 cycles
Stores take less time than loads on the PLB because the acknowledge is asserted as soon as the data arrives at the controller, whereas for a load the acknowledge only occurs once the data has been read from the BRAM. Also, turning on the instruction cache shortens the measured latencies, since the loop's instruction fetches then stay off the bus being measured.