Data OCM BRAM Issues

There have been some previous threads concerning problems with the OCM bus, like these two:


-- "Sharing BRAM between Xilinx PowerPC's (on data-OCM ports)"


-- also titled "Sharing BRAM between Xilinx PowerPC's (on data-OCM ports)"

To echo what has been discussed in these threads, our research team has experienced several issues with the OCM bus. Specifically, we have been using BRAMs connected to the OCM bus through the EDK-provided OCM controller, version 3.00.a, on the Virtex-II Pro family. Although these issues did not manifest themselves on the XC2VP30 part (on the XUP board) for this project, we experienced them on the XC2VP70 parts (on the BEE2 board). To understand how our design is structured, please refer to page 4, figure 4a of our paper. More information about our project is available on our project web page.

In all, we saw three bugs:

Bug 1
=====
PowerPC 405 pseudo code:

    store @ address A with data B to OCM BRAM
    a couple of ALU ops (setting up load/store base addresses in registers)
    store @ address C with data D to OCM BRAM
    some ALU ops (setting up load/store base addresses in registers)
    store @ address E to our PLB pcore over the PLB (300 cycles; TCC cache in our design)
    load @ address A, returns data D from OCM BRAM (instead of B)

Issue: In this scenario, the OCM load returns the data from the last OCM store, regardless of the addresses written and read. For some reason, the OCM bus and/or BRAM controller is buffering/caching the last value stored. We have observed this buffering only when the 300-cycle PLB store is present; a shorter PLB store does not reproduce the effect.

Workaround: We inserted a sync instruction after the PLB store.
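For anyone who wants to apply the same barrier from C rather than hand-written assembly, here is a minimal sketch of the Bug 1 sequence with the workaround in place. It is not our actual code: the pointers ocm_a, ocm_c and plb_e are hypothetical stand-ins for the two OCM BRAM words and the PLB pcore register, and the sketch assumes GCC (inline asm emitting sync on the PowerPC, the __sync_synchronize() builtin when built for some other host).

```c
#include <stdint.h>

/* Full memory barrier. On the PPC405 this emits "sync" (the Bug 1
 * workaround); on other targets it falls back to GCC's
 * __sync_synchronize() builtin so the sketch still builds on a host. */
static inline void mem_sync(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("sync" ::: "memory");
#else
    __sync_synchronize();
#endif
}

/* Bug 1 access sequence with the workaround applied. ocm_a and ocm_c are
 * hypothetical pointers to two OCM BRAM words; plb_e is a hypothetical
 * pointer to a register in the PLB pcore. */
uint32_t bug1_sequence(volatile uint32_t *ocm_a, volatile uint32_t *ocm_c,
                       volatile uint32_t *plb_e,
                       uint32_t b, uint32_t d, uint32_t e)
{
    *ocm_a = b;     /* store @ address A with data B (OCM) */
    *ocm_c = d;     /* store @ address C with data D (OCM) */
    *plb_e = e;     /* ~300-cycle store @ address E to the PLB pcore */
    mem_sync();     /* workaround: sync after the PLB store */
    return *ocm_a;  /* load @ address A: should return B, not D */
}
```

On a host build the pointers can simply point at ordinary variables; that only checks the sequence itself, since the buffering bug shows up only on the real OCM hardware.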

Bug 2
=====

From time to time, a load or store to an address within the OCM BRAM's address range would appear on the PLB. Unfortunately, since there is no slave device on the PLB within that address range, we get a PLB bus error. We discovered this problem using ChipScope.

Workaround: Unlike Bug 1, we could not find a software-based workaround, so we moved the OCM BRAMs to the PLB.

Bug 3
=====
Pseudo-code:

    loop_start:
        data = load from A              # port A of BRAM, connected to the OCM controller
        if (data == flag) goto exit     # the write is done through port B, connected to our pcore
        goto loop_start
    exit:

Issue: Sometimes the processor gets stuck in this loop forever, even though our pcore has written the flag to the BRAM entry at address A through the BRAM's other port (the flag is user_switch in figure 4a of our paper).

Both ports use the same clock (100 MHz) and the same clock edge. Moreover, our pcore has conflict-resolution logic: if it detects that port A's enable signal is asserted, it stalls the write to port B until port A's enable signal is deasserted. In spite of this, about 1 in a million settings of the flag variable fails (i.e., the processor is stuck in this infinite loop). At first, we suspected that the pcore or its datapath was failing, but we verified that the datapath was bug-free. Bug 1 led us to suspect that, in the failing cases, the OCM bus or BRAM controller is buffering the variable's data so that the processor misses the update from our pcore.
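The conflict-resolution rule described above can be modeled in C roughly as follows. This is a hypothetical behavioral sketch, not our pcore's HDL: portb_writer, portb_request(), portb_step() and the bram array are illustrative names, and each call to portb_step() stands for one 100 MHz clock cycle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical behavioral model of the pcore's port B write logic. */
typedef struct {
    bool     pending;  /* a flag write is waiting to be issued on port B */
    uint32_t addr;     /* BRAM word address of the flag */
    uint32_t data;     /* flag value to write */
} portb_writer;

/* Queue a flag write; it is issued by a later call to portb_step(). */
void portb_request(portb_writer *w, uint32_t addr, uint32_t data)
{
    w->pending = true;
    w->addr = addr;
    w->data = data;
}

/* One clock cycle. If port A's enable is asserted, the pending write is
 * stalled; otherwise it is committed to the BRAM through port B.
 * Returns true on the cycle in which the write is committed. */
bool portb_step(portb_writer *w, bool port_a_en, uint32_t *bram)
{
    if (!w->pending || port_a_en)
        return false;          /* nothing queued, or stall behind port A */
    bram[w->addr] = w->data;   /* port B write: WE asserted with ADDR/DIN */
    w->pending = false;
    return true;
}
```

In this model a write requested while port A is busy simply slides to the first cycle in which port A's enable is low, which is the behavior the stall logic is meant to guarantee on the BRAM side.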

Workaround: We implemented a high-level software time-out mechanism: after the processor has looped for a certain period of time, it sends a retry notification to the flag setter. This solution works reliably.
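A minimal C sketch of this time-out/retry scheme follows. It assumes a poll count as a stand-in for "a certain period of time", and send_retry is a hypothetical callback for however the retry notification actually reaches the flag setter; sim_setter and sim_retry exist only so the sketch can be exercised on a host.

```c
#include <stdint.h>

/* Hypothetical callback that asks the flag setter to write the flag again. */
typedef void (*retry_fn)(void *ctx);

/* Poll *flag_addr until it holds `flag`. After timeout_polls failed polls
 * (a stand-in for "a certain period of time"), send a retry notification
 * and poll again, at most max_retries times. Returns the number of retries
 * that were needed, or -1 if the flag never arrived. */
int wait_for_flag(const volatile uint32_t *flag_addr, uint32_t flag,
                  uint32_t timeout_polls, int max_retries,
                  retry_fn send_retry, void *ctx)
{
    for (int retries = 0; retries <= max_retries; retries++) {
        for (uint32_t i = 0; i < timeout_polls; i++) {
            if (*flag_addr == flag)   /* volatile read: re-loaded every poll */
                return retries;
        }
        if (retries < max_retries)
            send_retry(ctx);          /* time-out: notify the flag setter */
    }
    return -1;
}

/* Host-only stand-in for the flag setter, used to exercise the sketch:
 * every retry request rewrites the flag word. */
typedef struct {
    volatile uint32_t *word;
    uint32_t value;
    int requests;
} sim_setter;

static void sim_retry(void *ctx)
{
    sim_setter *s = ctx;
    s->requests++;
    *s->word = s->value;
}
```

Bounding the retries keeps a permanently missing flag from turning back into the original infinite loop; the caller can decide what to do when -1 comes back.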

Summary: Eventually, we removed all the OCM buses from our design and migrated the BRAMs to the PLB buses. Using PLB BRAMs eliminated all three bugs without any of the software workarounds we had previously developed. While we cannot comment on the OCM interface of the Virtex-4 or Virtex-5, we recommend against using OCM BRAMs on the Virtex-II Pro family (especially on the XC2VP70 parts).

Performance Note: We initially chose to use the OCM bus because of its short latency as compared to the PLB. To compare the latencies of using BRAMs on the two buses, we wrote the following test code in PowerPC assembly:

        turn on instruction cache      # code is stored in PLB BRAM; the loop code fits inside the PowerPC cache
        set loop iterator register     # 32k iterations
        set base address
        start PowerPC timer
    loop_start:
        load
        ...
        load                           # 100th load
        bdnz loop_start                # decrement loop iterator; goto loop_start if it is not zero
        read PowerPC timer
        cycles per load = timed cycles / (number of iterations * 100)
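Roughly the same measurement written in C might look like the sketch below. It rests on assumptions: it uses GCC inline asm to read the lower time-base register with mftb on the PowerPC and assumes the time base advances once per processor cycle, as our "timed cycles" figures imply; on other hosts it falls back to clock(), which measures CPU time rather than cycles. time_loads() and cycles_per_access() are illustrative names, and the 100 loads are unrolled with macros so the loop body is loads plus the loop counter.

```c
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
/* Lower 32 bits of the PowerPC time base (assumed to tick once per
 * processor cycle). */
static inline uint32_t read_timer(void)
{
    uint32_t t;
    __asm__ volatile ("mftb %0" : "=r"(t));
    return t;
}
#else
#include <time.h>
/* Host fallback: clock() ticks, i.e. CPU time, not processor cycles. */
static inline uint32_t read_timer(void)
{
    return (uint32_t)clock();
}
#endif

/* 100 back-to-back volatile loads from base[0]. */
#define LD1   (void)base[0];
#define LD10  LD1 LD1 LD1 LD1 LD1 LD1 LD1 LD1 LD1 LD1
#define LD100 LD10 LD10 LD10 LD10 LD10 LD10 LD10 LD10 LD10 LD10

/* Run `iterations` passes of 100 loads and return the elapsed timer count. */
uint32_t time_loads(const volatile uint32_t *base, uint32_t iterations)
{
    uint32_t start = read_timer();
    for (uint32_t i = iterations; i != 0; i--) {  /* counts down like bdnz */
        LD100
    }
    return read_timer() - start;  /* unsigned arithmetic tolerates wrap */
}

/* cycles per load = timed cycles / (number of iterations * 100) */
double cycles_per_access(uint32_t timed_cycles, uint32_t iterations,
                         uint32_t accesses_per_iter)
{
    return (double)timed_cycles / ((double)iterations * accesses_per_iter);
}
```

With 32,768 iterations of 100 loads, a timed count of 6,553,600 cycles works out to 2 cycles per load.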

The following are the results for the OCM and PLB BRAM controllers:

    Bus    Load (cycles)    Store (cycles)
    OCM    2                2
    PLB    11               8

Stores take less time than loads on the PLB because the controller asserts the store ack as soon as the data arrives, whereas for a load the ack is asserted only when the data has been read from the BRAM. Also, turning on the instruction cache shortens the latencies.
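As a worked example of what these latencies mean in practice, the sketch below estimates the cycles spent on BRAM accesses for a given mix of loads and stores. It ignores any overlap with other instructions, so treat it as a rough estimate; bus_latency and access_cycles() are illustrative names.

```c
#include <stdint.h>

/* Per-access BRAM latencies from the measurements above, in cycles. */
typedef struct {
    uint32_t load;
    uint32_t store;
} bus_latency;

static const bus_latency OCM_BRAM = { 2, 2 };
static const bus_latency PLB_BRAM = { 11, 8 };

/* Rough cycles spent on BRAM accesses for a mix of loads and stores,
 * ignoring overlap with any other instructions. */
uint64_t access_cycles(bus_latency lat, uint64_t loads, uint64_t stores)
{
    return loads * lat.load + stores * lat.store;
}
```

For 1,000 loads and 1,000 stores this gives 1000*2 + 1000*2 = 4,000 cycles on the OCM versus 1000*11 + 1000*8 = 19,000 cycles on the PLB, about 4.75 times as long.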

Nju Njoroge

As an update on the two earlier threads (which I started), I never did find a way to use the OCM successfully in my project. I just attach everything to the PLB and have to live with the increased latency.

The Xilinx Answer Record #14052 (Virtex-II Pro PowerPC 405 errata) has some interesting details on this problem. This is the problem I experienced, and it sounds like the same issue for the Stanford TCC group as well. See Solution 11 (CPU_212) on the page, which has the following description:

-- While waiting for a Data Side PLB (DSPLB) load to complete, the PPC405 Core might ignore a valid store completion from Data Side OCM (DSOCM) when a particular sequence of operations occurs. This condition can occur in a system using both DSPLB and DSOCM interfaces. This condition can cause the PPC405 to hang or can result in incorrect values for registers in these operations.

--

You can also see the IBM version of the errata at ftp://ftp.xilinx.com/pub/documentation/misc/ppc405f6v5_2_0.pdf which has different details on the same problem.

It's a shame the OCMs are buggy in this regard. It really makes them unusable for any kind of serious project that uses both the PLB and the OCM. As I recall, the same errata note is present for the Virtex-4 parts. Perhaps the Virtex-5 FX will finally fix this?

Jeff Shafer



Thanks for the update on your situation and also for the links to the Answer Records and IBM's PDF. Hopefully, others will fare better than we did.

Nju
