Using a Xilinx VP20 with 2 PPCs and one DRAM chip.

What's the easiest way to partition one 128 MB DRAM into two 64 MB sections, with each section controlled by one of the PPCs in the VP20?

Another way to ask this question:

Is it possible to use the same plb_ddr_sdram core, or more than one plb_ddr_sdram core, to control the same physical DRAM interface while partitioning the DDR so that it is shared by both PPCs?

Also, if I'd like to have some special FPGA logic to

  • offload some special PPC packet processing functions (the functions require access to the DDR that is used and updated by the PPC CPU at the same time),

what's the best way to do this?

-Tony

Reply to
T Lee

I don't know of any easy ways - but there are at least two fairly hard ways:

1 - As you suggest, somehow mux the physical off-chip interface between two independent controllers. This is going to be tricky to say the least, and probably quite inefficient, depending on how much L1/L2 cache you have to play with. You'd need some sort of arbiter that can selectively stall one of the controllers, and switch the mux in such a way that it all works seamlessly. Maybe not as hard as I think, but surely not easy.

2 - Implement a classic SMP (symmetric multiprocessing) architecture with the two PPCs tied to a common bus, bus arbiters, a single memory interface and so on. I'm not sure how the PPC architecture typically implements SMP, or whether the V2Pro / PLB can support it without radical effort.

Also, this results in all memory being shared (at least at the physical level; you could use the VM capabilities of the processors to make it look like the two halves are distinct).

If what you really want is two completely independent PPC systems, it could be easier to build/buy a more suitable board. Take a look at the Insight/Memec V2P20/30/50 boards - they have dual independent SDRAM banks for exactly this purpose.


There may be others as well.

Interesting questions, with no easy answers. If you make your custom processing logic bus master capable, then in principle you could use DMA to handle all of this. You need to be careful that you don't end up in a bus-bound mess, losing any performance gains that might otherwise be achieved with the hardware offloading in the first place!

It depends on the nature of what you are trying to do - if it is heavy on the computation, and light on the communication, then it's not so bad. However, if the opposite is true, you will end up saturating the bus.

Regards,

John

Reply to
John Williams

Tony,

if you do not have any specific requirements (like "each of the two processors must see its part of the memory at address zero") it is straightforward:

1) Hook up both PPCs to the same PLB together with the PLB SDRAM controller.
2) Write separate linker scripts for each of the two processors. The first processor uses the memory in [0, 64[ MB and the second one uses [64, 128[ MB.
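For concreteness, here is a minimal sketch of that address split. It assumes the plb_ddr_sdram core's base address is 0x00000000 (take the real C_BASEADDR from your own MHS file); all names and addresses are illustrative, not the actual design.

```c
/* ddr_partition.h -- illustrative only; the real base address comes from
 * the C_BASEADDR assigned to the plb_ddr_sdram core in the EDK MHS file. */

#ifndef DDR_PARTITION_H
#define DDR_PARTITION_H

#define DDR_BASE        0x00000000u            /* assumed controller base   */
#define DDR_SIZE        (128u * 1024 * 1024)   /* one 128 MB physical DDR   */

#define PPC0_DDR_BASE   (DDR_BASE)                    /* [0, 64[ MB   */
#define PPC0_DDR_SIZE   (DDR_SIZE / 2)
#define PPC1_DDR_BASE   (DDR_BASE + PPC0_DDR_SIZE)    /* [64, 128[ MB */
#define PPC1_DDR_SIZE   (DDR_SIZE / 2)

/* The same split expressed in the second processor's linker script
 * (GNU ld syntax), so its .text/.data/.bss/stack/heap all land in the
 * upper 64 MB and never touch the other processor's half:
 *
 *   MEMORY {
 *       ddr : ORIGIN = 0x04000000, LENGTH = 0x04000000
 *   }
 */

#endif /* DDR_PARTITION_H */
```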

- Peter

Reply to
Peter Ryser

Nice solution - what will the caches do in this configuration? Duplicate / thrash? Get incoherent?

John

Reply to
John Williams

John, Peter, thanks to you both for the replies.

Caches will be fine for the application I have in mind, since each PPC will run a different application without sharing any common data structures.

Worst case, I can use volatile with the data structure accesses and also design the code/structures so that one processor always reads and the other always writes.
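In case it helps, a minimal sketch of that one-writer / one-reader pattern in C is below. The address and queue size are made up, and it assumes the shared region is seen uncached by both PPCs (or is flushed/invalidated around each access, as the later replies in this thread describe) - volatile alone does not make the two PPC405 data caches coherent.

```c
/* One-writer / one-reader mailbox between the two PPCs.
 * SHARED_BASE is an illustrative address inside the DDR; in practice it
 * should sit in memory that both processors access uncached (or with
 * explicit cache maintenance).  If the region is not mapped
 * caching-inhibited and guarded, an eieio/sync may be needed before
 * publishing head/tail to enforce store ordering. */

#include <stdint.h>

#define SHARED_BASE  0x07F00000u   /* illustrative address in the DDR */

typedef struct {
    volatile uint32_t head;        /* written only by the producer PPC */
    volatile uint32_t tail;        /* written only by the consumer PPC */
    volatile uint32_t slot[64];    /* descriptor words                 */
} mailbox_t;

#define MBOX ((mailbox_t *)SHARED_BASE)

/* Producer side: this PPC only ever writes head and the slots. */
static int mbox_put(uint32_t desc)
{
    uint32_t next = (MBOX->head + 1u) % 64u;
    if (next == MBOX->tail)            /* full */
        return -1;
    MBOX->slot[MBOX->head] = desc;
    MBOX->head = next;                 /* publish last */
    return 0;
}

/* Consumer side: the other PPC only ever writes tail. */
static int mbox_get(uint32_t *desc)
{
    if (MBOX->tail == MBOX->head)      /* empty */
        return -1;
    *desc = MBOX->slot[MBOX->tail];
    MBOX->tail = (MBOX->tail + 1u) % 64u;
    return 0;
}
```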

For an application that needs to process 1-2 million packets per second, where "process" means:

  • look at the headers and some data structures and make a forwarding decision,
  • append packet descriptors to various Qs,
  • a TX engine checks all Qs and transmits them out to RIO based on time and QoS settings, and updates counter values in the DDR.

The packets are already stored in a different DDR. This process only needs to control the DMA descriptors of the packets, at a rate of 1-2 million per second.

Thanks again for both of your comments.

-Tony

Reply to
T Lee

The caches for the two processors are fully independent and located within the individual PPC cores, so their behavior will be completely independent. You are correct: the instruction caches will duplicate, and the data caches will risk incoherency if one is not careful. The PPC architecture offers a great deal of cache control from software, so with careful planning these issues can be dealt with.
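As an illustration of that software cache control, here is a hedged sketch of flush/invalidate helpers for the PPC405's 32-byte data cache lines, written with GCC-style inline assembly; EDK's xcache_l.h provides similar helpers, as Peter notes below.

```c
/* Explicit PPC405 data-cache maintenance around a shared buffer,
 * using the standard PowerPC cache instructions (32-byte cache lines). */

#include <stdint.h>

#define DCACHE_LINE 32u

/* Writer side: push dirty lines out to the DDR so the other PPC
 * (or bus-master logic) sees the data. */
static void flush_dcache_range(uintptr_t addr, uint32_t len)
{
    uintptr_t end = addr + len;
    for (addr &= ~(uintptr_t)(DCACHE_LINE - 1); addr < end; addr += DCACHE_LINE)
        __asm__ volatile ("dcbf 0,%0" : : "r"(addr) : "memory");
    __asm__ volatile ("sync" ::: "memory");   /* wait for the flushes */
}

/* Reader side: discard stale lines before reading data the other
 * PPC has just written. */
static void invalidate_dcache_range(uintptr_t addr, uint32_t len)
{
    uintptr_t end = addr + len;
    for (addr &= ~(uintptr_t)(DCACHE_LINE - 1); addr < end; addr += DCACHE_LINE)
        __asm__ volatile ("dcbi 0,%0" : : "r"(addr) : "memory");
    __asm__ volatile ("sync" ::: "memory");
}
```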

We just wrapped up a 2VP20 design utilizing both processors, where minimizing PLB cycles was of the utmost importance. In the software, we had a number of different types of data that we were dealing with:

  • 70% random byte reads, with data accesses almost always near one of the last 320 bytes accessed
  • 20% sequential byte writes, never needing to reaccess this data again
  • 10% sequential word reads, never needing to access this data again
Reply to
Erik Widding

Tony,

if you want to share data structures between the two processors, you will need to flush the caches if the data structures are in a cacheable area. EDK has some simple functions in the xcache_l.h library that help you do that. Further, Erik has described the process very well in his email.

Alternatively, you can use some BRAM memory for shared data structures and put them into uncached memory. Instead of a BRAM you can alias the main memory, i.e. give it in EDK a larger address range than it really has, and then cache the first 128 MB but not cache the second (aliased) 128 MB. By putting the shared structures into the second 128 MB they are automatically uncached and immediately written back to main memory.
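A hedged sketch of that aliasing trick, with made-up addresses; it relies on the PPC405 DCCR enabling data caching per 128 MB region, so an alias placed in the next 128 MB region is naturally left uncached.

```c
/* Sketch of the aliasing approach: the DDR controller is given a 256 MB
 * address range in EDK even though only 128 MB is populated, the PPC
 * data cache is enabled only over the first 128 MB region, and shared
 * structures are accessed through the second (uncached, aliased) 128 MB.
 * Addresses are illustrative. */

#include <stdint.h>

#define DDR_CACHED_BASE    0x00000000u            /* real, cached view      */
#define DDR_UNCACHED_BASE  0x08000000u            /* aliased, uncached view */
#define DDR_SIZE           (128u * 1024 * 1024)

/* Map a pointer from the cached view to the uncached alias of the same
 * physical DDR location. */
static inline volatile void *uncached(void *p)
{
    uintptr_t off = (uintptr_t)p - DDR_CACHED_BASE;
    return (volatile void *)(DDR_UNCACHED_BASE + off);
}

/* Example: a counter both PPCs update without any cache flushing,
 * because it is only ever touched through the uncached alias. */
#define SHARED_COUNTER ((volatile uint32_t *)(DDR_UNCACHED_BASE + 0x100u))
```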

Last but not least, you can use the MMU to set up a few pages and divide the main memory into cached and uncached areas at page boundaries. For many applications this can be done statically, i.e. you do not have to worry about TLB replacement.

- Peter


Reply to
Peter Ryser
