MontaVista Linux and Virtex-II & 4

Does anyone know if MontaVista Linux or other distributions support SMP in Virtex-II Pro and Virtex-4? Thanks.

Reply to
Osnet

No, the PowerPC405 caches in the current Xilinx FPGAs are not cache coherent and so do not support SMP.

Paul

Reply to
Paul Hartke

I don't really get the relation between the two facts ... The OS could enforce coherency in software by forcing a cache flush during task switching, I think ...
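
Something like this is what I have in mind -- only a sketch, assuming the PowerPC dcbf (data cache block flush) instruction and the 405's 32-byte cache lines; the helper name matches the one Linux uses on PPC, but where exactly the scheduler would call it is left open:

    /* Write back and invalidate a range of the data cache, one 32-byte
     * line at a time, using the PowerPC dcbf instruction. Would be
     * called from the context-switch path. */
    #define L1_CACHE_BYTES 32UL

    static void flush_dcache_range(unsigned long start, unsigned long stop)
    {
        unsigned long addr;

        for (addr = start & ~(L1_CACHE_BYTES - 1); addr < stop;
             addr += L1_CACHE_BYTES)
            __asm__ __volatile__("dcbf 0,%0" : : "r" (addr) : "memory");

        __asm__ __volatile__("sync");  /* wait until the flushes complete */
    }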

Sylvain

Reply to
Sylvain Munaut

In article , Sylvain Munaut writes:
|> Paul Hartke wrote:
|> > No, the PowerPC405 caches in the current Xilinx FPGAs are not cache
|> > coherent and so do not support SMP.
|>
|> I don't really get the relation between the two facts ... The OS could
|> enforce coherency in software by forcing a cache flush during task
|> switching, I think ...

The idea of having caches also means that they are transparent so that you do *not* need any sort of special treatment by the programmer or operating system.

Besides, flushing on task switches wouldn't help, because memory writes happen independently of task switches. Your OS would need to track the memory accesses of all CPUs in the SMP system and block reads of "dirty" addresses until the dirtying CPU has written them back. In other words, the OS would have to implement an entire cache coherency protocol in software, without the hardware assist (bus snooping and the like) that hardware protocols such as MESI rely on.

Rainer

Reply to
Rainer Buchty

Well, there are quite a few CPUs where you need to enforce coherency by hand when using DMA, for example (the kernel even has a flag for CPUs with non-coherent caches) ...
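
As an illustration, this is roughly what a driver does on such a CPU, using the standard Linux DMA-mapping API (which performs the cache write-back or invalidate by hand on non-coherent platforms); the buffer size and the device pointer are made up for the example:

    #include <linux/dma-mapping.h>

    #define BUF_SIZE 4096  /* arbitrary example size */

    /* Hand a buffer to a device for DMA. On a CPU with non-coherent
     * caches, dma_map_single() flushes the cached data to memory before
     * the device reads the buffer; for DMA_FROM_DEVICE it would
     * invalidate the lines instead. */
    static int start_dma(struct device *dev, void *buf)
    {
        dma_addr_t handle;

        handle = dma_map_single(dev, buf, BUF_SIZE, DMA_TO_DEVICE);
        /* (error checking elided) */

        /* ... program the device with 'handle', run the transfer ... */

        dma_unmap_single(dev, handle, BUF_SIZE, DMA_TO_DEVICE);
        return 0;
    }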

I don't get that, sorry ... (note that you may be right, I'm just trying to understand here). I'm not that familiar with SMP, but here is how it goes for me: processes have two kinds of writable memory zones, either private to the process or shared between several processes.

  • For a private address space, since the task will run on only one CPU at a time, a cache problem only occurs when a task is stopped on one CPU and launched on the other. So just after stopping the task, that CPU should flush its cache, so that if the task is launched on the other CPU, the other CPU has access to up-to-date memory (see the sketch after this list).
  • For zones shared between processes, there is no problem either. While the two processes are running simultaneously, no problem can occur, because the processes must handle the synchronisation themselves even on a cache coherent system (by semaphore, or a flag in memory, whatever ...). And when only one is running, the situation is the same as for the private zones.
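
Roughly like this, reusing the flush_dcache_range() helper from my previous post; the hook name and the task fields are made up, this is only the idea, not real Linux code:

    /* Hypothetical hook, called on the old CPU when the scheduler moves
     * a task: write the task's dirty data back to memory so that the
     * other CPU reads up-to-date values. */
    struct task_mem {
        unsigned long start;  /* start of the task's writable memory */
        unsigned long end;    /* end of the task's writable memory   */
    };

    static void on_task_migrate(struct task_mem *m)
    {
        flush_dcache_range(m->start, m->end);  /* dcbf over the range */
    }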

Sylvain

Reply to
Sylvain Munaut

In article , Sylvain Munaut writes:
|> Well, there are quite a few CPUs where you need to enforce coherency
|> by hand when using DMA, for example (the kernel even has a flag for
|> CPUs with non-coherent caches) ...

Yes, I too could come up with a system that, e.g., requires non-cacheable memory areas because one or more of the devices accessing the respective memory area is not able to support a cache coherency protocol.

No doubt that it can be done otherwise, but that's not the point.

|> • For a private address space, since the task will run on only one CPU
|> at a time, a cache problem only occurs when a task is stopped on one
|> CPU and launched on the other. So just after stopping the task, that
|> CPU should flush its cache, so that if the task is launched on the
|> other CPU, the other CPU has access to up-to-date memory.

And why would you specifically need shared memory in this respect?

If your application or system does not require shared memory access by design, then of course you can come up with a light-weight solution like the above, where one task is the only one dealing with a specific set of data.

|> • For zones shared between processes, there is no problem either.
|> While the two processes are running simultaneously, no problem can
|> occur, because the processes must handle the synchronisation themselves
|> even on a cache coherent system (by semaphore, or a flag in memory,
|> whatever ...).

Ok, assume that the semaphore is placed in memory and we have a two-processor system where each processor runs one of those two tasks.

You could of course switch off caching of the very memory area holding the semaphore(s) and never have a problem. But then there is also no cache for that area, i.e. the accesses will be dog slow.

You could also, as I understand your example, trigger a flush so that whenever one processor tries to read the semaphore, the other processor flushes the cache line(s) holding your semaphore(s) while the reading processor waits for that process to finish. That would work, but it induces unnecessary bus load, waiting times (the semaphore might not have changed since the last read), and furthermore a more or less complex communication protocol which needs to be triggered whenever any of the processors tries to access the shared memory region.
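
To make the cost concrete, acquiring such a semaphore would look roughly like this; a sketch only, with both helpers hypothetical, standing in for the cross-processor flush request and the local cache-line invalidate:

    extern void remote_flush_line(volatile int *addr); /* hypothetical: ask the
                                                          other CPU to write back
                                                          the line holding addr */
    extern void local_flush_line(volatile int *addr);  /* hypothetical: write back
                                                          and invalidate our own
                                                          copy of that line */

    static void sema_acquire(volatile int *sem)
    {
        do {
            remote_flush_line(sem);  /* other CPU writes its copy back  */
            local_flush_line(sem);   /* discard our possibly stale copy */
        } while (*sem != 0);         /* re-read from memory, spin if taken */

        *sem = 1;                    /* take it -- note: still racy without a
                                        further interlock, which is exactly the
                                        protocol complexity described above */
        local_flush_line(sem);       /* write our claim back to memory */
    }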

The idea behind having a cache coherency protocol is to get consistency and coherency at no extra cost on the software side. The programmer (or the OS) does not need to care about the entire process of monitoring accesses, stopping a read access to a memory region which has been dirtied by another processor, writing that dirty data back to memory, and restarting the reading processor. The price is paid on the hardware side: you need a common, snoopable bus, some additional communication signals (3 in the case of MESI), logic to implement the light-weight protocol, and a slightly altered cache to hold the per-line MESI state.

The idea behind MESI (or cache coherency protocols in general) is to keep the additional bus traffic as low as possible, i.e. accesses to memory only when necessary, keeping as much traffic inside the cache as possible.
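
For illustration, the per-line state machine itself is small; a sketch of the MESI transitions in C, with event names of my own choosing and all of the actual bus signalling elided:

    /* Sketch of the MESI state transitions for a single cache line. A
     * real controller also drives and snoops the bus signals, which is
     * left out here. */
    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;

    typedef enum {
        LOCAL_READ,   /* our CPU reads the line              */
        LOCAL_WRITE,  /* our CPU writes the line             */
        BUS_READ,     /* we snoop a read by another CPU      */
        BUS_WRITE     /* we snoop a write/invalidate request */
    } mesi_event;

    static mesi_state mesi_next(mesi_state s, mesi_event e, int shared_by_others)
    {
        switch (e) {
        case LOCAL_READ:
            if (s == INVALID)   /* miss: line is fetched over the bus */
                return shared_by_others ? SHARED : EXCLUSIVE;
            return s;           /* hits leave the state unchanged */
        case LOCAL_WRITE:       /* from SHARED this also sends an
                                   invalidate to the other caches */
            return MODIFIED;
        case BUS_READ:
            if (s == MODIFIED)  /* write the dirty line back first */
                return SHARED;
            return (s == EXCLUSIVE) ? SHARED : s;
        case BUS_WRITE:
            return INVALID;     /* another CPU takes ownership */
        }
        return s;
    }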

Of course you could do all of that on the software side as well, using communication methods at the OS and application level. But at the price of increased complexity, bus traffic, and access latencies.

Try scaling the 2-processor example up to 3, 4, or more processors.

Rainer

Reply to
Rainer Buchty

This cannot handle threaded applications running on multiple CPUs, since the threads share the same memory. If you do not have any threaded applications it might work. The bigger problem for the original poster, however, is that the multiprocessor support in the Linux kernel itself is designed on the assumption that the memory system is cache coherent. Rewriting all of that is going to be non-trivial, to say the least. The best you can hope for in this case is to run two copies of the Linux kernel, one on each processor.

/Andreas

Reply to
Andreas Ehliar
