PPC405 32 bit aligned accesses

I. Ulises Hernandez 20 years ago

Hello everybody,

Hopefully someone can give me a hand with a PPC405 issue...

How do you configure the PPC so that it only performs 32-bit aligned accesses to DDR? The DDR2 module I am interfacing is byte wide and I do NOT have access to the Data Mask pins, so a DDR word becomes 8 (bits/byte) x 4 (burst length) = 32 bits wide...

Thanks in advance,

-- Ulises Hernandez " I'm not normally a praying man, but if you're up there, please save me, Superman!" - Homer Simpson ;O)

Symon 20 years ago

I. Ulises Hernandez 20 years ago

Thanks Symon,

I had actually enabled the caches and it still seems to be performing byte accesses every now and then... if variables have been defined as 'bytes' then the compiler seems to be generating Load Byte assembler instructions, also for strings (I can see in simulation, using the PPC swift model, that 'printf' for instance is generating PLB byte accesses)...

Regards,

-- I.U. Hernandez " I'm not normally a praying man, but if you're up there, please save me, Superman!" - Homer Simpson ;O)

Symon 20 years ago

Kolja Sulimma 20 years ago

If it happens rarely you can create a bus error and trap it to a service routine that performs a 32-bit wide read-modify-write transaction to do the byte write.
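As a rough illustration of the read-modify-write idea, here is a minimal C sketch. It only models the merge step on an array of words; the names `mem`, `MEM_WORDS` and `rmw_store_byte` are made up for this example, and the big-endian lane numbering (byte offset 0 is the most significant byte) is an assumption that matches the PPC405 but should be checked against the actual controller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a memory that only accepts full 32-bit word writes. */
#define MEM_WORDS 4u
static uint32_t mem[MEM_WORDS];

/* Emulate a byte store with a word-wide read-modify-write.
 * Assumes big-endian byte lanes: byte offset 0 within a word is the MSB. */
static void rmw_store_byte(uint32_t byte_addr, uint8_t value)
{
    uint32_t index = byte_addr >> 2;                /* which 32-bit word    */
    uint32_t shift = (3u - (byte_addr & 3u)) * 8u;  /* lane inside the word */
    uint32_t word;

    assert(index < MEM_WORDS);

    word = mem[index];                              /* read   */
    word = (word & ~(0xFFu << shift))               /* modify */
         | ((uint32_t)value << shift);
    mem[index] = word;                              /* write  */
}
```

With `mem[1]` preset to 0x11223344, calling `rmw_store_byte(5, 0xAB)` replaces only the second byte lane and leaves 0x11AB3344 in the word.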

Of course you can as well add hardware that does this to the memory controller.

Also, make sure that the compiler knows that the memory region is cacheable. In an embedded world it could be that all addresses default to an address space with side effects. In that case the compiler has no choice but to perform accesses to data of type byte as individual byte accesses.

Kolja Sulimma

I. Ulises Hernandez 20 years ago

Thanks for your replies...

See below for my comments.

Exactly what we thought ;O)

I am not sure if that is the case, I'll double check! I am not a software expert... just trying to speed up things in this project giving a hand to the softies. I am actually the guy who wrote the HDL for the PPC system on this card and part of it is the PLB_DDR2 controller.

Ok, I see what you mean... the plan is to use Montavista Linux on the card and I am not sure how easy it would be to add such a service. I'll check with the software guys.

A 32-bit wide read-modify-write transaction to do byte writes... that is going to be the solution for the meantime. It makes for a messy PLB_DDR2 controller, but it will get us further.

I have checked that and it is cacheable...

Regards, U. Hernandez

Peter Ryser 20 years ago

If the data cache is in write-back mode and is turned on for a certain memory region you will not see byte, halfword or word transactions. All you will see are cache-line aligned cache-line transactions.
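To make "cache-line aligned" concrete, here is a small C sketch that computes the start address of the line a given access falls into. The 32-byte line size is what the PPC405 uses; the macro and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Line size in bytes (32 on the PPC405); adjust for other cores. */
#define CACHE_LINE_BYTES 32u

/* Return the address of the first byte of the cache line containing addr.
 * With the data cache on in write-back mode, this is the address the
 * memory controller sees for the whole line transfer. */
static uint32_t cache_line_base(uint32_t addr)
{
    /* The mask trick below only works for power-of-two line sizes. */
    assert((CACHE_LINE_BYTES & (CACHE_LINE_BYTES - 1u)) == 0u);
    return addr & ~(CACHE_LINE_BYTES - 1u);
}
```

For example, a byte access to 0x00400027 would show up on the bus as a line transfer starting at 0x00400020.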

Only a good idea for a limited set of applications. Anyway, before doing anything in the DDR2 memory you need to turn the data cache on for the address region where it is mapped. You can do that with a small program running from BRAM and then jump to the entry point of the application in the DDR2 memory. Here is some code that will do that for running Linux later on:

    #include "xcache_l.h"

    int main()
    {
        void (*f)(void);

        // enable I/D cache for first 128 MB
        XCache_EnableICache(0x80000000);
        XCache_EnableDCache(0x80000000);

        f = (void*) 0x400000; // entry point of the Linux kernel bootloader
        (*f)();

        /* we should never get here */
        return -1;
    }

It is very likely that you will need to patch your card and bring the byte masks out. Linux uses the MMU to set up cacheable regions, i.e. it will/may put descriptors (networks, etc.) into uncacheable areas. Unless you plan to modify the Linux kernel heavily it is very unlikely that your system will work. You can boot Linux from a RAM disk, though, to at least get started.

That has nothing to do with the compiler, i.e. the compiler knows nothing about caches. The programmer is responsible for maintaining memory coherency.

Again, please consider patching your board. IMHO, it's the fastest way to make progress.

- Peter

I. Ulises Hernandez 20 years ago

Peter,

I really appreciate all your comments. I am going to inform the software guys straight away; I think we have caught this problem in time, just before prototyping...

I'll keep you informed on how things go.

Regards, Ulises Hernandez.

Symon 20 years ago

Hi Peter, Could you explain why Linux puts these things in uncacheable areas? Also, how does a RAM disk help you boot? Thanks very much, Syms.

Peter Ryser 20 years ago

It's not only Linux, i.e. the PPC, but also the (DMA based) peripherals.

Assume the PPC and a peripheral share some memory in the memory address space to communicate status. One such example can be the descriptor for an Ethernet transfer where one bit indicates whether the descriptor is free or busy.

The PowerPC can have the descriptor cached or uncached depending on the implementation of the device driver. In the cached version the driver writes to the status bit and then flushes the memory area out of the cache. If the driver allocates uncached memory for the descriptor no flushing is needed and the area gets updated immediately upon a write.

Now, a peripheral (almost) always accesses this bit uncached and will directly write to the byte containing this bit. However, if the memory does not support single byte writes a whole word will be overwritten and information is changed that should not have been.
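The effect can be shown with a tiny C model. This is only a sketch: the function names are invented, the lanes are numbered big-endian (lane 0 is the most significant byte), and it assumes the unused lanes carry zeros on the bus during an unmasked write, which in real hardware could be any value.

```c
#include <assert.h>
#include <stdint.h>

/* Memory with data masks: only the addressed byte lane is updated. */
static void write_byte_masked(uint32_t *word, unsigned lane, uint8_t value)
{
    unsigned shift = (3u - lane) * 8u;

    assert(lane < 4u);
    *word = (*word & ~(0xFFu << shift)) | ((uint32_t)value << shift);
}

/* Memory without data masks: all four lanes are written.
 * Assumption: the lanes that were not addressed carry zeros. */
static void write_byte_unmasked(uint32_t *word, unsigned lane, uint8_t value)
{
    unsigned shift = (3u - lane) * 8u;

    assert(lane < 4u);
    *word = (uint32_t)value << shift;
}
```

Take a hypothetical descriptor word 0x00000640 with the status byte in lane 0 and a length field in the lower bytes. Setting the status to 0x80 gives 0x80000640 with masks, but 0x80000000 without them, so the length field is lost.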

In Linux it is up to the developer of the device driver on how she implements the access to the shared memory. However, a user will not be able to tell what is implemented in the device driver without studying the code.

I was using the RAM disk based Linux as a placeholder for a very simple Linux system without DMA, i.e. a UART as the only peripheral in the system. The RAM disk does not generally fix the missing byte masks ;-)

- Peter

Symon 20 years ago

Hi Peter, OK thanks, so the mists are starting to clear a little. I can see it's a problem if byte wide semaphores are present in the memory range that the PPC sees as cached that an external peripheral needs to access by DMA. So, here's my next question!!

If I locate all these peripheral status bits into memory on the Xilinx device, i.e. in BlockRAM, rather than in the external 'no mask bits' SDRAM/DDR RAM, does that solve the problem? It's possible to make the BlockRAMs work uncached and with byte access, even though the external SDRAM is cached.

Thanks again, Syms.

Peter Ryser 20 years ago

Theoretically yes. But you would still have to go through all Linux device drivers, or at least the ones you plan to use, and modify them so that they would use the BRAM address space instead of the main memory address space for these special variables.

IMHO, the soldering iron is the faster approach to get this working...

- Peter

I. Ulises Hernandez 20 years ago

...I'm still waiting for the softies to come back. In the meantime I have modified my DDR2 controller, actually the PLB side of it (I didn't want to change the controller itself for this mess), such that when it detects a byte or 16-bit word write it performs a read-modify-write access. It works! Instead of taking 'x' clk cycles it takes twice that :O( What I've seen in my simulation though is that byte/16-bit word accesses happen once in a blue moon. I have a card with no Data Masks and a card almost in Layout which is definitely getting Data Masks wired up; this mod will get me further with the old card.

Regarding...

The answer is yes, I have had our system using uncached BlockRAM and cached SDRAM, no problem. Although, as Peter said, you would have to modify all Linux device drivers, and to me that sounds painful (I insist, I'm not a software guy...yet)

Regarding the soldering iron :O), in our design the Byte Data Mask signals are going to be toggling @ 200 MHz times two because of its DDR nature, so 400 MHz... a wee bit too fast for a normal soldering jobby ;)

Regards, Ulises Hernandez

Symon 20 years ago

Hi Ulises, Peter, Thanks guys for an interesting discussion! Comments below...

Hey, good job!! Sounds like something Xilinx might wanna add to their RAM controllers in EDK! ;-)

It's also a little tricky getting the iron underneath the BGA package! (Although there are companies that can remove the BGA, add enamelled wires to the pads you need, and replace the BGA.)

So, me and one of the guys were talking about this last night. He reckons that MIPS processors only ever do 32 bit accesses, i.e. no byte mask signals come out of the device. How does Linux run on a MIPS processor? Does it do read-modify-write, or does the software guy define all bytes as 32 bit words in the compiler?

Cheers, Syms.

Peter Ryser 20 years ago

That's what's being done in ECC modes but in general it's not what you want to do for performance reasons.

The last time I looked at MIPS on an R4000, a long time ago, the byte enables were encoded into a combination of Address and System Cmd Bus. I reckon this has not changed too much since then. Further, on the MIPS you cannot do unaligned accesses (i.e. access a word at address 1 or a halfword at address 1 or 3) whereas the PowerPC can do that without problems. Last but not least you can get a PowerPC embedded in an FPGA where you cannot get a MIPS ;-)
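For reference, the alignment rule in those examples is just natural alignment: the address must be a multiple of the access size. A minimal C check, with a made-up function name, could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes at `addr` is naturally aligned,
 * i.e. addr is a multiple of size. size must be a power of two. */
static bool is_naturally_aligned(uint32_t addr, uint32_t size)
{
    assert(size != 0u && (size & (size - 1u)) == 0u);
    return (addr & (size - 1u)) == 0u;
}
```

A word (4 bytes) at address 1 and a halfword (2 bytes) at address 1 or 3 all fail this check, while a halfword at address 2 or a word at address 4 pass.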

- Peter
