Puzzled trying to bring up an XScale PXA255 board

The company I work for sells a product designed around a StrongARM
SA-1110 processor.  The board and all software for this product
were designed in-house.  Now that Intel has discontinued the StrongARM,
we're redesigning the board to use a PXA255.  I did the original
firmware on the old product, and now I've been assigned to port the
firmware to the new board.

As a first step I wrote a simple program that runs entirely from
flash (no RAM used at all) and should toggle GPIO pin 66.  I
expected to see a nice square wave on the oscilloscope.  Well, as
you might guess, I wouldn't be posting this message if I had seen
the square wave.  Instead of a square wave, the oscilloscope shows a
pattern like

            _            _
           | |          | |
   ________| |__________| |___________
 

I've placed the actual ARM assembly source at the end of this
message.  In pseudo-code, the program is

    vectors:   @ This is linked to load into loc 0x00000000
         b reset_handler
        
    reset_handler:
         initialize the CPU
         call the delay subroutine
         initialize the I/O pins
    main_loop:
         Set the pin high
         call the delay subroutine
         Set the pin low
         call the delay subroutine
         branch to main_loop

    delay:
         delay ~200 usec
         return to caller
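For reference, the "~200 usec" delay corresponds to the 0x2E1 tick count used in the listings if the OS timer counts at 3.6864 MHz, the PXA255 OSCR clock rate.  A minimal C sketch of the conversion; the helper names are hypothetical and not taken from any Intel header:

```c
#include <assert.h>
#include <stdint.h>

/* PXA255 OS timer (OSCR) count rate: 3.6864 MHz. */
#define OSCR_HZ 3686400u

/* Hypothetical helper: microseconds -> OSCR ticks, truncating. */
static uint32_t usec_to_oscr_ticks(uint32_t usec)
{
    /* 64-bit intermediate avoids overflow in usec * OSCR_HZ. */
    uint64_t ticks = ((uint64_t)usec * OSCR_HZ) / 1000000u;
    assert(ticks <= UINT32_MAX);   /* must fit the 32-bit counter */
    return (uint32_t)ticks;
}

/* Hypothetical helper: OSCR ticks -> whole microseconds, truncating. */
static uint32_t oscr_ticks_to_usec(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / OSCR_HZ);
}
```

With truncation, 200 usec gives 737 (0x2E1) ticks, and the listings' 0x300 works out to roughly 208 usec.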


What's got me puzzled is that adding or removing NOPs, or changing
the location of subroutines, can make the program stop or start
working.  I modified the program slightly, swapping the locations
of the delay subroutine and the main loop, and now I see something
like

            __________            __________
           |          |          |          |
 __________|          |__________|          |___________
 

The pseudo-code for this program is

    vectors:       @ This is linked to load into loc 0x00000000
         b reset_handler
        
    delay:
         delay ~200 usec
         return to caller

    reset_handler:
         initialize the CPU
         call the delay subroutine
         initialize the I/O pins
    main_loop:
         Set the pin high
         call the delay subroutine
         Set the pin low
         call the delay subroutine
         branch to main_loop


To me the two programs look like they should do exactly the same
thing.  The only difference between them is the location of the
delay subroutine.  I've attached the actual source code for both
programs below.  I'm using the gas assembler.

Is there anything obvious I'm missing in the initialization?  (The
initialization code is largely, but not completely, taken from eCos.)
Or is there a possible hardware problem that could cause this?  Our
Intel rep says it's not a hardware issue but hasn't come up with
any hints about what's wrong with the software.




@ This is the program that doesn't seem to work on my board
@
@ The CPWAIT macro is explained in the PXA255 Microarchitecture Manual
@ in section 2.3.3
@
.macro CPWAIT register=r0
        mrc p15, 0, \register, c2, c0, 0
        mov \register, \register
        sub pc, pc, #4
.endm

        
@
@ This macro does nothing, but it expands to the same number of bytes
@ as the CPWAIT macro
@
.macro NO_CPUWAIT register=r0
        nop
        nop
        nop
.endm


        .equ PXA_OSCR,     0x40A00010      @ OS timer counter register
        .equ GPCR2,        0x40E0002C      @ GPIO pin output clear register
        .equ GPSR2,        0x40E00020      @ GPIO pin output set register
        .equ GPDR2,        0x40E00014      @ GPIO pin direction register
        .equ PMCR,         0x40F00000      @ Power Manager Control register
        .equ PSSR,         0x40F00004      @ Power Manager Sleep Status reg.
        .equ ICMR,         0x40D00004      @ Interrupt controller mask reg.

        .set CPSR_IRQ_DISABLE,     0x80
        .set CPSR_FIQ_DISABLE,     0x40
        .set CPSR_SUPERVISOR_MODE, 0x13
        
                                
        .section cpuvectors, "a"
                  
        @
        @ Here are the processor vectors.  They need to be linked at
        @ location 0x00000000.  When the CPU is first initialized
        @ it doesn't respond to external interrupts and hopefully we
        @ won't have any undefined instructions, aborts, or software
        @ interrupts.  Any exception (other than the reset) will cause
        @ us to stop.
        @
arm32_vectors:  
        b reset_handler         /* reset                   */
        b stop                  /* undefined instruction   */
        b stop                  /* swi vector              */
        b stop                  /* abort prefetch          */
        b stop                  /* abort data              */
        b stop                  /* not used                */
        b stop                  /* IRQ                     */
        b stop                  /* FIQ                     */


        .global reset_handler
reset_handler:
_start:
        @ Disable interrupts, by setting the Interrupt Mask Reg. to all 0's
        ldr     r1,=ICMR
        mov     r0,#0
        str     r0,[r1]
        nop
        nop
        nop
        nop
        nop
        nop
                
        @ disable MMU
        mov     r0, #0x0
        mcr     p15, 0, r0, c1, c0, 0

        @ flush TLB
        mov     r0, #0x0
        mcr     p15, 0, r0, c8, c7, 0   @  Flush TLB

        @ flush I&D caches and BTB
        mov     r0, #0x0
        mcr     p15, 0, r0, c7, c7, 0   @  Flush caches

        CPWAIT r0

        @ Disable IRQs and FIQs in the program status register and
        @ enter supervisor mode
        ldr     r0,=(CPSR_IRQ_DISABLE|CPSR_FIQ_DISABLE|CPSR_SUPERVISOR_MODE)
        msr     cpsr, r0

        @
        @ The number of NOPs seems to have an effect on the program
        @
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop

        @
        @ Delay a bit before we setup the output pins.
        @
        bl delay
        nop
        
        @
        @ Set up GPIO66 as an output pin
        @ See PXA250 Dev Man 4.1.3.2
        @
        ldr r3, =GPDR2
        mov r2, #0x04
        str r2, [r3]

        @
        @ Clear the RDH bit.  See Dev. Man. 3.5.7
        @
        ldr     r3, =PSSR
        mov     r2, #0b110000
        str     r2, [r3]

top_of_loop:
        @
        @ Set GP66 high.  See PXA250 Dev Man 4.1.3.3
        @
        ldr r3, =GPSR2
        mov r2, #0x04
        str r2, [r3]

        bl delay

        @
        @ Set GP66 low.  See PXA250 Dev Man 4.1.3.3
        @
        ldr r3, =GPCR2
        mov r2, #0x04
        str r2, [r3]

        bl delay
        
        b top_of_loop

stop:   b stop        

    
delay:          
        ldr r3, =PXA_OSCR     @ reset the OS Timer Count to zero
        mov r2, #0  
        str r2, [r3]

2:      ldr r2, [r3]
        cmp r2, #0
        beq 2b
        
        ldr r4, =0x300        @ 0x2E1 is about 200usec, so 0x300 will be plenty
        @ ldr r4, 36%86400    @
1:      
        ldr r2, [r3]          @ Get the current timer value
        cmp r4, r2            @ Have we delayed long enough?
        bgt 1b                @ If we haven't delayed enough, keep looping

        mov  pc, lr           @ return to the calling code.
        
                        
                
        @ Adjust the alignment of the literal pool
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
.ltorg
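The listing drives GPIO66 by writing 0x04 to GPDR2, GPSR2 and GPCR2.  That mask follows from the PXA255 layout of 32 pins per bank with consecutive bank registers 4 bytes apart: pin 66 is bit 66 % 32 = 2 of bank 66 / 32 = 2.  A C sketch of the mapping; the bank-0 base addresses are derived from the listing's bank-2 values, and the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Bank-0 addresses, derived from the listing's bank-2 values
   (GPDR2 = 0x40E00014, GPSR2 = 0x40E00020, GPCR2 = 0x40E0002C)
   by subtracting 2 banks * 4 bytes. */
#define GPDR0 0x40E0000Cu   /* pin direction */
#define GPSR0 0x40E00018u   /* pin output set */
#define GPCR0 0x40E00024u   /* pin output clear */

typedef struct {
    uint32_t gpdr;  /* direction register address for this pin's bank */
    uint32_t gpsr;  /* set register address */
    uint32_t gpcr;  /* clear register address */
    uint32_t mask;  /* bit to write for this pin */
} gpio_regs;

/* Map a GPIO number to its bank registers and bit mask. */
static gpio_regs gpio_lookup(unsigned pin)
{
    assert(pin <= 84);           /* PXA255 GPIOs are GPIO0..GPIO84 */
    unsigned bank = pin / 32;    /* 32 pins per bank */
    gpio_regs r = {
        .gpdr = GPDR0 + 4u * bank,
        .gpsr = GPSR0 + 4u * bank,
        .gpcr = GPCR0 + 4u * bank,
        .mask = 1u << (pin % 32),
    };
    return r;
}
```

For pin 66 this reproduces the listing's constants: GPDR2, GPSR2, GPCR2 and a mask of 0x04.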
        







@ This program works fine on my board
@
@ The CPWAIT macro is explained in the PXA255 Microarchitecture Manual
@ in section 2.3.3
@
.macro CPWAIT register=r0
        mrc p15, 0, \register, c2, c0, 0
        mov \register, \register
        sub pc, pc, #4
.endm

        
@
@ This macro does nothing, but it expands to the same number of bytes
@ as the CPWAIT macro
@
.macro NO_CPUWAIT register=r0
        nop
        nop
        nop
.endm


        .equ PXA_OSCR,     0x40A00010      @ OS timer counter register
        .equ GPCR2,        0x40E0002C      @ GPIO pin output clear register
        .equ GPSR2,        0x40E00020      @ GPIO pin output set register
        .equ GPDR2,        0x40E00014      @ GPIO pin direction register
        .equ PMCR,         0x40F00000      @ Power Manager Control register
        .equ PSSR,         0x40F00004      @ Power Manager Sleep Status reg.
        .equ ICMR,         0x40D00004      @ Interrupt controller mask reg.

        .set CPSR_IRQ_DISABLE,     0x80
        .set CPSR_FIQ_DISABLE,     0x40
        .set CPSR_SUPERVISOR_MODE, 0x13
        
                                
        .section cpuvectors, "a"
                  
        @
        @ Here are the processor vectors.  They need to be linked at
        @ location 0x00000000.  When the CPU is first initialized
        @ it doesn't respond to external interrupts and hopefully we
        @ won't have any undefined instructions, aborts, or software
        @ interrupts.  Any exception (other than the reset) will cause
        @ us to stop.
        @
arm32_vectors:  
        b reset_handler         /* reset                   */
        b stop                  /* undefined instruction   */
        b stop                  /* swi vector              */
        b stop                  /* abort prefetch          */
        b stop                  /* abort data              */
        b stop                  /* not used                */
        b stop                  /* IRQ                     */
        b stop                  /* FIQ                     */


delay:          
        ldr r3, =PXA_OSCR     @ reset the OS Timer Count to zero
        mov r2, #0  
        str r2, [r3]

2:      ldr r2, [r3]
        cmp r2, #0
        beq 2b
        
        ldr r4, =0x300        @ 0x2E1 is about 200usec, so 0x300 will be plenty
        @ ldr r4, 36%86400    @
1:      
        ldr r2, [r3]          @ Get the current timer value
        cmp r4, r2            @ Have we delayed long enough?
        bgt 1b                @ If we haven't delayed enough, keep looping

        mov  pc, lr           @ return to the calling code.
        
                        
                
        .global reset_handler
reset_handler:
_start:
        @ Disable interrupts, by setting the Interrupt Mask Reg. to all 0's
        ldr     r1,=ICMR
        mov     r0,#0
        str     r0,[r1]
        nop
        nop
        nop
        nop
        nop
        nop
                
        @ disable MMU
        mov     r0, #0x0
        mcr     p15, 0, r0, c1, c0, 0

        @ flush TLB
        mov     r0, #0x0
        mcr     p15, 0, r0, c8, c7, 0   @  Flush TLB

        @ flush I&D caches and BTB
        mov     r0, #0x0
        mcr     p15, 0, r0, c7, c7, 0   @  Flush caches

        CPWAIT r0

        @ Disable IRQs and FIQs in the program status register and
        @ enter supervisor mode
        ldr     r0,=(CPSR_IRQ_DISABLE|CPSR_FIQ_DISABLE|CPSR_SUPERVISOR_MODE)
        msr     cpsr, r0

        @
        @ The number of NOPs seems to have an effect on the program
        @
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop

        @
        @ Delay a bit before we setup the output pins.
        @
        bl delay
        nop
        
        @
        @ Set up GPIO66 as an output pin
        @ See PXA250 Dev Man 4.1.3.2
        @
        ldr r3, =GPDR2
        mov r2, #0x04
        str r2, [r3]

        @
        @ Clear the RDH bit.  See Dev. Man. 3.5.7
        @
        ldr     r3, =PSSR
        mov     r2, #0b110000
        str     r2, [r3]

top_of_loop:
        @
        @ Set GP66 high.  See PXA250 Dev Man 4.1.3.3
        @
        ldr r3, =GPSR2
        mov r2, #0x04
        str r2, [r3]

        bl delay

        @
        @ Set GP66 low.  See PXA250 Dev Man 4.1.3.3
        @
        ldr r3, =GPCR2
        mov r2, #0x04
        str r2, [r3]

        bl delay
        
        b top_of_loop

stop:   b stop        

    
        @ Adjust the alignment of the literal pool
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
.ltorg
        



--
=======================================================================
 Life is short.                  | Craig Spannring
Re: Puzzled trying to bring up an XScale PXA255 board

Where did you set up the stack?  I guess it doesn't matter, as you
didn't have nested routines.  However, this part of the delay routine
seems strange:

2:      ldr r2, [r3]
        cmp r2, #0
        beq 2b

If the timer increments after you have reset it but before you check,
then you could be waiting for an overflow.  I don't think you need this
code, but I don't really know anything about the PXA255 timer
registers.  It just seems strange based on general knowledge of timers.
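An alternative that sidesteps both the write latency and the overflow question is to leave OSCR free-running: take a start reading and loop until the unsigned 32-bit difference reaches the desired count, which stays correct across a wrap from 0xFFFFFFFF to 0.  (Read literally, the posted `beq 2b` repeats only while OSCR reads exactly zero, so any nonzero read, including a stale pre-write value, ends that wait.)  A host-testable C sketch; `counter_read_fn` is a hypothetical hook standing in for a volatile read of OSCR at 0x40A00010, and the fake counter exists only for testing:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hook: on the board this would read the memory-mapped
   OSCR register; a function pointer lets the logic run on a host. */
typedef uint32_t (*counter_read_fn)(void);

/* Spin until at least `ticks` counts have elapsed since entry.
   The counter is never written, so there is no write latency to wait
   out, and (now - start) in uint32_t stays correct across a wrap. */
static void delay_ticks(counter_read_fn read_counter, uint32_t ticks)
{
    assert(read_counter != 0);
    const uint32_t start = read_counter();
    while ((uint32_t)(read_counter() - start) < ticks) {
        /* busy-wait */
    }
}

/* Host-side fake counter for testing: starts just below the 32-bit
   wrap point and advances by one on every read. */
static uint32_t fake_oscr = 0xFFFFFF00u;
static uint32_t fake_oscr_reads = 0;

static uint32_t fake_oscr_read(void)
{
    fake_oscr_reads++;
    return fake_oscr++;   /* unsigned increment wraps to 0 */
}
```

Calling `delay_ticks(fake_oscr_read, 0x300)` returns after the counter has wrapped, once 0x300 ticks have elapsed; the signed `bgt` comparison in the posted loop is not needed with this approach.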

fwiw,
Bill Pringlemeir.

--
Little girls, like butterflies need no excuses.  - Robert Heinlein

Re: Puzzled trying to bring up an XScale PXA255 board



The OSCR timer on the PXA255 is a little different than what I'm used
to as well.  From section 4.4.2.4 in the PXA255 Developer's Manual-

   After the OSCR is written, there is a delay before the register is
   actually updated.  Software must make sure the register has changed
   to the new value before relying on the contents of the register.
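If the counter really must be reset, one way to meet that requirement is to poll until a read falls within a small window at or above the value just written, rather than until the read is merely nonzero.  A C sketch with hypothetical read/write hooks; `window` is an assumed tuning parameter, and the host-side fake, which makes the write visible only after three reads, stands in for the hardware:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register-access hooks; on the board these would be
   volatile accesses to OSCR. */
typedef struct {
    uint32_t (*read)(void);
    void (*write)(uint32_t);
} counter_io;

/* Write `value`, then poll until a read shows the write has taken
   effect: the counter reads within `window` ticks at or above `value`.
   A stale reading outside the window keeps us polling. */
static uint32_t counter_set_and_confirm(const counter_io *io,
                                        uint32_t value, uint32_t window)
{
    assert(io != 0 && window > 0);
    io->write(value);
    uint32_t now;
    do {
        now = io->read();
    } while ((uint32_t)(now - value) >= window);
    return now;
}

/* Host-side fake: a write lands only on the third read after it,
   mimicking the latency the manual describes. */
static uint32_t lagging_value = 0x12345678u;
static uint32_t lagging_pending;
static int lagging_reads_left = -1;   /* -1: no write in flight */

static void lagging_write(uint32_t v)
{
    lagging_pending = v;
    lagging_reads_left = 3;
}

static uint32_t lagging_read(void)
{
    if (lagging_reads_left > 0 && --lagging_reads_left == 0) {
        lagging_value = lagging_pending;   /* the write lands now */
    }
    return lagging_value++;                /* counter keeps ticking */
}
```

A stale value that happens to lie inside the window would still be accepted, so the window should be small relative to the values the counter is likely to hold before the reset.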

I tried replacing the

   2:      ldr r2, [r3]
           cmp r2, #0
           beq 2b

with the same number of NOPs and it didn't seem to make any difference
either way.  The broken one still put out a little periodic spike, the
working one still put out a nice regular square wave.

I have to admit that I'm a bit nervous about the OSCR code.  I haven't
found anything in the documentation that guarantees how long the OSCR
will keep the new value.  As you pointed out, if it updates the
register to the new value and then increments it before I can read
it, the code will stay in that loop for ~19 minutes.  Since I've
never seen either program lock up for 19 minutes, I don't think I've
encountered that sort of problem.  Still, in production code I should
probably take more care to avoid it.
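Assuming the 3.6864 MHz OSCR clock, the ~19 minute figure is the full 32-bit wrap period:

```latex
T_{\text{wrap}} = \frac{2^{32}}{3.6864 \times 10^{6}\ \text{s}^{-1}}
               = \frac{4\,294\,967\,296}{3\,686\,400}\ \text{s}
               \approx 1165.1\ \text{s} \approx 19.4\ \text{min}
```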

Even with the question about properly writing to the OSCR register, I
still can't figure out why the two programs should behave in such
totally different fashions.

Re: Puzzled trying to bring up an XScale PXA255 board
Are you sure that it is related to the board and code?
Maybe it's the trigger setting on the oscilloscope?  That
waveform looks impossible to create with this code.





Re: Puzzled trying to bring up an XScale PXA255 board
Hi,
you didn't say anything about the OS or about disabling interrupts.
Under NT/XP, for example, DOS applications that access hardware
registers directly are said to work only because that access is
routed through a driver.

Cheers

Re: Puzzled trying to bring up an XScale PXA255 board
Craig Spannring wrote:
What happens if you look at the OE on the flash along with the I/O
output?  Do you see bursts of instruction fetches that correspond
to the waveform you are getting?  Is there a chance that some of the
code is being locked in the instruction cache?  Is the flash being
read in page mode?  Those kinds of things can throw off expected
timing.

Re: Puzzled trying to bring up an XScale PXA255 board
Dingo wrote:


The OE has a burst of activity around the time when the pin
state is changed and is quiet the rest of the time.  

Between the times when the pin is changing, the code is executing
the delay loop which is tight enough to fit in one of the two 32
byte instruction fetch buffers which would explain why it doesn't
access the Flash in between the times when the output changes.
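If fetches are naturally aligned 32-byte blocks, as described above, whether a short loop stays inside one block depends on its address as well as its length, which may be one way extra NOPs or a moved subroutine change fetch behavior.  A hypothetical C check; the block size is an assumption, and the three-instruction polling loop (`1:` through `bgt 1b`) is 12 bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FETCH_BLOCK 32u   /* assumed fetch-block size, in bytes */

/* True if the byte range [addr, addr + len) lies inside a single
   naturally aligned FETCH_BLOCK-byte block. */
static bool fits_in_one_fetch_block(uint32_t addr, uint32_t len)
{
    assert(len > 0 && len <= FETCH_BLOCK);
    uint32_t first = addr / FETCH_BLOCK;           /* block of first byte */
    uint32_t last = (addr + len - 1) / FETCH_BLOCK; /* block of last byte */
    return first == last;
}
```

For example, a 12-byte loop at 0x100 or 0x114 stays in one block, while the same loop at 0x118 straddles the boundary at 0x120.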
