OFFTOPIC?: arm-linux-gnueabi-gdb error with cortex-m3 code

- J
- jackbenimble
  

Thu, Aug 22, 2013 10:01 AM

So I have encountered a very odd gdb error that I can't make sense of. I am using version 4.4.5 of the gcc tools (arm-linux-gnueabi) and version 7.0.1 of the gdb (arm-linux-gnueabi) debugger. The target is an stm32f103 cortex-m3 board.

Basically gdb seems to be clobbering the values passed to functions. Here's an example:

Breakpoint 1, main () at apps/core/core_test.c:46
46 wdTemp = wdTemp; /*dummy ins for breakpoint*/
(gdb) n
47 tclib_printf("\r%d", wdTemp);
(gdb) p wdTemp
$1 = 0
(gdb) s
tclib_printf (ptrString=0x0, wdValue=536874884) at tclib/IE_tclib.c:140
140 while ((*ptrString) != NULL)
(gdb) p strSystick
$2 = {dwMsTick = 0, dwSeconds = 1, dwMsTotal = 1000, ptrFunc = 0}
(gdb)

ptrString should be an address in the range 0x20000000 to 0x20005000 (see the disassembly below), and wdTemp, passed as wdValue, should be 0.

A disassembly of the lines just before the call to my tclib_printf() routine shows that r0 and r1 are initialized as needed; they carry the only two arguments to the function.

20000e0a: 687b ldr r3, [r7, #4]
20000e0c: 607b str r3, [r7, #4]
20000e0e: 687b ldr r3, [r7, #4]
20000e10: f640 6050 movw r0, #3664 ; 0xe50
20000e14: f2c2 0000 movt r0, #8192 ; 0x2000
20000e18: 4619 mov r1, r3
20000e1a: f7ff f9a3 bl 20000164

A disassembly of the tclib_printf() routine shows that it starts up as expected and does nothing special to the values passed. What gives? I am completely stumped. The stack is at the top of memory and there is no issue there, since these parameters are passed in r0 and r1.

20000164 <tclib_printf>:
20000164: b580 push {r7, lr}
20000166: b086 sub sp, #24
20000168: af00 add r7, sp, #0
2000016a: 6078 str r0, [r7, #4]
2000016c: 6039 str r1, [r7, #0]

I am stumped!!! Is there something in gdb's setup or view of this object file I am omitting?

- T
- Tauno Voipio
  

Thu, Aug 22, 2013 10:23 AM

Please check that your stack is initially aligned on a two-fullword boundary (8 bytes). The EABI specification assumes an 8-byte aligned stack.

Another question is whether the library code is compiled with optimization. Certain optimization options make the code very difficult for the debugger. You can check the register contents at the breakpoint (info reg).
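A minimal sketch of the alignment check described above, in C. The helper names are made up for illustration, and the address used in the usage note (0x20005000, the top of the RAM range mentioned in the original post) is only an example value.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr sits on an 8-byte (two-fullword) boundary. */
static bool is_8_byte_aligned(uint32_t addr)
{
    return (addr & 0x7u) == 0u;
}

/* Round an address down to the nearest 8-byte boundary. */
static uint32_t align_down_8(uint32_t addr)
{
    return addr & ~UINT32_C(0x7);
}
```

For example, is_8_byte_aligned(0x20005000u) holds, while 0x20004FFC is only word aligned and align_down_8() would move it down to 0x20004FF8.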

--

Tauno Voipio

- J
- jackbenimble
  

Thu, Aug 22, 2013 5:06 PM

Thanks for replying ... my response below.

There is 20k worth of RAM on this chip and I have set my linker script to

MEMORY
{
    STM32_RAM : ORIGIN = 0x20000000, LENGTH = (20480 - 1024)
}

and my ivt table to

.global stm32_ivt
.equ STM32_SRAM_BASE,0x20000000
.thumb
.extern main
.data
stm32_ivt:
.word STM32_SRAM_BASE + (20 * 1024)
.word (main + 1)
.skip (14 * 4)
.skip (60 * 4)
.text
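The arithmetic implied by the MEMORY line and the vector table above can be worked out as follows. This is only a sketch: the constants are copied from the post, while the macro and function names are invented here for illustration.

```c
#include <stdint.h>

#define STM32_SRAM_BASE UINT32_C(0x20000000)      /* from the .equ above */
#define SRAM_SIZE       (UINT32_C(20) * 1024u)    /* 20k of RAM on the chip */
#define LINKER_LENGTH   (UINT32_C(20480) - 1024u) /* LENGTH in the MEMORY line */

/* First address past the region the linker is allowed to fill. */
static uint32_t linker_region_end(void)
{
    return STM32_SRAM_BASE + LINKER_LENGTH;
}

/* Initial stack pointer stored in the first vector table word. */
static uint32_t initial_sp(void)
{
    return STM32_SRAM_BASE + SRAM_SIZE;
}

/* Bytes between the end of the linker region and the initial SP. */
static uint32_t stack_headroom(void)
{
    return initial_sp() - linker_region_end();
}

/* Size of stm32_ivt: two .word entries plus the two .skip areas. */
static uint32_t vector_table_bytes(void)
{
    return (2u + 14u + 60u) * 4u;
}
```

With these numbers the linker region ends at 0x20004C00, the initial stack pointer is 0x20005000, the stack has 1024 bytes before it grows into the linker region, and the vector table occupies 304 bytes.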

In addition, I have bit 9 (STKALIGN) of the NVIC CCR set:

(gdb) monitor mdw 0xe000ed14

0xe000ed14: 00000210
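As a cross-check of the value read back above, here is a small sketch that decodes the two bits set in 0x00000210. The bit positions follow the Cortex-M3 Configuration and Control Register layout (bit 9 STKALIGN, bit 4 DIV_0_TRP); the macro and function names are local to this example and are not the CMSIS names.

```c
#include <stdbool.h>
#include <stdint.h>

#define CCR_DIV_0_TRP (UINT32_C(1) << 4) /* trap on divide by zero */
#define CCR_STKALIGN  (UINT32_C(1) << 9) /* 8-byte stack alignment on exception entry */

/* True when the STKALIGN bit is set in a CCR value. */
static bool ccr_stkalign_set(uint32_t ccr)
{
    return (ccr & CCR_STKALIGN) != 0u;
}

/* True when the DIV_0_TRP bit is set in a CCR value. */
static bool ccr_div0_trap_set(uint32_t ccr)
{
    return (ccr & CCR_DIV_0_TRP) != 0u;
}
```

Both helpers return true for 0x00000210, which matches the claim that STKALIGN is set.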

There are no issues with void functions ... just with functions that take arguments.

No optimization here - at least by habit whenever I use -g

CFLAGS = -g -c -Wall -nostdlib -mcpu=cortex-m3 -mlittle-endian -mthumb -I core/include -I tclib \
         -mabi=aapcs -O0
LDFLAGS = -nostdlib -e main -Map flash.map -L linker -T IE_stm32.ld --cref

- G
- George Neuner
  

Fri, Aug 23, 2013 5:23 PM


Are you linking in the GDB stub? If so, you're likely blowing the stack and corrupting your heap ... the stub itself may use up to several KB of stack [chip and I/O dependent].

If you're not using the stub, then I'm out - I don't work with ARM and I haven't otherwise run into this particular GDB problem.

Good luck! George

- L
- Luis Filipe Rossi
  

Fri, Aug 23, 2013 7:38 PM


- R
- rombios
  

Fri, Aug 23, 2013 10:27 PM

It's weird because none of the parameters are on the stack. As you know, the ARM procedure calling convention uses r0-r3 for the first four parameters. Somehow execution under gdb corrupts r0 and r1 (basically any parameters passed to a function).

Here's a debugging session to highlight what I mean. The gdb asm view (layout asm) and the stepi command clearly show r0 and r1 being initialized correctly before the call to tclib_printf (prologue as it were):

46 tclib_printf("\r%d", wdTemp);

|0x20000e0e ldr r3, [r7, #4]
|0x20000e14 movw r0, #3668 ; 0xe54
|0x20000e18 movt r0, #8192 ; 0x2000
|0x20000e1c mov r1, r3
|0x20000e1e bl 0x20000388

Here is a disassembly of the first few lines of tclib_printf:

|0x20000388 lsls r1, r6, #26
|0x2000038a movs r0, #0
|0x2000038c lsls r1, r6, #26
|0x2000038e movs r0, #0
|0x20000390 lsls r1, r6, #26
|0x20000392 movs r0, #0

These bear NO RESEMBLANCE to the objdump -d disassembly of the .out file.

THIS HAS ME STUMPED. I don't know how those instructions got there. Here's the C code of the first few lines of tclib_printf and the objdump of the .out file before loading it into gdb:

void tclib_printf(char *ptrString, int wdValue)
{
    unsigned char sbString[9];
    int wdTemp;

    while ((*ptrString) != NULL)
    {
        wdTemp =*ptrString;
        switch((char)wdTemp)
        {
            case '%':
            {
                wdTemp = *(++ptrString);
                switch(wdTemp

arm-linux-gnueabi-objdump -d core_test.out |grep tclib_printf

20000388 <tclib_printf>:
20000388: b580 push {r7, lr}
2000038a: b086 sub sp, #24
2000038c: af00 add r7, sp, #0
2000038e: 6078 str r0, [r7, #4]
20000390: 6039 str r1, [r7, #0]
20000392: e0f6 b.n 20000582
20000394: 687b ldr r3, [r7, #4]
20000396: 781b ldrb r3, [r3, #0]
20000398: 60bb str r3, [r7, #8]
2000039a: 68bb ldr r3, [r7, #8]
2000039c: b2db uxtb r3, r3
2000039e: 2b25 cmp r3, #37 ; 0x25
200003a0: d002 beq.n 200003a8
200003a2: 2b5c cmp r3, #92 ; 0x5c
200003a4: d073 beq.n 2000048e
200003a6: e0b9 b.n 2000051c

The objdump output matches the C code, but somehow arm-linux-gnueabi-gdb has replaced the instructions in the code with manipulations of r0 and r1 that clobber their values.

I can't for the life of me figure out why this is happening ...

So I decide to dump the binary values in memory at tclib_printf:

(gdb) p tclib_printf
$1 = {void (char *, int)} 0x20000388

(gdb) monitor mdh 0x20000388 20

0x20000388: b580 b086 af00 6078 6039 e0f6 687b 781b 60bb 68bb b2db 2b25 d002 2b5c d073 e0b9
0x200003a8: 687b f103 0301 607b

So these start out fine!!! That is after the code is loaded and before gdb runs it.

I set a breakpoint at line 45 again (tclib_printf), then at the breakpoint I dump the memory again:

monitor mdh 0x20000388 10

0x20000388: 06b1 2000 06b1 2000 06b1 2000 06b1 2000 06b1 2000
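The comparison between the two dumps above can also be done mechanically. The sketch below is an illustration only: the halfword values are copied from the dumps in this post, and first_mismatch() and the array names are made-up helpers, not part of gdb or the debug probe.

```c
#include <stddef.h>
#include <stdint.h>

/* First ten halfwords of tclib_printf right after loading. */
static const uint16_t after_load[10] = {
    0xb580, 0xb086, 0xaf00, 0x6078, 0x6039,
    0xe0f6, 0x687b, 0x781b, 0x60bb, 0x68bb
};

/* The same ten halfwords as dumped at the breakpoint. */
static const uint16_t at_breakpoint[10] = {
    0x06b1, 0x2000, 0x06b1, 0x2000, 0x06b1,
    0x2000, 0x06b1, 0x2000, 0x06b1, 0x2000
};

/* Index of the first differing halfword, or n if the buffers match. */
static size_t first_mismatch(const uint16_t *expected,
                             const uint16_t *observed, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (expected[i] != observed[i]) {
            break;
        }
    }
    return i;
}
```

Calling first_mismatch(after_load, at_breakpoint, 10) returns 0, meaning the very first halfword of the function has already changed by the time the breakpoint is hit.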

AND the instructions have changed. Now any casual observer would reach the conclusion that somehow/somewhere after execution I am overwriting these values. But I assure you that's not the case. I am not doing anything to clobber memory. I am almost certain of that - everything prior to this has been initialization of the core.

To prove it, I change the layout back to source and set a breakpoint at line 140 of tclib_printf:

|138 int wdTemp;
|139
B+>|140 while ((*ptrString) != NULL)
|141 {
|142 wdTemp =*ptrString;
|143 switch((char)wdTemp)
|144 {
|145 case '%':

No issues there ... but the program will segfault on invalid parameters if I continue. So it's only the first few instructions of ANY function that are being clobbered ...

Stumped! Never saw this when I was working with the arm7tdmi - but I probably had another version of the gnu dev tools ...

Currently using gcc 4.4.5 and gdb 7.0.1 (gnueabi-).

- G
- George Neuner
  

Sat, Aug 24, 2013 3:17 AM


Sorry, I don't work with ARM. However, it's clear that R0 is being loaded with the address of the format string ... are you certain that the format string in memory is valid?

More to the point, does the code work if you just run it as a release compile or as a debug compile but without using the debugger?


That you know of.

The bit of linker script you provided didn't specify stack or BSS (uninitialized data) segments. You did mention the location of your stack, but it's generally a good idea to explicitly define the areas you want to use for BSS, code, heap and stack in your script.

The GDB stubs I'm familiar with [not for ARM but for other chips] allocate a pair of large static buffers (>= 1KB each) for I/O and also use a fair amount of stack when in operation ... up to 6KB of stack on one platform I've used.

If you don't include space for the debugger's static buffers in your BSS segment, then even just initializing the debugger stub may corrupt your code. BSS data and code normally are adjacent in memory, but where each is placed is up to the linker/loader.

Note that the compiler and/or linker will correctly size the BSS segment, but directives in the linker script override computed values. Since you didn't specify a BSS segment, the generated load file itself may be bad [not corrupt per se, but lacking necessary information]. You may need to define the BSS area and specify that it be sized using computed values [this is toolchain dependent].

And of course, if you don't allow sufficient extra space for the stack [or better, a separate stack if possible], using the debugger may blow the stack and corrupt adjacent memory.

Check the linker's output map file and make sure there is no overlap between the BSS data and code segments. Allow the program at least a few KB of stack and then see what happens.
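The overlap check suggested above boils down to comparing address ranges taken from the map file. A minimal sketch follows; ranges_overlap() is a hypothetical helper, and any start addresses and sizes fed to it have to come from the actual map file, which is not shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when the half-open ranges [a_start, a_start + a_size) and
 * [b_start, b_start + b_size) share at least one byte.
 * The end addresses are computed in 64 bits so that a range reaching
 * the top of the 32-bit address space does not wrap around.
 */
static bool ranges_overlap(uint32_t a_start, uint32_t a_size,
                           uint32_t b_start, uint32_t b_size)
{
    uint64_t a_end;
    uint64_t b_end;

    if (a_size == 0u || b_size == 0u) {
        return false; /* an empty range cannot overlap anything */
    }
    a_end = (uint64_t)a_start + a_size;
    b_end = (uint64_t)b_start + b_size;
    return a_start < b_end && b_start < a_end;
}
```

As a usage example with invented sizes, a 0x400-byte data area starting at 0x20000000 would overlap a function placed at 0x20000388, whereas a 0x388-byte area would end exactly where that function begins and would not overlap it.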


That doesn't prove anything - your disassembly showed that the code bytes corresponding to your main() function were ok. In any event, the C code listing will appear to be correct regardless of whether memory has been corrupted: GDB isn't showing you a decompilation of the code bytes in memory, it is reading from the project file(s) on your build system. With memory corruption, a breakpoint set on the C code may never be hit or may break into unrecognizable assembly code.

George

- J
- jackbenimble
  

Sat, Aug 24, 2013 7:38 AM

Sorry for wasting your time. I have found the error after all. It wasn't gdb so much as my script file and the location of my Interrupt Vector Table.

I had a chance to revisit this with a clear head tonight, and the clue should have been apparent: the repeating sequence of 0x200006b1, which is the value of my stm32_nvic_unknown_isr handler, comes from my attempt to rebuild the IVT in memory before changing the vector table.

Time to revisit the linker script ...
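To make the clue explicit: the halfword dump "06b1 2000" is a little-endian 32-bit word, and an odd value in a Cortex-M vector table entry is a Thumb code address with bit 0 set. The sketch below shows that reading; the function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine two consecutive little-endian halfwords into one 32-bit word. */
static uint32_t word_from_halfwords(uint16_t low, uint16_t high)
{
    return ((uint32_t)high << 16) | low;
}

/* Vector table entries for Thumb code have bit 0 set. */
static bool has_thumb_bit(uint32_t value)
{
    return (value & 1u) != 0u;
}

/* Address of the first instruction the entry points at. */
static uint32_t code_address(uint32_t value)
{
    return value & ~UINT32_C(1);
}
```

word_from_halfwords(0x06b1, 0x2000) gives 0x200006b1, which has the Thumb bit set and points at code starting at 0x200006b0, consistent with a handler address having been written over the start of tclib_printf.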

- T
- Tauno Voipio
  

Sat, Aug 24, 2013 7:45 AM


If you're running from RAM, find the piece of code overwriting the code with 0x200006b1, which seems to be a data pointer.

--

-Tauno