Nasty debugging problem

I have a debugging problem that is utterly baffling to me.  I need some clues on how to proceed.  I'm using CCSv4 with the LogicPD EVM c6748 board.

Something is contaminating memory, but it occurs in a repeatable fashion, always at the same point in the code -- yet the code itself has no visible fault!  The code looks fine.

Here is a snippet of the code where the problem occurs:

int i;

unsigned short a[6043], b[6043];

for (i = 0; i < 6043; i++)  a[i] = b[i];

The error occurs when i=5110.  Yet there's nothing wrong with the code.

Moreover, the error is that i (the index) SUDDENLY JUMPS TO A LARGE VALUE. The index i is a local variable that the compiler has stored on the stack, and I can visibly watch this stack location as I step through the disassembled code.  The index i is incremented nicely until it gets to 5110, whereupon it inexplicably jumps to a large value.

Here is the same code as seen from the disassembler (with my added comments).  I noted in a comment (on the STH instruction) where the index i gets contaminated on the stack.  Again, the code looks fine, yet the error occurs.  I need some ideas on how to proceed with debugging:

            ZERO.L2       B4                                // B4 = 0 = i
            STW.D2T2      B4,*+SP[1]                  // SP[1] = 0 = i
            NOP           2                                     //
            MVK.S2        0x179b,B5                     // B5 = 6043
            CMPLT.L2      B4,B5,B0                     // Is B4 < B5, result in B0
     [!B0]  B.S1          C$DW$L$_memoryroutine$28$E (PC+88 = 0xc0a59118)
            NOP           5


            MVK.S2        0x2600,B6                    // B6 = &b
            MVKH.S2       0xc07c0000,B6            // "  
            MV.L2         B4,B5                            // B5 = B4 = i
 ||         LDHU.D2T2     *+B6[B4],B4                // B4 = b[i]
            MVK.S2        0xffffe060,B9                 // B9 = &a
            MVKH.S2       0x80000000,B9            // "
            NOP           2                                     // "
            STH.D2T2      B4,*+B9[B5]                 // a[i] = B4 [index i gets clobbered here on the stack]
            NOP           2                                     //
            LDW.D2T2      *+SP[1],B4                  // B4 = i
            NOP           4                                     //
            ADD.L2        1,B4,B4                          // i++
            STW.D2T2      B4,*+SP[1]                    // SP[1] = i
            NOP           2                                       //
            MVK.S1        0x179b,A3                      // A3 = 6043
            CMPGT.L1X     A3,B4,A1                     // Is 6043 > i? Result in A1
     [ A1]  B.S1          C$L42 (PC-48 = 0xc0a590d0)             // Branch back if greater than
            NOP           5
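
One check that can be made directly against the listing above: the MVK/MVKH pair loads B9 with 0x8000e060 as the base of a, and the STH store scales its index by 2 bytes, so the address written on each iteration can be computed and compared with the stack slot being watched (SP[1], i.e. SP + 4). A minimal sketch in host C, assuming 32-bit byte addresses; the helper names are hypothetical and the slot address has to be read from the debugger, since it is not given in the post:

```c
#include <stdint.h>

/* Address written by "STH B4,*+B9[B5]": base plus index scaled by 2 bytes. */
static uint32_t halfword_elem_addr(uint32_t base, uint32_t index)
{
    return base + index * (uint32_t)sizeof(uint16_t);
}

/* Nonzero if the 4-byte word at `slot` overlaps the array
   [base, base + count * 2). `slot` would be SP + 4 from the debugger. */
static int slot_inside_array(uint32_t slot, uint32_t base, uint32_t count)
{
    uint32_t end = halfword_elem_addr(base, count);
    return slot + 4u > base && slot < end;
}
```

With the base taken from the listing, halfword_elem_addr(0x8000e060, 5110) gives 0x8001084c and the array ends at 0x80010f96. If the watched stack slot falls anywhere in that range, the loop's own store would overwrite i, which would point at the stack pointer rather than at the loop.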

  • Walter Snafu said:
    unsigned short a[6043], b[6043];

    are these local or global variables? if a and b are local they may not fit in the stack, try to make them global and see if it solves the issue.

     

  • Mariana,

    The arrays (a and b) are global and static (though I didn't include that fact in my previous post) and stored in a defined section of memory. The automatic variable i is local and on the stack.

    Everything about the code looks okay to me, even when I look at the disassembled code.  Yet an error occurs inexplicably on the stack.  I'm guessing it has something to do with the environment?  (such as CCSv4 and the LogicPD EVM, mixed with interrupts and breakpoints).  But I don't have much of a clue -- I'm just guessing, because it's sooo inexplicable to me.

     

  • Can you make a simple test project and attach to the forum post?

  • Mariana,

    Another clue is that the problem isn't always present.  The problem has come, and gone, and come again, as my code changed and grew.  The problem wasn't present when my code was smaller, but it eventually crept in when my code got larger.  I suspect the problem has something to do with my larger code size these days.

    I could try chopping out large chunks of the software to make a tidy test project; however, I doubt the problem would manifest itself, so it wouldn't make a good test file.  I would have to create a reasonably small test file that manifests the problem -- and I doubt I could do that.  On the other hand, I can't go posting my entire project here.  So I'm stuck on how to proceed.

  • Walter,

    Before going off and spending a bunch of time moving and modifying code, can you try doubling the size of your stack (or maybe just increasing it by 10-20%) and seeing if it makes a difference?  Perhaps the stack is overflowing and encroaching on the .bss location where the global arrays are stored.  Or the stack could be encroaching on other memory that gets overwritten between call and return.  So, you store i = 5110 and then upon return the value has been modified to some other value.  I know you said that this problem occurred consistently at 5110.  I'm wondering if increasing the size of the stack will either eliminate it, or cause it to occur at some other value of i.  If there is no change, then we can eliminate this as the source of the problem.
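
A complementary way to test the overflow theory is to paint the stack region with a known pattern at startup and later count how much of the pattern survives. The following is a host-C sketch only: the fill value and function names are assumptions, the stack is assumed to grow downward from the top of the region, and on the target the region bounds would have to come from the linker's stack section.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu  /* arbitrary pattern, chosen for illustration */

/* Fill the whole stack region with the pattern (before the region is in use). */
static void stack_paint(uint32_t *lowest, size_t words)
{
    size_t k;
    for (k = 0; k < words; k++)
        lowest[k] = STACK_FILL;
}

/* Count untouched words starting at the low end of a downward-growing
   stack. A result of 0 means the stack reached or passed its limit. */
static size_t stack_words_unused(const uint32_t *lowest, size_t words)
{
    size_t k = 0;
    while (k < words && lowest[k] == STACK_FILL)
        k++;
    return k;
}
```

If stack_words_unused() still reports plenty of headroom when the corruption appears, a plain overflow becomes unlikely and a stray write or a bad stack pointer is the better suspect.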

     

    Regards,

    Dan

  • Mariana,

    I tried (already) increasing the stack size, and the problem remained. 

    Also, the problem occurs within one-and-the-same subroutine.  That is, a context switch (a call or return) does not occur within the problem.  The index i (used within the FOR loop) gets created, and successfully used -- and eventually clobbered and corrupted on the stack -- all within one subroutine.

    I am greatly baffled by this.  I've tried all the obvious things to detect/correct this problem.  I need some new insight into this.

  • Usually with problems like this, the fault could be in another place and just propagates, so sometimes it is necessary to think "outside of the box".

    Without a test case it is hard to help, but here are some hints:

    1) look for broken pointers:

    - remember that incrementing (++) a pointer advances the address by the size of the type it points to, e.g. by 4 bytes (32 bits) for a pointer to a 32-bit type, not by 1

    - make sure that if you are allocating anything (malloc, MEM_alloc), the function returns a non-zero value before you use the allocated area. It could be that you need to increase the heap.

    2) Are you using DSP/BIOS? If you are and this routine is a TSK, you need to increase the task stack in the properties, not only the global stack.
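
Both pointer hints can be shown in a few lines of host C. The helper names below are hypothetical; the sketch only illustrates that ++ moves a pointer by the size of the pointed-to type, and that an allocation result should be tested before the buffer is written.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bytes the address moves when a uint32_t pointer is incremented once. */
static size_t stride_of_u32_increment(void)
{
    uint32_t buf[2] = {0, 0};
    uint32_t *p = buf;
    uint32_t *q = p;
    q++;  /* advances by sizeof(uint32_t) bytes, not by 1 */
    return (size_t)((unsigned char *)q - (unsigned char *)p);
}

/* Copy `count` halfwords into a freshly allocated buffer.
   Returns NULL (and writes nothing) if the allocation fails. */
static uint16_t *dup_halfwords(const uint16_t *src, size_t count)
{
    uint16_t *dst;
    if (count == 0 || count > (size_t)-1 / sizeof *dst)
        return NULL;
    dst = malloc(count * sizeof *dst);
    if (dst == NULL)
        return NULL;  /* heap exhausted: never write through a null pointer */
    memcpy(dst, src, count * sizeof *dst);
    return dst;
}
```

The caller still has to check the result of dup_halfwords() for NULL and free() the buffer when done.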

     

  • I solved my debugging problem.  Too difficult to explain here in full.  But my stackpointer was getting corrupted, and then the cpu would go every-which-where and corrupt memory until it lost its mind and behaved in ways that were unpredictable just looking at the code.   

    I'm thinking of setting up the exceptions to catch this sort of thing sooner.  (That is, I'll hook the exceptions into the non-maskable interrupts, so when weird things happen, it will be detected and stop the cpu from going further.)   Seems straightforward to do.

    Thanks for your help.

  • That is great news. Thanks for the feedback!