Gracefully Resetting TI-RTOS Application

Hi!

I have an XDS560v2 STM LAN Debugger that I am using to debug my AM3358 TI-RTOS application with.  Every time I click the little "bug" button to start a debug session, it pauses for several seconds, as though it is re-programming the Debugger before starting the session.  I see the message "Configuring Debugger" and the "Activity-1" light on the Debugger blinks while this is happening.

That's all fine and well, but I often find myself in the middle of an application (classic example is tracing down the source of a CPU exception), and need to start the whole application over and over again to see how it got to where it is.  I would like to be able to QUICKLY get back to the beginning of the application again, yet I have not found a way to do so.  This is even more important if, indeed, the Debugger is being reprogrammed(?) every time I start a new debug session.  I see at the top of Code Composer Studio (CCS) that there are several options for resetting: (CPU SW, CPU HW, System Reset, and Emulator) and still another button to "Restart".  However, I have not had any luck in using these with TI-RTOS, because when I walk the application forward, I hit completely different behavior and problems as though something in memory is in fact in a different state than when I restart a whole new debugging session.  So the ONLY thing that has worked reliably (creating the same behavior) is starting a whole new debug session from scratch each time I need to re-start the application.

What I'm working with:  Win7-64-bit,

Dev Env:  CCS 6.1.2

Platform:  Custom board with MYIR brand MCC-AM335X-Y board with AM3358, 250MB RAM and other electronics that seem to be working perfectly.

Packages:  SYS/BIOS 6.45.1.29, UIA 2.0.5.50, AM335x PDK 1.0.3

Q1:

Has anyone solved this so they can restart the AM3358, and the TI-RTOS application -- without the long pause of re-starting a whole new debugging session?

Q2:

On starting a new debug session, is my Debugger indeed being reprogrammed?

Kind regards,
Vic

  • I'm not sure about #2, but have you removed the CCS option of running to main? With this turned off, you start at the beginning of the application and can then single step into the program. The option is under CCS->Tools->Debugger Options->Auto Run and Launch Options.

    Todd
  • Hi Vic,

    Victor Wheeler61 said:
    I would like to be able to QUICKLY get back to the beginning of the application again, yet I have not found a way to do so.  This is even more important if, indeed, the Debugger is being reprogrammed(?) every time I start a new debug session.  I see at the top of Code Composer Studio (CCS) that there are several options for resetting: (CPU SW, CPU HW, System Reset, and Emulator) and still another button to "Restart".  However, I have not had any luck in using these with TI-RTOS, because when I walk the application forward, I hit completely different behavior and problems as though something in memory is in fact in a different state than when I restart a whole new debugging session.  So the ONLY thing that has worked reliably (creating the same behavior) is starting a whole new debug session from scratch each time I need to re-start the application.


    The "restart" button *should* be what you need. That action will simply reset the program counter to the application entry point and optionally run to main (or some other specified label) if that option is enabled. Why it is not working for you is odd. The big difference between restarting the application and starting a whole new debug session is that a new session reloads/executes the GEL startup file (if one is specified). It would run the actions in the StartUp(), OnTargetConnect() and OnFileLoaded() functions in the GEL file (I may be off a bit on some of the exact function names). The latter two have more impact since they often do some target initialization. The most common one is OnTargetConnect(), which is called when the target is connected to the debugger. I'm not sure which actions in your OnTargetConnect() call might need to run again after a restart. But you can try it out by perhaps disconnecting and reconnecting to the target to call OnTargetConnect() and then restarting the application.

    Victor Wheeler61 said:
    On starting a new debug session, is my Debugger indeed being reprogrammed?

    There are a few ways to launch a debug session:

    1) Via the "Debug" button, which is the method you are doing

    This will autoconnect to the target and load/flash the program. But for programs in Flash, you can have it just load debug symbols only.

    2) a "project-less" debug session

    See the below link for a video that covers both:

    https://www.youtube.com/watch?v=ycknoB55ytI

    Thanks

    ki

  • Hi, Ki & Todd!

    Thank you for your attention and guidance on this. If I step into my application and then click "Restart", it does indeed run and halt at the symbol main. In my own attempts, I had observed that letting the program continue from there causes an exception, whereas if I exit the whole debugging session and start a new one, my application runs fine. With ONLY that amount of observation, I came to the (possibly premature) conclusion that there is something different in the environment between clicking "Restart" and starting a whole new debugging session, and didn’t inquire further until my posting yesterday.

    With Ki's encouraging words that the "Restart" button *should* be what I need, I tried it again and looked a little closer. I let my application run for a bit, paused it, observed that some variables in the BSS section were populated, and then after pressing "Restart", indeed they had been re-initialized to 0.  Yeayy!  Then stepping forward, something in my board initialization steps is causing an exception!

    Investigating this further (with the help of clicking the "Restart" button; thank you, Ki!) and bringing some of the Starterware library code into my project where it is generating the exception, I have uncovered more of what my problem has been:

    In this application, I pulled in some evmAM335x initialization code (I don’t remember which), adapted it to use the source-code output of the PINMUX utility, which is working well.  And just to see if it would work (I have a small I2C EEPROM in my hardware with a different I2C address, which I adjusted), I left this little blurb of code in my board initialization code:

    	/* I2C */
    	if (ret == BOARD_SOK) {
    		if (cfg & BOARD_INIT_I2C) {
    			ret = Board_internalInitI2C();
    			if (ret == BOARD_SOK) {
    				ret = Board_getIDInfo(&bb);
    			}
    		}
    	}
    

    and while the EEPROM is empty (it's on a custom board) and it just reads 0xFFs, I left this code in there to be used/adapted later because I have several I2C devices that I'm going to need to talk with later.  'bb' is a 'Board_IDInfo' object on the stack.  And I haven't (yet) spent the time to really understand the I2C driver code. But I thought I could ASSUME that a "Restart" sets up everything as if the application was running anew.  Apparently this is a bad assumption.

    As it turns out:

    ret = Board_internalInitI2C();


    calls XDC Tools Memory_alloc() to allocate space for a MUTEX for the I2C peripheral.   (Up to this point, I have not prepared my application to have a HEAP yet.)  That Memory_alloc()  succeeds on the first pass after freshly loading a new debug session.  But after a "Restart", it fails!  But the above call to Board_internalInitI2C() returns BOARD_SOK (probably a bug if the MUTEX didn't get created), and so the code continues to try to do an I2C transfer which fails on an assertion when the handle to the MUTEX is 0x00000000 instead of a valid pointer.  Okay, no surprise there.

    =-=-=-=

    So this is why I was posting in the TI-RTOS forum, as I was pretty sure it was boiling down (and it is) to this question:  "What in TI-RTOS is not properly reset when the "Reset" button is clicked?"

    In theory, I could remove that code from my application and "work around" this, but...

    I THINK THIS GIVES US A UNIQUE opportunity to find out what isn’t actually being “Reset” and fix it (I could be using my “Reset” button very fruitfully, and if I don't actually HANDLE what is causing this, I’m pretty sure it will come back and bite me later, and given that my firmware needs to be ultra-reliable, it is probably a good time to stop and understand what is actually happening here.)

    =-=-=-=

    Towards that end:  Here are screenshots of the TI-RTOS settings related to HEAPS that I could find (I haven't studied HEAPS in the RTOS v6.45 User's Guide yet).

    [Screenshots 1 of 3 through 3 of 3: images not preserved]

    Does this provide enough information to get to the bottom of this?  In case it helps, I have previously posted a project called "MCC_DEBUG_2_for_sasha" that has that same board init code in it, and will probably run on the  evmAM335x Eval Board up to the point required to demonstrate this problem (though some of the PINMUX pin settings will be wrong):  STEP TWICE (past the UART_printf()) and then "Reset" and then step forward over the board initialization code again and it should generate this error every time.  (For convenience, I have re-tested that project, proven the "Reset" behavior is the same, and have re-posted it here.)

    0486.01_MCC_DEBUG_2_for_sasha.zip

    Is this indeed a bug that something is not being properly reset?  (It seems that whatever HEAP serviced that  Memory_alloc()  successfully the first time wasn't there any more after the "Reset" button was pressed?)

    Is this something I can fix/patch at this end?

    Kind regards,
    Vic

  • Hi! Updated info: for the record, when I commented out the code that initializes I2C, something else is now breaking. Yet the application runs perfectly when I do a fresh re-start of a new debug session. *sigh* I'm hoping TI-RTOS and the "Restart" button start living in harmony soon. If it is settings that I need to change, it would be very helpful to know which ones.

    Kind regards,
    Vic
  • What happens when you just reload the application (not the whole debug session)? Also, can you open up ROV and do a BIOS->Scan for Errors when you are in main after the restart?

    Todd

  • Hi, Todd!

    Excellent question!

    With the I2C init code not running, after a "Reset", now the break is in the first UART_printf() (which works fine on a fresh debug session):

    This is the contents of the "Scan for errors..." tab after scanning:

    ,ti.sysbios.family.arm.exc.Exception,Module,N/A,exception,An exception has occurred!
    ,ti.sysbios.knl.Task,Detailed,ghUiTaskHandle,stackPeak,Overrun!
    ,ti.sysbios.knl.Task,Detailed,ghUiLedBlinkTask,stackPeak,Overrun!
    ,ti.sysbios.knl.Task,Detailed,ti.sysbios.knl.Task.IdleTask,stackPeak,Overrun!

    Under normal operation (fresh debug session + program re-load), the above tasks have plenty of stack space:

    UiTask: size = 8192, peak = 1584
    UiLedBlink: size = 2048, peak = 480
    Idle: size = 2048, peak = 500.

    It's almost like the CONNECT GEL file (or something that gets similarly involved in the TI-RTOS environment) is doing some initializations that then AREN'T getting done on pressing the "Reset" button. (But this is 100% speculation, could be completely wrong target.)

    VERY INTERESTING!

    When I just re-load the program (w/o starting a new debug session), it runs fine again! Woo hoo! Well, that's one small bit of time savings. So some initialization gets done at (or just after) program load time that isn't getting done after pressing "Reset". (And this is still speculation, and bears proving, but I think this is on the right trail!  I.e. maybe it really is the GEL file instead of being in the TI-RTOS Startup.c code!)

    Over to you...
    Kind regards,
    Vic
  • I would definitely look in the gel file(s) to see what is being done on the load. Easiest way to look at them is to open Tools->Debugger Options-><any one of them> in the debug perspective when you are connected to the device. In the new window, there is a GEL Files option.

    Todd
  • It's the default Beaglebone-Black GEL file, which also loads another GEL file on startup:

    StartUp()
    {
        GEL_LoadGel("$(GEL_file_dir)/AM335x_PRU_ICSS.gel");
    }

    Looking through it, I only find 3 "functions" that SEEM (to these inexperienced eyes) to be related:

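    AM335xStartState()
    {
        CPSR &= ~0x20;                    /* clear the T bit: ARM (not Thumb) state */
        CPSR = (CPSR & ~0x1F) | 0x13;     /* mode bits = 0x13: Supervisor mode */
        CP15_CONTROL_REGISTER &= ~0x1;    /* clear the M bit: MMU disabled */
    }

    OnPreFileLoaded()
    {
        AM335xStartState();
    }

    OnRestart()
    {
        AM335xStartState();
    }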

    Given that the two "On...()" functions are identical, it certainly is not leading anywhere.

    Any ideas?

    Kind regards,
    Vic
  • Vic,

    I’ve not used an AM335x, but will jump in anyways with an idea…

    I’m thinking that what may be happening is that the reset you are invoking does not actually reset the A8 processor.  So a restart will start the app running again, but interrupts could be enabled and fire during the boot process.  This might explain why stacks are overflowing, because interrupts are taken during boot, when they normally should be held off until the kernel has fully initialized.  

    Working with various devices in the past, there are some emulation implementations where a “reset” in CCS does a full device reset, and in others it is only a partial reset. And sometimes a “system reset” or “board reset” is needed to fully reset the device.  I don’t know about AM335x, but I’m wondering if this might be part of what is going on here.

    Looking at the A8 reference guide, the I and F bits in the CPSR are supposed to be set when the device is indeed reset.  In the GEL code you posted I don’t see any setting of these bits.  So if the device is not actually reset, it can be restarted with interrupts enabled.

    I wonder if you can try adding “CPSR |= 0xC0;” to your AM335xStartState() function to see if that makes a difference?  If no difference, can you print out to the console the CPSR value at the end of OnRestart()?

    This could well be a red herring, but wanted to toss out the idea…

    Regards,
    Scott

  • Hi, Scott!

    You may be onto something hot here in terms of an investigatory lead....  This may be a multi-faceted problem here!  I agree:  an unhandled interrupt could explain the stacks being overwritten and everything else for that matter, corrupting data that would have otherwise been correctly initialized (either to a fixed value or to '0' if in the BSS section), which could cause just about ANY kind of corrupt behavior, including a "memory allocation" failure.  I anxiously added "CPSR |= 0xC0;" to AM335xStartState() and tried it, but alas, the exception after the "Restart" and stepping forward is still there.

    Here is the output of CPSR at the end of that function -- I think we're onto an interesting trail here:

    ---- ( without "CPSR |= 0xC0;" )----------------------

    CortxA8: GEL Output: CPSR = [0x40000193]  <- Started new debug session, before program load.
    Program loads here.
    CortxA8: GEL Output: CPSR = [0x40000193]  <- After program load, at the beginning of a new debug session.
    Stepped forward into the program as before.
    CortxA8: GEL Output: CPSR = [0x60000193] <- after clicking the "Restart" button.

    ---- ( with "CPSR |= 0xC0;" )----------------------

    CortxA8: GEL Output: CPSR = [0x400001D3]  <- Started new debug session, before program load.
    Program loads here.
    CortxA8: GEL Output: CPSR = [0x400001D3]  <- After program load, at the beginning of a new debug session.
    Stepped forward into the program as before.
    CortxA8: GEL Output: CPSR = [0x600001D3] <- after clicking the "Restart" button.

    I also verified:  you are quite correct that the "Restart" doesn't actually reset anything -- it leaves all the peripherals running (if they were running before), and leaves the interrupts firing, the interrupt controller with all the previously unmasked interrupts still unmasked.  Apparently the "Restart" merely sets the PC back to the entry point!  Wow... that could cause some problems!

    (Note about the 0x6... vs 0x4... in the CPSR register:  The left-most CPSR bits are:   N, Z, C and V, thus the C (carry) bit is set after "Restart", which at a glance (guess on my part) should be irrelevant.  True?)
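    For reference, the CPSR values printed above can be decoded field by field. The sketch below is a small host-side C helper (not GEL, and the struct and function names are made up for illustration) that uses the standard ARMv7-A CPSR bit positions. Decoded this way, 0x40000193 already has the I bit (IRQ mask) set and only the F bit (FIQ mask) clear, OR-ing in 0xC0 gives 0x400001D3 as in the second log, and the only difference after "Restart" is the C condition flag, which has no effect on interrupt masking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an ARMv7-A CPSR value (field names are illustrative). */
typedef struct {
    bool n, z, c, v;      /* condition flags, bits 31..28 */
    bool abort_masked;    /* A bit, bit 8 */
    bool irq_masked;      /* I bit, bit 7 */
    bool fiq_masked;      /* F bit, bit 6 */
    bool thumb;           /* T bit, bit 5 */
    uint32_t mode;        /* M[4:0], e.g. 0x13 = Supervisor */
} CpsrFields;

CpsrFields cpsr_decode(uint32_t cpsr)
{
    CpsrFields f;
    f.n = ((cpsr >> 31) & 1u) != 0u;
    f.z = ((cpsr >> 30) & 1u) != 0u;
    f.c = ((cpsr >> 29) & 1u) != 0u;
    f.v = ((cpsr >> 28) & 1u) != 0u;
    f.abort_masked = ((cpsr >> 8) & 1u) != 0u;
    f.irq_masked   = ((cpsr >> 7) & 1u) != 0u;
    f.fiq_masked   = ((cpsr >> 6) & 1u) != 0u;
    f.thumb        = ((cpsr >> 5) & 1u) != 0u;
    f.mode         = cpsr & 0x1Fu;
    return f;
}

/* Same effect as the GEL statement "CPSR |= 0xC0;": mask IRQ and FIQ. */
uint32_t cpsr_mask_irq_fiq(uint32_t cpsr)
{
    return cpsr | 0xC0u;
}

/* Self-check against the values in the GEL output above. */
void cpsr_check_logged_values(void)
{
    CpsrFields fresh   = cpsr_decode(0x40000193u);
    CpsrFields restart = cpsr_decode(0x60000193u);

    assert(fresh.mode == 0x13u && restart.mode == 0x13u);
    assert(fresh.irq_masked && !fresh.fiq_masked);
    assert(!fresh.c && restart.c);   /* only the carry flag differs */
    assert(cpsr_mask_irq_fiq(0x40000193u) == 0x400001D3u);
}
```

    Calling cpsr_check_logged_values() on a PC is enough to confirm the reading of the log; nothing here touches the target.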

    (See partial screenshot of Interrupt Controller after clicking "Restart")

    This is showing the following interrupts unmasked:

    #0  (emulation interrupt, understandable since I'm running a debugger/emulator)

    #16  (touchscreen_ADC module in use from a previous session, however, TSC_ADC module is in whatever state it was in before "Restart")

    #36  (LCD controller (from a previous session); the LCD controller module is in whatever state it was in before "Restart")

    #98 (GPIO1 input interrupt that had been on in a previous session.)

    I haven't created a user ISR for GPIO1 yet, but I don't think this is firing any interrupts yet as I don't have the one INPUT pin connected to anything yet.

    The AM335x TRM shows that after a reset, all the MIR (mask) registers should be 0xFFFFFFFF (all interrupts masked)!  So I am concluding that the "Restart" button didn't actually perform a reset as you suspected.  I further looked at peripherals that had previously been started, and they are also running according to their register values (e.g. the Touchscreen/ADC module had built up 50-60 words in a 128-word FIFO0, the LCD controller was running, if I had previously started it since reset, etc.).
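    As a cross-check when reading the INTC registers in the debugger, the expected MIR values for a given set of unmasked interrupts can be worked out by hand or with a few lines of host-side C. This is a minimal sketch: it assumes 128 interrupt lines spread over four 32-bit MIR registers, and the register offsets (MIRn at 0x84 + 0x20 * n from the INTC base) are my reading of the register map and should be checked against the TRM. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INTC_NUM_IRQS   128u
#define INTC_NUM_BANKS  (INTC_NUM_IRQS / 32u)

/* Assumed layout: MIRn sits at 0x84 + 0x20 * n from the INTC base address. */
uint32_t intc_mir_offset(unsigned bank)
{
    assert(bank < INTC_NUM_BANKS);
    return 0x84u + 0x20u * bank;
}

/*
 * Compute the MIR values expected when only the listed interrupts are
 * unmasked. A set bit means "masked", so the reset state is all ones
 * and each unmasked interrupt clears one bit.
 */
void intc_expected_mir(const unsigned *unmasked, size_t count,
                       uint32_t mir[INTC_NUM_BANKS])
{
    size_t i;

    for (i = 0; i < INTC_NUM_BANKS; i++) {
        mir[i] = 0xFFFFFFFFu;
    }
    for (i = 0; i < count; i++) {
        assert(unmasked[i] < INTC_NUM_IRQS);
        mir[unmasked[i] / 32u] &= ~(UINT32_C(1) << (unmasked[i] % 32u));
    }
}
```

    For the four interrupts listed above (#0, #16, #36, #98) this gives MIR0 = 0xFFFEFFFE, MIR1 = 0xFFFFFFEF, MIR2 = 0xFFFFFFFF and MIR3 = 0xFFFFFFFB; anything else in the register view would mean additional interrupts are unmasked.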

    To explore this further, I added this to the GEL file :

    //*******************************************************************
    //HM: Interrupt Controller Registers
    //*******************************************************************
    #define    INTCPS_BASE_ADDR                                    (0x48200000)
    #define    INTC_SYSCONFIG_ADDR                                 (INTCPS_BASE_ADDR + 0x10)
    #define    INTC_SYSCONFIG_SOFT_RESET_MASK                      (0x00000002)

    and then this to the  AM335xStartState()   function:

      WR_MEM_32(INTC_SYSCONFIG_ADDR, INTC_SYSCONFIG_SOFT_RESET_MASK);

    to perform a soft reset on the Interrupt Controller at Restart.

    And then repeated the test:  fresh load and fresh debug session and program load, then step forward 2 times, then "Restart", then step forward 2 times, and again I got the exception.  Upon looking at the INTC registers, all the interrupts were masked.  So UNHANDLED INTERRUPT does NOT appear to be the cause.  (At least of the IMMEDIATE exception I'm getting.)

    Continuing to investigate:

    One other thing is also providing a direct contradiction to thinking that having peripherals still running and interrupts firing is the source of the problem:  when I do a fresh program load, and the breakpoint stops at   main()  -- guess what:  all the peripherals I started are still running (according to their registers), and interrupt controller still with the prior set of unmasked interrupts still unmasked, yet if I let it run from this point, there are no problems, and the system runs fine.  And while this triggers the  AM335xStartState()  in the GEL file to run, which now writes a 1 to the IRQ & FIQ bits -- it did not previously do this, yet letting the system run from that point encountered no exceptions and no problems and the system ran fine.   :-(   So I think we're CLOSE, but this does not appear to be the cause (or at least not all of it).

    Continuing to gather data:

    If I do a fresh program load and then this test:

    Stepping forward into the program twice makes it step over

    1.  board init code which sets up UART0 as debug output, I2C, etc. and those drivers do a few "Memory_alloc()" calls WHICH WORK immediately after a fresh program load.

    2.  a UART_printf(), which works fine immediately after a fresh program load.

    However, if I just go THAT far and then click "Restart" and then step forward twice again, I get an exception with the following in my Console window:

    [CortxA8] ti.sysbios.heaps.HeapMem: line 361: out of memory: handle=0x8190b3f8, size=32
    ti.sysbios.heaps.HeapMem: line 361: out of memory: handle=0x8190b3f8, size=32
    Exception occurred in ThreadType_Main.
    Main handle: 0x0.
    Main stack base: 0x8194cee8.
    Main stack size: 0x8000.
    R0 = 0x00000000  R8  = 0x0000000d
    R1 = 0x81954e16  R9  = 0x4030cdf4
    R2 = 0x00000001  R10 = 0x00029940
    R3 = 0x0000000a  R11 = 0x81954e3c
    R4 = 0x814bb921  R12 = 0x81954e40
    R5 = 0x8194cddc  SP(R13) = 0x814a9a5c
    R6 = 0x814bb94d  LR(R14) = 0x8191957c
    R7 = 0x0000002d  PC(R15) = 0x814a9a5c
    PSR = 0x81954e3c
    DFSR = 0x00000008  IFSR = 0x00000000
    DFAR = 0x00000000  IFAR = 0x00000000
    ti.sysbios.family.arm.exc.Exception: line 205: E_dataAbort: pc = 0x814a9a5c, lr = 0x8191957c.
    xdc.runtime.Error.raise: terminating execution

    =-=-=-=

    Tracing this down:

    I pulled directly into my project the library source code for where the exception was happening:  UART_drv.c   and UART_v1.c.

    I did the above test again, and set a breakpoint at the  UART_write function and guess what I found:

    int32_t UART_write(UART_Handle handle, const void *buffer, size_t size)
    {
        return (handle->fxnTablePtr->writeFxn(handle, buffer, size));
    }
    

    and after a fresh program load, when it gets here, the 'handle' argument is a valid pointer, but when it gets here after a "Restart", the value of the 'handle' argument is 0x00000000!

    And why is it 0?  Because the Memory_alloc() done by the call to

    Board_uartStdioInit();

    in my board init code FAILED, and so there was no handle for whatever object was supposed to have been created in that call.
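    The failure chain described here (allocation fails, the open call still hands back a NULL handle, and the write path dereferences it) can be shown with a small self-contained sketch. Everything below is a hypothetical stand-in written for illustration: the Demo* types and demo_* functions are not the PDK's UART driver, and the allocator is passed in as a function pointer only so that a failing heap can be simulated on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_OK     (0)
#define DEMO_EFAIL  (-1)

/* Hypothetical stand-ins for a driver handle and its function table. */
typedef struct DemoUart DemoUart;
typedef struct {
    int32_t (*writeFxn)(DemoUart *handle, const void *buffer, size_t size);
} DemoUartFxnTable;
struct DemoUart {
    const DemoUartFxnTable *fxnTablePtr;
    size_t bytesWritten;
};

typedef void *(*DemoAllocFxn)(size_t size);

static int32_t demo_write_impl(DemoUart *handle, const void *buffer, size_t size)
{
    assert(handle != NULL);          /* guaranteed by demo_uart_write() */
    (void)buffer;
    handle->bytesWritten += size;
    return (int32_t)size;
}

static const DemoUartFxnTable demo_fxn_table = { demo_write_impl };

/* Simulates a heap that has nothing left to give. */
void *demo_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Open reports the allocation failure instead of returning success. */
int32_t demo_uart_open(DemoAllocFxn alloc, DemoUart **out)
{
    DemoUart *uart = (DemoUart *)alloc(sizeof(DemoUart));

    *out = uart;
    if (uart == NULL) {
        return DEMO_EFAIL;
    }
    uart->fxnTablePtr = &demo_fxn_table;
    uart->bytesWritten = 0u;
    return DEMO_OK;
}

/* Write refuses a NULL handle rather than dereferencing it. */
int32_t demo_uart_write(DemoUart *handle, const void *buffer, size_t size)
{
    if (handle == NULL || handle->fxnTablePtr == NULL) {
        return DEMO_EFAIL;
    }
    return handle->fxnTablePtr->writeFxn(handle, buffer, size);
}
```

    With checks like these the symptom would be an error code at init time instead of a data abort several calls later, which makes the real cause (the heap running out) much easier to spot. It does not explain why the heap is exhausted after a "Restart".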

    We're back to the same cause of the I2C code init failing:  Memory_alloc() failing.  And alas, the first lines of the Console output:

    [CortxA8] ti.sysbios.heaps.HeapMem: line 361: out of memory: handle=0x8190b3f8, size=32
    ti.sysbios.heaps.HeapMem: line 361: out of memory: handle=0x8190b3f8, size=32

    I was REALLY hoping an unhandled interrupt would have caused this, but I THINK the above eliminates that as the cause:  setting the IRQ/FIQ mask bits in the CPSR and soft-resetting the INTC (which re-masked all the interrupts) made no difference, and a fresh program reload already worked an hour or two ago, when it did not yet mask interrupts or turn off any of the peripherals....

    We're back to SOMETHING in the fresh program re-load causes an XDCtools   Memory_alloc() to WORK correctly the first time, but to FAIL after a "Restart".  I think this holds the key that will unlock the door here.

    So we're back to the question of:  what is different after a "Restart"?

    =-=-=-=

    A  CPU Reset (SW) seems to run into the same or similar problem:

    System_printf:  01_MCC_DEBUG_2 app started...
    Exception occurred in ThreadType_Swi.
    Swi handle: 0x8190b154.
    Swi stack base: 0x8194d6e8.
    Swi stack size: 0x8000.
    R0 = 0x00000000  R8  = 0xffffffff
    R1 = 0x8190959c  R9  = 0xffffffff
    R2 = 0x01d5f880  R10 = 0xffffffff
    R3 = 0x00086470  R11 = 0xffffffff
    R4 = 0x8190b910  R12 = 0x8194d6e8
    R5 = 0x51f4d5c0  SP(R13) = 0x00000000
    R6 = 0x00000003  LR(R14) = 0x81919d7c
    R7 = 0x81909654  PC(R15) = 0x00000000
    PSR = 0xffffffff
    DFSR = 0x00000008  IFSR = 0x00000008
    DFAR = 0x00000000  IFAR = 0x00000000
    ti.sysbios.family.arm.exc.Exception: line 201: E_prefetchAbort: pc = 0x00000000, lr = 0x81919d7c.
    xdc.runtime.Error.raise: terminating execution

    A fresh program load after this causes the system to run just fine.  Then I pause....

    =-=-=-=

    A CPU Reset (HW)

    Causes this message:

    "No source available for "0x2086c"", and STEPPING or RUNNING doesn't seem to do anything.  I still get that message.

    The only thing in the MAP file that is close to that address is:

     .debug_info    0x0001aace    0x1ead5 E:\Dev\Clients\HM\Dash_2\fw\App\01_MCC_DEBUG_2\Debug\configPkg\package\cfg\app_pa8fg.oa8fg

     .debug_line    0x00020478      0x4d2 C:\ti\uia_2_00_05_50\uia_2_00_05_50\packages\ti\uia\runtime\lib\release\ti.uia.runtime.aa8fg(QueueDescriptor.oa8fg)
     .debug_line    0x0002094a      0x30f C:\ti\bios_6_45_01_29\packages\gnu\targets\arm\rtsv7A\lib\gnu.targets.arm.rtsv7A.aa8fg(Assert.oa8fg)

     .debug_loc     0x00020355     0x2bd5 C:/ti/bios_6_45_01_29/packages/gnu/targets/arm/libs/install-native/arm-none-eabi/lib/fpu\libc.a(lib_a-vfprintf.o)

    A fresh program load after this causes the system to run just fine.  Then I pause....

    =-=-=-=

    A System Reset

    seems to send it off into "never never land":  "No source available for "0x233be"", and showing the disassembly and attempting to step through it causes this message in the Console window:

    CortxA8: Can't Single Step Target Program: (Error -1205 @ 0x233BE) Device memory bus has an error and may be hung. Verify that the memory address is in valid memory. If error persists, confirm configuration, power-cycle board, and/or try more reliable JTAG settings (e.g. lower TCLK). (Emulation package 6.0.407.3)

    =-=-=-=

    So I'm back at hoping I can somehow discover what causes a Memory_alloc() to succeed after a fresh program load, and fail after a "Restart"....

    Kind regards,
    Vic

  • ...next step: checking out the Memory_alloc() variables to see what values they hold in both passes....
  • I never use restart by itself. I always do a "reset" (depending on the device a system or CPU) and then "restart".

    (fyi...I accidentally hit "Verify Answer" on your post instead of "Reply"...I then "Rejected Answer"...so you might have gotten some emails from that oops moment).

    Todd
  • Good morning, Todd!  No worries.

    When I woke up this morning, my mind kept going to the question:  "What is different after a "Restart"?", and the big elephant in the room is that the SoC wasn't actually RESET!  So even with no interrupts, there are likely dozens of other things going on....

    Strange as it may seem, I never tried the combination of a reset followed by "Restart", so I did some tests.  Each of these tests involves re-loading the program, letting it run (proving that the system is running without errors), and then clicking the PAUSE button, and then:

    =-=-=-=

    TEST  1A.  [CPU Reset (HW)] followed by a "Restart" and it ran to the breakpoint at  main(),  but on letting it run, it ran into yet another exception with this little message in red in my Console window:

    "CortxA8: SemiHosting : Read Failure : Memory at Address: fffffe18 Length: c"

    TEST  1B.  Same thing, but instead of letting it run, I clicked STEP OVER and the arrow next to  main()  didn't change on the first click, and on the 2nd, the program is at _exit() function (where it goes after an exception) with NO MESSAGES in the console.

    >>> Note carefully:  the different behavior in letting it run vs stepping indicates a timing difference with something that might be "running" in the background.

    >>> Something else interesting:  when I just do a "program re-load" as opposed to starting a fresh debug session, the first image that appears on the LCD panel is shifted about 20-25 pixels to the right, and then it shifts left after about 150 ms.  Probably a phenomenon caused by the LCD controller (and its DMA and various synchronization matters with the LCD panel).  A fresh debug session does not do this.

    =-=-=-=

    TEST  2A.  [CPU Reset (SW)] followed by a "Restart" and it ran to the breakpoint at  main(),  but letting it run winds it up at the _exit() function (where it goes after an exception) with NO MESSAGES in the console.

    TEST  2B.  Same thing, but instead of letting it run, I clicked STEP OVER and the arrow next to  main()  moved to the first line in main (about to call the board init code), and then STEP OVER again puts me at _exit() again with no messages in the console.

    A peek at the registers after the [CPU Reset (SW)] and [CPU Reset (HW)] showed that in BOTH CASES the LCD controller and TSC_ADC modules are still active!  (And not surprisingly, the last image is still showing on the LCD panel.)  So in fact, I STILL wasn't really resetting the SoC, and so that remains an elephant in the room....

    =-=-=-=

    TEST  3.  [System Reset]  appears to definitely reset the SoC (the debug UART is showing "CCCCC" indicating it is alive and looking for boot input from UART0).  I have a little quirk, however, in that my AM3358 is sitting on a MYIR-brand MCC-AM335x-Y board, which is then a component atop my development board.  That MCC-AM335x-Y board has an external watchdog chip (CAT823TTDI-GT3) that -- in the AM3358's reset state -- resets the AM3358 every 2.5 seconds.  So my GEL file has to (and does early in the OnTargetConnect() and OnReset()) turn off the pull-up resistor in the EMU1 pin in order to disable that chip's watchdog function.  However, the "OnReset()" function does not appear to be run after a [System Reset] (though it definitely runs after the resets in tests 1 and 2 above).  :-(  The result is that the debugger shows the Cortex-A8 going around and around being reset by the external watchdog chip.  RARELY (like 1 out of 20 tries), I can manually do a [Connect Target] and get the Cortex-A8 back under control again.  However, I'm not getting the hoped-for results:  letting it run winds it up at the top of this function in  <pjt>/Debug/configPkg/package/cfg/app_pa8fg.c:

    xdc_Void xdc_runtime_Startup_exec__I(void)
    {
        xdc_Int state[15];
        xdc_runtime_Startup_startModsFxn__C(state, 15);
    }

    And if I STEP OVER it stops on that 2nd line, and STEP OVER again puts the PC back at the top of the function again with this new message in the console window:

    ti.sysbios.heaps.HeapMem: line 361: out of memory: handle=0x8190bffc, size=28
    xdc.runtime.Error.raise: terminating execution

    And letting it run at this point produces the same result:  it winds up halted at the top of that function.

    Yet another elephant in the room is this:

    During this series of tests I have tried both fresh debug sessions and simply doing a PROGRAM RE-LOAD, and in each case of the latter alone, this also made the system run without errors, and THAT wasn't resetting the SoC (peripherals still running and prior to our GEL mods last night, interrupts were still firing) and yet no errors or exceptions occurred....

    I'm going to continue to pull the string on the  Memory_alloc()  and see if it yields us any "guiding fruit"....

    Kind regards,
    Vic

  • I think we're getting warmer, and I think this problem is multi-faceted.  In the above tests I had [Disable interrupts [x] When source stepping] (part of different behavior while stepping), and on a hope I turned this on:  [Connect options [x] Reset the target on a connect].

    Unfortunately, part of my development platform has an external watchdog chip that I have to turn off after a reset.  I turn it off in GEL script in the OnTargetConnect()  function by turning off the pull-up resistor which is active in the EMU1 pin's reset state.  However, doing this in the  OnReset()  function isn't working since this doesn't appear to be being run after a [System Reset].  Furthermore, the external watchdog chip, left "unquenched and un-petted" resets the AM3358 every 2.5 seconds.  And getting a CONNECT to work by mouse in this situation is difficult.  So I tried doing it with the keyboard:  Ctrl-Shift-S to [System Reset] followed by repeating Ctrl-Alt-C until the connect works, and it turns off the EMU1 pull-up resistor and quiets that watchdog chip, and hopefully resets the AM3358 in the process.

    However, interestingly, letting it run winds me up at the top of this function in the generated  app_pa8fg.c file:

    xdc_Void xdc_runtime_Startup_exec__I(void)
    {
        xdc_Int state[15];
        xdc_runtime_Startup_startModsFxn__C(state, 15);
    }
    

    which is part of what gets executed at the beginning of BIOS_start().  And whether I STEP OVER or let it run, it winds the PC back up at the top of this function with this new message in the Console window:

    ti.sysbios.heaps.HeapMem: line 361: out of memory: handle=0x8190bffc, size=28
    xdc.runtime.Error.raise: terminating execution

    Stepping into this to try to ferret out where the exception is coming from, even though the  Task.c  code seems to be optimized, it appears to be failing on this function  Memory_calloc(), where an attempt to STEP INTO doesn't work and puts the PC back at the top of the above function again!

    Next step:  simple test:  fresh debug session followed by  STEP OVER, STEP OVER, followed by "Restart" followed by STEP OVER, STEP INTO to see why (if possible) the Memory_alloc() function is failing in UART_open_v1().....

  • Well, I've gotten to the bottom of it, or at least extremely close to that. I FOR SURE found out why my environment and the TI-RTOS and the Restart button are NOT getting along. I'm testing some solutions and will post the complete results as soon as they are proven. :-)
  • Okay, Todd, here is what I know at this point.

    1.  In my environment I have an AM335x (microPROCESSOR as opposed to a microCONTROLLER), which typically loads all of its program AND data into RAM before executing from RAM.  This causes a linker situation whereby these symbols defined in the linker script have the same value:

         __data_load__ == __data_start__.

    2.  With a TI-RTOS application, after a FRESH DEBUG SESSION is started, or after a FRESH PROGRAM RELOAD or after a RESTART, the program counter (PC) is set to (or branches to) _c_int00  (an assembly routine in  <bios>/packages/gnu/targets/arm/rtsv7A/boot.asm), which at its end branches to the  gnu_targets_arm_rtsv7A_startupC  function in  <bios>/packages/gnu/targets/arm/rtsv7A/startup.c, which carries out 2 loops to initialize RAM (BSS section and DATA section) before calling Startup_exec().

    The loop that initializes the DATA section of the program initializes with the same data as is expressed in initializers in source code (example:   int  my_global = 255;).  This includes all program sections named ".data" -- which includes all global and static variables that have initializers -- and any sections that start with ".data" -- which includes all the TI-RTOS  ...__state__V  struct variables, which also have initializers.  This is KEY to the "Restart" failures as you will see shortly.

    The source for that loop is this (as of SYSBIOS 6.45.01.29):

    	/* relocate the .data section */
    	dl = & __data_load__;
    	ds = & __data_start__;
    	de = & __data_end__;
    	if (dl != ds) {
    		while (ds < de) {
    			*ds = *dl;
    			dl++;
    			ds++;
    		}
    	}
    

    Pay careful attention to what happens when  __data_load__ == __data_start__!  The loop doesn't run, understandably, because it would just be copying a block of RAM into itself, which would accomplish nothing except occupying the CPU for no change in the system.  So it isn't done.

    So when there is a NEW DEBUG SESSION or a FRESH PROGRAM LOAD, with  __data_load__ == __data_start__,  the linker itself has initialized this area of RAM and the FRESH PROGRAM LOAD writes this data in its pristine, initialized state, where each variable, and each struct, contains the value(s) of its initializer.

    Next, the program is executed.  Next the program is PAUSED under the debugger.  Next the "Restart" button is pushed.  The PC goes back to  _c_int00  routine, and comes forward and THAT LOOP doesn't execute, and the once initialized variables are now IN THE STATE THEY WERE IN WHEN THE DEBUGGER WAS PAUSED, and the program will enter  main()  with these variables in that state.
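
    To make the effect concrete, here is a minimal host-side sketch of that guarded copy loop in plain C.  It is not TI-RTOS code:  the arrays  load_image  and  ram_data  and the function  value_seen_after_restart()  are hypothetical stand-ins for the linker-placed regions, chosen only for illustration.

    ```c
    #include <assert.h>

    /* Same shape as the startup.c loop quoted above: copy the load image over
     * the run-time .data region, but only when the two addresses differ. */
    static void relocate_data(const int *dl, int *ds, const int *de)
    {
        assert(ds <= de);
        if (dl != ds) {
            while (ds < de) {
                *ds++ = *dl++;
            }
        }
    }

    /* Hypothetical stand-ins for linker-placed memory (not SYSBIOS symbols). */
    static const int load_image[2] = { 255, 0 };  /* separate load copy of .data */
    static int ram_data[2];                       /* run-time .data region */

    /* Simulates load, run, then "Restart".  Returns the first .data word as
     * main() would see it after the restart. */
    int value_seen_after_restart(int separate_load_image)
    {
        /* Fresh program load: the loader writes the pristine image into RAM. */
        ram_data[0] = load_image[0];
        ram_data[1] = load_image[1];

        /* The program runs and modifies its initialized globals. */
        ram_data[0] = 7;
        ram_data[1] = 1;

        /* "Restart": the PC goes back to the startup code, with no reload. */
        relocate_data(separate_load_image ? load_image : ram_data,
                      ram_data, ram_data + 2);
        return ram_data[0];
    }
    ```

    With a separate load image the restart sees the initializer value again; when the load and run addresses coincide, the loop is skipped and the modified value survives the restart.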

    Guess what:

    There are at least 22 TI-RTOS state struct variables (ending in ...__state__V) that are in this situation.  One of them is  'xdc_runtime_Startup_Module__state__V'.  That variable has a field  'execFlag'  that contained the value 0 after a FRESH PROGRAM (RE)LOAD and, after TI-RTOS has initialized once, now contains the value 1.

    So the next step in the start-up process is calling  xdc_runtime_Startup_exec__E().  This (through a function pointer) ends up in the  Startup_exec()  function in  <xdctools>\packages\xdc\runtime\Startup.c,  the code for which is:

    /*
     *  ======== Startup_exec ========
     */
    Void Startup_exec()
    {
        Int i;
    
        if (module->execFlag) {  // <-- 'module' is a pointer to the xdc_runtime_Startup_Module__state__V struct,
            return;              //     so the function exits here, and all of the below does not execute!
        }
    
        module->execFlag = TRUE;
    
        for (i = 0; i < Startup_firstFxns.length; i++) {
            Startup_firstFxns.elem[i]();
        }
    
        (Startup_execImpl)();
    
        for (i = 0; i < Startup_lastFxns.length; i++) {
            Startup_lastFxns.elem[i]();
        }
    }
    

    Therefore:

    1.  All the "firstFxns" do not run (HeapMem_init() is one of these -- this is what causes Memory_alloc() and Memory_calloc() to fail, which further down the line caused NULL pointers to be accessed as objects, or wisely-placed assertions to fail, generating exception messages and terminating the program).  This list can contain custom functions, as it does in my case, required to ensure DMTIMER3 (my source for my application's Clock module) has the correct input clock source.

    2.  The Startup_startMods()  function doesn't run (most TI-RTOS system modules require this).

    3.  All of the "lastFxns" do not run (which list can contain custom functions).
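
    The consequence of that guard can be reproduced in a few lines of plain C.  This is only a sketch:  the struct, the counter and the function names below are hypothetical and merely mimic the  execFlag  pattern; they are not the XDCtools implementation.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical stand-in for a ...__state__V struct that lives in .data. */
    struct startup_state {
        bool execFlag;
    };
    static struct startup_state startup_state_v = { false };

    static int first_fxn_calls;   /* how many times the "first function" ran */

    static void fake_first_fxn(void)
    {
        first_fxn_calls++;        /* stands in for e.g. heap initialization */
    }

    /* Same guard pattern as Startup_exec() above. */
    static void startup_exec(void)
    {
        if (startup_state_v.execFlag) {
            return;               /* flag still set: all initialization is skipped */
        }
        startup_state_v.execFlag = true;
        fake_first_fxn();
    }

    /* Boots once, then starts again.  reload_data selects whether .data is
     * rewritten in between (program reload) or left as is ("Restart").
     * Returns how many times the first function ran in total. */
    int init_runs_over_two_starts(bool reload_data)
    {
        startup_state_v.execFlag = false;   /* fresh program load */
        first_fxn_calls = 0;

        startup_exec();                     /* first start */
        assert(first_fxn_calls == 1);

        if (reload_data) {
            startup_state_v.execFlag = false;   /* reload restores the initializer */
        }
        startup_exec();                     /* second start */
        return first_fxn_calls;
    }
    ```

    After a reload the initialization runs on both starts; after a "Restart" with stale .data it runs only on the first one.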

    =-=-=-=

    So there you have it.  Either by design or by oversight, TI-RTOS does not work with the CCS "Restart" button where __data_load__ == __data_start__.  A fix is not difficult to implement, though, nor would it be difficult to implement elegantly within the TI-RTOS design.

    =-=-=-=

    I'm including (attached here) some source code that has been tested and makes the CCS "Restart" button work reliably under TI-RTOS.

    2330.RestartAssistantSource.zip

    Note that in my  main()  routine, before calling  BIOS_start(), I call this:

    	/* Report on Restart Assistance data as soon as UART0 is ready for reporting. */
    	RestartAssistant_ReportImportantDataBufferSizeDifferences();
    

    and it reports:

    INFO:  Size of DATA section storage array is larger than needed at [65536],
            whereas size of DATA section is [7116].
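
    The report above refers to a storage array that holds a copy of the DATA section.  The attached source is not reproduced in this thread, so the following is only a rough sketch of that general idea (save a pristine copy of .data on a fresh load, put it back on a restart) and not the attached implementation.  The function name  restart_assist(), the "fresh" flag stored in the first byte of the region, and the placement of the storage array are all assumptions made for illustration.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    #define STORAGE_SIZE 65536u   /* same size as the storage array in the report */

    /* Assumption: in a real build this array must sit in memory that the
     * startup code neither clears nor re-initializes (e.g. a no-init section). */
    static unsigned char storage[STORAGE_SIZE];

    /* Hypothetical helper, meant to be called early in startup.
     * data_start/data_size describe the .data region.  The first byte of the
     * region is assumed to be a flag whose initializer is 1, so it reads 1
     * only right after a fresh program load. */
    int restart_assist(unsigned char *data_start, size_t data_size)
    {
        assert(data_size >= 1);
        if (data_size > STORAGE_SIZE) {
            return -1;                                /* storage array too small */
        }
        if (data_start[0] == 1) {
            memcpy(storage, data_start, data_size);   /* fresh load: save pristine image */
        } else {
            memcpy(data_start, storage, data_size);   /* restart: restore pristine image */
        }
        data_start[0] = 0;                            /* mark the live copy as "has run" */
        return 0;
    }
    ```

    Because the flag is part of the saved image, every restore puts it back to 1 before it is cleared again, so the same call works for any number of restarts.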

    Note that in the attached code, SOLUTION NUMBER 2 is not implemented and has not been tested, but is a suggestion as to how the TI-RTOS team could remedy this problem for good by implementing this design concept:

    Instead of:

          type_specifier  module_xyz__state__V = {struct initializer values};
    
    

    It could be:

          type_specifier  module_xyz__state__V;
    
          type_specifier  module_xyz__state__V__initial_values = {struct initializer values};
    
          // then at start-up time, regardless of whether the program was freshly loaded or not:
    
          module_xyz__state__V = module_xyz__state__V__initial_values;   // for each state struct
    
    
    

    And I cannot attest (yet) whether those structs ending in ...__state__V   are the ONLY  ones that would need to be treated this way, but this could easily be determined by the TI-RTOS team.
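
    Since there are many such structs, the design concept above could be driven by a table.  The following is an untested-on-target sketch in plain C under stated assumptions:  the two state structs, their fields and the names  reset_table  and  reset_all_state()  are hypothetical and do not correspond to actual TI-RTOS types.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical module state types (not real TI-RTOS structs). */
    struct startup_state { int execFlag; };
    struct heap_state    { int initialized; int freeBytes; };

    /* Initial values are const, so the program never writes to them. */
    static const struct startup_state startup_initial = { 0 };
    static const struct heap_state    heap_initial    = { 0, 4096 };

    /* Live copies carry no initializer; they are filled in at start-up. */
    static struct startup_state startup_live;
    static struct heap_state    heap_live;

    struct reset_entry {
        void       *live;      /* state struct used at run time */
        const void *initial;   /* its pristine initial values   */
        size_t      size;
    };

    static const struct reset_entry reset_table[] = {
        { &startup_live, &startup_initial, sizeof startup_live },
        { &heap_live,    &heap_initial,    sizeof heap_live    },
    };

    /* Called at start-up regardless of whether the program was freshly loaded. */
    void reset_all_state(void)
    {
        size_t i;
        for (i = 0; i < sizeof reset_table / sizeof reset_table[0]; i++) {
            assert(reset_table[i].size > 0);
            memcpy(reset_table[i].live, reset_table[i].initial, reset_table[i].size);
        }
    }
    ```

    The point of the split is that the copy no longer depends on  __data_load__  and  __data_start__  being different, because the initial values and the live state are separate objects.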

    I hope this helps a lot of people.

    I know it's going to allow me to save a TON of time (because my program is LARGE and requires quite a lag to re-load it) -- since I can now use my CCS "Restart" button and have it work as it was intended!

    Kind regards,
    Vic

  • Hi, Todd!

    I'm excited that I'm now able to use the "Reset" button with my TI-RTOS application and it works like a champ now! Did you see the solution?

    Kind regards,
    Vic
  • Victor Wheeler61 said:
    Hi, Todd!

    I'm excited that I'm now able to use the "Reset" button with my TI-RTOS application and it works like a champ now! Did you see the solution?

    Hi Vic,

    I saw your new "solution" thread for this issue, along with your assessment that this thread was dead, so I wanted to advise you that Todd has not responded because he is out of the office this week, and I don't know if he is able to access his emails or the Forum.  I will leave it to him to comment because he is already up to speed on the thread and would be the best one to evaluate your solution.

    Thank you very much for your very thorough explanation of the issue and your solution. Your desire to help the community in general, and specifically other folks who might be having the same problem, is very noble and generous.  TI as a whole very much appreciates this sort of contribution to the Forum, as community contributions and solutions are one of the primary goals of the TI E2E Forums.

    Thanks & Regards,

    - Rob

  • Outstanding! :-)
    Thanks, Rob!