
UCD3138: Background loop is not responding, but timer interrupt is fine

Part Number: UCD3138

Hi experts,

 - I encountered an issue where PMBus stops responding after the application has been running for a few days. Since pmbus_handler() is called from the background loop, I added a DEBUG_PIN toggle in the background loop and found that DEBUG_PIN stops toggling when PMBus is not responding.

 - Meanwhile, while PMBus is not responding (and the background-loop DEBUG_PIN has stopped toggling), the functions in the timer interrupt are still working.

 - I also tried adding a DEBUG_PIN toggle inside every while loop, but none of them toggles when the issue occurs. (I guess this means the background loop is not stuck in a while loop.)

 - Is there anything that could cause the background loop to stop after running for a few days?

 - It's really weird. Any suggestion would be very helpful. Thanks in advance.

  • Hi, Jack,

    Quick question: did you test your own firmware or the TI EVM code?

    Thanks,

    Sean

  • Jack, this doesn't happen very often, but there is a fix for it in the PFC EVM code.

    I'll give the fix first, and then explain the complex background.

    There are 3 abort handlers which handle different exceptions:

    - undefined_instruction_exception

    - abort_prefetch_exception

    - abort_data_fetch_exception

    The simplest thing to do is to add a software reset to each one, like this:

    void undefined_instruction_exception(void)
    {
             SysRegs.SYSECR.bit.RESET = 2;  // software reset instead of returning to the faulting instruction
    }

    This should solve your issue. 

    If you want to go further, making your noise-caused reset more reliable and even debugging your code, you can look at how the PFC EVM handles this issue. I describe that after the explanation below.

    What is happening is that the background loop is getting lost. Every time we have seen it, it has been caused by layout/noise issues that make the UCD get lost.

    Theoretically it could also be caused by the program having a bug, but I don't think we've ever seen that.

    If there is a single fault caused by noise or a bug, the ARM7 will generally detect it as an illegal memory access or an illegal instruction. When this happens, it jumps to the fault handler, which tries to push registers onto the stack. Most of our code does not initialize the stack pointers for these exception modes, so they generally point at an illegal address. With two faults like that, the UCD seems to reset automatically.

    Specific UCD parts tend to come up with the same stack pointer value every time, and for a few parts that value is a legal address. These are the ones that will show this issue.

    Also, for this issue to appear, the noise and the individual part's sensitivity to it have to be just right, so that the noise causes only one fault. Most often there are more faults, so again, an instant reset.

    This background explains why you don't see this very often.

    Next, an explanation of why this happens:

    The ARM architecture was originally designed for use in personal computers. The idea behind the illegal memory access and illegal instruction aborts was that the operating system could choose to fix the issue and then return to the same location where the abort occurred. So the exception handlers return to the address at which the abort occurred. If nothing was fixed, they just keep returning to the same instruction and getting another abort, so the background loop gets stuck.

    However, interrupts can still be taken each time the abort handler returns to the background loop, so the interrupt routines keep running.

    The way to fix it is to make sure that the abort resets the processor.

    You should also put in a valid stack pointer address for all the aborts.

    This makes sure that an intermittent fault also resets the processor.  It's possible, even likely, that other things may be corrupted when an intermittent fault occurs.  Even if that specific fault goes away, it's a good idea to reset the processor to avoid damaging the power supply.  

    For the stack pointer setting, you need to find load.asm. In older code it is in the main source directory. In newer code, which supports multiple processors, there is a separate load.asm for each UCD part number; for example, it will be in Device/UCD3138/Source.

    You need to add ABT_STACK_TOP to the .equ statements at the top of the file:

    SUP_STACK_TOP .equ 0x19ffc ;Supervisor mode (SWI stack) starts at top of memory
    FIQ_STACK_TOP .equ 0x19e00 ;allocate 508 bytes (0x1FC) to supervisor stack, then do FIQ stack
    IRQ_STACK_TOP .equ 0x19d00 ;allocate 256 bytes to fiq stack, then start irq stack
    ABT_STACK_TOP .equ 0x19b50 ;Allocate 432 bytes to irq stack
    UND_STACK_TOP .equ 0x19b50 ;same top as abt stack: abort and undefined modes share one region
    USER_STACK_TOP .equ 0x19b00 ;allocate 80 bytes to shared abt/und stack, regular stack gets rest, down to variables

    Then you need to add code to initialize the two abort-related stacks (abort and undefined). It's OK to have them at the same address, since only one is used at a time:

    ;*------------------------------------------------------
    ;* SET TO IRQ MODE, init IRQ stack
    ;*------------------------------------------------------
    MRS r0, cpsr
    BIC r0, r0, #0x1F ; CLEAR MODES
    ORR r0, r0, #0x12 ; SET IRQ MODE
    MSR cpsr_cf, r0

    LDR R13, c_irq_stack_top ; initialize stack pointer

    ;*------------------------------------------------------
    ;* SET TO ABORT MODE, init ABORT stack
    ;*------------------------------------------------------
    MRS r0, cpsr
    BIC r0, r0, #0x1F ; CLEAR MODES
    ORR r0, r0, #0x17 ; SET ABORT MODE
    MSR cpsr_cf, r0

    LDR R13, c_abt_stack_top ; initialize stack pointer
    ;*------------------------------------------------------
    ;* SET TO UNDEFINED MODE, init UNDEFINED stack
    ;*------------------------------------------------------
    MRS r0, cpsr
    BIC r0, r0, #0x1F ; CLEAR MODES
    ORR r0, r0, #0x1B ; SET UNDEFINED MODE
    MSR cpsr_cf, r0

    LDR R13, c_und_stack_top ; initialize stack pointer

    ;*------------------------------------------------------
    ;* SET TO USER MODE, init user stack

    You'll also need to add the .long directives that place the stack-top addresses in flash, where the LDR instructions can read them:

    c_sup_stack_top .long SUP_STACK_TOP
    c_abt_stack_top .long ABT_STACK_TOP
    c_und_stack_top .long UND_STACK_TOP
    c_user_stack_top .long USER_STACK_TOP

    If you set up the stack pointers this way, you should get a more reliable reset on all chips, not just the few whose stack pointers happen to come up at a legal address.

  • Thanks for the explanation. Your answer is really helpful. I'll add the stack pointer and put a DEBUG_PIN in the exception handler to check. Thank you very much.

    Hi Ian,

     - I want to add a DEBUG_PIN toggle in each exception handler to see which exception I am hitting. But first, I need to make sure the exception handler code I added is correct.

     1) I added the code for the exception stacks and handlers. On the first try, I ran into a ROM mode issue (the UCD did not reset). Then I found that the UCD is reset by deliberately triggering an exception:

            func_ptr=(FUNC_PTR)0x10000;
            func_ptr();

     2) So I commented out these 2 lines and replaced them with a software reset:

            // func_ptr=(FUNC_PTR)0x10000;
            // func_ptr();
            SysRegs.SYSECR.bit.RESET = 0;

     3) However, the UCD still did not reset after the ROM mode command. Then I changed zoiw_size to 0x100, and it worked. ROM mode is OK now.

            // for(counter=0; counter < zoiw_size; counter++)  //Copy program from PFLASH to RAM
            for(counter=0; counter < 0x100; counter++)  //Copy program from PFLASH to RAM
            {
              *(program_index++)=*(source_index++);
            }

     4) Then I removed the software reset and added back the exception trigger. ROM mode is fine.

            // SysRegs.SYSECR.bit.RESET = 0;
            func_ptr=(FUNC_PTR)0x10000;
            func_ptr();

     5) Now I have added a DEBUG_PIN toggle to all 3 exception handlers to check that they are working. However, DEBUG_PIN does not toggle when I apply the ROM mode command:

            DEBUG_PIN = !DEBUG_PIN;
            SysRegs.SYSECR.bit.RESET = 0; // Software reset

     - Do you have any idea why DEBUG_PIN does not toggle? Does the UCD really reset through the triggered exception after the ROM mode command?
     - Or, can you suggest a way to verify the exception handlers, so that I can confirm I implemented the exception code correctly?

    Thank you very much.
  • I would expect the pin to toggle based on what you say.  

    I think two or three things could prevent you from seeing it:

    1. The reset right after the debug pin toggle will probably drive the pin to ground or to high impedance very quickly, maybe 400 ns after the toggle, so you might miss it. Or you may be pulling it to ground anyway, in which case you won't see it at all. I'd suggest toggling it several times before the reset.

    2. It's possible that a reset is also happening in the zero-out integrity word code, so you might want to look there.

    3. There may be something else resetting it before it gets to the toggle.

    You could always try changing the 0x100 in the counter loop back to 500; that should cause a reset.

    You can also read from the SYSESR register in ROM mode after the reset.  If there was an exception, one of the exception bits should be set.  

  • Hi Ian,

     - Thanks for the suggestion.

     - I changed zoiw_size to 500 and did trigger an illegal memory access (screenshot below). However, which exception would be triggered in this case? I still didn't see DEBUG_PIN toggle.

    Or

     - How can I trigger each of the 3 exceptions, so that I can make sure DEBUG_PIN will work the next time I encounter an exception?

    P.S. My exception handlers are coded as below (the data abort handler is shown). Are they OK?

    #pragma INTERRUPT(abort_data_fetch_exception, DABT)
    void abort_data_fetch_exception(void)
    {
        int i, j;

        for (j = 0; j <= 5; j++)
        {
          DEBUG_PIN = !DEBUG_PIN;
          for(i = 0; i < 10000; i++){
            asm(" nop ");
          }
        }

        SysRegs.SYSECR.bit.RESET = 2;  // software reset
    }
    ** Update 2021/11/17 **
     - Sorry, I think it was the "nop" for loop that caused the ROM mode failure and the missing DEBUG_PIN toggles.
     - After removing the for loop and leaving only the DEBUG_PIN toggles, it works fine now. I can see DEBUG_PIN toggle very clearly, and everything works.
     - Screenshot below.
    Exception handler:
    #pragma INTERRUPT(abort_data_fetch_exception, DABT)
    void abort_data_fetch_exception(void)
    {
        DEBUG_PIN = !DEBUG_PIN;
        DEBUG_PIN = !DEBUG_PIN;
        DEBUG_PIN = !DEBUG_PIN;
        DEBUG_PIN = !DEBUG_PIN;
        DEBUG_PIN = !DEBUG_PIN;
        DEBUG_PIN = !DEBUG_PIN;

        SysRegs.SYSECR.bit.RESET = 2;  // software reset
    }
  • Great.  I'll close this thread, and you can test to see if you still get your situation with the background loop crashed and the timer interrupt still running.  If it happens, please start another thread.  I don't expect it to happen.