RTOS/PROCESSOR-SDK-AM437X: NDK thread abort exception

Part Number: PROCESSOR-SDK-AM437X
Other Parts Discussed in Thread: SYSBIOS

Tool/software: TI-RTOS

Hi,

Under some conditions, the RTOS raises a data abort exception in the NDK thread:

Exception occurred in ThreadType_Task.
Task handle: 0x80061eec.
Task stack base: 0x800eaab0.
Task stack size: 0x2000.
R0 = 0x6000015f  R8  = 0x800b2d84
R1 = 0x80065728  R9  = 0x800903a2
R2 = 0x00000000  R10 = 0x00000080
R3 = 0x0000003d  R11 = 0x0000002a
R4 = 0x80065728  R12 = 0x4b5a6978
R5 = 0x8018f0dc  SP(R13) = 0x8003c67c
R6 = 0x00000000  LR(R14) = 0x800ee964
R7 = 0x8018f018  PC(R15) = 0x8003c67c
PSR = 0x0000002a
DFSR = 0x00000805  IFSR = 0x0000000f
DFAR = 0x00000008  IFAR = 0x833cfc91
ti.sysbios.family.arm.exc.Exception: line 205: E_dataAbort: pc = 0x8003c67c, lr = 0x800ee964.
xdc.runtime.Error.raise: terminating execution

Sometimes I've also gotten this error:
01037.645 NIMUReceivePacket: Bad Size; Payload is 0 bytes
Exception occurred in ThreadType_Task.
Task handle: 0x8006205c.
Task stack base: 0x800eac70.
Task stack size: 0x2000.
R0 = 0x6000015f  R8  = 0x80063db8
R1 = 0x80070f60  R9  = 0x8018eff0
R2 = 0x00000000  R10 = 0x8018eff0
R3 = 0x0000005e  R11 = 0x800ecbbc
R4 = 0x80070f60  R12 = 0x4b5a6978
R5 = 0x8018f29c  SP(R13) = 0x8003c678
R6 = 0x00000000  LR(R14) = 0x800eeb24
R7 = 0x8018f32c  PC(R15) = 0x8003c678
PSR = 0x800ecbbc
DFSR = 0x00000805  IFSR = 0x0000040f
DFAR = 0x00000008  IFAR = 0x833cff91
ti.sysbios.family.arm.exc.Exception: line 205: E_dataAbort: pc = 0x8003c678, lr = 0x800eeb24.
xdc.runtime.Error.raise: terminating execution

In my test environment, eth0 of an AM437x Starter Kit board and a PC are connected to a switch at 100 Mbps full duplex. The PC sends 1280-byte UDP datagrams every 15 ms on average.
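For scale, the average offered load is small compared with the link rate. This assumes the 1280 bytes are the UDP payload and ignores Ethernet/IP/UDP header overhead, which adds only a few percent:

```latex
\frac{1280\,\text{B} \times 8\,\text{bit/B}}{15\,\text{ms}}
  = \frac{10\,240\,\text{bit}}{0.015\,\text{s}}
  \approx 0.68\,\text{Mbit/s}
  \approx 0.7\,\%\ \text{of}\ 100\,\text{Mbit/s}
```

Sustained throughput alone therefore seems unlikely to exhaust the receive buffers, so short bursts are the more plausible trigger.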

I believe either a reception error or a burst of datagrams is causing the problem.

However, neither of those events should cause the software to abort.

Installed items:

CCS 7.2
GCC ARM Compiler 4.9.3
processor_sdk_rtos_am437x  4.00.00.04
am437x PDK v1.0.7
bios 6.46.05.55
xdctools 3.32.02.25_core
Board: AM437x Starter Kit

regards,
Marcio.

  • Hi Marcio,

    Have you tried launching ROV to see whether it shows anything useful for narrowing down the cause of the exception? Also, does the PC value (e.g. pc = 0x8003c67c) give any hint? You might be able to find the corresponding function in the memory map file, or by stepping into the disassembly.

    Regards,
    Garrett
  • Hi Garrett,

    I only used ROV to check the task handle (NDK thread).

    Unfortunately, I did not check the PC when the fault happened, but the map file shows that the PC (pc = 0x8003c67c) points into pbm.oa9fg, within PBMQ_enq (0x8003c650 to 0x8003c6d3):

    .text.PBM_close
    0x8003c63c 0x14 C:\ti\ndk_2_25_01_11\packages\ti\ndk\stack\lib\stk.aa9fg(pbm.oa9fg)
    0x8003c63c PBM_close
    .text.PBMQ_enq
    0x8003c650 0x84 C:\ti\ndk_2_25_01_11\packages\ti\ndk\stack\lib\stk.aa9fg(pbm.oa9fg)
    0x8003c650 PBMQ_enq
    .text.PBM_open
    0x8003c6d4 0x9c C:\ti\ndk_2_25_01_11\packages\ti\ndk\stack\lib\stk.aa9fg(pbm.oa9fg)
    0x8003c6d4 PBM_open


    I'll try to reproduce the error and get more information.

    Thanks,
    Marcio.
  • Marcio,

    OK. The LR register (0x800ee964) is also interesting and may help narrow down the problem.

    Regards,
    Garrett
  • Ok. Thank you.

    It seems the LR register points into ti_sysbios_family_arm_exc_Exception_Module_State_0_excStack_0__A:

    COMMON 0x800edb70 0xa1410 C:\ALSTOM\CED2\Workspace\ced2_skAM437x_17\Debug\configPkg\package\cfg\startup_pa9fg.oa9fg
    0x800edb70 ti_sysbios_knl_Task_Instance_State_0_hookEnv__A
    0x800edb78 ti_sysbios_family_arm_exc_Exception_Module_State_0_excStack_0__A
    0x800eeb78 xdc_runtime_SysMin_Module_State_0_outbuf__A
    0x800eef78 ti_sysbios_knl_Task_Instance_State_1_hookEnv__A
  • Thanks Garrett,


    It will help for sure.

    My software already dumps the exception report to the uart.

    I forgot to mention that the fault happened with the release build, running without the emulator and booting from an SD card.

    I really believe there is an unhandled condition in the NDK.

    regards,

    Marcio

  • Hi Garrett,

    I've managed to reproduce the problem by sending 256 UDP packets in less than 10 microseconds.

    Attached are screenshots showing the Debug and ROV views. Everything points to NIMU_pktService in cpsw_nimu_eth.c.

    Edited 1:

    After a quick analysis:

    Function EMAC_rxPacket (in emac_drv_v4.c) receives a packet descriptor (p_cslPkt_desc) and passes it to NIMU, calling nimu_rx_pkt_cb (in cpsw_nimu_eth.c).

    I believe some other function uses the buffer referenced by the descriptor (field AppPrivate) and then releases it.

    EMAC_rxPacket then calls nimu_alloc_pkt (in cpsw_nimu_eth.c) to allocate a new buffer, whose descriptor is copied into p_cslPkt_desc, the return value of EMAC_rxPacket.

    If nimu_alloc_pkt fails to allocate a buffer, p_new_pkt_desc is NULL and p_cslPkt_desc still points to the same buffer, which has already been handed to NIMU.

    At some later point, EMAC_rxPacket receives the same descriptor again and passes it to nimu_rx_pkt_cb, which causes an attempt to release the same buffer a second time. I am not sure this error is handled.

    Edited 2:

    I've changed EMAC_rxPacket as shown below, and the problem seems to have gone away. It is not a good solution because it still loses packets, but at least the program no longer crashes.

    static EMAC_PKT_DESC_T* EMAC_rxPacket(Handle      hApplication, EMAC_PKT_DESC_T*   p_cslPkt_desc)
    {
        EMAC_PKT_DESC_T*    p_new_pkt_desc;
        uint32_t from_port = 0U;
    
        from_port = ((p_cslPkt_desc->Flags & EMAC_PKT_FLAG_FROM_PORT_MASK) >> EMAC_PKT_FLAG_FROM_PORT_SHIFT) -1U;
        if( from_port < EMAC_NUM_MAX_MAC_PORTS )
        {
    // MB: 2018-01-29
    #if 0
            EMAC_RX_PKT(from_port, p_cslPkt_desc);
            p_new_pkt_desc = EMAC_ALLOC_PKT(from_port, EMAC_PKT_SIZE(from_port));
        
            if (p_new_pkt_desc != NULL)
            {
                memcpy((void *)p_cslPkt_desc, (void *)p_new_pkt_desc, sizeof(EMAC_PKT_DESC_T));
            }
        }
    #else
            p_new_pkt_desc = EMAC_ALLOC_PKT(from_port, EMAC_PKT_SIZE(from_port));
            if (p_new_pkt_desc != NULL)
            {
                EMAC_RX_PKT(from_port, p_cslPkt_desc);
    
                memcpy((void *)p_cslPkt_desc, (void *)p_new_pkt_desc, sizeof(EMAC_PKT_DESC_T));
            }
        }
    #endif
        return p_cslPkt_desc;
    }

    regards,

    Marcio.

    Capture_Exception_20180129.zip

  • Marcio.

    The call flow you described is correct, and your analysis makes sense. Have you tried adding a debug statement to check whether the condition (p_new_pkt_desc == NULL) actually occurs? In the meantime, I will discuss this with the driver developer to see if they have any comments.

    Regards,
    Garrett
  • Hi Garrett,

    Yes, I can confirm that p_new_pkt_desc == NULL really happens. I believe it is caused by the rate of incoming messages versus the number of available buffers versus the application's capacity to consume the data. The real problem is that this condition must not break the software. It would be useful to add an error counter for this condition to EMAC_STATISTICS_T.

    Regards,
    Marcio.
  • Hi Marcio.

    The p_new_pkt_desc == NULL case has actually been addressed in PDK 1.0.9, which is part of the latest PRSDK 4.2. Please upgrade to that release. The updated function is:

    static EMAC_PKT_DESC_T* EMAC_rxPacket(Handle      hApplication, EMAC_PKT_DESC_T*   p_cslPkt_desc)
    {
        EMAC_PKT_DESC_T*    p_new_pkt_desc;
        uint32_t from_port = 0U;
        uint32_t bufLen = 0U;
        uint8_t* bufPtr = NULL;

        from_port = ((p_cslPkt_desc->Flags & EMAC_PKT_FLAG_FROM_PORT_MASK) >> EMAC_PKT_FLAG_FROM_PORT_SHIFT) -1U;
        if( from_port < EMAC_NUM_MAX_MAC_PORTS )
        {
            /* Allocate the PKT to replenish the RX_FREE_QUEUE queue first.
             * Only provide the packet to the application if EMAC_ALLOC_PKT succeeds,
             * otherwise re-use the packet descriptor */
            p_new_pkt_desc = EMAC_ALLOC_PKT(from_port, EMAC_PKT_SIZE(from_port));
            if (p_new_pkt_desc != NULL)
            {
                EMAC_RX_PKT(from_port, p_cslPkt_desc);
                memcpy((void *)p_cslPkt_desc, (void *)p_new_pkt_desc, sizeof(EMAC_PKT_DESC_T));
            }
            else
            {
                /* Drop the packet: clear the descriptor but keep its buffer */
                bufLen = p_cslPkt_desc->BufferLen;
                bufPtr = p_cslPkt_desc->pDataBuffer;
                memset(p_cslPkt_desc, 0, sizeof(EMAC_PKT_DESC_T));
                p_cslPkt_desc->BufferLen = bufLen;
                p_cslPkt_desc->pDataBuffer = bufPtr;
                p_cslPkt_desc->AppPrivate = (uint32_t)p_cslPkt_desc;
                gRxDropCounter++;
            }
        }
        return p_cslPkt_desc;
    }

    Regards, Garrett

  • Ok. Thank you!