question about demo code

Other Parts Discussed in Thread: SYSBIOS

Hi TI friends,

DM8168, RDK 3.0

In the demo code, commands are executed one at a time. In a real application, a command such as "Core Status: Active/In-active" would be executed inside a periodic function to check whether every core is still OK. My question: is there a problem if we issue this command from another function, which may run at the same time as other commands?

  • There is no problem, because a mutex ensures that there is only one outstanding system_linkControl command at any time.
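    As a minimal sketch of the serialization described above (this is not the RDK implementation; `send_command`, `g_cmd_lock`, and the counters are hypothetical names used only for illustration), a single mutex around the send path keeps at most one command in flight even when a periodic status thread and a control thread call it concurrently:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for illustration; none of these names come from the RDK. */
static pthread_mutex_t g_cmd_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_in_flight = 0;      /* commands currently being processed */
static int g_max_in_flight = 0;  /* highest value g_in_flight ever reached */
static int g_total_sent = 0;     /* total number of commands issued */

/* Send one command; the mutex makes the whole send/ack sequence exclusive. */
static void send_command(int cmd)
{
    (void)cmd;
    pthread_mutex_lock(&g_cmd_lock);
    g_in_flight++;
    if (g_in_flight > g_max_in_flight)
        g_max_in_flight = g_in_flight;
    g_total_sent++;              /* real code would send and wait for the ack here */
    g_in_flight--;
    pthread_mutex_unlock(&g_cmd_lock);
}

static void *caller_thread(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < 1000; i++)
        send_command(i);
    return NULL;
}

/* Run two concurrent callers and return the peak number of outstanding commands. */
static int run_two_callers(void)
{
    pthread_t periodic, control;
    int rc;

    rc = pthread_create(&periodic, NULL, caller_thread, NULL);
    assert(rc == 0);
    rc = pthread_create(&control, NULL, caller_thread, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(periodic, NULL);
    pthread_join(control, NULL);
    return g_max_in_flight;
}
```

    Because every caller takes the same lock, the peak reported by `run_two_callers()` stays at one regardless of how the two threads interleave.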

  • hello Badri-SuperMan

    We got a signal with the following backtrace:

    /lib/libc.so.6(__default_rt_sa_restorer_v2+0) [0x2acd2630]

    /lib/libpthread.so.0 [0x2abcc808]

    /lib/libpthread.so.0(pthread_mutex_lock+0x1a0) [0x2abc5c20]

    /opt/dvr_rdk/ti816x/bin/dvr_rdk_demo_mcfw_api.out [0x56898]

    We looked up 0x56898 in the disassembly and found:

    00056864 <System_ipcMsgQSendMsg>:

       56864:                           e92d4ff0     stmdb      sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}

       56868:                           e24dd02c    sub   sp, sp, #44       ; 0x2c

       5686c:                           e58d2014    str    r2, [sp, #20]

       56870:                           e1dd25b0    ldrh  r2, [sp, #80]

       56874:                           e3530801    cmp r3, #65536       ; 0x10000

       56878:                           e1a08003    mov r8, r3

       5687c:                           e1a09000    mov r9, r0

       56880:                           e58d1018    str    r1, [sp, #24]

       56884:                           e58d2010    str    r2, [sp, #16]

       56888:                           8a000095    bhi   56ae4 <System_ipcMsgQSendMsg+0x280>

       5688c:                           e1a0ae29    mov sl, r9, lsr #28

       56890:                           e59f0378    ldr    r0, [pc, #888]  ; 56c10 <$d>

       56894:                           eb0009a9    bl     58f40 <OSA_mutexLock>

       56898:                           e35a0003    cmp sl, #3        ; 0x3

       5689c:                           8a000082    bhi   56aac <System_ipcMsgQSendMsg+0x248>

       568a0:                           e288b034    add  fp, r8, #52       ; 0x34

       568a4:                           e3a00000    mov r0, #0       ; 0x0

       568a8:                           e1a0100b    mov r1, fp

       568ac:                           eb003b1e    bl     6552c <MessageQ_alloc>

       568b0:                           e3500000    cmp r0, #0       ; 0x0

       568b4:                           e1a07000    mov r7, r0

       568b8:                           e1a06000    mov r6, r0

       568bc:                           0a0000ad    beq  56b78 <System_ipcMsgQSendMsg+0x314>

       568c0:       e59d2014       ldr    r2, [sp, #20]

    Do you have any further ideas about this?

  • I have been testing stability over long periods. In my test, initially without HDMI output, the system may hang several days later without printing any information.

    According to our backtraces, it always gets stuck right after OSA_mutexLock. I am not sure whether a deadlock can happen in some cases.

    Do you have any further ideas?
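    One way to test the deadlock suspicion is to replace a blocking lock with a timed one while debugging. The sketch below is a hypothetical diagnostic wrapper (`lock_with_timeout` and the demo helpers are not part of OSA or the RDK) built on the POSIX call `pthread_mutex_timedlock`; it reports a timeout instead of blocking forever:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical diagnostic wrapper: wait at most timeout_sec seconds for the
 * mutex. Returns 0 on success or an errno value such as ETIMEDOUT.
 */
static int lock_with_timeout(pthread_mutex_t *m, int timeout_sec, const char *where)
{
    struct timespec deadline;
    int rc;

    /* pthread_mutex_timedlock() expects an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    rc = pthread_mutex_timedlock(m, &deadline);
    if (rc == ETIMEDOUT)
        fprintf(stderr, "possible deadlock: %s waited %d s for mutex\n",
                where, timeout_sec);
    return rc;
}

/* Demo: the calling thread keeps the mutex, so a second thread times out on it. */
static pthread_mutex_t g_demo_lock = PTHREAD_MUTEX_INITIALIZER;

static void *waiter(void *result)
{
    int rc = lock_with_timeout(&g_demo_lock, 1, "waiter");

    if (rc == 0)
        pthread_mutex_unlock(&g_demo_lock);
    *(int *)result = rc;
    return NULL;
}

static int demo_stuck_lock(void)
{
    pthread_t t;
    int result = -1;
    int rc;

    pthread_mutex_lock(&g_demo_lock);   /* simulate an owner that never unlocks */
    rc = pthread_create(&t, NULL, waiter, &result);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    pthread_mutex_unlock(&g_demo_lock);
    return result;
}
```

    If a wrapper like this never reports a timeout during a hang, the mutex is probably not the component that is stuck.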

  • This looks like glibc code or data structures are corrupted. A deadlock will not cause a SEGFAULT. Also, what error is reported by the segfault exception? Is it an invalid memory access? Are you using a NAND file system? If so, do you see the same issue with an NFS file system? We have seen issues on some customer boards where bit flips in NAND cause crashes in libc code.

  • Hello Badri-SuperMan,

    thanks for your reply.

    My environment: 8168 EVM with NFS. After a long test run, no log was displayed, but the video output was gone (I connect the output to a monitor to observe it). I had no idea what had happened, so I pressed Ctrl+C to enter the signal handler and used backtrace to look at the preceding calls, which gave the information above. There was no SIGSEGV, so it is not an invalid memory access. Any further ideas?
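    For reference, a Ctrl+C handler of the kind described above can be sketched as follows. This assumes glibc's `<execinfo.h>`; the names `dump_backtrace`, `install_backtrace_handler`, and `g_dumped` are hypothetical and not taken from the demo application:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

#define MAX_FRAMES 32

static volatile sig_atomic_t g_dumped = 0;  /* set once a trace has been printed */

/* Print the stack of the thread that handles the signal to stderr. */
static void dump_backtrace(int signo)
{
    void *frames[MAX_FRAMES];
    int count;

    (void)signo;
    count = backtrace(frames, MAX_FRAMES);
    /* backtrace_symbols_fd() writes straight to the fd and does not call malloc(). */
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    g_dumped = 1;
}

/* Install the handler for SIGINT (Ctrl+C); returns 0 on success, -1 on error. */
static int install_backtrace_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = dump_backtrace;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}
```

    The trace only shows the thread that happened to receive the signal, so a frame inside `pthread_mutex_lock` says where that thread was waiting when Ctrl+C arrived, not necessarily where the system went wrong.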

  • I think this issue is unrelated to OSA_mutex. If you press Ctrl+C, it will just unblock threads that were blocked on MessageQ_get. If the display is blanked out, do you get "No signal" or just a black background color? Are all the displays blank or only HDMI? Are you running the remote debug client? Do you see any M3 exception logs? A black background color indicates either a VPSS M3 exception or a display list hang. This is most likely due to a board issue. Check the posts below for things to verify on your board:

    http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/p/250529/878774.aspx#878774

    http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/p/264273/924280.aspx#924280

     

  • hello SuperMan-Badri,

        thanks for your response.

    If you press Ctrl+C it will just unblock threads which were blocked on MessageQ_get.

    - mm... got it

    If display is blanked out do you get "No signal" or just black background color ?

    - just grey background color

     Are all the displays showing blank or only HDMI

    - we just turn on HDMI

    Are you running remote debug client ?

    - no

    Do you see any M3 exception logs ?

    - no

    Display of black background color indicates either VPSS m3 exception or Display list hang.

    - but I just see a grey background. What does a grey background mean?



  • It is not clear what you mean by grey screen. Please attach screenshots of the TV when you see the hang. Also check whether the graphics logo is still displayed when you see the hang. Always run with remote_debug_client logs enabled and attach the logs when you see a hang. Also, when you see the hang, share the logs of Vsys_printBufferStatistics and Vsys_printDetailedStatistics.

  • Hello SuperMan-Badri,

    Q1. It is not clear what you mean by grey screen. Please attach screenshots of the TV when you see the hang.

    - see  ..it's like what we see after "load.sh"


    Q2. Also check if graphics logo is getting displayed when you see hang or not.

    - we didn't use the logo in our application.

    Q3. Always run with remote_debug_client logs enabled and attach the logs when you see hang. Also when you see hang share logs of Vsys_printBufferStatistics and Vsys_printDetailedStatistics

    - What I saw is just the following messages, no more

    videoSourceStatus.numChannels 8
    DEMO: 0: Detected video at CH [0,0] (720x240@59Hz, 1)!!!
    DEMO: 1: Detected video at CH [0,1] (720x240@59Hz, 1)!!!
    DEMO: 2: Detected video at CH [0,2] (720x240@59Hz, 1)!!!
    DEMO: 3: Detected video at CH [0,3] (720x240@59Hz, 1)!!!
    DEMO: 4: Detected video at CH [1,0] (720x240@59Hz, 1)!!!
    DEMO: 5: Detected video at CH [1,1] (720x240@59Hz, 1)!!!
    DEMO: 6: Detected video at CH [1,2] (720x240@59Hz, 1)!!!
    DEMO: 7: Detected video at CH [1,3] (720x240@59Hz, 1)!!!

    I also logged in over telnet and ran the "top" command (see attachment),

    and the "vmstat" command (see attachment).


    Any further ideas?

  • This looks like a VPSS M3 crash. Connect CCS + JTAG to the M3 VPSS core and check its status. Also, I see remote_debug_client running in the process list. Check the last prints from [m3vpss] to see whether you get any error or exception message.

  • hello SuperMan-Badri,

    Because I print too many messages, I missed the prints from [m3vpss]. But I got the attached file:

    4048.CCS_CRASH_DUMP_VPSS-M3.txt

    Can I get any information from the attachment?

    I will try again to capture the log and will update if it happens again.

  • zip and attach the contents of /dvr_rdk/build/dvr_rdk/bin/ti816x-evm

  • Hello SuperMan-Badri,

    See the attachment, as requested:

    http://e2e.ti.com/cfs-file.ashx/__key/communityserver-discussions-components-files/717/8171.ti816x_2D00_evm.7z

    I had added more printf calls; I have now removed them to get back to the original state, recompiled, and am providing the result. I am not sure whether that is OK, so I am telling you up front.

  • I need the exact firmware image corresponding to the crash dump provided previously. Otherwise no analysis is possible.

  • hello Super-badri,

    Please use the following files from the posts above, because I found that I had added the extra prints only on the A8 side, not on the DSP/M3 side.

    4048.CCS_CRASH_DUMP_VPSS-M3.txt
    8171.ti816x-evm.7z

    Thanks.

  • Below is the crash dump backtrace:

     

    It indicates that a s/w exception was raised by SharedRegion. The reason for the s/w exception should have been printed on the console if your log is correct. From the exception, it looks like a HeapMemMP buffer overflow, where some component is writing beyond its allocated memory. Check that your application does not write beyond the allocated size of the bitstream buffer, and check that your swms layout is correct. Also, if your application allocates a buffer using Vsys_allocBuf, ensure you are not writing beyond the allocated memory. You will have to connect CCS to determine the exact memory location that is corrupted in order to debug the issue further.

     

    0 Vps_rprintf(unsigned char *) at /home/medwin/Projects/TI-8168/DVRRDK_03.00.00.00_ori_2012-04-29/ti_tools/hdvpss/hdvpss_01_00_01_37_patched/packages/ti/psp/vps/common/src/remote_debug_server.c:168 PC = 0x9DF03894 FP = 0x3F00DDDC
    1 Utils_errorRaiseHook(struct xdc_runtime_Error_Block *) at /home/medwin/Projects/TI-8168/DVRRDK_03.00.00.00_ori_2012-04-29/dvr_rdk/mcfw/src_bios6/utils/src/utils_execp_trace.c:247 PC = 0x9DEFC68E FP = 0x3F00DDF0
    2 ti_sysbios_BIOS_errorRaiseHook__I(struct xdc_runtime_Error_Block *) at /home/medwin/Projects/TI-8168/DVRRDK_03.00.00.00_ori_2012-04-29_ori/ti_tools/bios/bios_6_33_05_46/packages/ti/sysbios/BIOS.c:193 PC = 0x9DF13DD6 FP = 0x3F00DE18
    3 xdc_runtime_Error_raiseX__F(struct xdc_runtime_Error_Block *, unsigned short, unsigned char *, int, unsigned int, int, int) at /db/rtree/install/trees/products/xdcprod/xdcprod-p47/product/Linux/xdctools_3_23_02_47/packages/xdc/runtime/Error.c:153 PC = 0x9DEF6994 FP = 0x3F00DE28
    4 xdc_runtime_Error_raiseX__E(struct xdc_runtime_Error_Block *, unsigned short, unsigned char *, int, unsigned int, int, int) at /home/medwin/Projects/TI-8168/DVRRDK_03.00.00.00_ori_2012-04-29_ori/dvr_rdk/../dvr_rdk/build/dvr_rdk/obj/ti816x-evm/m3vpss/release/dvr_rdk_configuro/package/cfg/MAIN_APP_m3vpss_pem3.c:24897 PC = 0x9DF17742 FP = 0x3F00DE98
    5 xdc_runtime_Assert_raise__I(unsigned short, unsigned char *, int, unsigned int) at /db/rtree/install/trees/products/xdcprod/xdcprod-p47/product/Linux/xdctools_3_23_02_47/packages/xdc/runtime/Assert.c:34 PC = 0x9DF1393A FP = 0x3F00DEB0
    6 SharedRegion_getPtr(unsigned int) at /home/medwin/Projects/TI-8168/DVRRDK_03.00.00.00_ori_2012-04-29_ori/ti_tools/ipc/ipc_1_24_03_32/packages/ti/sdo/ipc/SharedRegion.c:305 PC = 0x00406D8A FP = 0x3F00DED0
    7 ti_sdo_ipc_heaps_HeapMemMP_getStats__E(struct ti_sdo_ipc_heaps_HeapMemMP_Object *, struct xdc_runtime_Memory_Stats *) at /home/medwin/Projects/TI-8168/DVRRDK_03.00.00.00_ori_2012-04-29_ori/ti_tools/ipc/ipc_1_24_03_32/packages/ti/sdo/ipc/heaps/HeapMemMP.c:909 PC = 0x0040A1EE FP = 0x3F00DEF0
    8 xdc_runtime_IHeap_getStats(struct xdc_runtime_IHeap___Object *, struct xdc_runtime_Memory_Stats *) at /home/medwin/Projects/TI-8168/DVRRDK_03.00.00.00_ori_2012-04-29_ori/dvr_rdk/../ti_tools/xdc/xdctools_3_23_03_53/packages/xdc/runtime/IHeap.h:152 PC = 0x9DF186A0 FP = 0x3F00DF08
    9 utils_sw_exception_copy_info() at /home/medwin/Projects/TI-8168/DVRRDK_03.00.00.00_ori_2012-04-29/dvr_rdk/mcfw/src_bios6/utils/src/utils_execp_trace.c:193 PC = 0x9DEE984E FP = 0x3F00DF08
    10 <symbol is not available> PC = 0x9DF02D90 FP = 0x3F00DF48
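    To narrow down a write beyond allocated memory such as the one suspected above, one generic technique is to place a guard word directly after each buffer and check it periodically. The sketch below uses plain `malloc` and hypothetical helper names (`guarded_alloc`, `guard_intact`); it illustrates the idea only and does not use HeapMemMP or any RDK allocator:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu

/* Hypothetical helper: allocate size usable bytes followed by a guard word. */
static uint8_t *guarded_alloc(size_t size)
{
    uint8_t *buf = malloc(size + sizeof(uint32_t));
    uint32_t guard = GUARD_WORD;

    if (buf != NULL)
        memcpy(buf + size, &guard, sizeof(guard));  /* memcpy avoids unaligned stores */
    return buf;
}

/* Returns 1 if the guard word after the buffer is intact, 0 if it was overwritten. */
static int guard_intact(const uint8_t *buf, size_t size)
{
    uint32_t guard;

    assert(buf != NULL);
    memcpy(&guard, buf + size, sizeof(guard));
    return guard == GUARD_WORD;
}
```

    Calling `guard_intact()` after each stage that fills a buffer shows which stage first overwrites the guard, which is usually quicker than inspecting the heap after it has already been corrupted.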

  • hi super-badri,

    thanks for your reply.

    Check if your application is ensuring that it is not writing beyond allocated size of bitstream buffer

    - I'll check


    and check that your swms layout is correct.

    - I am not clear what this means. Could you describe it in more detail?


    ALso if your application is allocating some buffer using Vsys_allocBuf ensure you are not writing beyond allocated memory.

    - By my checking, we don't use that...

  • hi badri,

    From the following line in CCS_CRASH_XX:

    M 0 0x3f005f60 0x00008000

    and the following lines in dvr_rdk_m3vpss_release.xem3.map:

    3effdf60 00008000 : captureLink_tsk.oem3 (.bss:taskStackSection)
    3f005f60 00008000 : systemLink_tsk_m3vpss.oem3 (.bss:taskStackSection)
    3f00df60 00008000 : system_common.oem3 (.bss:taskStackSection)

    I can tell that something got stuck in systemLink_tsk_m3vpss.c, and I found our added function inside SystemLink_cmdHandler().
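    The lookup above can also be done mechanically. As a small sketch (the table holds only the three map rows quoted above; `struct map_entry` and `find_owner` are hypothetical names), the frame pointers from the backtrace, for example FP = 0x3F00DDDC, can be matched against the task stack ranges:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a linker map: start address, length, owning object file. */
struct map_entry {
    uint32_t base;
    uint32_t size;
    const char *name;
};

/* The three rows quoted from dvr_rdk_m3vpss_release.xem3.map. */
static const struct map_entry g_map[] = {
    { 0x3effdf60u, 0x00008000u, "captureLink_tsk.oem3" },
    { 0x3f005f60u, 0x00008000u, "systemLink_tsk_m3vpss.oem3" },
    { 0x3f00df60u, 0x00008000u, "system_common.oem3" },
};

/* Return the name of the entry containing addr, or NULL if none does. */
static const char *find_owner(uint32_t addr)
{
    size_t i;

    for (i = 0; i < sizeof(g_map) / sizeof(g_map[0]); i++) {
        /* Unsigned wraparound makes this test reject addr < base as well. */
        if (addr - g_map[i].base < g_map[i].size)
            return g_map[i].name;
    }
    return NULL;
}
```

    With these rows, `find_owner(0x3F00DDDCu)` resolves to the `systemLink_tsk_m3vpss.oem3` stack, which agrees with the manual reading of the map file.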

    Our added function is as below:

    case SYSTEM_COMMON_CMD_GET_FREE_SPACE:
    {
        SystemCommon_GetFreeSpace *prm = (SystemCommon_GetFreeSpace *) pPrm;

        prm->framefreeSpace = Utils_memGetBufferHeapFreeSpace();

        prm->bitfreeSpace = Utils_memGetBitBufferHeapFreeSpace();

    We just call the two TI functions above. I looked at these two functions in more detail and found the following:

    UInt32 Utils_memGetBufferHeapFreeSpace(void)
    {
        UInt32 size;
        Memory_Stats stats;

        Memory_getStats(gUtils_heapMemHandle[UTILS_MEM_VID_FRAME_BUF_HEAP], &stats);

        size = stats.totalFreeSize;

        return ((UInt32) (size));
    }

    UInt32 Utils_memGetBitBufferHeapFreeSpace(void)
    {
        Memory_Stats stats;

        Memory_getStats(gUtils_heapMemHandle[UTILS_MEM_VID_BITS_BUF_HEAP], &stats);

        return ((UInt32) (stats.totalFreeSize));
    }

    Both functions query the heap status through Memory_getStats, which matches frame 7 of your backtrace.

    I want to look deeper into Memory_getStats(), but I don't know where the source code is. Could you help me trace further?

  • The info is available in the backtrace I shared above:

     

    Memory_getStats -> xdc_runtime_IHeap_getStats -> HeapMemMP_getStats -> SharedRegion_getPtr (s/w exception here).

    The source code for HeapMemMP_getStats is present in

    /home/medwin/Projects/TI-8168/DVRRDK_03.00.00.00_ori_2012-04-29_ori/ti_tools/ipc/ipc_1_24_03_32/packages/ti/sdo/ipc/heaps/HeapMemMP.c:909

    As I mentioned previously, the issue is memory corruption caused by a buffer overflow. You will have to debug the cause of the corruption rather than this function.

    You can get the address that is corrupted and debug from there.

     

  • We have seen issues with some customer board where there are bit flips in nand which can cause crash in libc code.

    Could you please explain this in detail?

    What may cause the bit flips in NAND? A wrong ECC configuration? A misconfigured UBIFS?

    Thanks in advance!