Question about debugging SIGSEGV

Dear friend,
  When I get a SIGSEGV signal on the A8, does it mean the problem is on the A8 side rather than in the M3/DSP?
If not, how do I debug this?

  • SIGSEGV is generated when an application tries to access invalid memory. This indicates an A8-side bug and is not related to the M3/DSP. You need to figure out exactly which code statement causes this error. The easiest way is to add prints and isolate the code segment.
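
    Besides adding prints, a SIGSEGV handler installed with sigaction() and SA_SIGINFO can report the faulting address. The following is only a minimal sketch for Linux/POSIX user space, not code from the RDK; the names format_hex, on_segv and install_segv_handler are hypothetical. It reports through write(), which is async-signal-safe, rather than printf().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Format v as "0x<hex>" into buf, which must hold at least
 * 2 + 2 * sizeof(uintptr_t) + 1 bytes. Returns the string length.
 * No stdio or allocation, so it can be called from a signal handler
 * (the assert is a debug-time check only). */
size_t format_hex(uintptr_t v, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;

    assert(buf != NULL);
    do {                        /* collect digits, least significant first */
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)               /* reverse into the output buffer */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* write() wrapper that ignores the result; nothing useful can be done
 * about a failed write while the process is crashing. */
static void emit(const char *s, size_t n)
{
    ssize_t r = write(STDERR_FILENO, s, n);
    (void)r;
}

/* Hypothetical handler: print the fault address, then terminate. */
static void on_segv(int signum, siginfo_t *info, void *context)
{
    static const char msg[] = "SIGSEGV, fault addr = ";
    char buf[2 + 2 * sizeof(uintptr_t) + 1];
    size_t len;

    (void)signum;
    (void)context;
    len = format_hex((uintptr_t)info->si_addr, buf);
    emit(msg, sizeof msg - 1);
    emit(buf, len);
    emit("\n", 1);
    _exit(1);
}

/* Install the handler; returns 0 on success, -1 on failure (see errno). */
int install_segv_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;   /* request the three-argument handler form */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

    Calling install_segv_handler() once near the start of main() is enough. The handler exits instead of returning, because returning from a SIGSEGV handler would re-execute the faulting instruction.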

  • Dear Govindasamy,
      Thanks for your reply.
     
      I have more questions.
    Q1. If the invalid memory access happens in the M3/DSP, is there any signal or message to notify us?
    Q2. If yes, what is it? If not, how can we detect it?

  • There is a provision to get details on exceptions raised by the remote processors from A8 Linux. Refer to VSYS_EVENT_SLAVE_CORE_EXCEPTION in demo.c. This dump will be more useful in a production system.

    During development, you could connect JTAG / CCS and get details on the exceptions.

  • Dear Govindasamy,
      Thanks for your reply.
     
      When an exception happens, we may get files like "CCS_CRASH_DUMP_VPSS-M3.txt". How do we parse this file?

  • The file "CCS_CRASH_DUMP_VPSS-M3.txt" is stored in a format that can be used by the CCS crash dump analysis utility. The utility lets you get the call stack leading to the point of the crash. Details of the CCS crash dump utility can be found at: http://processors.wiki.ti.com/index.php/Crash_Dump_Analysis

  • Hello,
    SIGSEGV is generated when application tries to access invalid memory. This indicates an A8 side bug and not related to M3/DSP. You need to figure out exactly which code statement results in this error. Easiest is to add prints and isolate the code segment.
    - I tried to use a signal handler, as below, to trace the corruption

    void HandleSignal( int signum, siginfo_t* sig_info, void* context )
    {
    ......

       size = backtrace (str_array, 15);
       str = backtrace_symbols (str_array, size);
     
       /* Note: printf() is not async-signal-safe; acceptable for debugging only. */
       printf ("Totally Obtained %d stack frames. signal number =%d \n", (int)size, signum);

       if(signum == SIGSEGV)
       {
          printf(" Signal number = %d, Signal errno = %d\n",
           sig_info->si_signo, sig_info->si_errno);
          switch(sig_info->si_code)
          {
           case SEGV_MAPERR: /* si_code 1 */
            printf(" SI code = %d (Address not mapped to object)\n",
             sig_info->si_code);
            break;
           case SEGV_ACCERR: /* si_code 2 */
            printf(" SI code = %d (Invalid permissions for mapped object)\n",
             sig_info->si_code);
            break;
           default: printf(" SI code = %d (Unknown SI Code)\n",sig_info->si_code);
            break;
          }
          /* si_addr is a pointer, so print it with %p rather than %x */
          printf(" Fault addr = %p \n",sig_info->si_addr);
       }

       /* the loop counter is declared here; it was previously scoped to the if block */
       for (int i = 0; i < (int)size; i++)
       {
        printf ("%s\n", str[i]);
       }
    ......
    }

    and got the following messages:

    Totally Obtained 6 stack frames. signal number =11
     Signal number = 11, Signal errno = 0
     SI code = 1 (Address not mapped to object)
     Fault addr = 0x442b8f18
    /opt/dvr_rdk/ti816x/bin/dvr_rdk_demo_mcfw_api.out [0xaa8c]
    /lib/libc.so.6(__default_rt_sa_restorer_v2+0) [0x2acef630]
    /opt/dvr_rdk/ti816x/bin/dvr_rdk_demo_mcfw_api.out [0x1c0a8]
    /opt/dvr_rdk/ti816x/bin/dvr_rdk_demo_mcfw_api.out [0x14998]
    /lib/libpthread.so.0 [0x2abe15f4]
    /lib/libc.so.6(clone+0x88) [0x2ad87368]

    I used arm-linux-objdump to find the location of 0xaa8c inside dvr_rdk_demo_mcfw_api.out,
    and it is the "HandleSignal" function. I want to trace the location of the fault address 0x442b8f18.

    Is there any method to get the memory map for the fault address?
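
    On Linux, the memory map of a running process is listed in /proc/<pid>/maps (or /proc/self/maps from inside the process), one "start-end perms ..." line per mapping. The sketch below shows one way to check which mapping, if any, covers an address. It is an illustration only; maps_line_contains and print_mapping_for are hypothetical names, not part of any TI or libc API.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if addr lies in the [start, end) range at the start of one
 * maps line, 0 if it does not, and -1 if the line cannot be parsed. */
int maps_line_contains(const char *line, unsigned long long addr)
{
    unsigned long long start, end;

    assert(line != NULL);
    if (sscanf(line, "%llx-%llx", &start, &end) != 2)
        return -1;
    return addr >= start && addr < end;
}

/* Print the line of maps_path that covers addr.
 * Returns 1 if a mapping was found, 0 if none covers addr,
 * and -1 if the file could not be opened. */
int print_mapping_for(const char *maps_path, unsigned long long addr)
{
    char line[512];
    FILE *fp = fopen(maps_path, "r");
    int found = 0;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (maps_line_contains(line, addr) == 1) {
            fputs(line, stdout);
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

    For example, print_mapping_for("/proc/self/maps", 0x442b8f18ULL) could be called from the process itself. With SI code 1 (address not mapped to object), a result of 0 is expected, since no mapping covers the address; in that case dumping the whole file and looking at the neighbouring mappings is more informative.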

  • Hi Badri,
       I got the following messages after adding the signal-handling code above to my system (our software on the 816x EVM):

    Obtained 6 stack frames. signal number =11
     Signal number = 11, Signal errno = 0
     SI code = 1 (Address not mapped to object)
     Fault addr = 0x44257f14
    /opt/dvr_rdk/ti816x/bin/dvr_rdk_demo_mcfw_api.out [0xaac8]
    /lib/libc.so.6(__default_rt_sa_restorer_v2+0) [0x2acd9630]
    /opt/dvr_rdk/ti816x/bin/dvr_rdk_demo_mcfw_api.out [0x220c8]
    /opt/dvr_rdk/ti816x/bin/dvr_rdk_demo_mcfw_api.out [0x18920]
    /lib/libpthread.so.0 [0x2abcb5f4]
    /lib/libc.so.6(clone+0x88) [0x2ad71368]


       I found that it can always be reproduced once I make the following modification:

            Scd_ipcFramesOutVpssPrm.baseCreateParams.outputFrameRate = 30;
            Scd_dspAlgPrm.scdCreateParams.inputFrameRate         = 30;
            Scd_dspAlgPrm.scdCreateParams.outputFrameRate        = 30;

       My setup is 16-channel analog video input with a 16-split video screen. I turn off the HDMI monitor before powering on the EVM,
    and once I have confirmed that the 16-split video is being shown, I turn on the HDMI monitor. Then I get signal 11 (SIGSEGV).
    I guess there is something wrong with the HDMI driver.

  • There were issues with the HDMI driver where mdelay was used instead of msleep, which caused the kernel to busy-loop for nearly 800 ms when an HDMI monitor was connected. The A8 would be frozen for this duration. There were also issues with improper mutex protection when invoking the HDMI driver. These issues are resolved in RDK 3.5. I am not sure the SIGSEGV issue is due to that, though. Can you try the above test on RDK 3.5?

  • Hi Badri,
       Thanks for your reply.

       We can't port our program from DVR 3.0 to 3.5 because it would take a lot of effort.
    From what you say, it seems there are no big changes in the HDMI driver other than the ones you mentioned.

    Could we just use the latest driver only?

  • You don't have to port your program. You can try the standard RDK 3.5 release with the SCD frame rate change and run the HDMI plug-in/plug-out sequence you mentioned. If the issue is resolved, then we can point you to the relevant patches in the external DVR RDK Arago kernel git repo.

  • Hi Badri,
       Thanks for your reply.

    In my testing with several patterns, it has been OK so far.
    Please provide the patch for DVR 3.0.
    Thanks.

  • Hi,

    If you have the Linux kernel from RDK 3.5, you could copy the following files from that release into the Linux kernel used in RDK 3.0.
    Or I can generate a patch and give it to you once I am back in the office. I am off for two days.

    Thanks,
    Sujith

    arch/arm/plat-omap/hdmi_lib.c
    drivers/video/ti81xx/ti81xxhdmi/edid.c
    drivers/video/ti81xx/ti81xxhdmi/hdmi.c
    include/linux/edid.h
    include/linux/ti81xxhdmi.h
    sound/soc/davinci/davinci-hdmi.c

  • Hi Sujith
        Thanks for your reply.

        Do you mean that I can directly replace these files in DVR 3.0 with the ones from DVR 3.5?
    If not, I need your patch for DVR 3.0. Thanks

  • Yes, please try replacing the files directly and check. If that does not work, I can give you a patch.

    Thanks,
    Sujith 
  • Hi,

      After a long test run so far, it seems the problem no longer happens. Instead, I got the following messages. Do you know about this?

    *** HeapMemMP_free: Address is in this free block
            Error [0xfffffff9] at Line no: 1417 in file /home/Projects/TI-8168/DVRRDK_03.00.00.00/ti_tools/syslink/syslink_2_10_02_17_patched/packages/ti/syslink/utils/hlos/knl/Linux/../../../../../../ti/syslink/ipc/hlos/knl/HeapMemMP.c

     [m3vpss ]  86432856: DISPLAY: HDDAC(BP0) : 10 fps, Latency (Min / Max) = ( 11 / 47 ), Callback Interval (Min / Max) = ( 15 / 18 ) !!!
     [m3vpss ]  86432856: DISPLAY: UNDERFLOW COUNT: HDMI(BP0) 5199811, HDDAC(BP0) 10575882, DVO2(BP1) 10575892, SDDAC(SEC1) 10562976
     [m3vpss ]  86432856: SYSTEM  : FREE SPACE : System Heap      = 5160 B, Mbx = 10240 msgs)
     [m3vpss ]  86432857: SYSTEM  : FREE SPACE : SR0 Heap         = 10811648 B (10 MB)
     [m3vpss ]  86432857: SYSTEM  : FREE SPACE : Frame Buffer     = 155500160 B (148 MB)
     [m3vpss ]  86432857: SYSTEM  : FREE SPACE : Bitstream Buffer = 189770624 B (180 MB)
     [m3vpss ]  86432858: SYSTEM  : FREE SPACE : Tiler 8-bit      = 89128960 B (85 MB)  - TILER ON
     [m3vpss ]  86432858: SYSTEM  : FREE SPACE : Tiler 16-bit     = 44040192 B (42 MB)  - TILER ON
     [m3vpss ]  172832871: DISPLAY: HDDAC(BP0) : 10 fps, Latency (Min / Max) = ( 11 / 34 ), Callback Interval (Min / Max) = ( 16 / 18 ) !!!
     [m3vpss ]  172832871: DISPLAY: UNDERFLOW COUNT: HDMI(BP0) 5191818, HDDAC(BP0) 10576394, DVO2(BP1) 10576400, SDDAC(SEC1) 10563234
     [m3vpss ]  172832871: SYSTEM  : FREE SPACE : System Heap      = 5160 B, Mbx = 10240 msgs)
     [m3vpss ]  172832872: SYSTEM  : FREE SPACE : SR0 Heap         = 10811648 B (10 MB)
     [m3vpss ]  172832872: SYSTEM  : FREE SPACE : Frame Buffer     = 155500160 B (148 MB)
     [m3vpss ]  172832872: SYSTEM  : FREE SPACE : Bitstream Buffer = 189770624 B (180 MB)
     [m3vpss ]  172832872: SYSTEM  : FREE SPACE : Tiler 8-bit      = 89128960 B (85 MB)  - TILER ON
     [m3vpss ]  172832872: SYSTEM  : FREE SPACE : Tiler 16-bit     = 44040192 B (42 MB)  - TILER ON

  • Have you applied the MessageQ patch on top of RDK 3.0.1? If not, then this is most likely the same issue.

  • Yes, please try replacing the files directly and check. If not, i can give you a patch.

    Hi Badri,

        From my testing so far, I found that when I run the problem scenario on DM8168 PG2.0, it does not happen.

        Do you know the differences between PG1.1, 2.0, and even 2.1 (the latest)? We need to clarify this. Thanks

  • Please check the silicon errata for issues fixed in different PG revisions (http://www.ti.com/litv/pdf/sprz329d). The HDMI driver issue is a software issue and not a silicon issue.

     

  • Hi Badri,

    Using the EVM with PG1.1 and the patch, I have not seen the message so far.

    But I sometimes get the following message. Is there something I am missing?

    "E-EDID checksum failed!!"

  • Hi Jack,

     

    I think our HDMI driver checks only block 0, so if your display device sends more than one block, this error message could appear. I will check with the HDMI driver expert about this.

     

    Regards,

    Brijesh

  • Hi Jack,

    An EDID check failure means the EDID in the monitor/TV doesn't have a valid checksum.
    Are you seeing this error for all TVs/monitors?

    Thanks,
    Sujith
  • I have only one monitor for testing, and it is TI's monitor.

  • OK, this was not expected. When you say "TI monitor", it's the small 7" display, right?
    The EDID might be corrupted on this display.

    Unless this is causing any issues, you can ignore this message. The HDMI control thread configures the display for 1080P60 in case of EDID failures.

    I hope you are able to see something on the display.

    Thanks,
    Sujith
  • Sujith said:

    OK, this was not expected. When you say "TI monitor", it's the small 7" display, right?

    The EDID might be corrupted on this display.

    Right.

    Sujith said:
    Unless this is causing any issues, you can ignore this message. The HDMI control thread configures the display for 1080P60 in case of EDID failures.

    I didn't see anything wrong on the display, and no more messages were shown.

    Sujith said:
    I hope you are able to see something on the display.

    I'll update if I find anything wrong on the display.
     
  • Hi,

    After testing with other HDMI monitors, we still get the same message, "E-EDID checksum failed!!"

    Below are our cases for reproduction. You may try them on TI's monitor first.

    Case 1. Plug/unplug the HDMI cable

    Case 2. Turn the power switch of the HDMI monitor on/off

    Case 3. Plug/unplug the power cable of the HDMI monitor

    One more thing: we have also seen "I2C No Ack" when unplugging the power cable of the HDMI monitor.

    Please help to clarify this issue. Thanks

     

     

  • Hello Badri, Sujith,

    I have two questions.

    Q1. Could you reproduce this on your side?

    Q2. Will the message "E-EDID checksum failed!!" affect anything?

  • Hi Badri,

    Could you try case 3 above to reproduce it?

    One more thing: I found a patch related to this at https://patchwork.kernel.org/patch/1116022/ provided by TI,

    but it is for another product. I'm not sure whether it applies to this product.

    Could you help to confirm?

     

     

     

  • Hi Jack,

    The failure to read the EDID should not have any adverse effect. In the worst case, the HDMI driver will default to 1080.

    Yes, this patch is for a different device. We will have to make a similar patch for our device as well. That patch takes care of calculating the checksum for all blocks; we currently do it only for block 0. This should make the "EDID checksum failure" message go away in most cases.

    The "I2C No Ack" error message can occur if the TV is turned off. It is safe to ignore, provided the TV was turned off or switched to a different video input.

    Thanks,

    Sujith
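
    To illustrate what checking all blocks means: an EDID consists of 128-byte blocks, byte 126 of block 0 holds the number of extension blocks, and the bytes of each block must sum to 0 modulo 256. The following is a stand-alone sketch of that rule, not the TI driver code or the patch mentioned above; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EDID_BLOCK_SIZE       128
#define EDID_EXT_COUNT_OFFSET 126  /* byte 126 of block 0: extension block count */

/* Return 1 if the 128 bytes of one block sum to 0 modulo 256, else 0. */
int edid_block_checksum_ok(const uint8_t *block)
{
    uint8_t sum = 0;
    size_t i;

    assert(block != NULL);
    for (i = 0; i < EDID_BLOCK_SIZE; i++)
        sum = (uint8_t)(sum + block[i]);
    return sum == 0;
}

/* Check block 0 and every extension block it announces.
 * Returns -1 if all checksums are valid, the index of the first bad
 * block otherwise, and -2 if the buffer is shorter than announced. */
int edid_first_bad_block(const uint8_t *edid, size_t len)
{
    size_t blocks, i;

    assert(edid != NULL);
    if (len < EDID_BLOCK_SIZE)
        return -2;
    blocks = 1 + (size_t)edid[EDID_EXT_COUNT_OFFSET];
    if (len < blocks * EDID_BLOCK_SIZE)
        return -2;
    for (i = 0; i < blocks; i++) {
        if (!edid_block_checksum_ok(edid + i * EDID_BLOCK_SIZE))
            return (int)i;
    }
    return -1;
}
```

    Returning the index of the failing block, rather than a plain pass/fail, makes it possible to tell whether the base block or an extension block is the one with the bad checksum.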

  • Hi Sujith-san,

       Could you provide a patch to fix this?

  • Hi Jack,

    I do not have a patch for this right now. Once I have it, I will let you know.

    Thanks,
    Sujith