DRA821U: Ethernet fails to start normally during power-cycle stress test

Part Number: DRA821U

Hello TI Engineers,

    We are running a power-cycle test using our own dm.bin (MCU1_0), which we load by placing the dm.bin file on the SD card. Ethernet sometimes fails to start normally.

The test sequence is power on for 35 s, then power off for 5 s, so one cycle takes 40 s. The issue occurs within 1 hour (about 90 cycles).

Environment information:

    SDK version:

        - pdk_j7200_08_00_00_37

        - u-boot-2021.01+gitAUTOINC+53e79d0e89-g53e79d0e89

    Boot mode: SD card, SPL/U-Boot

    OS: ETAS RTA-OS

    Board: product board

  

When we removed dm.bin from the SD card, the issue disappeared. From the code and the tispl.bin structure, we infer that the default MCU1_0 firmware is loaded in that case.

What could be causing this issue?

I know that Resource Management runs on MCU1_0, but it is not clear to me how it works or how it affects the other cores.

I would also like to compare the code of the default MCU1_0 firmware with our own MCU1_0 firmware, but I can't find the source of the default firmware.

  • Hello Quanyao,

    Due to a regional holiday, half our team is out of office this week. Please expect a delay of 1-2 business days on responses.
    Apologies for the inconvenience and thank you for your patience.

    -Josue

  • Hi,

      We are running a power-cycle test using our own dm.bin (MCU1_0), which we load by placing the dm.bin file on the SD card.

    I believe this is not the correct way to use a custom DM binary.

    As you pointed out above, the DM is loaded from tispl.bin, so you need to make your binary part of the SPL. To do this, update UBOOT_DM with the path to your custom binary in the makefile of the Linux SDK and rebuild u-boot.
    After building u-boot, copy "tiboot3.bin" from the r5 folder and "u-boot.img" and "tispl.bin" from the a72 folder under u-boot_build to the boot partition of the SD card.

    I know that Resource Management runs on MCU1_0, but it is not clear to me how it works or how it affects the other cores.

    Yes, Resource Management is handled by MCU1_0 through the Sciserver integration.
    All other cores integrate the Sciclient module and request resource allocation from the server.
    If Sciserver is not running on MCU1_0, requests from the other cores that need resources will not succeed, and their applications will stop.

    I would also like to compare the code of the default MCU1_0 firmware with our own MCU1_0 firmware, but I can't find the source of the default firmware.

    You can find the default MCU1_0 firmware used with Linux in the PDK component of the RTOS SDK.
    Path: <RTOS-SDK>/pdk_j7200_xxx/packages/ti/drv/ipc/examples/

    When we removed dm.bin from the SD card, the issue disappeared. From the code and the tispl.bin structure, we infer that the default MCU1_0 firmware is loaded in that case.

    Based on the above statement, the issue could be due either to the process you followed for the custom DM or to the DM firmware itself.

    Best Regards.
    Sudheer

  • Hi, 

    Thank you for your response. The multicore startup process is now clearer to me.

    We compared our own code with the example code and also followed the steps in Application Report SPRACY6, but it is still not working.

    PS: the default DM firmware is loaded into SRAM on the MCU R5F, but our own DM firmware is loaded into DDR. Will this difference affect the program?

  • Hi,

     PS: the default DM firmware is loaded into SRAM on the MCU R5F, but our own DM firmware is loaded into DDR. Will this difference affect the program?

    Only the startup code, boot code, and reset vectors are in the internal memory of MCU1_0; the code and data are in DDR space even in the default DM firmware.
    If you integrate Sciserver and Sciclient, use the DDR region reserved for MCU1_0, and follow the steps mentioned above for making the binary part of the SPL, then it should work fine.

    You could take the default DM firmware as a reference, add your code on top of it, and check again.

    Also, if possible, can you please share which application you want to load on MCU1_0 as your DM (for example, an MCAL example or a PDK RTOS example)?

    Best Regards,
    Sudheer

  • Hi Sudheer,

    We added some functional safety functions to MCU1_0, using Classic AUTOSAR. For our test, however, we removed all the user functions and left only the SCI function and the AUTOSAR framework.

    We have done a lot of work on this over the past few days, but the issue is still there.

    I tried to print out all the SCI commands, but once I add UART_Printf, a message of type TISCI_MSG_RM_IRQ_SET (0x1000) fails on every power-up and the network is unreachable.

     

    When I remove the UART_Printf function, the program is stuck at tiipc-mgr for exactly 10 minutes, and after that it returns to normal status.

     

  • Hi,

    If a Sciserver call fails, it means Sciserver was not able to allocate the resources you are trying to allocate.
    One possible reason is that another requester has already claimed the same resource.

    Can you please share details about your changes to the DM binary so we can better understand the issue? If possible, please also share your DM firmware for review.

    Best Regards,
    Sudheer

  • Hi Sudheer,

    Today I ran several tests around UART_Printf:

    1) Put UART_Printf before Sciserver_processtask(): the SCI server fails 100% of the time.

    2) Put UART_Printf after Sciserver_processtask(): the SCI server fails about 50% of the time.

    3) Remove all UART_Printf calls, so that MCU1_0 prints nothing: it failed within 1 hour, with one SCI command failing in the low runnable.

    4) Comment out the Uart_Init code: the A72 got stuck at "Starting tiipc-mgr" within half an hour (40 seconds per test cycle).

    As this issue occurs only occasionally, it seems related to some kind of collision, so I stopped using UART_Printf, since it may affect the code, but that did not help.

    As for the details, we run 5 CAN FD channels and 1 IPC channel on MCU1_0.

    Here is the task code of SciServer. Note that for the high task, OsShell_CntrSciServerHighOsTask_Proc only increments a counter.

    SciServerTrigger_UserHi is an event. The low task works similarly.

    The SciServer functions are in the attached file:

    SciServerRtaos.zip

          

  • Hi,

    As per Sudheer's response, one theory that fits the data gathered so far is that the A72 initialization S/W is running in parallel with, or in an undesired sequence relative to, the other cores in the system while those cores are still initializing. The issue happens only sporadically because it is timing dependent. Adding or removing prints affects the timing and therefore the reproducibility.

    Some general thoughts:

    1. Isolate the error on the A72 side: which API call / line of code is failing?
      1. If it is always the first A72 call using SCI that fails, then in all likelihood the SCI server on the MCU R5 is not ready.
      2. Consider adding a poll that reads the SCI version number, with a timeout.
      3. If it is an SCI call that is failing but not the first one, please provide full details on the code location of the failing SCI call.
    2. For test purposes, add delays to the A72 S/W initialization.
      1. If a delay is added to user.sh, does the issue go away?
      2. Adding the delay in different locations can help isolate the failure point.
      3. What does your user.sh look like? Make sure the 'waitfor' line is present:
          echo "Starting tisci-mgr.."
          tisci-mgr
          waitfor /dev/tisci 2

    3. Remove all debug prints from the MCU R5 / Main Domain R5 / A72.
      1. If this is a timing issue, then any additional print will impact the timing. Stay as close as possible to the production S/W, or you may end up debugging issues that are not present once the prints are removed.
      2. Consider writing to a known memory location, or toggling a GPIO, to track how far boot has progressed.
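
    Suggestions 1.2 and 3.2 can be combined. Below is a minimal, host-runnable sketch, not code from the PDK: it polls a readiness probe with a timeout and records progress in a marker word instead of printing. All names (wait_for_sci_ready, mark_stage, the STAGE_* values, fake_probe) are hypothetical. On target, the probe would be your own wrapper around an SCI version query, and the marker would point at a reserved, non-cached memory word that can be inspected after a failed cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical progress codes; any distinct values would do. */
#define STAGE_POLL_START  0xB0070001u
#define STAGE_SCI_READY   0xB0070002u
#define STAGE_SCI_TIMEOUT 0xB00700FFu

/* Host stand-in for the marker word. On target (assumption), point
 * boot_marker at a reserved, non-cached address instead. */
static volatile uint32_t boot_marker_storage;
static volatile uint32_t *const boot_marker = &boot_marker_storage;

/* Record progress with a single store: far cheaper than a UART print,
 * so it disturbs boot timing much less. */
static void mark_stage(uint32_t stage) { *boot_marker = stage; }

uint32_t read_stage(void) { return *boot_marker; }

/* probe returns 0 once the server has answered, non-zero otherwise. */
typedef int (*sci_probe_fn)(void *ctx);
/* delay_ms blocks for the given time; may be NULL in host tests. */
typedef void (*delay_ms_fn)(uint32_t ms);

/* Poll probe every period_ms until it succeeds or timeout_ms elapses.
 * Returns 0 on success, -1 on timeout. */
int wait_for_sci_ready(sci_probe_fn probe, void *ctx, delay_ms_fn delay_ms,
                       uint32_t timeout_ms, uint32_t period_ms)
{
    uint32_t waited = 0;

    assert(probe != NULL && period_ms > 0u);
    mark_stage(STAGE_POLL_START);

    for (;;) {
        if (probe(ctx) == 0) {
            mark_stage(STAGE_SCI_READY);
            return 0;
        }
        if (waited >= timeout_ms) {
            mark_stage(STAGE_SCI_TIMEOUT);
            return -1;
        }
        if (delay_ms != NULL) {
            delay_ms(period_ms);
        }
        /* Clamp instead of wrapping when close to the timeout. */
        if (timeout_ms - waited < period_ms) {
            waited = timeout_ms;
        } else {
            waited += period_ms;
        }
    }
}

/* Host-side fake probe: ctx points to an int holding the number of
 * calls after which the "server" starts answering. */
int fake_probe(void *ctx)
{
    int *calls_left = (int *)ctx;
    return (--(*calls_left) <= 0) ? 0 : -1;
}
```

    On target, you would replace fake_probe with a function that issues the version request and returns 0 on a valid response, and pass the delay function of your OS. A timeout marker left behind after a failed cycle would then point to the SCI server not being ready, without any print having changed the timing.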

    Regards,

    kb

  • Thank you KB,

    We will try these tomorrow. Our user.sh does have a waitfor command. I don't know how to add a delay in user.sh; could you show me an example?

           

  • The 'waitfor' utility can also be used as a delay.

    For example, 'waitfor /asdf 1' would wait up to 1 second for /asdf to appear, then unblock, and boot would continue.

    There are other QNX utilities available, but their availability depends on the content of your filesystem under test.

    Regards,

    kb