DRA821: Executing "ifconfig" gets stuck on the QNX side

Part Number: DRA821

Hi expert,

The customer is running RTOS + QNX (SDK 8.0) and finds that QNX gets stuck, which causes the network to behave abnormally on the QNX side. When the issue is reproduced, the following observations are worth noting:

    1. MCU2_0 can send and receive IPC messages with QNX.
    2. MCU2_0 prints link-down and link-up messages when the network cable is unplugged and plugged in.
    3. Starting a new process on QNX fails if the process is related to "QNX devctrl".
    4. The ifconfig command gets stuck on QNX.
    5. We also dumped all register values with the CPSW5G working normally and abnormally: 2311.dump_821_cpsw_reg_error.txt, 4721.dump_821_cpsw_reg_normal.txt
  • The customer will share the driver code below and the steps to reproduce.

    The directories to include are:

    psdkqa/qnx/devnp

    remote_device/client-rtos/framework

    ethfw/ethremotecfg/client

  • 3125.code.zip: see the attachment for the code. The steps to reproduce are as follows:
    We use a script to launch TI's ipc_test program.
    We discovered by accident that this operation reproduces the problem reliably.

    The script is as follows:

    boot_is_a=`boot -s`
    while [ "${boot_is_c}" != "c" ]
    do
    ipc_test &
    sleep 1
    done

  • MCU2_0 can send and receive IPC messages with QNX

    Regarding the statement above about sending and receiving IPC messages with QNX: can you tell us which firmware image is loaded? Is it the ipc_echo_test firmware image?

    MCU2_0 prints link-down and link-up messages when the network cable is unplugged and plugged in

    Regarding this statement: what is running on the MCU2_0 core? Is the EthFW image loaded? Can you share the log that shows the link-down and link-up messages?

    Starting a new process on QNX fails if the process is related to "QNX devctrl"

    What is the new process referred to here? Please share the complete "slog2info -w" output.

    The ifconfig command gets stuck on QNX

    Can you share details on how you are starting the network driver?

    The script is as follows:

    boot_is_a=`boot -s`
    while [ "${boot_is_c}" != "c" ]
    do
    ipc_test &
    sleep 1
    done

    Which ipc_test is being used here? Is it the default ipc_test that TI provides as part of the Processor SDK QNX delivery? If so, I am not sure it is a valid test app to run unless you have loaded the ipc_echo_test firmware image that this test needs in order to communicate. Can you clarify?

    Thanks.

  •  random.5                  low     0  -----UNSYNC-----
                                              random.5                 high     0  -----UNSYNC-----
    Jan 01 00:00:00.010                      console.2                           0  -----ONLINE-----
                                             console.2                  out     0  -----UNSYNC-----
    Jan 01 00:00:00.017                       random.5                           0  -----ONLINE-----
                                              random.5              default     0  -----UNSYNC-----
    Jan 01 00:00:00.018                    random.5..0                           0  -----ONLINE-----
                                           random.5..0                 slog     0  -----UNSYNC-----
    Jan 01 00:00:00.035             devb_sdmmc_am65x.8                           0  -----ONLINE-----
                                    devb_sdmmc_am65x.8                 slog     0  -----UNSYNC-----
    Jan 01 00:00:00.038             devb_sdmmc_am65x.9                           0  -----ONLINE-----
                                    devb_sdmmc_am65x.9                 slog     0  -----UNSYNC-----
    Jan 01 00:00:00.041                  io_usb_otg.10                           0  -----ONLINE-----
                                         io_usb_otg.10                 slog     0  -----UNSYNC-----
    Jan 01 00:00:00.153                       iopkt.11                           0  -----ONLINE-----
                                              iopkt.11          main_buffer     0  -----UNSYNC-----
    Jan 01 00:00:00.216                tisci_mgr.73745                           0  -----ONLINE-----
                                       tisci_mgr.73745                 slog     0  -----UNSYNC-----
    Jan 01 00:00:00.224           shmemallocator.86034                           0  -----ONLINE-----
                                  shmemallocator.86034                 slog     0  -----UNSYNC-----
    Jan 01 00:00:00.262                tiipc_mgr.90131                           0  -----ONLINE-----
                                       tiipc_mgr.90131                 slog     0  -----UNSYNC-----
    Jan 01 00:00:00.312                 powermgr.69648                           0  -----ONLINE-----
                                        powermgr.69648                 slog     0  -----UNSYNC-----
    Jan 01 00:00:00.336               tiudma_mgr.94228                           0  -----ONLINE-----
    Jan 01 00:00:00.350              devb_umass.106520                           0  -----ONLINE-----
                                     devb_umass.106520                 slog     0  -----UNSYNC-----
    Jan 01 00:00:00.371                   iopkt.118810                           0  -----ONLINE-----
                                          iopkt.118810          main_buffer     0  -----UNSYNC-----
    Jan 01 00:00:00.450            io_pkt_v6_hc.118810                           0  -----ONLINE-----
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog*     0  cpsw_ioctl:324 Came here -->cmd-0xc0306936
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc09069e7
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc090693a
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc118694b
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc0306936
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc0306936
    Jan 01 00:01:13.583            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:13.584            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc0306936
    Jan 01 00:01:13.584            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:56.030            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc0306936
    Jan 01 00:01:56.030            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:56.030            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc09069e7
    Jan 01 00:01:56.030            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:56.030            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc090693a
    Jan 01 00:01:56.030            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:56.030            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc118694b
    Jan 01 00:01:56.030            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:56.031            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc0306936
    Jan 01 00:01:56.031            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:56.031            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc0306936
    Jan 01 00:01:56.031            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:01:56.031            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:324 Came here -->cmd-0xc0306936
    Jan 01 00:01:56.031            io_pkt_v6_hc.118810                 slog      0  cpsw_ioctl:581 <--
    Jan 01 00:05:04.061               tiudma_mgr.94228                 slog*    55  udma_io_devctl:270 UDMA---> tx.preferredChNum=-65535, tx.ChNum=19
    Jan 01 00:05:04.061               tiudma_mgr.94228                 slog     55  udma_io_devctl:376 UDMA---> freering.ringNum=125
    Jan 01 00:05:04.062               tiudma_mgr.94228                 slog     55  udma_io_devctl:376 UDMA---> freering.ringNum=126
    Jan 01 00:05:04.063               tiudma_mgr.94228                 slog     55  udma_io_devctl:283 UDMA---> rx.preferredChNum=-65535, rx.ChNum=18
    Jan 01 00:05:04.064               tiudma_mgr.94228                 slog     55  udma_io_devctl:376 UDMA---> freering.ringNum=127
    Jan 01 00:05:04.066               tiudma_mgr.94228                 slog     55  udma_io_devctl:376 UDMA---> freering.ringNum=128
    Jan 01 00:05:04.070               tiudma_mgr.94228                 slog     55  udma_io_devctl:428 UDMA---> event.globalEvent=19
    Jan 01 00:05:04.070               tiudma_mgr.94228                 slog     55  udma_io_devctl:441 UDMA---> vintrbit.vintrBitNum=1
    Jan 01 00:05:04.072               tiudma_mgr.94228                 slog     55  udma_io_devctl:428 UDMA---> event.globalEvent=20
    Jan 01 00:05:04.072               tiudma_mgr.94228                 slog     55  udma_io_devctl:441 UDMA---> vintrbit.vintrBitNum=2
    Jan 01 00:05:04.076               tiudma_mgr.94228                 slog     55  udma_io_devctl:428 UDMA---> event.globalEvent=21
    Jan 01 00:05:04.076               tiudma_mgr.94228                 slog     55  udma_io_devctl:441 UDMA---> vintrbit.vintrBitNum=3
    Jan 01 00:05:04.077               tiudma_mgr.94228                 slog     55  udma_io_devctl:428 UDMA---> event.globalEvent=22
    Jan 01 00:05:04.077               tiudma_mgr.94228                 slog     55  udma_io_devctl:441 UDMA---> vintrbit.vintrBitNum=4

    The attachment is the slog2info log, but slog2info does not provide much information about IPC.

    ipc_test is a test program for IPC communication provided by TI.
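
One quick way to scan a long slog2info capture for a stalled driver ioctl is to check that every cpsw_ioctl entry line ("Came here -->") has a matching exit line ("<--"). The sketch below is only a heuristic built on the two message formats visible in the log above; the function name `pending_cpsw_ioctls` is made up for illustration and is not part of any TI or QNX tool.

```shell
#!/bin/sh
# pending_cpsw_ioctls (hypothetical helper)
# Reads slog2info text on stdin and prints how many cpsw_ioctl entry
# lines ("Came here -->") have no matching exit line ("<--").
# Assumption: the driver logs exactly one entry and one exit line per
# ioctl, as in the excerpt posted in this thread.
pending_cpsw_ioctls() {
    awk '
        /cpsw_ioctl:[0-9]+ Came here -->/ { pending++ }
        /cpsw_ioctl:[0-9]+ <--/           { pending-- }
        END { print pending + 0 }
    '
}

# Usage sketch:
#   slog2info | pending_cpsw_ioctls
```

In the excerpt above every entry line is followed by an exit line, so this check would report 0 for it. A non-zero count in a later capture would suggest that an ioctl entered the driver and never returned.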

  • By the way, all of the tests were run with the EthFW firmware loaded on MCU2_0.

  • Can you explain why the ipc_test program is run while the MCU2_0 core is running the EthFW firmware? Please note that ipc_test needs to be run when MCU2_0 and the other cores are loaded with the ipc_echo_test firmware images. So I think your test scenario has a fundamental flaw, unless you can explain otherwise. Also, please answer the other questions posted above to help us understand the reported issue.

    Thanks.

  • Hi Praveen,
    The customer modified ipc_test themselves in order to communicate with MCU2_0. MCU2_0 runs the EthFW firmware, which includes an IPC task that receives and sends IPC messages. This is a requirement of the customer's project. The other points that needed clarification are answered below:

    For the above statement, for send & receive IPC message with QNX, can you tell us the what firmware image being loaded? Is it the ipc_echo_test firmware image?

    MCU2_0 is loaded with the EthFW firmware.

    For this statement, what is running on MCU2_0 core ? Is the EthFW image loaded? Can you share this log which show the link down & link up message?

    Yes, all of these tests were run with the EthFW firmware loaded. The log will be shared later.

    What is the new process referred here? Please share the complete "slog2info -w".

    The log was attached earlier.

    Can you share details on how you are starting the network driver?

    Could you explain what "starting the network driver" means? In my understanding, MCU2_0 loads EthFW, which configures the CPSW5G, and then the network driver should be started when QNX boots normally.

    What is the ipc_test being used here. Is it the default ipc_test that is provided by TI as part of the Processor SDK QNX delivery. if yes, I am not sure if it is a valid test app to run unless you have loaded the ipc_echo_test firmware image need for this test to communicate.

    The ipc_test is the customer's own executable. Its only purpose is to communicate with MCU2_0, where a task is created to receive and send IPC messages.

  • Hi Kangija,

    The ipc_test is customer modified by theirselve in order to communication with MCU2_0.

    The regression may have been introduced by this change, so do we know whether their modification was done properly?

    The log has been attached before

    The attached log is not sufficient. We suggest sharing the full log from OS start, including the network driver initialization.

    Could you help explain what's mean start the network driver? In my understand, the MCU2_0 loaded ETHFW and it will configure CPSW5G and then the network driver should be stated when QNX boot normally. 

    Yes. On the A72 side, when QNX boots normally, we start the CPSW5G virtual thin client DEVNP driver that interacts with EthFW. This is the network driver being referred to here. We need details on the parameters passed when starting the driver, and the logs that are generated from the start.

    To proceed further, we suggest that you reproduce this problem on a TI EVM and share the steps to reproduce with us. Alternatively, we can have a meeting to discuss the customer setup, as we are unable to understand their setup clearly.

    Thanks

  • 8463.log.zip, ipc_code.zip

    Hi all, the problem can be reproduced without starting cpsw. See the attachments for the log and code.

  • Hi Kangija,

    Thanks for this feedback. This implies that the stall is not due to the network driver (the devnp cpsw5g virtual driver); instead, it is more likely related to how the IPC echo test support is integrated into EthFW.

    We will have a look at the shared code and see if we can provide some feedback.

    Thanks.


  • Hi Kewei,
    Some questions from our R&D team. Please help check.

    I looked at the logs that were shared and have some questions:

    1. The qnx_log.txt and the slog.txt and the r5f_log.txt seem to line up with the ipc_tests that were run. Seems like the test was run 16 times. Was the script aborted after that? It is not clear from the logs. Was the test aborted after you started seeing the rx_queue full errors (starting line 261 in slog.txt)?
    2. From the slog.txt I see that the cpsw5g driver was launched before starting the test. Was that intended? The last comment says that the issue is reproducible without launching the driver.

    I will look further into the shared code. There seems to be an IPC diag service added on the QNX side. Is that used? I don't see it in the slog.txt log.

  • 1. We manually exited the script after the problem reoccurred.
    2. In the uploaded log, CPSW is started normally.
    3. IPC_Diag is used for our diagnostics and was not activated during testing.

  • Hi Kewei,
    Can you get the following information from your board?

    slog2info > /tmp/log.txt
    pidin syspage=asinfo 

    and the content of their <device>-evm-ti.build file, specifically the line that starts with [+keeplinked] startup-dra821-evm
    It could be different for the board that has the issue, but I am looking for the contents of the build file that builds the QNX IFS.
    This is a screenshot of the J721S2 build file for our EVM. Their build file should have a line similar to line number 22.
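
A minimal sketch of how the requested build-file line could be pulled out together with its line number. The function name `startup_line` is hypothetical, and the build file path is whatever the customer's BSP uses; only the `[+keeplinked] startup-` prefix comes from this thread.

```shell
#!/bin/sh
# startup_line BUILD_FILE (hypothetical helper)
# Prints, with line numbers, every line of the given IFS build file that
# contains the "[+keeplinked] startup-" prefix mentioned above.
startup_line() {
    # -F treats the pattern as a fixed string, so "[" and "+" are literal.
    grep -n -F '[+keeplinked] startup-' "$1"
}

# Usage sketch (the file name follows the <device>-evm-ti.build pattern):
#   startup_line <device>-evm-ti.build
```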

  • Hi,

    Thank you for sharing the info.

    I reviewed the files that you shared and here is what I could gather.

    Assumptions:
    1. The Ethernet firmware is the only remote core firmware that is loaded.
    2. Of the Ethernet firmware binaries, only the Ethernet server use case is used.
    3. The SBL boot flow is used. (The U-Boot based boot flow uses some DDR for DMSC starting at 0xA0000000. This is already taken care of in their board file, but I mention it here for clarity.)
    4. Based on the build file, the carveout is 96MB of DDR at 0xA0000000 (up to 0xA6000000). NOTE: the TI reference build file has 0x60000000 (1536MB, since our reference has more carveout space).
    5. The IPC echo test sample for MCU2_0 is integrated into the Ethernet firmware package.

    Next steps:

    1. If the shared memory allocator is not needed, it can be safely removed. In the build file, the carveout does not point to 0xbc000000, so I assume the shared memory allocator is not useful. Please clarify or remove it accordingly.
    2. Please confirm that you followed https://software-dl.ti.com/jacinto7/esd/processor-sdk-qnx-j7200/08_06_00_07/exports/docs/qnx_sdk_components_j7200.html#ti-modifications-to-the-bsp to arrive at the 96MB carveout starting at 0xA0000000.
    3. Please confirm that other than DM and remote core MCU2_0 , there are no other cores loaded in your setup. If they are, then additional carveouts might have to be setup.
    4. For identifying the carveout size and location, the linker command file or the generated .map file would have to be used. For the ethernet firmware, the generated .map file should be in the location, 
      <PSDK_RTOS>/ethfw/out/J721E/R5Ft/FREERTOS/release/app_remoteswitchcfg_server_ccs.xer5f.map . For your integrated firmware, please identify the corresponding map file.
    5. Please share the map file in this ticket for review.
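
The carveout figures quoted above can be cross-checked with plain shell arithmetic. This is only an illustrative sketch; the helper names `carveout_end` and `mb_of` are made up here and are not part of any TI or QNX tool.

```shell
#!/bin/sh
# carveout_end BASE SIZE_MB (hypothetical helper)
# Prints the first address past a carveout that starts at BASE and is
# SIZE_MB mebibytes long.
carveout_end() {
    base=$(($1))
    size_mb=$2
    printf '0x%X\n' $((base + size_mb * 1024 * 1024))
}

# mb_of SIZE_BYTES (hypothetical helper)
# Converts a byte count (hex or decimal) to mebibytes.
mb_of() {
    echo $(($1 / 1024 / 1024))
}

# Usage sketch:
#   carveout_end 0xA0000000 96    # prints 0xA6000000
#   mb_of 0x60000000              # prints 1536
```

Both results match the numbers in the assumptions: 96MB starting at 0xA0000000 ends at 0xA6000000, and 0x60000000 bytes is 1536MB.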

  • 1. I tried removing the shared memory allocator and there were no exceptions or errors. It seems that it is not used.

    2. We changed the memory range to 0xA0000000 to 0xA6000000, and no exceptions were reported.

    3. We also use MCU1_0.

    4. Please refer to the attachment for the map file: app_remoteswitchcfg_server_ccs.xer5f.zip

  • Thank you for sharing this info. I did look at the map file and that looks reasonable. Can you also try increasing the carveout to 512M starting at 0xa0000000 ?

    BR
    Subbu

  • Hello,

    It seems the issue lies in the way the script is run, and specifically in the sample ipc_test code running on the A72 side. The ipc_testsetup.c file in the examples/ipc/ipc_test_qnx/src folder has a while (1) at the end of the test. The way the script is written, it launches multiple instances of the ipc_test sample. After sending and receiving the echo messages, each instance hits the while loop and just sits there. This starts hogging the CPU, and once enough of these tests are running in the background, the A72 runs out of CPU cycles and none of the processes get to run (hence a non-functional cpsw driver).

    I am attaching a cooked-up version of the ipc_testsetup.c file that exits gracefully once the test succeeds. Please note that this version, like the initial version, is only sample test code provided for reference.

    I have attached a modified version of the script that might be handy as well.

    Also attached is the log on J7200 board with this test case run 1500+ times on top of the 8.0 release.

    Please let me know in case you need any more information.

    Thank you and regards

    Subbu

    ipc_echo.sh

    /cfs-file/__key/communityserver-discussions-components-files/791/ipc_5F00_echo.sh

    ipc_testsetup.c

    /cfs-file/__key/communityserver-discussions-components-files/791/ipc_5F00_testsetup.c

    Sample run log on 8.0 release

    /cfs-file/__key/communityserver-discussions-components-files/791/minicom_2D00_ipc_2D00_test_2D00_log.cap
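
For contrast with the original reproduction script, here is a minimal sketch of a bounded, sequential runner. It is not the attached ipc_echo.sh; the function name `run_serial` is made up for illustration, and it assumes the test binary exits once the echo test completes, as the modified ipc_testsetup.c is described as doing. Note that the original loop as posted tests `boot_is_c`, which the script never assigns, and starts each `ipc_test` in the background, so it never terminates and keeps adding processes.

```shell
#!/bin/sh
# run_serial RUNS COMMAND [ARGS...] (hypothetical helper)
# Runs COMMAND in the foreground RUNS times, one instance at a time, and
# stops at the first failure. Because each run must exit before the next
# one starts, instances cannot pile up in the background.
run_serial() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        "$@" || { echo "run $i failed" >&2; return 1; }
        i=$((i + 1))
    done
    echo "completed $runs runs"
}

# Usage sketch (assumes ipc_test exits after a successful echo test):
#   run_serial 1500 ipc_test
```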

  • Hi All,

    Here are the latest patches: Patch_for_ethfw.zip, ipc_trace_logger.zip, Patch_for_psdkqa.zip

    -1-  Patch_for_psdkqa.zip

    Note that I had earlier mentioned providing a patch that would dump more logs, but after careful analysis, it is too much work at this point and does not add any value. Instead, I am providing a patch to the DEVNP driver that disables the call to “EnetIf_CheckPortLinkUP()”. Note that this function is important for the functionality of the CPSW5G driver. It is also the function that sends a periodic ioctl to EthFW; if EthFW stalls or does not respond, this is the function that holds up the io-pkt network stack thread, causing the “ifconfig” command to stall. With this call disabled, there will be no stall whenever there is an EthFW-specific issue.

     

    The patch update is the j7_cpsw.c file and the change done is as below:

     

     

    -2- Patch_for_ethfw.zip

    This patch is for EthFW. It adds the tracebuf address to the EthFW server map file. With this update, the newly generated map file will contain an address that can be used to retrieve the trace messages from the A72 QNX side using the ipc_trace_logger utility.

    The patch update is the ipc_trace.c file and the change done is as below:

     

     

    To get the tracebuffer base address from the EthFW server image, run the readelf tool on the EthFW image as below:

     

    readelf -S app_remoteswitchcfg_server.xer5f | grep tracebuf

     

    In my image, for example, the tracebuf base address is 0x2936000. Note that this address may be different in the customer’s image.

     

     

    -3- ipc_trace_logger.zip

    This zip contains the ipc_trace_logger QNX binaries, which can be added directly to the customer’s QNX image.

     

    The tool needs to be called as shown below. Once it is called, the EthFW traces are redirected to slogger. The tracebuffer_base value needs to be obtained from the map file of the EthFW server image.

     

    slog2info -w &
    ipc_trace_logger tracebuffer_base=0xXXXXXXXX,tracebuffer_size=0x80000,print_to_slog=1,wait_mode=1 &
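
The two steps above (read the tracebuf address, then pass it as tracebuffer_base) can be chained. The sketch below is an illustration only: the helper names `tracebuf_addr` and `trace_logger_cmd` are made up, and the parsing assumes the usual 32-bit `readelf -S` column layout in which the address is the third column after the `[Nr]` index. The ipc_trace_logger options are copied from the command line above.

```shell
#!/bin/sh
# tracebuf_addr (hypothetical helper)
# Reads `readelf -S <image>` output on stdin and prints the address of the
# first section whose name contains "tracebuf", prefixed with 0x.
# Returns non-zero if no such section is found.
tracebuf_addr() {
    # Strip the "[Nr]" column first: for single-digit indexes it contains
    # a space ("[ 5]"), which would otherwise shift the awk field numbers.
    sed -n 's/^ *\[ *[0-9]*] *//p' |
        awk '$1 ~ /tracebuf/ { print "0x" $3; found = 1; exit }
             END { exit !found }'
}

# trace_logger_cmd ADDR (hypothetical helper)
# Prints the ipc_trace_logger command line from this post for a given base.
trace_logger_cmd() {
    printf 'ipc_trace_logger tracebuffer_base=%s,tracebuffer_size=0x80000,print_to_slog=1,wait_mode=1\n' "$1"
}

# Usage sketch (run readelf on the build host, then use the printed
# command on the QNX target):
#   addr=$(readelf -S app_remoteswitchcfg_server.xer5f | tracebuf_addr)
#   trace_logger_cmd "$addr"
```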

  • Hi Praveen,
    The issue can still be reproduced with the patch, but it occurs less often. We have dumped the register values and the slog output. Please review them and share your comments. Thanks.

    log(1).zip

  • Hi Praveen,
    The customer's system block diagram is as follows:

    When the issue is reproduced in the car, we checked the following:

    1. QNX can execute other programs, except those related to IPC with MCU2_0 and the ifconfig command, in particular processes related to “QNX devctrl”.

    2. MCU2_0 runs normally: it outputs the "link down" and "link up" log when the front camera device is plugged in.

    3. Other SoCs can ping the front camera device and the surround view device.

    4. We have dumped the QNX log and some register values; they are attached.

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/zm_5F00_log20230707.7z 

  • DAR821_log.zip

    Hi, please check the attached log.

  • 20230721.zip

    Hi, please check the attached log.

  • Hi Kewei,

    Were these logs taken with the latest UDMA HC change?
    -Thanks

  • Hi Michael, these logs do not include the latest UDMA HC change. We have been testing the latest UDMA HC change since yesterday.

  • slog_udma change20230727.txt

    Hi Michael, this log was taken with the latest UDMA HC change. Is there any information in it that helps find the root cause?

  • Hi Michael, this attachment is the latest log: del diag.zip

  • In addition to running "pidin mem" when ifconfig is stuck (while running as a background process) can you also run just "pidin" so we can see what states the threads are in. As for the memory usage, it looks like /bin/message_center is using a majority of the memory (1529M/2048M). Can you elaborate on what this program is doing? If it's not required, can you try running the networking tests without this program running?
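
One way to capture both outputs at the right moment is to start ifconfig in the background and collect the diagnostics only if it has not returned after a few seconds. The sketch below is an illustration under stated assumptions: the function name `still_running_after`, the 10-second threshold, and the output paths are all made up; only `ifconfig`, `pidin`, and `pidin mem` come from this thread.

```shell
#!/bin/sh
# still_running_after PID SECONDS (hypothetical helper)
# Returns 0 if the process PID is still alive after SECONDS seconds,
# 1 if it exited within that time. Polls once per second with kill -0,
# which only tests for existence and sends no signal. It relies on the
# shell reaping finished background jobs between polls.
still_running_after() {
    pid=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    kill -0 "$pid" 2>/dev/null
}

# Usage sketch (threshold and file names are assumptions):
#   ifconfig > /tmp/ifconfig.out 2>&1 &
#   if still_running_after $! 10; then
#       pidin     > /tmp/pidin.txt
#       pidin mem > /tmp/pidin_mem.txt
#   fi
```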

  • message_center is used to communicate with other chips.

    There is a new log from today. Can you check it? It is attached below.

    2030803log.zip

  • It looks as expected that the ifconfig command is stuck on the io-pkt process. The next steps would be to debug the io-pkt driver. Normally this is achieved by connecting to the process via the QNX Momentics IDE or by doing a post mortem debug with a core file. Has this been attempted yet? If not I would look into the QNX dumper utility to dump the io-pkt-v6-hc process. https://www.qnx.com/developers/docs/7.1/#com.qnx.doc.neutrino.utilities/topic/d/dumper.html

  • Hi Michael, there is a core file in this attachment.

    20230804.zip

  • Adding more debug files: io-pkt-v6-hc, devnp-cpsw5g.so, io-pkt-v6-hc.sym

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/cpsw5g_5F00_debug.7z

  • Hi Michael, here is another core file along with some debug files. They are from a different version: the devnp-cpsw5g.so differs from the one Kangjia gave you. del_diag.zip

  • Hi Kewei, I have tried debugging the core file, and Momentics reports "no debug information" available. Was the project built with the debug profile?

  • Hi Michael,

    The following information was provided by QNX.

    From the GDB analysis, everything on the QNX side appears normal. If there is a problem, it is likely in the TI driver, because almost all threads are stopped in the driver handler. Please capture core files several times for comparative analysis.

  • 20230810debug log.zip

    Hi Michael, this log is from the debug version.

  • Hi Kewei, is there a new version of the devnp-cpsw5g.so file? The previous version I have does not match the one from the io-pkt core files taken before and after ifconfig. All the other libraries seem to be loading correctly.

    From what I can see in the logs and the previous core file, however, it looks like IPC is working. I am looking into whether the UDMA bug fix mentioned in the 9.0 release notes could be related.

  • Michael,

    I have verified the following debug files on my side. They should work now.

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/cpsw_5F00_debug_5F00_version.7z

  • This issue has been resolved after aligning with the customer. The key change is shown below: