Linux/AM5728: DSP Load Failure

Part Number: AM5728
Other Parts Discussed in Thread: BQ40Z60

Tool/software: Linux

Background:

  • Processor SDK 03.03.00.04
  • DSP1 and DSP2 are dynamically started and stopped as the system is running using the bind/unbind nodes
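
For readers unfamiliar with the bind/unbind nodes, the start/stop operation can be sketched as below. The driver directory and the device name 40800000.dsp are the ones that appear in the logs and script elsewhere in this thread; the function names and the `RPROC_DRV_DIR` override are hypothetical and exist only so the sketch can be dry-run against a scratch directory.

```shell
# Sketch: stop/start a remote processor via the platform driver's
# bind/unbind nodes. RPROC_DRV_DIR is a hypothetical override for dry runs.
RPROC_DRV_DIR="${RPROC_DRV_DIR:-/sys/bus/platform/drivers/omap-rproc}"

# dsp_stop DEVICE: unbind the device from the driver (stops the DSP)
dsp_stop() {
    echo "$1" > "$RPROC_DRV_DIR/unbind"
}

# dsp_start DEVICE: bind the device again (firmware is loaded and booted)
dsp_start() {
    echo "$1" > "$RPROC_DRV_DIR/bind"
}

# Example on the target:
#   dsp_stop 40800000.dsp; sleep 1; dsp_start 40800000.dsp
```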

Problem:

DSP periodically fails to load.

More Info:

Remote processor start-up looks fine:

[ 417.333737] omap-rproc 40800000.dsp: assigned reserved memory node dsp1_cma@e0000000
[ 417.341684] remoteproc2: 40800000.dsp is available
[ 417.346584] remoteproc2: Note: remoteproc is still under development and considered experimental.
[ 417.355991] remoteproc2: THE BINARY FORMAT IS NOT YET FINALIZED, and backward compatibility isn't yet guaranteed.
[ 417.384513] remoteproc2: powering up 40800000.dsp
[ 417.389437] remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 15350528
[ 417.404130] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
[ 417.410032] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
[ 417.415976] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
[ 417.498260] remoteproc2: remote processor 40800000.dsp is now up
[ 417.504904] virtio_rpmsg_bus virtio0: rpmsg host is online
[ 417.510484] remoteproc2: registered virtio0 (type 7)

IPC daemon log reveals some issues (no connection to DSP1 in this case - processor 4 - what I tried to start):

[835.080166] Retrieving command...
[836.080428] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x304d0, 'DSP1:MsgQ:01')...
[836.080474] NameServer_getLocal: entry key: 'DSP1:MsgQ:01' not found!
[836.080493] NameServer_getRemote: no socket connection to processor 1
[836.080510] NameServer_getRemote: no socket connection to processor 2
[836.080526] NameServer_getRemote: Sending request via sock: 5
[836.080542] NameServer_getRemote: requesting from procId 3, MessageQ: DSP1:MsgQ:01
[836.080574] NameServer_getRemote: pending on waitFd: 4
[836.080663] NameServer: back from select()
[836.080685] NameServer: Listener got NameServer message from sock: 6!
[836.080709] listener_cb: recvfrom socket: fd: 6
[836.080726] Received ns msg: nbytes: 484, from addr: 61, from vproc: 3
[836.080742] NameServer Reply: instanceName: MessageQ, name: DSP1:MsgQ:01, value: 0x13a8e
[836.080767] NameServer: waiting for unblockFd: 2, and socks: maxfd: 6
[836.080802] NameServer_getRemote: value for MessageQ:DSP1:MsgQ:01 not found.
[836.080822] NameServer_getRemote: no socket connection to processor 4
[836.080837] value = 0x80
[836.080852] status = -5
[836.080867] DONE
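
In a test loop, a failure of this kind could be detected by scanning the daemon log for the "no socket connection" message. The sketch below assumes the log has been written to a file whose path is passed in (the location depends on how the daemon was started); the function name is hypothetical. Since the log above also reports processors 1 and 2 as unconnected, the check has to name the processor that was just started.

```shell
# lad_missing_socket LOGFILE PROCID
# Exit status 0 if the log reports "no socket connection" for PROCID.
# The trailing group keeps "processor 4" from matching "processor 40".
lad_missing_socket() {
    grep -Eq "NameServer_getRemote: no socket connection to processor $2([^0-9]|\$)" "$1"
}

# Example: lad_missing_socket /path/to/lad.log 4 && echo "DSP1 not connected"
```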

DSP emulator connection fails with this error:

"Error connecting to the target: (Error -1143 @ 0x0) Device core was hung. The debugger has forced the device to a ready state and recovered debug control, but your application's state is now corrupt. You should have limited access to memory and registers, but you may need to reset the device to debug further. Emulation package 6.0.407.6)"

It would appear that the remote processor framework thinks everything is OK but that in reality the DSP was not successfully loaded. Any ideas? Is this some sort of caching problem? The test was to start and stop the exact same firmware image over and over. Eventually this failure happens.

Thanks

  • Hi Gerard,

    I'm looking into this.
    Could you share if this is the AM57xx GP EVM or a custom board? In case this is a custom board, could you please share the .dts file & full bootlog (dmesg)?

    Best Regards,
    Yordan

  • Could you share if this is the AM57xx GP EVM or a custom board? In case this is a custom board, could you please share the .dts file & full bootlog (dmesg)? 

    This is a custom board based on the AM5728. I've attached the .dts file and full bootlog.

    Thank you!

    3731.boot_log.txt

    2806.am572x-custom-main.dts.txt

  • Hi,

    Can you remove the internal RTC from the device tree? Its entry is located in arch/arm/boot/dts/dra7.dtsi, and it seems to be interfering with the omap_hwmod driver...

    Also I am really confused by the following entry in your custom dts:
    &mmc2 {
    status = "okay";

    pinctrl-names = "default";
    pinctrl-0 = <
    &mmc1_pins_default
    &mmc2_pins_default
    &i2c2_pins_default
    &i2c5_pins_default
    &clkout3_pins_default
    &dcan1_pins_default
    &emu_pins_default
    &gpio1_pins_default
    &gpio2_pins_default
    &gpio3_pins_default
    &gpio4_pins_default
    &gpio5_pins_default
    &gpio6_pins_default
    &gpio7_pins_default
    &gpio8_pins_default
    &mdio_pins_default
    &nmin_pins_default
    &onreset_pins_default
    &rgmii0_pins_default
    &rtc_misc_pins_default
    &rgmii1_pins_default
    &rmii_pins_default
    &uart1_pins_default
    &uart3_pins_default
    &uart5_pins_default
    &usb1_pins_default
    &wakeup_pins_default
    &vin2a_iodelay_manual1_conf
    &rgmii0_iodelay_manual1_conf
    &mmc3_iodelay_manual1_conf
    &rmii_iodelay_manual1_conf
    >;

    Why are you enabling all those pins in mmc2 driver? Also what is the purpose of the &rtc_misc_pins_default(?) as I see these are mostly interrupts, rtc clk & jtag pins...

    Best Regards,
    Yordan
  • Yordan Kovachev said:

    Can you remove the internal RTC from the device tree? Its entry is located in arch/arm/boot/dts/dra7.dtsi, and it seems to be interfering with the omap_hwmod driver...


    Done; I've removed the internal RTC reference from dra7.dtsi.


    Yordan Kovachev said:

    Why are you enabling all those pins in mmc2 driver?

    We found that our custom board was still drawing current when powered off; different attempts were made to rectify this via the device tree and this hack ultimately worked. It's probably a separate topic of discussion but I'd be curious to hear your thoughts on that issue as well.

    Yordan Kovachev said:

    Also what is the purpose of the &rtc_misc_pins_default(?) as I see these are mostly interrupts, rtc clk & jtag pins...

    This is the result of the pinmux tool; we're not using the internal RTC but included it within our dts as we thought it was necessary in order to have a known state for the pins (reduce current, prevent damage, etc). Is this not true? Should we just remove this reference as well? 

    Thanks

  • To follow up, removing the internal RTC reference from the dra7.dtsi file had no effect on the DSP loading reliability.

    Looking forward to further feedback; this problem is a current roadblock to progress for my development effort. It's critical that we're able to reliably start and stop the DSPs with different firmware images.

    Thank you
  • Hi,

    I advised removing the RTC because of the first kernel warning in your bootlog:
    [ 0.108632] omap_hwmod: l3_main_2 using broken dt data from ocp
    [ 0.216626] omap_hwmod: rtcss: no dt node
    [ 0.216634] ------------[ cut here ]------------
    [ 0.216649] WARNING: CPU: 0 PID: 1 at /vobs/cra_os/Ec/System/src/open_source/linux/linux-4.4.41+gitAUTOINC+f9f6f0db2d-gf9f6f0db2d/arch/arm/mach-omap2/omap_hwmod.c:2523 _init+0x1f4/0x418()
    [ 0.216657] omap_hwmod: rtcss: doesn't have mpu register target base

    As you can see it messes with device hwmod, however the kernel recovers from this error.

    Later you have:
    [ 7.284958] mpu9250: module is from the staging directory, the quality is unknown, you have been warned.
    Loading module bq40z60_battery
    Loading module fpga_config
    Loading module hw_info
    ......
    And then the remoteproc/rpmsg crashes & fails to load your dsp binary, the reason is a custom error returned from L4_PER...

    After that I suspect the kernel enters quite a long loop of trying to load the binary and cannot get to user space (console), right? This is why, in my opinion, you should make sure you do NOT call the RTC module (unless it is crucial to your system setup). I'll dig a bit more and update this thread with any feedback I have.

    We found that our custom board was still drawing current when powered off; different attempts were made to rectify this via the device tree and this hack ultimately worked. It's probably a separate topic of discussion but I'd be curious to hear your thoughts on that issue as well.

    Regarding drawing current after power off, you should also revise your hardware design.

    This is the result of the pinmux tool; we're not using the internal RTC but included it within our dts as we thought it was necessary in order to have a known state for the pins (reduce current, prevent damage, etc). Is this not true? Should we just remove this reference as well?

    If unused, you should follow the Data Manual recommendations for tying unused pins (section 4.1.1 Unused Balls Connection Requirements), and you can remove these entries from the dts file.

    Best Regards,
    Yordan
  • Yordan Kovachev said:

    And then the remoteproc/rpmsg crashes & fails to load your dsp binary, the reason is a custom error returned from L4_PER...

    After that I suspect the kernel enters quite a long loop of trying to load the binary and cannot get to user space (console), right?

    No, everything is initially functional once the system boots up. Every time I load a DSP there is a single custom error - I had created a separate forum post about this some time ago with no conclusion reached as to the cause. Often I can boot up and load/unload a DSP 5 or more times before I get the failure described in this thread.

    Why would the remote processor framework report that the DSP has been successfully started yet the DSP isn't really running? It feels like some sort of caching issue. The initial load of the DSP image from eMMC to memory takes closer to ~6 seconds but subsequent loads only take ~200 milliseconds. I'm assuming that speed increase is due to the image being cached. Perhaps something is messing with the cache in between loads so we're left with an invalid DSP image in memory, causing the emulator to not even connect?
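
    One way to probe the caching hypothesis is to checksum the firmware file before each load and, optionally, drop the page cache so the next load reads from eMMC again. This is only a sketch: the function names and the `DROP_CACHES_NODE` override are hypothetical, and the check covers the file as read through the filesystem, not the copy placed in the DSP's reserved memory.

```shell
# fw_checksum FILE: print "CRC SIZE" for the image as read through the page cache
fw_checksum() {
    cksum < "$1"
}

# fw_unchanged FILE EXPECTED: exit status 0 if the image still matches EXPECTED
fw_unchanged() {
    [ "$(fw_checksum "$1")" = "$2" ]
}

# drop_page_cache: force later reads to come from storage (needs root).
# /proc/sys/vm/drop_caches is the standard kernel control; the override
# variable is only there for dry runs.
drop_page_cache() {
    sync
    echo 3 > "${DROP_CACHES_NODE:-/proc/sys/vm/drop_caches}"
}

# Example on the target:
#   ref=$(fw_checksum /lib/firmware/dra7-dsp1-fw.xe66)
#   ... before each bind ...
#   fw_unchanged /lib/firmware/dra7-dsp1-fw.xe66 "$ref" || echo "image changed"
```

    If the checksum never changes and failures persist even after dropping the cache, a stale cached image becomes a less likely explanation.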

  • Do you load/unload the same dsp image?
    I'll test this on my AM5728 GP EVM to try and recreate the behavior on your side.

    Best Regards,
    Yordan
  • Yes, I am loading/unloading the same dsp image.

    Thank you
  • Yordan,

    Have you been able to recreate the problem on your EVM?

    Thanks
  • Hi, Gerard,

    What's the interval between the bind and unbind? If you have not tried a delay between them, could you try adding a 15 s delay? That is: bind, 15 s delay, unbind, 15 s delay, bind, 15 s delay, and so on. We suspect things get out of sync if the commands are issued too close together.

    Rex
  • Currently I have a 1 second delay between the unbind and bind. I experimented with 7, 15, and 20 seconds with no luck.
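
    A delay experiment like this can be scripted roughly as follows. It is a sketch under assumptions: `cycle_dsp`, `check_dsp_alive` and `RPROC_DRV_DIR` are hypothetical names, and `check_dsp_alive` is a placeholder that always succeeds until it is replaced with a real test (for example an IPC round trip to the DSP).

```shell
# Sketch of an unbind/bind loop with a configurable delay.
RPROC_DRV_DIR="${RPROC_DRV_DIR:-/sys/bus/platform/drivers/omap-rproc}"

# Hypothetical hook: replace with a real liveness test for the DSP.
check_dsp_alive() { return 0; }

# cycle_dsp DEVICE DELAY_SECONDS COUNT
# Returns 0 if every iteration passes, 1 on the first failed liveness
# check (after printing the iteration number), 2 if a sysfs write fails.
cycle_dsp() {
    dev=$1; delay=$2; count=$3; i=1
    while [ "$i" -le "$count" ]; do
        echo "$dev" > "$RPROC_DRV_DIR/unbind" || return 2
        sleep "$delay"
        echo "$dev" > "$RPROC_DRV_DIR/bind" || return 2
        sleep "$delay"
        if ! check_dsp_alive "$dev"; then
            echo "load failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}

# Example on the target: cycle_dsp 40800000.dsp 7 100
```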

    A little more background to our system:

    • DSP talks to an FPGA when running; FPGA gets configured via GPMC and then data is transferred via PCIe
    • FPGA fires interrupts to the DSP using GPIOs
    • In these tests the FPGA is power cycled at the same time as the DSP

    Here is some new data after some testing with a script that continuously loads/unloads the DSP:

    • Test DSP build was generated for the AM5728EVM - the problem was not recreated (obviously no FPGA on the EVM)
    • Test DSP build was generated for our custom hardware - the problem was not recreated (FPGA was not used)
    • Production DSP + FPGA build loop test failed
    • Production DSP only (FPGA left disabled) build loop test was successful

    So, it seems that something with the DSP-FPGA interaction is to blame for why the subsequent DSP load fails. My best guess would be that peripherals that the DSP/FPGA were interacting with are interfering with the DSP around the power cycle. Are there best practices for this dynamic power cycling use case? For example, is it a requirement that the DSP disable all peripherals/interrupts prior to getting shut off? One might think that the lazy path - not disabling peripherals/interrupts - would be acceptable since we're power cycling the processor. Perhaps that is not a safe assumption.

    Thanks

  • Hi,

    Another thing that came to my mind. At which OPP are you running your AM57xx device?
    Can you try setting it to OPP_HIGH?

    Best Regards,
    Yordan
  • I'm assuming we are at whatever OPP is the default, as I have not changed it.

    I found a post where you linked processors.wiki.ti.com/.../Sitara_Linux_Training:_Power_Management but I am not seeing some of the sysfs nodes referenced to be able to determine the OPP/change the OPP.

    Gerard
  • Hi Gerard,

    By default the AM57xx GP EVM, running the unmodified SDK03.03.00.04 (kernel 4.4.41) is working at 1GHz, which is the lowest of the available frequencies in cpufreq:
    root@am57xx-evm:/sys/class/regulator/regulator.3# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
    1000000 1176000 1500000
    This corresponds to OPP_NOM, see dra7.dtsi:
    opp_nom@1000000000 {
    opp-hz = /bits/ 64 <1000000000>;
    opp-microvolt = <1060000 850000 1150000>;
    opp-supported-hw = <0xFF 0x01>;
    opp-suspend;
    };

    You should change the governor to userspace:
    root@am57xx-evm~# echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    And select the highest cpu frequency 1.5GHz:
    root@am57xx-evm:~# echo 1500000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed

    Another option is to modify the opp table in dra7.dtsi.

    You should have these sysfs entries in your kernel, if you have enabled the CPUFREQ driver, when building it.
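
    The steps above can be wrapped in a small script that first checks whether the cpufreq nodes exist at all. This is a sketch: the function names and the `CPUFREQ_DIR` override are hypothetical; the sysfs file names are the ones used in the commands above.

```shell
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

# cpufreq_available: exit status 0 if the cpufreq sysfs nodes are present
cpufreq_available() {
    [ -r "$CPUFREQ_DIR/scaling_available_frequencies" ]
}

# cpufreq_max_khz: print the highest frequency (kHz) the driver lists
cpufreq_max_khz() {
    tr ' ' '\n' < "$CPUFREQ_DIR/scaling_available_frequencies" |
        grep -E '^[0-9]+$' | sort -n | tail -n 1
}

# cpufreq_set_max: switch to the userspace governor and select the
# highest listed frequency
cpufreq_set_max() {
    cpufreq_available || { echo "cpufreq nodes not present" >&2; return 1; }
    echo userspace > "$CPUFREQ_DIR/scaling_governor"
    cpufreq_max_khz > "$CPUFREQ_DIR/scaling_setspeed"
}
```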

    Best Regards,
    Yordan
  • Also, I've managed to get the DSP load to fail on my GP EVM, but it is as Rex mentioned:
    if I use a short interval between bind and unbind, the system fails to load the DSP firmware. However, if I use the script below:
    #!/bin/sh
    cd /sys/bus/platform/drivers/omap-rproc/
    # Stop DSP1, then wait before touching the firmware link
    echo 40800000.dsp > unbind
    sleep 5
    # -f replaces the link if it already exists (plain ln -s would fail)
    ln -sf /lib/firmware/dra7-dsp1-fw.xe66.dspdce-fw /lib/firmware/dra7-dsp1_new.xe66
    # Start DSP1 and let it run for 15 s
    echo 40800000.dsp > bind
    sleep 15
    echo 40800000.dsp > unbind
    sleep 5
    ln -sf /lib/firmware/dra7-dsp1-fw.xe66.dspdce-fw /lib/firmware/dra7-dsp1-fw.xe66
    echo 40800000.dsp > bind
    sleep 15

    that is, with 20 s of sleep in each unbind/bind cycle, I have no problems loading and unloading the firmware.

    Best Regards,
    Yordan
  • Hi,

    Some additional information, based on the log you've shared in your first message.

    The error originates from the NameServer daemon (sources located here: git.ti.com/.../daemon), specifically from this section:

       /* Set Timeout to wait: */
       tv.tv_sec = 0;
       tv.tv_usec = NAMESERVER_GET_TIMEOUT;

       /* Create request message and send to remote: */
       clusterId = procId - MultiProc_getBaseIdOfCluster();
       sock = NameServer_module->comm[clusterId].sendSock;
       if (sock == INVALIDSOCKET) {
           LOG1("NameServer_getRemote: no socket connection to processor %d\n",
                procId);
           status = NameServer_E_RESOURCE;
           goto exit;
       }

    Best Regards, 
    Yordan

  • Thank you; I'll rebuild my kernel with the CPUFREQ driver as that must explain why I don't see all the nodes that the Wiki listed and you list above. I'll report back once I try at the other OPP setting.

    Thanks
  • The problem is that having that much delay between DSP start/stops is a show stopper for us. It needs to happen in much less than 20 seconds - ideally just a few seconds.

    Is there a root cause that can be addressed (kernel patch) to deal with the quick start/stop problem that even you have now recreated?

    Thanks
  • Gerard,

    If the long delay works reliably in your situation, you can start trimming it down to find the minimum delay for your hardware. Since Yordan cannot duplicate all of your custom hardware and external devices, it may be difficult for us to come up with an accurate figure; nothing is better than experimenting on your end.

    Once you confirm that the long delay will work for you, that will be a big data point for us both to consider in the debug process moving forward. Please let us know when that test is completed so we will know to move forward in the right direction together.

    Regards,
    RandyP
  • Yordan Kovachev said:
    Hi Gerard,

    You should have these sysfs entries in your kernel, if you have enabled the CPUFREQ driver, when building it. 

    I have the CPUFREQ driver enabled but am still not seeing the nodes. I do notice this happens at boot-up:

    [ 0.001718] /cpus/cpu@0 missing clock-frequency property
    [ 0.001733] /cpus/cpu@1 missing clock-frequency property

    Is this property something that should be added to dra7.dtsi and dra74x.dtsi for cpu 0 and cpu 1, respectively?

    Thanks

  • Hi Gerard,

    Do you have the related opp settings in your dts:
    In beagle-x15-common.dtsi, you should have the oppdm_mpu/dspeve/gpu/ivahd/core supplies:
    &oppdm_mpu {
    vdd-supply = <&smps12_reg>;
    };

    &oppdm_dspeve {
    vdd-supply = <&smps45_reg>;
    };

    &oppdm_gpu {
    vdd-supply = <&smps45_reg>;
    };

    &oppdm_ivahd {
    vdd-supply = <&smps45_reg>;
    };

    &oppdm_core {
    vdd-supply = <&smps6_reg>;
    };

    these should be present in
    &i2c1 {
    status = "okay";
    clock-frequency = <400000>;

    tps659038: tps659038@58 {
    ......

    Also you should have the opp table defined in dra7.dtsi:
    cpu0_opp_table: opp_table0 {

    which is called in dra74x.dtsi:
    cpu@1 {
    device_type = "cpu";
    compatible = "arm,cortex-a15";
    reg = <1>;
    operating-points-v2 = <&cpu0_opp_table>;
    };

    Do you have these in your dts files?
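
    Because the .dts sources and the device tree blob that was actually booted can differ, it may also help to check the live tree on the target. A minimal sketch, assuming the kernel exposes the tree under /proc/device-tree (the usual location); `DT_ROOT` and the function name are hypothetical:

```shell
DT_ROOT="${DT_ROOT:-/proc/device-tree}"

# cpu_has_opp_table N: exit status 0 if node cpu@N in the booted device
# tree carries an operating-points-v2 property
cpu_has_opp_table() {
    [ -e "$DT_ROOT/cpus/cpu@$1/operating-points-v2" ]
}

# Example on the target:
#   for n in 0 1; do
#       cpu_has_opp_table "$n" || echo "cpu@$n: no operating-points-v2"
#   done
```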

    Best Regards,
    Yordan
  • Yordan Kovachev said:
    Hi Gerard,

    Do you have these in your dts files?

    Yes; I have confirmed I have all of the properties you listed across the relevant dts files.

    Thanks,

    Gerard

  • Also, here are all the cpufreq-related kernel options I have enabled:
    CONFIG_CPU_FREQ=y
    CONFIG_CPU_FREQ_GOV_COMMON=y
    CONFIG_CPU_FREQ_STAT=y
    CONFIG_CPU_FREQ_STAT_DETAILS=y
    # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
    # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
    # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
    CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
    # CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
    CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
    CONFIG_CPU_FREQ_GOV_POWERSAVE=y
    CONFIG_CPU_FREQ_GOV_USERSPACE=y
    CONFIG_CPU_FREQ_GOV_ONDEMAND=y
    CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y

    #
    # CPU frequency scaling drivers
    #
    CONFIG_CPUFREQ_DT=y
    # CONFIG_ARM_BIG_LITTLE_CPUFREQ is not set
    # CONFIG_ARM_KIRKWOOD_CPUFREQ is not set
    CONFIG_ARM_OMAP2PLUS_CPUFREQ=y
    CONFIG_ARM_TI_CPUFREQ=y
  • Yordan Kovachev said:
    Also, I've managed to get the DSP load to fail on my GP EVM, but it is as Rex mentioned:
    if I use a short interval between bind and unbind, the system fails to load the DSP firmware. However, if I use the script below:
    #!/bin/sh
    cd /sys/bus/platform/drivers/omap-rproc/
    echo 40800000.dsp > unbind
    sleep 5
    ln -s /lib/firmware/dra7-dsp1-fw.xe66.dspdce-fw /lib/firmware/dra7-dsp1_new.xe66
    echo 40800000.dsp > bind
    sleep 15
    echo 40800000.dsp > unbind
    sleep 5
    ln -s /lib/firmware/dra7-dsp1-fw.xe66.dspdce-fw /lib/firmware/dra7-dsp1-fw.xe66
    echo 40800000.dsp > bind
    sleep 15

    I wanted to report back on this longer delay test: even with delays that match yours, I do eventually get a load failure.

    Section 5.3.3.4.2 of the AM572x TRM talks about the power down sequence for the C66x. There is reference to a "powerdown" message that the host (in our case the ARM) sends to the DSP. Is there any further detail on this expected procedure and/or any sample code that illustrates what the DSP should be doing upon reception of this "powerdown" message?

    Thanks,
    Gerard

  • Hi Gerard,

    Sorry for the late reply.

    Is there any further detail on this expected procedure and/or any sample code that illustrates what the DSP should be doing upon reception of this "powerdown" message?


    I am not aware of any public document describing this procedure in detail. As for sample code, you can have a look at the kernel remoteproc drivers:
    drivers/remoteproc/remoteproc_core.c --> rproc_shutdown()
    drivers/remoteproc/omap_remoteproc.c

    Those are the functions that load/unload the DSP firmware.

    As for the cpufreq nodes missing from your sysfs, I cannot find a valid explanation yet. I am working on this.

    Best Regards,
    Yordan
  • Yordan Kovachev said:


    I am not aware of any public document describing this procedure in detail. As for sample code, you can have a look at the kernel remoteproc drivers:
    drivers/remoteproc/remoteproc_core.c --> rproc_shutdown()
    drivers/remoteproc/omap_remoteproc.c

    Those are the functions that load/unload the dsp firmware. 

    I was more concerned with what the DSP needed to do. The TRM references a shutdown procedure implemented in SW on the DSP side. The assumption is that the intent of this DSP SW procedure is to leave things in a good state so that there aren't issues when you power up the next time.

    Thanks

  • Hi Gerard,

    I was able to find some documents about DSP power down. The CorePac integrated in the AM572x has its own user guide:
    www.ti.com/.../sprugw0c.pdf
    Check Chapter 12, Power-Down Controller, specifically Sections 12.2.5 (C66x CorePac Powerdown) and 12.2.6 (Miscellaneous Power-Down).
    It can also be useful to take a look at the IDLE instruction described in the C66x DSP CPU and Instruction Set Reference Guide (SPRUGH7) (www.ti.com/.../sprugh7.pdf); see Sections 3.8.12.5 (IDLE) and 4.158 (IDLE), and Figure H-2 (NOP and IDLE Instruction Format).

    The software procedure you implement should comply with the descriptions above.

    Best Regards,
    Yordan
  • Thanks for the information, Yordan.

    Here is what we are currently doing to shut down the DSP once it is up and running:

    1. Host (ARM) sends the DSP a custom shutdown message (via MessageQ)
    2. DSP calls the PCIe destructor (which essentially calls Pcie_close())
    3. DSP sends Host (ARM) an ACK message (via MessageQ)
    4. DSP sleeps for 1 millisecond to make sure EC gets ACK message
    5. General uninitialization
      1. Detach from internal message and timers
      2. Call MessageQ_delete()
      3. Call Ipc_detach()
      4. Delete timers
      5. Delete other classes
      6. Delete McSpi interface - calls SPI_close for the physical SPI interface
    6. Call the library function PMLIBCpuModePrepare(PMHAL_PRCM_MOD_DSP1,PMLIB_IDLE_CPU_MODE_OFF); 

      I believe this implements step 1 in the list provided in section 12.2.5 of the TMS320C66x DSP CorePac User Guide.  Since the DSP will be power cycled and reset by the ARM, step 2 was not specifically implemented since step 4 above disabled all interrupts. 

    7. Call the library function PMLIBCpuIdle(PMHAL_PRCM_PD_STATE_OFF);

      I believe this implements step 3 of 12.2.5.  I confirmed this by stepping through the code and idle would have been executed, but when I stepped to the idle command the DSP went into Reset and the debugger disconnected, which I suppose would be expected behavior.

    We still eventually fail to load the DSP, however. What can we do to improve this procedure?

    Thanks

  • Hi,

    Just wanted to let you know that we're looking into this.

    Best Regards,
    Yordan
  • Hi, Gerard,

    Are there any trace logs from when the DSP fails to load? They should be in /sys/kernel/debug/remoteproc/remoteproc#/trace0, where # is the core ID, starting from 0.
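
    To collect the traces from every remoteproc instance in one go, something like the sketch below could be used. It assumes debugfs is mounted at /sys/kernel/debug (if not, `mount -t debugfs none /sys/kernel/debug` normally does it); `RPROC_DEBUG_DIR` and the function name are hypothetical.

```shell
RPROC_DEBUG_DIR="${RPROC_DEBUG_DIR:-/sys/kernel/debug/remoteproc}"

# dump_rproc_traces: print trace0 of every remoteproc instance that has one,
# each preceded by a "== remoteprocN ==" header line
dump_rproc_traces() {
    for d in "$RPROC_DEBUG_DIR"/remoteproc*; do
        [ -r "$d/trace0" ] || continue
        echo "== ${d##*/} =="
        cat "$d/trace0"
    done
}

# Example on the target, right after a failed load:
#   dump_rproc_traces > /tmp/rproc-traces.txt
```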

    Rex
  • Hi, Gerard,

    We see two possibilities for the cause of the failure, and are not sure whether you have looked into them before:
    a) The DSP is writing to a GPIO register while, for some reason, the GPIO peripheral is powered down. Could you confirm whether the GPIO peripheral is powering up and down as well?
    b) The DSP is writing to the FPGA through PCIe while the FPGA's PCIe link is disconnected. This looks like the most probable scenario. To avoid it, can you ensure that PCIe accesses from the DSP have completely stopped before powering down both the DSP and the FPGA? (This also means the DSP should not respond to any GPIO interrupts that could lead to PCIe transactions.)

    Rex
  • Hello Rex,

    I am working the DSP side of the issue Gerard has been posting about.  We wanted to get some additional clarity on the questions in your replies.

    As for the trace logging, are there specific things I should add to the .cfg file in the DSP to get the logging you are looking for? Since we are MIPS-challenged in the DSP for some of our applications, I probably have only very minimal logging enabled.

    As for the question about GPIO peripherals: did you mean powering down the GPIO peripheral in the AM5728 SoC, or the hardware on our custom board that the GPIOs are connected to?

    As for the question about PCIe transactions, we believe that since the FPGA has never been activated with mission code and since the DSP is at the IDLE instruction, the PCIe should also be idle. However, I do notice that although we call the Pcie_close function, it likely does not really turn off the PCIe hardware in the AM5728. Should we be taking additional hardware/register actions to be sure the PCIe is really idle in the AM5728?

    Thanks for helping,

    Chris

  • Hi, Chris,

    Do you see anything shown in the current trace and Linux dmesg? Whatever the DSP logs shows up in the trace file on the ARM side. For debugging purposes, would it be possible to enable some of the DSP logs?

    I recall that GPIOs are used by the FPGA as interrupts to the DSP. I just want to be sure that any interrupts that come in won't be handled.

    One of your successful test cases left the FPGA disabled. Does a disabled FPGA mean it is powered off? Would it make a difference if it were powered up but without PCIe configured/enumerated?

    It doesn't seem that PCIe on the AM572x can be powered off, apart from the PHY power control. Since we can't reproduce the issue, we can't make more suggestions toward a root cause.

    Rex
  • Rex,

    When the system is up and running, I see this in the trace0 log:

    [      0.000] 20 Resource entries at 0x80000000

    Gerard will have to comment on what additional things are present after a failure.  Yes, for these tests we should be able to enable any logging you would recommend.

    Yes, a GPIO is used as an interrupt from the FPGA to the DSP.  However, we do not believe the FPGA should be generating any interrupts and as part of the DSP shutdown procedure I call HwiP_disable();.  For good measure, I have also just added GPIO_disableInt() to the GPIO driver destructor which is called on power down.  Previously the GPIO driver destructor was empty as no shutdown API is provided in the driver code.

    Gerard will need to comment on the meaning of "FPGA Disabled".

    We are working on an EVM-based setup to help reproduce the issue, but our code needs to be ported back to the EVM and stripped of sensitive parts, so it will be a few more days before we can send anything.

    Thanks,

    Chris

  • Hi, Chris,

    For the GPIO interrupts, we just want to be sure they are covered; we don't mean that they are one of the causes. Will the EVM-based setup still require the FPGA to reproduce the issue? We need a way to reproduce it at TI in order to debug it. Would a similar setup work, with any PCIe end device connected to the AM572x EVM and your DSP code enumerating the PCIe? I assume it is the DSP that initializes the PCIe because it is the consumer of the FPGA data, or vice versa. If so, I can disable the PCIe enumeration in the kernel code to see if we can reproduce it on the TI side.

    Rex
  • Rex Chang said:

    Do you see anything shown in the current trace and Linux dmesg? 

    This is from my original post on this thread; everything looks normal in dmesg - I've included the remoteproc-related output below. The IPC daemon hints at a problem though when it prints out that it doesn't have a connection to the DSP.

    Remote processor start-up looks fine:

    [ 417.333737] omap-rproc 40800000.dsp: assigned reserved memory node dsp1_cma@e0000000
    [ 417.341684] remoteproc2: 40800000.dsp is available
    [ 417.346584] remoteproc2: Note: remoteproc is still under development and considered experimental.
    [ 417.355991] remoteproc2: THE BINARY FORMAT IS NOT YET FINALIZED, and backward compatibility isn't yet guaranteed.
    [ 417.384513] remoteproc2: powering up 40800000.dsp
    [ 417.389437] remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 15350528
    [ 417.404130] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
    [ 417.410032] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
    [ 417.415976] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
    [ 417.498260] remoteproc2: remote processor 40800000.dsp is now up
    [ 417.504904] virtio_rpmsg_bus virtio0: rpmsg host is online
    [ 417.510484] remoteproc2: registered virtio0 (type 7)

    IPC daemon log reveals some issues (no connection to DSP1 in this case - processor 4 - what I tried to start):

    [835.080166] Retrieving command...
    [836.080428] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x304d0, 'DSP1:MsgQ:01')...
    [836.080474] NameServer_getLocal: entry key: 'DSP1:MsgQ:01' not found!
    [836.080493] NameServer_getRemote: no socket connection to processor 1
    [836.080510] NameServer_getRemote: no socket connection to processor 2
    [836.080526] NameServer_getRemote: Sending request via sock: 5
    [836.080542] NameServer_getRemote: requesting from procId 3, MessageQ: DSP1:MsgQ:01
    [836.080574] NameServer_getRemote: pending on waitFd: 4
    [836.080663] NameServer: back from select()
    [836.080685] NameServer: Listener got NameServer message from sock: 6!
    [836.080709] listener_cb: recvfrom socket: fd: 6
    [836.080726] Received ns msg: nbytes: 484, from addr: 61, from vproc: 3
    [836.080742] NameServer Reply: instanceName: MessageQ, name: DSP1:MsgQ:01, value: 0x13a8e
    [836.080767] NameServer: waiting for unblockFd: 2, and socks: maxfd: 6
    [836.080802] NameServer_getRemote: value for MessageQ:DSP1:MsgQ:01 not found.
    [836.080822] NameServer_getRemote: no socket connection to processor 4
    [836.080837] value = 0x80
    [836.080852] status = -5
    [836.080867] DONE

    Rex Chang said:


    One of your successful test cases left the FPGA disabled. Does a disabled FPGA mean it is powered off? Would it make a difference if it were powered up but without PCIe configured/enumerated?

    Yes, the FPGA would have been powered off. We did a test where the FPGA was powered on but we did nothing with PCIe and that still failed.