Linux/AM5728: DSP Load Failure

Gerard

Expert 1290 points

Part Number: AM5728
Other Parts Discussed in Thread: BQ40Z60

Tool/software: Linux

Background:

Processor SDK 03.03.00.04
DSP1 and DSP2 are dynamically started and stopped as the system is running using the bind/unbind nodes

Problem:

DSP periodically fails to load.

More Info:

Remote processor start-up looks fine:

[ 417.333737] omap-rproc 40800000.dsp: assigned reserved memory node dsp1_cma@e0000000
[ 417.341684] remoteproc2: 40800000.dsp is available
[ 417.346584] remoteproc2: Note: remoteproc is still under development and considered experimental.
[ 417.355991] remoteproc2: THE BINARY FORMAT IS NOT YET FINALIZED, and backward compatibility isn't yet guaranteed.
[ 417.384513] remoteproc2: powering up 40800000.dsp
[ 417.389437] remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 15350528
[ 417.404130] omap_hwmod: mmu0_dsp1: _wait_target_disable failed
[ 417.410032] omap-iommu 40d01000.mmu: 40d01000.mmu: version 3.0
[ 417.415976] omap-iommu 40d02000.mmu: 40d02000.mmu: version 3.0
[ 417.498260] remoteproc2: remote processor 40800000.dsp is now up
[ 417.504904] virtio_rpmsg_bus virtio0: rpmsg host is online
[ 417.510484] remoteproc2: registered virtio0 (type 7)

IPC daemon log reveals some issues (no connection to DSP1 in this case - processor 4 - what I tried to start):

[835.080166] Retrieving command...
[836.080428] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x304d0, 'DSP1:MsgQ:01')...
[836.080474] NameServer_getLocal: entry key: 'DSP1:MsgQ:01' not found!
[836.080493] NameServer_getRemote: no socket connection to processor 1
[836.080510] NameServer_getRemote: no socket connection to processor 2
[836.080526] NameServer_getRemote: Sending request via sock: 5
[836.080542] NameServer_getRemote: requesting from procId 3, MessageQ: DSP1:MsgQ:01
[836.080574] NameServer_getRemote: pending on waitFd: 4
[836.080663] NameServer: back from select()
[836.080685] NameServer: Listener got NameServer message from sock: 6!
[836.080709] listener_cb: recvfrom socket: fd: 6
[836.080726] Received ns msg: nbytes: 484, from addr: 61, from vproc: 3
[836.080742] NameServer Reply: instanceName: MessageQ, name: DSP1:MsgQ:01, value: 0x13a8e
[836.080767] NameServer: waiting for unblockFd: 2, and socks: maxfd: 6
[836.080802] NameServer_getRemote: value for MessageQ:DSP1:MsgQ:01 not found.
[836.080822] NameServer_getRemote: no socket connection to processor 4
[836.080837] value = 0x80
[836.080852] status = -5
[836.080867] DONE

DSP emulator connection fails with this error:

"Error connecting to the target: (Error -1143 @ 0x0) Device core was hung. The debugger has forced the device to a ready state and recovered debug control, but your application's state is now corrupt. You should have limited access to memory and registers, but you may need to reset the device to debug further. Emulation package 6.0.407.6)"

It would appear that the remote processor framework thinks everything is OK but that in reality the DSP was not successfully loaded. Any ideas? Is this some sort of caching problem? The test was to start and stop the exact same firmware image over and over. Eventually this failure happens.
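Since remoteproc reports the DSP as up even when it is not, a loop test needs an independent pass/fail signal. The following is a minimal sketch, assuming the LAD daemon log is being written to a file and that the "no socket connection to processor N" line shown above is the failure signature. The function name `lad_missing_socket` and the log path in the usage note are hypothetical.

```sh
#!/bin/sh
# Return 0 if the LAD log reports a missing socket for the given processor ID.
# Usage: lad_missing_socket LOGFILE PROCID
# The trailing group keeps ID 4 from also matching 40, 41, and so on.
lad_missing_socket() {
    grep -Eq "no socket connection to processor ${2}([^0-9]|\$)" "$1"
}
```

Example use after each bind: `lad_missing_socket /tmp/lad.log 4 && echo "DSP1 not reachable"`. Note that the same message for processors 1 and 2 appears in the log above even in this setup, so only the ID of the core being started is meaningful.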

Thanks

over 6 years ago

0 Yordan Kovachev over 6 years ago

TI__Guru**** 161600 points

Hi Gerard,

I'm looking into this.
Could you share if this is the AM57xx GP EVM or a custom board? In case this is a custom board, could you please share the .dts file & full bootlog (dmesg)?

Best Regards,
Yordan

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

Could you share if this is the AM57xx GP EVM or a custom board? In case this is a custom board, could you please share the .dts file & full bootlog (dmesg)?

This is a custom board based on the AM5728. I've attached the .dts file and full bootlog.

Thank you!

3731.boot_log.txt

2806.am572x-custom-main.dts.txt

0 Yordan Kovachev over 6 years ago in reply to Gerard

TI__Guru**** 161600 points

Hi,

Can you remove the internal RTC from the device tree? Its entry is located in arch/arm/boot/dts/dra7.dtsi, and it seems that this is interfering with the omap_hwmod driver.

Also I am really confused by the following entry in your custom dts:
&mmc2 {
status = "okay";

pinctrl-names = "default";
pinctrl-0 = <
&mmc1_pins_default
&mmc2_pins_default
&i2c2_pins_default
&i2c5_pins_default
&clkout3_pins_default
&dcan1_pins_default
&emu_pins_default
&gpio1_pins_default
&gpio2_pins_default
&gpio3_pins_default
&gpio4_pins_default
&gpio5_pins_default
&gpio6_pins_default
&gpio7_pins_default
&gpio8_pins_default
&mdio_pins_default
&nmin_pins_default
&onreset_pins_default
&rgmii0_pins_default
&rtc_misc_pins_default
&rgmii1_pins_default
&rmii_pins_default
&uart1_pins_default
&uart3_pins_default
&uart5_pins_default
&usb1_pins_default
&wakeup_pins_default
&vin2a_iodelay_manual1_conf
&rgmii0_iodelay_manual1_conf
&mmc3_iodelay_manual1_conf
&rmii_iodelay_manual1_conf
>;

Why are you enabling all those pins in mmc2 driver? Also what is the purpose of the &rtc_misc_pins_default(?) as I see these are mostly interrupts, rtc clk & jtag pins...

Best Regards,
Yordan

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

Yordan Kovachev said:

Can you remove the internal RTC from the device tree? Its entry is located in arch/arm/boot/dts/dra7.dtsi, and it seems that this is interfering with the omap_hwmod driver.

Done; I've removed the internal RTC reference from dra7.dtsi.

Yordan Kovachev said:

Why are you enabling all those pins in mmc2 driver?

We found that our custom board was still drawing current when powered off; different attempts were made to rectify this via the device tree and this hack ultimately worked. It's probably a separate topic of discussion but I'd be curious to hear your thoughts on that issue as well.

Yordan Kovachev said:

Also what is the purpose of the &rtc_misc_pins_default(?) as I see these are mostly interrupts, rtc clk & jtag pins...

This is the result of the pinmux tool; we're not using the internal RTC but included it within our dts as we thought it was necessary in order to have a known state for the pins (reduce current, prevent damage, etc). Is this not true? Should we just remove this reference as well?

Thanks

0 Gerard over 6 years ago in reply to Gerard

Expert 1290 points

To follow up, removing the internal RTC reference from the dra7.dtsi file had no effect on the DSP loading reliability.

Looking forward to further feedback; this problem is a current roadblock to progress for my development effort. It's critical that we're able to reliably start and stop the DSPs with different firmware images.

Thank you

0 Yordan Kovachev over 6 years ago in reply to Gerard

TI__Guru**** 161600 points

Hi,

I advised removing the RTC because of the first kernel warning in your bootlog:
[ 0.108632] omap_hwmod: l3_main_2 using broken dt data from ocp
[ 0.216626] omap_hwmod: rtcss: no dt node
[ 0.216634] ------------[ cut here ]------------
[ 0.216649] WARNING: CPU: 0 PID: 1 at /vobs/cra_os/Ec/System/src/open_source/linux/linux-4.4.41+gitAUTOINC+f9f6f0db2d-gf9f6f0db2d/arch/arm/mach-omap2/omap_hwmod.c:2523 _init+0x1f4/0x418()
[ 0.216657] omap_hwmod: rtcss: doesn't have mpu register target base

As you can see, it interferes with the device hwmod setup; however, the kernel recovers from this error.

Later you have:
[ 7.284958] mpu9250: module is from the staging directory, the quality is unknown, you have been warned.
Loading module bq40z60_battery
Loading module fpga_config
Loading module hw_info
......
And then the remoteproc/rpmsg crashes & fails to load your dsp binary, the reason is a custom error returned from L4_PER...

After that I suspect the kernel enters quite a long loop of trying to load the binary and cannot get to user space (console), right? This is why, in my opinion, you should make sure you do NOT call the RTC module (unless it is crucial to your system setup). I'll dig a bit more and update this thread with any feedback I have.

We found that our custom board was still drawing current when powered off; different attempts were made to rectify this via the device tree and this hack ultimately worked. It's probably a separate topic of discussion but I'd be curious to hear your thoughts on that issue as well.

Regarding drawing current after power off, you should also revise your hardware design.

This is the result of the pinmux tool; we're not using the internal RTC but included it within our dts as we thought it was necessary in order to have a known state for the pins (reduce current, prevent damage, etc). Is this not true? Should we just remove this reference as well?

If unused, you should follow the Data Manual recommendations for tying unused pins (section 4.1.1 Unused Balls Connection Requirements), and you can remove these entries from the dts file.

Best Regards,
Yordan

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

Yordan Kovachev said:

And then the remoteproc/rpmsg crashes & fails to load your dsp binary, the reason is a custom error returned from L4_PER...

After that I suspect the kernel enters quite a long loop of trying to load the binary and cannot get to user space (console), right?

No, everything is initially functional once the system boots up. Every time I load a DSP there is a single custom error - I had created a separate forum post about this some time ago with no conclusion reached as to the cause. Often I can boot up and load/unload a DSP 5 or more times before I get the failure described in this thread.

Why would the remote processor framework report that the DSP has been successfully started when the DSP isn't really running? It feels like some sort of caching issue. The initial load of the DSP image from eMMC to memory takes close to 6 seconds, but subsequent loads take only about 200 milliseconds. I'm assuming that speed increase is due to the image being cached. Perhaps something is corrupting the cache in between loads, so we're left with an invalid DSP image in memory, which keeps the emulator from even connecting?
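One way to probe the caching hypothesis is to force the firmware to be re-read from eMMC before every bind. The sketch below assumes the speed-up comes from the Linux page cache. It uses the standard `/proc/sys/vm/drop_caches` knob; the function name `drop_fw_cache` and the `DROP_CACHES` override are hypothetical.

```sh
#!/bin/sh
# Flush dirty pages, then ask the kernel to drop the clean page cache,
# dentries and inodes, so the next firmware load reads from storage again.
# DROP_CACHES can be pointed at a scratch file for a dry run.
drop_fw_cache() {
    sync
    echo 3 > "${DROP_CACHES:-/proc/sys/vm/drop_caches}"
}
```

If the failure rate is unchanged when `drop_fw_cache` is called before each bind (and every load goes back to taking several seconds), a stale cached image becomes a less likely explanation.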

0 Yordan Kovachev over 6 years ago in reply to Gerard

TI__Guru**** 161600 points

Do you load/unload the same dsp image?
I'll test this on my AM5728 GP EVM to try and recreate the behavior on your side.

Best Regards,
Yordan

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

Yes, I am loading/unloading the same dsp image.

Thank you

0 Gerard over 6 years ago in reply to Gerard

Expert 1290 points

Yordan,

Have you been able to recreate the problem on your EVM?

Thanks

0 Rex Chang over 6 years ago in reply to Gerard

TI__Guru 50170 points

Hi, Gerard,

What's the interval between the bind and unbind? If you have not tried a delay between them, could you try adding a 15 s delay? That is: bind, 15 s delay, unbind, 15 s delay, bind, 15 s delay, and so on. We suspect things get out of sync if the commands are issued too close together.

Rex

0 Gerard over 6 years ago in reply to Rex Chang

Expert 1290 points

Currently I have a 1 second delay between the unbind and bind. I experimented with 7, 15, and 20 seconds with no luck.

A little more background to our system:

DSP talks to an FPGA when running; FPGA gets configured via GPMC and then data is transferred via PCIe
FPGA fires interrupts to the DSP using GPIOs
In these tests the FPGA is power cycled at the same time as the DSP

Here is some new data after some testing with a script that continuously loads/unloads the DSP:

Test DSP build was generated for the AM5728EVM - the problem was not recreated (obviously no FPGA on the EVM)
Test DSP build was generated for our custom hardware - the problem was not recreated (FPGA was not used)
Production DSP + FPGA build loop test failed
Production DSP only (FPGA left disabled) build loop test was successful

So, it seems that something with the DSP-FPGA interaction is to blame for why the subsequent DSP load fails. My best guess would be that peripherals that the DSP/FPGA were interacting with are interfering with the DSP around the power cycle. Are there best practices for this dynamic power cycling use case? For example, is it a requirement that the DSP disable all peripherals/interrupts prior to getting shut off? One might think that the lazy path - not disabling peripherals/interrupts - would be acceptable since we're power cycling the processor. Perhaps that is not a safe assumption.

Thanks

0 Yordan Kovachev over 6 years ago in reply to Gerard

TI__Guru**** 161600 points

Hi,

Another thing that came to my mind. At which OPP are you running your AM57xx device?
Can you try setting it to OPP_HIGH?

Best Regards,
Yordan

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

I'm assuming we are at whatever OPP is the default, as I have not changed it.

I found a post where you linked processors.wiki.ti.com/.../Sitara_Linux_Training:_Power_Management but I am not seeing some of the sysfs nodes referenced to be able to determine the OPP/change the OPP.

Gerard

0 Yordan Kovachev over 6 years ago in reply to Gerard

TI__Guru**** 161600 points

Hi Gerard,

By default the AM57xx GP EVM, running the unmodified SDK03.03.00.04 (kernel 4.4.41) is working at 1GHz, which is the lowest of the available frequencies in cpufreq:
root@am57xx-evm:/sys/class/regulator/regulator.3# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
1000000 1176000 1500000
This corresponds to OPP_NOM, see dra7.dtsi:
opp_nom@1000000000 {
opp-hz = /bits/ 64 <1000000000>;
opp-microvolt = <1060000 850000 1150000>;
opp-supported-hw = <0xFF 0x01>;
opp-suspend;
};

You should change the governor to userspace:
root@am57xx-evm:~# echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
And select the highest cpu frequency 1.5GHz:
root@am57xx-evm:~# echo 1500000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed

Another option is to modify the opp table in dra7.dtsi.

You should have these sysfs entries in your kernel, if you have enabled the CPUFREQ driver, when building it.
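A quick way to check whether the cpufreq nodes exist before trying to change the OPP is sketched below. The sysfs paths are the ones used above; the function name `show_cpufreq` and the `CPUFREQ_DIR` override are hypothetical.

```sh
#!/bin/sh
# Print the current governor and the available frequencies for cpu0,
# or report that the cpufreq directory is missing and return 1.
# CPUFREQ_DIR can be overridden to test against a scratch directory.
show_cpufreq() {
    dir=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}
    if [ ! -d "$dir" ]; then
        echo "cpufreq not available: $dir missing"
        return 1
    fi
    echo "governor: $(cat "$dir/scaling_governor")"
    echo "available: $(cat "$dir/scaling_available_frequencies")"
}
```

If the function reports the directory as missing, the governor and frequency commands above cannot work, and the kernel configuration and OPP device tree entries are the next things to check.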

Best Regards,
Yordan

0 Yordan Kovachev over 6 years ago in reply to Yordan Kovachev

TI__Guru**** 161600 points

Also, I've managed to get the DSP load to fail on my GP EVM, but it is as Rex mentioned:
if I use a short interval between bind and unbind, the system fails to load the DSP firmware. However, if I use the script below:
#!/bin/sh
cd /sys/bus/platform/drivers/omap-rproc/
echo 40800000.dsp > unbind
sleep 5
ln -s /lib/firmware/dra7-dsp1-fw.xe66.dspdce-fw /lib/firmware/dra7-dsp1_new.xe66
echo 40800000.dsp > bind
sleep 15
echo 40800000.dsp > unbind
sleep 5
ln -s /lib/firmware/dra7-dsp1-fw.xe66.dspdce-fw /lib/firmware/dra7-dsp1-fw.xe66
echo 40800000.dsp > bind
sleep 15

that is, with a 15 s sleep after each bind and a 5 s sleep after each unbind (20 s per cycle), I have no problems loading and unloading the firmware.
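To experiment with different delays, the unbind/bind sequence above can be generalized into a loop. This is only a sketch: the function name `rproc_cycle` and the `RPROC_DRV` override are hypothetical, and unlike the script above it does not touch the firmware symlinks.

```sh
#!/bin/sh
# Cycle a remoteproc device through unbind/bind a number of times.
# Usage: rproc_cycle DEVICE COUNT UNBIND_DELAY BIND_DELAY
# RPROC_DRV can be overridden for a dry run against a scratch directory.
rproc_cycle() {
    drv=${RPROC_DRV:-/sys/bus/platform/drivers/omap-rproc}
    i=0
    while [ "$i" -lt "$2" ]; do
        echo "$1" > "$drv/unbind" || return 1
        sleep "$3"      # time allowed for the DSP to power down
        echo "$1" > "$drv/bind" || return 1
        sleep "$4"      # time allowed for the firmware to boot
        i=$((i + 1))
        echo "cycle $i done"
    done
}
```

For example, `rproc_cycle 40800000.dsp 100 5 15` repeats the timing used above 100 times. Reducing the two delays step by step shows where loads start to fail on a given board.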

Best Regards,
Yordan

0 Yordan Kovachev over 6 years ago in reply to Yordan Kovachev

TI__Guru**** 161600 points

Hi,

Some additional information, based on the log you've shared in your first message.

The error originates from the NameServer daemon (sources located here: git.ti.com/.../daemon), specifically from this section:

/* Set Timeout to wait: */
tv.tv_sec = 0;
tv.tv_usec = NAMESERVER_GET_TIMEOUT;

/* Create request message and send to remote: */
clusterId = procId - MultiProc_getBaseIdOfCluster();
sock = NameServer_module->comm[clusterId].sendSock;
if (sock == INVALIDSOCKET) {
    LOG1("NameServer_getRemote: no socket connection to processor %d\n",
         procId);
    status = NameServer_E_RESOURCE;
    goto exit;
}

Best Regards,
Yordan

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

Thank you; I'll rebuild my kernel with the CPUFREQ driver as that must explain why I don't see all the nodes that the Wiki listed and you list above. I'll report back once I try at the other OPP setting.

Thanks

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

The problem is that having that much delay between DSP start/stops is a show stopper for us. It needs to happen in much less than 20 seconds - ideally just a few seconds.

Is there a root cause that can be addressed (kernel patch) to deal with the quick start/stop problem that even you have now recreated?

Thanks

0 RandyP over 6 years ago in reply to Gerard

TI__Guru* 84110 points

Gerard,

If using the long delay works reliably in your situation, then you can start trimming it down to find the minimum delay for your hardware. Since Yordan cannot duplicate all of your custom hardware and external devices, it may be difficult for us to come up with an accurate figure; experimenting on your end is the best way to find it.

Once you confirm that the long delay will work for you, that will be a big data point for us both to consider in the debug process moving forward. Please let us know when that test is completed so we will know to move forward in the right direction together.

Regards,
RandyP

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

Yordan Kovachev said:
Hi Gerard,

You should have these sysfs entries in your kernel, if you have enabled the CPUFREQ driver, when building it.

I have the CPUFREQ driver enabled but am still not seeing the nodes. I do notice this happens at boot-up:

[ 0.001718] /cpus/cpu@0 missing clock-frequency property
[ 0.001733] /cpus/cpu@1 missing clock-frequency property

Is this property something that should be added to dra7.dtsi and dra74x.dtsi for cpu 0 and cpu 1, respectively?

Thanks

0 Yordan Kovachev over 6 years ago in reply to Gerard

TI__Guru**** 161600 points

Hi Gerard,

Do you have the related opp settings in your dts:
In beagle-x15-common.dtsi, you should have the oppdm_mpu/dspeve/gpu/ivahd/core supplies:
&oppdm_mpu {
vdd-supply = <&smps12_reg>;
};

&oppdm_dspeve {
vdd-supply = <&smps45_reg>;
};

&oppdm_gpu {
vdd-supply = <&smps45_reg>;
};

&oppdm_ivahd {
vdd-supply = <&smps45_reg>;
};

&oppdm_core {
vdd-supply = <&smps6_reg>;
};

these should be present in
&i2c1 {
status = "okay";
clock-frequency = <400000>;

tps659038: tps659038@58 {
......

Also you should have the opp table defined in dra7.dtsi:
cpu0_opp_table: opp_table0 {

which is called in dra74x.dtsi:
cpu@1 {
device_type = "cpu";
compatible = "arm,cortex-a15";
reg = <1>;
operating-points-v2 = <&cpu0_opp_table>;
};

Do you have these in your dts files?

Best Regards,
Yordan

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

Yordan Kovachev said:
Hi Gerard,

Do you have these in your dts files?

Yes; I have confirmed I have all of the properties you listed across the relevant dts files.

Thanks,

Gerard

0 Gerard over 6 years ago in reply to Gerard

Expert 1290 points

Also, here are all the cpufreq-related kernel options I have enabled:
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y

#
# CPU frequency scaling drivers
#
CONFIG_CPUFREQ_DT=y
# CONFIG_ARM_BIG_LITTLE_CPUFREQ is not set
# CONFIG_ARM_KIRKWOOD_CPUFREQ is not set
CONFIG_ARM_OMAP2PLUS_CPUFREQ=y
CONFIG_ARM_TI_CPUFREQ=y

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

Yordan Kovachev said:
Also, I've managed to get the DSP load to fail on my GP EVM, but it is as Rex mentioned:
if I use a short interval between bind and unbind, the system fails to load the DSP firmware. However, if I use the script below:
#!/bin/sh
cd /sys/bus/platform/drivers/omap-rproc/
echo 40800000.dsp > unbind
sleep 5
ln -s /lib/firmware/dra7-dsp1-fw.xe66.dspdce-fw /lib/firmware/dra7-dsp1_new.xe66
echo 40800000.dsp > bind
sleep 15
echo 40800000.dsp > unbind
sleep 5
ln -s /lib/firmware/dra7-dsp1-fw.xe66.dspdce-fw /lib/firmware/dra7-dsp1-fw.xe66
echo 40800000.dsp > bind
sleep 15

I wanted to report back on this longer delay test: even with delays that match yours, I do eventually get a load failure.

Section 5.3.3.4.2 of the AM572x TRM talks about the power down sequence for the C66x. There is reference to a "powerdown" message that the host (in our case the ARM) sends to the DSP. Is there any further detail on this expected procedure and/or any sample code that illustrates what the DSP should be doing upon reception of this "powerdown" message?

Thanks,
Gerard

0 Yordan Kovachev over 6 years ago in reply to Gerard

TI__Guru**** 161600 points

Hi Gerard,

Sorry for the late reply.

Is there any further detail on this expected procedure and/or any sample code that illustrates what the DSP should be doing upon reception of this "powerdown" message?

I am not aware of any public document describing this procedure in detail. As for sample code, you can have a look at the kernel remoteproc drivers:
drivers/remoteproc/remoteproc_core.c --> rproc_shutdown()
drivers/remoteproc/omap_remoteproc.c

Those are the functions that load/unload the dsp firmware.

As for not having the cpufreq entries in your sysfs, I cannot find a valid explanation yet. I am working on this.

Best Regards,
Yordan

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

Yordan Kovachev said:

I am not aware of any public document describing this procedure in detail. As for sample code, you can have a look at the kernel remoteproc drivers:
drivers/remoteproc/remoteproc_core.c --> rproc_shutdown()
drivers/remoteproc/omap_remoteproc.c

Those are the functions that load/unload the dsp firmware.

I was more concerned with what the DSP needed to do. The TRM references a shutdown procedure implemented in SW on the DSP side. The assumption is that the intent of this DSP SW procedure is to leave things in a good state so that there aren't issues when you power up the next time.

Thanks

0 Yordan Kovachev over 6 years ago in reply to Gerard

TI__Guru**** 161600 points

Hi Gerard,

I was able to find some documents about DSP power down. The CorePack integrated in AM572x has its own user guide:
www.ti.com/.../sprugw0c.pdf
Check Chapter 12 Power-Down Controller, specifically Sections 12.2.5 C66x CorePac Powerdown & 12.2.6 Miscellaneous Power-Down.
It can also be useful to take a look at the IDLE instruction described in C66x DSP and Instruction Set Reference Guide (SPRUGH7) (www.ti.com/.../sprugh7.pdf), see Sections 3.8.12.5 IDLE, 4.158 IDLE & Figure H-2 NOP and IDLE Instruction Format

The software procedure you implement should comply with the descriptions above.

Best Regards,
Yordan

0 Gerard over 6 years ago in reply to Yordan Kovachev

Expert 1290 points

Thanks for the information, Yordan.

Here is what we are currently doing to shut down the DSP once it is up and running:

Host (ARM) sends the DSP a custom shutdown message (via MessageQ)
DSP calls the PCIe destructor (which essentially calls Pcie_close())
DSP sends Host (ARM) an ACK message (via MessageQ)
DSP sleeps for 1 millisecond to make sure EC gets ACK message
General uninitialization
1. Detach from internal message and timers
2. Call MessageQ_delete()
3. Call Ipc_detach()
4. Delete timers
5. Delete other classes
6. Delete McSpi interface - calls SPI_close for the physical SPI interface
Call the library function PMLIBCpuModePrepare(PMHAL_PRCM_MOD_DSP1,PMLIB_IDLE_CPU_MODE_OFF);

I believe this implements step 1 in the list provided in section 12.2.5 of the TMS320C66x DSP CorePac User Guide. Since the DSP will be power cycled and reset by the ARM, step 2 was not specifically implemented since step 4 above disabled all interrupts.
Call the library function PMLIBCpuIdle(PMHAL_PRCM_PD_STATE_OFF);

I believe this implements step 3 of 12.2.5. I confirmed this by stepping through the code and idle would have been executed, but when I stepped to the idle command the DSP went into Reset and the debugger disconnected, which I suppose would be expected behavior.

We still eventually fail to load the DSP, however. What can we do to improve this procedure?

Thanks

0 Yordan Kovachev over 6 years ago in reply to Gerard

TI__Guru**** 161600 points

Hi,

Just wanted to let you know that we're looking into this.

Best Regards,
Yordan

0 Rex Chang over 6 years ago in reply to Yordan Kovachev

TI__Guru 50170 points

Hi, Gerard,

Is there a trace log when the DSP fails to load? It should be in /sys/kernel/debug/remoteproc/remoteproc#/trace0, where # is the core ID starting from 0.
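To capture the trace for every core right after a failed load, something like the sketch below could be used. It assumes debugfs is mounted at /sys/kernel/debug; the function name `dump_rproc_traces` and the `RPROC_DBG` override are hypothetical.

```sh
#!/bin/sh
# Print trace0 for every remoteproc instance found under debugfs.
# Instances without a readable trace0 are skipped silently.
# RPROC_DBG can be overridden to test against a scratch directory.
dump_rproc_traces() {
    root=${RPROC_DBG:-/sys/kernel/debug/remoteproc}
    for t in "$root"/remoteproc*/trace0; do
        [ -r "$t" ] || continue
        echo "== $t =="
        cat "$t"
    done
}
```

Calling `dump_rproc_traces > /tmp/rproc-trace.txt` from the loop test as soon as a failure is detected keeps the trace from being lost on the next unbind.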

Rex

0 Rex Chang over 6 years ago in reply to Rex Chang

TI__Guru 50170 points

Hi, Gerard,

We see two possibilities for the cause of the failure, and are not sure if you had looked into them before:
a) The DSP is writing to a GPIO register while, for some reason, the GPIO peripheral is powered down. Could you confirm whether the GPIO peripheral is powering up and down as well?
b) The DSP is writing to the FPGA through PCIe while the FPGA's PCIe link is disconnected. This looks like the most probable scenario. To avoid it, can you make sure that PCIe accesses from the DSP are completely stopped before powering down both the DSP and the FPGA? (This also means the DSP should not respond to any GPIO interrupts that may lead to PCIe transactions.)

Rex

0 Christopher Peters over 6 years ago in reply to Rex Chang

Genius 3370 points

Hello Rex,

I am working the DSP side of the issue Gerard has been posting about. We wanted to get some additional clarity on the questions in your replies.

As for the trace logging, are there specific things I might want to add to the .cfg file in the DSP to get the logging you are looking for? Since we are MIPS challenged in the DSP for some of our applications, I probably have only very minimal logging enabled.

As for the question about GPIO peripherals: did you mean powering down the GPIO peripheral in the AM5728 SoC, or the hardware on our custom board that the GPIOs are connected to?

As for the question about PCIe transactions, we believe that since the FPGA has never been activated with mission code and since the DSP is at the IDLE instruction, the PCIe should also be idle. However, I do notice that although we do call the Pcie_close() function, that likely does not really turn off the PCIe hardware in the AM5728. Should we be taking additional hardware/register actions to be sure the PCIe is really idle in the AM5728?

Thanks for helping,

Chris

0 Rex Chang over 6 years ago in reply to Christopher Peters

TI__Guru 50170 points

Hi, Chris,

Do you see anything shown in the current trace and Linux dmesg? Whatever the DSP logs shows up in the trace file on the ARM side. For debugging purposes, would it be possible to enable some of the DSP logs?

I recall the GPIOs are used by the FPGA as interrupts to the DSP. I just want to be sure that any interrupts that come in won't be handled.

One of your test cases that was successful left the FPGA disabled. Does a disabled FPGA mean it is powered off? Would it make a difference if it were powered up but without PCIe configured/enumerated?

It doesn't seem that PCIe on AM572x can be powered off except only the PHY power control. Since we can't reproduce the issue, we can't make more suggestions to root cause the issue.

Rex

0 Christopher Peters over 6 years ago in reply to Rex Chang

Genius 3370 points

Rex,

When the system is up and running, I see this in the trace0 log:

[ 0.000] 20 Resource entries at 0x80000000

Gerard will have to comment on what additional things are present after a failure. Yes, for these tests we should be able to enable any logging you would recommend.

Yes, a GPIO is used as an interrupt from the FPGA to the DSP. However, we do not believe the FPGA should be generating any interrupts and as part of the DSP shutdown procedure I call HwiP_disable();. For good measure, I have also just added GPIO_disableInt() to the GPIO driver destructor which is called on power down. Previously the GPIO driver destructor was empty as no shutdown API is provided in the driver code.

Gerard will need to comment on the meaning of "FPGA Disabled".

We are working on an EVM-based setup to help reproduce the issue, but our code needs to be ported back to the EVM and stripped of sensitive code, so it will be a few more days before we can send anything.

Thanks,

Chris

0 Rex Chang over 6 years ago in reply to Christopher Peters

TI__Guru 50170 points

Hi, Chris,

For GPIO interrupts, we just want to be sure they are covered; we don't mean that they are one of the causes. Will the EVM-based setup still require the FPGA to reproduce the issue? We need a way to reproduce it at TI to debug it. Would a similar setup work, with any PCIe endpoint device connected to the AM572x EVM and your DSP code enumerating the PCIe? I assume it is the DSP that initializes the PCIe because it is the consumer of the FPGA data, or vice versa. If so, I can disable the PCIe enumeration in the kernel code to see if we can reproduce it on the TI side.

Rex

0 Gerard over 6 years ago in reply to Rex Chang

Expert 1290 points

Rex Chang said:

Do you see anything shown in the current trace and Linux dmesg?

This is from my original post on this thread; everything looks normal in dmesg - I've included the remoteproc-related output below. The IPC daemon hints at a problem though when it prints out that it doesn't have a connection to the DSP.

Remote processor start-up looks fine:

IPC daemon log reveals some issues (no connection to DSP1 in this case - processor 4 - what I tried to start):

Rex Chang said:

One of your test cases that was successful left the FPGA disabled. Does a disabled FPGA mean it is powered off? Would it make a difference if it were powered up but without PCIe configured/enumerated?

Yes, the FPGA would have been powered off. We did a test where the FPGA was powered on but we did nothing with PCIe and that still failed.
