AM6548: Kernel crash from PRU auto-forwarding

Part Number: AM6548

Hi Team, 

The customer is using Linux SDK 8.2 on the AM6548. They are trying to use the PRU to auto-forward packets from the RX0 port to the TX1 port. Ideally, they would like to use the AM65x as a sniffer on this communications line.

This auto-forwarding works fine in low-speed tests (<100 Mbit/s), but when moving up to higher speeds (1 Gbit/s) the kernel crashes. Here's some info on the current status from the customer:

"

The kernel crash is happening because the driver tries to push a packet of length 1563 into a socket buffer that was allocated as the max allowable size of 1522. I can prevent the kernel crash from happening by just increasing the max allowable packet size to 1600 in the driver so that it allocates 1600 bytes every packet instead of 1522, rebuilding the driver, and following the same procedure I did before. What this seems to mean though is that something with the auto forwarding settings isn’t quite right. I should never be getting packets of lengths > 1522, so somehow packets are getting mushed together, and then the kernel freaks out when that packet is too long. Any ideas on the correct register settings in regards to auto forwarding to prevent this from happening?

 

Again it would seem the actual packet forwarding is functioning correctly, just what the driver is reading isn’t correct. I am waiting on a better network tap to verify as my current one has some issues, but it can capture the network traffic well enough. When comparing the two captures, I can see that occasionally the PRU ethernet isn’t getting the whole packet or sometimes misses a smaller packet entirely. In the below screencap, the first 66 bytes are missing, but then the rest of the data is correct (PRU ethernet capture on left, network tap on right)

It’s almost as if the PRU isn’t getting notified that the frame was there at the proper time, and so that RX L2 buffer gets overwritten before it can be read. That’s pretty much a wild guess at this point though. So yeah I guess my question is am I missing something in my settings that might explain all of this?

 

Settings are as follows right now:

 

RXCFG0 = 0x5B = RX_AUTO_FWD_PRE0 | RX_L2_EN0 | RX_MUX_SEL0 | RX_DATA_RDY_MODE_DIS0 | RX_ENABLE0

RXCFG1 = 0x53 = RX_AUTO_FWD_PRE1 | RX_L2_EN1 | RX_DATA_RDY_MODE_DIS1 | RX_ENABLE1

TXCFG0 = 0x140201 = TX_START_DELAY0 (0x14) | PRE_TX_AUTO_SEQUENCE0 | TX_ENABLE0

TXCFG1 = 0x140301 = TX_START_DELAY1 (0x14) | PRE_TX_AUTO_SEQUENCE1 | TX_MUX_SEL1 | TX_ENABLE1

MII_G_RT_ICSS_G_CFG = 0x1002D = RGMII | RGMII | RX_L2_G_EN | TX_L1_EN

"
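For anyone cross-checking the values quoted above, here is a minimal Python sketch that decodes the RXCFG and TXCFG words back into flag names. The bit positions and the helper names (`decode_flags`, `decode_txcfg`) are assumptions chosen to be consistent with the values in this post; they are not taken from the driver and should be verified against the AM65x TRM.

```python
# Bit positions are ASSUMPTIONS that reproduce the values quoted in the post;
# verify them against the AM65x TRM before relying on them.
RXCFG_BITS = {
    0: "RX_ENABLE",
    1: "RX_DATA_RDY_MODE_DIS",
    3: "RX_MUX_SEL",
    4: "RX_L2_EN",
    6: "RX_AUTO_FWD_PRE",
}
TXCFG_BITS = {
    0: "TX_ENABLE",
    8: "TX_MUX_SEL",
    9: "PRE_TX_AUTO_SEQUENCE",
}
TX_START_DELAY_SHIFT = 16  # assumed: TX_START_DELAY sits above bit 15


def decode_flags(value, bit_names):
    """Return (names of the known bits that are set, any set bits with no name)."""
    names = [name for bit, name in sorted(bit_names.items()) if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in bit_names)
    return names, value & ~known_mask


def decode_txcfg(value):
    """Split a TXCFG word into (start delay, flag names, unnamed low bits)."""
    delay = value >> TX_START_DELAY_SHIFT
    names, leftover = decode_flags(value & 0xFFFF, TXCFG_BITS)
    return delay, names, leftover


if __name__ == "__main__":
    for label, value in (("RXCFG0", 0x5B), ("RXCFG1", 0x53)):
        names, leftover = decode_flags(value, RXCFG_BITS)
        print(f"{label} = {value:#x}: {' | '.join(names)}, unnamed bits {leftover:#x}")
    for label, value in (("TXCFG0", 0x140201), ("TXCFG1", 0x140301)):
        delay, names, leftover = decode_txcfg(value)
        print(f"{label} = {value:#x}: delay {delay:#x}, {' | '.join(names)}, "
              f"unnamed bits {leftover:#x}")
```

Under these assumed positions, the only difference between RXCFG0 and RXCFG1 is RX_MUX_SEL, and the only difference between TXCFG0 and TXCFG1 is TX_MUX_SEL, which matches the flag lists given above.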

Any suggestions on modifications to the PRU settings that may be able to mitigate this?

BR,

-RT

  • Hello Ryan,

    There have been many updates to the AM65x driver since 8.2, so the first debug step would probably be to try again with the latest firmware and drivers and see if the behavior persists.

    AM64x Linux SDK 8.6 was just released this week. Since AM64x and AM65x use the same firmware and drivers, that is probably the latest version available (the AM65x SDK 8.6 release is planned for about a month from now, and will probably include some additional improvements to the firmware and drivers). I am checking with the Linux developer to see whether he suggests testing with a different version of the firmware and drivers.

    Regards,

    Nick

  • The developer confirmed that the PRU firmware and Linux driver in AM64x Linux SDK 8.6 are the latest software I can currently point you to for testing. Note that there may be additional changes to the PRU firmware and Linux driver between the AM64x Linux SDK 8.6 release and the AM65x Linux SDK 8.6 release in about a month.

    Regards,

    Nick

  • Some further notes from the developers:

    1) The PRU firmware has updates between 8.2 and 8.6 which may help here.

    2) Note that packet sniffing will be limited by the ability of the A53 cores to handle all that traffic. The AM65x will probably not be able to keep up if the throughput of the packets it is sniffing is consistently high. For example, sniffing Ethernet frames at close to the 1 Gbit/s line rate is probably too much for the A53 cores without significant optimizations that we will not be able to answer questions about, such as a custom PRU firmware that filters packets at a lower layer and passes only some of them up to Linux to sniff.

    3) There may be some additional Linux driver changes that are needed. Please let us know how the tests with the Linux SDK 8.6 code go; I may need to file a bug against the driver.
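
    To put point 2 in numbers, here is a rough Python sketch of how many frames per second arrive at line rate. It uses only standard Ethernet framing overhead (8 bytes of preamble plus start-of-frame delimiter and a 12-byte minimum inter-frame gap); the function names are made up for illustration, and this says nothing about what the A53 cores can actually sustain.

    ```python
    PREAMBLE_SFD_BYTES = 8     # preamble + start-of-frame delimiter
    INTERFRAME_GAP_BYTES = 12  # minimum inter-frame gap


    def wire_bits(frame_bytes):
        """Bits one frame occupies on the wire, including per-frame overhead."""
        return (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8


    def frames_per_second(frame_bytes, link_bits_per_second=1_000_000_000):
        """Maximum back-to-back frame rate for a given frame size and link speed."""
        return link_bits_per_second / wire_bits(frame_bytes)


    if __name__ == "__main__":
        for size in (64, 512, 1522):
            rate = frames_per_second(size)
            print(f"{size:>4}-byte frames: {rate:,.0f} frames/s, "
                  f"{1e9 / rate:,.0f} ns per frame")
    ```

    At 1 Gbit/s this works out to roughly 1.49 million frames per second for minimum-size 64-byte frames and about 81 thousand frames per second for 1522-byte frames, so the per-frame time budget ranges from well under a microsecond to about 12 microseconds.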

    Regards,

    Nick

  • Hi Nick,

    I appreciate the quick responses here. I definitely agree that the firmware upgrade should take care of some intermediate problems. I'll let you know if there is any trouble with the upgrade from 8.2 to 8.5 + AM64x PRU FW.

  • Hi Nick,

    The customer was able to install the AM64x SDK and pull the PRU firmware from it:

    "I initially pulled the whole drivers/net/ethernet/ti folder from it as well; however, the davinci_mdio driver seemed to have some dependencies that I didn’t have, so I just replaced the ICSSG-specific files only."

    - Are there any other drivers that the ICSSG driver updates depend on?

    Their Linux image was rebuilt with BPF support, because the new driver uses XDP. They rebuilt the kernel and all relevant modules without errors, installed both, and confirmed that they function.

    - They couldn't find an option in menuconfig to enable XDP sockets. Is this option required?

    When they use the new driver, they see no more crashing from malformed packets!

    On a different note: "While some of this could very well be from the A53 cores not being able to handle the traffic, I do see a very large discrepancy in lost / malformed packets when I have autoforwarding enabled, vs when I have a network tap going directly into a PRU eth port. So it seems to me that something still has to be up with the auto forwarding config, right?"

    Additionally, they see a few messages on boot saying "unsupported resource 5" when each PRU boots, and a settime timeout for each. What exactly do these messages mean?

    Logs: 

    [   12.568054] remoteproc remoteproc13: powering up b234000.pru

    [   12.583877] remoteproc remoteproc13: Booting fw image ti-pruss/am65x-sr2-pru0-prueth-fw.elf, size 38224

    [   12.614784] remoteproc remoteproc13: unsupported resource 5

    [   12.642804] remoteproc remoteproc13: remote processor b234000.pru is now up

    [   12.654367] remoteproc remoteproc14: powering up b204000.rtu

    [   12.670396] remoteproc remoteproc14: Booting fw image ti-pruss/am65x-sr2-rtu0-prueth-fw.elf, size 30872

    [   12.694347] remoteproc remoteproc14: remote processor b204000.rtu is now up

    [   12.701448] remoteproc remoteproc15: powering up b20a000.txpru

    [   12.727656] remoteproc remoteproc15: Booting fw image ti-pruss/am65x-sr2-txpru0-prueth-fw.elf, size 37328

    [   12.746326] remoteproc remoteproc15: remote processor b20a000.txpru is now up

    [   12.783938] net eth5: started

    [   12.812570] remoteproc remoteproc10: powering up b138000.pru

    [   12.818580] remoteproc remoteproc10: Booting fw image ti-pruss/am65x-sr2-pru1-prueth-fw.elf, size 38496

    [   12.838400] remoteproc remoteproc10: unsupported resource 5

    [   12.845373] remoteproc remoteproc10: remote processor b138000.pru is now up

    [   12.853521] remoteproc remoteproc11: powering up b106000.rtu

    [   12.866513] remoteproc remoteproc11: Booting fw image ti-pruss/am65x-sr2-rtu1-prueth-fw.elf, size 30104

    [   12.877346] remoteproc remoteproc11: remote processor b106000.rtu is now up

    [   12.887203] remoteproc remoteproc12: powering up b10c000.txpru

    [   12.898455] remoteproc remoteproc12: Booting fw image ti-pruss/am65x-sr2-txpru1-prueth-fw.elf, size 35836

    [   12.909259] remoteproc remoteproc12: remote processor b10c000.txpru is now up

    [   12.927260] icssg-prueth icssg1-eth: settime timeout
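
    To pull just the messages in question out of a longer boot log, a small Python filter along these lines could be used. This is only a sketch: the regular expression is written against the line layout shown above, and the keyword list and function name are hypothetical choices for this thread, not part of any TI tool.

    ```python
    import re

    # Matches lines such as "[   12.614784] remoteproc remoteproc13: unsupported resource 5".
    LINE_RE = re.compile(
        r"^\s*\[\s*(?P<time>\d+\.\d+)\]\s+(?P<source>.+?):\s+(?P<message>.*)$"
    )
    # Hypothetical keyword list: the two messages asked about in this thread.
    KEYWORDS = ("unsupported resource", "timeout")


    def find_flagged_lines(log_text, keywords=KEYWORDS):
        """Return (timestamp, source, message) for each log line containing a keyword."""
        hits = []
        for line in log_text.splitlines():
            match = LINE_RE.match(line)
            if not match:
                continue  # blank lines or lines in a different format
            message = match.group("message")
            if any(keyword in message for keyword in keywords):
                hits.append((float(match.group("time")), match.group("source"), message))
        return hits


    if __name__ == "__main__":
        import sys

        # Usage example: dmesg | python3 this_script.py
        for time, source, message in find_flagged_lines(sys.stdin.read()):
            print(f"{time:12.6f}  {source}: {message}")
    ```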

    BR,

    -RT