Linux/TMDSICE3359: Powerlink receive issue

Part Number: TMDSICE3359

Tool/software: Linux

Hi,

I am trying to integrate the Powerlink stack into the Openmac application and am running into a problem. I am currently working on the Linux MN, and some packets are not being received properly. While debugging, we found that the Rx interrupt is triggered even when no packets are being received. When I comment out the Powerlink task in my application, Rx and Tx work fine. When I add the Powerlink task back, the Rx interrupt is triggered even when no Ethernet cable is connected.

Below are the tasks in my application, with their priorities and scheduling policies (RR = round robin, FIFO = first in, first out):

Rx task – 95 (RR)
Tx task – 90 (RR)
Openmac task – 85 (RR)
Powerlink task – 85 (RR)
Kernel event task – 85 (FIFO)
User event task – 75 (FIFO)
Firmware load task – 15 (RR)
Debug task – 1 (RR)

The Powerlink layer uses a circular buffer to post any event that occurs in the layer; it serves as the mechanism for communication between the various tasks.

The following debugging steps were all carried out without any Ethernet cable connected:

1) Printing the status of the SRSR0 register before and after the memcpy to circular buffer.

intc base+280 before memcpy in writedata = 80

intc base+200 before memcpy in writedata = 80

intc base+280 after memcpy in writedata = 80

intc base+200 after memcpy in writedata = 200080

2) With the memcpy commented out, the interrupt was not triggered.

3) The circular buffer used for memcpy is obtained by the mmap( ) function.

pInstance_p->pCircBuf = mmap(NULL, *pSize_p, PROT_READ | PROT_WRITE, MAP_SHARED,
                             pArch->fd, pageSize);

4) devmem2 0x4A320200

Memory mapped at address 0xb6fda000.

Read at address 0x4A320200 (0xb6fda200): 0x00200080

This log was taken after wait_interrupt( ) in the RX ISR.

I think the value 200080 corresponds to TX, but I am seeing it in the RX ISR.
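For readers following the devmem2 output: the two addresses in each log line are related by page alignment. The tool maps the page that contains the physical address and then adds the offset within that page. A minimal sketch in C, assuming a 4 KiB page size; the helper name `split_phys_addr` and the struct are hypothetical and not part of devmem2.

```c
#include <stdint.h>

/* Hypothetical helper: split a physical address into the page-aligned base
 * that would be passed to mmap() on /dev/mem and the offset inside that page. */
typedef struct {
    uint32_t page_base;  /* physical address rounded down to a page boundary */
    uint32_t offset;     /* byte offset of the register within the page */
} phys_split_t;

static phys_split_t split_phys_addr(uint32_t phys, uint32_t page_size)
{
    phys_split_t s;
    /* page_size is assumed to be a power of two (4096 on this platform) */
    s.page_base = phys & ~(page_size - 1u);
    s.offset    = phys &  (page_size - 1u);
    return s;
}
```

With 0x4A320200 and a page size of 4096 this gives a base of 0x4A320000 and an offset of 0x200, which matches the 0xb6fda000 / 0xb6fda200 pair in the log above.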

Why am I seeing this behaviour? I look forward to your support at the earliest.

Regards

Paramesh

  • Hi,
    openpowerlink.sourceforge.net/.../
    The link above is the documentation of the openPOWERLINK stack that I am trying to integrate.
    One more piece of information: I am using RT Linux.



    Regards
    Paramesh
  • To give a better perspective, I would like to add some more information on what exactly I am trying to convey.

    I am running a Linux RT application on AM335x. The higher layer is openPOWERLINK and the lower layer is the Openmac driver, which is a userspace driver. I am using the UIO driver for memory mapping and interrupt mapping. The Rx interrupt number is 20. Ideally, when the Ethernet cable is connected and a packet is received, the firmware triggers the interrupt, which is mapped to the ARM side. In my case, however, the Rx interrupt is somehow triggered from the application and not by the firmware as expected. As I mentioned in my post above, when data is written into the circular buffer, the value is reflected in the SRSR0 register, even though the circular buffer and the register are mapped at different addresses. Is there any other way that a value could be written to the SRSR0 register? How can I debug this issue?



    Regards
    Paramesh
  • Your question is out of the scope of support of this forum. This forum supports only the Processor SDK provided by TI.
  • Hello Paramesh,

    Biser is right that we cannot support the full software stack or non-TI software in general. However, maybe we can help with some of the lower-level parts of your software.

    General questions:
    1) What version of Linux RT Processor SDK are you using?
    2) Tell us more about the firmware running on the PRU. E.g. is this prebuilt code provided by TI? If so, which code?
    3) Are you using any TI-provided code? (pru firmware, low level drivers, UIO mapping, etc?)
    4) Do you have any documentation or visuals for how your stack/application interacts, and how the software running on the ARM interacts with the firmware on the PRU? I'm trying to get a better picture of your system as a whole.

    Notes on PRU interrupts:
    SRSR0 shows the raw event status. Just because a system event is set to 1 here doesn't mean it is able to interrupt the host. SECR0 shows pending events only if the system event is enabled. The fact that SRSR0 = 0x00200080 while SECR0 = 0x00000080 makes me suspect that the event at bit 0x00200000 might not be able to generate an interrupt that triggers the RX ISR.

    5) Could you check ESR0 to see if 0x00200000 is even enabled?
    6) 0x00200000 corresponds to system event 21 in the PRU's INTC, not system event 20. When you say "Rx interrupt number is 20", are you referring to PRU's INTC system event 20, or does "Rx interrupt number is 20" refer to something else?
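To make the bit positions concrete, here is a minimal sketch in C that lists which system events are set in a 32-bit INTC status word. The function name `list_set_events` is made up for illustration and is not part of any TI driver; only the register values come from this thread.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative helper: write the numbers of the system events whose bits are
 * set in `reg` into `events`, lowest first. `first_event` is 0 for the "0"
 * registers (SRSR0/SECR0/ESR0) and 32 for the "1" registers.
 * Returns how many events were written (at most `max_events`). */
static size_t list_set_events(uint32_t reg, int first_event,
                              int *events, size_t max_events)
{
    size_t n = 0;
    for (int bit = 0; bit < 32 && n < max_events; bit++) {
        if (reg & (UINT32_C(1) << bit)) {
            events[n++] = first_event + bit;
        }
    }
    return n;
}
```

Applied to 0x00200080 this yields events 7 and 21, and applied to 0x00000080 only event 7. System event 20 would show up as 0x00100000.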

    Regards,
    Nick
  • Hi,

    1) The Linux RT Processor SDK version I am using is ti-processor-sdk-linux-rt-am335x-evm-04.00.00.04

    2) I am using the TI-provided firmware for the Openmac driver, which is the LLD.

    3) The PRU firmware is provided by TI. The LLD for RTOS was given by TI, and I am porting it to Linux RT, where it is a userspace driver. The UIO mapping template I am using is taken from the TI-provided ICSS EMAC. The standalone application on Linux RT works properly: the basic Rx and Tx functionality is verified, and its interaction with the firmware is also correct. Now I am trying to integrate the openPOWERLINK stack into the standalone application, and I am facing the issue of the Rx interrupt being triggered even when no packet is received.

    4) The application and the stack interact through the circular buffer that I mentioned in my first post. The interaction between the ARM and the PRU firmware is as follows:

    The buffer is in OCMC RAM, and the buffer descriptor is in the shared RAM, which is accessible to both the ARM and the PRU. There are two interrupts, one for Rx and the other for Tx.

    When a packet is received, the firmware raises the Rx interrupt, and when a packet is sent, it raises the Tx interrupt. The interrupt mapping obtained using UIO has been verified in the standalone application.

    5) The Powerlink layer uses a circular buffer, writes its data into it, and uses it as the mechanism for communication between the various tasks.

    Before memcpy:

    Base address   Offset   Value    Register
    INTC base      200      80       SRSR0
    INTC base      280      80       SECR0
    INTC base      300      f00080   ESR0

    After memcpy:

    Base address   Offset   Value    Register
    INTC base      200      200080   SRSR0
    INTC base      280      200080   SECR0
    INTC base      300      f00080   ESR0

    According to my understanding of the values obtained, events 20 and 21 are enabled.

    6) When I said "Rx interrupt number is 20", I am referring to PRU's INTC system event 20. Sorry for the lack of clarity.

    A further explanation of my issue is given below:

    The Rx interrupt is triggered with no packet being received when the Powerlink thread communicates with another thread by writing into the circular buffer.

    This behaviour is not tied to any specific destination or source address used for memcpy.

    Example 1 :-

    intc base+280 before memcpy in writedata = 80

    intc base+300 before memcpy in writedata = f00080

    intc base+200 before memcpy in writedata = 80

    Destination of memcpy in writedata= b665d13c

    Source of memcpy in writedata = b64fee10

    intc base+280 after memcpy in writedata = 200080

    intc base+300 after memcpy in writedata= f00080

    intc base+200 after memcpy in writedata = 200080

    Example 2 :-

    intc base+280 before memcpy in writedata = 80

    intc base+300 before memcpy in writedata = f00080

    intc base+200 before memcpy in writedata = 80

    Destination of memcpy in writedata= b658e13c

    Source of memcpy in writedata = b63fee10

    intc base+280 after memcpy in writedata = 200080

    intc base+300 after memcpy in writedata= f00080

    intc base+200 after memcpy in writedata = 200080

    When memcpy is performed, the value is copied correctly from source to destination, but in addition the value of the SRSR0 register is modified.

    Even though bit 21 is set in the SRSR0 register, the interrupt triggered is 20.

    I do not expect a firmware-driver communication issue, as that has already been verified with the standalone application.

    I suspect a UIO mapping issue. Is there any chance of the UIO mapping getting corrupted?

    I look forward to your support.

    Regards

    Paramesh

  • Hello Paramesh,

    Ok, so SECR0 is getting changed after a memcpy - in that case, it makes sense that the host is getting an interrupt when the flag for system event 21 goes high.

    It looks like the original EMAC LLD driver uses both interrupt 20 and 21 for RX, and both interrupts are mapped to the same ISR: Interrupt 20 is related to Port 0, while Interrupt 21 is related to Port 1.

    1) Are you still using both Port 0 and Port 1 for RX? (and interrupts 20 and 21 for the respective ports?)

    2) If not, was interrupt 21 associated with something else (like TX)? Could it still be mapped to the RX ISR, even if it was associated with something other than an RX on the port?

    3) "Even though the bit 21 is getting set in the SRSR0 register, interrupt triggered is 20." - What in your system is telling you the interrupt triggered is 20?

    4) What actions are taken when a memcpy occurs to the circular buffer? (Or what happens after other parts of the software learn that a memcpy has occurred?) Could any of those actions potentially be related to setting system event 21?

    Regards,

    Nick

  • Hi,

    1) I am using only Port0 for Rx and interrupt is 20.

    2) Interrupt 21 is associated with Tx of port 0.

    3) The program flow reaches the RX ISR (corresponding to interrupt 20). I verified this by placing a printf inside the Rx ISR, and its output appeared on the console.

    4) The circular buffer is used to communicate between the various threads in the application. When one thread writes data into the circular buffer, another thread reads this data and processes it. I have verified that the virtual address of the destination used for memcpy is different from those of the SRSR0 and ESR0 registers.

    Thanks

    Paramesh

  • Hello Paramesh,

    1) In the original code, interrupt 21 was mapped to the RX ISR. Since interrupt 21 is still generating an RX ISR in your current code, I'd investigate that mapping first and make sure it is correct in your code.

    2) It sounds like something that is either performing the write to the circular buffer or reading from the circular buffer is causing a TX event to occur. That would be the next thing I'd look into with the higher level code.

    Regards,
    Nick
  • Hi,

    1) I am using a different firmware, not the one used for ICSS EMAC. In this firmware, interrupt 20 is mapped to RX and interrupt 21 to TX. However, I am using the UIO mapping from ICSS EMAC.

    2) We have disabled the flag that tells the firmware to start transmission. The interrupt is still triggered, but the value is not reflected in the SRSR0 register.

        The console log is given below:

    tx0 after interrupt

    /dev/mem opened.

    Memory mapped at address 0xb6f27000.

    Read at address 0x4A320200 (0xb6f27200): 0x00000000

    Hence we suspect the mapping in the UIO driver. What could be the issue?

    Regards

    Paramesh

  • Hi,

    I would like to add some more details.

    I have disabled the code that tells the firmware to trigger the interrupt, but I can still see the print I placed in the RX ISR. The values in the interrupt registers are not updated:

    /dev/mem opened.

    Memory mapped at address 0xb6f56000.

    Read at address 0x4A320200 (0xb6f56200): 0x00000080

    /dev/mem opened.

    Memory mapped at address 0xb6f0b000.

    Read at address 0x4A320280 (0xb6f0b280): 0x00000080

    /dev/mem opened.

    Memory mapped at address 0xb6f74000.

    Read at address 0x4A320300 (0xb6f74300): 0x00F00080

    From this I conclude that the firmware is not triggering the interrupt.

    There is a memcpy in my application. The memcpy is not related to the interrupt, but the ISR is triggered incorrectly by the UIO.

    When the memcpy is disabled, the interrupt is no longer triggered incorrectly.

    Is there any possibility that the UIO driver would incorrectly trigger the ISR?

    I look forward to your support.

    Regards

    Paramesh

  • >>3) The circular buffer used for memcpy is obtained by the mmap( ) function.

    >>

    >>pInstance_p->pCircBuf = mmap(NULL, *pSize_p, PROT_READ | PROT_WRITE, MAP_SHARED,

    >>pArch->fd, pageSize);

    What is pArch->fd and pSize_p here?

    Also, can you please check the log of "cat /proc/interrupts | grep pru" to see if the interrupt statistics are as expected?

  • Checking in to see if this needs more support or if we should close the ticket.

    Regards,
    Nick
  • Paramesh, please confirm whether this is resolved. If not (which is my understanding from our last call), please update the thread and help us by replying to Hongmei's questions.

    thank you,
    Paula
  • Hi,

    I am Paramesh's colleague, and I will be following up on this issue.
    The values of pArch->fd and pSize_p in the mmap calls are as follows.

    pArch->fd=11,pSize_p=32768

    pArch->fd=12,pSize_p=32768

    pArch->fd=13,pSize_p=32768

    pArch->fd=14,pSize_p=32768

    pArch->fd=15,pSize_p=32768

    pArch->fd=16,pSize_p=8192

    pArch->fd=17,pSize_p=2048

    pArch->fd=18,pSize_p=2048

    pArch->fd=19,pSize_p=2048

    pArch->fd=20,pSize_p=2048

    pArch->fd=23,pSize_p=32768

    I will share the log regarding the interrupt statistics at the earliest.

    Regards,
    Akshay
  • Hi Akshay, are the interrupt statistics as expected?

    thank you,
    Paula
  • I am assuming this has been resolved. Please reply if the issue needs more attention.

    Regards,
    Nick
  • Hi,

    Apologies for the delay; I was debugging another critical issue in the project. This issue is not solved yet.

    I spent some time on the interrupt statistics, and the observations are given below. They look okay to me.

    175: 0 INTC 20 Level uio_pruss_evt0

    176: 1141 INTC 21 Level uio_pruss_evt1

    177: 0 INTC 22 Level uio_pruss_evt2

    178: 0 INTC 23 Level uio_pruss_evt3

    179: 0 INTC 24 Level uio_pruss_evt4

    180: 0 INTC 25 Level uio_pruss_evt5

    181: 0 INTC 26 Level uio_pruss_evt6

    182: 0 INTC 27 Level uio_pruss_evt7



    Another observation I would like to share: for the Tx interrupt, bit 21 (PRUICSS event 21) in the SECR0 register is set and cleared properly. For Rx, however, the ISR is called without bit 20 (PRUICSS event 20) of the SECR0 register being set.

    Do you see any other reason for this issue?

    Thank you,

    Akshay

  • Hello Akshay,

    I would be curious to see the actual mapping in your UIO code of PRU System Events (PRU user interrupts) to ICSS Host channels to ARM PRU interrupts.

    Regards,
    Nick
  • Hi Nick,

        I am attaching the source file with the mapping. PRU_ARM_EVENT1, PRU_ARM_EVENT2, PRU_ARM_EVENT3 and PRU_ARM_EVENT4 are the ones we are concerned with, and the code shows that they are mapped to PRU_EVTOUT0, PRU_EVTOUT1, PRU_EVTOUT2 and PRU_EVTOUT3 respectively. I hope this is the mapping you were asking about; otherwise please let me know.

    Regards,

    Akshay

    tiemac_pruss_intc_mapping.h

  • Hello Akshay,

    Could you post the file defining what INTC INITDATA your prussdrv_pruintc_init() function is expecting? In the am335x Linux SDK 4.3 I'm looking at, I found this definition:

    from 
    board-support/extra-drivers/uio-module-drv-2.2.1.0+gitAUTOINC+bda9260f22/test/prussdrv_test/include/prussdrv.h
    
    typedef struct __sysevt_to_channel_map {
        short sysevt;
        short channel;
    } tsysevt_to_channel_map;
    typedef struct __channel_to_host_map {
        short channel;
        short host;
    } tchannel_to_host_map;
    typedef struct __pruss_intc_initdata {
        //Enabled SYSEVTs - Range:0..63
        //{-1} indicates end of list
        char sysevts_enabled[NUM_PRU_SYS_EVTS];
        //SysEvt to Channel map. SYSEVTs - Range:0..63 Channels -Range: 0..9
        //{-1, -1} indicates end of list
        tsysevt_to_channel_map sysevt_to_channel_map[NUM_PRU_SYS_EVTS];
        //Channel to Host map.Channels -Range: 0..9  HOSTs - Range:0..9
        //{-1, -1} indicates end of list
        tchannel_to_host_map channel_to_host_map[NUM_PRU_CHANNELS];
        //10-bit mask - Enable Host0-Host9 {Host0/1:PRU0/1, Host2..9 : PRUEVT_OUT0..7)
        unsigned int host_enable_bitmask;
    } tpruss_intc_initdata;
    

    which expects 2 value structures rather than the 4 value structures you are passing in.

    Regards, 

    Nick

  • Hi Nick,

    Thanks for the support. I did the mapping using icss-emac as a reference and, as implemented there, passed PRUSS_INTC_INITDATA to the PRUICSS_pruIntcInit() function. I have not called prussdrv_pruintc_init() in my application.

    Thank you,
    Akshay
  • Hello Akshay,

    1) Revisiting the interrupt statistics: is the 1141 INTC 21 interrupts only from TX, or from both RX and TX? Can INTC 21 cause the TX ISR to be called, or does it only call the RX ISR?

    2) I am having trouble going into the PRUSS function that performs the mapping, so let's check the mapping at a register level. Please post the register values for ESR0, ESR1, ECR0, ECR1, CMR0 - CMR15, HMR0 - HMR2, HIER.

    Regards, 

    Nick

  • Hi Nick,

    INTC 21 is for the TX ISR.

    The following are the register values when the RX ISR is called.

    ESR 0 - 0x03F00080

    ESR 1 - 0x00600600

    ECR 0 - 0x03F00080

    ECR 1 - 0x00600600

    CMR 0 - 0x00000000

    CMR 1 - 0x01000000

    CMR 2 - 0x00000000

    CMR 3 - 0x00000000

    CMR 4 - 0x00000000

    CMR 5 - 0x05040302

    CMR 6 - 0x00000504

    CMR 7 - 0x00000000

    CMR 8 - 0x00000000

    CMR 9 - 0x00000000

    CMR 10 - 0x00000700

    CMR 11 - 0x00000000

    CMR 12 - 0x00000000

    CMR 13 - 0x00010800

    CMR 14 - 0x00000000

    CMR 15 - 0x00000000

    HMR 0 - 0x03020000

    HMR 1 - 0x08060504

    HMR 2 - 0x00000009

    HIER - 0x000003FF


    Thank you,
    Akshay

  • Hello Akshay,

    Your PRU INTC setup is not the issue. System events 20 and 21 are both enabled. Event 21 is the only event mapped to channel 3, which is the only channel that maps to host interrupt 3. Event 20 is the only event mapped to channel 2, which is the only channel that maps to host interrupt 2. Host interrupts 2 and 3 are both enabled.
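The reasoning above can be reproduced from the dumped registers. The sketch below assumes the PRU-ICSS INTC layout in which each CMR register packs four 8-bit fields, one per system event with the lowest event in the lowest byte, and each HMR register packs four channels the same way. The function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Channel that system event `sysevt` (0..63) is routed to, given CMR0..CMR15. */
static unsigned sysevt_to_channel(const uint32_t cmr[16], unsigned sysevt)
{
    return (cmr[sysevt / 4u] >> (8u * (sysevt % 4u))) & 0xFu;
}

/* Host interrupt that channel `channel` (0..9) is routed to, given HMR0..HMR2. */
static unsigned channel_to_host(const uint32_t hmr[3], unsigned channel)
{
    return (hmr[channel / 4u] >> (8u * (channel % 4u))) & 0xFu;
}

/* Returns 1 if system event `sysevt` is enabled, given ESR0 and ESR1. */
static int sysevt_enabled(uint32_t esr0, uint32_t esr1, unsigned sysevt)
{
    uint32_t esr = (sysevt < 32u) ? esr0 : esr1;
    return (int)((esr >> (sysevt % 32u)) & 1u);
}
```

With the posted values (CMR5 = 0x05040302, HMR0 = 0x03020000, ESR0 = 0x03F00080), event 20 decodes to channel 2 and host interrupt 2, and event 21 decodes to channel 3 and host interrupt 3, with both events enabled.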

    If the ARM side RX ISR is ONLY associated with ARM interrupt 20 PRU_ICSS_EVTOUT0 and with no other ARM interrupts, I would not expect the program to enter the RX ISR if system event 20 was not triggered.

    1) On the interrupt statistics: Paramesh's post here makes it sound like PRU system event 21 is triggered when the RX ISR is called. That is why I am curious if the 1141 INTC 21 interrupts you saw are only caused from TX, caused from both RX and TX, caused only from RX, or caused by an action that is neither RX nor TX, etc. 

    2) Does PRU INTC 21 cause both the TX ISR and the RX ISR to be called? Does it only call the RX ISR? Does it do something else? 

    3) Does PRU INTC 20 cause the RX ISR to be called (when it occurs)? Does it do something else?

    Regards, 

    Nick

  • Hi Nick,

    Thanks for the support. The interrupts are set up as shown below; "EdrvRx0InterruptHandler" and "EdrvTx0InterruptHandler" are the RX ISR and TX ISR respectively.

    Interrupt number Interrupt Service Routine 
    20 EdrvRx0InterruptHandler
    21 EdrvTx0InterruptHandler

    Since the application wants to send packets, the TX ISR is continuously triggered by the firmware, as expected. But the RX ISR is being called without the firmware triggering the interrupt.

    Thank you,

    Akshay

  • Hello Akshay,

    So to restate:

    * each and every single time EdrvTx0InterruptHandler is called, there is a firmware interrupt that called it, and that firmware interrupt was caused by PRU system event 21.

    * each and every single time EdrvRx0InterruptHandler is called, there is no firmware interrupt that called it. You have verified that no firmware interrupt is triggered (checking all firmware interrupt numbers) and are not just making an assumption.

    So you are saying Paramesh's post here is incorrect when he says "When memcpy is performed, ... bit 21 is getting set in the SRSR0 register, [but the] interrupt triggered is 20 [EdrvRx0InterruptHandler]".

    I am confused about the behavior you are observing. Could you describe the complete behavior of the issue again, as if this were the first post about it?

    Regards, 

    Nick

  • Hi Nick,

    Thanks for the support and sorry for the confusion. I am describing the complete behavior of the issue.

    We are porting Powerlink to the AM335x ICE v2, using openPOWERLINK stack version 2.6.1.

    The lower layer of the application, the MAC driver, is ported from RTOS to Linux. The UIO driver is used to make interrupts and memory available to user space. The ICSS-EMAC UIO code was used as a reference for this implementation.

    Event number   Eventout      ISR
    20             PRU_EVTOUT0   EdrvRx0InterruptHandler
    21             PRU_EVTOUT1   EdrvTx0InterruptHandler
    22             PRU_EVTOUT2   EdrvRx1InterruptHandler
    23             PRU_EVTOUT3   EdrvTx1InterruptHandler

    We have completed the porting, and when running the application we can see that the Rx ISR is called continuously. We confirmed that we are not receiving any packets, and no cable is connected to the board. We then checked the SECR0 register when the RX ISR is called, and bit 20 of the register is not set. So our understanding is that the RX ISR is being called without the interrupt being triggered.

    While the application is running, it periodically sends packets. The Tx interrupt is therefore triggered, and it works properly: when we checked the SECR0 register in the Tx ISR, bit 21 is set and cleared properly.

    Our issue is that the RX ISR is called without the interrupt being triggered.

    Thank you,

    Akshay

  • Hello Akshay,

    1) For your interrupt statistics, did you use cat /proc/interrupts? I would be surprised if the interrupt count did not increment, but the RX ISR was somehow still getting called. It would be a good idea to insert a printk or something similar into the RX ISR to verify that ISR is the one getting called if you have not already.

    2) I would be curious to see if the RX ISR continues to get called if the PRU gets powered off. That would help to isolate if it is coming from the PRU or the ARM system.

    Regards,
    Nick
  • Hi Nick,

    Thanks for the support.

    I used the Linux command "cat /proc/interrupts | grep pru" to get the interrupt statistics. I am attaching a screenshot of the log after running the application for some time. As you can see, the rx0 interrupt count shows 0, whereas the RX ISR was called 91 times ("inside rx interrupt handler" is printed from the RX ISR).

    On whether the issue is coming from the ARM or the PRU system: we are not connecting the LAN cable, and the INTC count is 0, so is it okay to assume that the interrupt is not being triggered?

    Without working firmware our application will stop, since we need feedback from the firmware to run it. We are working on a workaround for this, but it may take some time.

    Thank you,

    Akshay

  • Hello Akshay,

    We cannot know for sure while the PRU is running. However, your results above make me suspect that something on the ARM side is the culprit rather than the PRU.

    1) So far we have only talked about the RX ISR getting triggered on its own. Can you receive an RX packet in the PRU and trigger the RX ISR as expected?

    2) Are there any patterns to entering the RX ISR? e.g., when you look at the timestamps of when the ISR is entered, is the ISR getting called in a periodic fashion?

    3) I understand this is holding up progress. Does it actually affect the function of the code if the RX ISR gets called at an unexpected time? If so, how?

    TI does not support openmac, so we cannot help you if the issue is related to openmac. The Linux CPSW interface has a polling function which I might expect to call the RX ISR on its own. I do not know how the RTOS LLD driver works, but I would not be surprised if it implemented something similar.

    Regards,
    Nick
  • Hi Nick,

        Thanks for the support. When I send 1000 packets to my application, the interrupt statistics (using the cat /proc/interrupts | grep pru command) show 1000 interrupts being triggered, but the RX ISR is called 1054 times.

    I will respond to the other two queries as early as possible.

    Thank you,

    Akshay

  • Hello Akshay,

    As per our offline discussion, try

    1) Updating from Linux SDK 4.0 to the latest SDK (5.0) to see if the problem resolves itself
    2) Commenting out the RX portion of the PRU firmware to see if the ARM RX ISR is still getting called
    3) gathering more information about the ISR calls (e.g., timestamp to see if behavior is periodic, perhaps trying to check the link register to see what is calling the ISR, etc)

    Regards,
    Nick
  • Further thoughts on the points from my previous post:

    2) Also try running the ARM program without loading any PRU firmware. If it does not work, fine - but if the RX ISR somehow still gets triggered when there is nothing on the PRU, that definitively narrows down what could be causing the interrupt.

    3) Is there a way to distinguish between good and bad RX ISR calls? e.g., is there data waiting somewhere? Could we just insert an if statement, check if the call is good or bad, and exit the ISR if it is a bad RX ISR call?
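One way to sketch such a check, assuming the INTC register block is already mapped into user space and that SECR0 sits at offset 0x280 as in the register dumps earlier in this thread. The function name and the `intc` pointer are placeholders for illustration and are not part of the Openmac driver.

```c
#include <stdint.h>

#define INTC_SECR0_OFFSET 0x280u  /* offset taken from the dumps in this thread */
#define RX0_SYSEVT        20u     /* Rx system event number used in this thread */

/* Returns non-zero when the Rx system event is latched in SECR0, i.e. the
 * wake-up looks like a genuine firmware interrupt. `intc` points at the
 * start of the memory-mapped PRU INTC register block. */
static int rx_event_pending(const volatile uint32_t *intc)
{
    uint32_t secr0 = intc[INTC_SECR0_OFFSET / 4u];
    return (secr0 & (UINT32_C(1) << RX0_SYSEVT)) != 0;
}
```

At the top of the RX ISR, something like `if (!rx_event_pending(intc)) return;` would skip the unexpected calls. This only filters the symptom; it does not explain where the extra wake-ups come from.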

    Regards,
    Nick
  • Hi Nick,

    Thanks for supporting us. We have tried the things you suggested.

    1. Updated to sdk 5.0 and the issue still remains.
    2. Disabled the Rx part of the firmware and ran the application, the RX ISR is still getting called.
    3. Took the timestamp when RX ISR is getting called, we couldn't find any specific pattern.

    Furthermore, mmap_helper.c (armv7->linux->mmap_helper.c) is the file that contains our UIO functions. We debugged the wait_interrupt() function in mmap_helper.c and confirmed that the interrupt is triggered from the UIO driver.

    Thank you,
    Akshay

  • Hello Akshay,

    Facts from discussion: This issue only happens when the powerlink stack is running - it behaves as expected when openMAC is running without powerlink on top. Communication occurs in a 10ms cycle time at regular intervals. Connection drops if the AM335x does not send/receive packets in time.

    We suspect this may be a system design issue related to scheduling in RT Linux rather than a UIO issue.

    Things to try:

    1) Take a look at priority. Increase priority of openpowerlink ABOVE the kernel (so greater than 50). 51 would be typical. For FIFO scheduling, set openMAC to be higher priority than powerlink (52). Etc.

    2) Try increasing the cycle time from 10ms to 20ms, 30ms, etc. - even up to 100ms and beyond. If increasing the cycle time "fixes" the issue, it could be related to scheduling.

    3) Shut down tasks that are not needed but are taking up processing time. Run "top" to see what is taking up the most processing time, and shut down the ones that aren't needed - web browser, avahi daemon, etc. Do NOT shut down systemd. This will help to limit context switches, so the scheduler will return to the powerlink-related tasks more quickly.

    4) Try posting on openpowerlink forum - has anyone seen anything like this before?

    5) You can try writing a workaround into the RX ISR, where it checks the PRU INTC registers before taking any action. However, if the issue is related to priority/scheduling, then this might not fix the issue.
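For item 1, the policy and priority can also be set from inside the application when the threads are created. A minimal sketch using the POSIX thread attribute calls; the helper name `init_fifo_attr` is hypothetical, and creating a thread with these attributes normally requires root or CAP_SYS_NICE.

```c
#include <pthread.h>
#include <sched.h>

/* Prepare `attr` so that a thread created with it runs under SCHED_FIFO at
 * `priority` instead of inheriting the creator's policy.
 * Returns 0 on success or an errno-style code from the failing call. */
static int init_fifo_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param sp = { .sched_priority = priority };
    int rc;

    if ((rc = pthread_attr_init(attr)) != 0)
        return rc;
    /* Without this, the policy and priority set below are ignored. */
    if ((rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0)
        return rc;
    /* Set the policy first: some C libraries validate the priority against it. */
    if ((rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0)
        return rc;
    return pthread_attr_setschedparam(attr, &sp);
}
```

For example, `init_fifo_attr(&attr, 51)` for the powerlink thread and `init_fifo_attr(&attr, 52)` for the openMAC thread, with `&attr` then passed to pthread_create().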

    Outstanding questions: Does the powerlink stack make calls intending to poke the driver if it is not receiving things at an expected rate?

    Regards,
    Nick
  • Hi,
    Here are some steps for evaluating whether system latency is a contributor to the issue being experienced.

    - The TI Linux SDK boots a user space based on systemd. A few daemons are started automatically that may impact the overall system latency. Kill the avahi daemons: find them using the command "ps aux | grep avahi". You should see 3 PIDs listed; use kill -9 on the first 2 PIDs found. The third is your grep and will no longer be active.

    - To change the thread priority and scheduling policy of the powerlink and openmac applications, use the chrt command. This training video talks about RT Linux and using the chrt command:
    training.ti.com/linux_app_dev_using_linux_rt_sdk

    - If the 100ms cycle time does not improve the issue, please take a look at this "how to" video that demonstrates how to boot to a single shell, which is essentially no user space. The video describes a script that you will have to create. I am assuming that the two applications are launched independently. Remember to launch the openmac and powerlink applications in the background with a "&" at the end of the command line (ex. openmac & <cr> powerlink & <cr>); this way you will still have a command prompt.
    training.ti.com/boot-to-shell-sitara
    The script in the video is described here
    processors.wiki.ti.com/.../5x

    Please note that these are debug or problem isolation steps, they should be used for evaluating on how to improve the latency issue being experienced and not used directly as a problem resolution.

    Best Regards,
    Schuyler
  • Hi,

    Thank you Nick and Schuyler for supporting us. We tried the following things today.

    1. We tried priorities above the kernel; specifically, we tried openpowerlink = 51, openmac = 52 and openpowerlink = 84, openmac = 85.

        In both cases we see the same behavior.

    4. We have posted on the openpowerlink forum. Hopefully we will get a reply soon.

    5. We tried the workaround, but there is no change in the observed behavior.

    Thank you,

    Akshay

  • Hi,

    Were you able to set the scheduling policy to SCHED_FIFO?

    Best Regards,

    Schuyler

  • Hi Akshay, we have a few questions for you:

    - Could you confirm whether the Powerlink stack adds more interrupts? I think not, but I want to double check.

    - Do we have a better idea of how adding the Powerlink stack changes the behavior of the system?

    - Where is the Rx ISR located? Somewhere in user space?

    - Where is the Rx interrupt handler counter located? I guess it is "num_rx0_intr" in EdrvOpenMac.c?

    - I think we covered this before, but it is still not clear: why does the MN think the communication with the CN failed? Are there any warnings/errors? Is there any debug mode we can turn on/off?

    - Just for your information, on the UIO Tx side there was one more event that was multiplexed in the original implementation (if we recall correctly, it was related to the TTS implementation). We don't think what you are facing is due to that. However, because your interrupt numbers are different from our ICSS EMAC, we were wondering if you could use the same interrupt numbers as the EMAC driver example, just to confirm that what you are facing is not related to this Tx event multiplexing.

    In summary, a potential test would be to go back to the initial interrupt mapping and see if the problem still exists.

    - Could we add a conditional breakpoint in EdrvOpenMac.c where "wait interrupt" is called? The idea is to check whether we get there when there is a spurious ISR call. Our understanding is:

    - UIO select should not return if it does not receive an interrupt.
    - UIO should not return if there is no interrupt in /proc/interrupts.

    So we shouldn't hit this breakpoint when there is a spurious ISR call.
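The same check could also be done from user space without a debugger. A read() on a UIO device returns a 32-bit running count of interrupts, so comparing consecutive counts shows whether a wake-up corresponds to a new interrupt on that device. A minimal sketch, assuming the wait ultimately reads the event count from the UIO file descriptor; the helper names are hypothetical and are not taken from mmap_helper.c.

```c
#include <stdint.h>
#include <unistd.h>

/* Block until the UIO device behind `fd` signals an interrupt and store the
 * device's running interrupt count in `count`.
 * Returns 0 on success, -1 on a read error or a short read. */
static int uio_wait_count(int fd, uint32_t *count)
{
    ssize_t n = read(fd, count, sizeof(*count));
    return (n == (ssize_t)sizeof(*count)) ? 0 : -1;
}

/* Number of interrupts that arrived between two counts (wrap-around safe). */
static uint32_t uio_new_interrupts(uint32_t previous, uint32_t current)
{
    return current - previous;
}
```

Logging the count on every RX ISR entry and comparing it with the uio_pruss_evt0 line in /proc/interrupts may help show whether the extra ISR calls are accompanied by a change in the UIO event count.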

    Please let us know if you have any questions.

    Thank you,

    Paula

  • Hi Paula,

    As discussed in the call, we have added a workaround for the RX ISR issue. We will debug this further once the communication issue is solved.

    Thank you,

    Akshay

  • I am closing this thread for now. We can open another one later as needed.

    Regards,
    Nick