This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

System hangs during intensive communication between PRU and host system

Other Parts Discussed in Thread: 4430

Hello.

I'm a newcomer to PRU programming. I have built the PRU example PRU_RPMsg_Echo_Interrupt0 and loaded it onto Sitara PRU0. In the basic case this sample works when I manually run the commands "echo hello > /dev/rpmsg_pru30" and "cat /dev/rpmsg_pru30". After that I decided to write a small test program which sends data to the PRU and reads it back:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <signal.h>

#define DEVICE_PATH   "/dev/rpmsg_pru30"
#define GET_MSG_ID    0x4745544D
#define PRU_BUFF_SIZE 256

static volatile sig_atomic_t sig_stop;   /* program termination flag */

static void sign_term(int sig)
{
    if (sig == SIGTERM)
        sig_stop = 1;
}

int main(int argc, char **argv)
{
    char data_buff[513];
    uint32_t send_data = GET_MSG_ID;
    int ret;
    int start_write = 0;
    int total_recv_bytes = 0;

    signal(SIGTERM, sign_term);          /* install the termination handler */

    int fd = open(DEVICE_PATH, O_RDWR);
    FILE *data_file = fopen("balisa.bin", "wb+");

    if (fd >= 0 && data_file != NULL) {
        printf("Program started\n"); fflush(stdout);
        while (sig_stop == 0) {
            /* Request a message block from the PRU */
            ret = write(fd, &send_data, sizeof(send_data));
            if (ret != sizeof(send_data)) {
                printf("Failed to send data to PRU\n"); fflush(stdout);
            }
            /* Read the PRU's reply */
            ret = read(fd, data_buff, sizeof(data_buff));
            if (ret < PRU_BUFF_SIZE && start_write == 0)
                start_write = 1;
            if (ret == PRU_BUFF_SIZE && start_write == 1) {
                printf("RPMSG buffer overflow\n"); fflush(stdout);
            }
            if (start_write == 1 && ret > 0) {
                total_recv_bytes += ret;
                printf("Start writing %d. Total bytes %d\n", ret, total_recv_bytes);
                fflush(stdout);
                fwrite(data_buff, 1, ret, data_file);
                fflush(data_file);
            }
        }
    }
    printf("Program finished\n"); fflush(stdout);
    if (data_file != NULL)
        fclose(data_file);
    if (fd >= 0)
        close(fd);
    return 0;
}

For the first 15 seconds the program works fine, but then the system hangs.

Could you help me in this issue?

My board is a BeagleBone Black.

My Linux is based on 4.4.16.

The PRU compiler is from CCS 6.1.3.

Thank you

  • Hi Viktor,

    I suggest you add debug messages to localize which command the system hangs on.

    BR
    Tsvetolin Shulev
  • Hello.
    I've found the source of the problem: it's the rpmsg_pru kernel module.
    When I ran dmesg (before the hang) I saw:

    [ 376.850207] rpmsg_pru rpmsg0: Message length table is full
    [ 376.855721] rpmsg_pru rpmsg0: Message length table is full
    [ 376.861236] rpmsg_pru rpmsg0: Message length table is full
    [ 376.866750] rpmsg_pru rpmsg0: Message length table is full

    For me it's a little bit strange. Why does the message length table overflow? I write 4 bytes to the PRU and after that I read 4 bytes back.
  • Small update:

    I can write/read to the PRU (no matter which PRU number) for only 65535 iterations; after that the system hangs.

    I've written a small program which does 65530 write/read iterations to PRU1. After that I restarted the rpmsg_pru module (rmmod rpmsg_pru / modprobe rpmsg_pru). After 5 more iterations the hang occurred.

    It looks like an rpmsg_lib feature (or bug).

    [ 4430.145849] rpmsg_pru: adding data 13. FIFO avaible 8192
    [ 4430.152121] rpmsg_pru: reading data 512/0/13
    [ 4430.157793] rpmsg_pru: adding data 13. FIFO avaible 8192
    [ 4430.164080] rpmsg_pru: reading data 512/0/13
    [ 4553.689481] rpmsg_pru rpmsg0: new rpmsg_pru device: /dev/rpmsg_pru30
    [ 4553.705549] rpmsg_pru rpmsg1: new rpmsg_pru device: /dev/rpmsg_pru31
    [ 4569.013582] rpmsg_pru: adding data 13. FIFO avaible 8192
    [ 4569.020810] rpmsg_pru: reading data 512/0/13
    [ 4569.026119] rpmsg_pru: adding data 13. FIFO avaible 8192
    [ 4569.033236] rpmsg_pru: reading data 512/0/13
    [ 4569.038653] rpmsg_pru: adding data 13. FIFO avaible 8192
    [ 4569.045458] rpmsg_pru: reading data 512/0/13
    [ 4569.050665] rpmsg_pru: adding data 13. FIFO avaible 8192
    [ 4569.057682] rpmsg_pru: reading data 512/0/13
    [ 4569.062924] rpmsg_pru: adding data 13. FIFO avaible 8192
    [ 4569.069613] rpmsg_pru: reading data 512/0/13
    [ 4569.074875] rpmsg_pru: adding data 13. FIFO avaible 8192
    [ 4569.080228] rpmsg_pru: adding data 13. FIFO avaible 8179
    [ 4569.085571] rpmsg_pru: adding data 13. FIFO avaible 8166
    [ 4569.090912] rpmsg_pru: adding data 13. FIFO avaible 8153
    [ 4569.096252] rpmsg_pru: adding data 13. FIFO avaible 8140
    [ 4569.101593] rpmsg_pru: adding data 13. FIFO avaible 8127
    [ 4569.106934] rpmsg_pru: adding data 13. FIFO avaible 8114
    [ 4569.112276] rpmsg_pru: adding data 13. FIFO avaible 8101
    [ 4569.117616] rpmsg_pru: adding data 13. FIFO avaible 8088
    [ 4569.122956] rpmsg_pru: adding data 13. FIFO avaible 8075
    [ 4569.128297] rpmsg_pru: adding data 13. FIFO avaible 8062
    [ 4569.133638] rpmsg_pru: adding data 13. FIFO avaible 8049
    [ 4569.138977] rpmsg_pru: adding data 13. FIFO avaible 8036
    [ 4569.144318] rpmsg_pru: adding data 13. FIFO avaible 8023

  • Small update again:
    When the hang happened, the log file contained the following messages:

    [ 811.057315] rpmsg_pru: reading data 512/0/13
    [ 811.057722] rpmsg_pru: reading data 512/0/13
    [ 811.060441] NOHZ: local_softirq_pending 102
    [ 811.060964] NOHZ: local_softirq_pending 102
    [ 811.064861] NOHZ: local_softirq_pending 102
    [ 811.497044] rpmsg_pru: adding data 13. FIFO avaible 7542
    [ 811.502388] rpmsg_pru: adding data 13. FIFO avaible 7529
    [ 811.507729] rpmsg_pru: adding data 13. FIFO avaible 7516
    [ 811.513069] rpmsg_pru: adding data 13. FIFO avaible 7503
    [ 811.518409] rpmsg_pru: adding data 13. FIFO avaible 7490
    [ 811.523749] rpmsg_pru: adding data 13. FIFO avaible 7477
    [ 811.529089] rpmsg_pru: adding data 13. FIFO avaible 7464
    [ 811.534429] rpmsg_pru: adding data 13. FIFO avaible 7451
    [ 811.539769] rpmsg_pru: adding data 13. FIFO avaible 7438
    [ 811.545111] rpmsg_pru: adding data 13. FIFO avaible 7425
    [ 811.550451] rpmsg_pru: adding data 13. FIFO avaible 7412
    [ 811.555791] rpmsg_pru: adding data 13. FIFO avaible 7399
    [ 811.561132] rpmsg_pru: adding data 13. FIFO avaible 7386
    [ 811.566473] rpmsg_pru: adding data 13. FIFO avaible 7373

    For me:
    [ 811.060441] NOHZ: local_softirq_pending 102
    [ 811.060964] NOHZ: local_softirq_pending 102

    look a little bit strange.


    You can easily reproduce the problem by changing NUM_MESSAGES from 100 to 65536 or more (in PRU_RPMsg_Echo_Interrupt0 or PRU_RPMsg_Echo_Interrupt1). It looks like a bug in rpmsg_lib (not in the kernel space driver).

    PS. Maybe someone can answer me. It looks like a serious problem...
  • Viktor,

    Can you run the user space program provided in Lab 5 of the PRU Software Support Package and see if the issue still occurs? git.ti.com/.../rpmsg_pru_user_space_echo.c

    I have changed NUM_MESSAGES to 65536 and can run the program multiple times without any lockup occurring.

    I am using the Linux Processor SDK v3.0.0.4 along with the out of box PRU_RPMsg_Echo_Interrupt firmwares provided in /lib/firmware/pru/ along with the modified lab_5 user space program mentioned above.

    Jason Reeder
  • Hi Jason.

    Thank you for your answer.

    Yes, I'm using program from Lab 5.

    For building my rootfs I'm using buildroot-2016.02 and kernel 4.4.19 from https://github.com/beagleboard/linux.git

    My cross-compiler is Linaro ARM 2015.11. I don't use the SDK.

    Maybe the problem is with my compiler?

    I'm not the first person to have this issue. For example, in this topic they have the same issue: groups.google.com/.../

    Best regards,

    Viktor.

  • Jason,

    Small question:
    Do you use ti-processor-sdk-linux-am335x-evm-03.00.00.04-Linux-x86-Install.bin or ti-processor-sdk-linux-rt-am335x-evm-03.00.00.04-Linux-x86-Install.bin?
    Maybe the problem is in the kernel from github.com/.../linux.git?

    Thank you.
  • Viktor,

    Where did you add the prints that you posted above? I am assuming that your 'rpmsg_pru: reading data' print is in the rpmsg_pru_read function and that your 'rpmsg_pru: adding data' print is in the rpmsg_pru_cb function, correct?

    The 'Message length table is full' error occurs when the kernel FIFO contains 32 outstanding messages. That means the PRU has sent 32 messages to the ARM core without the ARM core reading them off of the kernel FIFO.

    Here's the order of what happens on a write (then echo) from user space:
    user space writes to rpmsg_pruXX char device -> rpmsg_pru_write function is called -> rpmsg_send is used by the kernel to send the message from ARM to PRU -> PRU receives the message -> PRU echoes the message back to the ARM core using rpmsg -> the rpmsg_pru_cb function occurs when the ARM receives the message -> if the kernel FIFO is not full or does not contain 32 messages, then the kernel driver places the message on the kernel FIFO.

    When you read from char device in user space:
    if the kernel FIFO is not empty then the message is taken from the kernel FIFO and copied to user space

    It appears like at some point the program stops reading from the character device in user space which allows the kernel FIFO to fill up with 32 messages. Would you agree?
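
    That bookkeeping can be modeled with a short standalone C sketch (a toy model of the driver's behavior as described above, not the actual rpmsg_pru source): the driver tracks at most 32 outstanding message lengths, so once user space stops draining with read(), the 33rd arriving message is rejected with the 'Message length table is full' error.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Toy model (not the real driver): rpmsg_pru keeps a table of up to
     * 32 pending message lengths. The rpmsg callback (PRU -> ARM) adds an
     * entry; a user-space read() consumes one. */
    #define MAX_FIFO_MSGS 32

    static unsigned short msg_len[MAX_FIFO_MSGS];
    static int msg_count;

    /* Message arrives from the PRU: returns -1 (dropped) when full. */
    static int cb_add(unsigned short len)
    {
        if (msg_count == MAX_FIFO_MSGS) {
            printf("rpmsg_pru: Message length table is full\n");
            return -1;
        }
        msg_len[msg_count++] = len;
        return 0;
    }

    /* User-space read(): consume the oldest entry, or -1 if none pending. */
    static int read_remove(void)
    {
        int len;
        if (msg_count == 0)
            return -1;
        len = msg_len[0];
        msg_count--;
        memmove(msg_len, msg_len + 1, msg_count * sizeof(msg_len[0]));
        return len;
    }

    int main(void)
    {
        int i, dropped = 0;

        /* Normal echo traffic: every message is read back, so the
         * table never holds more than one entry. */
        for (i = 0; i < 100; i++) {
            cb_add(13);
            read_remove();
        }

        /* Failure mode: the PRU floods echoes while user space stops
         * reading; everything past the 32nd unread message is dropped. */
        for (i = 0; i < 40; i++)
            if (cb_add(13) < 0)
                dropped++;

        printf("dropped %d of 40 unread messages\n", dropped);
        return 0;
    }
    ```

    In the failure mode the first 32 unsolicited echoes fill the table and the remaining 8 are dropped, matching the repeated error lines in dmesg.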

    Jason Reeder
  • I am testing with ti-processor-sdk-linux-am335x-evm-03.00.00.04-Linux-x86-Install.bin.
    Yes, you are right.
    I've spent a lot of time researching the source code and came to the same conclusion as you.
    Here is the source code of my test program based on lab_5:

    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <string.h>
    #include <sys/poll.h>

    #define MAX_BUFFER_SIZE 512
    char readBuf[MAX_BUFFER_SIZE];

    #define NUM_MESSAGES 65530
    #define DEVICE_NAME "/dev/rpmsg_pru31"

    int main(void)
    {
        struct pollfd pollfds[1];
        int i;
        int result = 0;

        /* Open the rpmsg_pru character device file */
        pollfds[0].fd = open(DEVICE_NAME, O_RDWR);

        /*
         * If the RPMsg channel doesn't exist yet the character device
         * won't either.
         * Make sure the PRU firmware is loaded and that the rpmsg_pru
         * module is inserted.
         */
        if (pollfds[0].fd < 0) {
            printf("Failed to open %s\n", DEVICE_NAME);
            return -1;
        }

        /* The RPMsg channel exists and the character device is opened */
        printf("Opened %s, sending %d messages\n\n", DEVICE_NAME, NUM_MESSAGES);

        for (i = 0; i < NUM_MESSAGES; i++) {
            /* Send 'hello world!' to the PRU through the RPMsg channel */
            result = write(pollfds[0].fd, "hello world!", 13);
            if (result > 0)
                printf("Message %d: Sent to PRU\n", i);

            /* Poll until we receive a message from the PRU and then print it */
            result = read(pollfds[0].fd, readBuf, MAX_BUFFER_SIZE);
            if (result > 0)
                printf("Message %d received from PRU:%s\n\n", i, readBuf);
        }

        /* Received all the messages the example is complete */
        printf("Received %d messages, closing %s\n", NUM_MESSAGES, DEVICE_NAME);

        /* Close the rpmsg_pru character device file */
        close(pollfds[0].fd);

        return 0;
    }

    On the first run of the program everything is fine. But when I launch the program a second time, after 5 iterations my system runs slower and slower. Terminating the user-space program doesn't help: I get more and more messages in dmesg. It looks like something keeps sending data to the PRU without anything reading it. I understand it's strange, but it's true.

    Best regards,
    Viktor.
  • Hello.

    When I launch top at the moment of the hang I see the following picture:

    Mem: 102740K used, 400444K free, 80K shrd, 4600K buff, 11544K cached
    CPU:   0% usr  95% sys   0% nic   4% idle   0% io   0% irq   0% sirq
    Load average: 2.82 1.03 0.37 1/99 236
      PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
      181     2 root     RW       0   0%  94% [irq/201-remotep]
      228   210 root     S     6916   1%   0% sshd: root@pts/1
      236   230 root     R     2420   0%   0% top
        4     2 root     RW       0   0%   0% [ktimersoftd/0]
        3     2 root     SW       0   0%   0% [ksoftirqd/0]
      220   210 root     S     6916   1%   0% sshd: root@pts/0
      205     1 root     S     6504   1%   0% /usr/sbin/ntpd -g
      210     1 root     S     4356   1%   0% /usr/sbin/sshd
      145     1 root     S     3904   1%   0% /sbin/udevd -d
      222   220 root     S

    For me, 181     2 root     RW       0   0%  94% [irq/201-remotep] is abnormal.

    IRQ 201 is:

    # cat /proc/interrupts  | grep 201
    201:      17983  4a320000.intc  18 Edge      remoteproc2
    #

    What do you think about it?

  • Hi Jason,

    I should say that the problem is inside the 4.4.19 kernel which I use, in the remoteproc subsystem (that subsystem produces the [irq/201-remotep] thread). I've compared the drivers/remoteproc folders between the 4.4.12 kernel (which you use in the SDK) and the 4.4.19 kernel and found a lot of changes.

    Could you build the 4.4.19 kernel from github.com/.../linux.git? I'm sure you will reproduce this behavior on the 4.4.19 kernel.

    Best regards,
    Viktor.
  • Viktor,

    I've figured out what is causing this issue.

    The last_avail_idx property of the pru_virtqueue structure (in the include/pru_virtqueue.h file) should be an unsigned 16 bit integer (instead of uint32_t). This property type must match the vring_avail.idx property in the include/pru_virtio_ring.h file because these two properties are both free running counters that are compared to each other in order to determine if there is a new message for the PRU or not.

    In our error case, the vring_avail.idx property was incremented 65536 times at which point it rolled over to 0. The last_avail_idx (used by the rpmsg library) did not roll over at the same time because it was 32 bits. This caused the two indexes to become mismatched, even after the PRU had read all available messages. So, the PRU got stuck in the while loop of the main method in the Echo examples and it thought that there was always a message waiting for it, which it immediately echoed back to the ARM core.

    All of these unsolicited messages brought the ARM core to a crawl.
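
    The rollover mismatch is easy to demonstrate in isolation. The sketch below (standalone C, not the PRU firmware itself) mimics the "new message available" comparison described above: the kernel-side vring_avail.idx is a free-running 16-bit counter, while the buggy last_avail_idx was 32 bits wide. After 65536 messages the 16-bit counter wraps back to 0, the 32-bit one does not, and the inequality test reports a waiting message forever.

    ```c
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t avail_idx = 0;       /* kernel side: vring_avail.idx (16-bit) */
        uint32_t last_avail_idx = 0;  /* buggy PRU-side shadow copy (32-bit) */
        long i;

        /* Both counters advance in lockstep for 65536 processed messages. */
        for (i = 0; i < 65536; i++) {
            avail_idx++;        /* wraps back to 0 after message 65535 */
            last_avail_idx++;   /* keeps counting up to 65536 */
        }

        /* The rpmsg library's "is there a new message?" comparison: */
        printf("avail_idx=%u last_avail_idx=%u spurious_message=%d\n",
               (unsigned)avail_idx, (unsigned)last_avail_idx,
               avail_idx != last_avail_idx);

        /* With the fix (last_avail_idx declared uint16_t) both counters
         * wrap together and the comparison stays correct forever. */
        return 0;
    }
    ```

    This prints avail_idx=0 and last_avail_idx=65536 with the mismatch flag set, which is exactly the state that keeps the PRU echoing unsolicited messages after 65536 iterations.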

    Thank you for pointing out this issue and helping to resolve it. I have posted the fix to the pru-software-support-package repository here: https://git.ti.com/pru-software-support-package/pru-software-support-package/commit/d5add00c73293f5ae6f5eef304ce97f93d180322

    Jason Reeder