Linux/AM3354: slow SPI access

user5842274

Part Number: AM3354

Tool/software: Linux

I am using ti-processor-sdk-linux-rt-am335x-evm-05.00.00.15 on an 800MHz AM3354, for a custom board. On my board, I connect the AM3354 to a STM32 MCU via SPI bus. AM3354 acts as the master. There is no other SPI devices on the SPI master. When the MCU need to send data to AM3354, it triggers an interrupt.
On the AM3354 side, I use an interrupt thread to access the SPI bus synchronuously. As the first several read-in data contains the length of the total packet, I split the read MCU procedure to two part, first read-in the part contains the length, then read the total packet according to the length.
The problem is the access of SPI is too slow. I have tried to increase the irq thread, spi0 thread and the application thread priority to SCHED_FIFO, 99. Please see the following picture:

1. From the picture, you can see first the WRITE signal goes low. It is corresponding to the source line gpio_set_value(p_drvdata->boot0, 0) in function ruifu_mcu_controller_write;
2. then the CS is activated, but the delay is also huge, over 200us.
3. Then after the actual write, CS goes high. MCU handles the command and drive the INT low very quickly. But at this time, the system call write(kernel space ruifu_mcu_controller_write) didn't finish yet.
4. After about 100us, the WRITE line goes high, it corresponding to the line gpio_set_value(p_drvdata->boot0, 1) in ruifu_mcu_controller_write. At this point, the user space application finished the command sending to MCU. But obviously, MCU already finished handling the command and is waiting for the following SPI read.
5. After about 900us, AM3354 begin to read the MCU response in the irq thread. It start to read the first part, which contains the length info of the total packet.
6. After another 200us, the remaining part of the total packet are read out.

driver code:

static ssize_t
ruifu_mcu_controller_sync(struct ruifu_mcu_controller_data *p_drvdata, struct spi_message *message)
{
    int status;
    struct spi_device *spi;

    spin_lock_irq(&p_drvdata->spi_lock);
    spi = p_drvdata->spi;
    spin_unlock_irq(&p_drvdata->spi_lock);

    if (spi == NULL)
        status = -ESHUTDOWN;
    else
        status = spi_sync(spi, message);

    if (status == 0)
        status = message->actual_length;

    return status;
}

static inline ssize_t
ruifu_mcu_controller_sync_write(struct ruifu_mcu_controller_data *p_drvdata, size_t len)
{
    struct spi_transfer t;
    struct spi_message  m;

    memset(&t, 0, sizeof(struct spi_transfer));
    t.tx_buf = p_drvdata->tx_buffer;
    t.len       = len;
    t.speed_hz  = p_drvdata->speed_hz;

    spi_message_init(&m);
    spi_message_add_tail(&t, &m);
    return ruifu_mcu_controller_sync(p_drvdata, &m);
}

static ssize_t
ruifu_mcu_controller_write(struct file *filp, const char __user *buf,
        size_t count, loff_t *f_pos)
{
    struct ruifu_mcu_controller_data *p_drvdata;
    ssize_t         status = 0;
    unsigned long       missing;

    /* chipselect only toggles at start or end of operation */
    if (count > bufsiz)
        return -EMSGSIZE;

    p_drvdata = filp->private_data;

    missing = copy_from_user(p_drvdata->tx_buffer, buf, count);
    if(missing)
        status = -EFAULT;
    else{
        mutex_lock(&p_drvdata->buf_lock);
        gpio_set_value(p_drvdata->boot0, 0);
        status = ruifu_mcu_controller_sync_write(p_drvdata, count);
        gpio_set_value(p_drvdata->boot0, 1);
    }
    mutex_unlock(&p_drvdata->buf_lock);

    return status;
}

const u16 CONST_MCU_PACKET_HEADER_RESERVE   =     2;
static irqreturn_t ruifu_mcu_controller_irq(int irq, void *handle)
{
    struct ruifu_mcu_controller_data *p_drvdata = handle;
    struct spi_transfer x[2];
    struct spi_message  m;
    u16    readsize = CONST_MCU_PACKET_HEADER_LEN-CONST_MCU_PACKET_HEADER_RESERVE;
    u16    tx_size, transferred_size;
    u16    remaining_bufsize;
    int i, j;
    int read_error = 0;

#ifdef RUIFU_MCU_CONTROLLER_DEBUG
    dev_notice(&p_drvdata->spi->dev, "interrupt triggered\n");
#endif

    spi_message_init(&m);
    mutex_lock(&p_drvdata->buf_lock);
    /* iterate twice, first for header, then for the payload */
    for(i=0; i<2; i++)
    {
        /* first make sure there is enough buffer remained */
        if(p_drvdata->head >= p_drvdata->tail) {
            remaining_bufsize = bufsiz -1 - (p_drvdata->head - p_drvdata->tail);
        }else{
            remaining_bufsize = bufsiz -1 + p_drvdata->head - p_drvdata->tail;
        }
        if(readsize>(bufsiz/2))
        {
            dev_err(&p_drvdata->spi->dev, "device request to read too many bytes: %d, rx buffer overflow\n", readsize);
            readsize = bufsiz/2;
        }
        if(remaining_bufsize < readsize)
        {   
            dev_err(&p_drvdata->spi->dev, "rx buffer overflow\n");
        }
        tx_size = readsize;
        if((p_drvdata->head+readsize) > bufsiz)
        {
            /* we need two separate transfers */
            tx_size = bufsiz - p_drvdata->head;
        }
        memset(&x[0], 0, sizeof(struct spi_transfer));
        x[0].rx_buf = p_drvdata->rx_buffer+p_drvdata->head;
        x[0].len = tx_size;
        x[0].speed_hz= p_drvdata->speed_hz;
        if(tx_size != readsize)
        {
            memset(&x[1], 0, sizeof(struct spi_transfer));
            x[1].rx_buf = p_drvdata->rx_buffer;
            x[1].len = readsize - tx_size;
            x[1].speed_hz= p_drvdata->speed_hz;
            spi_message_add_tail(&x[0], &m);
            if(i==0){
                x[1].cs_change = 1;
            }
            spi_message_add_tail(&x[1], &m);
        }else{
            if(i==0){
                x[0].cs_change = 1;
            }
            spi_message_add_tail(&x[0], &m);
        }
        transferred_size = ruifu_mcu_controller_sync(p_drvdata, &m);
        if(transferred_size != readsize)
        {
            dev_err(&p_drvdata->spi->dev, "request rx size: %d, actual size: %d\n", readsize, transferred_size);
        }
#ifdef RUIFU_MCU_CONTROLLER_DEBUG
        dev_notice(&p_drvdata->spi->dev, "read:");
        for(j=0; j<readsize; j++)
        {
            pr_notice("0x%02x", p_drvdata->rx_buffer[(p_drvdata->head+j)%bufsiz]);
        }
#endif
        if(i==0){
            /* calculate the length for the next packet */
            tx_size = p_drvdata->rx_buffer[(p_drvdata->head+CONST_MCU_PACKET_LENGTH_POS)%bufsiz];
            tx_size = (tx_size<<8)|p_drvdata->rx_buffer[(p_drvdata->head+CONST_MCU_PACKET_LENGTH_POS+1)%bufsiz];
            if(tx_size<CONST_MCU_PACKET_LENGTH_REDUCTION){
                dev_err(&p_drvdata->spi->dev, "protocol length voilation detected, packet length error\n");
                readsize = CONST_MCU_PACKET_HEADER_RESERVE;
                read_error = 1;
            }else{
                p_drvdata->head = (p_drvdata->head+readsize)%bufsiz;
                readsize = (tx_size-CONST_MCU_PACKET_LENGTH_REDUCTION)*2+CONST_MCU_PACKET_HEADER_RESERVE;
            }
            spi_message_init(&m);
        }else{
            if(!read_error){
                p_drvdata->head = (p_drvdata->head+readsize)%bufsiz;
            }
        }
    }
    mutex_unlock(&p_drvdata->buf_lock);
    if(!read_error){
        wake_up_interruptible(&p_drvdata->wait);
    }
    return IRQ_HANDLED;
}

My questions are:

1. The delay from step 1 to 2 is too large. Is there any way to reduce it?
2. The delay from 3 to 4 seems so large. Is there any way to reduce it?
3. It seems the interrupt is not serviced in time. The delay from 4 to 5 seems so large. This is the most severe problem I think.
4. How could I improve the two consecutive SPI message transfer? It is the delay from step 5 to 6, corresponding to the two call to ruifu_mcu_controller_sync in ruifu_mcu_controller_irq

I didn't start any other user APP, and from the command top output. The system is not busy.
I found someone use PRU to do the SPI transfer, like github.com/.../wiki. Is that applicable to AM3354? And is there some easier way to improve my case? I feel the PRU method overkill for my case.

over 6 years ago

0 RonB over 6 years ago

TI__Mastermind 30576 points

Thanks for your post. A few clarifying questions:

1. Where do your timing expectations come from? Is the scenario above typical, or worst case?
2. Could you please post/attach your DTS file, specifically your SPI DT nodes?

I'm sorry, but I can't review the code specifically. I'll be happy to take a look at the DT config and see if I can make any suggestions.

The AM3354 does not have PRU cores. However, if you chose to move to a device with PRU cores, the PRU would be the absolute lowest latency way to manage SPI communication and should be very similar to other MCUs.

I hope this is helpful.

0 user5842274 over 6 years ago in reply to RonB

Prodigy 90 points

For your question:

1. This is the typical scenario. I managed to capture the timing sequence several time, they are quite similar. I don't know the typical SPI timing about the Linux Rt running on AM335x. Do you think the sequence are normal?

2. My dts files are attached. I saw in a post that removing DMA names will improve the SPI timing. So I removed those DMA names in SPI node, as my SPI packet will not be large enough to trigger a DMA.

dt_files.zip

0 Krunal Bhargav34 over 6 years ago in reply to user5842274

TI__Mastermind 44735 points

Hello,

I am wondering if you have tried running the cyclictest for the system's latencies because it could answer some of your delay questions.

Regards,
Krunal

0 user5842274 over 6 years ago in reply to Krunal Bhargav34

Prodigy 90 points

Hi,

I just tried to use the tool with command " cyclictest -n -p 90 -i 1000". and Here is the result (listed only the last several lines.

===============================================

T: 0 ( 798) P:90 I:1000 C:1643990 Min: 34 Act: 64 Avg: 48 Max: 7541
policy: fifo: loadavg: 0.02 0.09 0.18 1/93 969

T: 0 ( 798) P:90 I:1000 C:1644001 Min: 34 Act: 46 Avg: 48 Max: 7541
policy: fifo: loadavg: 0.02 0.09 0.18 1/93 969

T: 0 ( 798) P:90 I:1000 C:1644012 Min: 34 Act: 49 Avg: 48 Max: 7541
policy: fifo: loadavg: 0.02 0.09 0.18 1/93 969

===============================================

If I understands it well, the too usest use priority FIFO 90 for the testing. From the ps result, it seems only the [posixcputmr/0] has a higher priority.

===============================================

root@am335x-evm:~# ps -eLo pid,cls,rtprio,pri,nice,cmd
PID CLS RTPRIO PRI NI CMD
1 TS - 19 0 /sbin/init
2 TS - 19 0 [kthreadd]
5 TS - 19 0 [kworker/u2:0]
6 TS - 39 -20 [mm_percpu_wq]
7 TS - 19 0 [ksoftirqd/0]
8 FF 1 41 - [ktimersoftd/0]
9 FF 1 41 - [rcu_preempt]
10 FF 1 41 - [rcu_sched]
11 FF 1 41 - [rcub/0]
12 FF 1 41 - [rcuc/0]
13 TS - 19 0 [kswork]
14 FF 99 139 - [posixcputmr/0]
15 TS - 19 0 [kdevtmpfs]
16 TS - 39 -20 [netns]
18 FF 50 90 - [irq/26-44e07000]
19 FF 50 90 - [irq/27-4804c000]
20 FF 50 90 - [irq/28-481ac000]
21 FF 50 90 - [irq/29-481ae000]
22 TS - 19 0 [khungtaskd]
23 TS - 19 0 [oom_reaper]
24 TS - 39 -20 [writeback]
25 TS - 19 0 [kcompactd0]
26 TS - 39 -20 [crypto]
27 TS - 39 -20 [kblockd]
28 FF 50 90 - [irq/20-49000000]
29 FF 50 90 - [irq/22-49000000]
30 TS - 39 -20 [edac-poller]
31 TS - 39 -20 [watchdogd]
32 TS - 39 -20 [rpciod]
33 TS - 39 -20 [xprtiod]
34 TS - 19 0 [kswapd0]
35 TS - 39 -20 [nfsiod]
47 TS - 39 -20 [kpcitest]
48 FF 50 90 - [irq/52-48310000]
49 FF 50 90 - [irq/44-ruifu_mc]
50 FF 50 90 - [irq/45-ruifu_sa]
53 FF 50 90 - [irq/47-4a100000]
54 FF 50 90 - [irq/48-4a100000]
55 FF 50 90 - [irq/33-mmc0]
56 FF 50 90 - [irq/53-48060000]
57 TS - 19 0 [mmcqd/0]
58 FF 50 90 - [irq/34-mmc1]
60 TS - 19 0 [mmcqd/1]
61 TS - 19 0 [mmcqd/1boot0]
62 TS - 19 0 [mmcqd/1boot1]
63 TS - 19 0 [mmcqd/1rpmb]
64 TS - 39 -20 [ipv6_addrconf]
65 FF 50 90 - [irq/32-44e0b000]
67 TS - 39 -20 [kworker/0:1H]
68 TS - 19 0 [jbd2/mmcblk0p2-]
69 TS - 39 -20 [ext4-rsv-conver]
70 TS - 19 0 [kworker/u2:2]
84 TS - 19 0 /lib/systemd/systemd-journald
100 TS - 19 0 /lib/systemd/systemd-udevd
106 TS - 19 0 /lib/systemd/systemd-timesyncd
106 TS - 19 0 /lib/systemd/systemd-timesyncd
173 TS - 39 -20 [kworker/0:2H]
283 TS - 19 0 /lib/systemd/systemd-networkd
348 TS - 19 0 /sbin/syslogd -n
376 TS - 19 0 /usr/sbin/rngd -f -r /dev/urandom
394 TS - 19 0 /usr/sbin/rpcbind
402 TS - 19 0 /sbin/klogd -n
411 TS - 19 0 /usr/sbin/atd -f
423 TS - 19 0 /usr/bin/dbus-daemon --system --address=systemd: --nofo
433 TS - 19 0 [jbd2/mmcblk1p2-]
434 TS - 39 -20 [ext4-rsv-conver]
473 TS - 19 0 /usr/sbin/crond -n
479 TS - 19 0 avahi-daemon: running [am335x-evm.local]
494 TS - 19 0 avahi-daemon: chroot helper
501 TS - 19 0 [ptp0]
652 TS - 19 0 /lib/systemd/systemd-logind
662 TS - 19 0 /usr/sbin/snmpd -Ls0-6d -a -f
671 TS - 19 0 /lib/systemd/systemd-resolved
677 TS - 19 0 /usr/bin/redis-server 127.0.0.1:6379
677 TS - 19 0 /usr/bin/redis-server 127.0.0.1:6379
677 TS - 19 0 /usr/bin/redis-server 127.0.0.1:6379
687 TS - 19 0 /usr/sbin/lighttpd -D -f /etc/lighttpd.conf
695 TS - 19 0 /bin/login --
696 TS - 19 0 /sbin/agetty -o -p -- \u --noclear tty1 linux
697 TS - 19 0 /usr/sbin/rpc.statd -F
710 FF 50 90 - [irq/30-44e09000]
721 TS - 19 0 /usr/sbin/telnetd
731 TS - 19 0 /usr/sbin/thttpd -d /srv/www -p 8080
783 TS - 19 0 /lib/systemd/systemd --user
784 TS - 19 0 (sd-pam)
788 TS - 19 0 -sh
845 TS - 19 0 [kworker/0:0]
931 TS - 19 0 [kworker/0:2]
958 TS - 19 0 [kworker/u2:1]
963 TS - 19 0 [kworker/0:1]
977 TS - 19 0 [kworker/0:3]
980 TS - 19 0 /sbin/agetty -8 -L ttyS3 115200 xterm
981 TS - 19 0 ps -eLo pid,cls,rtprio,pri,nice,cmd

===============================================

Can you see any abnormal matters from the above result?

Thanks

/Li

0 Krunal Bhargav34 over 6 years ago in reply to user5842274

TI__Mastermind 44735 points

Hello,

I apologize for the inconvenience but due to US holidays, our expert on the topic is not available and I will try to have some feedback for you by next week.

Regards,
Krunal

0 Wolfgang Muees1 over 6 years ago in reply to Krunal Bhargav34

Genius 3685 points

I suggest you write a kernel driver for this communication.

You can not handle this fast timing in user space.

0 user5842274 over 6 years ago in reply to Wolfgang Muees1

Prodigy 90 points

Yes, I invented the driver. As you can see in my original mail. The IRQ related driver source is also posted there.

0 Marcus Cooksey over 6 years ago in reply to user5842274

TI__Genius 9510 points

I would like to revisit the proper sequence of the SPI protocol to begin communication,
1) the bus master configures the clock (CLK)
2) The master then selects the slave device with a logic level 0 (CS)
3) The master sends a bit on the MOSI line and the slave reads it, while the slave sends a bit on the MISO

Based on your diagram the communication protocol is out of sequence.

You may need to need to add mutex or semaphores to maintain sequence between slave interrupt and setup to transmit the data.

0 user5842274 over 6 years ago in reply to Marcus Cooksey

Prodigy 90 points

The sequence is correct. I can visit the slave and got the response as expected. The problem here is the about the slow SPI access in a irq thread. As you can see in my code above, there is already a mutex in my irq thread:
mutex_lock(&p_drvdata->buf_lock);

0 Krunal Bhargav34 over 6 years ago in reply to user5842274

TI__Mastermind 44735 points

Hello,

I just wanted to clarify whether you are experiencing data corruption with your current test setup or if the data is valid but you would like to speed up the SPI access.

Regards,
Krunal

0 user5842274 over 6 years ago in reply to Krunal Bhargav34

Prodigy 90 points

Hi, I only want to speed up the SPI access.

0 Krunal Bhargav34 over 6 years ago in reply to user5842274

TI__Mastermind 44735 points

Hello,

In the file, spi-omap2-mcspi.c (located under "<linux-xxxx>/drivers/spi"), modify the value of the variable "DMA_MIN_BYTES" to 4 and please let me know if it helps with your SPI access.

Regards,
Krunal

0 Krunal Bhargav34 over 6 years ago in reply to Krunal Bhargav34

TI__Mastermind 44735 points

Hello,

I will be closing the ticket and if you are still experiencing issues, please feel free to open the ticket in the future.

Regards,
Krunal

0 user5842274 over 6 years ago in reply to Krunal Bhargav34

Prodigy 90 points

Hi,

I have written code to access the SPI register directly, instead of use the standard spi_transfer. It improves a lot. But I still feel there are something abnormal. I will try your method later and let you know the result. You can close the issue now.

Thanks.

Regards,
Li

Processors

Processors forum

Linux/AM3354: slow SPI access