OMAP2 SPI: DMA Transfer loses "last" byte (Kernel-3.2.0-psp04.06.00.11)

Hello,

I'm using the Kernel-3.2.0-psp04.06.00.11 and I have an application that uses the SPI to communicate with a microcontroller via a custom protocol. In this design, the AM335x processor I use is the master.

Before I get to the bug I need to explain the communication protocol I use. When I send a request to the microcontroller, it needs some time to process it and have an answer ready to send back. Since the slave side cannot start a transfer on the bus, our solution is to read a lot more than we expect, because we don't know when the answer will be ready.

For example, say that I'm expecting a package with 50 bytes. I would send the request message and then try to read 500 bytes, hoping that somewhere inside this big buffer I can find those 50 bytes I need. 

The bug happens when the answer is truncated. Again, imagine that it took the microcontroller so much time to process the request that the answer started at byte number 470. When this happens, only 30 bytes would be received and I would need to read the bus again to get the remaining 20 bytes.
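
To make the truncated case concrete, here is a minimal sketch of scanning an over-read buffer. The start marker `PKT_START` (0xA5), the fixed `PKT_LEN` of 50 and the names `pkt_scan` and `find_packet` are assumptions for illustration only; the framing of the real protocol is not described in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKT_START 0xA5u /* hypothetical start-of-packet marker */
#define PKT_LEN   50u   /* expected packet length from the example */

struct pkt_scan {
    size_t offset;    /* index of the marker, or buf_len if not found */
    size_t available; /* packet bytes already present in this buffer */
    size_t missing;   /* packet bytes that need a follow-up read */
};

/* Scan an over-read buffer for the first start marker and report how much
 * of the packet fits before the buffer ends. */
static struct pkt_scan find_packet(const uint8_t *buf, size_t buf_len)
{
    struct pkt_scan r = { buf_len, 0, PKT_LEN };

    assert(buf != NULL || buf_len == 0);
    for (size_t i = 0; i < buf_len; i++) {
        if (buf[i] == PKT_START) {
            size_t room = buf_len - i;

            r.offset = i;
            r.available = room < PKT_LEN ? room : PKT_LEN;
            r.missing = PKT_LEN - r.available;
            break;
        }
    }
    return r;
}
```

With the answer starting at byte 470 of a 500-byte read, this reports 30 bytes available and 20 missing, which is the situation where the follow-up read exposes the lost byte.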

Now, the bug: there is ALWAYS one byte missing when the package is truncated. If my package was supposed to have 50 bytes, I get only 49.

Long story short, I found out the reason: the driver implements its transfer function in the file spi-omap2-mcspi.c. When I read LESS than 160 bytes, it calls a function that implements a programmed I/O (PIO) read, taking one byte at a time from the rx register. When I read 160 bytes or more, the transfer function the driver uses relies on direct memory access (DMA) instead.
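
The choice between the two paths can be sketched as below. The condition mirrors the check visible in the driver patch further down this thread (`m->is_dma_mapped || t->len >= DMA_MIN_BYTES`); the function names are hypothetical and this is not the driver code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DMA_MIN_BYTES 160 /* value reported for this kernel's driver */

/* DMA is used when the message is already DMA-mapped or the transfer is
 * at least DMA_MIN_BYTES long; everything else goes through PIO. */
static bool transfer_uses_dma(size_t len, bool is_dma_mapped)
{
    return is_dma_mapped || len >= DMA_MIN_BYTES;
}

/* Usage examples for the lengths discussed in this thread. */
static void threshold_examples(void)
{
    assert(!transfer_uses_dma(50, false));  /* short read: PIO */
    assert(transfer_uses_dma(200, false));  /* ./spitest 200: DMA */
    assert(transfer_uses_dma(500, false));  /* 500-byte over-read: DMA */
}
```

This is why the 200-byte and 500-byte reads in this post hit the DMA path while short reads do not.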

With an oscilloscope, I found out that PIO reads exactly the number of bytes I ask for and copies that same number of bytes to the rx buffer. DMA, on the other hand, clocks in the number of bytes I ask for PLUS ONE and copies only the requested number of bytes to the rx buffer. That means that if I call read(fd,rx_buff,500) I see 501 bytes going through the SPI bus, even though I always get 500 bytes in rx_buff. This one extra byte is lost forever.

I made a simple program to further illustrate this issue. When I try to read anything from the bus, the microcontroller just sends back one byte and then increments the value it sends. The example is shown below:

root@AM335x:~# ./spitest 200
   0000 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
   0010 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
   0020 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
   0030 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
   0040 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
   0050 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
   0060 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
   0070 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
   0080 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
   0090 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
   0100 A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
   0110 B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
   0120 C0 C1 C2 C3 C4 C5 C6 C7


root@AM335x:~# ./spitest 200
   0000 C9 CA CB CC CD CE CF D0 D1 D2 D3 D4 D5 D6 D7 D8
   0010 D9 DA DB DC DD DE DF E0 E1 E2 E3 E4 E5 E6 E7 E8
   0020 E9 EA EB EC ED EE EF F0 F1 F2 F3 F4 F5 F6 F7 F8
   0030 F9 FA FB FC FD FE FF 00 01 02 03 04 05 06 07 08
   0040 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16 17 18
   0050 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27 28
   0060 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34 35 36 37 38
   0070 39 3A 3B 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48
   0080 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58
   0090 59 5A 5B 5C 5D 5E 5F 60 61 62 63 64 65 66 67 68
   0100 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78
   0110 79 7A 7B 7C 7D 7E 7F 80 81 82 83 84 85 86 87 88
   0120 89 8A 8B 8C 8D 8E 8F 90

All the program does is read 200 bytes from the bus and print them somewhat like hexdump would. Two consecutive calls show that I lost the C8 byte.
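
The same check can be automated instead of reading the dumps by eye. This is a minimal sketch that assumes the slave sends an 8-bit counter that goes up by one per byte, as in the test above; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of counter values skipped between the last byte of one read and
 * the first byte of the next. The cast back to uint8_t handles the wrap
 * from 0xFF to 0x00. */
static unsigned skipped_between(uint8_t prev_last, uint8_t next_first)
{
    return (uint8_t)(next_first - prev_last - 1u);
}

/* Index of the first byte that breaks the +1 sequence inside one buffer,
 * or len if the whole buffer is consecutive. */
static size_t first_discontinuity(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 1; i < len; i++) {
        if ((uint8_t)(buf[i - 1] + 1u) != buf[i])
            return i;
    }
    return len;
}
```

For the two dumps above, `skipped_between(0xC7, 0xC9)` gives 1, while each individual buffer is consecutive, so the loss happens between the calls and not inside them.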

As I said, I sniffed the communication with an oscilloscope, and the missing byte is sent at the end of the first read call that uses DMA.

Another test I did was to disable DMA, using PIO regardless of the size I would read. Works like a charm!

This is as far as my current knowledge goes. I don't fully understand how the DMA registers handle the SPI communication, or why the heck it would read one extra byte every time...

Can someone help me?

Thanks for reading everything =)

DAVI

  • Hi Davi,
     
    I will escalate this to the factory team for explanation.
  • Ok!

    It would be nice if someone else could verify this issue. Just in case, I'm using the AM3352ZCZ60...

    DAVI

  • Hi Davi,

    I will take a look at your issue and get back with you.  Just to review, if you use less than 160 bytes with the default spi-omap2-mcspi driver, then you don't see the issue. But when you use greater than 160 bytes and then the DMA is used for the transfer then you see the issue?

    I will post back on whether I can duplicate this issue and figure out what is going on with your example.

  • Yes, PIO always works. The DMA is the one that causes the issue. 160 is the value defined for the macro DMA_MIN_BYTES.

    I was thinking that if you can't reproduce it, maybe the problem is with the Kernel configuration I'm using (the configuration used in the menuconfig).

    Anyway, first things first. Please let me know if I can be of any assistance.

    DAVI

  • DAVI,

    I finally got a chance to duplicate your issue.  I do see one extra transfer any time the DMA is used, whereas the PIO does not request an extra byte.  I am starting to debug now. I don't have much experience with the EDMA, so it may take a little time to debug. I will post back on my findings.

  • Ok, thanks for the update. At least the problem is not within my setup...

    DAVI

  • I too am experiencing this issue.  So when a cause is found i would definitely like to know!

    Also i have some additional information on the issue.  First off here is how my setup works:

    • The AM335x is the master and sends out commands whenever it has commands to send.
    • The slave is another processor for which we wrote the code.  When it has data to send, it pulls one of the GPIO lines on the AM335x low. If this line is low and there is no command to send, the master just sends out sets of clocks until the line goes back high, essentially receiving all the data.

    I lowered the DMA limit to 5 from 160 for this so that DMA is pretty much always used to make my testing easier.  What i found is when i actually transmit DUMMY DATA out:

    ...
    //We have no real data to send, so transmit a dummy buffer while receiving
    //testBuf is 10 bytes long and contains all 0's
    myTest.transfer.tx_buf = testBuf;
    myTest.transfer.rx_buf = rx_buff;
    myTest.transfer.len = 10;
    
    ...
    
    //Add the transfer to the message tail
    spi_message_add_tail(&myTest.transfer, &myTest.msg);
    ...

    Everything works great.  The above code sends exactly 10 bytes out (receiving exactly 10 in).  However when i have no buffer to send out:

    ...
    //We have no data to send so setup a receive only
    myTest.transfer.tx_buf = NULL;
    myTest.transfer.rx_buf = rx_buff;
    myTest.transfer.len = 10;
    
    ...
    
    //Add the transfer to the message tail
    spi_message_add_tail(&myTest.transfer, &myTest.msg);
    ...

    An extra byte is sent.  The above code sends exactly 11 bytes out (and receives exactly 10 in).

    So as you can see the problem is somehow directly coupled with there being or not being a TX buffer on the DMA.  I am going to research into this myself a little but if you guys have found any additional info i would be glad to hear it!!!  For now i can workaround this by just sending out a DUMMY TX BUFFER of 0's, but would like to know a proper solution when found : ).
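
For readers who access the bus from user space, the same dummy-TX workaround might look like the sketch below. It assumes the device is opened through the Linux spidev interface (`struct spi_ioc_transfer` and `SPI_IOC_MESSAGE` from `<linux/spi/spidev.h>`), which is not what the kernel-side code above uses. The helper names and the `MAX_XFER` bound are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

#define MAX_XFER 4096u /* hypothetical upper bound for this sketch */

/* Fill a spidev transfer so that zeros are clocked out while rx is filled.
 * Because a TX buffer is supplied, the transfer is not receive-only. */
static void prepare_dummy_tx_read(struct spi_ioc_transfer *tr,
                                  const uint8_t *zeros, uint8_t *rx,
                                  uint32_t len)
{
    assert(tr != NULL);
    memset(tr, 0, sizeof(*tr));
    tr->tx_buf = (uintptr_t)zeros;
    tr->rx_buf = (uintptr_t)rx;
    tr->len = len;
}

/* Read len bytes from an open spidev descriptor using a dummy TX buffer.
 * Returns the ioctl() result, which is negative on error. */
static int read_with_dummy_tx(int fd, uint8_t *rx, uint32_t len)
{
    static const uint8_t zeros[MAX_XFER]; /* zero-initialised */
    struct spi_ioc_transfer tr;

    if (len > MAX_XFER)
        return -1;
    prepare_dummy_tx_read(&tr, zeros, rx, len);
    return ioctl(fd, SPI_IOC_MESSAGE(1), &tr);
}
```

A plain `read()` on the device has no TX buffer to supply, so a full-duplex message like this is the user-space equivalent of passing a buffer of 0's as `tx_buf`.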

    Thanks,

    Jarrod

  • Alright, well i went through the driver code briefly and between that and the user manual here is my understanding of the issue.  I am not going to pretend i understand all of what is going on in spi drivers (spi-omap2-mcspi.c) because i just took a brief look through to try and solve this.

    Issue Recap:

    When using SPI DMA in RX Only mode an extra frame is sent out at the tail end.  This causes an issue in that we lose that frame completely.

    Suggested Cause:

    In RX-only mode the TX reg is loaded once, and every time the RX reg is read a new frame is immediately started using the same TX reg value (this is what I took from the manual's description). In NON-DMA mode, ON THE LAST FRAME the code waits for the RXS bit to be set, then disables SPI, then reads the byte.  Because SPI is disabled first, a new frame is NOT started when the data is read out.  In DMA mode, the DMA is set up to read the full length, then we wait for DMA to finish, and only then do we disable SPI.  Since the last frame has already been read out before the disable, one more frame is started.  This is at least what I think is happening from my understanding of the code; please correct me if I'm wrong.
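
The ordering argument above can be checked with a toy model. This is only a sketch of the behaviour as described in this post (reading RX while the channel is enabled starts another frame); it is not register-accurate and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one channel in master RX-only mode. */
struct rx_only_model {
    bool enabled;
    unsigned frames_clocked; /* frames seen on the wire */
};

/* Enabling the channel with TX loaded starts the first frame. */
static void model_start(struct rx_only_model *m)
{
    m->enabled = true;
    m->frames_clocked = 1;
}

/* Reading RX while enabled immediately starts another frame. */
static void model_read_rx(struct rx_only_model *m)
{
    if (m->enabled)
        m->frames_clocked++;
}

/* PIO order: disable before reading the last word. */
static unsigned frames_with_pio(unsigned words)
{
    struct rx_only_model m;

    assert(words > 0);
    model_start(&m);
    for (unsigned i = 0; i < words; i++) {
        if (i == words - 1)
            m.enabled = false;
        model_read_rx(&m);
    }
    return m.frames_clocked;
}

/* DMA order: every word is read first, the channel is disabled afterwards. */
static unsigned frames_with_dma(unsigned words)
{
    struct rx_only_model m;

    assert(words > 0);
    model_start(&m);
    for (unsigned i = 0; i < words; i++)
        model_read_rx(&m);
    m.enabled = false;
    return m.frames_clocked;
}
```

For a 10-word transfer the model clocks 10 frames in the PIO order and 11 in the DMA order, matching the 10 versus 11 bytes seen on the bus.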

    Suggested Solution:

    So assuming my suggested cause is correct, my suggested solution for RX-only mode is to set the DMA transaction to 1 frame less than we actually want.  After DMA completion we wait for the last frame to come in, then disable the SPI module, and only then read the last frame.  This way another frame is not started automatically, because the module is disabled before the last frame is read.  Here is the diff for this solution (spi-omap2-mcspi.c):

    diff -r e7f0207ed225 drivers/spi/spi-omap2-mcspi.c
    --- a/drivers/spi/spi-omap2-mcspi.c	Thu Jan 09 17:32:48 2014 -0500
    +++ b/drivers/spi/spi-omap2-mcspi.c	Thu Apr 03 09:46:49 2014 -0400
    @@ -358,6 +358,13 @@
     	if (rx != NULL) {
     		int a_cnt, b_cnt, c_cnt, b_cntrld;
     
    +		//In receive only mode we need to manually receive
    +		//the last byte so that the DMA does not cause an
    +		//extra frame to be sent out
    +		if(tx == NULL) {
    +			element_count--;
    +		}
    +
     		a_cnt    = 1 << data_type;
     		c_cnt    = element_count / (SZ_64K - 1);
     		b_cnt    = element_count - c_cnt * (SZ_64K - 1);
    @@ -415,6 +422,17 @@
     	if (rx != NULL) {
     		wait_for_completion(&mcspi_dma->dma_rx_completion);
     		dma_unmap_single(&spi->dev, xfer->rx_dma, count, DMA_FROM_DEVICE);
    +		//If in RX Only Mode Wait for the last element
    +		if(tx == NULL) {
    +			if (mcspi_wait_for_reg_bit(chstat_reg,
    +					OMAP2_MCSPI_CHSTAT_RXS) < 0) {
    +				dev_err(&spi->dev, "RXS timed out\n");
    +				count -= (word_len <= 8)  ? 1 :
    +					(word_len <= 16) ? 2 :
    +					/* word_len <= 32 */ 4;
    +				return count;
    +			}
    +		}
     		omap2_mcspi_set_enable(spi, 0);
     
     		if (l & OMAP2_MCSPI_CHCONF_TURBO) {
    
    			if (likely(mcspi_read_cs_reg(spi, OMAP2_MCSPI_CHSTAT0)
    				   & OMAP2_MCSPI_CHSTAT_RXS)) {
    				u32 w;
    
    				w = mcspi_read_cs_reg(spi, OMAP2_MCSPI_RX0);
    				if (word_len <= 8)
    					((u8 *)xfer->rx_buf)[elements++] = w;
    				else if (word_len <= 16)
    					((u16 *)xfer->rx_buf)[elements++] = w;
    				else /* word_len <= 32 */
    					((u32 *)xfer->rx_buf)[elements++] = w;
    			} else {
    				dev_err(&spi->dev,
    					"DMA RX penultimate word empty");
    				count -= (word_len <= 8)  ? 2 :
    					(word_len <= 16) ? 4 :
    					/* word_len <= 32 */ 8;
    				omap2_mcspi_set_enable(spi, 1);
    				return count;
     			}
     		}
     
    +		//If in RX only mode then read in the last element
    +		if(tx == NULL) {
    +			if(mcspi_wait_for_reg_bit(chstat_reg, OMAP2_MCSPI_CHSTAT_RXS) >= 0) {
    +				u32 v = mcspi_read_cs_reg(spi, OMAP2_MCSPI_RX0);
    +				if (word_len <= 8)
    +					((u8 *)xfer->rx_buf)[elements] = v;
    +				else if (word_len <= 16)
    +					((u16 *)xfer->rx_buf)[elements] = v;
    +				else /* word_len <= 32 */
    +					((u32 *)xfer->rx_buf)[elements] = v;
    +				dev_vdbg(&spi->dev, "read-%d %x\n",
    +						word_len, v);
    +			} else {
    +				dev_err(&spi->dev,
    +					"DMA RX final manual word empty");
    +				count -= (word_len <= 8)  ? 1 :
    +					(word_len <= 16) ? 2 :
    +					/* word_len <= 32 */ 4;
    +			}
    +		}
    +
     		omap2_mcspi_set_enable(spi, 1);
     	}
     	return count;
    

    I have tested this code and it seems to correct the issue so far : ).  I will hopefully test more extensively over the next few days while I work on my other code, to try and be sure it is in fact correct.

    What I Don't Quite Understand

    Turbo mode. I don't use it, but in the code above it seems to read the value in the RX reg and put it in the SECOND-to-LAST place in the buffer.  I don't quite understand why it does this; maybe I don't fully understand how Turbo mode functions?  If someone could explain it to me and/or make sure the changes I made won't break Turbo mode, I would greatly appreciate it : ).

    Feel free to correct any mistaken assumptions i have made or to check through my code changes and make sure they are correct!  This solution seems to work for me so that is what I am going to use for now : ).

    -Jarrod

  • I spoke too soon.  After fooling around a bunch and making many messages send in rapid succession i can get it into a state where the last received byte is always wrong : /.  

    I don't know what exactly is happening, but once it happens the last byte of my receive is always wrong until system restart.  So if I send 10 bytes, the 10th byte will be incorrect.  So I guess the code above is not a proper fix; something must be amiss.

    For now i am just going to go back to the method of transmitting a dummy buffer of 0's when i wish to only receive because i don't have the time to play with it any more right now : (

    If someone finds a proper fix i would be glad to hear it!

    -Jarrod

  • The last news I got from TI was that they were still trying to find the root cause. I also can't focus on this issue right now.

    If the dummy buffer is not just a workaround of some sort (meaning it actually solves the problem), it would seem that the Kernel is not handling the NULL pointer correctly. Either the Kernel or the DMA registers, the latter being much worse... Anyway, this is just an assumption (and not a difficult one to make), given that DMA accesses memory directly.

    For my part, sending a dummy buffer is far from ideal. I have verified that it does work, but so far that was all I did. Sometime in the near future I would like to see if I can use the same address for the TX and RX buffer without jeopardizing the protocol I use so that I don't have to allocate unnecessary memory.

    If the problem persists, I think every developer is going to find a different way to implement this dummy buffer. So much for standard...

    DAVI

  • Thanks for the info!

    Yeah it kind of sucks having to use a dummy buffer, i thought i almost had a working method and i think my original assumption of the root cause is correct in that:

    What happens is DMA is flagged as completed when the last byte is received but the way the hardware works in RX ONLY mode is as soon as the data is read from the RX register a new frame begins immediately.  So this means that an extra frame is always sent in DMA RX only mode because SPI is not disabled until after DMA Completion.  In NON-DMA mode SPI is disabled before reading the last frame out meaning it does NOT transmit the extra frame.

    All my testing seems to support this assumption, and my original solution "seemed" to correct it, but I am able to get it messed up somehow.  What happens with my fix is that the correct number of frames is in fact sent; it's just that the last byte is received incorrectly : /  For example, on the scope I see it is 0x00, but it reads in 0x65 or something.  So the original issue is still gone, but I created a new one haha.  I am guessing I just made an error in my code due to not fully understanding all the driver is doing.

    It really shouldn't have anything to do with a NULL pointer because what happens in the driver is it checks to see if tx_buf is null.  If it is then it sets the SPI into RX only mode and puts 0x00 into the TX register.  According to the manual in RX only mode:

    "In master receive only mode, after the first loading of the transmitter register of the enabled channel, the transmitter register state is maintained as full. The content of the transmitter register is always loaded into the shift register, at the time of shift register assignment."

    So DMA in RX only mode is not using a TX buffer at all, DMA is ONLY attached to RX which is not null.  Maybe someone else will have some time to take a look at my code above and solve the issue, if not i will come back to it hopefully at some point and see if i can resolve it : ).

  • You are right, I overlooked the RX ONLY mode entirely...

    Much of the information you posted was new to me, too. Thanks.

    DAVI

    I will post my idea, but I don't have the time to implement or test it. Maybe it can help.

    You tried to read the last byte manually and that didn't work (not to mention there would be some delay to read this last byte). I think I would try a different approach: I know we can configure how the TX buffer is sent, so maybe if we don't use the RX ONLY mode at all and instead send just one 0x00 byte over and over again, until count is reached, we would prevent the issue as the completion flag would not be triggered by the RX buffer anymore.

    So, instead of using the RX ONLY mode, I would try to configure the TX side of the DMA (those confusing A_CNT, B_CNT and C_CNT variables) according to what I just said. As far as I can remember, the Technical Reference Manual shows at least two ways of sending data through, but the code always sends one continuous array of data.
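
The "send one 0x00 byte over and over" idea comes down to a DMA source address that does not advance. The sketch below is a simplified stand-in, not the EDMA programming model: `src_stride` and `feed_tx_register` are hypothetical names, and the real A_CNT, B_CNT and C_CNT parameters and index fields are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a DMA channel feeding a peripheral TX register:
 * one element is copied per request and the source address advances by
 * src_stride bytes after each element. With src_stride = 0 the same
 * element is sent again and again. */
static void feed_tx_register(const uint8_t *src, size_t src_stride,
                             size_t element_count,
                             uint8_t *wire /* bytes the TX register saw */)
{
    assert(src != NULL && wire != NULL);
    for (size_t i = 0; i < element_count; i++)
        wire[i] = src[i * src_stride];
}
```

With a stride of 1 a normal TX buffer is streamed out; with a stride of 0 a single dummy byte produces `element_count` frames without allocating a full dummy buffer.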

    Again, I don't know if this is feasible because I didn't have the time to check my assumptions. I'm saying this based on what I remember reading...

    Hope it sheds some light!

    I will get back to this issue eventually, if this does work I can post my findings afterwards.

    DAVI

  • Alright, cool.  Thanks!  I have to move on to other things as well but hopefully i have some time to come back at the tail end of this project and take a look.  I too will keep you updated on anything i find : ).

    Thanks!

    Jarrod

  • Hi Jarrod,

    I was reading this thread to understand a problem I am having and decided to copy my post here so that we have a single reference; maybe it can give you some clues. There are a few differences in what I observed, but I am still convinced that we are all facing the same issue.

    I am using spi-omap2-mcspi to drive a network card based on the enc28j60, and everything works perfectly if I define DMA_MIN_BYTES = 0 (forcing DMA only) or DMA_MIN_BYTES = 99999 (forcing PIO only). But if I use the original value of 160, the board can send out approximately 10 to 20 pings, and whenever the driver switches from PIO to DMA (which is decided by the size of the data), some workqueue tasks hang and the card stops. It is worth mentioning that the TCP/IP stack doesn't transmit packets of the same size all the time, even when you fix the size for ICMP. Depending on the workload, the transmission size changes randomly.

    I am running with "lockdep" and "detect hung tasks" and I can see that four tasks are stopped,
    three from enc28j60 and one from spi-omap2-mcspi.

    The mcspi task is stopped at spi-omap2-mcspi.c:480, waiting for the completion signalled by omap2_mcspi_rx_callback, which never happens. The other three tasks are blocked on mutex_lock(spi->priv), if I remember correctly.

    I have no explanation for why the other three tasks are blocked, because the spi-omap2-mcspi task is not holding any lock; I just guess that line 480 is the reason.


    468         if (tx) {
    469                 tx->callback = omap2_mcspi_rx_callback;
    470                 tx->callback_param = spi;
    471                 dmaengine_submit(tx);
    472         } else {
    473                 /* FIXME: fall back to PIO? */
    474         }
    475     }
    476
    477     dma_async_issue_pending(mcspi_dma->dma_rx);
    478     omap2_mcspi_set_dma_req(spi, 1, 1);
    479
    480     wait_for_completion(&mcspi_dma->dma_rx_completion);
    481     dma_unmap_single(mcspi->dev, xfer->rx_dma, count,
    482                      DMA_FROM_DEVICE);

    Ventura

  • Hmmm, yeah I suppose they could be related.  Our issue occurs only in DMA mode, and it is caused by the hardware doing what it's supposed to do, which is sending out the next byte as soon as the last one is read when in RX-only mode.  In PIO mode we are able to disable before reading to prevent this; in DMA mode we are not.

    I am not sure why you would lock up on that line, it would seem to mean the SPI stopped transmitting before you were able to receive everything.  Did you try looking at it on a scope to see if everything is transmitting as it should?

    Also what psp version are you using?  The code you posted from spi-omap2-mcspi above doesn't seem to match to mine at all.

  • As I wrote, what I observed is that the driver hangs on the transition from PIO to DMA.

    About the version I am using: I tried many, but now I am using mainline 3.15.0-rc1. Since the beginning, my board has required device tree support in the kernel. I did a test with the psp 3.12, but spi_omap2_mcspi is not passing the GPIO interrupt to the enc28j60, and I don't know how to fix the DT to make it work, or even whether that is the cause. My device tree configuration works for mainline and for the RobertCNelson kernel, but not for the TI psp.

    Your thread gave me an idea for isolating the problem: connecting another microcontroller as the SPI slave so that I can simulate several situations. I have a scope and a logic analyzer, but at this point I don't think I need to go that far.  Really, I want to get on the same page as you and start from that point.

    It's important to say that your thread is the only one I have found pointing out this issue with that driver, which is why I joined it.

    I will keep you posted.

    Thank you

  • Ah okay.  Yeah definitely let me know what you find!  I would be interested to know.  

    It will be awhile before i can get back into the issue, but hopefully collectively we are able to solve the driver problems!  : )

    Good Luck!

  • I think I got it working, here is the patch. I did what I wrote in my last post.

    diff -uNr a/spi-omap2-mcspi.c b/spi-omap2-mcspi.c
    --- a/spi-omap2-mcspi.c 2014-04-17 13:47:28.759432608 -0300
    +++ b/spi-omap2-mcspi.c 2014-04-17 13:48:29.035431679 -0300
    @@ -305,6 +305,8 @@
            const u8                * tx;
            void __iomem            *chstat_reg;
            struct edmacc_param     param;
    +       int a_cnt, b_cnt, c_cnt, b_cntrld;
    +       static const u8 dummy=0;
     
            mcspi = spi_master_get_devdata(spi->master);
            mcspi_dma = &mcspi->dma_channels[spi->chip_select];
    @@ -333,27 +335,29 @@
                    element_count = count >> 2;
            }
     
    -       if (tx != NULL) {
    -               int a_cnt, b_cnt, c_cnt, b_cntrld;
    -
    -               a_cnt    = 1 << data_type;
    -               b_cnt    = 1;
    -               c_cnt    = element_count / 256;
    -               b_cntrld = SZ_64K - 1;
    +       a_cnt    = 1 << data_type;
    +       b_cnt    = 1;
    +       c_cnt    = element_count / 256;
    +       b_cntrld = SZ_64K - 1;
     
    -               param.opt          = TCINTEN |
    -                       EDMA_TCC(mcspi_dma->dma_tx_channel) | SYNCDIM ;
    -               param.src          = xfer->tx_dma;
    -               param.a_b_cnt      = a_cnt | b_cnt << 16;
    -               param.dst          = tx_reg;
    +       if (tx != NULL) {
                    param.src_dst_bidx = a_cnt;
    -               param.link_bcntrld = b_cntrld << 16;
                    param.src_dst_cidx = a_cnt;
    -               param.ccnt         = element_count;
    -               edma_write_slot(mcspi_dma->dma_tx_channel, &param);
    -               edma_link(mcspi_dma->dma_tx_channel,
    -                               mcspi_dma->dummy_param_slot);
            }
    +       else {
    +               tx = xfer->tx_buf = &dummy;
    +               param.src_dst_bidx = 0;
    +               param.src_dst_cidx = 0;
    +       }
    +
    +       param.opt          = TCINTEN | EDMA_TCC(mcspi_dma->dma_tx_channel) | SYNCDIM;
    +       param.src          = xfer->tx_dma;
    +       param.a_b_cnt      = a_cnt | b_cnt << 16;
    +       param.dst          = tx_reg;
    +       param.link_bcntrld = b_cntrld << 16;
    +       param.ccnt         = element_count;
    +       edma_write_slot(mcspi_dma->dma_tx_channel, &param);
    +       edma_link(mcspi_dma->dma_tx_channel, mcspi_dma->dummy_param_slot);
     
            if (rx != NULL) {
                    int a_cnt, b_cnt, c_cnt, b_cntrld;
    @@ -929,31 +933,31 @@
                            chconf &= ~OMAP2_MCSPI_CHCONF_TRM_MASK;
                            chconf &= ~OMAP2_MCSPI_CHCONF_TURBO;
     
    -                       if (t->tx_buf == NULL)
    -                               chconf |= OMAP2_MCSPI_CHCONF_TRM_RX_ONLY;
    -                       else if (t->rx_buf == NULL)
    -                               chconf |= OMAP2_MCSPI_CHCONF_TRM_TX_ONLY;
    -
    -                       if (cd && cd->turbo_mode && t->tx_buf == NULL) {
    -                               /* Turbo mode is for more than one word */
    -                               if (t->len > ((cs->word_len + 7) >> 3))
    -                                       chconf |= OMAP2_MCSPI_CHCONF_TURBO;
    -                       }
    -
    -                       mcspi_write_chconf0(spi, chconf);
    -
                            if (t->len) {
                                    unsigned        count;
     
    -                               /* RX_ONLY mode needs dummy data in TX reg */
    -                               if (t->tx_buf == NULL)
    -                                       __raw_writel(0, cs->base
    -                                                       + OMAP2_MCSPI_TX0);
    +                               if (t->rx_buf == NULL)
    +                                       chconf |= OMAP2_MCSPI_CHCONF_TRM_TX_ONLY;
     
    -                               if (m->is_dma_mapped || t->len >= DMA_MIN_BYTES)
    +                               if (cd && cd->turbo_mode && t->tx_buf == NULL) {
    +                                       /* Turbo mode is for more than one word */
    +                                       if (t->len > ((cs->word_len + 7) >> 3))
    +                                               chconf |= OMAP2_MCSPI_CHCONF_TURBO;
    +                               }
    +
    +                               if (m->is_dma_mapped || t->len >= DMA_MIN_BYTES) {
    +                                       mcspi_write_chconf0(spi, chconf);
                                            count = omap2_mcspi_txrx_dma(spi, t);
    -                               else
    +                               }
    +                               else {
    +                                       if (t->len < DMA_MIN_BYTES && t->tx_buf == NULL)
    +                                               chconf |= OMAP2_MCSPI_CHCONF_TRM_RX_ONLY;
    +                                       mcspi_write_chconf0(spi, chconf);
    +                                       /* RX_ONLY mode needs dummy data in TX reg */
    +                                       if (t->tx_buf == NULL)
    +                                               __raw_writel(0, cs->base + OMAP2_MCSPI_TX0);
                                            count = omap2_mcspi_txrx_pio(spi, t);
    +                               }
                                    m->actual_length += count;
     
                                    if (count != t->len) {

    DAVI

  • I'm attaching the file to make it easier to use with the patch command in the drivers/spi folder.

    DAVI

    5732.spi-omap2-mcspi-patch.txt

  • Here is an update but remember I am not using the psp version from TI, I am not even using the SDK from TI. The kernel I am using is the mainline 3.15-rc1.

    I connected one BeagleBone as the SPI master to an Arduino Uno as the slave; that way I can have serial terminals on both sides and see everything. In the middle I have a logic analyzer to confirm that what I read at the terminals is exactly what I see on the logic analyzer's display, in both directions. I didn't find anything wrong!

    There is only one detail that I suppose has to be handled by the application: the first byte I receive back from a block transfer is always the last byte from the last block received. This is at least consistent with the way the Arduino SPI works, and maybe it's not general, but from the perspective of the BeagleBone everything was correct.
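
One way to picture that one-byte lag is a slave whose data register still holds the byte from the previous frame. The sketch below assumes an echoing slave, which is a guess about the Arduino test program and not something stated in this post. The names `echo_slave` and `spi_exchange` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy slave whose data register always holds the byte it received during
 * the previous frame. */
struct echo_slave {
    uint8_t data_reg;
};

/* Exchange len bytes full duplex: each frame shifts out the slave's data
 * register and then stores the master's byte for the next frame. */
static void spi_exchange(struct echo_slave *s, const uint8_t *tx,
                         uint8_t *rx, size_t len)
{
    assert(s != NULL && tx != NULL && rx != NULL);
    for (size_t i = 0; i < len; i++) {
        rx[i] = s->data_reg;
        s->data_reg = tx[i];
    }
}
```

In this model the first byte of every block is whatever was left over from the previous block, so the application has to discard or account for it.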

    Another check I did was a stress test making transitions from PIO to DMA by sending blocks of two different sizes, and again everything works perfectly. This completely eliminates the spi-omap2-mcspi driver as the originator of the problem I am having with the enc28j60 network card. The problem has to be in the enc28j60 driver itself.

    Ventura

  • Hi Davi,

    I am also seeing this problem. spi-omap2-mcspi.c is stuck at wait_for_completion(&mcspi_dma->dma_rx_completion); I am trying to use DMA with SPI for my data transfer. Is the above solution working?

    thanks

    rahul

  • Hi all,

    I am using the 3.12 TI Linux kernel and I am also seeing this problem. spi-omap2-mcspi.c is stuck at wait_for_completion(&mcspi_dma->dma_rx_completion); I am trying to use DMA with SPI for my data transfer. So, finally, what is the solution for this?

    thanks

    rahul