SPI TX with DMA: "continue" from BTC IRQ overwrites SPI TXBUF value (loses 1 item)

Other Parts Discussed in Thread: HALCOGEN

Hello,

I have a simple problem, but I do not understand why it occurs or how to fix it: the TXBUF value in the SPI register appears to be overwritten when a new DMA transfer is started from the BTC interrupt.

I have a kludge in mind, but it is not efficient at all: it needs 2 ISRs and wastes one N2HET-based timer:
- from the BTC ISR, if there is more data to send, start an N2HET-based timer whose period is one 10-bit SPI frame time, and start the DMA from the N2HET ISR

I am using SPI (TX) with DMA to output UART data (10-bit transfers). It works fine except when I wrap around a ring buffer. As I have learned, DMA chaining cannot be used here, so if there is more data at the beginning of the buffer I trigger a new DMA transfer directly from the BTC interrupt.
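To make the wrap-around case concrete, here is a minimal sketch of how a ring-buffer read can be split into the two linear pieces that end up as two separate DMA transfers. The names `ring_chunks_t` and `ring_split` are hypothetical and do not come from the project; the sketch only assumes that `tail` is smaller than `capacity` and that `count` does not exceed `capacity`.

```c
#include <stddef.h>

/* Hypothetical helper type: the two linear pieces of one ring-buffer read. */
typedef struct {
    size_t first_idx;   /* start index of the piece that runs towards the buffer end */
    size_t first_len;   /* number of items in that piece (first DMA transfer)        */
    size_t second_len;  /* items that wrapped; this piece always starts at index 0   */
} ring_chunks_t;

/* Split `count` items starting at `tail` in a ring of `capacity` items.
 * Assumes tail < capacity and count <= capacity. */
ring_chunks_t ring_split(size_t capacity, size_t tail, size_t count)
{
    ring_chunks_t c;
    size_t to_end = capacity - tail;            /* items available before the wrap */

    c.first_idx  = tail;
    c.first_len  = (count < to_end) ? count : to_end;
    c.second_len = count - c.first_len;         /* 0 when no wrap is needed */
    return c;
}
```

In the scenario described here, the first piece would be started by the task and the second piece (if `second_len` is non-zero) from the BTC interrupt.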

I noticed that individual characters were missing here and there. With a little experimentation (shortening the ring buffer made the failures occur sooner) I located the failure point and wrote a simple "test program" to illustrate the problem.

The test program has the string "1234567890\r\n", which is converted into a uint16 array of the same length ( (char << 1) | 0x200U ) so that each slot holds a 10-bit UART pattern. The SPI speed is 460800 bit/s, so it takes ~20us to send one character (10 bits / 460800 is about 21.7us), and the SPI is configured to send 10 bits LSB first (nothing special).
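As an illustration of the frame format, the following sketch packs one byte into the 10-bit pattern and computes the time one frame occupies on the wire. The function names are made up for this example; only the `(char << 1) | 0x200U` expression and the 460800 bit/s rate come from the post.

```c
#include <stdint.h>

/* Pack one byte into a 10-bit UART frame for an LSB-first SPI shifter:
 * bit 0 = start bit (0), bits 1..8 = data bits, bit 9 = stop bit (1). */
uint16_t uart_frame_10bit(uint8_t ch)
{
    return (uint16_t)(((uint16_t)ch << 1) | 0x200U);
}

/* Time one frame occupies on the wire, in nanoseconds (integer division). */
uint32_t frame_time_ns(uint32_t baud, uint32_t bits_per_frame)
{
    return (uint32_t)(((uint64_t)bits_per_frame * 1000000000ULL) / baud);
}
```

For example, `uart_frame_10bit('1')` gives 0x262 and `frame_time_ns(460800U, 10U)` gives 21701, i.e. about 21.7us per character, which is why a 24-tick delay (assuming the HAL timer counts microseconds) is long enough for one character to leave TXBUF.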

Then I trigger sending of that pattern once per second: in the first round the task sends all 12 chars (the IRQ sends 0), then 11 (1), and so on until the final round, where the task sends 1 (11); then it restarts from the beginning.

Here is the output (// added by me):

1234567890<CR><LF> // task 12, IRQ 0
1234567890<LF>           // task 11, IRQ 1 ( LF)
123456789<CR><LF>  // task 10, IRQ 2 (CR  LF)
123456780<CR><LF> // task 9, IRQ 3 (0 CR  LF)
123456790<CR><LF>
123456890<CR><LF>
123457890<CR><LF>
123467890<CR><LF>
123567890<CR><LF>
124567890<CR><LF>
134567890<CR><LF>
1234567890<CR><LF> // task 1, IRQ11 (now every character came through, because SPI has 2 "buffers", shift + txbuf)!!!!!!!
1234567890<CR><LF> // task 12, IRQ 0
1234567890<LF>           // task 11, IRQ 1 ( LF)
123456789<CR><LF>  // task 10, IRQ 2 (CR  LF)

As you can see, the last character that the "task" should output is overwritten by the first character sent from the IRQ, unless the task sends only 1 character (in that case TXBUF is empty when the DMA is started, because the data from the task went to the SHIFT REG).


If I add a dummy delay inside the BTC interrupt to let 1 character pass through (to ensure that there is room in the SPI buffer), the output is OK:
uint32 u32Time = HAL_u32TimeGet();
while( (HAL_u32TimeGet()- u32Time) < 24U ){};
1234567890<CR><LF>
1234567890<CR><LF>
1234567890<CR><LF>
1234567890<CR><LF>
1234567890<CR><LF>
1234567890<CR><LF>
1234567890<CR><LF>
1234567890<CR><LF>
1234567890<CR><LF>

The behaviour is exactly the same whether I use DAT0 or DAT1:

tSciTxCTRLPKT.DADD      = (uint32)(&(spiREG4->DAT1));

tSciTxCTRLPKT.DADD      = (uint32)(&(spiREG4->DAT0));

It does not matter either whether the CSHOLD bit is set or not (WDEL is disabled):
    spiREG4->GCR1 &= (uint32)~BIT_n( SPI_GCR1_SPI_EN );
    spiREG4->DAT1 |= BIT_n( SPI_DAT1_CSHOLD );
    spiREG4->GCR1 |= (uint32)BIT_n( SPI_GCR1_SPI_EN );


Neither the DAT0/DAT1 selection nor CSHOLD should have any effect, so in that respect the behaviour looks consistent.

DMA is configured to move 16 bits so that the upper part of the DAT register is not written/modified:
    tSciTxCTRLPKT.RDSIZE    = ACCESS_16_BIT;      /* read size                  */
    tSciTxCTRLPKT.WRSIZE    = ACCESS_16_BIT;      /* write size                 */


Here is the test function that both the TASK and the IRQ use to check/send the data (bStart is set by the TASK before calling it). The magic number 12 is the length of the test string pattern. The task always needs to send at least 1 item, otherwise the IRQ never happens.

void vSendData()

{
    uint32 u32Index = 0U;
    if( bStart )
    {
        if( !bIrq )
        {
            u32Len = 12 - u32NextIdx;
            u32IrqLen = 12 - u32Len;

            u32Index = 0U;

            u32NextIdx++;

            if( u32NextIdx > 11U )
            {
                u32NextIdx = 0U;
            }
        }
        else
        {
            bStart = FALSE;
            bIrq = FALSE;
            u32Len = u32IrqLen;

            u32Index = 12U-u32Len;
        }
    }
    else
    {
        bIrq = FALSE;
        u32Len = 0U;
    }

    if( u32Len )
    {
        u32DmaSendingLen = u32Len;

        /* Updated required dma control packet information for TX1 */
        dmaRAMREG->PCP[DMA_CH_DBG_TX1].ISADDR  = (uint32)&au16DataBuf[ u32Index ];         /* source address    */
        dmaRAMREG->PCP[DMA_CH_DBG_TX1].ITCOUNT = SET_LOWORD_U32(dmaRAMREG->PCP[DMA_CH_DBG_TX1].ITCOUNT) |
                                                 SET_HIWORD_U32( u32Len );   /* frame count     */

        dmaSetChEnable(DMA_CH_DBG_TX1, DMA_HW); /* Enable DMA channel */

        spiREG4->INT0 |= SPI_INT0_DMAREQEN;
    }

}

And here is the IRQ function (the dummy delay mentioned above, which makes the output correct, is commented out):

void DMON_vPacketSent( void )
{
    spiREG4->INT0 &= ~SPI_INT0_DMAREQEN;

    u32DmaSendingLen = 0U;

    uint32 u32Time = HAL_u32TimeGet();

    //while( (HAL_u32TimeGet()- u32Time) < 24U ){};
    /* if more data in ring buf, sent it */
    bIrq = TRUE;
    vStartDmaTransfer();
}

There are also proper guards on the TASK side (with IRQ protection) to prevent the task from triggering a new DMA transfer if one is still ongoing (in this case there is never an ongoing transfer)...
    if( !u32DmaSendingLen )
    {
        bStart = TRUE;
        vStartDmaTransfer();
    }


According to the TRM, the DMA request should be generated (see DMAREQEN) when the data is moved to the SHIFT register, so during a DMA transfer at the speed used (460800) there should always be another data item waiting in TXBUF. Based on this, it should be OK to start new HW-triggered DMA transfers even when residuals of the previous transfer are still in the SPI buffers.

I also tested the system by adding the following checks to the ISR function (just after the spiREG4->INT0 &= ~SPI_INT0_DMAREQEN; line):
    uint32 u32Pend = 0U;
    uint32 u32FLG = 0U;

    if( dmaREG->PEND & BIT_n( DMA_CH_DBG_TX1 ) )
    {
        u32Pend = 1U;
    }

    if( spiREG4->FLG & BIT_n( 9U ) ) // TX empty
    {
        u32FLG = 1U;
    }

I put a breakpoint after this, and both values are 0 (as they should be) when entering the ISR. (I also verified that the variables work by breaking the execution after starting the DMA; then both variables were 1, as they should be, because the DMA had had time to send all the data out.) However, the register view is not consistent (I don't know the logic by which it is updated) when stopping before and after the send function: SPI FLG always has the TX bit set, and DMA has the pend bit set only if the breakpoint is after the send function. Why does FLG always show the bit set even though my flag checker says it is clear? Is the content of the view updated with some delay?

I also tested the ISR function by setting the breakpoint before and after the vStartDmaTransfer() call. If the breakpoint is before it, the whole print comes out OK (as expected, because the stop gives the SPI time), and if the breakpoint is after it, 1 character is again lost (as expected).

PS. I do not use debug mode in DMA, so it keeps running even when execution is stopped in the IDE...

Regards,
Jarkko

  • Jarkko,

    It could be something as simple as your configuration of the DMA channel.
    If you have created a simple test program that illustrates the issue you are facing, please attach the project.
    Please export the project as a .zip file as explained here: processors.wiki.ti.com/.../Project_Sharing and attach to this thread.
  • The "test program" itself is simple (basically stuff ripped/commented out from the "main application"), but it has tons of includes etc., so I do not want to share it as is. I guess I should make a new project that contains only the key parts...

    I also added SCI to the system and it prints OK (as expected).

    Basically I have the string
    char* acBuffer = "1234567890\r\n";
    Which is formatted to SPI
    uint32 u32Len = strlen( acBuffer );
    uint32 u32I;
    uint16 u16Temp;

    for( u32I = 0U; u32I < u32Len; u32I++ )
    {
        /* Manipulate ring buffer array directly (less copying) */
        u16Temp = (((uint16)acBuffer[ u32I ]) << 1U) | 0x200U; /* shift data by 1 to make start bit (val 0) & set stop bit */

        au16DataBuf[ u32I ] = u16Temp;
    }

    I also simplified the task-side sending function a bit and added the SCI peripheral to print the same data; the task starts both, and the corresponding channel's BTC sends the rest.

    void vTaskSend()
    {
        uint32 u32Index = 0U;

        u32Len = 12 - u32NextIdx;
        u32IrqLen = 12 - u32Len;

        u32NextIdx++;

        if( u32NextIdx > 11U )
        {
            u32NextIdx = 0U;
        }

        u32SpiLen = u32IrqLen;
        u32SpiIndex = 12U-u32SpiLen;
        u32SciLen = u32IrqLen;
        u32SciIndex = 12U-u32SciLen;

        if( u32Len )
        {
            u32DmaSendingLen = u32Len;

            /* Updated required dma control packet information for TX1 */
            dmaRAMREG->PCP[DMA_CH_DBG_TX1].ISADDR = (uint32)&au16DataBuf[ u32Index ]; /* source address */
            dmaRAMREG->PCP[DMA_CH_DBG_TX1].ITCOUNT = SET_LOWORD_U32(dmaRAMREG->PCP[DMA_CH_DBG_TX1].ITCOUNT) |
                                                     SET_HIWORD_U32( u32Len ); /* frame count */

            /* Updated required dma control packet information for TX */
            dmaRAMREG->PCP[DMA_CH_SCI_TX].ISADDR = (uint32)&acBuffer[ u32Index ]; /* source address */
            dmaRAMREG->PCP[DMA_CH_SCI_TX].ITCOUNT = SET_LOWORD_U32(dmaRAMREG->PCP[DMA_CH_SCI_TX].ITCOUNT) |
                                                    SET_HIWORD_U32( u32Len ); /* frame count */


            //spiREG4->INT0 &= ~SPI_INT0_DMAREQEN;
            spiREG4->INT0 |= SPI_INT0_DMAREQEN;

            #define SCI_SET_TX_DMA BIT_n(16U)
            scilinREG->SETINT = SCI_SET_TX_DMA;


            dmaREG->HWCHENAS = BIT_n(DMA_CH_DBG_TX1) | BIT_n(DMA_CH_SCI_TX);
            //dmaSetChEnable( DMA_CH_DBG_TX1, DMA_HW );
            //dmaSetChEnable( DMA_CH_SCI_TX, DMA_HW );
        }
    }

    Then I have BTC handlers for both DMA channels, which kick off the new transmission:
    DMA_BTC_SPI:
    void DMON_vPacketSent( void )
    {
        spiREG4->INT0 &= ~SPI_INT0_DMAREQEN;

        u32DmaSendingLen = 0U;

        uint32 u32Time = HAL_u32TimeGet();

        //while( (HAL_u32TimeGet()- u32Time) < 24U ){};
        /* if more data in ring buf, sent it */

        if( u32SpiLen )
        {
            dmaRAMREG->PCP[DMA_CH_DBG_TX1].ISADDR = (uint32)&au16DataBuf[ u32SpiIndex ]; /* source address */
            dmaRAMREG->PCP[DMA_CH_DBG_TX1].ITCOUNT = SET_LOWORD_U32(dmaRAMREG->PCP[DMA_CH_DBG_TX1].ITCOUNT) |
                                                     SET_HIWORD_U32( u32SpiLen ); /* frame count */

            u32DmaSendingLen = u32SpiLen;
            u32SpiLen = 0U;

            //spiREG4->INT0 &= ~SPI_INT0_DMAREQEN;
            spiREG4->INT0 |= SPI_INT0_DMAREQEN;

            dmaSetChEnable( DMA_CH_DBG_TX1, DMA_HW );
        }
    }

    DMA_BTC_SCI:
    void SCI_BTC( void )
    {
        #define SCI_SET_TX_DMA BIT_n(16U)
        scilinREG->CLEARINT = SCI_SET_TX_DMA; // MUST BE PRESENT, otherwise SCI hangs after 1 sending...

        if( u32SciLen )
        {
            /* Updated required dma control packet information for TX */
            dmaRAMREG->PCP[DMA_CH_SCI_TX].ISADDR = (uint32)&acBuffer[ u32SciIndex ]; /* source address */
            dmaRAMREG->PCP[DMA_CH_SCI_TX].ITCOUNT = SET_LOWORD_U32(dmaRAMREG->PCP[DMA_CH_SCI_TX].ITCOUNT) |
                                                    SET_HIWORD_U32( u32SciLen ); /* frame count */

            u32SciLen = 0;

            scilinREG->SETINT = SCI_SET_TX_DMA;

            dmaSetChEnable( DMA_CH_SCI_TX, DMA_HW );
        }
    }

    And "surprisingly" the SCI peripheral prints the data correctly (I originally wrote and tested this ring buffer code with SCI, but because I ran out of SCIs I moved it to SPI and the problems started immediately; at first I thought this might be related to bus timings...). I have also checked with an oscilloscope that the stream is 20us shorter, so the "byte" is completely lost on the bus.
  • Here is the ZIP file; it has both SPI and SCI (LIN). SPI4 is used. XL2_RM46 LaunchPad pins: J6 pin 4 (SCI) and J11 pin 24 (SPI). Both buses run at 460800 (1 start, 8 data, 1 stop)...

    sys_main.c does not look "nice" because I copied and pasted code from many files. There are also 2 HALCoGen projects: SPI_TX is the one used to generate the latest output; the other (from the actual project) would most probably also have worked if I had called _enable_IRQ(); (the real project uses an OS). There is also a Git repository with only a few commits.

    When starting the program you should see the following output (the left side is the SPI), and if you scope the bus you should see that "some" SPI streams are 20us shorter than others.

    Hopefully you can tell me what is wrong...


    4214.HERCULES_SPI_TX.zip

  • Having some trouble importing the project file you sent but I think it's on our side.
    It is asking me for 16.3.0 compiler and I can't seem to install that one.
    I have 16.6.0 but the import still fails.
  • Hi Jarkko,

    This is what happens with your code on the SPI side:

    1. The DMA is configured and started. Every time the SPI copies data from TXBUF (this is basically the same as the DAT registers) to its internal shift register, a new DMA request is made and the DMA copies the next data to the DAT register.
    2. When the DMA has transferred all the data it was configured to transfer, the BTC interrupt is raised. The important point is that the last data of the DMA transfer is still in the TXBUF (i.e. DAT) register of the SPI; it has not yet been copied to the SPI's internal shift register (except when the DMA was configured to copy only one data item).
    3. In the BTC interrupt you reconfigure the DMA and manually generate the first DMA request by toggling SPI_INT0_DMAREQEN (as it is cleared at the beginning of the interrupt routine).
    4. The DMA copy starts and immediately copies the next data to the DAT register, overwriting the last data of the previous DMA transfer (except when the DMA was configured to copy only one data item).

    This is the reason you are missing one character in most of the iterations in your test. Only the cases where the first DMA copies either all characters or only one data item to the SPI work. The first case is obvious: as there is only one DMA transfer, no second one happens to overwrite the last data of the first. The second case works because the single data item is immediately copied from TXBUF to the internal shift register by the SPI block, so the second DMA does not overwrite any data in TXBUF.
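The mechanism described in the steps above can be reproduced with a small host-side model. This is only a sketch under simplifying assumptions (one shift register plus one TXBUF slot, a DAT write going straight to the shifter when both are idle, and the BTC ISR running before the next frame completes); all names in it are hypothetical and it is not taken from the attached project.

```c
#include <stddef.h>

/* Toy model of the SPI transmit path: a shift register plus one TXBUF slot. */
typedef struct {
    int  shift_busy;  char shift;
    int  txbuf_full;  char txbuf;
} spi_model_t;

/* A write to DAT: goes straight to the shifter if everything is idle,
 * otherwise it lands in TXBUF and replaces whatever was waiting there. */
static void spi_write(spi_model_t *s, char c)
{
    if (!s->shift_busy && !s->txbuf_full) { s->shift = c; s->shift_busy = 1; }
    else                                  { s->txbuf = c; s->txbuf_full = 1; }
}

/* One frame time elapses: the shifter content appears on the wire and
 * TXBUF (if full) refills the shifter. Returns 1 if a character was sent. */
static int spi_frame_done(spi_model_t *s, char *out)
{
    if (!s->shift_busy) return 0;
    *out = s->shift;
    s->shift_busy = s->txbuf_full;
    s->shift      = s->txbuf;
    s->txbuf_full = 0;
    return 1;
}

/* One DMA block. A hardware request only arrives once TXBUF has been emptied;
 * force_first models the manual request made by toggling DMAREQEN in the ISR. */
static size_t dma_block(spi_model_t *s, const char *src, size_t len,
                        int force_first, char *out, size_t n_out)
{
    for (size_t i = 0; i < len; i++) {
        int forced = (i == 0) && force_first;
        char c;
        while (!forced && s->txbuf_full) {
            if (spi_frame_done(s, &c)) out[n_out++] = c;
        }
        spi_write(s, src[i]);
    }
    return n_out;
}

/* Task sends task_len items, the BTC ISR sends the rest.
 * `out` must have room for total + 1 characters. Returns characters sent. */
size_t simulate(const char *msg, size_t total, size_t task_len,
                int force_first, char *out)
{
    spi_model_t s = {0, 0, 0, 0};
    size_t n = 0;
    char c;

    n = dma_block(&s, msg, task_len, 0, out, n);
    n = dma_block(&s, msg + task_len, total - task_len, force_first, out, n);
    while (spi_frame_done(&s, &c)) out[n++] = c;   /* drain what is left */
    out[n] = '\0';
    return n;
}
```

With `force_first = 1` and `task_len = 11` the model yields "1234567890<LF>" (the CR is lost), matching the second line of the posted output; with `task_len = 1` or `task_len = 12` all 12 characters come out, and with `force_first = 0` nothing is lost for any split.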

    Now I have modified your example so that it checks whether TXBUF is empty, and only in that case is the DMA request generated manually by toggling SPI_INT0_DMAREQEN. If TXBUF is not empty, the SPI block generates the TX DMA request itself when it copies the existing data from TXBUF to the internal shift register, as written in TRM chapter 27.2:

    • "The SPI generates a request on the TX_DMA_REQ line each time the TX data is copied to the TX shift register either from the TXBUF or from peripheral data bus (when TXBUF is empty)."

    So in this case it is not necessary to generate the DMA request manually; if that is done anyway, it causes the DMA to overwrite the existing data in TXBUF.

    For the SCI part of the code, missing characters are not seen because the SCI peripheral seems to generate a DMA request on the TX DMA request bit only when its transmit buffer is not empty. So the next DMA transfer starts only once the data has already been copied from the transmit buffer to the internal shift register, and the DMA does not overwrite anything. Note that if the first DMA copied only one data item, no further DMA request comes from the SCI peripheral, because it has immediately copied the data from the transmit buffer to its internal shift register. So at the time the BTC interrupt is executed, the transmit buffer is already empty, and in this case it is necessary to toggle the DMA request bit to generate the DMA request for the second DMA, similarly to the SPI case.

    Also, as recommended by the TRM (chapter 28.5.2.1):

    • "Because all data has been transmitted, the interrupt/DMA request should be halted."

    I kept in the code the clearing of the DMA request in the BTC interrupt when no further DMA transfer is needed at that time.

    Please also read my reply to your other question to understand how the first DMA request is generated when the peripheral is completely empty.

    Please find the modified code, which I have tested to be working: 1803.sys_main.zip

    Kind Regards,

      Jani

  • Hi Jarkko,

    My earlier fix works on the assumption that the DMA BTC interrupt is executed within the time it takes the SPI to send one data item. That obviously may not be the case in a real system where a lot of other load exists. Thus I modified the code so that it should work in all cases.

    Basically the change is that the code now checks whether the TXBUF register is empty and no DMA request is pending for the SPI; if so, it creates one "manually" by toggling the SPI DMA request enable bit. This is because the HW generates a DMA request when it copies the data from TXBUF to its internal shift register. If there is no data in TXBUF, then clearly no DMA request will be generated by the HW, and it must be generated by SW.
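The decision described above can be sketched as a pure function over snapshots of the two status registers. This is not the code from the attached zip file; the function name is hypothetical, and the only details taken from this thread are that bit 9 of the SPI FLG register is the "TX empty" flag and that the DMA PEND register has one bit per channel.

```c
#include <stdbool.h>
#include <stdint.h>

/* SPI FLG bit 9 is the "TX empty" flag, as tested earlier in this thread. */
#define SPI_FLG_TX_EMPTY  ((uint32_t)1U << 9)

/* Decide whether software has to create the first DMA request itself.
 * spi_flg  : snapshot of the SPI FLG register
 * dma_pend : snapshot of the DMA PEND register
 * channel  : DMA channel number used for SPI TX
 * Returns true only when TXBUF is empty AND no request is already pending;
 * in every other case the hardware will raise the request on its own when
 * TXBUF is copied to the shift register, and forcing one would overwrite it. */
bool spi_needs_manual_dma_request(uint32_t spi_flg, uint32_t dma_pend,
                                  uint32_t channel)
{
    bool tx_empty = (spi_flg & SPI_FLG_TX_EMPTY) != 0U;
    bool pending  = (dma_pend & ((uint32_t)1U << channel)) != 0U;

    return tx_empty && !pending;
}
```

On the target, a caller would pass `spiREG4->FLG` and `dmaREG->PEND` and toggle `SPI_INT0_DMAREQEN` only when the function returns true; otherwise it would just leave the request enabled.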

    Please find the modified code here: 8357.sys_main.zip

    Br,

      Jani

  • And in a real-world application (this demo sends data at a predefined interval) this else-branch also needs to be removed completely, because a new send request might come right after the BTC while TXBUF is still full. So with SPI the DMA request shall basically be kept enabled all the time, unless the TXFULL and PEND information at the start of a DMA send indicates that toggling the request is needed.

    else //no longer data pending to be transferred, turn off SPI DMA request
    {
        spiREG4->INT0 &= ~SPI_INT0_DMAREQEN;
    }
  • These lines shall be removed from the SCI part. They are not needed at all (and they are currently checked _after_ the DMA start, when TXRDY could already have toggled; if the check is wanted, it should be made before the DMA start), because it looks like the SCI DMA request works in a different way than the SPI one (setting the request does not force the DMA to move data), and you are basically free to toggle it as much as you like.

    //If SCITD is empty (no data waiting to be transferred to shift register)
    //then must generate first DMA REQ manually by toggling TX DMA Request
    if(scilinREG->FLR & SCI_TXRDY)
    {
        scilinREG->CLEARINT = SCI_SET_TX_DMA;
        scilinREG->SETINT = SCI_SET_TX_DMA;
    }

    So basically SCI works as described in the TRM: you can always disable the request in the BTC and then set it again when starting the DMA.

    void SCI_BTC( void )
    {
        scilinREG->CLEARINT = SCI_SET_TX_DMA;
        if( u32SciLen )
        {
            /* Updated required dma control packet information for TX */
            dmaRAMREG->PCP[DMA_CH_SCI_TX].ISADDR = (uint32)&acBuffer[ u32SciIndex ]; /* source address */
            dmaRAMREG->PCP[DMA_CH_SCI_TX].ITCOUNT = SET_LOWORD_U32(dmaRAMREG->PCP[DMA_CH_SCI_TX].ITCOUNT) |
                                                    SET_HIWORD_U32( u32SciLen ); /* frame count */

            u32SciLen = 0;

            scilinREG->SETINT = SCI_SET_TX_DMA;

            dmaSetChEnable( DMA_CH_SCI_TX, DMA_HW );

        }
    }