RM46xxx DMA SCI RX ringbuffer - frame count does not update?

Hello,

I have a simple DMA SCI setup where the SCI RX side runs as a continuous "ring buffer" and the TX side also sends its data with DMA. On the RX side my application reads the CTCOUNT register to determine how many bytes have been received from SCI. This works fine if I trigger a new TX as soon as the previous TX has ended (which presumably makes the DMA switch between the RX and TX channels and thus update the RX channel status registers?).

void dmaGroupANotification(dmaInterrupt_t inttype, uint32 channel)
/* Remarks: ISR: this function is called by HALCoGen code */
{
    if( inttype == BTC )
    {
        if( channel == DMA_CH_SCI_TX )
        {
            /* Disable SCI TX DMA Interrupt to stop DMA requests, is this really needed because DMA channel should be "dead" */
            scilinREG->CLEARINT = SCI_SET_TX_DMA;

            bDmaBusy = FALSE;

            char* pszString = "Hello!\r\n";
            DMA_vSend( (uint8*)pszString, strlen(pszString) );
        }
    }
}

However, if I comment out that DMA_vSend() line, my application also stops receiving SCI RX data: CTCOUNT stays at 0 all the time, including the CFTCOUNT field. If I look at the buffer the DMA transfers into, the data from SCI does arrive there, so the problem is that the status information is _never_ updated. So if you have only one DMA channel in use, the CTCOUNT register (and its CFTCOUNT field) does not update at all???

According to the technical reference manual, CFTCOUNT has no restriction on when it is updated, but for some reason the CETCOUNT field description says that the whole CTCOUNT register is not updated:
CFTCOUNT: "Current frame transfer count. Returned the current remaining frame counts"
vs
CETCOUNT: "Current element transfer count. These bits return the current remaining element counts. CTCOUNT register is only updated after a channel is arbitrated out of the priority queue."

Really? If you do not constantly have at least 2 DMA channels running, is the only way to implement a reliable ring reader (or actually to get any status information from a DMA channel) to set up a dummy SW-triggered DMA channel (with autoinit) which, for example, copies a byte from one dummy location to another, just to get the "primary" channel(s) out of the arbitration queue so that their status is updated?
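For reference, here is a minimal sketch of how the ring position can be derived from a CTCOUNT value once the register has actually been updated. It assumes CFTCOUNT sits in bits 28:16 of CTCOUNT (check the field position against the TRM for your device) and that ELCNT is 1, so one frame equals one byte as in the setup below. The helper names `ring_write_index` and `ring_bytes_pending` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: CFTCOUNT occupies bits 28:16 of CTCOUNT (verify in the TRM). */
#define CTCOUNT_CFT_SHIFT 16u
#define CTCOUNT_CFT_MASK  0x1FFFu

/* Hypothetical helper: index of the next slot the RX DMA will write,
   derived from the remaining frame count of an autoinit channel. */
static uint32_t ring_write_index(uint32_t ctcount, uint32_t frame_count)
{
    assert(frame_count > 0u);
    uint32_t remaining = (ctcount >> CTCOUNT_CFT_SHIFT) & CTCOUNT_CFT_MASK;
    if (remaining > frame_count)
    {
        remaining = frame_count;   /* defensive clamp, should not happen */
    }
    /* remaining == 0 and remaining == frame_count both map to index 0 */
    return (frame_count - remaining) % frame_count;
}

/* Hypothetical helper: bytes waiting between the reader and the DMA writer. */
static uint32_t ring_bytes_pending(uint32_t write_idx, uint32_t read_idx,
                                   uint32_t frame_count)
{
    return (write_idx + frame_count - read_idx) % frame_count;
}
```

With HALCoGen the raw value would, as far as I know, come from the working control packet of the channel (something like `dmaRAMREG->WCP[DMA_CH_SCI_RX].CTCOUNT`); treat that register path as an assumption and check your generated headers.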

Here is my SCI RX side definition

#define DMA_CH_SCI_RX           DMA_CH0
#define DMA_CH_SCI_TX           DMA_CH1
#define DMA_CH_DUMMY_UPDATER    DMA_CH15

    /* Populate dma control packets structure for RX */
    tSciRxCTRLPKT.CHCTRL    = 0;                 /* channel control            */
    tSciRxCTRLPKT.ELCNT     = 1;                 /* element count              */
    tSciRxCTRLPKT.ELDOFFSET = 0;                 /* element destination offset */
    tSciRxCTRLPKT.ELSOFFSET = 0;                 /* element source offset      */
    tSciRxCTRLPKT.FRDOFFSET = 0;                 /* frame destination offset   */
    tSciRxCTRLPKT.FRSOFFSET = 0;                 /* frame source offset        */
    tSciRxCTRLPKT.PORTASGN  = 4;                 /* port b                     */
    tSciRxCTRLPKT.RDSIZE    = ACCESS_8_BIT;      /* read size                  */
    tSciRxCTRLPKT.WRSIZE    = ACCESS_8_BIT;      /* write size                 */
    tSciRxCTRLPKT.TTYPE     = FRAME_TRANSFER;    /* transfer type              */
    tSciRxCTRLPKT.ADDMODERD = ADDR_FIXED;        /* address mode read          */
    tSciRxCTRLPKT.ADDMODEWR = ADDR_INC1;         /* address mode write         */
    tSciRxCTRLPKT.AUTOINIT  = AUTOINIT_ON;       /* autoinit                   */
    tSciRxCTRLPKT.SADD      = (uint32)(&(scilinREG->RD)); /* source address        */
    tSciRxCTRLPKT.DADD      = (uint32)&au8SciRxBuffer[0]; /* destination address   */
    tSciRxCTRLPKT.FRCNT     = ELEMENTS(au8SciRxBuffer);   /* frame count           */

    /* Setting whole dma control packet for RX */
    dmaSetCtrlPacket(DMA_CH_SCI_RX, tSciRxCTRLPKT);

    dmaSetPriority( DMA_CH_SCI_RX, HIGHPRIORITY ); /* default is LOW PRIORITY */

    dmaSetChEnable(DMA_CH_SCI_RX, DMA_HW);  /* Enable DMA channel */

And then I use the same structure (to minimize the need to configure the same things again) to start the dummy channel:

    /* Make dummy channel so SCI_RX status info will be updated if no other DMA transmissions are active */
    tSciRxCTRLPKT.ADDMODERD = ADDR_FIXED;                   /* address mode read          */
    tSciRxCTRLPKT.ADDMODEWR = ADDR_FIXED;                   /* address mode write         */
    tSciRxCTRLPKT.TTYPE     = BLOCK_TRANSFER;               /* transfer type              */
    tSciRxCTRLPKT.SADD   = (uint32)&au8DmaDummyUpdater[0];  /* source address */
    tSciRxCTRLPKT.DADD   = (uint32)&au8DmaDummyUpdater[1];    /* destination address        */
    tSciRxCTRLPKT.FRCNT  = 1;                               /* frame count                */

    dmaSetCtrlPacket(DMA_CH_DUMMY_UPDATER, tSciRxCTRLPKT);
    dmaSetChEnable(DMA_CH_DUMMY_UPDATER, DMA_SW);  /* Enable DMA channel */

So it looks like at least the technical reference manual has a bug in the CFTCOUNT description?

  • You are correct, the DMA does not lend itself to receiving an arbitrary length of data. Faced with a similar requirement I used DMA for transmit and interrupts for receive. Another solution would be to initialize the circular buffer to 0xFFFF (16-bits) and use the DMA to read 16 bits from the RX buffer. The upper 8 bits will always be read as 0 and can be used as a flag to show that valid data is stored in the lower 8 bits. The obvious downside is that the circular buffer has to be twice as large and the CPU when reading from the circular buffer must write to set the location back to 0xFFFF after the read. In short, the dummy DMA channel is probably as good a solution as any.

    Making the DMA write back the count values after each transfer would degrade the performance of DMA.
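    A minimal sketch of the CPU side of the 16-bit approach described above, assuming the RX DMA channel is reconfigured for 16-bit reads and writes into a `uint16_t` ring so that every slot written by the DMA has a zero upper byte. The type and function names (`rx_ring16_t`, `rx_ring16_init`, `rx_ring16_read`) are hypothetical, not part of HALCoGen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_SLOT_EMPTY 0xFFFFu   /* marker the CPU writes into free slots */

typedef struct
{
    volatile uint16_t *slots;    /* buffer the RX DMA writes into */
    size_t             length;   /* number of 16-bit slots        */
    size_t             read_idx; /* next slot the CPU will check  */
} rx_ring16_t;

/* Mark every slot as empty before the DMA channel is enabled. */
static void rx_ring16_init(rx_ring16_t *ring, volatile uint16_t *slots,
                           size_t length)
{
    assert(length > 0u);
    ring->slots    = slots;
    ring->length   = length;
    ring->read_idx = 0u;
    for (size_t i = 0u; i < length; i++)
    {
        slots[i] = RX_SLOT_EMPTY;
    }
}

/* Copy up to max received bytes into out and return how many were copied.
   A slot holds valid data only when its upper byte has been cleared. */
static size_t rx_ring16_read(rx_ring16_t *ring, uint8_t *out, size_t max)
{
    size_t n = 0u;
    while (n < max)
    {
        uint16_t slot = ring->slots[ring->read_idx];
        if ((slot & 0xFF00u) != 0u)
        {
            break;                                   /* not written yet */
        }
        out[n++] = (uint8_t)(slot & 0x00FFu);
        ring->slots[ring->read_idx] = RX_SLOT_EMPTY; /* hand slot back  */
        ring->read_idx = (ring->read_idx + 1u) % ring->length;
    }
    return n;
}
```

    Note that a received 0xFF byte arrives as 0x00FF and is still distinguishable from the empty marker 0xFFFF. The sketch does not detect the DMA overrunning the reader; the buffer has to be sized so that this cannot happen between two reads.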
  • Thanks for the confirmation. Please fix the CFTCOUNT description in the technical reference manual (chapter 16.3.2.9 in literature number: SPNU514B).

    We are planning to run SCI at ~2M baud, so using interrupts for receive is not an option; it has to be DMA. I am currently planning to use a timer IRQ (RTI) to trigger ring buffer reading at a 100us period to get a somewhat decent "stream break" notification, because the DMA HW does not offer this kind of feature either.

    So it looks like the options are the 16-bit wide read or a constant dummy DMA channel. One option, in order to lower the DMA load, could also be to use the RTI to trigger the dummy DMA channel only "sometimes", say 2-4 times faster (25us-50us) than the actual CFTCOUNT reading, to guarantee that the register value is updated between readings???

    Could you say something about the performance of the different solutions? I am not yet at the point where I could run the system at the targeted 2M speed; currently I am using only 115200 with a Windows PC. I mean, what happens to the other DMA channels if there is one dummy channel transferring data all the time in the low priority queue? Are there any pitfalls with this? I assume that a dummy 1-byte transfer (is this optimal, or should it be a native 32-bit/64-bit wide transaction?) takes next to no time, so it should not disturb the other channels much, but I cannot be sure. Using RTI as a dummy trigger would decrease the overall DMA load, but is there actually any need for this? It basically makes this kludge even more complicated :) Device power consumption is not a limiting factor in our application (assuming that constantly moving data with DMA consumes a bit more power).

    With DMA I am planning to use at least two other DMA channels: SCI TX (this needs to produce a "continuous stream" when sending) and SPI TX (for debug purposes, RM44 does not have 2 SCIs). Is there a need to also use SPI RX with DMA if I am not interested in the data coming back (I assume not, just let the SPI RX side overrun)? Should these, or at least SCI TX, also be configured as high priority (the technical reference manual says that the high priority queue should be in fixed mode and the low priority queue in rotating mode) just to minimize the effects of the dummy DMA channel?


    Could I use the same source & destination address in the DMA dummy channel, or do I need to use separate addresses like the 2-slot array in my example (i.e. are there any safety-related issues once the safety features are enabled if the same address is used)?
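    To make the RTI polling idea above concrete, here is a rough sketch of a "stream break" detector that would be called once per timer tick with the current DMA write index. All names (`stream_break_t`, `stream_break_init`, `stream_break_tick`) are hypothetical, and how the write index is obtained (CTCOUNT after a dummy transfer, or scanning a 16-bit buffer) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    uint32_t last_write_idx; /* write index seen on the previous tick */
    uint32_t idle_ticks;     /* consecutive ticks without new data    */
    uint32_t break_ticks;    /* idle ticks that count as a break      */
    bool     in_stream;      /* data seen since the last break        */
} stream_break_t;

static void stream_break_init(stream_break_t *sb, uint32_t break_ticks)
{
    assert(break_ticks > 0u);
    sb->last_write_idx = 0u;
    sb->idle_ticks     = 0u;
    sb->break_ticks    = break_ticks;
    sb->in_stream      = false;
}

/* Call once per timer tick. Returns true exactly once per burst, on the
   tick where the stream has been idle for break_ticks ticks. */
static bool stream_break_tick(stream_break_t *sb, uint32_t write_idx)
{
    if (write_idx != sb->last_write_idx)
    {
        sb->last_write_idx = write_idx;  /* new data arrived */
        sb->idle_ticks     = 0u;
        sb->in_stream      = true;
        return false;
    }
    if (!sb->in_stream)
    {
        return false;                    /* idle and nothing pending */
    }
    sb->idle_ticks++;
    if (sb->idle_ticks >= sb->break_ticks)
    {
        sb->in_stream = false;
        return true;
    }
    return false;
}
```

    Assuming 10 bits per byte on the wire, 2M baud is about 200 kB/s, i.e. roughly 20 bytes per 100us tick. If exactly one full buffer arrives between two ticks the index looks unchanged, so the ring has to be comfortably larger than that.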

  • You could also chain the dummy channel to the RX DMA channel. This will guarantee DMA arbitration, with one caveat: CTCOUNT will not get updated at the boundary condition where CTCOUNT transitions to zero!

    Another option is to use HET to count down a "data" value which gets overwritten by the dummy channel. HET generates an interrupt if the count reaches a threshold.

    I think the easy method is to monitor the buffer using "16bit read" like Bob suggested.

  • According to the technical manual, chaining is triggered only after all the given frames have been transferred (see the CHAIN field). This would lead to a situation where the "dummy channel" is triggered only when the actual DMA SCI RX buffer wraps around. Say you have a buffer whose length is 100 (which is also the frame count): if you then get 10 bytes from SCI, the "dummy channel" will not run and thus the actual DMA SCI RX status will not get updated.

    I also do not understand what you mean by the HET & "data" & "dummy channel" combination; obviously the dummy channel does not do anything meaningful except arbitrate the DMA SCI RX channel out of the active queue in order to update its status. Are you referring to N2HET when you say HET? HET does not exist in the technical manual. Are you saying that with the 16-bit SCI reading you could monitor with N2HET how the DMA reading progresses (from the upper byte, like Bob suggested)? I guess this would be doable, but it sounds like quite a lot of work, and for some reason it sounds like even more of a kludge than running the dummy channel to get information which should be available by default from the peripheral itself.

    I would say that using this dummy channel is at least much simpler and consumes less actual CPU time than checking & flipping the upper bytes manually (I am just a bit worried about what this kind of dummy channel might do to DMA performance once everything is in). Maybe with N2HET you could outsource the upper byte checking & flipping back to 0xFF, so the 16-bit reading wouldn't actually consume any more CPU time than this dummy channel...

  • Jarkko Silvasti said:

    According to the technical manual, chaining is triggered only after all the given frames have been transferred (see the CHAIN field). This would lead to a situation where the "dummy channel" is triggered only when the actual DMA SCI RX buffer wraps around. Say you have a buffer whose length is 100 (which is also the frame count): if you then get 10 bytes from SCI, the "dummy channel" will not run and thus the actual DMA SCI RX status will not get updated.

    True if your transfer type is BLOCK mode. But your transfer type is FRAME, one frame per trigger. So the chained channel will trigger after each frame transfer.

    Jarkko Silvasti said:

    I also do not understand what you mean by the HET & "data" & "dummy channel" combination; obviously the dummy channel does not do anything meaningful except arbitrate the DMA SCI RX channel out of the active queue in order to update its status. Are you referring to N2HET when you say HET? HET does not exist in the technical manual. Are you saying that with the 16-bit SCI reading you could monitor with N2HET how the DMA reading progresses (from the upper byte, like Bob suggested)? I guess this would be doable, but it sounds like quite a lot of work, and for some reason it sounds like even more of a kludge than running the dummy channel to get information which should be available by default from the peripheral itself.

    Use your imagination! You have two N2HET (enhanced HET) modules. I am thinking of an N2HET routine that counts down and triggers an inter-character timeout interrupt. The count (data) will be overwritten by the "dummy channel" transfer after each SCI RX, restarting the countdown timer.

    Jarkko Silvasti said:


    I would say that using this dummy channel is at least much simpler and consumes less actual CPU time than checking & flipping the upper bytes manually (I am just a bit worried about what this kind of dummy channel might do to DMA performance once everything is in). Maybe with N2HET you could outsource the upper byte checking & flipping back to 0xFF, so the 16-bit reading wouldn't actually consume any more CPU time than this dummy channel...

    By chaining the "dummy channel", you induce arbitration and CTCOUNT gets updated (well, almost!). Are you worried that the DMA won't be able to keep up with the SCI peripheral? Are you using many other DMA channels? Making the "dummy channel" the lowest priority should keep it out of the way of the other busy ones.
    Checking & flipping the RX data buffer in internal RAM should be faster than reading CTCOUNT from the DMA peripheral.