AM6548: UDMA uart rx write latency

Part Number: AM6548

I am trying to set up the UART0 RX UDMA so that it writes the data to DDR on every byte received, rather than every ~64 bytes (I am guessing that is when the UDMA FIFO fills). It looks like the UDMA holds the data in some FIFO, since PDMA_PSILCFG_REG_DEBUG_1.Z increments for every byte I send. I don't see a way to make it write that data to DDR when it only has 1 byte, though. How would I set it up to do smaller writes?

  • What software are you using? Which version?
  • How are you configuring UART_FCR? Have you disabled the FIFO, i.e. FIFO_EN=0? In other words, is the delay you're seeing related to buffering in the UART itself, or do you think there's buffering elsewhere in the chip?
  • It isn't in the UART. I have it trigger the DMA event for every byte, and the UDMA shows it has received the byte, based on PDMA_PSILCFG_REG_DEBUG_1.Z. I also used the same UART settings with the EDMA and it worked fine.
    It is bare-metal software, but I was using PDK am65xx 1.0.2 as a reference. The PDK software also only writes when the internal FIFO is full or at the end of the CPPI descriptor buffer.
  • Ryan,

    Could you indicate the FIFOCNT and element size settings that you are using in the PDMA configuration? Have you tried setting FIFOCNT to 0 and the element size to the TX/RX size? The UART driver in the PDK uses UDMA in PDMA mode. My understanding is that this setting results in no packet delineation in the PDMA, and all framing is controlled via the UDMA TR, similar to the EDMA implementation.

    If you are looking at the PDK implementation of the UART driver, it uses the same implementation as previous Sitara devices but uses the soc/dma/v2 version for configuring the DMA, where you can check the PDMA configuration for transferring data from the UART FIFO to application-defined buffers. The trigger levels are set using configuration structures in UART_soc.c.

    For write transfer:
    
    pdmaPrms.elemSize = 0;                  /* each data element is an 8-bit byte */
    pdmaPrms.elemCnt  = hwAttrs->txTrigLvl; /* # of bytes to transfer for each DMA FIFO trigger event */
    pdmaPrms.fifoCnt  = 0U;                 /* # of DMA FIFO trigger events; don't care for write, since the total txSize is set in the descriptor */
    
    For read transfer:
    
    pdmaPrms.elemSize = 0;                           /* each data element is an 8-bit byte */
    pdmaPrms.elemCnt  = hwAttrs->rxTrigLvl;          /* # of bytes to transfer for each DMA FIFO trigger event */
    pdmaPrms.fifoCnt  = rxSize / hwAttrs->rxTrigLvl; /* # of DMA FIFO trigger events */
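
    The read-side settings above can be sketched as a small helper. This is only an illustration: PdmaParamsSketch and pdma_rx_params are hypothetical names, not part of the PDK, and the struct models just the three fields quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the PDK PDMA parameter structure; only the
 * three fields quoted in the post are modelled. */
typedef struct {
    uint32_t elemSize; /* 0 = each element is an 8-bit byte */
    uint32_t elemCnt;  /* elements moved per DMA FIFO trigger event */
    uint32_t fifoCnt;  /* trigger events per transfer */
} PdmaParamsSketch;

/* Read-side settings as described above: one trigger event moves
 * rxTrigLvl bytes, and fifoCnt trigger events cover the rxSize transfer. */
static PdmaParamsSketch pdma_rx_params(uint32_t rxSize, uint32_t rxTrigLvl)
{
    PdmaParamsSketch p;

    assert(rxTrigLvl != 0U); /* avoid dividing by zero */
    p.elemSize = 0U;
    p.elemCnt  = rxTrigLvl;
    p.fifoCnt  = rxSize / rxTrigLvl; /* integer division: a remainder is not covered */
    return p;
}
```

    With rxSize = 64 and rxTrigLvl = 8 this gives elemCnt = 8 and fifoCnt = 8; with rxSize = 1 and rxTrigLvl = 1 it gives fifoCnt = 1.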
    

    Regards,
    Rahul

    PS: Can you also please clarify why you are unable to use the UART and UDMA LLD in the PDK for your application development?

  • elemSize=0;

    elemCnt=1;

    fifoCnt = 0, 1, or sizeof(buffer).

    When it is 1, the TR ends and I get 1 byte but no more.

    When it is 0 or sizeof(buffer), writes occur every 64 bytes, or fewer if sizeof(buffer) is < 64.

  • I tried using a TR descriptor instead of a host descriptor, and it also only writes once the 64-byte FIFO is full. I thought the event_size or EOL settings in the TR descriptor flags might fix it, but they had no effect.

    I should clarify that when fifoCnt=0, the write to DDR only occurs after 64 bytes have been received, even if sizeof(buffer) is < 64, so it is actually worse.

    If I do a teardown on the channel it will write out whatever was queued but that isn't a good solution.
  • Ryan, Rahul,
    Data may have been queued in the UDMA. I remember we had discussions on this front; let me dig up some details and respond tomorrow.
    Jian
  • Did you find anything that might help resolve this?

  • Still chasing some spec details. Will report back by end of day.
  • Ryan,
    Our team suggested we may need to "play around credit count to force PDMA/UDMA to not hold on to data".
    If you have already run into these registers in the code, you may want to give it a try.
    Otherwise, I will have more details to follow.
    Jian
  • I'll probably need your help with that. I tried setting PEER_CREDIT_REG to 1 before, and it didn't work. It also wasn't clear whether the minimum size was 1 byte or 4 bytes because of PEER_THREAD_WIDTH.
  • Ryan,

    I was able to reach several team members on this problem.

    They confirmed that the UDMA will attempt to accumulate 64 bytes before it writes to memory, unless an "End of Packet" (EOP) is received. The EOP is generated by the PDMA when you set fifoCnt=1, as in your experiment. However, the UDMA also terminates the TR when it sees the EOP, which is why you get no more than 1 byte.

    So the only way to support your use case of single-byte transfers is to resubmit the TR, or to use multiple TRs. Setting fifoCnt=0 puts the PDMA in streaming mode, and fifoCnt=sizeof(buffer) behaves similarly; both cause the UDMA to queue up to 64 bytes before writing.
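
    The behavior described above can be captured in a toy software model. This is a sketch of the description in this thread only, not of the actual hardware: the names UdmaRxModel, model_init and model_rx_byte are made up for illustration, and elemCnt = 1 (one byte per trigger event) is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UDMA_BURST_BYTES 64U /* accumulation size quoted in this thread */

typedef struct {
    uint32_t queued;    /* bytes held inside the UDMA, not yet in memory */
    uint32_t in_memory; /* bytes that have been written out to DDR */
    uint32_t fifo_cnt;  /* PDMA fifoCnt: 0 = streaming, N = EOP after N events */
    uint32_t events;    /* trigger events seen so far */
    bool     tr_active; /* false once an EOP has terminated the TR */
} UdmaRxModel;

static void model_init(UdmaRxModel *m, uint32_t fifo_cnt)
{
    assert(m != 0);
    m->queued = 0U;
    m->in_memory = 0U;
    m->fifo_cnt = fifo_cnt;
    m->events = 0U;
    m->tr_active = true;
}

/* Feed one received byte. Returns false if the TR has already ended,
 * in which case the byte is not transferred. */
static bool model_rx_byte(UdmaRxModel *m)
{
    bool eop;

    if (!m->tr_active) {
        return false;
    }
    m->queued++;
    m->events++;
    eop = (m->fifo_cnt != 0U) && (m->events == m->fifo_cnt);
    if (eop || (m->queued == UDMA_BURST_BYTES)) {
        m->in_memory += m->queued; /* flush the queued bytes to memory */
        m->queued = 0U;
    }
    if (eop) {
        m->tr_active = false; /* the EOP also terminates the TR */
    }
    return true;
}
```

    In this model, fifoCnt = 1 delivers the first byte immediately but rejects every later byte, while fifoCnt = 0 delivers nothing until the 64th byte arrives, matching the two cases reported earlier.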

    This behavior has been identified, and we have upgraded the PDMA in future devices: an additional "EOL" marker tells the UDMA to write out the data (and also creates an EOL event) without terminating the TR.

    We would also like to understand the usage pattern of the UART data in your system. Assuming real-time data is coming in from the UART and being accumulated in system memory by the DMA, do you intend to have the CPU check the data each time a byte arrives? If so, it seems you could let the CPU resubmit the TR. Or, if the data just needs to be accumulated, why not use the UART FIFO to do so? A third possibility, I am guessing, is that other subsystems on the chip consume each byte as it arrives?

    Your feedback will help us understand the priority of the upgrade. Thanks.

    Regards
    Jian
  • I will mark the ticket as closed to satisfy our internal accounting. Please feel free to reopen it.
    Jian
  • The best design I have found for our varying platforms, running anywhere from 9600 baud to Mbit/s rates, is to set up a circular DMA that writes each byte to a large (16-64 KB) buffer, with RX IRQs disabled. I avoid IRQs on our RTOS because each task has a fixed time slot it runs in. If I service IRQs, that eats into the time slot and is also non-deterministic. The buffer is large enough that even with extremely large latency (hundreds of milliseconds) it doesn't overflow. You can map the buffer to SRAM to get around DDR page thrashing at higher speeds.
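
    As a rough sketch of the consumer side of that design: the task polls the DMA write position and copies out whatever arrived since the last poll. The function name ring_drain is made up for illustration, and how the write index is obtained from the DMA is platform-specific and assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy out the bytes between the reader index *rd and the DMA write
 * index wr, handling wrap-around. Returns the number of bytes copied
 * and advances *rd. The ring is only read here; the DMA is the writer. */
static size_t ring_drain(const volatile uint8_t *ring, size_t ring_size,
                         size_t wr, size_t *rd,
                         uint8_t *out, size_t out_cap)
{
    size_t n = 0U;

    assert(ring_size != 0U && wr < ring_size && *rd < ring_size);
    while ((*rd != wr) && (n < out_cap)) {
        out[n++] = ring[*rd];
        *rd = (*rd + 1U) % ring_size; /* wrap at the end of the buffer */
    }
    return n;
}
```

    Because the reader only compares indices, it needs no interrupt; it can run from the task's fixed time slot whenever it is scheduled, as long as the buffer is large enough that the writer never laps the reader.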

    I only see a couple of ways to use UDMA with a UART receiving packet data of random sizes, as opposed to a steady stream like McASP. Even with McASP the 64-byte latency might be too large, so I might not be able to use UDMA.

    1) Generate a new TR every byte. Just using the FIFO would be better.

    2) Create a UDMA descriptor for each byte. For my 16 KB buffer that means 48*16k bytes of descriptors. The DMA is also reading 48 bytes and writing 1 byte per byte received.

    3) Set up the 16 KB DMA, but do a teardown to flush the queue every time I read it, then submit a new TR. Not a great solution.

    I don't think any of these is really viable. I am guessing I might try option 3, or dedicate a PRU or Cortex-R core to do the equivalent of DMA.
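
    To put numbers on option 2, here is a quick sketch of the arithmetic. The 48 bytes per descriptor is taken from the figure above; the 16 KB buffer is assumed to mean 16384 bytes, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Memory needed when every buffered byte has its own descriptor. */
static uint64_t desc_pool_bytes(uint64_t buf_bytes, uint64_t desc_bytes)
{
    assert(desc_bytes != 0U);
    return buf_bytes * desc_bytes;
}

/* Bytes the DMA moves per received byte: one descriptor fetch plus the
 * single payload byte written to the buffer. */
static uint64_t dma_bytes_per_rx_byte(uint64_t desc_bytes)
{
    assert(desc_bytes != 0U);
    return desc_bytes + 1U;
}
```

    For a 16384-byte buffer, desc_pool_bytes(16384, 48) is 786432 bytes (768 KiB) of descriptors, and dma_bytes_per_rx_byte(48) is 49 bytes moved for each byte received.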

    Our product won't be using the AM65xx. This work is just to get ready for your next CPU. Hopefully it will have a fix.

  • Ryan,

    Thanks for acknowledging the issue is resolved.

    I did confirm that the EOL feature is already implemented in the follow-on devices. There you can:
    "Submit a 2D TR with ICNT0=1 and ICNT1=circular buffer size, with event generated every ICNT0. Put this TR in infinite loop mode. Put the PDMA in EOL mode as we discussed.
    Route the UDMAP ICNT0 event to an event counter in the IA, but don’t use it to generate an interrupt. Then the IA can be polled at any time to see how many bytes have been written to the buffer.
    Again, this is going to have system performance impact, so anyone doing it should be aware of it.
    "
    Note that the last line of his email warns about the cost of many small transactions rather than bursts, even when the circular buffer is in SRAM.
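
    A minimal sketch of the polling side of that suggestion, under stated assumptions: the event counter is treated as a free-running 32-bit value that increments once per ICNT0 event (one byte), is not cleared on read, and advances by less than 2^32 between polls. How the counter is actually read from the IA is not shown, and RxProgress and rx_progress_poll are made-up names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t last_count; /* counter value seen at the previous poll */
    uint32_t wr_index;   /* derived write position in the circular buffer */
    uint32_t buf_size;   /* circular buffer size in bytes (ICNT1) */
} RxProgress;

/* Update the derived write index from a fresh counter reading and
 * return the number of bytes written since the previous poll. */
static uint32_t rx_progress_poll(RxProgress *p, uint32_t count_now)
{
    uint32_t delta;

    assert(p->buf_size != 0U);
    delta = count_now - p->last_count; /* unsigned math handles counter wrap */
    p->last_count = count_now;
    p->wr_index = (uint32_t)(((uint64_t)p->wr_index + delta) % p->buf_size);
    return delta;
}
```

    If the returned delta ever exceeds buf_size, the DMA has lapped the reader and data has been overwritten, so the caller should treat that as an overflow.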

    regards
    Jian
  • Thanks for the help on this.
    The many small transactions aren't ideal, but I haven't seen a better alternative for our application. Ideally I could just write to a flush bit in the UDMA, and it would generate an EOL event without terminating the TR. The counter also wouldn't update until the data is in DDR/SRAM. Another option is generating an EOL if no new data has arrived in X microseconds.
  • Ryan,

    I agree with your point about the bus utilization. On your requirement:
    - The counter also wouldn't update until the data is in ddr/sram
    We also implemented an enhancement feature in the PDMA upgrade, below is the description of the feature:
    "we implemented a feedback message from the PDMA so that the UDMAP will now optionally wait for the PDMA TX to complete writing all its data before the UDMAP signals that the TX teardown is complete."
    Again, both the EOL and TX complete event will only be available in the upgraded PDMA, post current AM654x PG1 silicon.

    Regards and Happy New Year.
    Jian
  • Part Number: AM6548

    The J7 UDMA IP was updated to address the issue documented in https://e2e.ti.com/support/processors/f/791/p/751871/2806066#2806066

    Is the new bit I am supposed to use PDMA_PSILCFG_REG_STATIC_TR_Z[31].eol?

    What should the other values be in the TR descriptor? I have been able to get it to write each byte as it comes in, but I haven't seen a way to get an infinite loop. I saw the STATIC bit 4, but that is for type 8 and 9 block move TRs.

    What register can I read to find the current buffer index that has been flushed to RAM?