Transfer of small data chunks via uPP



Among other things, we intend to transfer small data chunks,
e.g. 8 bytes, via the uPP interfaces to an external device.

In tests on an evaluation board, sending data from channel B
to channel A via the uPP's loopback feature, I noticed that
the transfer of small data chunks requires a certain minimum
time. I measured 1.15 usec for a single transfer at a
configured clock rate of 75 MHz.

I suppose this is determined by how the internal DMA
controller exchanges the data with the associated I/O channel.
During the test, TXSIZEA/B = RDSIZEI/Q = 0, meaning that
the minimum TX/RX thresholds of 64 bytes were in effect.

Can you confirm that the measured minimum time is caused
by this data exchange inside the uPP peripheral?

What is really transferred from the TX channel to the RX channel?
Only the intended 8 bytes, marked by the enable signal, or more,
e.g. 64 bytes, because some padding data has been added?

Is there any other, faster way to transfer small data chunks via the uPP,
e.g. by not using the internal DMA controller?

  • How are you measuring your transfer time?  Is it the time between programming the DMA registers and the EOW interrupt event?

    If you specify a small transfer size (i.e. < 64), the uPP transfer still only consists of that number of bytes.  It is not "padded" to reach a multiple of 64.  You can observe this by connecting the channel A and B pins together and running an external loopback test.

    I will need to do some checking to see if I can explain the latency that you are observing.  As an experiment, what is the completion time you observe for a larger transfer size, like 1024 bytes?

  • I have used the following code snippet to measure the transfer time
    between transmitter and receiver:

     printf("--- Initiating a 1st RX request ---\r\n");
     while(UPIS2r->bits.PEND == 1){};
     UPP->UPID0 = (uint32_t)&recv_buffer;
     UPP->UPID1 = UPP_DMA_LNCNT_BCNT;
     UPP->UPID2 = UPP_DMA_LOFFSET;
     printf("--- 1st RX request initiated ---\r\n");

     start = USTIMER_get();
     for(i = 0; i < 100; i++)  {
      //printf("--- Initiating %8d. TX request at time %8d usec ---\r\n", i+1, USTIMER_get());
      while(UPQS2r->bits.PEND == 1){};
      switch( i%4 ) {  //add next DMA transfer
       case 0:  UPP->UPQD0 = (uint32_t)&xmit0_buffer; break;
       case 1:  UPP->UPQD0 = (uint32_t)&xmit1_buffer; break;
       case 2:  UPP->UPQD0 = (uint32_t)&xmit2_buffer; break;
       default: UPP->UPQD0 = (uint32_t)&xmit3_buffer; break;
      }
      UPP->UPQD1 = UPP_DMA_LNCNT_BCNT;
      UPP->UPQD2 = UPP_DMA_LOFFSET;   //no offset between lines; 0 is also possible. Triggers the transfer

      //printf("--- Initiating next RX request at time %8d usec ---\r\n", USTIMER_get());
      while(UPIS2r->bits.PEND == 1){};
      UPP->UPID0 = (uint32_t)&recv_buffer;
      UPP->UPID1 = UPP_DMA_LNCNT_BCNT;
      UPP->UPID2 = UPP_DMA_LOFFSET;
     }
     end = USTIMER_get();
     delta = end-start;

     printf("--- 100   %3d-bytes transfers completed after (%8d-%8d)=%8d usec ---\r\n",
       UPP_DMA_BCNT, end, start, delta);


    For LineCount = 1, this produced the following results for different ByteCount sizes:

    --- 100     4-bytes transfers completed after ( 3102963- 3102848)=     115 usec ---
    --- 100     8-bytes transfers completed after ( 3085182- 3085067)=     115 usec ---
    --- 100    16-bytes transfers completed after ( 3106684- 3106569)=     115 usec ---
    --- 100   128-bytes transfers completed after ( 3107017- 3106901)=     116 usec ---
    --- 100   256-bytes transfers completed after ( 3115570- 3115399)=     171 usec ---
    --- 100   512-bytes transfers completed after ( 3111862- 3111522)=     340 usec ---

    From the values obtained for the 256- and 512-byte chunks, I conclude
    that the transfer runs at the full clock rate for these sizes. For example:
      340 usec / 100 / (512/2) = 13.3 nsec (corresponds to 75 MHz)
    The interface is configured to transfer 16 bits (2 bytes) per clock cycle.
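
The per-word arithmetic above can be repeated for every measured size. The following is a minimal sketch that uses only the figures from the results listing; the function names `per_word_ns` and `print_per_word_times` are hypothetical and introduced here for illustration.

```c
#include <stdio.h>

/* Average time per 16-bit word in nanoseconds, given the total time in
 * microseconds for `transfers` transfers of `bytes` bytes each.
 * Assumes 2 bytes per clock cycle, as configured in the thread. */
double per_word_ns(double total_usec, int transfers, int bytes)
{
    double words = (double)bytes / 2.0;
    return total_usec * 1000.0 / (double)transfers / words;
}

/* Prints the per-word time for each measured chunk size. */
void print_per_word_times(void)
{
    /* Measured values copied from the results listing (100 transfers each). */
    static const int    bytes[] = {4, 8, 16, 128, 256, 512};
    static const double usec[]  = {115.0, 115.0, 115.0, 116.0, 171.0, 340.0};
    int i;

    for (i = 0; i < 6; i++) {
        printf("%3d bytes: %6.1f ns per 16-bit word\n",
               bytes[i], per_word_ns(usec[i], 100, bytes[i]));
    }
}
```

Calling `print_per_word_times()` shows roughly 13.3 ns per word for 256 and 512 bytes, which matches one 75 MHz clock period, while the smaller sizes come out far higher because the total time stays near 115 usec regardless of size.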


    On the evaluation board I have, it is mechanically not that easy to connect the pins
    you suggest in order to establish an external loopback.

  • Thanks for posting more details.  It definitely looks like you're getting near-ideal throughput at transfer sizes greater than 128 bytes.  It's very interesting to me that you get roughly the same time measurements for all transfers at or below 128 bytes.  I have a couple thoughts on why this could happen:

    1. Your application may not be able to keep the uPP peripheral running consistently with very small transfer sizes
    2. If your data buffers are located in external memory (i.e. DDR), there may be some latency associated with accessing those data buffers

    For #1, note that a 4-byte transfer in 16-bit mode at 75 MHz only lasts 26.7 ns, or around 8 CPU cycles at 300 MHz.  Your loop almost certainly has a longer period than that, meaning that you will have some down time where the uPP peripheral is inactive between successive transfers.  The "minimum" transfer completion time of 1.15 us corresponds to 345 cycles, which seems high for the loop that you are using.  You could experiment by adding some "extra" cycles to your loop to see if the "minimum" increases proportionally.
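
The cycle figures in the paragraph above can be checked with a few lines of C. This is only a sketch: the 300 MHz CPU clock and the 75 MHz uPP clock are taken from the thread, while the "fixed floor per iteration" model and all function names are assumptions introduced for illustration, not a description of how the peripheral actually behaves.

```c
#define UPP_CLOCK_HZ   75.0e6   /* uPP clock rate from the thread */
#define CPU_CLOCK_HZ   300.0e6  /* CPU clock rate used in the reply */
#define BYTES_PER_CLK  2.0      /* 16-bit mode: 2 bytes per uPP clock */

/* Ideal time one transfer occupies the interface, in nanoseconds. */
double wire_time_ns(int bytes)
{
    return (double)bytes / BYTES_PER_CLK / UPP_CLOCK_HZ * 1.0e9;
}

/* Number of CPU cycles that elapse in t_ns nanoseconds. */
double cpu_cycles(double t_ns)
{
    return t_ns * 1.0e-9 * CPU_CLOCK_HZ;
}

/* Hypothetical model: every loop iteration costs at least floor_ns,
 * and the transfer itself only dominates once it is longer than that. */
double modeled_iteration_ns(int bytes, double floor_ns)
{
    double wire = wire_time_ns(bytes);
    return (wire > floor_ns) ? wire : floor_ns;
}
```

With `floor_ns = 1150.0`, this model gives 115 usec per 100 transfers for all sizes up to 128 bytes, and about 171 usec and 341 usec for 256 and 512 bytes, close to the measured 171 and 340 usec. The model fits the numbers but does not say where the floor comes from; that is what the suggested experiments are meant to find out.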

    For #2, you can experiment by placing the uPP data buffers in L2 or L3 RAM to see if the "minimum" transfer time decreases.
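
One possible way to try the experiment for #2 with the TI compiler is to assign the buffers to a named section and map that section to on-chip RAM in the linker command file. The sketch below makes several assumptions: the section name `.upp_buffers` is hypothetical, the buffers are assumed to be plain byte arrays (the thread does not show their declarations), and the 8-byte alignment is an assumption to be checked against the uPP documentation. The mapping of the section to L2 or L3 RAM has to be added to the project's own linker command file, using whatever memory range names that file defines.

```c
#include <stdint.h>

#define UPP_BUF_BYTES 512  /* largest chunk size used in the measurements */

/* The pragmas are TI-compiler specific, so they are guarded here to keep
 * the file buildable with other compilers. ".upp_buffers" is a
 * hypothetical section name; it must be mapped to on-chip RAM in the
 * linker command file for the placement to take effect. */
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_SECTION(xmit0_buffer, ".upp_buffers")
#pragma DATA_ALIGN(xmit0_buffer, 8)
#pragma DATA_SECTION(recv_buffer, ".upp_buffers")
#pragma DATA_ALIGN(recv_buffer, 8)
#endif

/* Buffer names follow the snippet in the thread; types are assumed. */
uint8_t xmit0_buffer[UPP_BUF_BYTES];
uint8_t recv_buffer[UPP_BUF_BYTES];
```

After rebuilding, the linker map file should show where the two buffers ended up, and the timing loop can then be rerun to see whether the 1.15 us floor changes.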