AM243x FreeRTOS jitter

Hi

We suffer from unexplained jitter in FreeRTOS (Cortex R, core 0).

Measurement method:

  1. A periodic interrupt fires once every 500 uSecs.
  2. The ISR posts a binary semaphore to a task (at the highest priority).
  3. The task wakes up and samples the CPU time stamp counter.

Differences are used to assess system jitter.
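
The bookkeeping behind this measurement can be sketched in portable C. This is only an illustration, not the code used in the test: `JitterStats`, `jitter_init`, `jitter_sample`, and `jitter_peak_to_peak` are hypothetical names, and the timestamps are assumed to come from a free-running 32-bit counter.

```c
#include <stdint.h>

/* Hypothetical helper: tracks the shortest and longest interval
 * between consecutive task wake-ups, in counter ticks. */
typedef struct {
    uint32_t last;      /* previous timestamp */
    uint32_t min_delta; /* shortest interval seen so far */
    uint32_t max_delta; /* longest interval seen so far */
    int      primed;    /* 0 until the first timestamp is recorded */
} JitterStats;

static inline void jitter_init(JitterStats *s)
{
    s->last = 0u;
    s->min_delta = UINT32_MAX;
    s->max_delta = 0u;
    s->primed = 0;
}

/* Call once per task wake-up with the current counter value. */
static inline void jitter_sample(JitterStats *s, uint32_t now)
{
    if (s->primed) {
        /* Unsigned subtraction stays correct across one counter wrap. */
        uint32_t delta = now - s->last;
        if (delta < s->min_delta) { s->min_delta = delta; }
        if (delta > s->max_delta) { s->max_delta = delta; }
    }
    s->last = now;
    s->primed = 1;
}

/* Peak-to-peak jitter in ticks; 0 until two samples have been taken. */
static inline uint32_t jitter_peak_to_peak(const JitterStats *s)
{
    return (s->max_delta >= s->min_delta) ? (s->max_delta - s->min_delta) : 0u;
}
```

Tracking both the minimum and the maximum interval, rather than only the maximum, makes it easier to tell a single late wake-up (one long interval followed by one short one) from a general slowdown.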

Simple example, one task, jitter within 2-5 uSecs. Good.

It looks like latency grows when I enable more parts in software, plus some factors that I cannot explain.

No access to hardware. Low priority tasks wake up periodically, but do nothing.

For example, there is a big difference in latency with and without initialization of the network code.

Without network init, latency/jitter is +/- 20 uSec; with network init but with the cable disconnected (no interrupts from DMA), it is +/- 80 uSec.

All code runs from DDR, and DDR is cached.

Can cache misses explain such behavior?

Any thoughts ?

  • Hi Rasty,

    In a nutshell, with a complex application such as the TCP/IP network stack running, this behavior is expected, because the latency you measured is from a low-priority task. If higher-priority tasks take a long time to finish, the response from the low-priority task is delayed. The interrupt service routine itself is not delayed; only the low-priority task's response is.

    One way to reduce the latency is to reduce the execution time of the higher-priority tasks by putting the most frequently used functions and data into TCM or OCRAM instead of DDR. Of course, you need to profile your application to understand which functions and data areas to put in faster memory such as TCM and OCRAM.

    Of course, you can also increase the priority of the task that needs a short response time.

    There are only 32KB of program cache and 32KB of data cache, so when the program is as complex as a TCP/IP network stack, cache misses will certainly increase, and therefore the execution time of the higher-priority tasks will increase too. That is why you want to put the most frequently used functions and data in TCM or OCRAM instead of DDR.
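
    As a rough illustration of that kind of placement, a function and its data can be tagged with a section attribute (a GCC/Clang extension). The section names `.fast_text` and `.fast_data` and the function `control_step` are hypothetical; they only take effect on the target if the project's linker script maps those sections to TCM or OCRAM, which is project-specific and not shown here.

    ```c
    #include <stdint.h>

    /* Hypothetical section names; they must match output sections that the
     * project's linker script maps to TCM or OCRAM. */
    #define FAST_CODE __attribute__((section(".fast_text"), noinline))
    #define FAST_DATA __attribute__((section(".fast_data")))

    /* State touched on every time-critical cycle. */
    FAST_DATA static uint32_t g_tick_count = 0u;

    /* Time-critical routine kept out of DDR so that it does not depend on
     * the instruction cache being warm. */
    FAST_CODE uint32_t control_step(uint32_t increment)
    {
        g_tick_count += increment;
        return g_tick_count;
    }
    ```

    The map file produced by the linker shows whether the symbols actually ended up in the intended memory region.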

    Best regards,

    Ming

  • Hi,

    I think that you did not get me right.

    I measure the response of the highest-priority task. It jitters even though nothing runs above it.

    I expect some influence of cache refill, but not that high.

    Thanks

    Rasty

  • Hi Rasty,

    Do you have any tasks with the same priority as the task whose response latency you measure?

    Best regards,

    Ming

  • Hi

    Task is at the highest priority. No other tasks at the same priority.

    Rasty

  • Hi Rasty,

    If that is the case, then the delay must be in the network related ISRs. Can you measure the time taken in the network related ISRs using the CycleCounterP APIs?

    Best regards,

    Ming

  • Hi,

    The network cable is unplugged, so there are no DMA interrupts; I checked this with a breakpoint.

    Thanks

    Rasty

  • I found a major source of the real-time problems, but not all of them.

    How do I reconfigure Ethernet to work in polling mode, without DMA, cache invalidation, and interrupts?

  • Hi Rasty,

    Since this is an Ethernet question, I will forward your thread to our Ethernet CPSW expert for further help!

    Best regards,

    Ming

  • Hi Rasty

    Is it possible to share a test which we can replicate on the EVM?

    1. From this image, are you inferring that there is a cache-invalidate API that may be impacting performance? I'll look further into the call trace, but I'd expect this to invalidate only a particular cache block and not affect the high-priority task's cacheability (unless there is some interdependence, such as the same cache line holding code for your high-priority task).

    2. Is this call trace only when sending/receiving packets? Or does this trigger even when the cable is disconnected?

    3. What interrupts are registered (and triggering) for the R5F and what are their priorities? I see a 500us timer interrupt and an Ethernet interrupt(?).

    Regards

    Karan

  • Hi

    1. I did not expect cache invalidation, because there is an option to define non-cached memory for the stack via the MPU.

    2. I just put a breakpoint on the cache-invalidating function to see what happens.

    3. Only timer interrupt, and DMA.

    Is it possible to switch to polling mode? 

    Thanks

    Rasty 

  • 1. GCC-compiled example (not TI CLANG).

    2. Rasty to share which SDK example is being used here. TI to reuse (with TI CLANG) that and add a 500us timer task to see if we are able to replicate this issue.

    3. Rasty to comment on what is the priority of the 500us timer task.

    Is it possible to switch to polling mode? 

    4. TI to share steps for this.

    Regards

    Karan

  • Hi

    Please explain how all those questions are related to my request?

    ** Is it possible to switch the Ethernet driver/stack to polling mode and/or eliminate cache invalidation?

    Timer priority is irrelevant, because the problem is a general slowdown of the system after initialization of the Ethernet stack, while the network cable is *not* connected.

  • Rasty

    Your request is a part of question 4 per my notes.

    Other questions / comments will help us debug this issue.

    Regards

    Karan

  • Hi Rasty,

    You can refer to the "enet_lwip_cpsw" example for a polling-mode implementation.

    file:///C:/Repos/mcu_plus_sdk/docs/api_guide_am243x/EXAMPLES_ENET_LWIP_CPSW.html

    Can you share your test example to reproduce the issue on my setup?

    Best Regards

    Ashwani

  • Notes from 3/7

    1. TCP/IP is not time critical, this can be put in DDR un-cached. Ashwani (TI) to provide details on what buffers/code can be put in uncached DDR. Also how to make these code changes for changing the mem map.

    Step1. Put Data only in uncached DDR.

    Step2. Put .text selectively in uncached DDR.

    Regards

    Karan

  • Hi Rasty,

    Sorry for the delay in responding.

    Please follow the document below for the latency benchmark.

    www.ti.com/.../spracv1b.pdf

    Allow me some more time to sync internally and get back to you.

    Best Regards,

    Ashwani

  • Hi

    How do I change these parameters (they come from a file generated by SysConfig)?

    /*! \brief RX packet task stack size */
    #define LWIPIF_RX_PACKET_TASK_STACK (1024U)

    /*! \brief TX packet task stack size */
    #define LWIPIF_TX_PACKET_TASK_STACK (1024U)

    /*! \brief Links status poll task stack size */
    #if (_DEBUG_ == 1)
    #define LWIPIF_POLL_TASK_STACK (3072U)
    #else
    #define LWIPIF_POLL_TASK_STACK (1024U)
    #endif

    Thanks

    Rasty

  • Hi Karan

    I still did not get an answer to this question.

    I tried the following:

    1. Defined a memory area in DDR starting at address 0x90000000 as non-cacheable (DDR_NC).

    2. Moved the following sections to that non-cacheable area:

    DDR : ORIGIN = 0x80000000 , LENGTH = 0x10000000

    DDR_NC : ORIGIN = 0x90000000 , LENGTH = 0x10000000

    *(*ENET_DMA_DESC_MEMPOOL)
    *(*ENET_DMA_RING_MEMPOOL)
    /*#if (ENET_SYSCFG_PKT_POOL_ENABLE==1)*/
    *(*ENET_DMA_PKT_MEMPOOL)
    /*#endif*/
    } > DDR_NC

    .bss (NOLOAD) : ALIGN (128) {*(ENET_DMA_OBJ_MEM)} > DDR_NC
    .bss (NOLOAD) : ALIGN (128) {*(ENET_DMA_PKT_INFO_MEMPOOL)} > DDR_NC

    3. Disabled cache invalidation.

    Once I disable cache invalidation, the stack behaves very strangely.

    I need help moving the DMA buffers to non-cached memory and getting rid of cache invalidation.

  • get rid of cache invalidation

    We cannot completely get rid of cache invalidation, as it is handled in the driver code.

    The DMA descriptor and DMA ring memory should always be cached.

    ENET_DMA_OBJ_MEM

    This should be in a cached section.

    ENET_DMA_PKT_MEMPOOL

    This is related to the packet payload, so it can be placed in cached or uncached memory, depending on the use case.
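
    When a packet buffer stays in cached memory, the CPU has to invalidate every cache line that overlaps the buffer before it reads data written by the DMA. The address arithmetic involved can be sketched as below. This is an illustration only, not the Enet driver's code: the helper names are hypothetical, and a 32-byte line (the Cortex-R5 L1 cache line size) is assumed.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE_SIZE 32u /* assumption: Cortex-R5 L1 cache line, in bytes */

    /* Round an address down to the start of its cache line. */
    static inline uintptr_t cache_align_down(uintptr_t addr)
    {
        return addr & ~((uintptr_t)CACHE_LINE_SIZE - 1u);
    }

    /* Round an address up to the next cache-line boundary. */
    static inline uintptr_t cache_align_up(uintptr_t addr)
    {
        return (addr + CACHE_LINE_SIZE - 1u) & ~((uintptr_t)CACHE_LINE_SIZE - 1u);
    }

    /* Number of cache lines that an invalidate of [addr, addr + size) touches. */
    static inline size_t cache_lines_covered(uintptr_t addr, size_t size)
    {
        if (size == 0u) { return 0u; }
        return (size_t)((cache_align_up(addr + size) - cache_align_down(addr))
                        / CACHE_LINE_SIZE);
    }
    ```

    With these assumptions, an aligned 1536-byte buffer (an example size) covers 48 lines, so the cost of one invalidate grows with the buffer size. A buffer that is not line-aligned shares its first or last line with neighbouring data, which is why DMA buffers are normally aligned and padded to the cache line size.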

    Regards

    Ashwani

  • Please explain why it should be cached.

    What are the alternatives? Do you have DMA-less network drivers? Polling?

    Thanks

    rasty 

  • Please explain why it should be cached.

    If you move the DMA descriptors and DMA ring to uncached memory, then:

    1. You will get reduced performance.

    2. You need to remove the cache-invalidation code from the driver.

    Do you have DMA-less network drivers?

    We do not support DMA-less networking with ICSSG and CPSW with 1G.

    Regards
    Ashwani

  • Can you send me a summary of the places that I have to change in order to get rid of caching and invalidation?

    I'm asking because I already did that on my own, but the stack does not work well. I am probably missing something important.

  • How do I change these parameters (they come from a file generated by SysConfig)?

    Once you have generated the files from SysConfig, edit them.

    Copy the generated files and paste them into the example/project directory.

    Then exclude SysConfig from the build and build the project.

    Can you send me a summary of the places that I have to change in order to get rid of caching and invalidation?

    We are working on this.

    Regards

    Ashwani

  • I have an indication that cache invalidation is not the whole story.

    Even if I comment out cache invalidation, I still have ISR jitter of 50 uSec. I have the impression that something related to Ethernet/DMA disables interrupts for a long time, or that some ISR (like Udma_eventIsrFxn) takes a long time.

  • Hi Rasty,

    Can you help me with the detailed steps to reproduce the issue on my setup?

    I will start with SDK 9.1.

    Which SDK example should I use?

    What local changes on your setup do I need to add to my setup to reproduce the issue?

    Regards

    Ashwani

  • You can take any TI TCP/IP example.

    On top of it:

    1. Add a periodic timer, say 125 uSec.

    2. From the timer ISR, give a semaphore to a high-priority task.

    3. In the task, measure the difference in task wake-up time. Look for the minimum/maximum.

    4. Repeat this test with and without Ethernet traffic.

  • Hello Ashwani ,

    Here, we can keep the source and destination buffers in non-cached memory, so the R5F reads data directly from the destination buffer without a cache invalidation after DMA completion. I think this is possible.

    Even if I comment out cache invalidation, I still have ISR jitter of 50 uSec. I have the impression that something related to Ethernet/DMA disables interrupts for a long time, or that some ISR (like Udma_eventIsrFxn) takes a long time.

    Rasty,

    Please look at the image below. Typically, when a DMA operation is started, we disable all interrupts, and once the DMA has been started we restore them. You can see this operation in the image below. I assume that the Ethernet driver uses the same UDMA API to initiate the DMA, so this could create an issue.

    Regards,

    S.Anil.

  • We use udma+cpsw+lwip so I assume that we use the same API.

    What would you suggest?

  • Rasty, I am not familiar with industrial protocols, but the same Udma_ringQueueRaw API function is used throughout most of the MCU+ SDK to initiate the DMA.

    We also need to check which peripherals are used in your application, since we disable and restore interrupts in the same way around some critical operations. So if you share details of all the peripherals used in your application, it will really help to debug the issue further.

    We use udma+cpsw+lwip so I assume that we use the same API.

    My assumption is the same.

    Ashwani, can you confirm here?

    Regards,

    S.Anil.

  • It is not an industrial protocol. It is Ethernet TCP/IP based on a TI example.

  • In general, I do not understand why TI drivers use HwiP_disable without any wrappers that allow disabling only selected peripheral interrupts.

  • Ashwani, can you confirm here?

    We are using line #1380

    Regards

  • In general, I do not understand why TI drivers use HwiP_disable without any wrappers that allow disabling only selected peripheral interrupts.

    For any atomic operation to happen, global interrupts need to be disabled.

    Regards

    Karan

  • I did some measurements.

    Enqueue and dequeue contribute 8 and 9 uSecs of jitter, respectively.

    Udma_eventIsrFxn took 54 uSecs (!).

    From my perspective, it is a design flaw.

    An ISR should not do such major work. It must be threaded: the work should be done in a task, not in the ISR.

    In that case you would not need to disable interrupts globally; a mutex would be enough.

    In any case, I do not see a reason to disable interrupts globally; software should mask only the DMA interrupt.
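
    The deferred-work idea can be modelled without any RTOS as a single-producer/single-consumer ring: the ISR only records the event and returns, and a task later drains the ring and runs the callbacks. The sketch below uses C11 atomics; `DmaEvent`, `EvtRing`, `evt_ring_push`, and `evt_ring_pop` are hypothetical names and are not part of the UDMA driver. On FreeRTOS the ISR would additionally have to wake the task, which the patch further down the thread does with xQueueSendFromISR.

    ```c
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define EVT_RING_LEN 8u /* must be a power of two */

    typedef struct {
        uint32_t event_type;
        void    *app_data;
    } DmaEvent;

    typedef struct {
        DmaEvent    slots[EVT_RING_LEN];
        atomic_uint head; /* written only by the ISR (producer) */
        atomic_uint tail; /* written only by the task (consumer) */
    } EvtRing;

    /* ISR side: record the event and return quickly. Returns false if full. */
    static inline bool evt_ring_push(EvtRing *r, DmaEvent ev)
    {
        unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
        unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
        if ((head - tail) == EVT_RING_LEN) { return false; }
        r->slots[head & (EVT_RING_LEN - 1u)] = ev;
        atomic_store_explicit(&r->head, head + 1u, memory_order_release);
        return true;
    }

    /* Task side: fetch the next event for processing. Returns false if empty. */
    static inline bool evt_ring_pop(EvtRing *r, DmaEvent *out)
    {
        unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
        unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
        if (head == tail) { return false; }
        *out = r->slots[tail & (EVT_RING_LEN - 1u)];
        atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
        return true;
    }
    ```

    Because each index has exactly one writer, neither side needs to disable interrupts to keep the ring consistent. That property holds only for one producer and one consumer; more writers would need a different scheme.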

  • You can refer to the "enet_lwip_cpsw" example for a polling-mode implementation.

    file:///C:/Repos/mcu_plus_sdk/docs/api_guide_am243x/EXAMPLES_ENET_LWIP_CPSW.html

    Hi Rasty,

    You can use the above example as a reference for the polling method.

    In polling mode, the application can periodically call the EnetDma_retrieve* APIs to get TX free and RX full packets.

    The code snippet below shows polling-mode usage for the receive operation. The periodic task retrieves packets from the Enet DMA and passes them to the processing stack.

    void EnetApp_periodicTask(void)
    {
        /* Receive packets from DMA */
        while (true)
        {
            status = EnetDma_retrieveRxPktQ(hRxCh, pRetrieveQ);
            /* Processes the received packets and enqueues into freeQ */
            process(pRetrieveQ);
            status = EnetDma_submitRxPktQ(hRxCh, pFreeQ);
     
            sleep(100);
        }
    }

    For TX, polling can be used to retrieve transmission complete packets. 

    void EnetApp_sendPkt(void)
    {
        /* Submit TX ready packets for transmission */
        status = EnetDma_submitTxPktQ(hTxCh, pSubmitQ);
    }
     
    void EnetApp_periodicTask(void)
    {
        /* Retrieve free TX packets from DMA */
        while (true)
        {
            sleep (100);
     
            status = EnetDma_retrieveTxPktQ(hTxCh, pRetrieveQ);
        }
    }

    Best Regards

    Ashwani

  • Action items from today's meeting:

    1. Check whether the global interrupt disable is actually required for UDMA's ringQueueRaw callback.
    2. Replicate the test setup from Rasty with the SDK's enet TCP server example and measure jitter.
    3. Work with the enet team to discuss whether the memory placement can be optimized
      1. after we've been able to replicate the tests
    4. The above is with TI-ARM CLANG -O3 optimizations; move to GCC after this.

    Regards

    Ashwani

  • In the task, measure the difference in task wake-up time.

    Which API are you using to get this value?

    Regards

    Ashwani

  •  CycleCounterP_getCount32()
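
    Differences between two 32-bit cycle counts still have to be converted to time. A small sketch of that arithmetic follows, assuming the counter increments once per CPU cycle and the R5F core runs at 800 MHz; both are assumptions to check against the actual configuration, and the helper names are hypothetical.

    ```c
    #include <stdint.h>

    #define CPU_HZ 800000000u /* assumption: R5F core clock of 800 MHz */

    /* Convert a cycle-count difference to whole microseconds (rounds down). */
    static inline uint32_t cycles_to_us(uint32_t cycles)
    {
        return cycles / (CPU_HZ / 1000000u);
    }

    /* Convert a period in microseconds to cycles; 64-bit to avoid overflow. */
    static inline uint64_t us_to_cycles(uint32_t us)
    {
        return (uint64_t)us * (CPU_HZ / 1000000u);
    }

    /* Time, in milliseconds, until a 32-bit cycle counter wraps around. */
    static inline uint32_t counter_wrap_ms(void)
    {
        return (uint32_t)((UINT64_C(1) << 32) / (CPU_HZ / 1000u));
    }
    ```

    Under these assumptions a 125 uSec timer period corresponds to 100000 cycles, and the counter wraps roughly every 5.4 seconds, so a single 32-bit difference cannot measure intervals longer than that.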

  • Hi Rasty,

    To reproduce the issue on my setup, I followed the steps below:

    1. Started with the enet_cpsw_tcpserver example from SDK 9.1
    2. Opened SysConfig
      1. Added a timer with a 125 us tick and "timerTickIsr"
      2. Added GTC to get the tick count for timestamping
    3. Changed the main.c file
      1. Added "timer_task" with priority MAIN_TASK_PRI-2
      2. Added jitter_task with priority MAIN_TASK_PRI
      3. Reduced the priority of freertos_main to MAIN_TASK_PRI-1
      4. Added a semaphore to synchronize the timer ISR and the jitter calculation
      5. Here is the updated main.c file:
      6. /cfs-file/__key/communityserver-discussions-components-files/81/5037.main.c
    4. Changed the app_main.c file as follows
      1. Commented out the Ethernet-related code
      2. Here is the updated file:
      3. /cfs-file/__key/communityserver-discussions-components-files/81/app_5F00_main_5F00_without_5F00_ethernet.c

    Please review the changes and let me know whether they are sufficient to reproduce your setup on my board.

    Next steps:

    1. Get the jitter values with the Ethernet code enabled

    2. Get the jitter values with the Ethernet code disabled

    Regards

    Ashwani

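  • The task should look like this.

    You should not print messages from this task.

    #include <stdint.h>
    #include "FreeRTOS.h"
    #include "semphr.h"
    #include <kernel/dpl/CycleCounterP.h>

    extern SemaphoreHandle_t Sem; /* binary semaphore given by the timer ISR; created elsewhere */

    volatile uint32_t _diffCounterMax = 0;

    /* GCC accepts an attribute on a function definition only before the declarator */
    __attribute__((section("CRITICAL_TEXT_SECTION1")))
    void empty_hop2(void *args)
    {
        uint32_t startCounter, endCounter, diffCounter;

        (void) args;
        startCounter = CycleCounterP_getCount32();
        while (1)
        {
            xSemaphoreTake(Sem, portMAX_DELAY); /* wait for wake up from TimeTick */
            endCounter = CycleCounterP_getCount32();
            diffCounter = endCounter - startCounter; /* unsigned math tolerates one counter wrap */
            if (diffCounter > _diffCounterMax)
            {
                _diffCounterMax = diffCounter;
            }
            startCounter = endCounter;
        }
    }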

  • Hi ,

    Yesterday, I was out of office.

    Here is the updated main.c file; the app_main.c file is the same as previously shared (excluding the Ethernet code).

    /cfs-file/__key/communityserver-discussions-components-files/81/4064.main.c

    We created 3 tasks:

    Timer ISR trigger ==> Task 1: Main Task gives semaphore to ==> Task 2: Empty Task gives semaphore to ==> Task 3: Jitter Task

    I am seeing ~1 us jitter in this test.

    Now, where do you want me to add/enable the Ethernet code (Task 1, Task 2, or Task 3) to get the new jitter value with Ethernet enabled?

    Here

    Regards

    Ashwani

  • Suggested patch approach to DMA driver.

    0001-1.-Threaded-DMA-interrupt.patch.txt
    From 59a64ce5ba034b13cd7c711192c23007154c40c3 Mon Sep 17 00:00:00 2001
    From: rasty slutsker <rasty.slutsker@servotronix.com>
    Date: Thu, 21 Mar 2024 11:43:59 +0200
    Subject: [PATCH 1/3] 1. Threaded DMA interrupt
    
    ---
     .../drivers/makefile.am243x.r5f.ti-arm-gcc    |   4 +
     .../mcu_plus_sdk/source/drivers/udma/udma.c   |   7 +
     .../source/drivers/udma/udma_event.c          | 133 ++++++++++++++++--
     .../source/drivers/udma/udma_ring_common.c    |  36 +++--
     .../source/drivers/udma/udma_utils.c          |   8 +-
     5 files changed, 165 insertions(+), 23 deletions(-)
    
    diff --git a/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/makefile.am243x.r5f.ti-arm-gcc b/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/makefile.am243x.r5f.ti-arm-gcc
    index f748c2c804..19cccf3708 100644
    --- a/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/makefile.am243x.r5f.ti-arm-gcc
    +++ b/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/makefile.am243x.r5f.ti-arm-gcc
    @@ -196,6 +196,10 @@ FILES_PATH_common = \
     INCLUDES_common := \
         -I${CG_TOOL_ROOT}/include/c \
         -I${MCU_PLUS_SDK_PATH}/source \
    +    -IFreeRTOS-Kernel/include \
    +	-I${MCU_PLUS_SDK_PATH}/source/kernel/freertos/config/am243x/r5f \
    +	-I${MCU_PLUS_SDK_PATH}/source/kernel/freertos/config/am243x/r5f \
    +    -I${MCU_PLUS_SDK_PATH}/source/kernel/freertos/portable/TI_ARM_CLANG/ARM_CR5F \
     
     DEFINES_common := \
         -DSOC_AM243X \
    diff --git a/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma.c b/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma.c
    index db4945a294..b9fda63011 100644
    --- a/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma.c
    +++ b/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma.c
    @@ -64,6 +64,8 @@
     /* ========================================================================== */
     /*                            Global Variables                                */
     /* ========================================================================== */
    +SemaphoreP_Object dmaPollMutex;
    +static int32_t once=1;
     
     /* None */
     
    @@ -82,6 +84,11 @@ int32_t Udma_init(Udma_DrvHandle drvHandle, const Udma_InitPrms *initPrms)
         DebugP_assert(sizeof(Udma_EventObjectInt) <= sizeof(Udma_EventObject));
         DebugP_assert(sizeof(Udma_RingObjectInt) <= sizeof(Udma_RingObject));
         DebugP_assert(sizeof(Udma_FlowObjectInt) <= sizeof(Udma_FlowObject));
    +    if (once)
    +    {
    +        retVal = SemaphoreP_constructMutex(&dmaPollMutex);
    +        once = 0;
    +    }
     
         if((drvHandle == NULL_PTR) || (initPrms == NULL_PTR))
         {
    diff --git a/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_event.c b/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_event.c
    index 3065c21dcc..370442f16a 100644
    --- a/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_event.c
    +++ b/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_event.c
    @@ -40,9 +40,13 @@
     /* ========================================================================== */
     /*                             Include Files                                  */
     /* ========================================================================== */
    -
    +#include <stdio.h>
     #include <drivers/udma/udma_priv.h>
    -
    +#include <kernel/dpl/TaskP.h>
    +#include <kernel/dpl/SemaphoreP.h>
    +#include <kernel/freertos/FreeRTOS-Kernel/include/FreeRTOS.h>
    +#include <kernel/freertos/FreeRTOS-Kernel/include/queue.h>
    +#include <kernel/freertos/FreeRTOS-Kernel/include/task.h>
     /* ========================================================================== */
     /*                           Macros & Typedefs                                */
     /* ========================================================================== */
    @@ -52,7 +56,32 @@
     /* ========================================================================== */
     /*                         Structure Declarations                             */
     /* ========================================================================== */
    +typedef struct Udma_Event
    +{
    +    Udma_EventCallback callback;
    +    Udma_EventHandle eventHandle;
    +    uint32_t eventType;
    +    void *appData;
    +} Udma_Event;
    +#define EVENT_Q_LEN 256
    +typedef struct Udma_EventPollObject_t
    +{
    +	 /*
    +     * Handle to Qeueue
    +     */
    +    QueueHandle_t dmaPollQ;
    +
     
    +    /*
    +     * Handle to input task that sends polls the link status
    +     */
    +    TaskP_Object dmaPollTaskObj;
    +    uint8_t dmaPollTaskStack[1024];
    +	char dmaPollTaskName[64];
    +	Udma_EventObjectInt *handle;
    +	Udma_Event eventQ[EVENT_Q_LEN];
    +	StaticQueue_t xStaticQueue;
    +} Udma_EventPollObject;
     /* None */
     
     /* ========================================================================== */
    @@ -80,7 +109,7 @@ static void Udma_eventResetSteering(Udma_DrvHandleInt drvHandle,
     /* ========================================================================== */
     /*                            Global Variables                                */
     /* ========================================================================== */
    -
    +extern  SemaphoreP_Object dmaPollMutex;
     /* None */
     
     /* ========================================================================== */
    @@ -424,13 +453,33 @@ void UdmaEventPrms_init(Udma_EventPrms *eventPrms)
     
         return;
     }
    -
    +uint32_t dmaisr_max = 0;
    +uint32_t * pdmaisr_max= &dmaisr_max;
    +uint32_t CycleCounterP_getCount32(void);
    +void dmaPoll(void *args)
    +{
    +	Udma_EventPollObject* obj = (Udma_EventPollObject*)args;
    +    Udma_EventHandleInt eventHandle = obj->handle;
    +	while (1)
    +	{
    +		BaseType_t stat;
    +		Udma_Event event;
    +		stat = xQueueReceive(obj->dmaPollQ, &event, SystemP_WAIT_FOREVER  );
    +        SemaphoreP_pend(&dmaPollMutex,SystemP_WAIT_FOREVER);
    +		if ( pdPASS  == stat && event.callback)
    +		{
    +			event.callback(event.eventHandle, event.eventType, event.appData);
    +		}
    +		SemaphoreP_post(&dmaPollMutex);
    +	}
    +}
     static void Udma_eventIsrFxn(void *args)
     {
         uint32_t            vintrBitNum;
         uint32_t            vintrNum;
         uint32_t            teardownStatus;
    -    Udma_EventHandleInt eventHandle = (Udma_EventHandleInt) args;
    +	Udma_EventPollObject* obj = (Udma_EventPollObject*)args;
    +    Udma_EventHandleInt eventHandle = obj->handle;
         Udma_DrvHandleInt   drvHandle;
         Udma_EventPrms     *eventPrms;
         Udma_RingHandleInt  ringHandle;
    @@ -440,6 +489,8 @@ static void Udma_eventIsrFxn(void *args)
         drvHandle = eventHandle->drvHandle;
         vintrNum = eventHandle->vintrNum;
         DebugP_assert(vintrNum != UDMA_EVENT_INVALID);
    +		uint32_t start, end, dif;
    +		start = CycleCounterP_getCount32();
         /* Loop through all the shared events. In case of exclusive events,
          * the next event is NULL_PTR and the logic remains same and the while breaks */
         while(eventHandle != NULL_PTR)
    @@ -486,8 +537,26 @@ static void Udma_eventIsrFxn(void *args)
                     {
                         if((Udma_EventCallback) NULL_PTR != eventPrms->eventCb)
                         {
    -                        eventPrms->eventCb(
    -                            eventHandle, eventPrms->eventType, eventPrms->appData);
    +							BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    +							BaseType_t stat;
    +
    +    /* We have not woken a task at the start of the ISR. */
    +							Udma_Event event;
    +							event.callback = eventPrms->eventCb;
    +							event.eventHandle = eventHandle;
    +							event.eventType = eventPrms->eventType;
    +							event.appData = eventPrms->appData;
    +							stat = xQueueSendFromISR( obj->dmaPollQ, &event, &xHigherPriorityTaskWoken );
    +							/* Now the buffer is empty we can switch context if necessary. */
    +							if( pdPASS == stat && xHigherPriorityTaskWoken )
    +							{
    +								/* Actual macro used here is port specific. */
    +								portYIELD_FROM_ISR (xHigherPriorityTaskWoken);
    +							}
    +/*							
    +							eventPrms->eventCb(
    +								eventHandle, eventPrms->eventType, eventPrms->appData);
    +								*/
                         }
                     }
                 }
    @@ -496,6 +565,12 @@ static void Udma_eventIsrFxn(void *args)
             /* Move to next shared event */
             eventHandle = eventHandle->nextEvent;
         }
    +		end = CycleCounterP_getCount32();
    +		dif = end - start;
    +		if (dif > dmaisr_max)
    +		{
    +			dmaisr_max = dif;
    +		}
     
         return;
     }
    @@ -748,7 +823,7 @@ static int32_t Udma_eventAllocResource(Udma_DrvHandleInt drvHandle,
         if(UDMA_SOK == retVal)
         {
             /* Do atomic link list update as the same is used in ISR */
    -        cookie = HwiP_disable();
    +        SemaphoreP_pend(&dmaPollMutex,SystemP_WAIT_FOREVER);
     
             /* Link shared events to master event */
             eventHandle->prevEvent = (Udma_EventHandleInt) NULL_PTR;
    @@ -766,7 +841,7 @@ static int32_t Udma_eventAllocResource(Udma_DrvHandleInt drvHandle,
                 eventHandle->prevEvent = lastEvent;
                 lastEvent->nextEvent   = eventHandle;
             }
    -        HwiP_restore(cookie);
    +        SemaphoreP_post(&dmaPollMutex);
         }
     
         if(UDMA_SOK == retVal)
    @@ -813,7 +888,7 @@ static void Udma_eventFreeResource(Udma_DrvHandleInt drvHandle,
         uintptr_t   cookie;
     
         /* Do atomic link list update as the same is used in ISR */
    -    cookie = HwiP_disable();
    +    SemaphoreP_pend(&dmaPollMutex,SystemP_WAIT_FOREVER);
     
         /*
          * Remove this event node - link previous to next
    @@ -831,7 +906,7 @@ static void Udma_eventFreeResource(Udma_DrvHandleInt drvHandle,
             eventHandle->nextEvent->prevEvent = eventHandle->prevEvent;
         }
     
    -    HwiP_restore(cookie);
    +    SemaphoreP_post(&dmaPollMutex);
     
         if(NULL_PTR != eventHandle->hwiHandle)
         {
    @@ -865,7 +940,8 @@ static void Udma_eventFreeResource(Udma_DrvHandleInt drvHandle,
     
         return;
     }
    -
    +Udma_EventPollObject pollpool[20];
    +int poolidx=0;
     static int32_t Udma_eventConfig(Udma_DrvHandleInt drvHandle,
                                     Udma_EventHandleInt eventHandle)
     {
    @@ -1058,6 +1134,37 @@ static int32_t Udma_eventConfig(Udma_DrvHandleInt drvHandle,
             }
         }
     
    +	{
    +		int32_t             retVal = SystemP_SUCCESS;
    +		TaskP_Params params;
    +		poolidx++;
    +		pollpool[poolidx].handle = eventHandle;
    +	/*Initialize semaphore to call synchronize the poll function with a timer*/
    +		pollpool[poolidx].dmaPollQ = xQueueCreateStatic(EVENT_Q_LEN,sizeof(Udma_Event),
    +                             (uint8_t*)&pollpool[poolidx].eventQ, &pollpool[poolidx].xStaticQueue );
    +							 
    +        if(NULL == pollpool[poolidx].dmaPollQ)
    +        {
    +            DebugP_logError("[UDMA] Event Q create failed!!!\r\n");
    +        }
    +		{
    +			/* Initialize the poll function as a thread */
    +			TaskP_Params_init(&params);
    +			sprintf(pollpool[poolidx].dmaPollTaskName,"DMA_poll_irq_%d",(int)eventHandle->coreIntrNum);
    +			params.name = pollpool[poolidx].dmaPollTaskName;
    +			params.priority       = 9; //todo ????
    +			params.stack          = pollpool[poolidx].dmaPollTaskStack;
    +			params.stackSize      = sizeof(pollpool[poolidx].dmaPollTaskStack);
    +			params.args           = &pollpool[poolidx];
    +			params.taskMain       = &dmaPoll;
    +
    +			retVal = TaskP_construct(&pollpool[poolidx].dmaPollTaskObj, &params);
    +			if(SystemP_SUCCESS != retVal)
    +			{
    +				DebugP_logError("[UDMA] Poll task create failed!!!\r\n");
    +			}
    +		}			 
    +	}
         if(UDMA_SOK == retVal)
         {
             /* Register after programming IA, so that when spurious interrupts
    @@ -1070,7 +1177,7 @@ static int32_t Udma_eventConfig(Udma_DrvHandleInt drvHandle,
                 HwiP_Params_init(&hwiPrms);
                 hwiPrms.intNum = coreIntrNum;
                 hwiPrms.callback = &Udma_eventIsrFxn;
    -            hwiPrms.args = eventHandle;
    +            hwiPrms.args = &pollpool[poolidx];
                 hwiPrms.priority = eventHandle->eventPrms.intrPriority;
                 retVal = HwiP_construct(&eventHandle->hwiObject, &hwiPrms);
                 if(SystemP_SUCCESS != retVal)
    diff --git a/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_ring_common.c b/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_ring_common.c
    index bcd8c9bf49..e271eae966 100644
    --- a/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_ring_common.c
    +++ b/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_ring_common.c
    @@ -42,6 +42,7 @@
     /* ========================================================================== */
     
     #include <drivers/udma/udma_priv.h>
    +#include <kernel/dpl/SemaphoreP.h>
     
     /* ========================================================================== */
     /*                           Macros & Typedefs                                */
    @@ -69,7 +70,7 @@ static inline void Udma_ringAssertFnPointers(Udma_DrvHandleInt drvHandle);
     /* ========================================================================== */
     
     /* None */
    -
    +extern SemaphoreP_Object dmaPollMutex;
     /* ========================================================================== */
     /*                          Function Definitions                              */
     /* ========================================================================== */
    @@ -350,6 +351,11 @@ int32_t Udma_ringDetach(Udma_RingHandle ringHandle)
     
         return (retVal);
     }
    +uint32_t dmaenq_max = 0;
    +uint32_t * pdmaenq_max= &dmaenq_max;
    +uint32_t dmadeq_max = 0;
    +uint32_t * pdmadeq_max= &dmadeq_max;
    +uint32_t CycleCounterP_getCount32(void);
     
     int32_t Udma_ringQueueRaw(Udma_RingHandle ringHandle, uint64_t phyDescMem)
     {
    @@ -377,11 +383,17 @@ int32_t Udma_ringQueueRaw(Udma_RingHandle ringHandle, uint64_t phyDescMem)
     
         if(UDMA_SOK == retVal)
         {
    -        cookie = HwiP_disable();
    -
    +		SemaphoreP_pend(&dmaPollMutex,SystemP_WAIT_FOREVER);
    +		uint32_t start, end, dif;
    +		start = CycleCounterP_getCount32();
             retVal = drvHandle->ringQueueRaw(drvHandle, ringHandleInt, phyDescMem);
    -
    -        HwiP_restore(cookie);
    +		end = CycleCounterP_getCount32();
    +		dif = end - start;
    +		if (dif > dmaenq_max)
    +		{
    +			dmaenq_max = dif;
    +		}
    +		SemaphoreP_post(&dmaPollMutex);
         }
     
         return (retVal);
    @@ -413,11 +425,17 @@ int32_t Udma_ringDequeueRaw(Udma_RingHandle ringHandle, uint64_t *phyDescMem)
     
         if(UDMA_SOK == retVal)
         {
    -        cookie = HwiP_disable();
    -
    +		uint32_t start, end, dif;
    +		SemaphoreP_pend(&dmaPollMutex,SystemP_WAIT_FOREVER);
    + 		start = CycleCounterP_getCount32();
             retVal = drvHandle->ringDequeueRaw(drvHandle, ringHandleInt, phyDescMem);
    -
    -        HwiP_restore(cookie);
    + 		end = CycleCounterP_getCount32();
    +		dif = end - start;
    +		if (dif > dmadeq_max)
    +		{
    +			dmadeq_max = dif;
    +		}
    +        SemaphoreP_post(&dmaPollMutex);
         }
     
         return (retVal);
    diff --git a/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_utils.c b/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_utils.c
    index 5c988d3bcc..4ec3284527 100644
    --- a/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_utils.c
    +++ b/ind_comms_sdk_am243x_09_01_00_03/mcu_plus_sdk/source/drivers/udma/udma_utils.c
    @@ -247,7 +247,13 @@ uint64_t Udma_defaultVirtToPhyFxn(const void *virtAddr,
                                       uint32_t chNum,
                                       void *appData)
     {
    -    return ((uint64_t) virtAddr);
    +#if defined (__aarch64__)
    +    uint64_t temp = virtAddr;
    +#else
    +    /* R5 is 32-bit machine, need to truncate to avoid void * typecast error */
    +    uint32_t temp = (uint32_t) virtAddr;
    +#endif
    +    return ((uint64_t) temp);
     }
     
     void *Udma_defaultPhyToVirtFxn(uint64_t phyAddr,
    -- 
    2.27.0.windows.1
    
    

  • Hi Rasty,

    An update:

    Experiment_1:

    4 Tasks running (Empty_P2, Ethernet_P2, Main_P2, Jitter_P1) + 125-us-ISR

    Time duration: 10 minutes

    Max jitter calculation w.r.t. 125 us, as below:

    Experiment_2:

    3 Tasks running (Empty_P2, Ethernet_P2, Main_P2, Jitter_P1) + 125-us-ISR

    Time duration: 10 minutes

    Max jitter calculation w.r.t. 125 us, as below:

    Next Step:

    1. Running this test for long duration and get the jitter values.
    2. Port and run this example with GCC

    Let me know if you have any input here.

    Question:

    1. What Ethernet Tx traffic do you plan for, in packets per second?
    2. What Ethernet Rx traffic do you plan for, in packets per second?
    3. What packet latency is tolerable?

    Based on your inputs, we will tune the example settings.

    Regards

    Ashwani

  • Hi Ashwani

    No requirements for Ethernet traffic; best effort.

    Expected high-priority task jitter is within 5 uSec under maximum Ethernet load. TCP/IP communication may be slow, but connection drops are not allowed.

    Best regards

    Rasty 

  • Hi Rasty,

    An update: we ran this test for a long duration to get the jitter values.

    Experiment_1:

    4 Tasks running (Empty_P2, Ethernet_P2, Main_P2, Jitter_P1) + 125-us-ISR

    Time duration: 60 minutes

    Max jitter calculation w.r.t. 125 us, as below:

    Experiment_2:

    3 Tasks running (Empty_P2, Ethernet_P2, Main_P2, Jitter_P1) + 125-us-ISR

    Time duration: 10 minutes

    Max jitter calculation w.r.t. 125 us, as below:

    Next Step:

    1. Re-run above experiments with Ethernet cable connected.
    2. Port and run this example with GCC and re-run above experiments to see the effect of GCC
    3. Include your DMA patch and re-run above experiments.

    Let me know if you have any input here.

    These results are without Ethernet cable connected.

    +/- 80 uSec.

    Are you seeing this jitter with the Ethernet cable connected?

    I have not seen the function below get hit without the Ethernet cable connected.

    void EnetUdma_txCqIsr(Udma_EventHandle hUdmaEvt,
    uint32_t eventType,
    void *appData)
    {

    Regards

    Ashwani

  • Please run with Ethernet cable and some TCP/IP traffic.

    A test without the Ethernet cable plugged in makes no sense.

    My input is that 25 uSec jitter is not acceptable and needs to be solved.

  • Please run with Ethernet cable and some TCP/IP traffic.

    Thanks, Rasty, for the confirmation.

    We re-ran the above experiments with the Ethernet cable connected.

    We are seeing the same 15-25 us jitter in steady state (plus 1-2 outliers).

    Next Steps:

    1. Port and run this example with GCC and re-run above experiments to see the effect of GCC
    2. Include your DMA patch and re-run above experiments.

    Regards

    Ashwani

  • Hi Rasty,

    1. We moved packet processing into a function that is called from the task handling Ethernet activity.
    2. void txpacket_processing(void) was added in "C:\ti\mcu_plus_sdk_am64x_09_00_00_35\source\networking\enet\core\src\dma\udma\enet_udma_priv.c".
    3. In app_main.c, we updated int ethernet_main(void *args).
    Please find the patch below.
    With the above changes we have seen an improvement in task jitter.
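
    The idea of moving packet processing out of the ISR and into the Ethernet task can be sketched roughly as follows. This is not the content of the patch: the names tx_completion_isr, tx_poll_and_process, and tx_packets_processed are hypothetical, and the counter increment only stands in for the real descriptor handling. It uses plain C11 atomics so it can be tried on a host machine.

    ```c
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical names throughout; none of these come from the SDK or the patch. */
    static atomic_bool g_txCompletionPending = false;
    static uint32_t    g_txPacketsProcessed  = 0;

    /* ISR side: only record that work is pending, then return immediately. */
    void tx_completion_isr(void)
    {
        atomic_store_explicit(&g_txCompletionPending, true, memory_order_release);
    }

    /* Task side: called from the Ethernet task loop.
     * Returns true if pending work was found and processed. */
    bool tx_poll_and_process(void)
    {
        /* Read and clear the flag in one atomic step, so a completion that
         * arrives while we are processing is picked up on the next poll. */
        if (!atomic_exchange_explicit(&g_txCompletionPending, false,
                                      memory_order_acq_rel))
        {
            return false;
        }

        /* Placeholder for the real work (dequeue and free Tx descriptors). */
        g_txPacketsProcessed++;
        return true;
    }

    /* Accessor used only for illustration and testing. */
    uint32_t tx_packets_processed(void)
    {
        return g_txPacketsProcessed;
    }
    ```

    Because the ISR does almost nothing, the time during which the high-priority jitter task can be held off by the Ethernet path should shrink; the cost is that Tx completions are handled only as often as the Ethernet task polls.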
    The screenshot below shows the timestamps measured in jitter_task().
    Jitter is (value - 125 us),
    for example 145 - 125 = 20 us.
    Can you make changes on your setup and provide updated results?

    Regards

    Ashwani

  • Hi Ashwani

    With my patch I achieve much lower jitter: 5-7 uSec.

    Jitter of 20 uSec is not what we're looking for.

    I see that you only handle the TX path with a flag and polling. What about RX?

    rasty

  • what is about RX?

    We have added the same function on the Rx side as well with this patch.

    /cfs-file/__key/communityserver-discussions-components-files/81/enet_5F00_udma_5F00_priv_5F00_tx_5F00_rx.patch

    Driver build is successful.
    C:\ti\mcu_plus_sdk_am64x_09_00_00_35\source\networking\enet> gmake -s -f .\makefile.cpsw.am64x.r5f.ti-arm-clang PROFILE=debug

    We are seeing task jitter of ~15 us.

    Changes on the application side are below:

    Note:


    Global interrupt is not disabled in our setup.

    Regards

    Ashwani