LP-AM243: AM2434 GPIO Latency

Part Number: LP-AM243
Other Parts Discussed in Thread: AM2434

Hi expert,

    I modified the gpio_led_blink demo code to measure the latency of driving a GPIO from high to low or from low to high. My changes are simple:

1. Manipulate the registers directly, without using the device driver.

2. Place the code in TCMA by adding a function that toggles the pin in an infinite loop.

void __attribute__((section("gpio_toggle"))) toggle_gpio1_8(void)
{
    while (1)
    {
        *GPIO1_8_SET_ADDRESS = GPIO1_8_MASK;   /* drive the pin high */
        *GPIO1_8_CLEAR_ADDRESS = GPIO1_8_MASK; /* drive the pin low */
    }
}

Linker command file:

GROUP {
gpio_toggle: palign(8)
} > R5F_TCMA

3. Set the build configuration to Release and change the optimization level to "fast".

4. Program the device and measure.
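
The macros used in the toggle function in step 2 are defined in the attached project and are not shown in the post. A minimal sketch of what they might look like follows, assuming the register addresses and mask that appear in the disassembly posted later in this thread (0x00601018, 0x0060101C, 0x100). The helper `gpio_pulse` is hypothetical; it takes the registers as parameters so that the store sequence can also be exercised off-target.

```c
#include <stdint.h>

/* Assumed values, read from the disassembly later in this thread. */
#define GPIO1_8_SET_ADDRESS   ((volatile uint32_t *)(uintptr_t)0x00601018u)
#define GPIO1_8_CLEAR_ADDRESS ((volatile uint32_t *)(uintptr_t)0x0060101Cu)
#define GPIO1_8_MASK          (1u << 8) /* 0x100, bit 8 of the bank */

/* Hypothetical helper: one set/clear pair, i.e. one high pulse on the pin.
 * volatile keeps the compiler from merging or dropping the two stores. */
static inline void gpio_pulse(volatile uint32_t *set_reg,
                              volatile uint32_t *clear_reg,
                              uint32_t mask)
{
    *set_reg = mask;   /* write 1 to the set register: pin goes high */
    *clear_reg = mask; /* write 1 to the clear register: pin goes low */
}
```

On the target, the loop body would then be `gpio_pulse(GPIO1_8_SET_ADDRESS, GPIO1_8_CLEAR_ADDRESS, GPIO1_8_MASK);`.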

My project files are attached. I use SDK 8.3.

gpio1_8_toggle_am243x-lp.zip

Here are the measurement results.

According to the measurement, the latency to toggle the GPIO is 184 ns. According to the datasheet, however, the minimum pulse width can be 3.6 ns + 0.975 × 8 ns, where 8 ns is the FICLK period (FICLK = 500 MHz / 4 = 125 MHz).
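
As a quick check of that arithmetic, here is a small sketch that evaluates the expression exactly as quoted above. The function names are illustrative, and whether this expression is the right datasheet parameter to compare against is not established in this thread.

```c
/* Period in ns of a clock whose frequency is given in MHz. */
double clock_period_ns(double freq_mhz)
{
    return 1000.0 / freq_mhz;
}

/* Minimum pulse width in ns, using the expression as quoted in the post:
 * 3.6 ns + 0.975 * (FICLK period). */
double min_pulse_width_ns(double ficlk_mhz)
{
    return 3.6 + 0.975 * clock_period_ns(ficlk_mhz);
}
```

With FICLK = 500 MHz / 4 = 125 MHz, `min_pulse_width_ns(125.0)` evaluates to about 11.4 ns.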

The gap is huge. Is this a typical value when toggling a pin through the GPIO module?

How can we reproduce the datasheet result?

Regards

Andre

  • Hi Andre,

    Can you share the use case? Also, can you use the PRU to toggle the GPIO instead of the R5 core?

    Thanks and Regards,
    Aakash

  • Aakash,

    We do know that the PRU can toggle a GPIO quickly. The purpose here is to understand the latency from the R5F write to the GPIO SET/OUT register until the signal actually changes level. The customer needs to understand the limitation. I don't think this is an unreasonable request.

    The AM2434 R5F runs at 800 MHz (1.25 ns per cycle) and the GPIO FCLK is 125 MHz (8 ns). Is 184 ns from the CPU write to the GPIO set register until the GPIO signal changes a reasonable result?

    Regards

    Andre

  • Hi AndreTseng,

    Your toggle function is not in the TCM. Please check your map file. Can you please fix that and try again?

    Best Regards,
    Aakash

  • Aakash,

    I double-checked the test code. The "fast" compiler option caused the problem. After I changed the optimization level to "NONE", the code is now placed in TCMA.

    However, the result is almost identical to the original.

    This doesn't surprise me. This is a very simple test program that only toggles a GPIO. Once the instructions are cached, they are not evicted. As long as the cache always hits, performance should be identical to running from TCMA.

    Any idea why the GPIO latency is so long? And why does the "fast" compiler option ignore the code section assignment?

    Regards

    Andre 

  • Hi Andre,

    I would think it comes from the GPIO output. Can you measure the execution time of "*GPIO1_8_SET_ADDRESS = GPIO1_8_MASK" using the DPL function CycleCounterP_getCount32?

    In fact, your scope capture more or less confirms that "*GPIO1_8_CLEAR_ADDRESS = GPIO1_8_MASK" or "*GPIO1_8_SET_ADDRESS = GPIO1_8_MASK" takes about 184 ns (from high to low or from low to high).
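
A minimal sketch of such a measurement, assuming a free-running 32-bit cycle counter such as the value returned by CycleCounterP_getCount32. The counter is passed in as a function pointer (a hypothetical arrangement, not an SDK convention) so the arithmetic can be checked off-target, and the 800 MHz figure is the R5F clock quoted in this thread. It measures time spent in the core around the store, which may differ from the moment the pin changes.

```c
#include <stdint.h>

/* Hypothetical hook: any function returning a free-running 32-bit cycle count. */
typedef uint32_t (*cycle_counter_fn)(void);

/* Unsigned subtraction stays correct across a single counter wrap-around. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds for a CPU clock given in MHz. */
double cycles_to_ns(uint32_t cycles, double cpu_mhz)
{
    return (double)cycles * 1000.0 / cpu_mhz;
}

/* Time a single register store, in CPU cycles. The result includes the
 * overhead of reading the counter twice. */
uint32_t time_store_cycles(cycle_counter_fn now,
                           volatile uint32_t *reg, uint32_t value)
{
    uint32_t start = now();
    *reg = value;
    uint32_t end = now();
    return elapsed_cycles(start, end);
}
```

On the target, `now` could be CycleCounterP_getCount32 and the result could be converted with `cycles_to_ns(cycles, 800.0)`.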

    I have heard that the PRU-ICSSG GPIO performs better (less delay and more deterministic) than the SoC GPIO. If GPIO performance is critical to your system, you may want to consider the PRU-ICSSG GPIO.

    Best regards,

    Ming

  • Ming,

    The purpose is to understand the latency from the R5F write to the GPIO SET/OUT register until the signal actually changes level. CycleCounterP_getCount32 can only report the cycles in which the instruction executes. It cannot represent the GPIO latency.

    The AM2434 R5F runs at 800 MHz (1.25 ns per cycle) and the GPIO FCLK is 125 MHz (8 ns). Is 184 ns from the CPU write to the GPIO set register until the GPIO signal changes a reasonable result?

    184 ns is far from the datasheet spec. Compared with 3.6 ns + 0.975 × 8 ns, it is almost 10x the spec. A C2K device at 100 MHz can toggle a GPIO at up to 25 MHz (40 ns); please refer to the spec below. The MSP430FR manages around 16 MHz. A toggle rate of 2.71 MHz does not seem to match the speed of the AM2434.
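
The 2.71 MHz figure follows from the 184 ns edge-to-edge time, assuming the high and low phases are equally long. A small sketch of that arithmetic (the function name is illustrative):

```c
/* Square-wave frequency in MHz when each edge-to-edge interval takes
 * edge_ns nanoseconds: one full period is two such intervals. */
double toggle_freq_mhz(double edge_ns)
{
    return 1000.0 / (2.0 * edge_ns);
}
```

By the same relation, a 25 MHz toggle rate corresponds to a 40 ns period, or 20 ns per edge.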

    Can you please double-check whether this GPIO latency on AM243x is correct? If so, please advise whether the parameter in the datasheet is correct, and how to measure it.

    Regards

    Andre

  • Hi Andre,

    We think the extra delay of 184 ns - 18.6 ns = 165.4 ns is most likely interrupt latency, because the "*GPIO1_8_SET_ADDRESS = GPIO1_8_MASK" in GPIO_bankIsrFxn may cause another GPIO interrupt. Can you test the GPIO output without the GPIO interrupt enabled?

    Best regards,

    Ming

  • Ming,

    This test code uses a while loop to toggle the GPIO. It only toggles the GPIO and does not enable any interrupt. Please don't mix this up with the other discussion thread.

    Based on your information, 184 ns is far from the 18.6 ns spec. We want to know why, and how to achieve the spec result.

    Regards

    Andre

  • Ming,

    By the way, the same test on the LP-AM263x shows a GPIO latency of only ~35 ns. That looks like a more reasonable result, though I can't find the GPIO module's FICLK in the TRM or datasheet.

    Could you help check why the GPIO latency on AM243x is so long?

    Regards

    Andre

  • Hi AndreTseng,

    We have an experimental result that measures GPIO latency: around 130 ns on the AM243x-EVM. I suggest changing some MPU settings in your project to see whether the latency improves.

    Disable the strongly ordered configuration for the peripheral MMR region (the MCU SDK sets it by default for the 4G peripheral space). This allows the R5F to apply pipeline optimizations. In the MPU configuration, select Advanced config for this region and set TEX to 0, Bufferable (B) to 1, and Cacheable (C) to 0.
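
In ARMv7-R MPU terms, TEX = 0, C = 0, B = 0 encodes Strongly-ordered memory, and TEX = 0, C = 0, B = 1 encodes shareable Device memory, so the setting above changes the peripheral region from Strongly-ordered to Device. A minimal sketch of that encoding follows. The function and enum names are illustrative and are not part of the MCU SDK; the bit positions are those of the Cortex-R5 MPU Region Access Control Register (TEX in bits [5:3], C in bit 1, B in bit 0).

```c
#include <stdint.h>

/* Illustrative classification of the two encodings discussed here. */
typedef enum {
    MEM_STRONGLY_ORDERED,
    MEM_SHAREABLE_DEVICE,
    MEM_OTHER
} mem_type_t;

/* Pack TEX, C and B into their Region Access Control Register positions.
 * Access-permission, shareable and execute-never bits are left out. */
uint32_t mpu_attr_bits(uint32_t tex, uint32_t c, uint32_t b)
{
    return ((tex & 0x7u) << 3) | ((c & 0x1u) << 1) | (b & 0x1u);
}

/* Decode the memory type for the TEX = 0, C = 0 encodings. */
mem_type_t mpu_mem_type(uint32_t tex, uint32_t c, uint32_t b)
{
    if (tex == 0u && c == 0u && b == 0u) {
        return MEM_STRONGLY_ORDERED;
    }
    if (tex == 0u && c == 0u && b == 1u) {
        return MEM_SHAREABLE_DEVICE;
    }
    return MEM_OTHER;
}
```

With Device memory the core may buffer writes, which is consistent with lower average latency but gives up the strict ordering that the default setting provides.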

    We will sync internally and come back by next week with an update on the numbers given in the datasheet.

    Thanks and Regards,
    Aakash

  • Aakash,

    Thanks for your response. I followed your suggestion and modified the MPU setup.

    As a result, the minimum latency became 6 ns. However, the GPIO timing became unpredictable: the maximum is 160 ns, more than 10x the minimum.

    I loaded the code in debug mode to compare the disassembly before and after the MPU change. The assembly is identical; see the two listings, strongly_ordered.txt and mpu_optimized.txt.

    strongly_ordered.txt
    00000040:   EAFFFFFF            b          #0x44
    00000044:   E3011018            movw       r1, #0x1018
    00000048:   E3401060            movt       r1, #0x60
    0000004c:   E3A00C01            mov        r0, #0x100
    00000050:   E5810000            str        r0, [r1]
    00000054:   E301101C            movw       r1, #0x101c
    00000058:   E3401060            movt       r1, #0x60
    0000005c:   E5810000            str        r0, [r1]
    00000060:   EAFFFFF7            b          #0x44
    mpu_optimized.txt
    00000040:   EAFFFFFF            b          #0x44
    00000044:   E3011018            movw       r1, #0x1018
    00000048:   E3401060            movt       r1, #0x60
    0000004c:   E3A00C01            mov        r0, #0x100
    00000050:   E5810000            str        r0, [r1]
    00000054:   E301101C            movw       r1, #0x101c
    00000058:   E3401060            movt       r1, #0x60
    0000005c:   E5810000            str        r0, [r1]
    00000060:   EAFFFFF7            b          #0x44

    The R5F pipeline has a large impact.

    However, for the I/O area I think strongly ordered access is mandatory. We cannot enable pipeline optimization if it means giving up strongly ordered access to I/O.

    I recommend that the datasheet state the test conditions for the GPIO spec and add a caution about the impact of pipeline optimization.

     

    Regards

    Andre

     

  • Hi Andre,

    Here is the answer from our software development team:

    ----------

    GPIO access latency from the R5F is 130 ns. You might be able to improve write latency by disabling the strongly ordered configuration for the peripheral MMR region (set by the MCU SDK by default for the 4G peripheral space). This allows the R5F to apply pipeline optimizations.

    In the MPU configuration, select Advanced config for this region and set TEX to 0, Bufferable (B) to 1, and Cacheable (C) to 0.

    If the requirement is for such low latency, you might want to check out the ICSS GPI and GPO pins; there you can poll at a granularity of 4 ns (at 250 MHz) or 3 ns (at 333 MHz).

    -----------

    Best regards,

    Ming

  • Ming,

    Is it possible to provide the test conditions (compiler options, code section location, and so on) or the test code that achieved 130 ns?

    We would like to reproduce it in the field. Thanks.

    Regards

    Andre

  • Hi AndreTseng,

    We have an internal project that achieved 130 ns. It needs some clean-up before it can be shared, so we will share it with you by 16 Aug.

    Software Team and System Architect comments : "AM243x inherits a lot of latency baggage from K3 interconnect infrastructure. This explains why AM263 performance is far superior."

    We have data showing that an access from the R5F core to OCRAM takes about 60 ns, so a GPIO access will be slower still.

    Your calculation did not include the time from the CPU (R5F core) to the GPIO peripheral address. We are discussing how this information should be shared.

    Thanks and Regards,
    Aakash

  • Aakash,

    I will send you another email describing why the customer cares about this discussion. If some of the information is not suitable to publish here, we would also prefer that it be released internally and disclosed to NDA customers only.

  • Hi AndreTseng,

    As per our discussion over email, I am closing this thread.

    BR,
    Aakash