LP-AM243: AM2434 GPIO Latency

AndreTseng

Part Number: LP-AM243
Other Parts Discussed in Thread: AM2434

Hi expert,

I modified the gpio_led_blink demo code to measure the latency of setting a GPIO from high to low or from low to high. My code is simple:

1. Manipulate the registers directly, without using the device driver.

2. Place the code in TCMA by adding a function that toggles the pin in an infinite loop.

void __attribute__((section("gpio_toggle"))) toggle_gpio1_8(void)
{
    while (1)
    {
        *GPIO1_8_SET_ADDRESS = GPIO1_8_MASK;   /* drive the pin high */
        *GPIO1_8_CLEAR_ADDRESS = GPIO1_8_MASK; /* drive the pin low */
    }
}

Linker command file:

GROUP {
    gpio_toggle: palign(8)
} > R5F_TCMA

3. Set the build configuration to Release and change the optimization level to fast.

4. Program the device and measure.
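For readers without the attached project, here is a minimal sketch of what the register macros used in step 2 might look like. The addresses 0x00601018 / 0x0060101C and the mask 0x100 are read off the movw/movt/mov immediates in the disassembly posted later in this thread; the macro definitions themselves and the helper gpio_pulse are assumptions, not the original project code. The helper takes the register pointers as parameters so the store sequence can also be exercised off-target.

```c
#include <stdint.h>

/* Assumed definitions: the values match the immediates in the disassembly
 * posted later in this thread; the original project may define them
 * differently. */
#define GPIO1_8_SET_ADDRESS   ((volatile uint32_t *)(uintptr_t)0x00601018u)
#define GPIO1_8_CLEAR_ADDRESS ((volatile uint32_t *)(uintptr_t)0x0060101Cu)
#define GPIO1_8_MASK          (1u << 8)   /* 0x100, bit 8 of the bank */

/* Hypothetical helper: one high/low pulse through a SET/CLEAR register pair.
 * The volatile qualifier keeps the compiler from merging or dropping the
 * two stores. */
static inline void gpio_pulse(volatile uint32_t *set_reg,
                              volatile uint32_t *clr_reg,
                              uint32_t mask)
{
    *set_reg = mask;  /* pin goes high */
    *clr_reg = mask;  /* pin goes low  */
}
```

On the target, gpio_pulse(GPIO1_8_SET_ADDRESS, GPIO1_8_CLEAR_ADDRESS, GPIO1_8_MASK) inside a while(1) loop would be equivalent to the function shown in step 2.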

My project files are attached. I use SDK 8.3.

gpio1_8_toggle_am243x-lp.zip

Here is the measurement result.

According to the measurement, the latency to toggle the GPIO is 184ns, but the datasheet gives a much smaller figure.

Per the datasheet, the minimum pulse width can be 3.6ns + 0.975 * 8ns, where the GPIO FICLK = 500MHz / 4 = 125MHz, i.e. a period of 8ns.

The gap is huge. Is this a typical value when using the GPIO module to toggle a GPIO?

How can we reproduce the datasheet result?

Regards

Andre

over 3 years ago

0 Aakash Kedia over 3 years ago

TI__Mastermind 26025 points

Hi Andre,

Can you share the use case? Also, can you use the PRU to toggle the GPIO instead of the R5 core?

Thanks and Regards,
Aakash

0 AndreTseng over 3 years ago in reply to Aakash Kedia

TI__Intellectual 2505 points

Aakash,

We know the PRU can toggle a GPIO quickly. The purpose here is to understand the latency from the R5F write to the GPIO SET/OUT register to the actual level change on the pin. The customer needs to understand the limitation. I don't think this is an unreasonable request.

The AM2434 R5F runs at 800MHz (1.25ns) and the GPIO FCLK is 125MHz (8ns). Is 184ns from the CPU write to the GPIO set register to the GPIO signal change a reasonable result?

Regards

Andre

0 Aakash Kedia over 3 years ago in reply to AndreTseng

TI__Mastermind 26025 points

Hi AndreTseng,

Your toggle function is not in the TCM. Please check your map file. Can you fix that and try again?

Best Regards,
Aakash

0 AndreTseng over 3 years ago in reply to Aakash Kedia

TI__Intellectual 2505 points

Aakash,

I double-checked the test code. The "fast" compiler option caused the problem. After I changed the optimization level to "NONE", the code is now placed in TCMA.

However, the result is almost identical to the original.

This doesn't surprise me. This is a very simple test program that only toggles a GPIO. Once the instructions are cached, they will not be evicted. As long as the cache always hits, performance should be identical to running from TCMA.

Any idea why the GPIO latency is so long? And why does the fast compiler option ignore the code section assignment?

Regards

Andre

0 Ming Wei over 3 years ago in reply to AndreTseng

TI__Mastermind 49325 points

Hi Andre,

I would think it comes from the GPIO output. Can you measure the execution time of *GPIO1_8_SET_ADDRESS = GPIO1_8_MASK using the DPL function CycleCounterP_getCount32?

In fact, the scope capture you have more or less confirms that "*GPIO1_8_CLEAR_ADDRESS = GPIO1_8_MASK" or "*GPIO1_8_SET_ADDRESS = GPIO1_8_MASK" took about 184ns (from high to low or from low to high).

I heard that the PRU-ICSSG GPIO has better performance (less delay and more deterministic) than the SoC GPIO. If GPIO performance is critical to your system, you may want to consider the PRU-ICSSG GPIO.
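As a rough sketch of the cycle-counter measurement suggested above: the two helpers below turn a pair of 32-bit counter samples into nanoseconds. The helper names are hypothetical, and the 800MHz figure is the R5F clock quoted earlier in this thread. The target-side calls are shown only in a comment, assuming the counter ticks at the CPU clock, so that the arithmetic itself can be checked on any host.

```c
#include <stdint.h>

/* Elapsed cycles between two 32-bit counter samples. Unsigned subtraction
 * stays correct across a single counter wrap. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds for a CPU clock given in MHz. */
static inline double cycles_to_ns(uint32_t cycles, double cpu_mhz)
{
    return (double)cycles * 1000.0 / cpu_mhz;
}

/* On the target, the samples would come from the DPL cycle counter, e.g.:
 *
 *   uint32_t t0 = CycleCounterP_getCount32();
 *   *GPIO1_8_SET_ADDRESS = GPIO1_8_MASK;
 *   uint32_t t1 = CycleCounterP_getCount32();
 *   double ns = cycles_to_ns(cycles_elapsed(t0, t1), 800.0);
 *
 * This measures how long the store occupies the CPU, not when the pin
 * actually changes level. */
```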

Best regards,

Ming

0 AndreTseng over 3 years ago in reply to Ming Wei

TI__Intellectual 2505 points

Ming,

The purpose is to understand the latency from the R5F write to the GPIO SET/OUT register to the actual level change on the pin. CycleCounterP_getCount32 can only give the cycles in which the instruction executes. It cannot represent the GPIO latency.

The AM2434 R5F runs at 800MHz (1.25ns) and the GPIO FCLK is 125MHz (8ns). Is 184ns from the CPU write to the GPIO set register to the GPIO signal change a reasonable result?

184ns is far from the datasheet spec. Compared to 3.6ns + 0.975*8ns, it is almost 10x the spec. A C2K at 100MHz can toggle a GPIO at up to 25MHz (40ns); please refer to the spec below. MSP430FR is around 16MHz. 2.71MHz does not seem to match the speed of the AM2434.

Can you please double-check that this GPIO latency on AM243x is correct? If so, please advise whether the parameter in the datasheet is correct, and how to measure it.
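The figures quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the formula is the expression quoted in this thread, not a verified reading of the datasheet. Evaluated as written for a 125MHz FICLK it gives 11.4ns, and a 184ns edge-to-edge time corresponds to a square wave of roughly 2.72MHz.

```c
/* Clock period in ns for a frequency given in MHz. */
static inline double period_ns(double freq_mhz)
{
    return 1000.0 / freq_mhz;
}

/* Minimum pulse width using the expression quoted in the thread:
 * 3.6 ns + 0.975 * (FICLK period). */
static inline double quoted_min_pulse_ns(double ficlk_mhz)
{
    return 3.6 + 0.975 * period_ns(ficlk_mhz);
}

/* Square-wave frequency in MHz when each edge-to-edge interval takes
 * edge_ns, so that one full period is two such intervals. */
static inline double toggle_freq_mhz(double edge_ns)
{
    return 1000.0 / (2.0 * edge_ns);
}
```

For example, quoted_min_pulse_ns(125.0) evaluates the 3.6ns + 0.975*8ns expression, and toggle_freq_mhz(184.0) gives the toggle rate implied by the scope measurement.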

Regards

Andre

0 Ming Wei over 3 years ago in reply to AndreTseng

TI__Mastermind 49325 points

Hi Andre,

We think the extra delay of 184ns - 18.6ns = 165.4ns is most likely interrupt latency, because the "*GPIO1_8_SET_ADDRESS = GPIO1_8_MASK" in GPIO_bankIsrFxn may cause another GPIO interrupt. Can you test the GPIO output without the GPIO interrupt enabled?

Best regards,

Ming

0 AndreTseng over 3 years ago in reply to Ming Wei

TI__Intellectual 2505 points

Ming,

This test code uses a while loop to toggle the GPIO. It only toggles the GPIO and does not enable any interrupt. Please don't mix this up with another discussion thread.

Based on your information, 184ns is far from the 18.6ns spec. We want to know why, and how to achieve the spec result.

Regards

Andre

0 AndreTseng over 3 years ago in reply to AndreTseng

TI__Intellectual 2505 points

Ming,

BTW, with the same test on LP-AM263x, the GPIO latency is only ~35ns. That looks like a more reasonable result, though I can't find the FICLK of the GPIO module in the TRM or datasheet to confirm it.

Could you help check why the GPIO latency on AM243x is so long?

Regards

Andre

0 Aakash Kedia over 3 years ago in reply to AndreTseng

TI__Mastermind 26025 points

Hi AndreTseng,

We have an experimental result that measures GPIO latency: around 130ns on the AM243x-EVM. I suggest changing some MPU settings in your project to see improvements.

Disable the strongly ordered configuration for the peripheral MMR region (the MCU SDK sets this by default for the 4G peripheral space). This allows the R5F to do pipeline optimizations. In the MPU configuration, select Advanced config for this region and use TEX = 0, Bufferable (B) = 1, Cacheable (C) = 0.
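To illustrate what the TEX/C/B change means at the register level, here is a small sketch based on the ARMv7-R MPU Region Access Control Register layout (B in bit 0, C in bit 1, TEX in bits 5:3). The macro, enum, and function names are hypothetical, and this is not the code the SDK configuration tool generates. TEX=0, C=0, B=0 encodes Strongly-ordered memory, while TEX=0, C=0, B=1 encodes shareable Device memory, which allows writes to be buffered.

```c
#include <stdint.h>

/* Bit positions in the ARMv7-R MPU Region Access Control Register. */
#define MPU_RACR_B       (1u << 0)                       /* Bufferable */
#define MPU_RACR_C       (1u << 1)                       /* Cacheable  */
#define MPU_RACR_TEX(x)  (((uint32_t)(x) & 0x7u) << 3)   /* TEX[2:0]   */

typedef enum {
    MEM_STRONGLY_ORDERED,   /* TEX=0, C=0, B=0 */
    MEM_DEVICE_SHAREABLE,   /* TEX=0, C=0, B=1 */
    MEM_OTHER               /* any other encoding (Normal memory, etc.) */
} mem_type_t;

/* Pack the three attribute fields into their register bit positions. */
static inline uint32_t mpu_attr_encode(uint32_t tex, uint32_t c, uint32_t b)
{
    return MPU_RACR_TEX(tex) | (c ? MPU_RACR_C : 0u) | (b ? MPU_RACR_B : 0u);
}

/* Classify a register value by its TEX/C/B fields only. */
static inline mem_type_t mpu_attr_classify(uint32_t racr)
{
    uint32_t tex_c_b = racr & (MPU_RACR_TEX(7u) | MPU_RACR_C | MPU_RACR_B);
    if (tex_c_b == 0u)         return MEM_STRONGLY_ORDERED;
    if (tex_c_b == MPU_RACR_B) return MEM_DEVICE_SHAREABLE;
    return MEM_OTHER;
}
```

With these helpers, mpu_attr_classify(mpu_attr_encode(0, 0, 1)) returns MEM_DEVICE_SHAREABLE, which is the setting described above.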

We will sync internally and come back next week with an update on the numbers given in the datasheet.

Thanks and Regards,
Aakash

0 AndreTseng over 3 years ago in reply to Aakash Kedia

TI__Intellectual 2505 points

Aakash,

Thanks for your response. I followed your suggestion and modified the MPU setup.

As a result, the minimum latency becomes 6ns. However, the GPIO timing becomes unpredictable: the maximum time is 160ns, which is more than 10x the minimum.

I loaded the code onto the device in debug mode to compare the disassembly before and after the MPU change. The assembly is identical; see the two files strongly_ordered.txt and mpu_optimized.txt.

strongly_ordered.txt

00000040:   EAFFFFFF            b          #0x44
00000044:   E3011018            movw       r1, #0x1018
00000048:   E3401060            movt       r1, #0x60
0000004c:   E3A00C01            mov        r0, #0x100
00000050:   E5810000            str        r0, [r1]
00000054:   E301101C            movw       r1, #0x101c
00000058:   E3401060            movt       r1, #0x60
0000005c:   E5810000            str        r0, [r1]
00000060:   EAFFFFF7            b          #0x44

mpu_optimized.txt

00000040:   EAFFFFFF            b          #0x44
00000044:   E3011018            movw       r1, #0x1018
00000048:   E3401060            movt       r1, #0x60
0000004c:   E3A00C01            mov        r0, #0x100
00000050:   E5810000            str        r0, [r1]
00000054:   E301101C            movw       r1, #0x101c
00000058:   E3401060            movt       r1, #0x60
0000005c:   E5810000            str        r0, [r1]
00000060:   EAFFFFF7            b          #0x44

The R5F pipeline has a large impact.

However, for the I/O area, I think strongly ordered access is mandatory. We cannot rely on pipeline optimization without strongly ordered I/O access.

I recommend that the datasheet state the test conditions for the GPIO spec and add a caution about the impact of pipeline optimization.

Regards

Andre

0 Ming Wei over 3 years ago in reply to AndreTseng

TI__Mastermind 49325 points

Hi Andre,

Here is the answer from our software development team:

----------

GPIO access latency from the R5F is 130ns. You might be able to improve write latency by disabling the strongly ordered configuration for the peripheral MMR region (the MCU SDK sets this by default for the 4G peripheral space). This allows the R5F to do pipeline optimizations.

In the MPU configuration, select Advanced config for this region and use TEX = 0, Bufferable (B) = 1, Cacheable (C) = 0.

If you have such a low latency requirement, you might want to check out the ICSS GPI and GPO pins, which you can poll at a granularity of 4ns (@ 250 MHz) or 3ns (@ 333 MHz).

-----------

Best regards,

Ming

0 AndreTseng over 3 years ago in reply to Ming Wei

TI__Intellectual 2505 points

Ming,

Is it possible to provide the test conditions (compiler options, code section location, etc.) or the test code that achieved 130ns?

We would like to reproduce it in the field. Thanks.

Regards

Andre

0 Aakash Kedia over 3 years ago in reply to AndreTseng

TI__Mastermind 26025 points

Hi AndreTseng,

We have an internal project that achieved 130ns. It requires some clean-up before it can be shared, so we will share it with you by 16th Aug.

Software team and system architect comments: "AM243x inherits a lot of latency baggage from K3 interconnect infrastructure. This explains why AM263 performance is far superior."

We have data showing that an access from the R5F core to OCRAM takes ~60ns, so GPIO access will be slower.

Your calculation did not account for the time from the CPU (R5F core) to the GPIO peripheral address. We are discussing how this information should be shared.

Thanks and Regards,
Aakash

0 AndreTseng over 3 years ago in reply to Aakash Kedia

TI__Intellectual 2505 points

Aakash,

I will send you a separate email describing why the customer cares about this discussion. If some information is not suitable to publish here, we would also prefer that it be released internally and disclosed to NDA customers only.

0 Aakash Kedia over 3 years ago in reply to AndreTseng

TI__Mastermind 26025 points

Hi AndreTseng,

As per our discussion on email, closing this thread.

BR,
Aakash
