Fast GPIO output with TIRTOS

Martin H.

Expert 2315 points


Equipment:

ICEv2 board,

CCSv6, version 6.0.1.00040,

TI XDS100v2 USB Emulator,

Windows7,

am335x_sysbios_ind_sdk_1.1.0.6

NDK 2.24.1.18

NDK's NSP 1.10.2.09 not activated

SYSBIOS 6.40.3.39

Compiler TI v5.1.10

Project: 'standalone Ethernet switch' derived from the EthernetIp example of the SDK,

PRUSS for Ethernet access

Hi,

some time ago I had a BeagleBoneBlack project that was able to change the level of a GPIO output every 40ns.

The condition was to have these 2 statements right at the beginning of main():

MMUConfigAndEnable();

CacheEnable(0x03); // Instruction, Data and Unified Cache at all levels

Now with TIRTOS it is only possible to change it every 215ns, which is too slow for this project.

I did not use the statements above and tried to avoid Starterware functions (MMUInit() would be defined twice, for instance).

Instead, MMUInit(applMmuEntries), which came with the example code, is called. In the .cfg file, “Add the Cache-Modul” is checked (Enable Cache and Enable L2-cache ways at startup). Surprisingly, there is no “Cache.enableCache = true;” in the .cfg script.

What I need is a hint on how to speed up the GPIO3 output again. The processor frequency is now 550MHz. Is there a way to raise it? Or am I missing something fundamental?

All comments are very much appreciated.

Thank you.

Regards,

Martin H.

over 9 years ago

0 Martin H. over 9 years ago

Expert 2315 points

Increasing the cpu frequency from 550MHz to 800MHz did not make any difference.

Can I be sure that cache is enabled if it is checked in the .cfg file as described above?
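For orientation, the toggle periods quoted in this thread can be converted into core clock cycles. This is only arithmetic on the numbers given above; the helper name ns_to_cycles is hypothetical. If the period stays at 215ns while the core clock rises, the cycle count per write grows, which would be consistent with the time being spent waiting outside the core rather than executing instructions (an interpretation, not a measurement).

```c
#include <assert.h>
#include <stdint.h>

/* Convert a duration in nanoseconds into whole CPU cycles at a given
 * core clock in MHz (result is truncated toward zero). */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t core_mhz)
{
    return (uint32_t)(((uint64_t)ns * core_mhz) / 1000u);
}

/* Usage: the figures reported in this thread. */
static void toggle_budget_self_check(void)
{
    assert(ns_to_cycles(215u, 550u) == 118u); /* 215ns at 550MHz */
    assert(ns_to_cycles(215u, 800u) == 172u); /* 215ns at 800MHz */
    assert(ns_to_cycles(40u, 550u) == 22u);   /* 40ns would be 22 cycles at 550MHz */
}
```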

Regards,

Martin H.

0 Martin H. over 9 years ago in reply to Martin H.

Expert 2315 points

I added some example code to enable L1 data + L2 caching for the address range 0x80000000-0x90000000 in the *.cfg file, but it makes no difference. Still 215ns. Can anybody make a suggestion?

Martin H.

0 Martin H. over 9 years ago in reply to Martin H.

Expert 2315 points

Another strange thing is this:
If I comment out MMUInit(applMmuEntries), the function UTILsGetBoardType() returns 2, which stands for AM335X_BOARD_TYPE_ICE. But it should be 3, i.e. AM335X_BOARD_TYPE_ICE_V2.

0 Frank Walzer over 9 years ago in reply to Martin H.

TI__Mastermind 43121 points

Martin,

Martin H. said:
If I comment out MMUInit(applMmuEntries), the function UTILsGetBoardType() returns 2, which stands for AM335X_BOARD_TYPE_ICE. But it should be 3, i.e. AM335X_BOARD_TYPE_ICE_V2.

Yes, this is most likely due to this code in MMUInit():
    Mmu_enable();
    UTILsDetectBoardType();
    return 0;

I think future versions will remove the UTILsDetectBoardType() call from MMUInit(), and you will then need to call it elsewhere, at least once before you use the get function. You may change this now if you like.

Regards,

0 Martin H. over 9 years ago in reply to Frank Walzer

Expert 2315 points

Hi Frank,

thank you for your response. I will change it and ignore the issue from now on.

I am still working on speeding up the GPIO output. My last step was to set GPIO_CTRL of GPIO3 to 0, so that the functional clock is the interface clock, the module is enabled, and the clocks are not gated:

HWREG(SOC_GPIO_3_REGS + GPIO_CTRL) = 0;

It did not help either. I called that statement in main(). Could NDK have overwritten the settings at a later point in time?
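One way to check whether the setting was changed later is to read GPIO_CTRL back at the point where the pin is toggled and decode it. The sketch below decodes a GPIO_CTRL value using the field layout as I understand it from the AM335x TRM (DISABLEMODULE in bit 0, GATINGRATIO in bits 2:1); the macro and function names are hypothetical and should be checked against the TRM and the SDK headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed GPIO_CTRL layout: bit 0 = DISABLEMODULE, bits 2:1 = GATINGRATIO. */
#define GPIO_CTRL_DISABLEMODULE      (1u << 0)
#define GPIO_CTRL_GATINGRATIO_SHIFT  1u
#define GPIO_CTRL_GATINGRATIO_MASK   (3u << GPIO_CTRL_GATINGRATIO_SHIFT)

/* Returns 1 if the module is enabled (DISABLEMODULE bit is clear). */
static int gpio_ctrl_module_enabled(uint32_t ctrl)
{
    return (ctrl & GPIO_CTRL_DISABLEMODULE) == 0u;
}

/* Returns the divider applied to the interface clock: 1, 2, 4 or 8. */
static unsigned gpio_ctrl_clock_divider(uint32_t ctrl)
{
    return 1u << ((ctrl & GPIO_CTRL_GATINGRATIO_MASK) >> GPIO_CTRL_GATINGRATIO_SHIFT);
}

/* Usage: a value of 0 means module enabled, functional clock = interface clock. */
static void gpio_ctrl_self_check(void)
{
    assert(gpio_ctrl_module_enabled(0x0u));
    assert(gpio_ctrl_clock_divider(0x0u) == 1u);
    assert(!gpio_ctrl_module_enabled(0x1u));
    assert(gpio_ctrl_clock_divider(0x6u) == 8u);
}
```

On the target, the value passed in would come from a read such as HWREG(SOC_GPIO_3_REGS + GPIO_CTRL), taken right before the toggle loop.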

Does anybody know?

Martin H.

0 Martin H. over 9 years ago in reply to Martin H.

Expert 2315 points

Hi,

now I thought that putting

MMUConfigAndEnable();

CacheEnable(0x03);

at the beginning of main() in the TIRTOS project would provide the results I had achieved with the pure Starterware project and BBB (40ns delay between 2 GPIO outputs).

But not even MMUInit(), which is invoked in MMUConfigAndEnable(), returned. The reason probably is that pageTable was placed at a much higher address in the TIRTOS project (0x83F94000 instead of 0x80008000 in the Starterware project).

The crash happens after this loop in MMUInit() has finished:

/* Set the master page table with fault entries */
for(idx = MMU_PAGETABLE_NUM_ENTRY; idx != 0; idx--) {
    *masterPt++ = MMU_PAGETABLE_ENTRY_FAULT;
}

This is the message I get: No source available for "0x8004400c". Most likely a memory problem.

Next I removed the function calls above and added

Cache_enable(Cache_enableCache); // Enable L1 and L2 data and program caches

after MMUInit(applMmuEntries) to make sure that the caches are enabled. But that did not speed up the GPIO output either. I hardly dare to ask: Does anybody have an idea what might be done?

Regards,

Martin H.

0 Martin H. over 9 years ago in reply to Martin H.

Expert 2315 points

It seems that my monologue is leading nowhere ;-(
Martin H.

0 Ashish Kapania over 9 years ago in reply to Martin H.

TI__Mastermind 21645 points

Hi Martin,

It's difficult to comment on what exactly is causing the performance bottleneck, but based on your posts, I am guessing that there is some bad interaction between SYS/BIOS and the Starterware MMU/Cache code that is preventing the code/data from being cached. Is your application calling any Starterware Cache/MMU APIs? If so, can you replace all of them with SYS/BIOS MMU/Cache APIs? You can refer to the SYS/BIOS MMU and Cache cdoc for a list of supported APIs.

One trick you can use to get good performance is to load the app's code/data into the L2 cache and lock it. Since SYS/BIOS apps are not too big, you may be able to fit the entire app in L2 cache. The A8 cache module has a Cache_lock/unlock API that allows you to load and lock code/data into cache ways. The map file can be referenced to determine the address ranges of code/data memory to be loaded into the cache.
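Before trying the lock approach, it may help to estimate from the map file how many L2 ways the code/data range would occupy. The sketch below is only planning arithmetic and does not call any SYS/BIOS API. The cache geometry (256 KB, 8 ways) is an assumption for the AM335x that should be verified against the device documentation, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed L2 geometry; verify against the device documentation. */
#define L2_SIZE_BYTES  (256u * 1024u)
#define L2_NUM_WAYS    8u
#define L2_WAY_BYTES   (L2_SIZE_BYTES / L2_NUM_WAYS)

/* Number of whole L2 ways needed to hold a block of the given size. */
static unsigned l2_ways_needed(size_t bytes)
{
    return (unsigned)((bytes + L2_WAY_BYTES - 1u) / L2_WAY_BYTES);
}

/* Returns 1 if the block fits while keeping the requested number of
 * ways unlocked for normal caching, 0 otherwise. */
static int l2_can_lock(size_t bytes, unsigned ways_kept_unlocked)
{
    return l2_ways_needed(bytes) + ways_kept_unlocked <= L2_NUM_WAYS;
}

/* Usage: sizes would come from the section sizes in the map file. */
static void l2_plan_self_check(void)
{
    assert(l2_ways_needed(32u * 1024u) == 1u);
    assert(l2_ways_needed(32u * 1024u + 1u) == 2u);
    assert(l2_can_lock(96u * 1024u, 4u));   /* 3 ways locked, 4 kept free */
    assert(!l2_can_lock(200u * 1024u, 2u)); /* would need 7 ways plus 2 free */
}
```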

Best,

Ashish

0 PratheeshGangadhar over 9 years ago in reply to Martin H.

TI__Mastermind 42560 points

Hi,

MMUInit(applMmuEntries);

Can you cross-check that the GPIO3 address space is configured as device type (bufferable flag set) in your applMmuEntries? This has to be the default for peripheral address space. I have seen ~210ns when it's strongly ordered (not cached and not buffered), as the ARM then waits for the write to complete before the next write. When it's buffered it's 60-80ns IIRC, as the ARM pipeline can schedule the next instruction before the previous instruction is completed.
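To illustrate the difference between the two memory types, here is a sketch of how a 1 MB section entry in an ARMv7-A short-descriptor page table encodes them through the TEX, C and B bits (assuming TEX remap is disabled). It does not reproduce the SDK's applMmuEntries format, which is not shown in this thread; the macro and function names are hypothetical, and the GPIO3 base address 0x481AE000 is taken from the AM335x memory map as I understand it.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor section entry fields (1 MB sections). */
#define SECT_TYPE    0x2u                              /* bits 1:0 = 0b10 */
#define SECT_B       (1u << 2)                         /* bufferable */
#define SECT_C       (1u << 3)                         /* cacheable */
#define SECT_AP_RW   (3u << 10)                        /* AP[1:0] = 0b11, full access */
#define SECT_TEX(x)  (((uint32_t)(x) & 7u) << 12)      /* bits 14:12 */

typedef enum { MEM_STRONGLY_ORDERED, MEM_DEVICE, MEM_NORMAL } mem_type_t;

/* Build a section entry for the 1 MB region containing phys_addr. */
static uint32_t sect_entry(uint32_t phys_addr, uint32_t tex, int c, int b)
{
    return (phys_addr & 0xFFF00000u) | SECT_TEX(tex) | SECT_AP_RW |
           (c ? SECT_C : 0u) | (b ? SECT_B : 0u) | SECT_TYPE;
}

/* Classify a section entry by TEX/C/B. Reserved encodings are not
 * distinguished and fall through to MEM_NORMAL. */
static mem_type_t sect_mem_type(uint32_t entry)
{
    uint32_t tex = (entry >> 12) & 7u;
    uint32_t c = (entry >> 3) & 1u;
    uint32_t b = (entry >> 2) & 1u;

    assert((entry & 3u) == SECT_TYPE); /* only section entries are handled */
    if (tex == 0u && c == 0u)
        return b ? MEM_DEVICE : MEM_STRONGLY_ORDERED; /* shareable device if B = 1 */
    if (tex == 2u && c == 0u && b == 0u)
        return MEM_DEVICE;                            /* non-shareable device */
    return MEM_NORMAL;
}
```

With these definitions, sect_entry(0x481AE000u, 0, 0, 0) classifies as strongly ordered and sect_entry(0x481AE000u, 0, 0, 1) as device, so the only difference between the slow and the fast case described above would be the B bit.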
