GPIO outputs are not continuous

Martin H.

Hi,
the problem I am facing is to do 1 million GPIO outputs periodically (one output every 40ns).
Doing this is ok for 10,000 outputs per period: There is a continuous pulse sequence.

But when the number is raised to 50,000 outputs, there are pulses missing in the sequence like this (yellow signal):

The problem has been discussed before, when MMU and Caching had not yet been enabled.
I am aware that there are several bus systems between the core and the output pins and that the signal jitters slightly. But that
doesn't explain to me why it works for 10,000 outputs but not for 50,000 or more. It would be nice to find a workaround.

This is the programme, main() is relatively short:

int main(void) {

   MUX_EVM();               // pin multiplexing

   MMUConfigAndEnable();    // these 2 functions have been derived
   CacheEnable(0x03);       // from the uartEdma_Cache.c example from Starterware

   InitGpio2();             // 4 pins as output
   InitGpio3();

   InitSanst();             // fills InstrTable[] with asm instructions
                            // (read double word from memory and output to GPIO2),
                            // fills ValTable[] with unsigned integers that shall be output

   while(1) {

      Delay(10000000);
      GPIOPinWrite(SOC_GPIO_3_REGS, 4, GPIO_PIN_HIGH);   // pos. going edge to trigger a scope

      Ping((unsigned int) ValTable, GPIO2 + GPIO_DATAOUT, (unsigned int) InstrTable);

      GPIOPinWrite(SOC_GPIO_3_REGS, 4, GPIO_PIN_LOW);    // clear trigger pulse

   }
}

Ping() is the function that reads the unsigned integers from ValTable[] and writes them to GPIO2.
That is pure assembler code.

At first I thought that the data in ValTable[] might have been corrupted. But that is not the case.

The next idea was that something might be wrong with MMUConfigAndEnable().
The board has 512MB of DDR SDRAM at address 0x8000 0000 and with

#define START_ADDR_DDR (0x80000000)
#define NUM_SECTIONS_DDR (512)

I would think (but I am not sure) that the memory is set up correctly in MMUConfigAndEnable():

REGION regionDdr = {
MMU_PGTYPE_SECTION, START_ADDR_DDR, NUM_SECTIONS_DDR,
MMU_MEMTYPE_NORMAL_NON_SHAREABLE(MMU_CACHE_WT_NOWA,MMU_CACHE_WB_WA),
MMU_REGION_NON_SECURE, MMU_AP_PRV_RW_USR_RW,
(unsigned int*)pageTable
};
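
As a quick sanity check on the two defines: in the ARM short-descriptor page-table format a section entry maps 1 MB, so 512 sections starting at 0x80000000 should cover exactly the 512 MB of DDR. The sketch below only does that arithmetic. SECTION_SIZE, region_end() and region_covers() are names made up for illustration; they are not part of Starterware.

```c
#include <assert.h>
#include <stdint.h>

#define START_ADDR_DDR   (0x80000000u)
#define NUM_SECTIONS_DDR (512u)
#define SECTION_SIZE     (0x00100000u)   /* 1 MB per section entry */

/* First address past the mapped region (64-bit to avoid wrap-around). */
static uint64_t region_end(uint32_t start, uint32_t num_sections)
{
    /* A short-descriptor first-level table has at most 4096 entries. */
    assert(num_sections <= 4096u);
    return (uint64_t)start + (uint64_t)num_sections * SECTION_SIZE;
}

/* Returns 1 if [addr, addr + len) lies completely inside the region. */
static int region_covers(uint32_t start, uint32_t num_sections,
                         uint32_t addr, uint32_t len)
{
    uint64_t end = region_end(start, num_sections);
    return addr >= start && (uint64_t)addr + len <= end;
}
```

For the values above, region_end() yields 0xA0000000, which is the top of a 512 MB DDR starting at 0x80000000, so the size of the region itself looks consistent. This says nothing about whether the cache attributes are the right choice.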

Does anybody know why that pulse sequence is not steady any longer?
Any hint is welcome!
Thank you!

Regards,
Martin H.

over 8 years ago

0 Martin H. over 8 years ago

Expert 2315 points

Hi,

I increased the number of GPIO2 outputs up to 21,000. That seems to be the maximum number that still provides a continuous pulse sequence.
I suspect that the size of the core's L1 cache (32k I-CACHE, 32k D-CACHE) is the reason for that limitation. On the other hand, 21,000 is not 32k, and I would have expected a smaller number (4 data bytes per GPIO output -> 8k double words in L1 D-CACHE -> 8k output operations).
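
The arithmetic behind that estimate, as a small sketch. It assumes one 4-byte value per output and a 32 KB L1 data cache, as stated above; the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define L1_DCACHE_BYTES (32u * 1024u)   /* 32 KB L1 data cache */
#define BYTES_PER_VALUE (4u)            /* one unsigned int per output */

/* Number of whole values that fit into a cache of the given size. */
static size_t values_that_fit(size_t cache_bytes, size_t bytes_per_value)
{
    assert(bytes_per_value > 0);
    return cache_bytes / bytes_per_value;
}

/* Size in bytes of a table holding n_values entries. */
static size_t table_bytes(size_t n_values, size_t bytes_per_value)
{
    return n_values * bytes_per_value;
}
```

values_that_fit(L1_DCACHE_BYTES, BYTES_PER_VALUE) gives 8192, while table_bytes(21000, BYTES_PER_VALUE) gives 84,000 bytes. So the observed limit of 21,000 outputs does not line up with the L1 data cache alone.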

So if anybody has a better explanation (or even a workaround) I would be happy to hear about that.
Thank you!

Regards,
Martin H.

0 Lalindra Jayatilleke over 8 years ago in reply to Martin H.

TI__Mastermind 30365 points

Martin,
Which platform are you trying this on? Also, was this noticed on other pins?
Lali

0 qxc over 8 years ago in reply to Martin H.

Genius 5820 points

I'm setting GPOs from a timer interrupt that is called at a frequency of about 2 MHz. Setting/clearing the bits there works smoothly (which results in a maximum output frequency of 1 MHz).

In my application all caches are enabled, and I'm compiling with highest optimisation level (especially the last level with link time optimisation gives a huge boost) and with optimisation for speed rather than for size. My whole application is smaller than 250 kBytes which means it fits into L2 cache completely - but I don't know how much influence this has on the speed really.

So I'd suggest checking these options; higher output frequencies are definitely possible.

0 Martin H. over 8 years ago in reply to qxc

Expert 2315 points

Lali,

thank you for taking care of this.
The board is ICEv2 and I am using Starterware with Windows 7 and CCS version 6.1.0.00104.
No, this problem (to increase the number of GPOs running Starterware) has not yet been discussed in another thread.
However, there have been similar threads regarding high speed GPO:
1. GPIO output not equally spaced
This was a test with BBB trying to increase the GPO rate. MMU and cache had not yet been used at that time.

2. Fast GPIO output with TIRTOS
An explanation how to modify MMU settings for fast GPOs using TIRTOS.

3. Ethernet send() function interferes with GPIO
In that example fast GPO (using MMU and caching) has been used in a TIRTOS project. It turned out that the GPO output signal was not equally spaced any longer when data was sent out via Ethernet periodically.
The suspicion was that interrupts might have an influence. But that could not be confirmed and a solution was not found.
A high number of GPOs was not yet important at that time.

I kept the example code as simple as possible and tested it with the phyCORE board and ICEv2 (minor code modifications regarding the selected GPO ports). The results are identical. That makes me think that the boards are ok.
I suppose this issue requires a real MMU expert?
Do you think there is a chance to solve the problem?

qxc,
thanks for replying.
Good to know what rates can be achieved with interrupts.
But an output frequency of 1 MHz means a period of 1 µs. I need 40 ns (or at least 70 ns) per output. This (40 ns) is achieved with only 2 assembler instructions. I can't imagine that a timer interrupt can do that, as it requires additional instructions to enter the ISR (interrupt manager).
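
To put the two rates side by side, here is a small conversion sketch. It is plain integer arithmetic and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SECOND (1000000000ull)

/* Period in nanoseconds for a given update rate in Hz (rounded down). */
static uint64_t period_ns_from_hz(uint64_t hz)
{
    assert(hz > 0);
    return NS_PER_SECOND / hz;
}

/* Update rate in Hz for a given period in nanoseconds (rounded down). */
static uint64_t hz_from_period_ns(uint64_t ns)
{
    assert(ns > 0);
    return NS_PER_SECOND / ns;
}

/* Total duration in nanoseconds of a burst of n_outputs updates. */
static uint64_t burst_ns(uint64_t n_outputs, uint64_t period_ns)
{
    return n_outputs * period_ns;
}
```

A 40 ns period corresponds to 25 million updates per second and 70 ns to roughly 14.3 million, compared with 1 µs for a 1 MHz rate. A burst of 1 million outputs at 40 ns each lasts 40 ms.
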
Regarding the L2 cache, did you find a source that explains how to put parts of the code into L2 cache?
Would be interesting to know.

Thanks to both of you!
Regards,
Martin H.

0 qxc over 8 years ago in reply to Martin H.

Genius 5820 points

Hi Martin,

> But an output frequency of 1 MHz means a period of 1 µs. I need 40 ns (or at least 70 ns) per output.

OK, seems I missed some measurement units. AFAIR what you want to do is beyond the possibilities of the normal AM335x core. It is not a problem of number of assembler instructions, but of the architecture itself: when writing to the GPOs this way, the operation is routed through different busses which all have to be synchronised - which is quite slow.

Your only chance is to use a PRU core and to write to the GPOs from there, it is much faster because PRU has direct access to GPOs and is not limited by this difficult routing.

> Regarding the L2 cache, did you find a source that explains how to put parts of the code into L2 cache?

No, it is only an assumption. Normally CPUs do some complex prediction and preloading of code to have the correct parts in cache before executing them, and I assume this prediction is clever enough to load everything when it fits into the cache completely.

qxc

0 Martin H. over 8 years ago in reply to qxc

Expert 2315 points

Hi qxc,

thanks for your response.
I was aware of the buses but I could not imagine that the number of output events has something to do with the problem.
PRUs had to be excluded as they only have 15(?) GPOs each. I need 32 simultaneous outputs.

I did some further investigation today and did NOT find a solution. But perhaps it is worth mentioning the results nevertheless:

So far I thought that pure assembler code would be the fastest.
But the following function also provides 40ns outputs (and that long InstrTable[] is not needed any longer):

   void Ping(void) {
      volatile unsigned int *pt, ui=0;
      pt = ValTable;

      for(ui=0; ui<NMAX; ui++) {
       HWREG(SOC_GPIO_3_REGS + GPIO_DATAOUT) = *pt++;
      }
   }

The surprise is that now 50,000 outputs are (nearly) stable, instead of 21,000 when using assembler instructions.
But when the number is raised to 75,000 the output sequence is not stable any longer.
This is what I took from the scope:

Fig. 1: Delay after 8th and 16th pulse at the beginning of the sequence

Fig. 2: Longer pulses (168ns) in the middle of the sequence

I also triggered the beginning of the output sequence and looked at the end of the sequence. There is a jitter of up to 7 to 8µs(!).
With 50,000 outputs that jitter is only 22ns.
Increasing --opt_for_speed from 1 to 5 did not solve the problem.
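
One way to relate the two limits (about 21,000 outputs with the assembler table, about 50,000 with the C loop) to cache sizes is to estimate the memory touched per burst. The sketch below is only a consistency check, not a confirmed diagnosis. It assumes a 256 KB L2 cache, 4 data bytes per output and, for the assembler variant, two 4-byte instructions per output stored in InstrTable[] (suggested by the "2 assembler instructions" mentioned earlier, but not verified). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define L2_CACHE_BYTES   (256u * 1024u)  /* assumption: 256 KB unified L2 */
#define DATA_BYTES       (4u)            /* one unsigned int per output */
#define ASM_CODE_BYTES   (2u * 4u)       /* assumption: 2 ARM instructions per output */
#define LOOP_CODE_BYTES  (0u)            /* C loop: code size does not grow with N */

/* Bytes of data plus per-output code touched by a burst of n_outputs. */
static size_t footprint_bytes(size_t n_outputs, size_t code_bytes_per_output)
{
    return n_outputs * (DATA_BYTES + code_bytes_per_output);
}

/* Returns 1 if the footprint does not exceed the assumed L2 size. */
static int fits_in_l2(size_t bytes)
{
    return bytes <= L2_CACHE_BYTES;
}
```

Under these assumptions, 21,000 assembler outputs need 252,000 bytes and 50,000 C-loop outputs need 200,000 bytes, both below 262,144 bytes, while 75,000 C-loop outputs need 300,000 bytes. That would be consistent with the sequence staying steady only while everything fits into L2, but it remains a hypothesis until it is measured.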

I also checked the CPSR register; it reports 'irq disabled' and 'system mode'. FIQ is enabled.
Invoking IntMasterFIQDisable() did not modify the FIQ bit. Perhaps the debugger needs FIQ.

Any other ideas from anybody are most welcome.

Thank you.

Regards,
Martin

0 qxc over 8 years ago in reply to Martin H.

Genius 5820 points

I'd recommend asking in the Sitara forum for the exact timing when setting/clearing GPOs. But avoid referring to StarterWare or posting any source code, just ask for the timing. Otherwise Biser Gatchev will move your question back to the StarterWare forum and you will not get any answer!

0 Martin H. over 8 years ago in reply to qxc

Expert 2315 points

Hello qxc,
good idea. My hope is that a TI engineer can give a binding answer stating whether it is possible or not.

Meanwhile I will have a look at XMOS and see what that technology can do.

Regards,
Martin H.
