Omap 3530: Fastest GPIO toggle speed

Raj Tiwari

Other Parts Discussed in Thread: DM3730, TMS320F28335, OMAP3530, TMDSDOCK28335, TMDSCNCD28335

Folks,

I am scoping out an application on OMAP 3530 SoC where linux code interacts with the user and DSP code acts as a multi-channel waveform generator. I will be using DSP-Link for the two sides to communicate. My questions are:

What is the fastest speed at which I can toggle a GPIO pin from DSP code?
If I want to do the above on multiple channels/pins, what limitations would I run into?

Thanks in advance for any insights.

-Raj

over 15 years ago

0 Nick Nickls over 15 years ago

Prodigy 80 points

In my tests of GPIO access from the MPU, each access to a GPIO port takes about 230 ns.

from C code

*(volatile unsigned long *)(OMAP34XX_GPIO5_BASE + 0x3C) = 0x400;
*(volatile unsigned long *)(OMAP34XX_GPIO5_BASE + 0x3C) = 0x800;

And from assembler code:
gpio_pulse:
ldr r0, gpio_data_out_1
ldr r1, gpio_data_out_2
ldr r2, gpio_5_out
str r0, [r2]
str r1, [r2]
str r0, [r2]
str r1, [r2]
mov pc, lr

Clocks: L3 is 160 MHz, L4 is 83 MHz, and the CPU is 500 MHz.

Inserting delays adds time with a granularity of the L4 clock (83 MHz). I am at a loss as to what is happening. Why does a single port access take 230 ns?

My test program uses u-boot, modified after initialization process.
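As a rough sanity check on what ~230 ns per access means for a toggle rate, the arithmetic can be sketched in C. The helper names are hypothetical, and the sketch assumes one square-wave period costs exactly two register writes (one high, one low) with no other overhead.

```c
#include <assert.h>

/* Raw register update rate (writes per second) when every write
 * costs access_ns nanoseconds. */
static double write_rate_hz(double access_ns)
{
    assert(access_ns > 0.0);
    return 1e9 / access_ns;
}

/* Square-wave frequency (Hz) under the assumption that one period
 * needs two writes: one to drive the pin high, one to drive it low. */
static double toggle_freq_hz(double access_ns)
{
    assert(access_ns > 0.0);
    return 1e9 / (2.0 * access_ns);
}
```

With 230 ns per access this works out to roughly 4.3 million writes per second, or a square wave of about 2.2 MHz.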

0 Matt Pope over 15 years ago in reply to Nick Nickls

Intellectual 425 points

I have seen this as well.

GPIO response to a set and clear tops out at one update every ~250 ns, or ~4 MHz.
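For reference, the set-and-clear access pattern can be sketched as below. This is a minimal sketch, not a fix for the latency: each write still has to cross the interconnect. The register offsets (0x94 for set, 0x90 for clear) are my recollection of the OMAP35x GPIO register map and should be checked against the TRM; `gpio_pulse` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed offsets from the GPIO module base; verify against the TRM. */
#define GPIO_CLEARDATAOUT 0x90u
#define GPIO_SETDATAOUT   0x94u

/* Drive the pins selected by mask high, then low.
 * bank points at the base of one GPIO module: on hardware that would be the
 * mapped module base, in a host-side test it can be a plain array. */
static void gpio_pulse(volatile uint32_t *bank, uint32_t mask)
{
    assert(bank != 0);
    bank[GPIO_SETDATAOUT / 4] = mask;    /* only bits set in mask go high */
    bank[GPIO_CLEARDATAOUT / 4] = mask;  /* only bits set in mask go low  */
}
```

Using the dedicated set and clear registers avoids a read-modify-write of the data-out register, so other pins in the bank are left untouched.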

There should be a way to speed this up - but I haven't found it.

Hopefully someone out there knows how to do this.

I would like to push FPGA config data much faster than 4 Mb/s.

I know that in the CE code the freqsel is set to 0x7: 1.75 MHz to 2.1 MHz

The full range of this setting is 0.75MHz to 21MHz - don't even know if this has any bearing.

Or what needs to be changed for a change there to have any effect. Or even if I'm looking in the correct place.

Hope someone out there really knows how this stuff works and how to affect a speed change to the gpio.

Thx,

Matt

0 Nick Nickls over 15 years ago in reply to Matt Pope

Prodigy 80 points

The best time I've got is 48 ns (a rate of L4/4, about 20 MHz) by using DMA in burst mode. In every other case the delay is no less than ~120 ns.

I think this is an effect of the L3 and L4 interconnects, and there is no workaround.

Now I am looking for the best way to transfer data quickly from an external MPU that has a really fast peripheral.
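To put these delays in terms of interconnect clocks, a small conversion helper is enough. This is only arithmetic on the figures quoted in this thread (83 MHz L4); the function name is hypothetical.

```c
#include <assert.h>

/* Convert a delay in nanoseconds to a (fractional) number of clock cycles
 * at the given clock frequency in MHz. */
static double ns_to_cycles(double ns, double clk_mhz)
{
    assert(ns >= 0.0 && clk_mhz > 0.0);
    return ns * clk_mhz / 1000.0;
}
```

At 83 MHz, 48 ns is very close to 4 L4 cycles, while 230 ns is about 19 cycles, which gives a feel for how much of a plain CPU access is spent outside the final L4 transfer.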

0 Matt Pope over 15 years ago in reply to Nick Nickls

Intellectual 425 points

Will you please elaborate on how you did this via DMA - burst mode?

This may work for me with some sw rewrite.

That would provide up to a 5x improvement.

0 Matt Pope over 15 years ago in reply to Matt Pope

Intellectual 425 points

Can anyone tell me the purpose of the EN_PERIPH_DPLL bits in the CM_CLKEN_PLL register?

This selection of 0.75MHz to 21MHz appears in several places for different dplls.

From the description given - I'm not really sure what this setting does.

Also, what is the purpose of the FCLK & ICLK enables for the gpio banks.

The only FCLK that I see referenced is for debounce (32K). Is it possible there is another functional clk that controls the update rate?

0 Nick Nickls over 15 years ago in reply to Matt Pope

Prodigy 80 points

Set DMA4_CSDPi bits 15:14 DST_BURST_EN to non zero value (page 1016 of SPRUF98F–April 2010) and use frame or block synchronization - DMA4_CCRi bit 5 FS.

The DMA will then write the transferred data as fast as possible. Set SRC_BURST_EN to a non-zero value too! Or use constant-fill mode if possible (DMA4_CCRi bit 16, CONST_FILL_ENABLE).
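The bit settings described above can be sketched as plain mask operations on register values. The DST_BURST_EN, FS and CONST_FILL_ENABLE positions are the ones given in this post; the SRC_BURST_EN position (bits 8:7) is my assumption and should be checked against the TRM. The helper names are hypothetical, and the code only computes values; writing them to the actual DMA4_CSDPi and DMA4_CCRi registers is left out.

```c
#include <assert.h>
#include <stdint.h>

#define CSDP_SRC_BURST_EN_SHIFT 7    /* assumed position, bits 8:7  */
#define CSDP_DST_BURST_EN_SHIFT 14   /* bits 15:14, per the post    */
#define CSDP_BURST_EN_MASK      0x3u
#define CCR_FS                  (1u << 5)   /* bit 5, per the post  */
#define CCR_CONST_FILL_ENABLE   (1u << 16)  /* bit 16, per the post */

/* Return csdp with both burst fields set to the same non-zero value (1..3). */
static uint32_t csdp_enable_bursts(uint32_t csdp, uint32_t burst)
{
    assert(burst >= 1u && burst <= 3u);
    csdp &= ~(CSDP_BURST_EN_MASK << CSDP_SRC_BURST_EN_SHIFT);
    csdp &= ~(CSDP_BURST_EN_MASK << CSDP_DST_BURST_EN_SHIFT);
    csdp |= burst << CSDP_SRC_BURST_EN_SHIFT;
    csdp |= burst << CSDP_DST_BURST_EN_SHIFT;
    return csdp;
}

/* Return ccr with the FS bit set, and CONST_FILL_ENABLE if requested. */
static uint32_t ccr_set_fs(uint32_t ccr, int const_fill)
{
    ccr |= CCR_FS;
    if (const_fill)
        ccr |= CCR_CONST_FILL_ENABLE;
    return ccr;
}
```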

0 Matt Pope over 15 years ago in reply to Nick Nickls

Intellectual 425 points

Thanks Nick - I'll have a look at that and see what it will do.

Can anyone shed some light on this stuff too:

The purpose of the EN_PERIPH_DPLL bits in the CM_CLKEN_PLL register.

This selection of 0.75MHz to 21MHz appears in several places for different dplls.

From the description given - I'm not really sure what this setting does.

Also, what is the purpose of the FCLK & ICLK enables for the gpio banks.

The only FCLK that I see referenced is for debounce (32K). Is it possible there is another functional clk that controls the update rate?

Any help or insight on this would be very helpful to me and probably others.

Thanks again Nick and in advance for anyone's help on this issue.

Matt

0 Brad Griffis over 15 years ago in reply to Matt Pope

TI__Guru*** 125430 points

MPope said:

Can anyone shed some light on this stuff too:

The purpose of the EN_PERIPH_DPLL bits in the CM_CLKEN_PLL register.

This selection of 0.75MHz to 21MHz appears in several places for different dplls.

From the description given - I'm not really sure what this setting does.

Are you talking about PERIPH_DPLL_FREQSEL? I looked at the CM_CLKEN_PLL register and that seems closer to your description. Here's a screenshot:

You need to tell the PLL the approximate speed of the input clock using that field so that it can reliably lock. I don't see any connection to this issue (well, internal clock speeds maybe, but that would be set by the multiplier, not by the FREQSEL).

More generally, I don't think you're going to get blazing fast GPIO performance no matter what you do. There's a very long path to get those writes to the GPIO registers. CPU -> L1D -> L2 -> L3 interconnect -> L4 Interconnect -> GPIO. Furthermore, the L4 interconnect requires multiple L4 cycles per access and you are competing with anything else making peripheral accesses on the L4 interconnect, i.e. you might have to wait.

Can you describe in more detail what type of sequence you're trying to bit bang? How many outputs? How many inputs? How fast does it change?

One possible workaround I was thinking about would be to use a McBSP if you only need 1 output and/or input. That would give all kinds of benefits such as writing a single word that has 32 bits worth of data in it, plus the FIFO capabilities, DMA capabilities, determinism, etc.

Brad

0 Matt Pope over 15 years ago in reply to Brad Griffis

Intellectual 425 points

Thanks Brad for the response - the info is greatly appreciated.

Your response begs a couple more questions though...

First, I haven't seen any clock sources that are feeding these plls in the 0.75-21MHz range.

Where and how is one supposed to be able to determine this?

Second - from your description, I'm probably barking up the wrong tree anyway. That it's not a functional clk speed that's the issue - rather the multiple levels of interconnect that are causing the speed issue.

I guess I find it a little surprising that there aren't a few gpio or a bank that are/were capable of high speed i/o - basically could go close to code speed - ie as fast as code could be made to toggle them. But I guess that wasn't a concern for the part since it was originally targeted for the mobile market vs embedded.

What I am trying to do is configure a Spartan 6 fpga via its jtag interface. Basically using their supplied code from xapp058. This takes a xsvf config file and processes it and spits it out over jtag. The jtag interface consists of TDI, TDO, TCLK, TMS. TDI = data into fpga, TDO = data out of fpga, TCLK = clk into fpga, TMS = mode select. Basically all I had to do was point it to the file and the proper gpio pins for each signal and off it goes.

Right now it takes approx. 4 seconds to process this file/bit stream. The Xilinx programming cable can do it in <1 second. So obviously I'd like to reduce the configuration time if possible.

The gpio that I have available to do the configuration are: 130-132,134-139,12 - this encompasses most of MMC2/McSPI3. Using anything else would require a spin of the processor board.

Currently I'm using Bank 5 bits 8-11, gpio: 136-139 -> TMS, TCK, TDI, TDO

The FPGA requires 3,713,440 bits for configuration. The xsvf file is 467,992 bytes.
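A back-of-the-envelope estimate shows why bit-banging this many bits lands in the multi-second range. The sketch below assumes a fixed cost per register access (the ~230 ns measured earlier in this thread) and a guessed number of accesses per JTAG bit; both the function name and the accesses-per-bit figures are assumptions, not measurements.

```c
#include <assert.h>

/* Estimated time in seconds to bit-bang `bits` bits when every bit costs
 * accesses_per_bit register accesses of ns_per_access nanoseconds each. */
static double bitbang_seconds(unsigned long bits, unsigned accesses_per_bit,
                              double ns_per_access)
{
    assert(ns_per_access > 0.0);
    return (double)bits * accesses_per_bit * ns_per_access * 1e-9;
}
```

With three accesses per bit (set TDI, TCK high, TCK low) this gives about 2.6 s for 3,713,440 bits; with a fourth access to read TDO it gives about 3.4 s, which is in the same range as the observed 4 seconds.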

I'm definitely open to using a McBSP or McSPI or DMA for that matter. I can honestly say I've not thought it would be possible to use a bsp or spi or any other serial protocol engine to do this configuration. But the idea is definitely interesting.

Either of these approaches are probably over my programming head at the moment - but it's always good to learn and take on a new challenge.

Thanks again for the help

Matt

0 Brad Griffis over 15 years ago in reply to Matt Pope

TI__Guru*** 125430 points

MPope said:

First, I haven't seen any clock sources that are feeding these plls in the 0.75-21MHz range.

Where and how is one suppose to be able to determine this?

This will be determined by the clock that you decide to install on your board. For example it could be 12, 13, 19.2, 24, 26 MHz, etc.

MPope said:
The gpio that I have available to do the configuration are: 130-132,134-139,12 - this encompasses most of MMC2/McSPI3. Using anything else would require a spin of the processor board.

I don't know exactly what the JTAG waveforms look like so I'm not sure if this can work or not. Do you have a good idea of when TMS transitions? If the TMS transitions occur at some well defined boundary (after n bits) then I think you might be able to make it work with the SPI.

Could you make the following hookup

mcspi3_clk (gpio_130) --> TCK
mcspi3_simo (gpio_131) ---> TDI
mcspi3_somi (gpio_132) ---> TDO
gpio_133 ---> TMS

You would then configure the McSPI clock speed to be equivalent to your desired TCK speed. Then you would write your data to the SPI transmit register. It would shift that data out on simo while simultaneously shifting data in with somi. You would then read the receive register to see what was shifted in.

The big unknown here is TMS. I don't see any way to automate control of TMS through the SPI peripheral so you would need to "manually" toggle it as a straight GPIO. If these transitions occur "between words" then this would be easy. The SPI word length is configurable to be anything from 4 bits to 32 bits. So you may be able to use that to your advantage to perhaps match the size of the IR or DR registers.
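The two configuration choices mentioned here, word length and clock speed, can be sketched as small helpers. Everything beyond the 4-to-32-bit word length range is an assumption taken from my reading of the McSPI documentation rather than from this thread: a 48 MHz reference clock, a power-of-two divider, and a word-length field encoded as length minus one. The names are hypothetical.

```c
#include <assert.h>

/* Assumed McSPI reference clock; check the TRM for your configuration. */
#define MCSPI_REF_HZ 48000000UL

/* Word-length field value for a word of `bits` bits (assumed encoding:
 * length - 1), or -1 if the length is outside the supported 4..32 range. */
static int mcspi_wl_field(int bits)
{
    return (bits >= 4 && bits <= 32) ? bits - 1 : -1;
}

/* Smallest divider exponent n such that MCSPI_REF_HZ / 2^n <= max_hz. */
static int mcspi_clkd_for(unsigned long max_hz)
{
    int n = 0;
    assert(max_hz > 0);
    while ((MCSPI_REF_HZ >> n) > max_hz)
        n++;
    return n;
}
```

Under these assumptions, a target device limited to 25 MHz would get a divider exponent of 1, i.e. a 24 MHz serial clock.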

Just to set your expectations on TI support, we can certainly help you in setting up the SPI interface. However, if you have detailed questions related to the JTAG state machine for programming this FPGA we will not be able to answer those questions! So I hope the info I've given you so far is helpful, but I don't think I can get any more specific as I've reached the end of my JTAG knowledge! :)

Best regards,
Brad

0 Matt Pope over 15 years ago in reply to Brad Griffis

Intellectual 425 points

So, on the clocks, the sysclk = 26 MHz. This is clearly not inside the band specified in this register setting.

Is there somewhere this gets divided to meet this range? I'm not sure I've seen this anywhere?

Ultimately though, what effect are any of these clks going to have on the gpio toggle speed. From the sounds of it, little or none - I'm guessing.

With some cuts and jumps on the fpga board I should be able to make this happen - move the spi pins to the jtag pins on the fpga.

Some advice on setting up the SPI would be nice. I know I'm on my own on the JTAG-to-FPGA protocol. But Xilinx goes through this fairly well, so I should be able to handle that.

I will have to do some research on the TMS signal and its usage in the configuration process.

I'll post back when I have an understanding of the details - it will most likely be tomorrow or Monday before I can get to and through it.

Thanks again for the insight.

Matt

0 Brad Griffis over 15 years ago in reply to Matt Pope

TI__Guru*** 125430 points

I recommend researching TMS before you hack up your board or go too far down the path of programming SPI. Otherwise you may do a lot of work for nothing!

0 Matt Pope over 15 years ago in reply to Brad Griffis

Intellectual 425 points

Thanks for the ref on the clk stuff. I'll have a look and see if it helps make sense of things.

It's not looking too good for the SPI use unless it can handle single-bit operations.

I found this nice description, while I was eating lunch, on the jtag sequence.

There are a bunch of different length transfers as well as some that require tms to transition at the end.

Table 10-4: Single Device Configuration Sequence

TAP Controller Step and Description | TDI (set and hold) | TMS (set and hold) | # of Clocks (TCK)

1. On power-up, place a logic 1 on the TMS, and clock the TCK five times. This ensures starting in the TLR (Test-Logic-Reset) state. | X | 1 | 5

2. Move into the RTI state. | X | 0 | 1

3. Move into the SELECT-IR state. | X | 1 | 2

4. Enter the SHIFT-IR state. | X | 0 | 2

5. Start loading the CFG_IN instruction, LSB first. | 000101 | 0 | 5

6. Load the MSB of CFG_IN instruction when exiting SHIFT-IR, as defined in the IEEE standard. | 0 | 1 | 1

7. Enter the SELECT-DR state. | X | 1 | 2

8. Enter the SHIFT-DR state. | X | 0 | 2

9. Shift in the Spartan-6 FPGA bitstream. Bitn (MSB) is the first bit in the bitstream(1). | bit1 ... bitn | 0 | (bits in bitstream)-1

10. Shift in the last bit of the bitstream. Bit0 (LSB) shifts on the transition to EXIT1-DR. | bit0 | 1 | 1

11. Enter UPDATE-DR state. | X | 1 | 1

12. Move into RTI state. | X | 1 | 1

13. Enter the SELECT-IR state. | X | 1 | 2

14. Move to the SHIFT-IR state. | X | 0 | 2

15. Start loading the JSTART instruction. The JSTART instruction initializes the startup sequence. | 001100 | 0 | 5

16. Load the last bit of the JSTART instruction. | 0 | 1 | 1

17. Move to the UPDATE-IR state. | X | 1 | 1

18. Move to the RTI state and clock the startup sequence by applying a minimum of 16 clock cycles to the TCK. | X | 0 | 16

19. Move to the TLR state. The device is now functional. | X | 1 | 3

Sorry some of the cols don't stay aligned when posted - hopefully you can still make sense of it.
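To see how the TMS values in steps 1 to 8 walk the TAP controller into SHIFT-DR, the standard IEEE 1149.1 state machine can be modelled in a few lines. This is a host-side sketch that tracks only TMS and the resulting state; it does not shift TDI data or touch any pins, and the function names are hypothetical.

```c
#include <assert.h>

typedef enum {
    TLR, RTI, SELECT_DR, CAPTURE_DR, SHIFT_DR, EXIT1_DR, PAUSE_DR, EXIT2_DR,
    UPDATE_DR, SELECT_IR, CAPTURE_IR, SHIFT_IR, EXIT1_IR, PAUSE_IR, EXIT2_IR,
    UPDATE_IR
} tap_state;

/* next_state[s][tms]: TAP controller transition on one TCK rising edge. */
static const tap_state next_state[16][2] = {
    [TLR]        = { RTI,        TLR       },
    [RTI]        = { RTI,        SELECT_DR },
    [SELECT_DR]  = { CAPTURE_DR, SELECT_IR },
    [CAPTURE_DR] = { SHIFT_DR,   EXIT1_DR  },
    [SHIFT_DR]   = { SHIFT_DR,   EXIT1_DR  },
    [EXIT1_DR]   = { PAUSE_DR,   UPDATE_DR },
    [PAUSE_DR]   = { PAUSE_DR,   EXIT2_DR  },
    [EXIT2_DR]   = { SHIFT_DR,   UPDATE_DR },
    [UPDATE_DR]  = { RTI,        SELECT_DR },
    [SELECT_IR]  = { CAPTURE_IR, TLR       },
    [CAPTURE_IR] = { SHIFT_IR,   EXIT1_IR  },
    [SHIFT_IR]   = { SHIFT_IR,   EXIT1_IR  },
    [EXIT1_IR]   = { PAUSE_IR,   UPDATE_IR },
    [PAUSE_IR]   = { PAUSE_IR,   EXIT2_IR  },
    [EXIT2_IR]   = { SHIFT_IR,   UPDATE_IR },
    [UPDATE_IR]  = { RTI,        SELECT_DR },
};

/* Apply `clocks` TCK cycles with TMS held at `tms`; return the new state. */
static tap_state tap_clock(tap_state s, int tms, int clocks)
{
    assert(tms == 0 || tms == 1);
    while (clocks-- > 0)
        s = next_state[s][tms];
    return s;
}

/* Walk steps 1-8 of the table, starting from any state. */
static tap_state tap_goto_shift_dr(tap_state s)
{
    s = tap_clock(s, 1, 5);  /* step 1: reach TLR from any state   */
    s = tap_clock(s, 0, 1);  /* step 2: RTI                        */
    s = tap_clock(s, 1, 2);  /* step 3: SELECT-DR, then SELECT-IR  */
    s = tap_clock(s, 0, 2);  /* step 4: CAPTURE-IR, then SHIFT-IR  */
    s = tap_clock(s, 0, 5);  /* step 5: first five instruction bits */
    s = tap_clock(s, 1, 1);  /* step 6: last bit, exit to EXIT1-IR */
    s = tap_clock(s, 1, 2);  /* step 7: UPDATE-IR, then SELECT-DR  */
    s = tap_clock(s, 0, 2);  /* step 8: CAPTURE-DR, then SHIFT-DR  */
    return s;
}
```

A model like this is handy for checking a TMS sequence on the desk before driving real pins: once in SHIFT-DR, the controller stays there for as long as TMS is held at 0.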

Thx,

Matt

0 Brad Griffis over 15 years ago in reply to Matt Pope

TI__Guru*** 125430 points

So is Step 9 where you send in the whole file (or the bulk of it)? If so, that will be the most time consuming part of the operation (i.e. most data being sent) and it appears that TMS=0 for that entire part of the sequence. So what you could do is bit bang steps 1-8 and then use the super-duper SPI mode for step 9 and then bit bang the rest. What's the maximum TCK speed the FPGA can support? Basically I envision that your GPIO will be kind of slow for steps 1-8 but then you'll be able to blast in the data in step 9.
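The split between the fast and slow parts of the data shift can be worked out ahead of time. The sketch below assumes the bulk is sent in fixed-size SPI words with TMS held at 0, and that whatever does not fill a whole word, plus the final bit that must be clocked with TMS at 1, is left to GPIO. The type and function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    unsigned long spi_words;  /* full words shifted by SPI with TMS = 0       */
    unsigned tail_bits;       /* bits left for GPIO, including the last bit   */
} shift_plan;

/* Plan a shift of total_bits bits using SPI words of word_bits bits each. */
static shift_plan plan_shift(unsigned long total_bits, unsigned word_bits)
{
    shift_plan p;
    assert(total_bits >= 1 && word_bits >= 1);
    /* The last bit is clocked with TMS = 1, so it can never be part of an
     * SPI word; only the first total_bits - 1 bits are candidates. */
    p.spi_words = (total_bits - 1) / word_bits;
    p.tail_bits = (unsigned)(total_bits - p.spi_words * word_bits);
    return p;
}
```

For the 3,713,440-bit figure quoted in this thread and 32-bit words, this gives 116,044 SPI words and 32 bits left for GPIO, so well over 99.99% of the bits would go out at the SPI rate.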

0 Matt Pope over 15 years ago in reply to Brad Griffis

Intellectual 425 points

Yes, step 9 is the bulk of the transfer.

Can the spi pins be manually done/toggled while in spi mode?

Or, will the pins mode have to be changed to gpio to do the bit twiddle?

Is it possible to change the mode and not affect the pins?

I think 99+% fast and <1% slow is just fine.

Max jtck is specified at 25Mhz.

0 Brad Griffis over 15 years ago in reply to Matt Pope

TI__Guru*** 125430 points

You would change the pin muxing at run time to switch between GPIO mode and SPI mode. Sounds like it should create a HUGE optimization!

0 Matt Pope over 15 years ago in reply to Brad Griffis

Intellectual 425 points

I thought that might be the case.

I noticed that the word length in the spi cfg reg has reserved areas for 1,2,3 bit xfer lengths.

Any chance of those working? Especially the 1 bit? That would then make this a piece of cake.

But since that would be the case, I'm sure there is no chance of that working.

Any idea of what would happen if any of those three modes are selected?

0 Brad Griffis over 15 years ago in reply to Matt Pope

TI__Guru*** 125430 points

Using a 1-bit mode would not speed anything up because then you would need to poll after every single bit to see if the word/bit had completed. It would actually become slower!

I don't understand why you say there is no chance of it working. Just use slow GPIO for everything but step 9. For step 9 you're just blasting in a huge bitstream, right? If that's the case then I would anticipate using the largest word size, 32-bit.

0 Matt Pope over 15 years ago in reply to Brad Griffis

Intellectual 425 points

No, it probably wouldn't speed it up. Even if it were slower, that would probably be better than going from GPIO to SPI and back to GPIO and hoping there were no glitches on the I/O pins that would cause some problem.

The part I was referring to as no chance of working was the 1, 2, or 3 WL bit transfer modes. I think either way (SPI only or GPIO/SPI/GPIO) has possibility and should be a vast improvement. I would probably prefer the SPI-only mode just because the flow would be easier for me to program and it seems like it would have the best chance of working, in my opinion. I'm not a software guru by any stretch of the imagination.

So if the 1-3 bit modes work, then each step just has its length set and go - could be a simple function for all of the xfers but the large 1.3Mb xfer. Step 9 uses 32bits to the last 32bit word and then would be split into 1-3 bytes and a 1 bit xfer.

I think that would work well if it can be done with the 1,2 & 3 bit xfers.

There are 7 1-bit xfers, 6 2-bit xfers and 1 3-bit xfer. Everything else is longer than 4 bits: 3 5-bit xfers and 1 16-bit xfer.

So, I guess it's just up to if the 1 to 3-bit xfers work or not to determine the path to follow. Any thoughts on the 1,2, or 3 WL transfers?

0 Brad Griffis over 15 years ago in reply to Matt Pope

TI__Guru*** 125430 points

MPope said:
So, I guess it's just up to if the 1 to 3-bit xfers work or not to determine the path to follow. Any thoughts on the 1,2, or 3 WL transfers?

I'd just use GPIO. You just need to write to the PADCONF register to switch between GPIO mode and SPI mode. No big deal.
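Switching a pad between functions comes down to rewriting its mux mode field. The sketch below assumes the usual OMAP3 layout, where each 32-bit PADCONF register carries two 16-bit pad fields with the mux mode in the low three bits of each; which mode number selects GPIO or SPI depends on the pad and has to be looked up in the TRM. The function name is hypothetical, and the code only computes the new register value.

```c
#include <assert.h>
#include <stdint.h>

/* Return reg with the 3-bit mux mode of one pad replaced.
 * upper_half selects the pad stored in bits 31:16 instead of bits 15:0.
 * Assumed layout: mux mode in bits 2:0 of each 16-bit half. */
static uint32_t padconf_set_mode(uint32_t reg, int upper_half, uint32_t mode)
{
    unsigned shift = upper_half ? 16u : 0u;
    assert(mode <= 7u);
    reg &= ~(0x7u << shift);  /* clear the old mode           */
    reg |= mode << shift;     /* insert the new mode          */
    return reg;               /* other pad bits are preserved */
}
```

On hardware this would be a read of the PADCONF register, a call to the helper, and a write back, done once before and once after the bulk transfer.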

0 Mohit Hada over 14 years ago in reply to Nick Nickls

Intellectual 745 points

hi nick,

Sorry to catch on to this thread so late, but can you tell me whether, using DMA, I can read a GPIO port of a BeagleBoard / OMAP at such a high frequency, i.e. around 20 MHz as per your findings? It would be really helpful.

regards

mohit

0 Mohit Hada over 14 years ago in reply to Mohit Hada

Intellectual 745 points

I want to read a 12-bit parallel ADC interface on a DM3730 / OMAP 3530 / BeagleBoard.

0 Nick Nickls over 14 years ago in reply to Mohit Hada

Prodigy 80 points

It's too fast for the OMAP. Only short burst reads are possible at 20 MHz. I have the same problem and decided to use a TMS320F28335 with external memory to collect the ADC data, and then transfer it via SPI to the OMAP for complex processing.

0 Mohit Hada over 14 years ago in reply to Nick Nickls

Intellectual 745 points

Hi, could you please give some details or a reference on how you are doing it?

thanks

mohit

0 Brian Viele over 14 years ago in reply to Mohit Hada

Prodigy 20 points

I haven't been able to achieve such performance on my OMAP3530. My access time seems to be limited to about 230 ns, as someone else described. I'm getting really frustrated with this, as my system is spending enormous amounts of time talking to peripherals rather than actually processing. If I could get the full 83 MHz L4 speed, or even 20+ MHz, I would be okay with it, but as it is, I'm only getting 4 MHz through the interface.

Even with DMA enabled on SPI, the channel takes 300+ ns to load the next data word into the TX register, making the bus horribly inefficient.

I have dug through U-boot trying to ensure I have optimal settings for the L3 and L4 bus clocks, and they are as follows

CORE -> L3 -> L4 = 332MHz -> 166MHz -> 83MHz

I wrote a little program that loops through a bunch of peripherals (SPI, UART, DMA, etc.) and uses a timer to profile access time, and I get an average access time of 223 ns to each device.

What am I missing? Is there something special that needs to be configured inside of the L3/L4 controllers to get better performance? Anyone have any thoughts on this?

Brian

0 Nick Nickls over 14 years ago in reply to Brian Viele

Prodigy 80 points

You are right! With the IVA2 I had even slower performance. It seems the OMAP is not suitable for fast peripheral access because of its architecture. Look, for example, at http://focus.ti.com/docs/toolsw/folders/print/tmdsdock28335.html and http://focus.ti.com/docs/toolsw/folders/print/tmdscncd28335.html, which may help you. These processors can access peripheral devices without unpredictable delays and store data in external memory. You can then transfer the data to the OMAP over SPI or use dual-access memory devices.

0 Nilavarasan Periasamy over 14 years ago in reply to Brad Griffis

Prodigy 20 points

I have been searching for information on configuring GPIO on the BeagleBoard-xM as output and input. All the information that I came across seems confusing; it mostly starts from PADCONF and doesn't clearly describe the GPIO pin configuration that follows. Can you point me to a simple set of configuration steps to get GPIO working as input and output by accessing the registers directly? It need not be for the BeagleBoard; any equivalent that helps me understand how to configure the GPIOs would do.
