Fast GPIO bit polling in C++

Other Parts Discussed in Thread: CC3200

Dear all,


I am using a Tiva C Series LaunchPad (TM4C123G) with CCS C. I am trying to do a fast GPIO port read when a pin on another port (PORTF_1) is toggled.

Polling a bit with

while(HWREG(GPIO_PORTF_BASE + GPIO_O_DATA + 0x08)!= 0x02)
{
;
}

is quite slow, and to make things worse it introduces a lot of jitter (which amounts to much the same thing, I guess). I get 150 ns of jitter with the MCU running at 80 MHz, which is 12 clocks?
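As a quick check of the figure above: at 80 MHz one clock lasts 12.5 ns, so 150 ns does correspond to 12 clocks. A minimal sketch of that conversion follows; `ns_to_cycles` is a hypothetical helper name, not part of TivaWare.

```c
#include <stdint.h>

/* Whole CPU cycles that fit into `ns` nanoseconds at `cpu_hz`, rounded down.
 * The 64-bit intermediate avoids overflow of ns * cpu_hz. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)ns * cpu_hz) / 1000000000u);
}
```

The same helper gives 8 clocks for a 100 ns period, which illustrates how few instructions fit between samples once the sample rate approaches 10 MHz.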

If I try to poll with GPIO_PORTF_AHB_BASE instead, the program lands in the FaultISR:

//*****************************************************************************
//
// This is the code that gets called when the processor receives a fault
// interrupt.  This simply enters an infinite loop, preserving the system state
// for examination by a debugger.
//
//*****************************************************************************
static void
FaultISR(void)

If anyone knows a faster bit polling it would be appreciated.

Best regards

Primoz

  • Hello Primoz

    This is TM4C123, I suspect? If so, set the Port F bit in the SYSCTL.GPIOHBCTL register to activate the AHB bus aperture. If the bit is clear, the port uses the APB address space and not the AHB address space.

    Regards
    Amit
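A minimal sketch of the step described above, under stated assumptions: the register address, the Port F bit position and the AHB base address are copied from the TM4C123 register map as I recall it, not from this thread, and `gpio_select_ahb` is a hypothetical helper. The register is passed as a pointer so the logic can also be tried on a host.

```c
#include <stdint.h>

/* Assumed values from the TM4C123 register map (verify against the datasheet). */
#define SYSCTL_GPIOHBCTL_ADDR 0x400FE06Cu  /* GPIO high-performance bus control */
#define GPIOHBCTL_PORTF_BIT   (1u << 5)    /* ports A..F map to bits 0..5 */
#define PORTF_AHB_BASE        0x4005D000u  /* AHB aperture of Port F */

/* Masked data address for PF1: address bits [9:2] carry the pin mask,
 * which is where the "+ 0x08" in the polling loop comes from. */
#define PF1_AHB_MASKED_ADDR   (PORTF_AHB_BASE + (0x02u << 2))

/* Route a port to the AHB aperture by setting its GPIOHBCTL bit.
 * On target, `gpiohbctl` would point at SYSCTL_GPIOHBCTL_ADDR. */
static void gpio_select_ahb(volatile uint32_t *gpiohbctl, uint32_t port_bit)
{
    *gpiohbctl |= port_bit;  /* read-modify-write keeps other ports unchanged */
}
```

On the target this would be called once, before the first access through the AHB base address, for example `gpio_select_ahb((volatile uint32_t *)SYSCTL_GPIOHBCTL_ADDR, GPIOHBCTL_PORTF_BIT);`. TivaWare's driverlib also appears to provide `SysCtlGPIOAHBEnable()` for the same purpose.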
  • Hi Amit,

    thank you for your post. Setting SYSCTL.GPIOHBCTL did indeed help with my while statement (gaining ~25 nanoseconds), and it also sped up other port reads and writes. But since I have your attention, I would very much like to ask you some more questions.

    In my program I am trying to read the parallel port from the ADC. With your help I can read more than 4 MS/s, but only with compiler speed optimization turned on, and that really disturbs the flow (everything in my loop gets messed up). Without optimization I can get "only" up to 2 MS/s, so I would like to write an assembler function to speed it up a little. So my questions:

    Is there a user manual/reference for TI TM4C123 C for non ROM functions (like Tiva C-series TM4C123x ROM user's guide) ?

    Is there something like this for assembler or can I use cortex M4 instructions from ?


    Is there a good example project for use of an assembler function within CCS C project that you know about?


    Best regards

    Primoz

  • Hi Primoz,

    By ROM function I guess you mean Tivaware functions?

    I guess you want a guide to direct register programming? Well, the best source for that is the datasheet. As you may have noticed, there are initialization examples just before the register maps. Speaking of register maps, you should read them if you want to do direct register programming.

    If you want to program in assembly, you need to consult the ARM Infocenter, but you also need the TM4C datasheet for the peripheral register addresses.

    How many bits does your external ADC have? I would use the DMA for the purpose you want. You could use a timer to trigger the DMA transfer. It's possible to get and load data into the GPIO with the DMA
  • Hello Primoz,

    Please describe the system, communication handshake, number of bits to be sent, etc. You can use assembly, but to maximize the transfer rate it is important that the ADC read function is not interrupted while it executes.

    Regards
    Amit

  • Hi Amit and Luis,

    thank you for new information.
    My system is a 10 MHz ADC. Conversion starts on clock ON, data ready approx. 70 ns after the start of the conversion, clock does not need to be 50:50. ADC is directly connected to TM4C123 (on launchPad ), PWM is used as a clock source (with fast comparator <5 ns needed for ADC). 11 bits are read - 8 MSBs on gpioB and 3 on gpioA (LSB is dropped for ease of board production). I poll pwm pin to know when the conversion starts.
    I should point out that no on the fly analysis is needed - data is simply saved to memory after every conversion. For simplicity I simply turn off the interrupts. If I understand DMA correctly it is only used to offload work from the main core, but can not really increase the speed in my case?
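To make the bit layout above concrete, here is a hedged sketch of merging the two port reads into one 11-bit sample. The thread does not say which Port A pins carry the three low bits, so PA2..PA4 is purely an assumption, and `combine_sample` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical wiring: the three low bits are assumed to sit on PA2..PA4. */
#define PA_SHIFT 2u
#define PA_MASK  0x07u

/* Merge raw port reads into one 11-bit sample: Port B holds the 8 MSBs,
 * three Port A pins hold the remaining low bits. */
static uint16_t combine_sample(uint32_t portb, uint32_t porta)
{
    uint16_t high = (uint16_t)(portb & 0xFFu);
    uint16_t low  = (uint16_t)((porta >> PA_SHIFT) & PA_MASK);
    return (uint16_t)((high << 3) | low);
}
```

Since no on-the-fly analysis is needed, the raw port values could just as well be stored during acquisition and combined afterwards, which keeps the time-critical loop shorter.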

    As mentioned above, non-optimized C works at speeds up to 2 MHz. I toggle pin 5 on port C to follow the flow on the oscilloscope (25 ns could be gained if this is dropped, but then I would lose any indication of what is going on). With the optimizing compiler it is just very hard to know where the data reading takes place (analyzing generated assembly and the signal from the oscilloscope...). And since the "while" statement takes >100 ns for a single check, this is also my jitter, which makes it hard to catch the data-ready window (from 70 ns after clock ON until the next clock ON).
    Using assembler I would hopefully decrease the jitter - improving the acquisition rate.

    To think again maybe a solution would be to use a deterministic way and simply clock the ADC from the loop?
    Assembler would probably help again?
    Will 'play' some more - could be fun.
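The "clock the ADC from the loop" idea could look roughly like the sketch below. It is only an illustration under assumptions: `acquire_self_clocked` is a hypothetical name, a single data port is read for brevity, and the registers are passed as pointers (masked GPIO data addresses on target, plain variables on a host). The read is placed just before the rising edge so that it picks up the result of an earlier conversion; the actual timing would still have to be checked on the oscilloscope.

```c
#include <stddef.h>
#include <stdint.h>

/* Software-clocked acquisition: the loop itself drives the ADC clock pin,
 * so nothing is polled and every iteration follows the same instruction path. */
static void acquire_self_clocked(volatile uint32_t *clk_pin,
                                 uint32_t clk_mask,
                                 const volatile uint32_t *data_port,
                                 uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        buf[i] = *data_port;   /* sample the result of an earlier conversion */
        *clk_pin = clk_mask;   /* rising edge starts the next conversion */
        *clk_pin = 0u;         /* falling edge; duty cycle need not be 50:50 */
    }
}
```

The sample rate is then set by the loop length alone, so padding instructions would be needed to hit a specific rate.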

    Best regards
    Primoz
  • Hello Primoz,

    Correct my understanding

    The clock is 10 MHz (100 ns period). After each rising edge of the clock, the data arrives 70 ns later. Is that right?

    Regards
    Amit
  • Hello Amit,

    the maximum clock for the part is 10 MHz (LTC1420), and ADC is only clocked on the rising edge. Valid data is approx. 70 ns (and two full cycles) after the rising edge. 10 MHz is probably not really reachable with my current hardware - pwm signal turns very much 'sinusoidal', but true 5 MHz would be quite nice for what I need.
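One consequence of the two-cycle latency described above: assuming one read per clock, the first two reads of a capture belong to conversions that started before the capture did. A small sketch of dropping them afterwards follows; `align_samples` and `ADC_PIPELINE_LATENCY` are hypothetical names, and the latency value is taken from the description in this post rather than from the ADC datasheet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Latency, in ADC clock cycles, between a conversion start and its data. */
#define ADC_PIPELINE_LATENCY 2u

/* Drop the first ADC_PIPELINE_LATENCY reads in place, so that buf[i] holds
 * the result of the i-th conversion. Returns the number of valid samples. */
static size_t align_samples(uint16_t *buf, size_t n)
{
    if (n <= ADC_PIPELINE_LATENCY) {
        return 0;
    }
    size_t valid = n - ADC_PIPELINE_LATENCY;
    memmove(buf, buf + ADC_PIPELINE_LATENCY, valid * sizeof buf[0]);
    return valid;
}
```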

    Problem I have at the moment is how to synchronize PWM and my loop - if my loop takes exactly as many steps as the PWM cycle it stays in phase, but at the start phase is not well defined - duration of the 'while' statement (single test). I am gonna try with the PWM reset just before the loop, but for optimization - determinism and speed assembler is probably preferable.

    Regards

    Primoz

  • Well, I think the best would be,

    Have another PWM wave, 70 ns out of phase with the one you use to clock the ADC. This new PWM will indicate when the valid data is ready.

    Now you either use an interrupt for that, which would require a lot of code optimization due to the speed of the clock,
    or
    you use a DMA trigger that would right away get data from the GPIO you want. I think it's within the next cycle after the PWM triggers, can you confirm?
    You would need 2 DMA channels most likely since you use GPIOA and GPIOB; you can simply connect the PWM to 2 GPIO pins or timers (if I am not mistaken, I don't quite remember the DMA for the TM4C123).
    The idea of the DMA is that it moves data from one point to another without intervention of the MCU. You program it and it does the task really fast! And yes, of course, it offloads the task from the CPU.
    You are pushing software limits a bit; you need to take advantage of your PWM.
  • Hello Luis,

    DMA will not start after the edge as it has to get the transfer descriptors and then begin the operation of actual data transfer. And this assumes that the CPU is not accessing the SRAM.
    I would actually use assembly due to the timing requirement and keep the code in a packed loop.

    Regards
    Amit
  • Thanks for the info Amit. A question about DMA performance on the Tiva and the CC3200 was actually asked in the 43oh forum. It seems it isn't available in the Tiva docs.
    I guess this one is impossible to run from the assembly
  • Hello Luis,

    "I guess this one is impossible to run from the assembly", why?

    Regards
    Amit
  • It seems you and the poster suggest that assembly is the best choice for this application.
  • Luis Afonso said:
    need 2 DMA channels most likely since you use GPIOA and GPIOB

    Luis - good effort in trying to assist here - but poster (in 1st post) twice mentioned GPIO_F & Amit's following post targeted GPIO_F - as well.  Unclear where your identification of 2 GPIO ports (above, your quote) originated.

    As to poster's issue - if moving to ASM programming would (or could) significantly enhance GPIO (or other key/obvious) MCU needs - would not such a (limited) "ASM Code Library" - dedicated to TM4C123 and/or TM4C129 - make sense?  

    Should ASM programming "not" yield the desired performance increase - nor justify the time/effort/expense - perhaps the (obvious) move to a higher performance MCU or addition of a (simple) programmable logic device (to perform the hi-speed banging) could (productively) join this fray...   (or - one could choose an ADC which better matches the MCU's "normal/standard" capabilities...)

  • Primoz Kusar said:
    Hi Amit and Luis,

    thank you for new information.
    My system is a 10 MHz ADC. Conversion starts on clock ON, data ready approx. 70 ns after the start of the conversion, clock does not need to be 50:50. ADC is directly connected to TM4C123 (on launchPad ), PWM is used as a clock source (with fast comparator <5 ns needed for ADC). 11 bits are read - 8 MSBs on gpioB and 3 on gpioA (LSB is dropped for ease of board production). I poll pwm pin to know when the conversion starts.
    I should point out that no on the fly analysis is needed - data is simply saved to memory after every conversion. For simplicity I simply turn off the interrupts. If I understand DMA correctly it is only used to offload work from the main core, but can not really increase the speed in my case?

    I got that from here  on the 5th post :x

  • OMG - hereby surrender cb1's "eagle-eye" status to Luis!

    Poster and Amit both focused upon "GPIO_F" - and while I did read that (quoted) writing - GPIO's "A & B" did not stick.

    Eagle Eye status earns very special benefits from heralded vendor...
  • Hello cb1,

    Our Port F was derived from the first thread when PORTF is used for a poll routine. Poster and Luis did clarify the use of GPIO's A and B.

    Using another MCU could be a solution considering the 8-bit limitation on the TM4C123 device. However extracting the additional ounce of processing via ASM (though requires detailed attention) may help the user as well w/o re-investing effort and money into another platform.

    Regards
    Amit
  • Ciao Amit,

    Amit Ashara said:
    extracting the additional ounce of processing via ASM (though requires detailed attention) may help the user as well w/o re-investing effort and money into another platform.

    Indeed it may - or it may not!   We won't know until the full investment of ASM time/effort/funds have been paid.

    Point earlier made - would not a small, focused sampling of ASM code routines - lasered upon such, "high usage areas" (such as poster's) make sense?   Along with the code samples - the comparative execution time & resource burden of "ASM vs. C" could be presented.   While not all "use cases" could be anticipated & resolved - such represents a solid beginning - and may very well save many MCU users, "lost time/effort/funds!"

  • Hello cb1

    Indeed. We have been fighting the same with SPI of newer ADC/DAC parts, which have very high bandwidth requirements and rely on a more interrupt-driven mechanism to move out data. We have moved out of the comfort of TivaWare to HWREG. It may now be time to move to ASM (and a detailed appnote would be the way to go).

    However as of now (w/o losing focus on the post), we may have to help the user build the code to do the same.

    Hello Primoz,

    Lacking an ASM library of functions, we need to collaborate on the forum!

    Regards
    Amit
  • Appears that we've agreed upon the (possible) need for a beginning library of ASM and/or other MCU performance enhancement methods.

    To remain "fair & balanced" should we not note that the "Pro IDEs" often remark, "Ours are not your father's IDEs!"   Claim is that they're so good - that hand coded routines - in ASM - buy/gain you little!  

    Devil in the detail - suspect this often depends upon the depth, complexity & scope of task.

    Recall that "time to market" is often "everything" and C exists (primarily) due to the complexity & uncertainty which (too often) accompany ASM.

  • Hello cb1,

    Indeed, the ASM code has to be written with the syntax requirements of each supported toolchain in mind.

    Regards
    Amit
  • Dear Amit and all,

    I managed to write an assembler function for data acquisition. Using assembler turns out to be reasonably easy in the end (but it takes some time to get there). Currently I am running ADC conversion @ 4 MHz.

    But as it turns out, assembler does not necessarily solve the jitter problem. Since we are talking about an ARM core with a 3-stage pipeline, execution is not necessarily deterministic any more. Surprisingly, with the C compiler I managed to get the same period for every repetition of the while statement; it was just that the different steps within the while loop were either completely out of order with optimisation on, or quite slow otherwise. With the assembler function I got jitter!? I don't have time to solve this, so 4 MHz is what I am using (my current hardware connections /ADC-Tiva board/ probably do not allow for more anyway).
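For anyone who wants to put a number on this kind of jitter without an oscilloscope, one option is to record a free-running cycle counter at the same point of each loop iteration and evaluate the spread afterwards. The sketch below only covers the evaluation; where the timestamps come from is an assumption (the Cortex-M4 DWT cycle counter would be one candidate, and reading it inside the loop costs cycles itself), and `period_jitter_cycles` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Peak-to-peak jitter, in CPU cycles, of the intervals between consecutive
 * timestamps. Unsigned subtraction keeps each interval valid across a single
 * 32-bit counter wrap. Returns 0 when fewer than three timestamps are given. */
static uint32_t period_jitter_cycles(const uint32_t *stamps, size_t n)
{
    if (n < 3) {
        return 0;
    }
    uint32_t min_p = UINT32_MAX;
    uint32_t max_p = 0;
    for (size_t i = 1; i < n; i++) {
        uint32_t p = stamps[i] - stamps[i - 1];
        if (p < min_p) min_p = p;
        if (p > max_p) max_p = p;
    }
    return max_p - min_p;
}
```

A result of 0 would mean every iteration took the same number of cycles; at 80 MHz each cycle of spread corresponds to 12.5 ns.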

    It is not easy to find much on the pipelining since ARM people feel this is intellectual property they do not want to share, so if anyone has any knowledge or a link to it, it sure would be nice to read it.

    If anyone is interested in my current solution - response to this post should be delivered to my e-mail and I can go into the details.

    LP Primoz

  • Primoz Kusar said:
    Currently I am running ADC conversion @ 4 MHz.

    Really?     Does that not (by a multiple) exceed MCU's ADC spec?

    Most have to "do something" w/that ADC data - might that (further) reduce the "sustainable" rate of any/all such ADC measures?

    [edit: when post opened today it masked "all answers" - thus your use of an external ADC was hidden - and this (newest) post does not "restate" external ADC.]

    Our small firm long has advocated for such external ADC use - when higher speed & accuracy is required.

  • Hi cb1-,

    this thread is about external ADC read through the parallel communication. If I could somehow manage to get rid of the jitter (pipelining), I could probably go up to 4.4 MHz or maybe even slightly higher.

    Regarding the internal ADC - I believe they market it @ 1 MHz with up to 2 MHz with correct use of phase and two ADCs. But external is a bit faster and with ~100 MHz sample and hold...

    PS: somehow I managed to select the wrong link in an e-mail, so I will only receive the updates from the most current post.

  • Primoz Kusar said:
    this thread is about external ADC read through the parallel communication

    Indeed you are correct, Sir - yet kindly realize that thread's title provides (none) of that (external ADC use) directly!      And - your recent writing - describing your success @ 4MHz - did not restate the use of external ADC.     I'd note that 90%+ here adopt MCU's ADC - thus leading to my "misread" of your post.     (which I later noted/corrected)

    My firm/myself are strongly supportive of your efforts - in fact we've a current project in which we employ an "array" of 10MHz, SPI ADCs - in which we "common clock & common chip select" - up to 8 ADCs - simultaneously.     (our goal was to be able to "best sync" the conversions across multiple channels)     (the vendor of that family of ADCs was either unaware - or unwilling to confirm - if our "multi-device, common clocking" was at all new/novel.     (thus far - works for us...)

    For completeness - there now exist Cortex M4 MCUs which note 5MHz ADC conversion rates - that w/out "phasing."     Currently we are comparing that MCU vs. our earlier - external ADC usage - as just described.

    Good luck w/your project - always enjoyable to discover those willing to, "explore new ground."     (yet employing some boundary - some caution)