TMS320F28P650DK: Multi-GPIO Pulse Width Measurement Using Timer-Based Sampling

Part Number: TMS320F28P650DK

Hi All,

My requirement:
When a 36-microsecond ON-time pulse is applied using a function generator, the flag should increment. For ON-times other than 36 microseconds, the flag should not increment.

When I configure only 16 pins, it works correctly. In the Expressions window, I can see that valid_count[3] is increasing as expected. However, when I configure 32 GPIO pins, I observe different values in debug_us[3] in the Expressions window.

What is the issue here?
Is it not possible to configure 32 pins, or is there another way to capture the signal?

  • Hi Saravanan,

    Before diving in, a few details would help refine the recommendation further:

    • What is your timer ISR frequency / sampling interval? This directly determines how much CPU time is available per interrupt cycle.
    • What is your SYSCLK frequency? This affects minimum detectable pulse width and available cycle budget.
    • What tolerance is acceptable around 36 µs? (e.g., ±1 µs, ±2 µs)
    • Are the 32 GPIOs spread across multiple GPIO port registers, or concentrated in one? Reading from multiple ports adds overhead.
    • Is CPU2 available, or is it already allocated to other tasks?

    The Root Cause: CPU Overhead Scaling

    The hardware itself can handle 32 GPIO pins — the TMS320F28P650DK supports up to 98 GPIOs in the 169-ball NMR package (and more in larger packages) [2]. The issue is not a pin count limitation.

    The problem is ISR execution time. When you poll 16 pins inside a timer interrupt, the read-process-compare loop completes fast enough that your sampling accurately captures the 36 µs pulse edges. When you double to 32 pins, the ISR takes significantly longer to execute, which causes:

    • Sampling jitter — the time between when you read pin 0 vs. pin 31 within a single ISR is no longer negligible relative to 36 µs
    • Missed or shifted edges — leading to the incorrect debug_us[3] values you're observing
    • Potential ISR overrun — if the ISR doesn't complete before the next timer tick, measurements become unreliable

    Interrupt-driven edge capture is not a drop-in alternative either: the TMS320F28P650DK has only 5 external interrupts (XINT1–XINT5) per CPU, so covering 32 pins that way isn't feasible [1].
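
    The overrun risk listed above can be checked in software. The following is a minimal sketch, assuming an up-counting free-running cycle counter is sampled at each ISR entry. The names TICK_CYCLES, missed_ticks and checkTick() are hypothetical, and the 200-cycle period is a placeholder for (timer period × SYSCLK).

    ```c
    #include <stdint.h>

    #define TICK_CYCLES 200u   /* assumed: cycles per timer period (placeholder) */

    static uint32_t last_entry;     /* counter value at the previous ISR entry */
    static uint32_t missed_ticks;   /* running total of sample points skipped  */
    static uint16_t have_last;      /* 0 until the first entry is recorded     */

    /* Call at ISR entry with the current counter value.
     * Returns the number of ticks skipped since the previous entry. */
    static uint32_t checkTick(uint32_t now)
    {
        uint32_t skipped = 0;

        if (have_last)
        {
            /* Unsigned subtraction stays correct across counter wrap-around. */
            uint32_t delta = now - last_entry;

            /* Round to the nearest whole number of periods. */
            uint32_t periods = (delta + TICK_CYCLES / 2u) / TICK_CYCLES;
            if (periods > 1u)
            {
                skipped = periods - 1u;
            }
        }

        last_entry = now;
        have_last = 1;
        missed_ticks += skipped;
        return skipped;
    }
    ```

    If missed_ticks stays at zero with 16 pins and grows with 32 pins, that would support the overrun explanation.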


    Recommended Solutions

    Option 1: eCAP Signal Monitoring Unit (Best for a subset of critical pins)

    The F28P65x eCAP module includes a Signal Monitoring Unit that can measure pulse width in hardware and automatically flag out-of-range pulses [3]. You configure MUNIT_MIN and MUNIT_MAX registers to define your 36 µs acceptance window, and the hardware generates error events for non-matching pulses — no CPU polling required [3]. The eCAP uses a 32-bit timestamp counter and supports four capture event registers (CAP1–CAP4) [4].

    Limitation: The number of eCAP modules is finite, so this won't cover all 32 pins. Use it for your most timing-critical channels.
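
    As a rough illustration of the acceptance window, the sketch below converts a pulse width in microseconds to counter ticks and checks a measured width against the window. The 200 MHz counter clock and ±5 µs tolerance used in the note are placeholder values, and usToCounts() and widthInWindow() are hypothetical helper names. Programming MUNIT_MIN and MUNIT_MAX themselves is not shown; the exact register fields and units should be taken from the TRM [3].

    ```c
    #include <stdint.h>

    /* Convert a pulse width in microseconds to counter ticks. */
    static uint32_t usToCounts(uint32_t us, uint32_t clk_mhz)
    {
        return us * clk_mhz;
    }

    /* Returns 1 when a measured width lies inside [min, max], else 0. */
    static int widthInWindow(uint32_t width, uint32_t min_counts, uint32_t max_counts)
    {
        return (width >= min_counts) && (width <= max_counts);
    }
    ```

    With those placeholder values, usToCounts(31, 200) and usToCounts(41, 200) give the lower and upper limits of a 36 µs ± 5 µs window in counter ticks.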

    Option 2: CLB (Configurable Logic Block) Pulse-Width Filter (Best for hardware-based qualification)

    For pins that can't use eCAP, the CLB can implement custom pulse-width filtering entirely in hardware [5]. The approach:

    1. Use a CLB counter that resets when the GPIO input is low
    2. Set a counter match event equal to your 36 µs threshold
    3. Use a CLB state machine to qualify the pulse — output goes high only when the pulse width matches
    4. Route the qualified output via the CLB Output XBAR

    This eliminates CPU overhead per pin but requires more upfront configuration effort and introduces a small signal delay [5].
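
    The logic of steps 1 to 3 can be modelled in software to reason about its behavior before committing to a CLB design. This is a behavioral sketch only, not CLB configuration code; PulseQualifier and qualifierStep() are hypothetical names, and one call stands for one clock of the counter.

    ```c
    #include <stdint.h>

    typedef struct
    {
        uint32_t counter;    /* counts clocks while the input is high */
        uint32_t threshold;  /* match value, in clocks                */
        int      qualified;  /* 1 once the counter reached threshold  */
    } PulseQualifier;

    /* Advance the model by one clock with the current input level.
     * Returns the qualified output for this clock. */
    static int qualifierStep(PulseQualifier *q, int input_high)
    {
        if (!input_high)
        {
            q->counter = 0;        /* step 1: reset while the input is low */
            q->qualified = 0;
        }
        else
        {
            if (q->counter < q->threshold)
            {
                q->counter++;
            }
            if (q->counter >= q->threshold)
            {
                q->qualified = 1;  /* steps 2-3: match event sets the output */
            }
        }
        return q->qualified;
    }
    ```

    As modelled here, a single match value only rejects pulses shorter than the threshold. Rejecting pulses that are too long would need a second match value or additional state.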

    Option 3: Optimize Your Polling Approach

    If you want to keep the timer-based method:

    • Read GPIO port registers as 32-bit words instead of individual pin reads — GpioDataRegs.GPADAT.all captures 32 pins in a single read cycle
    • Increase your timer frequency to compensate for the wider sampling window
    • Split processing across CPU1 and CPU2 — each CPU has its own 5 XINTs, giving you 10 total interrupt-driven pins, with the remaining 22 polled [1]
    • Apply GPIO input qualification via GPxQSEL registers to filter noise, ensuring stable reads (qualification requires the pulse to be held for at least tw(IQSW) + tw(SP) + 1 SYSCLK cycle) [6]
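
    A minimal sketch of working on one 32-bit port snapshot instead of per-pin reads. On the target, the snapshot would come from a single port read such as GpioDataRegs.GPADAT.all; here it is passed in as a plain value so the logic can be tried on a host. pinIsHigh() and countHigh() are hypothetical helper names.

    ```c
    #include <stdint.h>

    /* Returns 1 when the given pin (0-31 within the port) is high in the snapshot. */
    static uint16_t pinIsHigh(uint32_t port_snapshot, uint16_t pin)
    {
        return (uint16_t)((port_snapshot >> pin) & 1UL);
    }

    /* Counts how many enabled pins are high in one snapshot. */
    static uint16_t countHigh(uint32_t port_snapshot, uint32_t enabled_mask)
    {
        uint32_t bits = port_snapshot & enabled_mask;
        uint16_t n = 0;

        while (bits != 0UL)
        {
            bits &= bits - 1UL;   /* clear the lowest set bit */
            n++;
        }
        return n;
    }
    ```

    Because every pin is taken from the same snapshot, all pins on a port are sampled at the same instant, which removes the pin-to-pin read skew of individual reads.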

    Option 4: Hybrid Architecture

    The most practical approach for 32 pins:

    | Method                       | Pin Count               | CPU Load      |
    |------------------------------|-------------------------|---------------|
    | eCAP Signal Monitoring       | 4–6 critical pins       | Near zero     |
    | CLB pulse-width filter       | 4–8 pins per CLB tile   | Near zero     |
    | Optimized port-level polling | Remaining pins          | Moderate      |
    | CPU2 XINT-driven capture     | Up to 5 additional pins | Low (on CPU2) |

    Conclusion

    Your 16-pin configuration works because the ISR completes within budget. At 32 pins, the software overhead corrupts your timing measurements. The fix is to offload pulse-width detection to hardware (eCAP Signal Monitoring or CLB) for as many pins as possible, and optimize the remaining polling with port-level reads. Sharing your SYSCLK frequency, timer ISR rate, and tolerance around 36 µs would help determine the exact partitioning strategy.


    1. TMS320F28P650DK: Need More Than 5 GPIO External Interrupts — TI E2E
    2. TMS320F28P65x Real-Time Microcontrollers Data Sheet
    3. TMS320F28P65x Technical Reference Manual — eCAP Signal Monitoring Unit
    4. TMS320F28P65x Data Sheet — eCAP Block Diagram
    5. TMS320F28P650DK: Pulse Width Filter — TI E2E
    6. TMS320F28P65x Data Sheet — GPIO Input Qualification

    Best Regards,

    Zackary Fleenor

  • Hi Saravanan,

    Any feedback on this thread?

    Best Regards,

    Zackary Fleenor

  • Hi Fleenor,

    Thanks for your reply.

    I followed your approach, but I am still seeing some garbage values when configuring all 32 pins. However, it works fine when using only 16 pins.

    I used the third option you suggested. Could you please help me resolve this issue?

    I am facing two issues:

    1. When I generate 50 pulses with a 36 µs ON time, I can see the rising_count reaching 50. However, the valid_count only reaches 38 instead of 50.

    2. In the code below, I configured 16 pins, and in the Expressions window, valid_count[0–15] works correctly. But when I extend the configuration to 32 pins, it does not work as expected.

  • Hi,

    I am measuring ~36µs pulses on 32 GPIOs using Timer ISR (1µs sampling) on C2000.

    Issue:
    - rising_count matches expected pulses (e.g., 50)
    - valid_count is lower than expected
    - Some pulses are not getting validated

    Configuration:
    - CPU freq: 200 MHz
    - Sampling: 1 µs (Timer1 ISR)
    - Target pulse: 36 µs ±5 µs
    - Noise reject < 28 µs
    - Max pulse width: 80 µs
    - Confirm edges: 2 samples

    Below is the critical logic:

    ```c
    #define MIN_CYCLES ((36 - 5) * 200)
    #define MAX_CYCLES ((36 + 5) * 200)
    #define MIN_WIDTH_CYCLES (28 * 200)
    #define MAX_WIDTH_CYCLES (80 * 200)

    static inline uint32_t elapsedCycles(uint32_t start, uint32_t end)
    {
        return (end - start);
    }

    static inline void processFall(uint16_t i, uint32_t fall_ts, uint32_t now)
    {
        uint32_t width = elapsedCycles(rise_time[i], fall_ts);

        if (width < MIN_WIDTH_CYCLES)
        {
            noise_count[i]++;
            return;
        }

        last_width_cycles[i] = width;

        if (width >= MIN_CYCLES && width <= MAX_CYCLES)
        {
            valid_count[i]++;
        }
    }

    __interrupt void cpuTimer1ISR(void)
    {
        uint32_t now = getTimestamp();
        uint32_t cur_b = GPIO_readPortData(GPIO_PORT_B);
        uint32_t cur_c = GPIO_readPortData(GPIO_PORT_C);

        for (uint16_t i = 0; i < 32; i++)
        {
            if (!pin_enabled[i]) continue;

            uint8_t hi = (pin_info[i].port == 0) ?
                         (cur_b & pin_info[i].bit) :
                         (cur_c & pin_info[i].bit);

            if (pin_state[i] == 0) // IDLE
            {
                if (hi)
                {
                    if (confirm_count[i] == 0)
                        rise_time[i] = now;

                    confirm_count[i]++;

                    if (confirm_count[i] >= 2)
                    {
                        pin_state[i] = 1;
                        confirm_count[i] = 0;
                        rising_count[i]++;
                    }
                }
                else
                {
                    confirm_count[i] = 0;
                }
            }
            else // HIGH
            {
                if (!hi)
                {
                    if (confirm_count[i] == 0)
                        cand_fall_time[i] = now;

                    confirm_count[i]++;

                    if (confirm_count[i] >= 2)
                    {
                        processFall(i, cand_fall_time[i], now);
                        pin_state[i] = 0;
                        confirm_count[i] = 0;
                    }
                }
                else
                {
                    confirm_count[i] = 0;

                    if (elapsedCycles(rise_time[i], now) > MAX_WIDTH_CYCLES)
                    {
                        processFall(i, now, now);
                        pin_state[i] = 0;
                    }
                }
            }
        }
    }
    ```

  • Hi Saravanan,

    A few follow-up questions:

    • What specific "garbage values" are you seeing? Are the debug_us values wildly off (e.g., 0, or hundreds of µs), or just slightly outside your ±5 µs window?
    • Which pulses are being missed? Is the valid_count shortfall (38 instead of 50) consistent across all 32 pin indices, or concentrated on specific pins (e.g., pins 16–31)?
    • Are your 32 GPIOs spread across only Port B and Port C, or do they span additional ports?

    Direct Answer: Your ISR Is Overrunning the 1 µs Budget

    The most likely root cause is that your ISR takes longer than 1 µs to execute when processing 32 pins, so you miss sampling windows and corrupt the pulse-width measurements. Measuring the actual ISR execution time will confirm this.

    Here's the math. At 200 MHz, you have 200 CPU cycles per 1 µs timer tick. The minimum interrupt latency alone is 14 SYSCLK cycles (70 ns) [1]. Your ISR then must:

    1. Read getTimestamp()
    2. Read two port registers (GPIO_PORT_B and GPIO_PORT_C)
    3. Loop through 32 pins with branching, array accesses, and arithmetic

    Each iteration involves multiple memory accesses to arrays (pin_enabled[], pin_info[], pin_state[], confirm_count[], rise_time[], cand_fall_time[]) plus conditional logic. Even at ~5–6 cycles per array access, 32 iterations of this logic easily exceed 200 cycles. When the ISR doesn't complete before the next timer tick fires, you miss entire sample points, which explains why:

    • rising_count reaches 50 (edges are eventually detected, just delayed)
    • valid_count only reaches 38 (the measured width is corrupted by missed samples, pushing some measurements outside your ±5 µs window)

    This is consistent with the 16-pin case working: half as many iterations stay within budget [2][3].
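
    The budget arithmetic above can be written out as a short sketch. The clock, sampling period and latency figures are taken from this thread; the per-access cost uses the lower end of the 5–6 cycle estimate and is an assumption, not a measurement. cycleBudget() and estimatedLoopCycles() are hypothetical names.

    ```c
    #include <stdint.h>

    #define SYSCLK_MHZ          200u  /* from the thread: 200 MHz CPU clock      */
    #define TICK_US             1u    /* from the thread: 1 us sampling period   */
    #define IRQ_LATENCY_CYCLES  14u   /* minimum interrupt latency quoted above  */
    #define ARRAYS_PER_PIN      6u    /* arrays touched per loop iteration       */
    #define CYCLES_PER_ACCESS   5u    /* assumed lower bound of the 5-6 estimate */

    /* Cycles left for the ISR body after interrupt latency. */
    static uint32_t cycleBudget(void)
    {
        return SYSCLK_MHZ * TICK_US - IRQ_LATENCY_CYCLES;
    }

    /* Rough cost of the array accesses alone for a given pin count. */
    static uint32_t estimatedLoopCycles(uint32_t pins)
    {
        return pins * ARRAYS_PER_PIN * CYCLES_PER_ACCESS;
    }
    ```

    Under these assumptions the budget is 186 cycles, while 32 pins that each touch all six arrays would cost about 960 cycles. This is a worst-case figure, since idle pins touch fewer arrays per iteration, so a measured ISR time is still needed to settle the question.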


    Why Your Current Code Structure Fails at 32 Pins

    Your timestamp approach has a critical flaw at scale: rise_time[i] = now uses the same timestamp for all 32 pins, but the falling edge detection happens in a different ISR invocation. If the ISR overruns and you miss a tick, the elapsed cycle count becomes inaccurate by multiples of your sampling period. With ±5 µs tolerance (31–41 µs valid window), missing even 1–2 samples shifts the measured width by 1–2 µs, and missing more pushes valid pulses outside your acceptance window.
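
    Separately from ISR timing, one detail of the posted loop may be worth ruling out: the masked 32-bit port value is stored in a uint8_t variable (hi). In C, converting to a narrower unsigned type keeps only the low-order bits, so a mask bit above the width of hi would always read as 0. Whether this applies depends on how pin_info[i].bit is defined, which the post does not show. The sketch below assumes it is a 32-bit single-bit mask; pinHighTruncated() and pinHighSafe() are illustrative names.

    ```c
    #include <stdint.h>

    /* Mirrors the posted expression: the masked value is narrowed to uint8_t. */
    static uint8_t pinHighTruncated(uint32_t port, uint32_t bit_mask)
    {
        uint8_t hi = (uint8_t)(port & bit_mask);
        return hi;
    }

    /* Compares against zero first, so the result is 0 or 1 for any bit position. */
    static uint8_t pinHighSafe(uint32_t port, uint32_t bit_mask)
    {
        return (uint8_t)((port & bit_mask) != 0UL);
    }
    ```

    On C28x, where the smallest integer type is 16 bits wide, this kind of truncation would affect mask bits 16 to 31 of a port, so it is worth checking which bit positions the failing pins use.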


    Recommended Fix: Reduce ISR Execution Time

    Immediate optimizations (stay with Option 3):

    1. Eliminate the per-pin branch and struct lookup. Pre-compute two bitmasks at initialization:

        uint32_t portB_mask; // bits set for all enabled pins on Port B
        uint32_t portC_mask; // bits set for all enabled pins on Port C

       Then in the ISR, detect changes using XOR instead of iterating over all 32 pins:

        uint32_t changed_b = (cur_b ^ prev_b) & portB_mask;
        uint32_t changed_c = (cur_c ^ prev_c) & portC_mask;

       Only process pins with actual transitions. Most ISR invocations will see 0 or 1 transitions, reducing the loop to near-zero work.

    2. Use a count-trailing-zeros operation such as __builtin_ctz() to find set bits in the changed mask without iterating over all 32 positions:

        while (changed_b) {
            uint16_t bit_pos = __builtin_ctz(changed_b);
            // process only this pin
            changed_b &= ~(1UL << bit_pos);
        }

       Note that __builtin_ctz() is a GCC/Clang builtin that takes an unsigned int, which is 16 bits on C28x. Confirm that your toolchain provides a 32-bit equivalent, or use a small helper function instead.

    3. Store prev_b and prev_c at the end of each ISR to enable edge detection via XOR.

    4. Move arrays to RAM and ensure the ISR itself executes from RAM (not Flash) to eliminate wait states that add to latency [1].

    5. Verify with ERAD profiling: the F28P65x has an Embedded Real-time Analysis and Diagnostics module that can measure your actual ISR execution time in CPU cycles [4]. This will confirm whether your ISR fits within budget after optimization.
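
    The XOR-based steps above can be combined into one routine. The following is a sketch under stated assumptions rather than a drop-in replacement for the posted ISR: it handles a single port, uses a portable helper (ctz32, a hypothetical name) in place of a compiler builtin, and omits the 2-sample edge confirmation and the 80 µs timeout, both of which would still need per-tick handling for pins that are mid-pulse.

    ```c
    #include <stdint.h>

    #define NUM_PINS 32u

    static uint32_t prev_port;              /* previous snapshot             */
    static uint32_t rise_ts[NUM_PINS];      /* timestamp of last rising edge */
    static uint32_t last_width[NUM_PINS];   /* most recent high time         */

    /* Portable count-trailing-zeros; v must be non-zero. */
    static uint16_t ctz32(uint32_t v)
    {
        uint16_t n = 0;
        while ((v & 1UL) == 0UL)
        {
            v >>= 1;
            n++;
        }
        return n;
    }

    /* Process one port snapshot; only pins that changed are visited. */
    static void sampleEdges(uint32_t cur, uint32_t enabled_mask, uint32_t now)
    {
        uint32_t changed = (cur ^ prev_port) & enabled_mask;

        while (changed != 0UL)
        {
            uint16_t pin = ctz32(changed);
            uint32_t bit = (uint32_t)1 << pin;

            if ((cur & bit) != 0UL)
            {
                rise_ts[pin] = now;                    /* rising edge  */
            }
            else
            {
                last_width[pin] = now - rise_ts[pin];  /* falling edge */
            }
            changed &= ~bit;
        }
        prev_port = cur;
    }
    ```

    When no pin has changed, the routine costs one XOR, one AND and one compare, which is where the saving over a fixed 32-iteration loop comes from.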

    If optimization alone isn't sufficient:

    • Increase timer period to 2 µs — doubles your cycle budget to 400 cycles, at the cost of ±1 µs additional measurement uncertainty (still within your ±5 µs tolerance)
    • Offload to CLA — the CLA has direct GPIO data register access [5] and can run the polling loop in parallel with the main CPU
    • Use CLB tiles (6 available on F28P65x) for hardware-based pulse qualification on a subset of pins [6]

    Summary

    | Problem                       | Cause                                  | Fix                                              |
    |-------------------------------|----------------------------------------|--------------------------------------------------|
    | valid_count < rising_count    | ISR overrun corrupts width measurement | Reduce ISR time via XOR-based edge detection     |
    | Works at 16 pins, fails at 32 | ISR exceeds 200-cycle budget           | Process only changed pins, not all 32 every tick |
    | Garbage values in debug_us    | Missed samples shift timestamps        | Execute ISR from RAM; consider 2 µs period       |

    To help refine this recommendation further, it would be helpful to know:

    • Your actual ISR execution time (use ERAD or toggle a GPIO at ISR entry/exit and measure with a scope)
    • Whether the valid_count shortfall is uniform across all pins or concentrated on later-indexed pins (which would confirm the intra-ISR timing skew)
    • Whether CPU2 is available for workload splitting
    • Your compiler settings (optimization level such as -o2, plus related options such as --fp_mode=relaxed)
    • Whether the code is executing from Flash or RAM
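
    If ERAD or a scope is not at hand, the ISR time can also be estimated in software. This is a minimal sketch, assuming an up-counting timestamp like the getTimestamp() used in the posted code is read once at ISR entry and once at exit. recordIsrTime(), isr_last_cycles and isr_max_cycles are hypothetical names, and the reads themselves add a few cycles to the result.

    ```c
    #include <stdint.h>

    static uint32_t isr_max_cycles;   /* worst-case duration seen so far */
    static uint32_t isr_last_cycles;  /* duration of the latest call     */

    /* Record one ISR execution given counter values at entry and exit. */
    static void recordIsrTime(uint32_t entry, uint32_t exit_ts)
    {
        isr_last_cycles = exit_ts - entry;   /* wrap-safe unsigned subtraction */
        if (isr_last_cycles > isr_max_cycles)
        {
            isr_max_cycles = isr_last_cycles;
        }
    }
    ```

    Watching isr_max_cycles in the Expressions window for the 16-pin and 32-pin builds gives a direct comparison against the 200-cycle budget.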

    1. TMS320F28P65x Technical Reference Manual — Interrupt Latency
    2. TMS320F28P650DK: Need More Than 5 GPIO External Interrupts — TI E2E
    3. TMS320F28P65x Real-Time Microcontrollers Data Sheet
    4. TMS320F28P65x TRM — ERAD ISR Profiling
    5. TMS320F28P65x Data Sheet — C28x Bus Controller Peripheral Access (CLA GPIO Access)
    6. TMS320F28P650DK: Pulse Width Filter — TI E2E

    Best Regards,

    Zackary Fleenor

  • Hi Saravanan,

    I wanted to check in on the status of this thread. In my last response, I provided:

    1. Root cause analysis confirming your ISR is exceeding the 1 µs execution budget at 32 pins
    2. Specific code optimizations using XOR-based edge detection and __builtin_ctz() to process only changed pins
    3. Profiling recommendations using ERAD to measure actual ISR execution time

    To help move this forward, could you provide an update on:

    • Have you implemented the XOR-based edge detection approach? This should dramatically reduce ISR overhead by processing only pins with transitions rather than all 32 pins every tick.

    • What is your actual ISR execution time? You can measure this by:

      • Using ERAD profiling, or
      • Toggling a GPIO at ISR entry/exit and measuring with an oscilloscope
    • Is the valid_count shortfall uniform across all 32 pins, or concentrated on specific pin indices? This would confirm whether the issue is intra-ISR timing skew vs. overall ISR overrun.

    • Compiler settings: What optimization level are you using (-o0, -o2, -o3)? Is the ISR executing from Flash or RAM?

    • Is CPU2 available for workload splitting if needed?

    If you've resolved the issue using a different approach, I'd be interested to hear what worked for your application. If you're still encountering difficulties, the additional diagnostic information above will help us identify the exact bottleneck.

    Looking forward to your update.

    Best Regards,

    Zackary Fleenor