TMS320F28P650DK: Multi-GPIO Pulse Width Measurement Using Timer-Based Sampling

SARAVANAN S

Hi All,

My requirement:
When a 36-microsecond ON-time pulse is applied using a function generator, the flag should increment. For ON-times other than 36 microseconds, the flag should not increment.

When I configure only 16 pins, it works correctly. In the Expressions window, I can see that valid_count[3] is increasing as expected. However, when I configure 32 GPIO pins, I observe different values in debug_us[3] in the Expressions window.

What is the issue here?
Is it not possible to configure 32 pins, or is there any other solution to capture the signal?

apr16.c

2 months ago

0 Fleenor 2 months ago

TI__Mastermind 33633 points

Hi Saravanan,

Before diving in, a few details would help refine the recommendation further:

- What is your timer ISR frequency / sampling interval? This directly determines how much CPU time is available per interrupt cycle.
- What is your SYSCLK frequency? This affects minimum detectable pulse width and available cycle budget.
- What tolerance is acceptable around 36 µs? (e.g., ±1 µs, ±2 µs)
- Are the 32 GPIOs spread across multiple GPIO port registers, or concentrated in one? Reading from multiple ports adds overhead.
- Is CPU2 available, or is it already allocated to other tasks?

The Root Cause: CPU Overhead Scaling

The hardware itself can handle 32 GPIO pins — the TMS320F28P650DK supports up to 98 GPIOs in the 169-ball NMR package (and more in larger packages) [2]. The issue is not a pin count limitation.

The problem is ISR execution time. When you poll 16 pins inside a timer interrupt, the read-process-compare loop completes fast enough that your sampling accurately captures the 36 µs pulse edges. When you double to 32 pins, the ISR takes significantly longer to execute, which causes:

- Sampling jitter: the time between reading pin 0 and pin 31 within a single ISR is no longer negligible relative to 36 µs
- Missed or shifted edges: these lead to the incorrect debug_us[3] values you're observing
- Potential ISR overrun: if the ISR doesn't complete before the next timer tick, measurements become unreliable

Interrupt-driven edge capture is not a way around this either: the TMS320F28P650DK has only 5 external interrupts (XINT1–XINT5) per CPU, far fewer than the 32 pins you need to monitor [1].

| Method | Pin Count | CPU Load |
| --- | --- | --- |
| eCAP Signal Monitoring | 4–6 critical pins | Near zero |
| CLB pulse-width filter | 4–8 pins per CLB tile | Near zero |
| Optimized port-level polling | Remaining pins | Moderate |
| CPU2 XINT-driven capture | Up to 5 additional pins | Low (on CPU2) |

Conclusion

Your 16-pin configuration works because the ISR completes within budget. At 32 pins, the software overhead corrupts your timing measurements. The fix is to offload pulse-width detection to hardware (eCAP Signal Monitoring or CLB) for as many pins as possible, and optimize the remaining polling with port-level reads. Sharing your SYSCLK frequency, timer ISR rate, and tolerance around 36 µs would help determine the exact partitioning strategy.

Best Regards,

Zackary Fleenor

0 Fleenor 1 month ago in reply to Fleenor

TI__Mastermind 33633 points

Hi Saravanan,

Any feedback on this thread?

Best Regards,

Zackary Fleenor

0 SARAVANAN S 1 month ago in reply to Fleenor

Prodigy 110 points

Hi Fleenor,

Thanks for your reply.

I followed your approach, but I am still seeing some garbage values when configuring all 32 pins. However, it works fine when using only 16 pins.

I used the third option you suggested. Could you please help me resolve this issue?

I am facing two issues:

- When I generate 50 pulses with a 36 µs ON time, I can see the rising_count reaching 50. However, the valid_count only reaches 38 instead of 50.
- In the code below, I configured 16 pins, and in the Expressions window, valid_count[0–15] works correctly. But when I extend the configuration to 32 pins, it does not work as expected.

0 SARAVANAN S 1 month ago in reply to Fleenor

Prodigy 110 points

Hi,

I am measuring ~36 µs pulses on 32 GPIOs using a Timer ISR (1 µs sampling) on C2000.

Issue:
- rising_count matches expected pulses (e.g., 50)
- valid_count is lower than expected
- Some pulses are not getting validated

Configuration:
- CPU freq: 200 MHz
- Sampling: 1 µs (Timer1 ISR)
- Target pulse: 36 µs ±5 µs
- Noise reject < 28 µs
- Max pulse width: 80 µs
- Confirm edges: 2 samples

Below is the critical logic:

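```c
/* All thresholds are in CPU cycles at 200 MHz (200 cycles per microsecond). */
#define MIN_CYCLES ((36 - 5) * 200)      /* 31 us: lower edge of the accept window */
#define MAX_CYCLES ((36 + 5) * 200)      /* 41 us: upper edge of the accept window */
#define MIN_WIDTH_CYCLES (28 * 200)      /* shorter than 28 us: counted as noise */
#define MAX_WIDTH_CYCLES (80 * 200)      /* longer than 80 us: pulse is abandoned */

/* For an up-counting timestamp, unsigned subtraction stays correct
   across a single counter wraparound. */
static inline uint32_t elapsedCycles(uint32_t start, uint32_t end)
{
    return (end - start);
}

/* Classify a completed pulse on pin i. 'now' is currently unused. */
static inline void processFall(uint16_t i, uint32_t fall_ts, uint32_t now)
{
    uint32_t width = elapsedCycles(rise_time[i], fall_ts);

    if (width < MIN_WIDTH_CYCLES)
    {
        noise_count[i]++;
        return;
    }

    last_width_cycles[i] = width;

    if (width >= MIN_CYCLES && width <= MAX_CYCLES)
    {
        valid_count[i]++;
    }
}

__interrupt void cpuTimer1ISR(void)
{
    uint32_t now = getTimestamp();
    uint32_t cur_b = GPIO_readPortData(GPIO_PORT_B);
    uint32_t cur_c = GPIO_readPortData(GPIO_PORT_C);

    for (uint16_t i = 0; i < 32; i++)
    {
        if (!pin_enabled[i]) continue;

        /* Compare against zero before narrowing. Assuming pin_info[i].bit is
           a 32-bit mask, storing the masked value directly in a narrower
           type (originally uint8_t) drops any set bit above that width. */
        uint32_t port_val = (pin_info[i].port == 0) ? cur_b : cur_c;
        uint16_t hi = ((port_val & pin_info[i].bit) != 0U) ? 1U : 0U;

        if (pin_state[i] == 0) // IDLE
        {
            if (hi)
            {
                /* First high sample is the candidate rising edge. */
                if (confirm_count[i] == 0)
                    rise_time[i] = now;

                confirm_count[i]++;

                if (confirm_count[i] >= 2)
                {
                    pin_state[i] = 1;
                    confirm_count[i] = 0;
                    rising_count[i]++;
                }
            }
            else
            {
                confirm_count[i] = 0;
            }
        }
        else // HIGH
        {
            if (!hi)
            {
                /* First low sample is the candidate falling edge. */
                if (confirm_count[i] == 0)
                    cand_fall_time[i] = now;

                confirm_count[i]++;

                if (confirm_count[i] >= 2)
                {
                    processFall(i, cand_fall_time[i], now);
                    pin_state[i] = 0;
                    confirm_count[i] = 0;
                }
            }
            else
            {
                confirm_count[i] = 0;

                /* Still high after the maximum width: give up on this pulse. */
                if (elapsedCycles(rise_time[i], now) > MAX_WIDTH_CYCLES)
                {
                    processFall(i, now, now);
                    pin_state[i] = 0;
                }
            }
        }
    }
}
```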

0 Fleenor 1 month ago in reply to SARAVANAN S

TI__Mastermind 33633 points

Hi Saravanan,

A few follow-up questions:

- What specific "garbage values" are you seeing? Are the debug_us values wildly off (e.g., 0, or hundreds of µs), or just slightly outside your ±5 µs window?
- Which pulses are being missed? Is the valid_count shortfall (38 instead of 50) consistent across all 32 pin indices, or concentrated on specific pins (e.g., pins 16–31)?
- Are your 32 GPIOs spread across only Port B and Port C, or do they span additional ports?

Direct Answer: Your ISR Is Overrunning the 1 µs Budget

The root cause is almost certainly ISR overrun: your ISR takes longer than 1 µs to execute when processing 32 pins, causing you to miss sampling windows and corrupt pulse width measurements.

Here's the math. At 200 MHz, you have 200 CPU cycles per 1 µs timer tick. The minimum interrupt latency alone is 14 SYSCLK cycles (70 ns) [1]. Your ISR then must:

- Read getTimestamp()
- Read two port registers (GPIO_PORT_B and GPIO_PORT_C)
- Loop through 32 pins with branching, array accesses, and arithmetic

Each iteration involves multiple memory accesses to arrays (pin_enabled[], pin_info[], pin_state[], confirm_count[], rise_time[], cand_fall_time[]) plus conditional logic. Even at ~5–6 cycles per array access, 32 iterations of this logic easily exceed 200 cycles. When the ISR doesn't complete before the next timer tick fires, you miss entire sample points, which explains why:

- rising_count reaches 50 (edges are eventually detected, just delayed)
- valid_count only reaches 38 (the measured width is corrupted by missed samples, pushing some measurements outside your ±5 µs window)

This is consistent with the 16-pin case working: half the iterations stay within budget [2][3].
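
As a rough check of the arithmetic above, the sketch below computes the cycles available per timer tick and the average share per pin when every pin is visited on every tick. `SYSCLK_MHZ` and the 14-cycle latency come from this thread; `overhead_cycles` is a hypothetical allowance (context save, timestamp read, port reads) that has to be measured on the target, and the function names are illustrative only.

```c
#include <stdint.h>

#define SYSCLK_MHZ          200U  /* CPU clock stated in the thread */
#define IRQ_LATENCY_CYCLES  14U   /* minimum interrupt latency quoted above */

/* Cycles available between two timer ticks of the given period. */
static uint32_t cyclesPerTick(uint32_t period_us)
{
    return SYSCLK_MHZ * period_us;
}

/* Average cycles each pin may consume if the loop visits every pin.
 * overhead_cycles is an assumed, to-be-measured fixed cost per ISR.
 * Returns 0 when the fixed cost alone already uses up the tick. */
static uint32_t cyclesPerPin(uint32_t period_us,
                             uint32_t overhead_cycles,
                             uint32_t pins)
{
    uint32_t budget = cyclesPerTick(period_us);
    uint32_t fixed  = IRQ_LATENCY_CYCLES + overhead_cycles;

    if (pins == 0U || fixed >= budget)
    {
        return 0U;
    }
    return (budget - fixed) / pins;
}
```

With `overhead_cycles` set to 0, a 1 µs tick leaves about 5 cycles per pin for 32 pins and about 11 for 16 pins; a 2 µs tick raises the 32-pin share to about 12. Any real overhead only lowers these numbers.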

Why Your Current Code Structure Fails at 32 Pins

Your timestamp approach has a critical flaw at scale: rise_time[i] = now uses the same timestamp for all 32 pins, but the falling edge detection happens in a different ISR invocation. If the ISR overruns and you miss a tick, the elapsed cycle count becomes inaccurate by multiples of your sampling period. With ±5 µs tolerance (31–41 µs valid window), missing even 1–2 samples shifts the measured width by 1–2 µs, and missing more pushes valid pulses outside your acceptance window.
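
To make the effect of late edge detection concrete, here is a minimal model. It assumes the rising edge is timestamped on time and the falling edge is noticed `late_us` microseconds late. The window constants mirror the macros posted earlier in the thread; `measuredWidth` and `isValidWidth` are illustrative names, not part of the original code.

```c
#include <stdint.h>

#define CYCLES_PER_US  200U                           /* 200 MHz CPU clock */
#define MIN_CYCLES     ((36U - 5U) * CYCLES_PER_US)   /* 31 us = 6200 cycles */
#define MAX_CYCLES     ((36U + 5U) * CYCLES_PER_US)   /* 41 us = 8200 cycles */

/* Returns 1 if a measured width falls inside the accept window, else 0. */
static int isValidWidth(uint32_t width_cycles)
{
    return (width_cycles >= MIN_CYCLES) && (width_cycles <= MAX_CYCLES);
}

/* Width recorded when the falling edge is noticed late_us late
 * (model assumption: the rising edge timestamp is exact). */
static uint32_t measuredWidth(uint32_t true_us, uint32_t late_us)
{
    return (true_us + late_us) * CYCLES_PER_US;
}
```

Under this model a true 36 µs pulse is still accepted with up to 5 µs of added delay and rejected from 6 µs onward.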

Recommended Fix: Reduce ISR Execution Time

Immediate optimizations (stay with Option 3):

Eliminate the per-pin branch and struct lookup. Pre-compute two bitmasks at initialization:

    uint32_t portB_mask; // bits set for all enabled pins on Port B
    uint32_t portC_mask; // bits set for all enabled pins on Port C

Then in the ISR, detect changes using XOR instead of iterating all 32 pins:

    uint32_t changed_b = (cur_b ^ prev_b) & portB_mask;
    uint32_t changed_c = (cur_c ^ prev_c) & portC_mask;

Only process pins with actual transitions. Most ISR invocations will have 0 or 1 transitions, reducing the loop to near-zero work.

Use __builtin_ctz() (count trailing zeros) to find set bits in the changed mask without iterating all 32 positions:

    while (changed_b) {
        uint16_t bit_pos = __builtin_ctz(changed_b);
        // process only this pin
        changed_b &= ~(1UL << bit_pos);
    }

Store prev_b and prev_c at the end of each ISR to enable edge detection via XOR.
Move arrays to RAM and ensure the ISR itself executes from RAM (not Flash) to eliminate wait states that add to latency [1].
Verify with ERAD profiling — the F28P65x has an Embedded Real-time Analysis and Diagnostics module that can measure your actual ISR execution time in CPU cycles [4]. This will confirm whether your ISR fits within budget after optimization.
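
The XOR-based steps above can be combined roughly as follows. This is a sketch under assumptions: `processPortEdges`, `lowestSetBit`, and the `countEdge` handler are hypothetical names, and a plain shift loop stands in for `__builtin_ctz()`, whose availability on the C2000 compiler should be checked first. Note also that `int` is 16 bits on C28x, so a builtin taking `unsigned int` would not cover a 32-bit port mask.

```c
#include <stdint.h>

/* Example handler state: per-bit edge counters (illustrative only). */
static uint16_t rise_edges[32];
static uint16_t fall_edges[32];

static void countEdge(uint16_t bit_pos, int is_rising)
{
    if (is_rising) { rise_edges[bit_pos]++; }
    else           { fall_edges[bit_pos]++; }
}

/* Index of the lowest set bit. Portable stand-in for a
 * count-trailing-zeros builtin; v must be non-zero. */
static uint16_t lowestSetBit(uint32_t v)
{
    uint16_t pos = 0U;
    while ((v & 1UL) == 0UL)
    {
        v >>= 1;
        pos++;
    }
    return pos;
}

typedef void (*EdgeHandler)(uint16_t bit_pos, int is_rising);

/* Compare the current port sample with the previous one and call the
 * handler once per enabled pin that changed. Updates *prev and returns
 * the number of transitions handled. */
static uint16_t processPortEdges(uint32_t cur, uint32_t *prev,
                                 uint32_t mask, EdgeHandler handler)
{
    uint32_t changed = (cur ^ *prev) & mask;
    uint16_t handled = 0U;

    while (changed != 0UL)
    {
        uint16_t bit_pos = lowestSetBit(changed);
        uint32_t bit = (uint32_t)(1UL << bit_pos);

        handler(bit_pos, (cur & bit) != 0UL);
        changed &= ~bit;   /* clear the bit just processed */
        handled++;
    }

    *prev = cur;           /* remember this sample for the next tick */
    return handled;
}
```

In the ISR this would be called once per port, for example `processPortEdges(cur_b, &prev_b, portB_mask, countEdge)`. Reacting to every single-sample change drops the two-sample confirmation and the 80 µs timeout from the posted code, so those checks would still need a home if they matter for noise rejection.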

If optimization alone isn't sufficient:

- Increase the timer period to 2 µs: this doubles your cycle budget to 400 cycles, at the cost of ±1 µs additional measurement uncertainty (still within your ±5 µs tolerance)
- Offload to CLA: the CLA has direct GPIO data register access [5] and can run the polling loop in parallel with the main CPU
- Use CLB tiles (6 available on F28P65x) for hardware-based pulse qualification on a subset of pins [6]

Summary

| Problem | Cause | Fix |
| --- | --- | --- |
| `valid_count` < `rising_count` | ISR overrun corrupts width measurement | Reduce ISR time via XOR-based edge detection |
| Works at 16 pins, fails at 32 | ISR exceeds 200-cycle budget | Process only changed pins, not all 32 every tick |
| Garbage values in `debug_us` | Missed samples shift timestamps | Execute ISR from RAM; consider 2 µs period |

To help refine this recommendation further, it would be helpful to know:

- Your actual ISR execution time (use ERAD or toggle a GPIO at ISR entry/exit and measure with a scope)
- Whether the valid_count shortfall is uniform across all pins or concentrated on later-indexed pins (which would confirm the intra-ISR timing skew)
- Whether CPU2 is available for workload splitting
- Your compiler optimization level (-o2, --fp_mode=relaxed, etc.)
- Whether the code is executing from Flash or RAM
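
If ERAD or a scope is not at hand, ISR duration can also be estimated in software. The sketch below assumes a free-running, up-counting 32-bit cycle counter is sampled at ISR entry and exit (for instance with the `getTimestamp()` helper from the posted code, whose implementation is not shown in the thread). `IsrProfile` and `isrProfileUpdate` are hypothetical names.

```c
#include <stdint.h>

#define TICK_BUDGET_CYCLES 200U   /* 1 us at 200 MHz, from the thread */

typedef struct
{
    uint32_t max_cycles;     /* worst ISR duration seen so far */
    uint32_t overrun_count;  /* ISR runs that took longer than the budget */
} IsrProfile;

/* Call at ISR exit with the timestamps taken at entry and exit.
 * Assumes an up-counting counter; unsigned subtraction then stays
 * correct across a single 32-bit wraparound. */
static void isrProfileUpdate(IsrProfile *p, uint32_t entry_ts, uint32_t exit_ts)
{
    uint32_t duration = exit_ts - entry_ts;

    if (duration > p->max_cycles)
    {
        p->max_cycles = duration;
    }
    if (duration > TICK_BUDGET_CYCLES)
    {
        p->overrun_count++;
    }
}
```

Watching `max_cycles` and `overrun_count` in the Expressions window gives a first indication of whether the loop fits in the 200-cycle budget. The figure excludes interrupt entry latency and context save, so the true cost is somewhat higher.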

Best Regards,

Zackary Fleenor

0 Fleenor 1 month ago in reply to SARAVANAN S

TI__Mastermind 33633 points

Hi Saravanan,

I wanted to check in on the status of this thread. In my last response, I provided:

- Root cause analysis confirming your ISR is exceeding the 1 µs execution budget at 32 pins
- Specific code optimizations using XOR-based edge detection and __builtin_ctz() to process only changed pins
- Profiling recommendations using ERAD to measure actual ISR execution time

To help move this forward, could you provide an update on:

- Have you implemented the XOR-based edge detection approach? This should dramatically reduce ISR overhead by processing only pins with transitions rather than all 32 pins every tick.
- What is your actual ISR execution time? You can measure this by:
  - Using ERAD profiling, or
  - Toggling a GPIO at ISR entry/exit and measuring with an oscilloscope
- Is the valid_count shortfall uniform across all 32 pins, or concentrated on specific pin indices? This would confirm whether the issue is intra-ISR timing skew vs. overall ISR overrun.
- Compiler settings: What optimization level are you using (-o0, -o2, -o3)? Is the ISR executing from Flash or RAM?
- Is CPU2 available for workload splitting if needed?

If you've resolved the issue using a different approach, I'd be interested to hear what worked for your application. If you're still encountering difficulties, the additional diagnostic information above will help us identify the exact bottleneck.

Looking forward to your update.

Best Regards,

Zackary Fleenor

