CC2541 Hangs

Dan Nelson

I've been making a bit of progress on this. The processor seems to hang when there is a change in the ble connection state and there is an interrupt on port 1. A series of interrupts on port 1 will ultimately result in a data packet of 15 bytes being sent from the peripheral. I'm using the latest 1.3.2 stack and the 8.10.3 IAR compiler. I thought that I might have been pushing data onto the ble stack too quickly, but I've convinced myself that's not the case. When the processor hangs, I get this message:

Wed Jul 03 16:07:57 2013: The stack 'IdataStack' is filled to 100% (192 bytes used out of 192). The warning threshold is set to 50.%
Wed Jul 03 16:07:57 2013: The stack 'XdataStack' is filled to 100% (640 bytes used out of 640). The warning threshold is set to 50.%
Wed Jul 03 16:07:57 2013: The stack pointer for stack 'XdataStack' (currently XData:0xFFFF) is outside the stack range (XData:0x0001 to XData:0x0281)

I'm ready to ship product but can't do so because of this bug. I hope someone from TI can have a look at this and give me some suggestions. I've come to believe that it's a problem in the ble stack.

Dan

over 12 years ago

0 Aslak N. over 12 years ago

TI__Mastermind 23440 points

Hi Dan,

Those messages right there (everything max / FF) mean that the debugger has lost its sync with the device. You can verify it by checking e.g. the disassembly view or the register view for FF's.

It may be that IAR 8.20.2 is more robust in this regard - at least it is in my limited experience with it. However, as the libraries are built with 8.10.4, that's what we recommend for the release build.

To your original question - what does this interrupt routine do? Does it happen only for port 1? Are you yourself sending out that data packet? What does it contain? Do you have a sniffer trace?

Best regards,
Aslak

0 Dan Nelson over 12 years ago in reply to Aslak N.

Intellectual 480 points

Thanks Aslak,

The device is a peripheral. The process is as follows: It waits for a connection from the central. Once connected, the central sends it a command to start. This starts a repeating 250 ms event timer. The event sends an i2c command to a sensor. The sensor responds with an interrupt on port 1 when it has data ready. The interrupt routine reads data on port 1 and processes it. The interrupt routine then kicks off another command to the sensor which triggers a second interrupt. The second interrupt sets an event (osal_set_event). This final event does some final processing and then sends a 15 byte packet to the central.

The above process repeats until the central sends a command to stop. So every 250 ms the peripheral collects data from two interrupts from the sensor. The data are then processed in an event which sends out a 15 byte packet. This process continues quite happily.

But the processor will occasionally hang, apparently when there is a loss of connection, or perhaps when the connection re-establishes. The hang seems to be happening during the interrupt routine itself; that's what an LED is telling me, although I'm not 100% sure that it won't occur outside of the interrupt as well.

I start the interrupt routine with a HAL_ENTER_ISR() and end it similarly. I originally thought that I might be pushing data too fast to the ble stack, but I monitor the connection status and stop the process any time the connection state changes. The data are collected and processed well within the 250 ms window, there is no over-run. An oscilloscope shows that the interrupts are playing nicely.

Dan

There are two sensors, both with interrupts on port 1. The first sensor is given an i2c command to start collecting data. When the data are ready, the sensor signals the processor via the interrupt. During the interrupt routine, data is read from the sensor and then a second command to the sensor is sent via i2c. After

0 Aslak N. over 12 years ago in reply to Dan Nelson

TI__Mastermind 23440 points

Hi,

You might have run into a bug associated with entering sleep. In essence what happens in these cases is that the ISR happens just as the device is entering sleep. You can verify this by disabling POWER_SAVE and checking if it works then.

If this is the case, the workaround is:

- In the ISR, add CLEAR_SLEEP_MODE(); (this should always have been there to prevent OSAL from sleeping if the ISR has set an event)

- In hal_sleep.c, remove the line with PCON_IDLE from this macro:

#define HAL_SLEEP_PREP_POWER_MODE(mode) \
st( SLEEPCMD &= ~PMODE; /* clear mode bits */ \
SLEEPCMD |= mode; /* set mode bits */ \
while (!(STLOAD & LDRDY)); \
halSleepPconValue = PCON_IDLE; \
)

- Ensure that you at some point in the osal_run_system() loop have the line ALLOW_SLEEP_MODE();. On some targets this is included in Hal_ProcessPoll and so doesn't have to be inserted by you. If it's not there, insert this line right below the call to Hal_ProcessPoll() in osal_run_system().
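
The handshake behind these steps can be modelled off-target. The following is only a sketch of the idea, not the HAL code: `sleep_allowed`, `pending_events` and the three `model_*` functions are hypothetical names, and the real CLEAR_SLEEP_MODE() and ALLOW_SLEEP_MODE() macros may be implemented quite differently. It shows why an ISR that posts an event should also veto the sleep that the main loop may already have decided on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for HAL/OSAL state; not the real TI definitions. */
static volatile bool     sleep_allowed  = true;
static volatile uint16_t pending_events = 0;

/* Model of an ISR that posts an event: it also vetoes the upcoming sleep,
   which is the role CLEAR_SLEEP_MODE() plays in the workaround. */
static void model_isr(uint16_t event)
{
    pending_events |= event;
    sleep_allowed = false;
}

/* Model of the final check before the power-mode entry: true means the
   CPU would really go to sleep on this pass. */
static bool model_would_sleep(void)
{
    return sleep_allowed;
}

/* Model of one pass of the scheduler loop. Sleep is re-armed at the top
   (the role of ALLOW_SLEEP_MODE()), then pending events are drained. */
static uint16_t model_run_system_once(void)
{
    sleep_allowed = true;
    uint16_t handled = pending_events;
    pending_events = 0;
    return handled;
}
```

If the ISR fires after the loop has found nothing to do but before the sleep entry, `model_would_sleep()` returns false and the event gets handled on the next pass instead of waiting for the next wake-up.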

Best regards,
Aslak

0 Dan Nelson over 12 years ago in reply to Aslak N.

Intellectual 480 points

Thanks Aslak, I'll try that this morning.

Dan

0 Dan Nelson over 12 years ago in reply to Aslak N.

Intellectual 480 points

No, doesn't fix it.

:-(

0 Dan Nelson over 12 years ago in reply to Aslak N.

Intellectual 480 points

Here's an update at the end of the day.

Power save mode is turned off. Voltage regulator control is turned off.

Debug LEDs indicate that the processor thinks it's in a connection when it hangs. The LEDs also indicate that it has not exited from the interrupt routine.

I've been looking at sniffer files but have not yet come to any conclusions. No orderly connection termination is indicated when the processor hangs.

The processor appears to truly have died, not just the ble connection. There is no response to the debug terminal, and when I set a recurring task to toggle the LED, toggling ends.

My guess so far is that there is some timing issue between the interrupts and either the ble stack or the osal. Slight variations in the code of the interrupt routine affect how readily the processor hangs.

I have noted that the central doesn't enable notification until after it has sent the command to start sending data. So the peripheral might send its 15 byte notification beforehand. I wouldn't think that this would cause a problem.

Dan

0 Aslak N. over 12 years ago in reply to Dan Nelson

TI__Mastermind 23440 points

Hi,

This sounds weird. Could you post the content of the ISR routine(s)?

Aslak

0 Dan Nelson over 12 years ago in reply to Aslak N.

Intellectual 480 points

Sorry I didn't get back to you yesterday, time zone differences are probably getting the better of us.

My isr is at the end of this email.

I think there is a timing issue involved. If the process has started successfully then it seems to continue to run successfully. If there is an error and the processor hangs, it seems to happen shortly after the command to start has been received from the central. And I think that it is most likely to happen if the central sends the command to start the process immediately after the connection has been established.

Also note that the addition of the routine called test() alters the timing (?) enough to increase the likelihood of the hang.

I've got a ble sniff and this is what happens just before the hang.

1. ATT_Write_Req (The central sends a 'start' command)

2. ATT_Write_Rsp

3. ATT_Find_Info_Req

4. ATT_Find_Info_Rsp

5. ATT_Write_Req (the central enables notifications, probably should be before the 'start', but this is how the central code is at the moment)

6. ATT_Write_Rsp

7. ATT_Find_Info_Req

8. ATT_Find_Info_Req

Connection stops shortly afterwards as the peripheral hangs.

void test( void ) // Does nothing, but does cause trouble in the wrong place.
{
}

#pragma vector = P1INT_VECTOR
__interrupt void p1_ISR( void )
{
   HAL_ENTER_ISR();

// PIN 5 //////////////////////////////////////////////////////////////////////////////////////////////
   if( P1IFG & PIN5 ) // accelerometer
   {

       uint8 accelerometer_samples; // The number of sets of 32 samples we'll collect, i.e., the number of accelerometer interrupts.
       if( probe_type( ) == PROBE_BORE_CAM )
           accelerometer_samples = 4;
       else
           accelerometer_samples = 2;

       static uint32 X, Y, Z;
       uint32 read_X, read_Y, read_Z;

       accelerometer_read_dataset( &read_X, &read_Y, &read_Z );
       accelerometer_interrupt_count++;

       if( accelerometer_interrupt_count == 1 )
           X = Y = Z = 0;

       X += read_X;
       Y += read_Y;
       Z += read_Z;

       if( accelerometer_interrupt_count >= accelerometer_samples ) // Last interrupt in this series.
       {
           accelerometer_interrupt_count = 0;

           X /= ( 32 * accelerometer_samples );
           Y /= ( 32 * accelerometer_samples );
           Z /= ( 32 * accelerometer_samples );

           uint8 x_high, x_low, y_high, y_low, z_high, z_low;

           x_high = ( uint8 )( X >> 8 );
           x_low = ( uint8 )X;
           y_high = ( uint8 )( Y >> 8 );
           y_low = ( uint8 )Y;
           z_high = ( uint8 )( Z >> 8 );
           z_low = ( uint8 )Z;

           if( ( probe_type( ) == PROBE_CORE_CAM ) || ( probe_mode == PROBE_MODE_ROLLING_SHOTS_ACCELEROMETER_ONLY ) )
           {
               core_shot.accelerometer.x_high = x_high;
               core_shot.accelerometer.x_low = x_low;
               core_shot.accelerometer.y_high = y_high;
               core_shot.accelerometer.y_low = y_low;
               core_shot.accelerometer.z_high = z_high;
               core_shot.accelerometer.z_low = z_low;
           }
           else
           {
               bore_shot.accelerometer.x_high = x_high;
               bore_shot.accelerometer.x_low = x_low;
               bore_shot.accelerometer.y_high = y_high;
               bore_shot.accelerometer.y_low = y_low;
               bore_shot.accelerometer.z_high = z_high;
               bore_shot.accelerometer.z_low = z_low;
           }

               // Continue processing later.
           accelerometer_stop( );
           osal_set_event( camera_task_id, ACCELEROMETER_ACQUISITION_COMPLETE );
       }
       else if( probe_mode != PROBE_MODE_ROLLING_SHOTS_MAGNETOMETER_ONLY ) // Kick off another data acquisition cycle.
       {

//           test( );
           accelerometer_initiate_aquisition( );
       }
       P1IFG &= ~PIN5;   // Clear the individual pin interrupt flag.
   }

   IRCON2 &= 0xf7; // Clear the port1 interrupt flag.
   CLEAR_SLEEP_MODE();
   HAL_EXIT_ISR();

}
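
The averaging and byte-splitting arithmetic in the ISR above can be checked on a PC. This is a minimal sketch, assuming each call to accelerometer_read_dataset() returns the sum of 32 samples per axis (which is what the division by 32 * accelerometer_samples suggests). `axis_bytes_t` and `average_axis` are hypothetical names, and the guard against zero reads is an addition that the ISR does not have.

```c
#include <stdint.h>

/* Hypothetical container for the two bytes stored per axis. */
typedef struct {
    uint8_t high;
    uint8_t low;
} axis_bytes_t;

/* sum:   running total over `reads` datasets, each assumed to be the
          sum of 32 raw samples.
   reads: number of datasets accumulated (2 or 4 in the ISR above). */
static axis_bytes_t average_axis(uint32_t sum, uint8_t reads)
{
    axis_bytes_t out = { 0, 0 };

    if (reads == 0) {
        return out;                      /* avoid dividing by zero */
    }

    uint32_t mean = sum / (32u * reads); /* mean of all raw samples */
    out.high = (uint8_t)(mean >> 8);     /* upper byte of the 16-bit mean */
    out.low  = (uint8_t)mean;            /* lower byte */
    return out;
}
```

For example, two datasets that each sum to 32 * 0x1234 give a mean of 0x1234, which is stored as 0x12 and 0x34.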

0 Aslak N. over 12 years ago in reply to Dan Nelson

TI__Mastermind 23440 points

It occurs to me that a couple of things could be happening.

Obviously, the RF event occurs during your interrupt. This causes an interrupt on Timer2 that probably changes the context from your interrupt to the T2_VECTOR.

In this ISR (in the LIB), if you have halt during rf enabled (it's enabled by default) the device will enter PM0 during the RF event and turn off the gating of the clock to the CPU. It may be that your i2c routine objects to being interrupted, and so hangs.

If you have clkdivide on halt enabled (disabled by default), the system CLKSPD will be set to 1MHz during the RF event. This will likely play merry hell with your ongoing I2C transaction and perhaps hang somewhere in the I2C routine.

You can't avoid being interrupted (unless you use HAL_ENTER_CRITICAL_SECTION( ) ), but you can avoid having the T2 interrupt last a long time by calling HCI_EXT_HaltDuringRfCmd(HCI_EXT_HALT_DURING_RF_DISABLE); during your init.

If you wish to Halt the MCU during the RF event to reduce the peak current consumption, you could still avoid messing with the CLKSPD by calling HCI_EXT_ClkDivOnHaltCmd( HCI_EXT_DISABLE_CLK_DIVIDE_ON_HALT );
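
A sketch of what the two init calls might look like together. The two functions and constants are the ones named above, but the bodies below are host-side stubs so that the example compiles off-target; the constant values, the `uint8_t` return type, `STUB_STATUS_OK` and `app_disable_rf_halt_features` are placeholders, not the stack's definitions. The clock-divide command is issued first on the assumption that the stack may reject it once halt-during-RF is already off; that ordering is worth checking against the vendor-specific HCI guide.

```c
#include <stdint.h>

/* Host-side placeholders. On the CC2541 these come from the stack's HCI
   header, and the values below are assumptions for this sketch only. */
#define HCI_EXT_HALT_DURING_RF_DISABLE      0
#define HCI_EXT_DISABLE_CLK_DIVIDE_ON_HALT  0
#define STUB_STATUS_OK                      0

static int clk_div_cmd_calls;   /* stub bookkeeping, not part of the stack */
static int halt_rf_cmd_calls;

/* Stub standing in for the stack call of the same name. */
static uint8_t HCI_EXT_ClkDivOnHaltCmd(uint8_t control)
{
    (void)control;
    clk_div_cmd_calls++;
    return STUB_STATUS_OK;
}

/* Stub standing in for the stack call of the same name. */
static uint8_t HCI_EXT_HaltDuringRfCmd(uint8_t mode)
{
    (void)mode;
    halt_rf_cmd_calls++;
    return STUB_STATUS_OK;
}

/* Hypothetical init hook: turn off clock division first, then stop
   halting the MCU during RF events. Returns the first non-OK status. */
static uint8_t app_disable_rf_halt_features(void)
{
    uint8_t status = HCI_EXT_ClkDivOnHaltCmd(HCI_EXT_DISABLE_CLK_DIVIDE_ON_HALT);
    if (status != STUB_STATUS_OK) {
        return status;
    }
    return HCI_EXT_HaltDuringRfCmd(HCI_EXT_HALT_DURING_RF_DISABLE);
}
```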

Best regards,
Aslak

0 Dan Nelson over 12 years ago in reply to Aslak N.

Intellectual 480 points

Thanks Aslak,

I've had a close look at the i2c routines and there's nothing that I can see that would cause them to hang. I tried your suggested fix as well, with no success.

If I enforce a 5 second delay from the connection until I start my interrupt routines and sending 15 byte notifications, then I don't seem to get any problems. If I leave a shorter delay, then it's very likely that the connection will be dropped within a few seconds of being established. At this point another connection will be made and the cycle may continue, and at some point the processor will most likely hang, usually within half a dozen of these reconnection cycles.

Often I will see an update parameter request about 5 seconds after the connection is made. If I haven't enforced a 5 second delay, and the peripheral has interrupts triggered and notifications being sent (800 ms intervals), then I will likely see the next notification dropped, the one that was scheduled after the update parameter request. The connection will then be lost shortly afterwards. This is common, but not consistent. I originally thought that this may have been the problem, but I'm not so sure now.

Time to go home after a long day. More scans tomorrow.

Dan

0 Aslak N. over 12 years ago in reply to Dan Nelson

TI__Mastermind 23440 points

Hi Dan,

Well, the i2c functions could hang if there's any while loops that are waiting for some HW flag, and this particular HW module got confused when the tick speed changed.

Try to set some GPIOs and time the interrupt function. Does it take an unreasonable amount of time? How much time does it take? What's your connection interval before and after the connection parameter update?

You could also set some GPIOs before some suspicious I2C-function, and then clear the GPIO after the call has returned. In this way you can see where it hangs.
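
One way to bracket a suspicious call, as a sketch only: `debug_port` is a plain variable standing in for a port register such as P0 so that the example runs on a PC, and `DEBUG_BRACKET`, `DEBUG_PIN_I2C` and `suspect_i2c_transfer` are hypothetical names. On the target, a pin that stays high on the scope or logic analyzer marks the call that never returned.

```c
#include <stdint.h>

/* Stand-in for a GPIO port register; on the target this would be the
   real port SFR with the pin configured as an output. */
static volatile uint8_t debug_port;

#define DEBUG_PIN_I2C  0x01u   /* hypothetical debug pin mask */

/* Raise the pin, run the call, lower the pin again. */
#define DEBUG_BRACKET(pin, call)              \
    do {                                      \
        debug_port |= (uint8_t)(pin);         \
        call;                                 \
        debug_port &= (uint8_t)~(pin);        \
    } while (0)

/* Snapshot of the port taken inside the bracketed call (test aid only). */
static uint8_t port_seen_during_call;

/* Hypothetical stand-in for an I2C routine that might hang. */
static void suspect_i2c_transfer(void)
{
    port_seen_during_call = debug_port;
}
```

Usage is a single line, for example `DEBUG_BRACKET(DEBUG_PIN_I2C, suspect_i2c_transfer());`, with one pin per suspect so the traces can be told apart.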

How does the i2c communication look when it hangs? Do you have a logic analyzer available to record this?

So, in short, disabling HaltOnRF and ClkDivideOnHalt doesn't make any difference?

Best regards,
Aslak

0 Dan Nelson over 12 years ago in reply to Aslak N.

Intellectual 480 points

Thanks Aslak.

I put GPIO toggles either side of the i2c while loops yesterday, nothing hanging there. And the problem occurs even with no power saving mode, so there wouldn't be a tick speed change.

Disabling HaltOnRF and ClkDivideOnHalt doesn't make any difference.

The biggest clue at the moment is that the process seems to work fine as long as I wait 5 seconds after the connection is established before I start sending the 800 millisecond notifications. What I've seen so far is that the process is likely to go wrong right after the parameter update request. I need to determine whether that's coincidental or causal. Its timing is pretty suspicious.

The only scan I have of an actual hang shows that the hang occurs exactly when the notification should have been sent. In all of the scans where there is a disconnection only, I see the disconnection shortly after the notification should have been sent but wasn't.

Dan

0 Andrew King over 12 years ago in reply to Dan Nelson

Prodigy 230 points

Dan, Aslak

I don't know whether my experience is relevant, but I found that introducing a 10mSec repetitive interrupt (a very fast service routine) on ANY of the peripheral ports 0, 1, & 2 caused an existing working Peripheral - Observer application to lose both its Advertising and Observation capability.

I didn't have time to go into a detailed analysis of the fault as I was able to achieve the desired functionality by redefining a peripheral port as a timer port and using the timer suitably configured to cause an interrupt on an edge trigger (!!).

It would seem that there is clearly some difference between the ways that peripheral and, at least, timer interrupts interact with the BLE stack.

Regards

ayemk

0 Dan Nelson over 12 years ago in reply to Andrew King

Intellectual 480 points

Thanks Ayemk, all clues are useful at this point.

Dan

0 Dan Nelson over 12 years ago in reply to Dan Nelson

Intellectual 480 points

Slight correction to the last paragraph of the previous post. The processor hangs at the time that the next cycle is initiated, not when the notification would have been sent. So this is when i2c and interrupts would be happening.

Dan

0 Aslak N. over 12 years ago in reply to Dan Nelson

TI__Mastermind 23440 points

Hi Dan,

Interesting observation on the ConnParamUpdate timing.

Without ClkDivOnHalt I think you should be able to debug. If you are, please let me know where in the code it breaks, and if applicable what the call stack is (View->Call stack) when it hangs and you press break in IAR. If you are able to get it, perhaps IAR 8.20.2 is more robust for debugging.

Also, please confirm whether turning off the Connection Parameter Update request stops it from hanging (or whether a big timeout consistently stops it from hanging).

Are you able to determine whether it manages to exit the ISR routine when it hangs? Should be relatively easy to see from a GPIO perspective.

Best regards,
Aslak

0 Dan Nelson over 12 years ago in reply to Aslak N.

Intellectual 480 points

I think I have two separate processor hanging issues. One has to do with i2c, as you have suggested. I think the other one is separate to that. The i2c problem is difficult to reproduce, the other can be reproduced relatively easily and always occurs soon after a connection. I've implemented your i2c suggestions, but the second problem still persists.

First things first, the (maybe) non-i2c problem.

The parameter update request appears to be coincidental, but yes, a large timeout after the connection before starting consistently stops the connection dropping/reconnecting cycle. Turning off parameter update has no effect. It's still a bit of a mystery as to what happens during the period after the connection. I need to wait 5 seconds after the connection to be sure that the problem doesn't occur, and 5 seconds is an eternity in the life of a microprocessor. I may go through one, two or three data collection/interrupt cycles before the connection drops. GPIO LEDs tell me that the processor hasn't exited the interrupt. I also have a separate LED switched in GAP_state_change_callback() that tells me that the processor thinks it was in a connection when the eventual hang occurs.

IAR doesn't give me anything other than the stack overflow message.

Now, I've got a few questions about the i2c fix.

1. Is this the correct sequence when exiting an interrupt routine?

   IRCON2 &= 0xf7; // Clear the port1 interrupt flag.
   CLEAR_SLEEP_MODE();
   HAL_EXIT_ISR();

2. I've called HCI_EXT_ClkDivOnHaltCmd( HCI_EXT_DISABLE_CLK_DIVIDE_ON_HALT ) during my initialisation, but not HCI_EXT_HaltDuringRfCmd(HCI_EXT_HALT_DURING_RF_DISABLE). And I've seen a hang since doing so (different to what we've been talking about as above). In this case both i2c lines were left pulled low, so I'm assuming some i2c problem as you've described. Do I need to set both of these parameters as shown here?

Thanks Aslak, appreciate your input.

Dan

0 Dan Nelson over 12 years ago in reply to Dan Nelson

Intellectual 480 points

I think I've figured out what's happening. The central (iPhone) is initially connecting at a 30 ms rate. There is a small exchange of data which goes well. Then the central puts the peripheral into data collection mode and the processor starts working harder: there are multiple interrupts and i2c communications. I think that there's just too much going on for it to sustain a fast connection rate at the same time, and it is therefore unstable. The connection drops and restarts, and after a time the processor can hang. If the data collection and all its associated interrupts and i2c accesses are delayed until after the connection parameter update, then all is well, hence the success of putting in the delay.

Is there a callback or some indicator that the connection parameter update has been completed?

Also, would you please have a look at the last part of my previous post and comment?

Thanks

Dan

0 Aslak N. over 12 years ago in reply to Dan Nelson

TI__Mastermind 23440 points

Hi Dan,

I see. Yeah, timing (and processing time) is everything in the BLE game.

In the BLE 1.3.2 stack you can get a notification to the application of the update succeeding. See GAPRole_RegisterAppCBs() for peripheral.c/h.
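
A sketch of how the application might hold back data collection until that notification arrives. Everything here is hypothetical application logic that runs on a PC: the callback's three-parameter shape is an assumption to be checked against peripheral.h, the registration through GAPRole_RegisterAppCBs() is not shown, and all of the `on_*` handler names and flags are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

static bool     start_requested;      /* central has sent the 'start' command */
static bool     params_updated;       /* parameter update has been reported   */
static bool     acquisition_running;  /* sensor cycle is active               */
static uint16_t current_interval;     /* last reported connection interval    */

/* Start the sensor cycle only once both conditions hold. */
static void maybe_start_acquisition(void)
{
    if (start_requested && params_updated && !acquisition_running) {
        acquisition_running = true;   /* the 250 ms timer would start here */
    }
}

/* Assumed shape of the parameter-update callback. */
static void on_param_update(uint16_t connInterval,
                            uint16_t connSlaveLatency,
                            uint16_t connTimeout)
{
    (void)connSlaveLatency;
    (void)connTimeout;
    current_interval = connInterval;
    params_updated = true;
    maybe_start_acquisition();
}

/* Called when the 'start' write arrives from the central. */
static void on_start_command(void)
{
    start_requested = true;
    maybe_start_acquisition();
}

/* Called on disconnect: forget everything so the next connection waits again. */
static void on_disconnect(void)
{
    start_requested = false;
    params_updated = false;
    acquisition_running = false;
}
```

A central that never updates the parameters would leave this waiting forever, so a fallback timeout (such as the 5 second delay already in use) is probably still needed alongside it.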

Best regards,
Aslak

0 Ehsan Azarnasab over 11 years ago in reply to Aslak N.

Prodigy 70 points

Any news on this one? I think we have a similar problem, and IAR also just shows the stack overflow message.
