• Resolved

CC3120MOD: Call to sl_WlanProfileGet() and then sl_Stop() results in CMD_TIMEOUT and SL_API_ABORTED

Part Number: CC3120MOD

Hi TI,

Another product, another unique configuration, another set of issues we've found with CC3120MOD parts during test.

During testing we noticed that reading the WLAN profiles and then stopping the SL produces the dreaded SL_API_ABORTED. We log the event callback, which reports a CMD_TIMEOUT error.

To start with we are using the latest service packs (EDIT: Using 3.10.00.04) and latest SDK. Our operating system is FreeRTOS and our CPU is MSP432P4111. This is a totally different configuration (hw / sw and use case) from my other posts on the topic of CC3120MOD.

To capture the fault, I've captured the following signals:

1. TP_4, this is a test point used in firmware to indicate when we are entering and exiting the sl_WlanProfileGet() call

2. INTR: This is the interrupt signal from the CC3120MOD

3. SPI: As normal

4. ISR_UNMASK: This signal indicates when the SL driver has unmasked the interrupt

5. ISR_FIRE: As we are using FreeRTOS, our ISR handler wakes up a task; when this signal toggles we call _SlDrvRxIrqHandler(NULL) in a thread-safe manner

The image below shows the following

Point 1: Start call to sl_WlanProfileGet()

Point 2: Finished call to sl_WlanProfileGet(), starting call to sl_Stop()

Point 3: SL driver is locked up, note the interrupt fires, but driver does nothing. The task blocks until the timeout occurs.

However when I insert a one second delay between calls to sl_WlanProfileGet() and sl_Stop(), I get the following successful result:

The following notes are made:

1. After about 0.8 seconds the CC3120MOD fires an interrupt, and this seems to allow the driver to continue and permit the sl_Stop().

2. Adding 1 second delay before sl_Stop() ensures this interrupt does not get missed.
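
One detail worth checking when reproducing the delay: vTaskDelay() takes its argument in RTOS ticks, not milliseconds, so vTaskDelay(1000) is one second only when the tick rate is 1000 Hz. Below is a minimal sketch of the conversion, mirroring what the FreeRTOS pdMS_TO_TICKS() macro does. The helper name ms_to_ticks and the explicit tick-rate parameter are illustrative assumptions, not part of FreeRTOS or of the original test code.

```c
#include <stdint.h>

/* Hypothetical helper: convert milliseconds to RTOS ticks for a given
 * tick rate. Integer division truncates toward zero. */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_rate_hz)
{
    /* Widen to 64 bits so ms * tick_rate_hz cannot overflow. */
    return (uint32_t)(((uint64_t)ms * tick_rate_hz) / 1000u);
}
```

In the firmware itself, vTaskDelay(pdMS_TO_TICKS(1000)) would express the one-second wait independently of configTICK_RATE_HZ.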

The demo code to reproduce the error is pasted below.

The logic captures are also attached if needed.

Any thoughts TI?

Thanks.

void vCC3X20__DemoFault(void)
{
	Lint16 s16Return;
	Lint16 s16Status;
	SlWlanSecParams_t SecParams;
	SlWlanGetSecParamsExt_t SecExtParams;
	Lint8 s8ProfileName[SL_WLAN_SSID_MAX_LENGTH];
	Lint16 s16ProfileNameLength;
	Luint8 u8MACAddx6[6];
	Luint32 u32Priority;


	// Start the SL
	s16Return = sl_Start(0, 0, 0);
	if(s16Return < 0)
	{
		// Fault
		// SL_API_ABORTED = -2005 which is the bad one.
	}
	else
	{
		// Debug
		MAP_GPIO_setOutputHighOnPin(C_BSP__TP__4);

		// Get the profile
		s16Status = sl_WlanProfileGet(0, (signed char *)&s8ProfileName[0], &s16ProfileNameLength, &u8MACAddx6[0], &SecParams, &SecExtParams, (unsigned long *)&u32Priority);
		if(s16Status < 0)
		{
			// Error
		}
		else
		{

			// Removing this delay causes the fault
			vTaskDelay(1000);

			// Indicate when stop is occurring
			MAP_GPIO_setOutputLowOnPin(C_BSP__TP__4);

			// Stop now
			s16Return = sl_Stop(200);
			if(s16Return < 0)
			{
				//Error
			}
			else
			{
				// success
			}
		}

	}

}

Demo Faults.zip

  • This may also be a duplicate of this issue: https://e2e.ti.com/support/wireless-connectivity/wifi/f/968/t/852088

    Thanks.

  • In reply to stomp:

    Hi Stomp.

    Thanks for the detailed analysis. I suspect maybe there is something wrong in the host driver. I will try to reproduce now and get back to you.

    Best Regards,
    Vince 

  • In reply to Vincent Rodriguez:

    Hi,

    Last night I upgraded our driver version to include the modifications in SDK 3.30.01.02 (specifically the improvements to object syncing).

    This has improved our situation and we'll continue to monitor over the next week or so to see the impact.

    Thanks.

  • In reply to Vincent Rodriguez:

    Hi,

    We've just had another failure on our test unit. This is with the 3.30.01.02 code.

    Here is the scenario:

    1. Bring up the link and run as normal, set station mode, wlan options, etc..

    2. If we get a fatal event, bring the link down by calling sl_Stop()

    3. Restart step 1

    We received a SYNC_LOSS event during testing, and the driver hung the OS because sl_SyncObjDelete tried to delete an already deleted object.

    Here is the stack trace:

    And the offending function:

    ActiveIndex = 0

    g_pCB->NumOfDeletedSyncObj = 0

    g_pCB->ObjPool[ActiveIndex].NextIndex = 0

    Thanks.

  • In reply to stomp:

    Actually if you have a look at the above code:

     ActiveIndex = g_pCB->FreePoolIdx;
        while(MAX_CONCURRENT_ACTIONS > ActiveIndex)
        {
            OSI_RET_OK_CHECK(sl_SyncObjDelete(&g_pCB->ObjPool[ActiveIndex].SyncObj));
            g_pCB->NumOfDeletedSyncObj++;
            ActiveIndex = g_pCB->ObjPool[ActiveIndex].NextIndex;
        }

    We have a g_pCB->ObjPool that has 18 items (array indices 0-17).

    This code relies on ActiveIndex >= 18 for the while loop to exit.

    In my application, g_pCB->ObjPool[0].NextIndex = 0, so ActiveIndex never advances past 0.

    How is this supposed to work? 

    It looks like we have to cause an intentional array indexing error when ActiveIndex == 18.
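
    The termination problem described above can be reproduced outside the driver. The following is a minimal sketch, not the host driver's code: PoolEntry, walk_free_list and init_linear_pool are hypothetical names, and only the NextIndex link of each pool entry is modelled. It walks the list the same way the loop above does, but gives up after as many steps as the pool has entries, so a self-referencing link such as ObjPool[0].NextIndex == 0 is reported instead of looping forever.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CONCURRENT_ACTIONS 18u   /* pool size reported in this thread */

/* Hypothetical stand-in for one pool entry; only the link is modelled. */
typedef struct {
    uint8_t NextIndex;
} PoolEntry;

/* Walk the free list starting at `head`, counting visited entries.
 * Returns true if the walk reached an index >= MAX_CONCURRENT_ACTIONS,
 * false if it needed more steps than the pool has entries, which means
 * the links form a cycle. */
static bool walk_free_list(const PoolEntry *pool, uint8_t head, uint32_t *visited)
{
    uint8_t idx = head;
    uint32_t steps = 0;

    while (idx < MAX_CONCURRENT_ACTIONS) {
        if (steps == MAX_CONCURRENT_ACTIONS) {
            *visited = steps;
            return false;            /* cycle: the plain loop would never exit */
        }
        steps++;                     /* the driver deletes the sync object here */
        idx = pool[idx].NextIndex;
    }
    *visited = steps;
    return true;
}

/* Link every entry to its successor; the last entry points at index 18. */
static void init_linear_pool(PoolEntry *pool)
{
    for (uint8_t i = 0; i < MAX_CONCURRENT_ACTIONS; i++) {
        pool[i].NextIndex = (uint8_t)(i + 1u);
    }
}
```

    With a linearly linked pool the walk visits all 18 entries and stops; after setting pool[0].NextIndex = 0 it returns false.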

  • In reply to stomp:

    And here is another example from a test. We have a UDP socket loading up our device with Tx and Rx packets.

    My ObjPool is 18 items, I've been through the entire driver and put memory protection around all of the ObjPool array indexing.

    Here is the call stack:

    Here is the offending function:

    The value of *ListIndex is 18, which means that g_pCB->ActivePoolIdx == 18.

    This means that the very first call on line 2636 (in my code) in driver.c (this will differ from your code, as I've put array checks everywhere) reads the NextIndex of element [18] in the array. Index 18 does not exist; index 17 is the last valid one.
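
    The kind of array check mentioned above can be sketched as follows. This is a minimal illustration, not the driver's code or my actual patch: PoolEntry and pool_next are hypothetical names, and treating an index of MAX_CONCURRENT_ACTIONS as "nothing to read" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CONCURRENT_ACTIONS 18u   /* valid indices are 0..17 */

/* Hypothetical stand-in for one pool entry; only the link is modelled. */
typedef struct {
    uint8_t NextIndex;
} PoolEntry;

/* Read pool[idx].NextIndex only when idx is a valid array index.
 * Returns false (and leaves *next untouched) for idx >= 18, so the
 * caller never dereferences the non-existent element [18]. */
static bool pool_next(const PoolEntry *pool, uint8_t idx, uint8_t *next)
{
    if (idx >= MAX_CONCURRENT_ACTIONS) {
        return false;
    }
    *next = pool[idx].NextIndex;
    return true;
}
```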

  • In reply to stomp:

    Hi stomp,

    Do you have a code excerpt for reproducing this fatal error for the UDP case? I would like to test it here to help you better. 

    Jesu

  • In reply to stomp:

    Stomp,

    FreePoolIdx keeps track of the pool objects that are not being used. In total there are 18 pool objects and they get allocated from the lowest index first. As one pool object gets allocated (e.g. say for sl_Recv), FreePoolIdx is incremented. This while loop just goes through all the free indexes and deletes them until it gets to 18.
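
    The behaviour described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the host driver's implementation: PoolModel, pool_init, pool_alloc and pool_count_free are hypothetical names, the free entries are assumed to be chained through NextIndex, and the value 18 is assumed to mark the end of the chain.

```c
#include <stdint.h>

#define MAX_CONCURRENT_ACTIONS 18u   /* also used as the assumed end marker */

typedef struct {
    uint8_t NextIndex;
} PoolEntry;

typedef struct {
    PoolEntry ObjPool[MAX_CONCURRENT_ACTIONS];
    uint8_t   FreePoolIdx;           /* head of the chain of unused entries */
} PoolModel;

/* Chain entry i to i + 1; the last entry points at the end marker (18). */
static void pool_init(PoolModel *p)
{
    for (uint8_t i = 0; i < MAX_CONCURRENT_ACTIONS; i++) {
        p->ObjPool[i].NextIndex = (uint8_t)(i + 1u);
    }
    p->FreePoolIdx = 0;
}

/* Take the entry at the head of the free chain. Returns its index, or
 * MAX_CONCURRENT_ACTIONS when no free entry is left. */
static uint8_t pool_alloc(PoolModel *p)
{
    uint8_t idx = p->FreePoolIdx;
    if (idx < MAX_CONCURRENT_ACTIONS) {
        p->FreePoolIdx = p->ObjPool[idx].NextIndex;
    }
    return idx;
}

/* Count the free entries by following the chain until the end marker.
 * The step cap keeps the count finite even if the chain is corrupted. */
static uint32_t pool_count_free(const PoolModel *p)
{
    uint32_t n = 0;
    uint8_t idx = p->FreePoolIdx;
    while (idx < MAX_CONCURRENT_ACTIONS && n < MAX_CONCURRENT_ACTIONS) {
        n++;
        idx = p->ObjPool[idx].NextIndex;
    }
    return n;
}
```

    In this model, two allocations hand out indices 0 and 1 and leave FreePoolIdx at 2, with 16 entries still on the free chain.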

    Moving on, I doubt there are issues related to that code, because those sync objects are not being used, so the host driver should have no problem deleting them.

    Jesu

  • In reply to Jesu:

    Thanks Jesu,

    We've struggled for at least 2 years now with the reliability of the communications links to these devices. We mainly do safety critical here, so maybe our testing is a bit more extreme.

    This system in particular has a MSP432P4111 + CC3120MOD and has tasks for WiFi link management, WiFi network scanning, AWS, AWS+OTA and UDP, so there is a lot going on.

    As I see it, the ObjPool can be accessed and modified in unsafe ways (based on the simple array-bounds cases I posed above), and you are relying on thread priorities being just right so that the counting semaphores in the ObjPool don't suffer from any kind of inversion or deadlock, as can happen when NextIndex is never 18 during sl_Stop().

    I'll try to get that UDP code to you; it's very simple.

    Thanks.

  • In reply to stomp:

    Hi stomp,

    Could you confirm what version of the wifi plugin you're using for MSP432... is it v2.4? If so, have you made modifications to the host driver?

    Jesu