CC3220SF-LAUNCHXL: sl_Start hangs (new)

Part Number: CC3220SF-LAUNCHXL
Other Parts Discussed in Thread: UNIFLASH, CC3220SF, CC3220S

Hi community !

I have read all the other threads on "sl_Start hangs" and they did not help.

This one has been quite hard to chase down because I was not able to reproduce it with very simple code. Below is the simplest version of my code that still represents what I want to achieve:

void wifi_process(void) {

	switch(state) {
	case 2:
		sl_Start(0, 0, 0);
		break;
	case 20:
		sl_WlanSetMode(ROLE_AP);
		break;
	case 21:
		sl_Stop(0);
		sl_Start(0, 0, 0);
		DEBUG_INFO("Success! :)\r\n");
		break;
	case 22:
		return;
	default:
		break;
	}
	++state;
}

The 2nd call to sl_Start hangs for about 1 min and then throws a "[ERROR] - FATAL ERROR: Async event timeout detected [event opcode =0x8]".

It must be some kind of timing issue: with a lot of debug output (on the UART), the issue is NOT present, and when the debug output is removed, the issue appears...

Another way to "toggle" the issue (present / not present) is to add (enough) other processes in the forever loop! Note that if only the wifi_process() function above is present in the forever loop, everything is fine...

Now, an even funnier thing (which I hope can give you, the experts out there, a clue): while the issue was happening consistently under the debugger, I simply loaded another program (a demo example from SDK 1.6) onto the chip with UniFlash. Then, WITHOUT changing anything in my code, I ran the debugger again, and the issue was gone !!! This does not make any sense to me, and I truly hope someone can shed some light on this issue.

Thanks a bunch !

Note that all this happens while using the debugging functionality in CCS v7.3.

SDK 1.60.0.04

Below, more code to illustrate the weirdness of the problem...

NOT working:

// Forever Loop
    for(;;) {

    	/* The SimpleLink host driver architecture mandate calling 'sl_task' in a NO-RTOS application's main loop.       */
		/* The purpose of this call, is to handle asynchronous events and get flow control information sent from the NWP.*/
		/* Every event is classified and later handled by the host driver event handlers.                                */
		sl_Task(NULL);

		/* Modules process functions */
		wifi_process();
		other_process_A();
		other_process_B();
		other_process_C();
		other_process_D();
		other_process_E();
		other_process_F();
		other_process_G();
		other_process_H();
		other_process_I();
		other_process_J();
		other_process_K();
		other_process_L();
    }

Working !

// Forever Loop
    for(;;) {

    	/* The SimpleLink host driver architecture mandate calling 'sl_task' in a NO-RTOS application's main loop.       */
		/* The purpose of this call, is to handle asynchronous events and get flow control information sent from the NWP.*/
		/* Every event is classified and later handled by the host driver event handlers.                                */
		sl_Task(NULL);

		/* Modules process functions */
		wifi_process();
//		other_process_A();
//		other_process_B();
//		other_process_C();
//		other_process_D();
//		other_process_E();
//		other_process_F();
//		other_process_G();
//		other_process_H();
//		other_process_I();
//		other_process_J();
//		other_process_K();
//		other_process_L();
    }

  • Hi Vincent,

    Are you seeing this whenever any number of "other_process" calls are added, or only with the specific number you have here? What do these processes consist of?

    Thanks,
    Ben M
  • Hi Benjamin, 

    Thanks for taking the time.

    The other processes are pretty much irrelevant. I believe it's just about timing. I have now been able to reproduce the issue by calling wifi_process() and then calling the same other process 20 times (it consists only of one if() that is true every 5 secs). Note that re-programming the board with my own program through UniFlash slightly changed the behavior, and I had to add the Scan functionality (which is what I'm after here) in order to reproduce consistently (see the explanation below about this). Anyway, here is the code that I can reproduce with:

    // Forever Loop
        for(;;) {
    
        	/* The SimpleLink host driver architecture mandate calling 'sl_task' in a NO-RTOS application's main loop.       */
    		/* The purpose of this call, is to handle asynchronous events and get flow control information sent from the NWP.*/
    		/* Every event is classified and later handled by the host driver event handlers.                                */
    		sl_Task(NULL);
    
    		/* Modules process functions */
        	wifi_process();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
    		process_A();
        }

    with

    void wifi_process(void) {
    
    	static uint32_t mGeneralTimer;
    
    	switch(mState) {
    	case 0:
    		DEBUG_INFO("Wifi init...\r\n");
    		++mState;
    		break;
    	case 1:
    		sl_Start(0, 0, 0);
    		mState = 20;
    		break;
    	case 20:
    		/* Set NWP role as STA */
    		sl_WlanSetMode(ROLE_STA);
    		++mState;
    		break;
    	case 21:
    		/* Restart the device */
    //		DEBUG_INFO("Restarting...\r\n");
    		sl_Stop(0);
    		sl_Start(0, 0, 0);
    		++mState;
    		break;
    	case 22: {
    //		DEBUG_INFO("Scanning...\r\n");
    		SlWlanNetworkEntry_t netEntries[30];
    		_i16 resultsCount = sl_WlanGetNetworkList(0, 30, &netEntries[0]);
    		if(resultsCount == SL_ERROR_WLAN_GET_NETWORK_LIST_EAGAIN) {
    			mGeneralTimer = clock_get_current_time();
    			mState = 23;
    		}
    		else if(resultsCount >= 0) {
    			mState = 30;
    		}
    		break;
    	}
    	case 23:
    		if(clock_diff_to_now(mGeneralTimer) > 2000) {  // wait 2 secs
    			mState = 22;
    		}
    		break;
    	case 30:
    		sl_WlanSetMode(ROLE_AP);
    		++mState;
    		break;
    	case 31:
    		sl_Stop(0);
    		sl_Start(0, 0, 0);
    		mState = 300;
    		DEBUG_INFO("Success! :)\r\n");
    		break;
    	case 300:
    	default:
    		break;
    	}
    
    }

    I hope that you can reproduce ! Thanks !

    Now, another topic: it was very surprising that re-programming the board with UniFlash changes the behavior of this particular issue (as others who experienced the "sl_Start hangs" issue have also seen). How can you explain that the image written to the device flash has an influence on the "debugged image"? For example, with the code from my original post, the issue does not show up if I first load the same application onto the board with UniFlash and THEN debug it... With another image (as described below), the issue appears! And this happens consistently (back and forth).

    If you want to investigate this phenomenon, here are all the clues that might help to understand :

    If, prior to DEBUGGING my OWN application, I program the board through UniFlash as follows:

    - Importing the local_ota UniFlash project, building the local_ota_CC3220SF_LAUNCHXL_freertos_ccs project, setting the output image into the UniFlash project and programming the board: THEN debugging my own application hangs in that famous second sl_Start call.

    - Doing the same with the OOB_SF_freertos UniFlash project, and THEN debugging my own application: it does NOT hang, and works fine.

    - Programming my OWN application through UniFlash and THEN debugging my own application: this also works.

    - (extra) Programming an application where the first call to GPIO_init() is missing: the debugger in CCS cannot be used.

    Hope that will help !

    Kind Regards to all ! Vincent Vuarnoz

  • Hi Vincent,

    The number of processes and the amount of time the loop takes are important because, in a nortos system, sl_Task() needs to be called to receive async events. With the SimpleLink API calls, if sl_Task() isn't called in a timely manner when an async event is expected (as with sl_Start(), for example), the operation can appear to have timed out. I believe this is what is happening: the functions in your wifi_process() are called, and then many other processes run without calling sl_Task() before the loop returns to the call at its beginning.
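A minimal sketch of the idea in the paragraph above, assuming the driver-servicing call is passed in as a callback so that it builds without the SDK. In a real NO-RTOS application the callback would wrap sl_Task(NULL); the names process_fn, service_fn, run_loop_pass and the demo_* stubs are hypothetical and not part of the SimpleLink API.

```c
#include <stddef.h>

/* One cooperative "process" step of the main loop (hypothetical type). */
typedef void (*process_fn)(void);

/* Services the host driver; on target this would wrap sl_Task(NULL). */
typedef void (*service_fn)(void);

/* Run one pass of the main loop, servicing the driver before every step.
 * The gap between two service calls is then bounded by the slowest single
 * step instead of by the whole pass. Returns the number of service calls. */
static unsigned run_loop_pass(service_fn service,
                              const process_fn *steps, size_t count)
{
    unsigned serviced = 0;
    for (size_t i = 0; i < count; ++i) {
        service();
        ++serviced;
        if (steps[i] != NULL) {
            steps[i]();
        }
    }
    return serviced;
}

/* Stubs for trying the sketch on a PC; they only count calls. */
static int demo_service_calls;
static int demo_process_calls;
static void demo_service(void) { ++demo_service_calls; }
static void demo_process(void) { ++demo_process_calls; }
```

This only shortens the interval between driver-servicing calls. It does not add any delay between stopping and restarting the network processor.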

    Best Regards,
    Ben M
  • Hi, Thanks for your time.

    I have changed my code to the following, but it didn't help...

    This is a serious issue for us. Do you have any suggestions on how to solve it?

    Thanks !

    // Forever Loop
        for(;;) {
    
        	/* The SimpleLink host driver architecture mandate calling 'sl_task' in a NO-RTOS application's main loop.       */
    		/* The purpose of this call, is to handle asynchronous events and get flow control information sent from the NWP.*/
    		/* Every event is classified and later handled by the host driver event handlers.                                */
    		sl_Task(NULL);
    
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
    		sl_Task(NULL);
    		process_A();
        }

    EDIT: I also tried putting the calls to sl_Stop(0) and sl_Start(0, 0, 0) in different states (so that sl_Task() is called in between), but it didn't help...

    It must be some kind of timing problem. Remember that my wifi_process() function alone works great !

    We appreciate any help !

  • Any news ?

    + ping #1

    + ping #2

  • Is anyone looking into this???
    It is a serious issue for us !
  • Hi Vincent,

    I built a non-rtos example using the v1.60 SDK based on your example main loop and tested with both the CC3220S and CC3220SF. I do not see the issue. The only difference is that I made my own "other_processes" and removed your "clock_" calls in favor of a blocking sleep call.

    Please provide further guidance.

    Best Regards,
    Ben M
  • Thank you so much for taking the time to look into this. I will try to provide the full file of code and hopefully you'll be able to reproduce. Unfortunately I'll be on holidays next week so that will be after that.

    Kind Regards
    Vincent V.
  • Hey Benjamin,

    I'm able to reproduce the issue with another project. It is created from scratch on top of the SDK v2.10 gpiointerrupt nortos ccs example. You will find it in the attachment.

    Interesting observations:

    • removing the for loop (with very_simple_process()) : it works !
    • with NUM_ITER = 1, it works !
    • with NUM_ITER = 5, it DOESN'T work !
    • with NUM_ITER = 10, it DOESN'T work !
    • with NUM_ITER = 20, it DOESN'T work !
    • with NUM_ITER = 50, it works !

    • with NUM_ITER = 20, and by activating the extra debug messages in "wifi_interface.c" (#define wifi_interface_debug_extra), it also works...

    By "doesn't work", I mean that after ~1 min I get this error message, "[ERROR] - FATAL ERROR: Async event timeout detected [event opcode =0x8]", and sl_Start returns an error (-2005, SL_API_ABORTED).

    I really hope that you will be able to reproduce! Thanks!

    Vincent V.

    SDK_2.10_gpiointerrupt_CC3220SF_LAUNCHXL_nortos_ccs.zip

  • Hey Benjamin !
    Were you able to reproduce with the simple project I provided?
    Thanks!
  • Dear Vincent -
    The CC3220 SDK requires CCS v7.4 or above. See the dependencies in the SDK release notes, here:
    file:///C:/ti/simplelink_cc32xx_sdk_1_60_00_04/release_notes_simplelink_cc32xx_sdk_1_60_00_04.html
    I noticed you mentioned that you were using CCS v7.3 with SDK v1.60.

    Further, SDK v2.10 was released on schedule at the end of March (we have done quarterly updates since Q1 of 2017):
    www.ti.com/.../SIMPLELINK-CC3220-SDK. It requires CCS v8.0, which you can download from here: processors.wiki.ti.com/.../Download_CCS

    You can go to www.ti.com/.../SIMPLELINK-CC3220-SDK and choose the ALERT ME button, which will trigger an email to you each quarter when the SDK is updated. The release notes always list the dependencies, to help you stay aligned and up to date.
  • Dear Josh,

    Thanks for your reply.

    The test program that I uploaded reproduces the issue with SDK 2.10 and CCS v8.

    The program is attached again.

    I really hope that you can reproduce! See my previous post on how different values of NUM_ITER change the behaviour.

    Thank you !

    Vincent V.

    1351.SDK_2.10_gpiointerrupt_CC3220SF_LAUNCHXL_nortos_ccs.zip

  • Hi Vincent,

    I'm currently out of the office, but I'll be back in tomorrow and can test this out then. 

    Best Regards,

    Ben M

  • Great! Looking forward to hearing from you.
  • Hi Vincent!

    Yes, I was able to run the simple example you shared and reproduce the issue. I did not get exactly the same results as your tests with different iteration counts, but since I was able to reproduce it, I am investigating now.


    Thanks!

    Ben M

  • Hey Ben !
    How is it going ?
  • Hi Vincent,

    This is still being investigated. No root cause has been identified.

    Best,

    Ben M

  • Hi Vincent,

    The issue seems to be caused by these statements in the code:

    Status = sl_Stop(0);
    Status = sl_Start(0, 0, 0);

    I expect that by calling sl_Stop(0) and sl_Start(0,0,0) back to back, you are not giving the network processor enough time to enter hibernate before attempting to wake it back up. The result is that the wake-up signal from the sl_Start(0,0,0) call is missed while the device is shutting down, which causes the host driver to generate the fatal error for the host. There is a minimum hibernate time of 10 ms for the device, which I believe this issue is related to.

    Enabling the extra debug print statements gives the network processor enough time to shut down, and so the test passes.

    You can fix the issue by increasing the value of the timeout parameter passed to sl_Stop(). I tested with sl_Stop(1) and it seems to be fine, though I would typically recommend something like 100 or 200 unless the shorter time is absolutely critical to the application.
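A minimal sketch of a non-blocking restart step that follows the suggestion above, assuming the SimpleLink calls are reached through function pointers so that it builds without the SDK. On target, stop() would wrap sl_Stop(timeout) and start() would wrap sl_Start(0, 0, 0). The type and function names (nwp_ops_t, nwp_restart_t, nwp_restart_step, fake_*) are hypothetical. The extra wait before start() is an additional precaution based on the 10 ms minimum hibernate time mentioned above; it may not be needed once a nonzero stop timeout is used.

```c
#include <stdint.h>

/* Hypothetical hooks: stop() ~ sl_Stop(timeout), start() ~ sl_Start(0, 0, 0),
 * now_ms() is any monotonic millisecond clock. Negative return = failure. */
typedef struct {
    int16_t  (*stop)(uint16_t timeout_ms);
    int16_t  (*start)(void);
    uint32_t (*now_ms)(void);
} nwp_ops_t;

typedef enum {
    RESTART_IDLE, RESTART_WAIT, RESTART_DONE, RESTART_FAILED
} restart_state_t;

typedef struct {
    restart_state_t state;
    uint32_t stopped_at_ms;
} nwp_restart_t;

#define NWP_STOP_TIMEOUT_MS  200u /* "something like 100 or 200" per the thread */
#define NWP_MIN_HIBERNATE_MS  10u /* minimum hibernate time quoted in the thread */

/* Advance the restart by one step; call once per main-loop pass. */
static restart_state_t nwp_restart_step(nwp_restart_t *r, const nwp_ops_t *ops)
{
    switch (r->state) {
    case RESTART_IDLE:
        if (ops->stop(NWP_STOP_TIMEOUT_MS) < 0) {
            r->state = RESTART_FAILED;
        } else {
            r->stopped_at_ms = ops->now_ms();
            r->state = RESTART_WAIT;
        }
        break;
    case RESTART_WAIT:
        /* Unsigned subtraction stays correct across clock wrap-around. */
        if ((uint32_t)(ops->now_ms() - r->stopped_at_ms) >= NWP_MIN_HIBERNATE_MS) {
            r->state = (ops->start() < 0) ? RESTART_FAILED : RESTART_DONE;
        }
        break;
    default: /* DONE and FAILED are terminal */
        break;
    }
    return r->state;
}

/* Fake hooks for trying the sketch on a PC. */
static uint32_t fake_clock_ms;
static int      fake_start_calls;
static int16_t  fake_stop(uint16_t timeout_ms) { (void)timeout_ms; return 0; }
static int16_t  fake_start(void) { ++fake_start_calls; return 0; }
static uint32_t fake_now(void) { return fake_clock_ms; }
```

Because the step function never blocks, the main loop can keep calling sl_Task() and the other processes while the restart is pending.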


    Best Regards,

    Ben M

  • Hey Ben,
    Alright. Thanks for the update !
  • Hey Ben,
    I have tested with a delay of 10ms and everything is working fine!
    This is great news! Thank you very much.

    I would suggest mentioning this in the documentation, though. At the moment, every example uses a timeout value of 0 when it needs to restart the device.

    Thanks for your time !