CC3100 SDK v1.1.0 RTOS bug (disable context switch during MsgWrite and HdrRead)

Other Parts Discussed in Thread: CC3100, CC3200

Hi,


I have downloaded the SDK v1.1.0 for the CC3100 and ported it to an STM32F103 MCU using FreeRTOS.

To get this running, I had to disable task switching during _SlDrvMsgWrite and _SlDrvRxHdrRead using taskENTER_CRITICAL and taskEXIT_CRITICAL.

My changes to driver.c:

_SlReturnVal_t _SlDrvMsgWrite(_SlCmdCtrl_t *pCmdCtrl, _SlCmdExt_t *pCmdExt, _u8 *pTxRxDescBuff)
{
    ....
    taskENTER_CRITICAL();
#ifdef SL_START_WRITE_STAT
    sl_IfStartWriteSequence(g_pCB->FD);
#endif
    ...
#ifdef SL_START_WRITE_STAT
    sl_IfEndWriteSequence(g_pCB->FD);
#endif
    taskEXIT_CRITICAL();
    return SL_OS_RET_CODE_OK;
}

_SlReturnVal_t _SlDrvRxHdrRead(_u8 *pBuf, _u8 *pAlignSize)
{
    _u32       SyncCnt  = 0;
    _u8        ShiftIdx;
    taskENTER_CRITICAL();

#ifndef SL_IF_TYPE_UART
    /*  1. Write CNYS pattern to NWP when working in SPI mode only  */
    NWP_IF_WRITE_CHECK(g_pCB->FD, (_u8 *)&g_H2NCnysPattern.Short, SYNC_PATTERN_LEN);
#endif

    /*  2. Read 4 bytes (protocol aligned) */
    NWP_IF_READ_CHECK(g_pCB->FD, &pBuf[0], 4);
    _SL_DBG_SYNC_LOG(SyncCnt,pBuf);

    /* Wait for SYNC_PATTERN_LEN from the device */
    while ( ! N2H_SYNC_PATTERN_MATCH(pBuf, g_pCB->TxSeqNum) )
    {
        /*  3. Debug limit of scan */
        //VERIFY_PROTOCOL(SyncCnt < SL_SYNC_SCAN_THRESHOLD);
        if( SyncCnt > SL_SYNC_SCAN_THRESHOLD ) {
            Uart_Puts("CC3100: Lost sync!\n");
            taskEXIT_CRITICAL();
            while(1) {
            }
        }
        ...
    }
    taskEXIT_CRITICAL();
    ...
}

Is this a known bug?

  • Hi Peter,

    We are not aware of this. Can you please explain in detail why you are modifying the host driver with this change?

    The same host driver can be used with CC3200 using FreeRTOS and also, on CC3100 - MSP430F5520LP we have the same driver being used with FreeRTOS (Please refer to MQTT client example, which uses FreeRTOS).

    Can you please take them as reference and use the host driver as it is?

    Regards,
    Raghavendra
  • The workaround for _SlDrvRxHdrRead (no context switch during sending sync header and reading data) is based on bug MCS00131560 "Host (SPI): Sync loss may occur after the Host is sending CNYS and wait for a long period of time before reading the response", which is described in the "Servicepack v1.0.0.1.2 Release Notes".

    Please note: I applied the latest Servicepack 1.0.0.10.0 (i.e. uploaded the firmware to the CC3100) and I'm using the latest CC3100 SDK 1.1.0 (i.e. using all .c and .h files in the directories "cc3100-sdk/simplelink" and "cc3100/sdk/oslib").

    However, it seems that the latest CC3100 firmware and the latest CC3100 SDK still have the same problem: sync is lost during _SlDrvRxHdrRead if the host sends the sync header and then waits too long before reading the response. I found that _SlDrvMsgWrite seems to have the same problem: sync is lost if the write operation is interrupted for too long a period.

    My guess is that there is a critical phase during _SlDrvRxHdrRead and _SlDrvMsgWrite in which the SPI communication must not be interrupted. Otherwise, the sync is lost. To prevent this "interrupt" in an RTOS, I disabled context switching before the critical phase and re-enabled it after the critical phase.

    I also guess that this is only relevant for the CC3100 using SPI or UART and a fast host MCU (>50 MHz).

    Does this description help?

    Do you know of a different workaround? Disabling context switching in an RTOS is not advisable.

  • Hi Peter,

    Thanks for the clear explanation of why you added this workaround and of the issue you were referring to. We will check this internally and get back to you.

    Regards,
    Raghavendra
  • Hi Peter,

    The described bug relates to a use case where the device goes into low-power deep sleep, not to one where the device is running, so I guess this is not your case.

    Anyway, your workaround implies that there is another task that causes this to occur.

    Can you describe the tasks and their respective priorities in your system? Can you profile what happens with the tasks on a time scale when this occurs?

    Please note that once the driver task executes either of the functions you mentioned, it grabs the global lock (GlobalLockObj), and any other task that tries to communicate with the device is blocked because the global lock is taken. As long as the other task is blocked, there are no transactions on the SPI lines.

    You may be describing a "priority inversion" occurrence. Can you please check?

    Shlomi

  • Hi Shlomi,

    thanks for the answer.

    Some more description of my scenario: the CC3100 is connected to an SPI port on which it is the only device. Only one task accesses the CC3100, so I started with SL_PLATFORM_MULTI_THREADED not defined and later enabled it (still with just one task accessing the CC3100).

    Therefore, I do not think that this is a case of "priority inversion". In addition, FreeRTOS is using priority inheritance for mutexes.

    I played around with my scenario:

    I disabled FreeRTOS completely and ran my application without any OS or scheduler.

    I added a random delay to my implementations of sl_IfWrite (CC3100_SpiWrite) and sl_IfRead (CC3100_SpiRead):

    int CC3100_SpiWrite(Fd_t fd, unsigned char *pBuff, int len)
    {
        int i;
        CS_LOW;
        for(i=0;i<len;i++){
            DelayUS(rand()%1000);
            while (SPI_I2S_GetFlagStatus(SPI2, SPI_I2S_FLAG_TXE) == RESET);
            SPI_I2S_SendData(SPI2,pBuff[i]);
            while (SPI_I2S_GetFlagStatus(SPI2, SPI_I2S_FLAG_RXNE) == RESET);
            SPI_I2S_ReceiveData(SPI2);
        }
        CS_HIGH;
        return len;
    }


    int CC3100_SpiRead(Fd_t fd, unsigned char *pBuff, int len)
    {
        int i;
        CS_LOW;
        for(i=0;i<len;i++){
            DelayUS(rand()%1000);
            while (SPI_I2S_GetFlagStatus(SPI2, SPI_I2S_FLAG_TXE) == RESET);
            SPI_I2S_SendData(SPI2,0xff);
            while (SPI_I2S_GetFlagStatus(SPI2, SPI_I2S_FLAG_RXNE) == RESET);
            pBuff[i]=SPI_I2S_ReceiveData(SPI2);
        }
        CS_HIGH;
        return len;
    }


    With this, the MCU loses sync with the CC3100 quite soon.

    Please note: I'm using the latest version of the CC3100 firmware and the CC3100 SDK.

    Can you reproduce the bug with this description?

    Best regards,

    Peter

  • Hi Peter,

    I'm not sure that this is the same scenario as in the OS case so let's concentrate on the non-OS first.

    Can you indicate when the system gets stuck? Is it right after initialization completes, or does it take some time? What use case are you running (e.g., opening a socket and transmitting data, just connecting to an AP, etc.)?

    I would like you to verify that you start clean with no profiles stored on the serial flash. In addition, please work in active mode. This is very important, as I would like to rule out any power management issues. To do this, you need to call the following API (as it appears in wlan.h):

    For setting always on power management policy use: sl_WlanPolicySet(SL_POLICY_PM, SL_ALWAYS_ON_POLICY, NULL, 0)

    I will be able to test it next week (as it is holiday time in here till next week).

    Shlomi

  • Peter,

    Just a quick update from my side.

    I was able to quickly build a setup with the MSP430F5529LP and CC3100, using the HTTP server example code from the SDK, and modified spi_read() and spi_write() to add a 1 ms delay (I do not have a rand API, so I used a fixed 1 ms delay). I could work with the device as usual and have not observed the device getting stuck or losing sync.

    Shlomi

  • Hi Shlomi,

    I have done a similar step: I set up the MSP430F5529LP and CC3100 using the "Getting started with wlan stations" example, then added part of my application and the random delay in SPI read/write (without DMA).

    And yes, I get the same problem: sync lost to CC3100 just after one second!

    My application: an HTTP server using a "non-blocking" socket. If I switch to a "blocking" socket, the problem seems to vanish and the sync does not get lost. However, I have not analyzed this enough.

    Note: My application uses its own HTTP server on the host MCU (STM32F103, ARM Cortex-M3 @ 72 MHz), because some kind of CGI scripts are required, e.g., to create SVG images based on files on an SD card. This is not feasible with the HTTP server built into the CC3100 (which is fine for the CC3100, and of course, I could switch to the CC3200).

    Anyway, how can I send you my MSP430F5529LP/CC3100 CCS project so that you can reproduce the scenario?

  • Brandon,

    I believe you can attach it in this post. Is there an issue doing so?

    Shlomi

  • I meant Peter, sorry...
  • I can't find a dedicated "upload" button or link.

    Therefore I will try to drag and drop a zip-file into the editor field:

    example.zip

  • Hi Peter,

    Yes, I got your code, thanks.

    I tried to use it as is and got an error on listen(). It happens because port 80 is occupied by the internal web server. Are you disabling the internal HTTP server? Your code does not check the return value from WebServer_Start(), so you would get errors when trying to accept() on a socket that has not passed the listen() phase. Please clarify.

    I still need to test the data transaction, but with this setup and non-blocking mode I do not lose sync with the device. The host keeps on calling accept() endlessly and does not lose sync with the CC3100.

    Will also update tomorrow.

    I would appreciate it if you could clarify the above, and maybe try with another TCP port and report back.

    Shlomi

  • Peter,

    An update on my side. I executed the same scenario you are running, i.e. connect->receive->send->close in a loop. I did that 1000 times over a few minutes and I cannot reproduce the problem.

    Note that I do not use your project files. I implemented my own, but the scenario is the same.

    Can you tell me where in the process you are failing? What is the call stack? Are you able to do any send/receive? Can you share a logic capture of the SPI lines?

    regards,

    Shlomi

  • Shlomi,


    When the CC3100 comes out of the box, its HTTP server is not enabled, so the HTTP server on the MSP430 can use port 80 and runs without any problems.

    I've updated my program and here is the zip file:

    getting_started_with_wlan_station.zip

    Please use exactly this program and do not try to reproduce the scenario by playing around!

    The zip file even contains the .out file: flash this file to your MSP430F5529LP and you will receive something like the following on the backchannel UART (115200 baud):

     Getting started with station application - Version 1.2.0
    *******************************************************************************
    Version: CHIP 67108864, MAC 31.1.3.0.1, PHY 1.0.3.34, NWP 2.4.0.2, ROM 13107, HOST 1.0.0.10
     Device is configured in default state
     Device started as STATION
    Connect to x:x
    Connected with IP 192.168.1.119, DHCP: 1
    Stop CC3100 HTTP server
     Connection established w/ AP and IP is acquired
    Start MSP430 HTTP server on port 80
    Error: CC3100: Sync lost!


    Some descriptions of the program:

    With lines 48 and 49 of spi.c, you can enable or disable the random delay in spi_Write/Read.
    With line 9 of webserver.c, you can enable or disable the nonblocking mode.
    With line 59 of main.c, you can enable or disable the HTTP server of the CC3100 (along with its port).

    The sync to the CC3100 is lost with the following settings: random delay in SPI write/read enabled and nonblocking mode enabled.
    Enabling or disabling the HTTP server of the CC3100 has no effect.

    Best regards,

    Peter

  • Sure Peter,

    Will test and get back to you.

    Shlomi

  • Hi Peter,

    With your workspace I am able to reproduce.

    I will explore more in coming days.

    BTW, if you work in ACTIVE mode (i.e. the device does not go into low power mode), the issue disappears.

    You can try yourself by adding the line:

    retVal = sl_WlanPolicySet(SL_POLICY_PM , SL_ALWAYS_ON_POLICY, NULL,0);

    Shlomi

  • Hi Peter,

    Just to update.

    We are still working on profiling this issue.

    Haven't forgotten about it :)

    Shlomi

  • Hi Peter,

    As mentioned, this bug has been reported internally. Once it is resolved, the fix will be integrated and released as part of the service pack.

    It involves losing some bytes on the SPI FIFO in particular cases when the device moves to low-power deep sleep mode.

    Meanwhile, I can suggest two ways to overcome it:

    1. Work in always-on mode, as I suggested before.
    2. Since the device expects to receive 32 bits at a time on the SPI, introducing a random delay between individual bytes can cause the device to go into deep sleep in the middle of a 32-bit word. What you can do is force consecutive 4-byte transactions. I tested it myself and it works. The code from spi.c is:

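    int spi_Write(Fd_t fd, unsigned char *pBuff, int len)
    {
        int len_to_return = len;
        int count = 0;

        ASSERT_CS();

        while (len)
        {
            /* Allow a delay only on a 4-byte boundary, so that each
               32-bit word is clocked out without interruption. */
            if ((count % 4) == 0)
            {
                SPI_DELAY();
            }
            while (!(UCB0IFG & UCTXIFG));   /* wait until the TX buffer is empty */
            UCB0TXBUF = *pBuff;
            while (!(UCB0IFG & UCRXIFG));   /* wait for the byte clocked back */
            UCB0RXBUF;                      /* dummy read clears the RX flag */
            len--;
            pBuff++;
            count++;
        }

        DEASSERT_CS();          /* not in the posted excerpt; assumed counterpart of ASSERT_CS() */
        return len_to_return;   /* not in the posted excerpt; the function is declared int */
    }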

    Same for spi_read().
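
    For illustration, the read side could follow the same pattern. The following is only a minimal, hardware-independent sketch: spi_read_words, spi_xfer_fn, spi_pause_fn and the fake_* helpers are hypothetical names (they are not part of the SDK), chip-select handling is left to the caller, and the 0xFF filler byte mirrors the CC3100_SpiRead code posted earlier in this thread. It has not been run against a CC3100.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical platform hooks (not SDK names): exchange one byte on the bus,
   and an optional pause that is only ever taken on a 4-byte boundary. */
typedef unsigned char (*spi_xfer_fn)(unsigned char tx);
typedef void (*spi_pause_fn)(void);

/* Host-side stand-ins so the logic can be tried on a PC. */
static int g_pause_calls;
static unsigned char g_next_rx;

static unsigned char fake_xfer(unsigned char tx)
{
    (void)tx;                 /* the filler byte is ignored by the fake bus */
    return g_next_rx++;       /* hand back an incrementing test pattern */
}

static void fake_pause(void)
{
    g_pause_calls++;          /* count how often a pause was taken */
}

/* Read len bytes while clocking out 0xFF as filler. A pause is only allowed
   when count is a multiple of 4, so the bytes of each 32-bit word are
   requested back to back. Chip select is assumed to be handled by the caller. */
int spi_read_words(unsigned char *pBuff, int len,
                   spi_xfer_fn xfer, spi_pause_fn pause)
{
    int count;

    assert(pBuff != NULL && len >= 0 && xfer != NULL);

    for (count = 0; count < len; count++) {
        if ((count % 4) == 0 && pause != NULL) {
            pause();          /* delay permitted only between words */
        }
        pBuff[count] = xfer(0xFF);
    }
    return len;
}
```

    On a target, xfer would wrap the wait-for-flag/transfer sequence from spi_Write above and pause would be SPI_DELAY() or nothing at all; calling spi_read_words(buf, 10, fake_xfer, fake_pause) on a PC takes the pause at counts 0, 4 and 8 only.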

    Regards,

    Shlomi

  • Hi Peter,

    I am closing the thread for now.

    Once the issue is resolved on the device, a new ServicePack including this patch will be released.

    For now, you can use the suggested workarounds.

    For any additional queries related to this one, please open a new thread and add a link to this one for reference.

    Regards,

    Shlomi

  • Hi Shlomi,

    Sorry for the late reply.

    The suggested workaround using "always_on" also fixes the problem in my original scenario with the RTOS and nonblocking server sockets.

    So this is not just a "hypothetical" problem, but a real issue in my opinion.

    Best regards,

    Peter

  • Peter,

    I do agree it is a real issue, which is why R&D started looking at it.

    What about the other suggestion, where you need to ensure 4-byte transactions to the device? It is a much better workaround than always-on (unless you do not have any power constraints).

    Shlomi

  • Hi Shlomi,

    My hardware SPI module can only send 2 bytes at most in one transaction.

    To get the 4-bytes transactions without interrupts (and arbitrary delays), the software must disable context switching before a 4-bytes transaction and enable context switching afterwards. This will decrease the software performance and is also critical for "real time" performance, where context switching should be possible at ANY time.

    Therefore I prefer the "always_on" workaround, because I do not have power constraints but I do have real-time constraints.
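
    To make the trade-off concrete, here is a minimal sketch of what the 4-byte variant might look like under an RTOS: the lock is taken per 4-byte group rather than around the whole driver call, so the scheduler can still run between words. WORD_LOCK/WORD_UNLOCK, spi_write_locked_words, spi_tx_fn and fake_tx are hypothetical names; on a FreeRTOS target the two macros would map to taskENTER_CRITICAL() and taskEXIT_CRITICAL(), while here they are counting stand-ins so the logic can be checked on a PC. Whether this is enough to avoid the sync loss on real hardware is untested.

```c
#include <assert.h>
#include <stddef.h>

/* Counting stand-ins for the critical-section calls. On a FreeRTOS target
   these would be taskENTER_CRITICAL() and taskEXIT_CRITICAL(). */
static int g_lock_depth;       /* > 0 while context switching is "disabled" */
static int g_groups_locked;    /* number of groups sent under the lock */
#define WORD_LOCK()    do { g_lock_depth++; } while (0)
#define WORD_UNLOCK()  do { g_lock_depth--; g_groups_locked++; } while (0)

/* Hypothetical byte-level transmit hook (not an SDK name). */
typedef void (*spi_tx_fn)(unsigned char tx);

/* Host-side fake: counts bytes and notes any byte sent outside the lock. */
static int g_bytes_sent;
static int g_bytes_unlocked;

static void fake_tx(unsigned char b)
{
    (void)b;
    g_bytes_sent++;
    if (g_lock_depth != 1) {
        g_bytes_unlocked++;
    }
}

/* Send len bytes in groups of up to 4. Each group is sent with the lock
   held; the lock is released between groups, which bounds the time the
   scheduler is held off to a single 4-byte transfer. */
int spi_write_locked_words(const unsigned char *pBuff, int len, spi_tx_fn tx)
{
    int sent = 0;

    assert(pBuff != NULL && len >= 0 && tx != NULL);

    while (sent < len) {
        int chunk = (len - sent >= 4) ? 4 : (len - sent);
        int i;

        WORD_LOCK();                    /* no preemption inside a word */
        for (i = 0; i < chunk; i++) {
            tx(pBuff[sent + i]);
        }
        WORD_UNLOCK();                  /* scheduler may run between words */
        sent += chunk;
    }
    return len;
}
```

    Even in this form every 4-byte group costs one enter/exit pair, which is the performance and real-time cost described above.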

    Best regards,

    Peter
  • Peter,

    It is clear and acceptable.

    Thanks for the update.

    Shlomi

  • Hi Shlomi,

    I'm curious to know whether this issue was solved in the new SDK 1.2.0, since I'm facing similar symptoms with my CC3200 and my options for a solution are limited.

    Yuval