CC3220SF: SDHostSetExpClk() infinite loop using SimpleLink 7.10 (not a problem using SimpleLink 3.40)

Part Number: CC3220SF
Other Parts Discussed in Thread: SYSCONFIG

Hi, we have a fairly mature hardware and firmware platform based on the CC3220SF IC. This hardware and firmware platform has been essentially stable since 2020, running a TI-RTOS-based application developed on SimpleLink CC32XX SDK 3.40. This year I've done the work to bring our firmware up to SimpleLink CC32XX SDK 7.10 using TI-RTOS7 and TIClang. For the most part, everything works great. But I am seeing a sporadic but consistent problem across a diversity of tasks where we make a call to SD_open(CONFIG_SD_0, NULL). We have six read/write/init tasks that call SD_open(...) for various reasons, and any of these tasks can hang when the SD_open(...) call triggers the following infinite loop failure I need help resolving.

Here's the sporadic but commonly repeating pattern that results in an infinite loop in sdhost.c, line 647 and a hard-lock of the device:

<can occur in any of our SDHost-related task functions when SD_open(CONFIG_SD_0, NULL) is called>, e.g.:

72 sdHandle = SD_open(CONFIG_SD_0, NULL);

SDHostCC32XX_open() at SDHostCC32XX.c:550 0x0101D062:

549 /* Save clkPin park state; set to logic '0' during LPDS */
550 pin = PinConfigPin(hwAttrs->clkPin);

initHw() at SDHostCC32XX.c:1,127 0x0101A472:

1119 /*
1120 * Configure the card clock to 400KHz for the card initialization and
1121 * idenitification process. This is done to ensure compatibility with
1122 * older SD cards. Once this is complete, the card clock will be
1123 * reconfigured to the set operating value.
1124 */
1125 MAP_SDHostSetExpClk(hwAttrs->baseAddr, MAP_PRCMPeripheralClockGet(PRCM_SDHOST), SD_ID_FREQ_400KHZ);
1126
1127 MAP_SDHostIntClear(hwAttrs->baseAddr,
1128 SDHOST_INT_CC | SDHOST_INT_TC | DATAERROR | CMDERROR | SDHOST_INT_DMAWR | SDHOST_INT_DMARD |
1129 SDHOST_INT_BRR | SDHOST_INT_BWR);

SDHostSetExpClk() at sdhost.c:647 0x102048D0:

647 while( !(HWREG(ulBase + MMCHS_O_SYSCTL) & 0x2) )
648 {
649
650 }
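
One way to keep a stuck "clock stable" bit from hard-locking the device is to bound the poll and hand an error back to the caller. The following is a minimal host-runnable sketch under stated assumptions, not the SDK's code: `poll_bits_set()` and `max_polls` are hypothetical names, and a plain pointer stands in for `HWREG(ulBase + MMCHS_O_SYSCTL)`. The `0x2` mask is the one tested at sdhost.c:647.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Poll *reg until every bit in mask is set, giving up after max_polls reads.
 * Returns true if the bits were seen set, false on timeout.
 * Hypothetical helper: on the CC3220SF the register would be
 * HWREG(ulBase + MMCHS_O_SYSCTL) and the mask 0x2, as in sdhost.c:647.
 */
bool poll_bits_set(const volatile uint32_t *reg, uint32_t mask,
                   uint32_t max_polls)
{
    assert(reg != NULL);

    for (uint32_t i = 0; i < max_polls; i++) {
        if ((*reg & mask) == mask) {
            return true;    /* clock reported stable */
        }
    }
    return false;           /* caller decides: retry, reset, or fail the open */
}
```

A bounded wait only turns the hang into an error return that the application can log or act on; it does not by itself make the clock settle.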

I've searched E2E for possible resolution of this issue (i.e., site:e2e.ti.com SDHostSetExpClk in Google), but none of the three previously reported issues provided me with useful tips for resolving this problem.

Some additional information related to this problem:

a) we never remove or insert the uSD card;
b) we are not using SDFatFS, just opening the uSD card for binary data reads and writes;
c) the problem occurs whether the NWP is enabled or in shutdown;
d) this problem occurs whether power policy is enabled or disabled;
e) when it happens, I can confirm that the device has continuously been in a power-policy disabled mode, so it's not likely to be related to LPDS or hibernation issues others have reported;
f) someone suggested adding PRCMMCUReset(X), but I don't see how it could be relevant if we're operating with power policy disabled continuously and still experiencing this problem;
g) we are only using Samsung uSD cards, either Samsung 64 GB (MB_ME64GA/AM) or Samsung 128 GB (MB_ME128GA/AM); the problem occurs using either uSD capacity;
h) we are using SysConfig to configure the CC3220SF in SimpleLink 7.10 builds, whereas the CC3220SF was configured using statically-coded files in our SimpleLink 3.40 builds

For reference, SD is configured as follows in SysConfig under SimpleLink 7.10 using identical settings to the configuration we use in our firmware based on SimpleLink 3.40:

CONFIG_SD_0
Name: CONFIG_SD_0
Use FatFS: unchecked
Clock Rate: 8000000
Interface Type: SD Host
SD Implementation: (grayed-out) SDHostCC32XX
Interrupt Priority: 1 - Highest Priority

PinMux:
SD Host Peripheral: SDHost0
CLK Pin: GP16/7
CMD Pin: GP17/8
DATA Pin: GP15/6
DMA TX: Any(UDMA_CH15)
DMA RX: Any(UDMA_CH14)

Other Dependencies:

DMA On-chip DMA resource allocation
DMA Implementation: (grayed out) UDMACC32xx
DMA Error Function: dmaErrorFxn
Interrupt Priority: 1 - Highest Priority

Power Power Driver
Power Implementation: (grayed out) PowerCC32XX
Enable Policy: unchecked
Policy Function: PowerCC32XX_sleepPolicy
Policy Init Function: PowerCC32XX_initPolicy
Enter LPDS Hook Function: <empty>
Resume LPDS Hook Function: <empty>
Enable Network Wakeup LPDS: checked
Keep Debug Active During LPDS: unchecked
Enable GPIO Wakeup LPDS: unchecked
Enable GPIO Wakeup Shutdown: checked
Wakeup GPIO Source Shutdown: GPIO4
Wakeup GPIO Type Shutdown: LOW_LEVEL
RAM Retention Mask LPDS: SRAM_COL_1, SRAM_COL_2 +2
Latency for LPDS: 20000
IO Retention Shutdown: GRP_1

Park Pins Pin Park States (only the relevant ones)
Pin 6: WEAK_PULL_DOWN_STD
Pin 7: WEAK_PULL_DOWN_STD
Pin 8: WEAK_PULL_DOWN_STD

Any clue why this would suddenly be happening under SimpleLink 7.10, when we've had scores of our devices (same PCB revision, same CC3220SF ICs, same Samsung uSD cards) running without issue for years when using our firmware based on SDK 3.40 and TI-RTOS?

This is a show-stopper of a problem for us, as it happens repeatedly, hard-locking the device during a variety of operational modes, most notably during sampling mode when we require perfect fidelity of various sensor data. Watchdog recovery isn't a practical solution, as it would result in an unacceptable user experience for our end users, with sample data frequently interrupted in contexts where users can't access the device (e.g., underwater and wild terrestrial settings) and costs of deployment are very high. Finally, we can't easily go back to using the firmware based on SimpleLink 3.40, as there are numerous other firmware updates we've made which depend on SimpleLink 7.10 SDK (e.g., MQTT and AWS support), all of which work great. Only SDHost under SimpleLink 7.10 is a problem.

Thanks,
Dave

  • I'm not aware of a similar issue or a fix. It won't be easy to help as the driver is so old.

    It is probably related to the SysConfig implementation - please compare the driver initializations in 3.40 and in 7.10.

    Are you calling SD_open in a thread-safe context (e.g., protected by a mutex)?

    Maybe different timing in 7.10 causes some kind of race condition.

    Can you provide a simple project so we can try to reproduce this (on TI launchpad)?

  • Hi Kobi, thanks for the reply. I've attached a zip project of sdraw_CC3220SF_tirtos7_ticlang, with a for-loop that repeats the read/write/erase operation a thousand times. I experience the same problem described above when using this SL7.10-based sdraw project with the added for-loop.

    As an FYI, our migration from SL3.40 to SL7.10 basically involved me moving our SL3.40-based source code (originally based on out_of_box_CC3220SF for SL3.40) to a hollowed-out version of mqtt_client_over_tls_1_3_CC3220SF_LAUNCHXL_tirtos7_ticlang. But since the sdraw_CC3220SF_tirtos7_ticlang project exhibits the same problem described above, this suggests a broader SL7.10 issue rather than something specific to our application source code.

    For reference, all tasks with SD_open(...) SD_close(...) logic in our source code always happen in a thread-safe context, typically using a Semaphore_pend(sdActionSemHandle, BIOS_WAIT_FOREVER) and in some cases a Mailbox_pend(mbxHandle, BIOS_WAIT_FOREVER) to bring in data. When other threads ask for SD-related actions, they typically have a Semaphore_pend(sdActionCompleteSemHandle, BWF) that halts the thread's action until the SD-related task signals successful completion with a semaphore post. The problem we're experiencing using SL7.10 occurs whether or not we use that SD-complete sem post logic. And since this happens with the attached sdraw project, it seems doubtful that this is a mutex / thread-safety issue.

    I have compared the SysConfig SD config with our SL3.40 config, and brought those into perfect alignment. The only difference was interrupt priority, which now matches the SL3.40 config exactly. I also set all SysConfig interrupt priorities to lowest, as they were in our SL3.40-based app, but this didn't resolve the issue.

    Any other thoughts you might have are most welcome. I'm worried that there may be some underlying hardware issue that's being exacerbated by something in SL7.10 that hadn't been a problem in SL3.40, but I can't see what it might be.

    Thanks,
    Dave

    sdraw_CC3220SF_LAUNCHXL_tirtos7_ticlang.zip

  • I'll try to reproduce with your code. I'll let you know if I can reproduce it and whether we have a fix.

  • Hi Kobi,

    Any updates from your side?

    On our end, we’ve tried a ton of new things to test possible solutions and rule out possible problems:

    1. We’ve installed other variants of the TI-driver example sdraw app, including:

    a. sdraw_CC3220SF_LAUNCHXL_tirtos7_ticlang, SL7.10 (attached above; to rule out our application) = same problem as reported above

    b. sdraw_CC3220SF_LAUNCHXL_nortos_ticlang, SL7.10 (TIRTOS7 rule-out) = same problem as reported above

    c. sdraw_CC3220SF_LAUNCHXL_tirtos_ccs, SL6.10.00.05 (SL7.10 and TIRTOS7 rule-out) = same problem as reported above

    2. We’ve tried multiple PCB revisions of our hardware, including:

    a. CC3220SF LaunchPad wired into a micro-SD breakout board (the prototype for PCB rev. 1.0 device) running 1(a) = same problem as reported above

    b. PCB rev. 1.0 boards running 1(a) above = same problem as reported above

    c. PCB rev. 2.0 boards running 1(a) above = same problem as reported above

    d. PCB rev. 2.2 boards running 1(a) above = same problem as reported above

    e. A new acoustic-recording PCB revision using the CC3220SF running 1(a) above = same problem as reported above


    3. We've tried this using battery-powered devices as well as bench-top powered devices = same problem as reported above

    4. For extra rule-outs, I also ran 1(a) using CCS on a MacBook Pro, (rule out the Dell XPS9530 used in all tests above) = same problem as reported above

    5. I haven’t ruled out the possibility that the 32GB, 64GB, and 128GB Samsung uSD cards we’re using in the above tests have some sensitivity to SL6.10+

    6. We continue to have no problems when running our original application, which uses TI-RTOS and SimpleLink SDK 3.40 for CC32XX.

    7. I’ve added and tested the effects of adding some usleep delays before and in the middle of the problematic sdhost.c code section, but this hasn’t solved the problem. My next step is to modify that sdhost.c code section described above and use two nested loops to introduce a smarter backoff with clock line reset if there are 10 consecutive failures of the “waiting for clock to settle” loop. 

    If you have any other and/or better ideas, I’d be grateful for your input.

    Thanks,

    Dave

  • Sorry, I didn't get to this yet. Maybe next week.

    What prevents you from using SDK 3.40?

    You could update only the SP and host driver from 7.10.

  • Our new cloud-centric product uses SDK 7.10's network, OTA, MQTT, and HTTP server implementations. The SDK 7.10 cloud-centric network functionality is modeled from mqtt_client_TLS_1_3_tirtos7_ticlang for SDK 7.10. All of this network-related stuff works great, and the stuff I added for AWS support works too. Once I had all that working, I circled back to how I handled time-based indexing of binary data written to the SD card, so we could make cloud-based requests for data from any device. Once I started stress-testing one-second SD writes of sensor data and bulk reads of recorded data for streaming to AWS, this SD_open bug revealed itself to be a problem. Of all the things I expected to encounter moving from SDK 3.40 to SDK 7.10, the loss of SD host functionality wasn't one of them. Trying to back-integrate all this cloud-centric and network functionality into SDK 3.40 seems like a bad idea.

    All hardware devices under test have been flashed with SDK 7.10's current SP and are using the SDK 7.10 host driver. The sdraw SDK 6.10 test I did was a one-off on a separate board; I Uniflashed that one-off board with SL6.10's SP before testing its code.

  • I think it may be easier (at least for me to help) porting the example back to 3.40.

  • Hi Kobi,

    Attached is a .zip file of the SimpleLink CC3220SF SDK 3.40 sdraw project for TI-RTOS and CCS. I've added the loop to perform 1,000 non-delayed iterations of the write/read/compare/erase actions, exactly as it is in the sdraw example for SDK 7.10 TIRTOS7 ticlang I attached above.
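
    The iteration loop described above can be sketched as a small harness that counts how many open/exercise/close cycles complete before the first failure. This is a host-runnable illustration, not the code in the attached project: `sd_stress_ops`, `sd_stress_run()`, and the `fake_*` helpers are hypothetical names. On target, the hooks would wrap SD_open(...), the write/read/compare/erase steps, and SD_close(...).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks so the loop can run on a host or wrap the real driver.
 * Each hook returns 0 on success and nonzero on failure. */
typedef struct {
    int (*open_card)(void *ctx);    /* would wrap SD_open(...) on target */
    int (*exercise)(void *ctx);     /* write/read/compare/erase steps */
    int (*close_card)(void *ctx);   /* would wrap SD_close(...) on target */
    void *ctx;
} sd_stress_ops;

/*
 * Run up to `iterations` open/exercise/close cycles.
 * Returns the number of cycles that completed before the first failure
 * (equal to `iterations` if none failed).
 */
unsigned sd_stress_run(const sd_stress_ops *ops, unsigned iterations)
{
    assert(ops != NULL && ops->open_card && ops->exercise && ops->close_card);

    unsigned done = 0;
    while (done < iterations) {
        if (ops->open_card(ops->ctx) != 0) {
            break;
        }
        int failed = (ops->exercise(ops->ctx) != 0);
        /* Always close what was opened, even if the exercise step failed. */
        if (ops->close_card(ops->ctx) != 0) {
            failed = 1;
        }
        if (failed) {
            break;
        }
        done++;
    }
    return done;
}

/* Host-side fake: open fails on call number fail_on_call (1-based, 0 = never). */
typedef struct {
    unsigned open_calls;
    unsigned fail_on_call;
} fake_card;

int fake_open(void *ctx)
{
    fake_card *c = ctx;
    c->open_calls++;
    return (c->fail_on_call != 0 && c->open_calls == c->fail_on_call) ? -1 : 0;
}

int fake_ok(void *ctx)
{
    (void)ctx;
    return 0;
}
```

    Because the failure reported in this thread is a hang inside SD_open(...) rather than an error return, on target the useful number is the last iteration count printed before the lock-up; the return value only matters for failures that actually return.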

    The attached project for SDK 3.40 always runs to completion, repeatedly completing all 1,000 iterations without any kind of usleep/sleep/UtilsDelay required. This is consistent with the behavior observed in our older application using SDK 3.40 and running on a range of different PCB revisions. We've successfully streamed hours of data off these devices over WiFi, and I'm confident the SDK 3.40 version of SDHOST in the context of TI-RTOS is solid.

    Here's the final few lines of output from the attached project as it successfully completes all 1,000 iterations:

    -- 


    SD_close complete.

    SD_open complete on loop 1000.

    There are 125042688 total sectors on the SD card.

    The Read/Write sector size is 512 bytes

    The total card capacity is 62521344 KB

    Writing the array...

    Reading the array...

    Data read from SD card matched expected values

    SD_close complete.

    --

    For this SDK 3.40 project, you'll likely need to update the dependency on your end to whatever you use for tirtos_builds_CC3220SF_LAUNCHXL_release_ccs project for SimpleLink CC32XX SDK 3.40. If you don't have a version of this handy, let me know and I can upload a .zip of my build here.

    With the SL7.10-based example I attached above, I can never complete more than 60-70 iterations of the loop before SD_open(...) is called and sdhost.c gets stuck in the infinite loop at SDHostSetExpClk() at sdhost.c:647 ("waiting for clocks to settle" infinite loop condition).

    Interesting that in the SL7.10 example, the problem is less bad if I insert a sleep(1) after the SD_close in the iteration loop. However, cutting that in half with a usleep(500000) results in the usual infinite loop failure described above within 60-70 iterations. Not sure why sleep(1) is more favorable than usleep(500000), but maybe there's some race condition happening under the hood and one second is enough time to avoid the race condition? 

    A further update on the SDK 7.10 version of the sdraw project: I've also tried modifying code in the SDHostCC32XX.c and sdhost.c files to create a series of retry loops, e.g., giving the waiting-for-clocks-to-settle loop 10 retry attempts, nesting that within a 10-iteration loop that resets the clock before heading into the clock-settle loop, and nesting all of that within a loop that resets PRCM SDHOST if all those retries have failed. That's 300 bites at the apple. The bad news is none of this fixes the infinite loop in sdhost.c:647 in SDK 7.10.
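
    For readers who want to try the same experiment, the nested retry structure described above might look like the following host-runnable sketch. All names here (`sd_recover_ops`, `sd_clock_recover()`, the fakes) are hypothetical, and the outer count of 3 used in the example is only inferred from the "300" figure (10 x 10 x 3); the actual modified driver code is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks; on target these would wrap one bounded clock-settle
 * poll, the card-clock reset, and the PRCM-level SDHost reset. */
typedef struct {
    bool (*clock_settled)(void *ctx);
    void (*reset_clock)(void *ctx);
    void (*reset_peripheral)(void *ctx);
    void *ctx;
} sd_recover_ops;

/*
 * Three nested retry levels: peripheral resets > clock resets > polls.
 * The peripheral is reset only after a full inner round has failed.
 * Returns true as soon as any poll reports a settled clock; *attempts
 * (if non-NULL) receives the number of polls made.
 */
bool sd_clock_recover(const sd_recover_ops *ops, unsigned periph_rounds,
                      unsigned clock_resets, unsigned polls,
                      unsigned *attempts)
{
    assert(ops != NULL);

    unsigned n = 0;
    bool ok = false;

    for (unsigned p = 0; p < periph_rounds && !ok; p++) {
        if (p > 0) {
            ops->reset_peripheral(ops->ctx);
        }
        for (unsigned c = 0; c < clock_resets && !ok; c++) {
            ops->reset_clock(ops->ctx);
            for (unsigned r = 0; r < polls && !ok; r++) {
                n++;
                ok = ops->clock_settled(ops->ctx);
            }
        }
    }
    if (attempts != NULL) {
        *attempts = n;
    }
    return ok;
}

/* Host-side fakes for trying the ladder without hardware. */
bool never_settles(void *ctx) { (void)ctx; return false; }

/* ctx points to a countdown (must start at 1 or more). */
bool settles_on_nth(void *ctx)
{
    unsigned *left = ctx;
    return --(*left) == 0;
}

void no_op(void *ctx) { (void)ctx; }
```

    With `never_settles`, a 3 x 10 x 10 ladder makes all 300 attempts and still reports failure, which mirrors the outcome described above: retrying does not help if the condition that keeps the clock from settling is re-created on every pass.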

    Thanks,
    Dave

    sdraw_CC3220SF_SDK3.40_tirtos_ccs.zip

  • ok. I'll give it a try.

    Note that even if I am able to reproduce the issue, I'll probably report it to our driver team (since it doesn't seem like a straightforward fix).

    I wouldn't expect a quick response from them.  

  • OK, thanks for the update. I have a variety of other SD cards arriving tomorrow and will test to see if this is some strange interaction with Samsung SD cards vs. cards by other manufacturers. I'll post results of those tests, whatever the outcome.

  • Hi Kobi,

    Today I ran tests to see whether the SD_open infinite loop in sdhost.c happens with SD cards from different manufacturers. I ran three tests with CC32XX power policy disabled and complete power-downs between tests. Each test attempted 1,000 iterations of the actions from SD_open(...) to SD_close(...) in the sdraw_CC3220SF_LAUNCHXL_nortos_ticlang example for CC32XX SDK 7.10. I used the nortos version to try ruling out RTOS-related scheduling issues and/or race conditions in SDK 7.10. Using the sdraw nortos project for SDK 7.10, all SD cards from all manufacturers resulted in the sdhost.c infinite loop as described above, with failures happening after ~5-30 iterations across all tested SD cards. This is consistent with results observed in the Samsung SD card tests I ran using sdraw_CC3220SF_LAUNCHXL_tirtos7_ticlang posted above.

    In contrast, I observed three consecutive 100% success rates for every card tested when running 1,000 iterations of the SD_open(...) to SD_close(...) actions in sdraw_CC3220SF_LAUNCHXL_tirtos_ccs example for CC32XX SDK 3.40 (the project I attached to this thread yesterday).

    Clearly SDK 3.40 works, and SDK 7.10 does not.

    Here are results across different manufacturers' SD cards with the sdraw examples from SDK 7.10 vs SDK 3.40:

    Sandisk Ultra 128 GB MicroSD XC U1 A1 10:
    3x failures between ~16-29 iterations on sdraw noRTOS SDK 7.10
    3x success with 1,000 iterations completed on sdraw TIRTOS SDK 3.40

    Sandisk Ultra 64 GB MicroSD XC V30 I U3 A2:
    3x failures between ~19-27 iterations on sdraw noRTOS SDK 7.10
    3x success with 1,000 iterations completed on sdraw TIRTOS SDK 3.40

    Sandisk Ultra 128 GB MicroSD XC V30 I U3 A2:
    3x failures between ~14-31 iterations on sdraw noRTOS SDK 7.10
    3x success with 1,000 iterations completed on sdraw TIRTOS SDK 3.40

    Lexar 64 GB MicroSD XC V30 I U3 A1:
    3x failures between ~15-20 iterations on sdraw noRTOS SDK 7.10
    3x success with 1,000 iterations completed on sdraw TIRTOS SDK 3.40

    PNY 64 GB MicroSD XC V30 I U3 A1:
    3x failures between ~5-20 iterations on sdraw noRTOS SDK 7.10
    3x success with 1,000 iterations completed on sdraw TIRTOS SDK 3.40

    Amazon Basics 64 GB MicroSD XC V30 I U3 A2:
    3x failures between ~8-27 iterations on sdraw noRTOS SDK 7.10
    3x success with 1,000 iterations completed on sdraw TIRTOS SDK 3.40

    Assuming you can reproduce and this gets handed off to the driver team, can you give me guidance on what "I wouldn't expect a quick response from them" means, e.g., days, a week, a month, several months? We now have multiple cloud-based product lines blocked by this, and a long delay on the TI side won't work with the product timelines we have.

    Thanks,
    Dave

  • Currently I can't even reproduce it since I don't have the reference BoosterPack.

    I guess the timeline can be anywhere from weeks to a couple of months, but we will try to push.

  • Hi Kobi,

    I solved the issue on my own, and I have a proposed solution for TI to consider for subsequent CC32XX SDK updates. I've confirmed multiple times that the SDK 3.40 version of SDHostCC32XX.c works and that the patch described below for the SDK 7.10 version of SDHostCC32XX.c fixes the problem I reported here.

    Using sdraw_CC3220SF_LAUNCHXL_tirtos7_ticlang for SDK 7.10, I imported SD.c/SD.h, sdhost.c/sdhost.h, and SDHostCC32XX.c/SDHostCC32XX.h from the CC32XX SDK 7.10 source tree. After fixing pathing issues to use these local SD-related .c and .h files, I rebuilt the project and confirmed that this build still caused the infinite loop in sdhost.c when using locally-compiled SD-related files from SDK 7.10.

    After checking to make sure there weren't differences in how functions were called in SDHostCC32XX.c in SDK 3.40 and SDK 7.10, I imported SDHostCC32XX.c and SDHostCC32XX.h from the CC32XX SDK 3.40 source tree into sdraw_CC3220SF_LAUNCHXL_tirtos7_ticlang, i.e., I replaced the SDK 7.10 versions of SDHostCC32XX.c and SDHostCC32XX.h with their SDK 3.40 versions. I rebuilt the project with the SDK 3.40 versions of these two files (everything else was from SDK 7.10), and the 1,000 iterations of SD actions completed successfully, as they always did in the sdraw_CC3220SF_LAUNCHXL_tirtos_ccs project imported from CC32XX SDK 3.40.

    I diff'd SDHostCC32XX.c from SDK 3.40 and SDK 7.10, and I noted significant differences in the initHw() function. In the SDK 3.40 version of SDHostCC32XX.c, lines 1022-1024 do a simple configuration of the card clock:

    1022 /* Configure card clock */
    1023 MAP_SDHostSetExpClk(hwAttrs->baseAddr,
    1024     MAP_PRCMPeripheralClockGet(PRCM_SDHOST), hwAttrs->clkRate);

    The initHw() function in SDK 7.10 is way more complex and adds a lot of tinkering with the card clock, lines 1100-1125:

    1100 /* Store current clock configuration */
    1101 clockcfg = HWREG(ARCM_BASE + APPS_RCM_O_MMCHS_CLK_GEN);
    1102 /*
    1103 * Set PLL CLK DIV to 8 in order to get 80KHz clock freuqnecy. The
    1104 * requirement to wake up the card with the initialization stream
    1105 * is 80 clock cycles in 1ms.
    1106 */
    1107 HWREG(ARCM_BASE + APPS_RCM_O_MMCHS_CLK_GEN) |= PLL_CLK_DIV8;
    1108 /* Set clock frequency to 80KHz for initialization stream */
    1109 MAP_SDHostSetExpClk(hwAttrs->baseAddr, MAP_PRCMPeripheralClockGet(PRCM_SDHOST), SD_INIT_FREQ_80KHZ);
    1110 /* Enable SD initialization stream */
    1111 HWREG(hwAttrs->baseAddr + MMCHS_O_CON) |= SD_INIT_STREAM;
    1112 /* Dummy command to send out the 80 clock cycles */
    1113 send_cmd(handle, DUMMY, DUMMY);
    1114 /* End SD initialization stream */
    1115 HWREG(hwAttrs->baseAddr + MMCHS_O_CON) &= ~SD_INIT_STREAM;
    1116 /* Reset clock cfg to initial cfg */
    1117 HWREG(ARCM_BASE + APPS_RCM_O_MMCHS_CLK_GEN) = clockcfg;
    1118
    1119 /*
    1120 * Configure the card clock to 400KHz for the card initialization and
    1121 * idenitification process. This is done to ensure compatibility with
    1122 * older SD cards. Once this is complete, the card clock will be
    1123 * reconfigured to the set operating value.
    1124 */
    1125 MAP_SDHostSetExpClk(hwAttrs->baseAddr, MAP_PRCMPeripheralClockGet(PRCM_SDHOST), SD_ID_FREQ_400KHZ);

    Given that the card clock never settles using the SDK 7.10 version of SDHostCC32XX.c but does using the SDK 3.40 version of SDHostCC32XX.c, I suspected the code in lines 1100-1125 was responsible for the problem described in this thread. I deleted the SDHostCC32XX.c and .h files I'd imported from SDK 3.40, imported clean copies of the SDK 7.10 versions of these files, commented out lines 1100-1125, and added the clock configuration code found on lines 1022-1024 of SDHostCC32XX.c from SDK 3.40. After rebuilding the project with this, the attached sdraw_CC3220SF_SDK7.10_tirtos7_ticlang_sdFix project successfully completes all 1,000 iterations of the SD actions inside the loop, using all SDK 7.10 code with the exception of the changes described in this paragraph.
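
    The shape of that change can be illustrated with a host-runnable sketch. It is not the patched driver file: every `demo_*` name is a hypothetical stand-in so the snippet compiles off-target, and the source-clock value is a placeholder. On target, the equivalents are `hwAttrs->baseAddr`, `hwAttrs->clkRate`, `MAP_PRCMPeripheralClockGet(PRCM_SDHOST)`, and `MAP_SDHostSetExpClk()`, as quoted from SDK 3.40 lines 1022-1024 above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the driver's hardware attributes (hypothetical type). */
typedef struct {
    uint32_t baseAddr;
    uint32_t clkRate;   /* SysConfig "Clock Rate"; 8000000 in this thread */
} demo_hw_attrs;

/* Records what the stand-in clock setter was asked to do. */
typedef struct {
    unsigned set_clk_calls;   /* times the card clock was programmed */
    uint32_t last_card_clk;   /* most recent requested card clock, in Hz */
} demo_clk_log;

static uint32_t demo_peripheral_clock_get(void)
{
    return 40000000u;   /* placeholder source clock, not a datasheet value */
}

static void demo_set_exp_clk(demo_clk_log *log, uint32_t base,
                             uint32_t src_clk, uint32_t card_clk)
{
    (void)base;
    (void)src_clk;
    log->set_clk_calls++;
    log->last_card_clk = card_clk;
}

/*
 * Clock portion of initHw() after the change: the 80 kHz init-stream and
 * 400 kHz identification steps (SDK 7.10 lines 1100-1125) are left out,
 * and the card clock is programmed once to the configured rate, as in
 * SDK 3.40.
 */
void demo_init_hw_clock(const demo_hw_attrs *hwAttrs, demo_clk_log *log)
{
    assert(hwAttrs != NULL && log != NULL);

    /* Configure card clock */
    demo_set_exp_clk(log, hwAttrs->baseAddr, demo_peripheral_clock_get(),
                     hwAttrs->clkRate);
}
```

    Per the SDK 7.10 comment quoted above, the 400 kHz step exists for compatibility with older SD cards, so anyone adopting this change may want to re-test with the oldest cards they need to support.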

    I've done multiple runs of this project, and it always completes all 1,000 iterations successfully. I believe this confirms that the clock setup code in SDHostCC32XX.c for SDK 7.10 leads to the bug that's causing the infinite loop in sdhost.c. It's not clear to me how soon after SDK 3.40 this change was made, but it may affect other SDKs starting as soon as CC32XX SDK 4.10.

    I've folded this revised SD driver into our own application, and I've not encountered the infinite loop bug in sdhost.c since, which further confirms the fix.

    As an FYI, there are also significant differences in the timing and placement of Power_setConstraint(...) and Power_releaseConstraint(...) code that showed up in the diff of the SDK 3.40 and SDK 7.10 versions of SDHostCC32XX.c. I lightly followed the logic of these Power_*Constraint(...) functions in SDHostCC32XX.c for SDK 7.10, and I'm confident they have nothing to do with the bug reported here. It might be worth revisiting the changes in timing and placement of the Power_*Constraint(...) calls throughout the SDK 7.10 version of the SDHostCC32XX.c driver; it's not clear why changes were made between 3.40 and 7.10.

    I'm marking this as solved.

    Best,
    Dave

    sdraw_CC3220SF_SDK7.10_tirtos7_ticlang_sdFix.zip