CC3120MOD: API hanging. How to debug?

Colin Grant1

Part Number: CC3120MOD
Other Parts Discussed in Thread: CC3120, CC3120BOOST, CC31XXEMUBOOST

We're in the late stages of product development; things are working well connecting to AP and transferring files to a server.

However in extended tests, I find that the API is eventually hanging for simple operations like getting socket options. Using an RTOS, the rest of the system remains fully operational.

The firmware details are:

The chip number is   : 0x31000000
The FW version is    : 2.0.0.0
The NwP version is   : 3.14.0.0
Phy version is       : 2.2.0.7

Communication is over SPI at 1MHz

An example of where the API call is not returning

/// \brief get connection status
/// \return true for connected, false for error or disconnected
bool cc3120_socket::isConnected() const {
    SWO_in_out z(__FUNCTION__);
    int err = 0;
    SlSocklen_t errSize = sizeof(err);
    const int16_t ret = sl_GetSockOpt(currentSockID, SL_SOL_SOCKET, SL_SO_KEEPALIVE, (void *) &err, &errSize);
    if (ret == 0) {
        return true;
    } else {
        return false;
    }
}

The SWO class is sending debug out SWO to show entry and exit from the function (using the destructor to ensure 'out' is messaged regardless of the code flow), which is how I know that sl_GetSockOpt has not returned.
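For readers unfamiliar with the technique, here is a minimal sketch of a scope tracer of the kind described above. The real SWO_in_out class is not shown in this thread, so the class name `ScopeTrace`, the injected `Sink` callback and the message format are all assumptions for illustration; on target the sink would write to SWO.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the poster's SWO_in_out class.
// The sink is injected so the sketch runs on a PC; on target it would write to SWO.
class ScopeTrace {
public:
    using Sink = std::function<void(const std::string&)>;

    ScopeTrace(std::string name, Sink sink)
        : name_(std::move(name)), sink_(std::move(sink)) {
        sink_(name_ + " in");
    }

    // The destructor runs on every exit path (early return, exception unwind),
    // so the "out" message is emitted regardless of the code flow.
    ~ScopeTrace() { sink_(name_ + " out"); }

    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

private:
    std::string name_;
    Sink sink_;
};

// Example function with two exit paths; both produce an "out" line.
inline bool traced_check(int value, const ScopeTrace::Sink& sink) {
    ScopeTrace trace(__FUNCTION__, sink);
    if (value < 0) {
        return false;  // early return still triggers the destructor
    }
    return true;
}
```

If a call inside the traced function never returns, the "in" line appears without a matching "out" line, which is how the hang was localised here.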

It isn't always that API function; several different API calls have hung on different runs. The failure can happen after 6 minutes (rare), an hour, or six hours.

Is there some API debug that can be enabled to help track the issue down?

We've suffered from an API mismatch with the module firmware before, which is why I've listed the versions above. I think they are right now (at least we're no longer getting receive buffer overflows, and guard sections are in place to check).

There's a regular maximum stack usage for each thread that has lots of headroom, so no stack crash.

The SPI track distance from ARM processor to the CC3120MOD is short and the hardware has been through EMC testing, so I don't think there's an issue with corruption there.

Any suggestions on how to determine what has upset the API?

over 5 years ago

0 Jesu over 5 years ago

TI__Mastermind 22935 points

Hi Colin,

What SP and SDK version are you using? It is harder to check with FW, PHY version numbers etc. I will likely need you to obtain an NWP log to debug this but I want to make sure you are using the latest SW first as it contains our latest bug fixes.

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

From simplelink.h the SL_DRIVER_VERSION is "3.0.1.60"

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

> From simplelink.h the SL_DRIVER_VERSION is "3.0.1.60"

SP stands for service pack and SDK is software development kit. I did some searching and I see you took the service pack from the cc32xx v3.40 SDK. I don't think this should cause you problems but try updating to the latest service pack anyway in case it is. You can find it in either the latest CC32xx sdk or latest wifi plugin SDK.

Let me know if the problem persists after updating to SP sp_3.16.0.1_2.0.0.0_2.2.0.7.

Also in regards to debug what MCU are you using? If you are using a TI based MCU maybe I can suggest some TI tools that can help you find your issue.

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

I've updated the CC3120MOD service pack and the host API code to 3.16.0.1

The host code changes from 3.0.1.60 to 3.0.1.65 are 90% whitespace and the rest are something to do with roaming that doesn't affect this project

The host micro is K64F Cortex M4 that uses the mbed v2 OS

I've run the same test where every 20s a small number (4) of files are sent to a server (just a python flask script that has been 100% reliable) and the same lock up has happened (within the isConnected function). The other threads are fully operational.

At a colleague's suggestion, in addition to showing the maximum stack usage by each thread, the code now shows the thread states; the thread talking to the simplelink API and the API thread itself are both waiting for semaphore.

I have added debug to semaphore and mutex creation and deletion, so I know that there isn't resource exhaustion there.

With debug showing in/out for functions, I know that the last call was the isConnected function shown above.

What I cannot see is what the API is up to. So back to the original question, is there any extra debug I can compile in as part of the API that would help to track down where the problem lies?

0 Colin Grant1 over 5 years ago in reply to Colin Grant1

Intellectual 260 points

Since the threads are waiting for semaphore, I've added SWO output (very low impact) to trace the semaphore and mutex usage.

When the use of the API comes to a halt the stack trace is showing that a semaphore is pending

Thread #1 57005 (Suspended : Breakpoint)    
    SemaphorePP_pend() at cc3120_semaphore.cpp:243 0x5e12    
    _SlDrvMsgReadCmdCtx() at driver.c:1,781 0x60cc2    
    _SlDrvCmdOp() at driver.c:866 0x5f7c2    
    sl_GetSockOpt() at sl_socket.c:997 0x658de    
    cc3120_socket::isConnected() at cc3120_socket.cpp:687 0x8dc6    
    cc3120HTTPSocket::is_connected() at cc3120HTTPSocket.h:59 0x14e84

The code is looping round waiting for a semaphore (on 10s timeout) that never becomes available because the IRQ isn't triggered from the module (note I have tried completely removing the normal disabling of the IRQ so I know that it isn't that the interrupt is disabled)

What might cause the CC3120MOD to not raise the IRQ in response to the simple sl_GetSockOpt call?

0 Colin Grant1 over 5 years ago in reply to Colin Grant1

Intellectual 260 points

Could the CC3120MOD debug output port provide information?

The hardware has the module's debug pins brought out so I could connect there if I knew how to provoke debug output.

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

> The host code changes from 3.0.1.60 to 3.0.1.65 are 90% whitespace and the rest are something to do with roaming that doesn't affect this project

The service pack is a patch file applied to the NWP which is ROM based and we do not provide the code for. It does not imply there would be changes to the host driver.

I see you already confirmed stack usage is okay.

> What I cannot see is what the API is up to. So back to the original question, is there any extra debug I can compile in as part of the API that would help to track down where the problem lies?

You have the host driver source code, so you should be able to debug what is going on in there. Anything outside of that is part of a library file or NWP code. Since you are using a non-TI IDE I cannot suggest the debug project for tirtos.

> What might cause the CC3120MOD to not raise the IRQ in response to the simple sl_GetSockOpt call?

It could be that the host driver is waiting to receive a response from the NWP but I'm not sure. Could you try stepping through the host driver code and pinpointing exactly where sl_GetSockOpt blocks? It could be in one of the driver.c or device.c functions that get called from sl_socket.c.

> Could the CC3120MOD debug output port provide information?

Could you be more specific? Where did you see this?

Also what APIs exactly are you seeing that fail? Is it just socket related? Please give more detail here.

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

> Could the CC3120MOD debug output port provide information?

My mistake, the hardware engineer brought the UART lines out and I thought he had told me they were for debug but I guess that they are just the UART interface (rather than SPI)

> Also what APIs exactly are you seeing that fail? Is it just socket related? Please give more detail here.

I have been concentrating on sl_GetSockOpt to test if the connection is still alive as the code checks this before proceeding further.

As a separate case, the code tries to resync if necessary using sl_Stop then sl_Start. One of those two has not returned in several tests.

I'll infer from your response that the host API hanging isn't something that has been seen by other customers.

I have been debugging as best I can; it can take a couple of hours to reproduce the issue, which slows progress down. So far I have changed the code to recognise a repeated semaphore wait (10s timeout) and return an error value that triggers the system needing SL_IS_RESTART_NEEDED. That looks like a promising route to recovering from the situation (leading to a hardware reset of the module plus sorting out the resources that have been allocated before starting from scratch).
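The idea of giving up after repeated semaphore timeouts could be sketched as below. This is not the actual host driver change; `wait_bounded`, the `pend` callback and the `max_timeouts` parameter are hypothetical names, and the RTOS semaphore is replaced by a callback so the sketch is self-contained.

```cpp
#include <cstdint>
#include <functional>

enum class WaitResult { Acquired, GaveUp };

// pend(timeout_ms) is a hypothetical wrapper around the RTOS semaphore wait:
// it returns true when the semaphore was obtained and false on timeout.
inline WaitResult wait_bounded(const std::function<bool(std::uint32_t)>& pend,
                               std::uint32_t timeout_ms,
                               unsigned max_timeouts) {
    for (unsigned attempt = 0; attempt < max_timeouts; ++attempt) {
        if (pend(timeout_ms)) {
            return WaitResult::Acquired;
        }
        // Timed out: loop and wait again, up to max_timeouts times in total.
    }
    // The response never arrived; the caller can now flag that a restart is needed
    // instead of waiting forever.
    return WaitResult::GaveUp;
}
```

The point of the design is that the calling thread always gets control back, so the application can decide how to recover.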

Of course it would be better not to encounter the issue at all, which is why I was hoping for extra debug compilation options or the prior experience of someone else who has encountered the issue.

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

I see how it could be a challenge to catch the case before failure since it's happening randomly. So if I understand correctly, some simplelink API calls are failing sporadically? It's not specifically tied to a certain sl_API call?

Have you tried checking the host driver's async event handlers? Maybe you are getting a FATAL_ERROR or something else that could provide clues. You can also check _SlDrvHandleFatalError in the host driver to see if a failure is recognized but for some reason the handler is not getting invoked. We may have to look into some host driver objects and capture some NWP logs if the event handlers don't help.

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

Hi Jesu,

If I buy a TI MSP board and the TI CC3120 (BOOST?) board then I should be able to replicate our use of the CC3120 in a similar way to the custom hardware. That should help to show, with a known good hardware/software combination, whether or not the way our code uses the API causes issues.

Would MSP-EXP432P401R and CC3120BOOST be the right pair to choose? (There doesn't seem to be a current CC3120MOD dev board available now)

Colin

0 Colin Grant1 over 5 years ago in reply to Colin Grant1

Intellectual 260 points

Ah the

CC3120MOD-EMU-MSP432:
SimpleLink Wi-Fi CC3120MOD BoosterPack, Advanced Emulation BoosterPack and MSP-EXP432P401R LaunchPad Bundle

*is* available direct from TI. It just isn't available from authorised distributors.

Is the CC3120MOD-EMU-MSP432 a better choice to buy?

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

> If I buy a TI MSP board and the TI CC3120 (BOOST?) board then I should be able to replicate our use of the CC3120 in a similar way to the custom hardware. That should help to show, with a known good hardware/software combination, whether or not the way our code uses the API causes issues.

This could definitely help. And if, in the worst case, you still reproduce the problem, at least you have a setup that brings us 1 step closer since I have access to the same HW.

For the CC3120 BoosterPack it does not matter if you choose MOD or nonMOD device unless you have a specific reason to choose one - they will both work the same way from a SW perspective. MSP432P401R is directly compatible with the 3120 boosterpack and you should have no problem quickly ramping up after downloading our Wi-Fi plugin.

A CC31XXEMUBOOST will also allow you to flash the service pack to the cc3120 and collect NWP logs if necessary. If you already have your own equipment that can do this then the CC31XXEMUBOOST is not necessary.

Jesu

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

It's been a while since I've heard from you so I'm assuming you fixed your problem. If you have a new problem feel free to create a new thread.

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

Hi Jesu,

I was waiting for the CC3120MOD dev boards to arrive, which they have now.

In the meantime I've loaded the SWO debug output (marvellous as SWO has negligible impact on performance) and logged a sequence that now repeatedly leads to SL_DEVICE_EVENT_FATAL_SYNC_LOSS

I've chopped out steps that weren't needed. It is now a long way from the original issue, but it would be good to understand how the code is misusing the API.

The code interacts with the module, reading the firmware details. Then, to start from a known point, the code resets the module then brings it out of hibernate (waiting a while) and gets an IRQ (the module's way of indicating it is ready)

Hopefully this snippet of SPI traffic makes sense:

RESET the CC3120 in
wifi_init in
..Reset
..Hibernate
CS=1
..wait 200ms
..Out of reset
wifi_init out
RESET the CC3120 out
Wait 2s in

Wait 2s out
Turn on the CC3120 in
wifi_PowerOn in
..out of Hibernate
..wait 300ms
wifi_PowerOn out
Turn on the CC3120 out
Wait 2s in
IRQ
Semaphore 0 release
Semaphore 0 locked ret:1
CS=0
SPItx:x65x87x78x56
CS=1
CS=0
SPIrx:x00x00x00x00xBCxDCxCDxAB
CS=1
CS=0
SPIrx:x08x00x14x00
CS=1
CS=0
SPIrx:x28x00x04x06
CS=1
CS=0
SPIrx:x00x00x00x00
CS=1
CS=0
SPIrx:x11x11x11x11
CS=1
CS=0
SPIrx:x00x00x00x31
CS=1
CS=0

After that the SPI just keeps receiving x00x00x00x00 until a modified loop returns an error code to trigger SL_DEVICE_EVENT_FATAL_SYNC_LOSS
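When reading logs like the one above, it can help to turn each "xHH" group into bytes and view 4 bytes as one 32-bit word. The helper below is a small sketch for that purpose; the function names are made up for illustration, and treating the bytes as little-endian words is an assumption made for readability, not something confirmed in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Parse a log fragment such as "x65x87x78x56" into raw bytes.
inline std::vector<std::uint8_t> parse_log_bytes(const std::string& text) {
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i < text.size(); i += 3) {
        // Each byte must be written as 'x' followed by exactly two hex digits.
        if (text[i] != 'x' || i + 3 > text.size()) {
            throw std::invalid_argument("expected 'x' followed by two hex digits");
        }
        const int hi = hex(text[i + 1]);
        const int lo = hex(text[i + 2]);
        if (hi < 0 || lo < 0) {
            throw std::invalid_argument("invalid hex digit");
        }
        bytes.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
    }
    return bytes;
}

// Assemble 4 bytes into a 32-bit word, least significant byte first
// (little-endian ordering is an assumption).
inline std::uint32_t le_word(const std::vector<std::uint8_t>& bytes,
                             std::size_t offset = 0) {
    if (offset + 4 > bytes.size()) {
        throw std::out_of_range("need 4 bytes from offset");
    }
    return static_cast<std::uint32_t>(bytes[offset]) |
           (static_cast<std::uint32_t>(bytes[offset + 1]) << 8) |
           (static_cast<std::uint32_t>(bytes[offset + 2]) << 16) |
           (static_cast<std::uint32_t>(bytes[offset + 3]) << 24);
}
```

With this reading, the transmitted x65x87x78x56 is the word 0x56788765 and the received xBCxDCxCDxAB is 0xABCDDCBC.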

I need to check the sync loss procedure.

Colin

0 Colin Grant1 over 5 years ago in reply to Colin Grant1

Intellectual 260 points

Some progress at making a sticking plaster to cover up the sync fail between the host and the module.

I have changed the code to recognise when the semaphore has been waited on for a minute; it then returns an error value that triggers SL_DEVICE_EVENT_FATAL_SYNC_LOSS.

All the API calls fail after that and the code now responds by calling sl_Stop(200) to clear out the semaphore/mutex resources, carries out a full reset of the module complete with hibernate bring up before going on to sl_Start to allocate resources and trying again from scratch.

I'm running a test now to see if this lets the use of the module limp along.

The real question is why the sync loss happens.

The code has been successfully making a connection and uploading data. A snippet of the failure is pasted below. It starts by checking the connection status, then the code tries to make the connection blocking, which never completes:

isConnected in
SPItx:x21x43x34x12
SPItx:x09x94x04x00
SPItx:x05x01x09x04
Semaphore 1 wait, timeout 10000
IRQ
Semaphore 1 release
Semaphore 1 locked ret:1
SPItx:x65x87x78x56
SPIrx:xBExDCxCDxABxBDxDCxCDxAB
SPIrx:x09x14x10x00
SPIrx:x28x01xB4x05x1Fx00x20x00
SPIrx:x00x00x05x04
SPIrx:x01x00x00x00
Semaphore 1 wait, timeout 0
Semaphore 1 locked ret:0
isConnected out

set_blocking in
SPItx:x21x43x34x12
SPItx:x08x94x08x00
SPItx:x05x01x18x04
SPItx:x01x00x00x00
Semaphore 1 wait, timeout 10000
IRQ
Semaphore 1 release
Semaphore 1 locked ret:1
SPItx:x65x87x78x56
SPIrx:x09x14x10x00x09x14x10x00
SPIrx:x09x14x10x00
SPIrx:x09x14x10x00
SPIrx:x09x14x10x00
SPIrx:x09x14x10x00

The SPI then just receives thousands (about 8,500) of the same data x09x14x10x00

The code then kicks off the reset sequence above and tries to recover, which works.

Any clues in there as to why the micro to module sync might be lost?

(Nothing else uses the SPI bus and it doesn't get restarted for the communications to work again after the steps listed above, so I don't think the SPI is faulty)

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

I'm not used to looking at things at the SPI layer. Generally I can say that SL_DEVICE_EVENT_FATAL_SYNC_LOSS could have 2 causes:

1. Something wrong with the SPI lines from the MCU you're using to the CC3120 (maybe SPI clock line is too long or data lines are getting interference)
2. Your SW does not allow the SL task the appropriate time to run

The first could be tested easily by running one of our demo examples that are known to work, and you already have a known working CC3120.

For the second, I've seen sync loss errors occur when customers set the priority of application specific tasks higher than SL task priority. Generally you want the SL task to be higher priority to avoid losing synchronization to the NWP.

Out of curiosity, do you get any problems when running one of our demo examples?

Jesu

0 Colin Grant1 over 5 years ago in reply to Colin Grant1

Intellectual 260 points

When the SPI returns with the same data 8,500 times, is there anything I can send to the module to work out what state it is in?

0 Colin Grant1 over 5 years ago in reply to Colin Grant1

Intellectual 260 points

I peppered sl_Stop with SWO debug and caught a hang with this loop

#ifdef SL_PLATFORM_MULTI_THREADED
    /* Do not continue until all sync object deleted (in relevant context) */
    while (g_pCB->NumOfDeletedSyncObj < MAX_CONCURRENT_ACTIONS)
    {
        // CHANGED
        // usleep(100000);
        API_OS_SLEEP(100); // wait 100ms
        SWO_PrintStr_C(AT); // DEBUG output with file and line number
    }
#endif

At least I have something to trace through the code now

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Colin, I'm not sure what you mean by state. Our host driver is generally stateless.

If the host driver is getting stuck in that while loop it means the host driver has not released all the sync pool objects. It could be getting stuck in a SL API or maybe the host driver fails to release it and this could be a bug.

Could you provide a list of the simplelink APIs your application is using?

Also, how many threads do you have running?

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

Hi Jesu,

Thanks for sticking with me on this. I'll answer in portions..

> I'm not sure what you mean by state. Our host driver is generally stateless.

Sorry not to be clear. I was meaning the state of the CC3120 module.

The host side sends a message and then tries to read the response. From reading the host code I can see that _SlDrvRxHdrRead is looking for a sync response from the module, repeatedly reading 4 bytes. From logging the SPI traffic, I can see that the same 4 bytes are being received from the module, more than 8,500 times. To me, that says that the module is in a state that doesn't match the host driver's expectation, so just reading and reading isn't going to change anything (it doesn't because I've seen it hang forever there)

So, I was wondering if the application code could use the host API to send some other message to the module to work out what it was expecting (i.e. what state the module is in). As I've changed the code to break out of the eternal loop, that would be an easy test for me to run.

I mention this because this is the root cause of the issue I'm seeing. Everything else I've been trying is to recover from that situation i.e. adding code to break out of that eternal receive loop to allow the application code to try and use the host driver to tidy up and then to reset the module and start afresh.

Obviously I'd like to work out the root cause rather than put the sticking plaster over the issue.

It could be an electrical noise issue. The PCB has the application controller close to the module; just a couple of inches with nothing much close by and nothing over the tracks. The SPI is running at 1 MHz, so quite slowly. The PCB has been through EMC testing. There aren't any test points on the SPI or interrupt (annoying for me but less opportunities for noise ingress) so we are going to solder on some short wires to the module connections and connect an analogue scope to look for observable issues (the theory is I can change the code to trigger a GPIO when the issue arises, so the scope trace will hopefully pinpoint the most recent transmissions to the module)

Colin

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

> Also, how many threads do you have running?

Lots. This is a 120 MHz processor running mbed 2 OS. The thread for the main function of the product has the highest priority, so no UI or network issues can stop the fundamental operation.

The threads are:

A main thread that does nothing but launch the other threads then loops in a blocked state.

An intrinsic mbed thread handles timer events at a thread level (rather than an ISR callback timer) for very slow rate timers.

A below normal priority thread handles writing logs to an SD card. This carries on working even in problem scenarios, evidenced by the output also being tee'd to the serial port, which carries on showing normal output.

The main product thread runs at a high priority at 100Hz, basically a few control loops that take negligible time as a handful of readings are made and a handful of GPIOs are set.

A UI thread that runs at 2 Hz handling a tiny LCD screen

A key handling thread that polls keys (robust to EMC) at 10ms. Again very quick read of 4 GPIOs, debounce and send a signal to the UI.

The last two threads are for the host driver (sl_Task) and a networking task that every 20s tries to upload 3 or 4 files.

We have max stack tracing output every 10s, so I know that all the threads have ample headroom. My tracing has shown very light processor usage (using SWO again to track each thread entry/exit and observing the real time thread traces in a system viewer). At one point (a year ago) there was a code mistake that had 10,000 interrupts a second and the system was still running fine (operation, UI and logging).

So, while the host driver thread is not the highest priority, I feel certain that it is not being held up from servicing the CC3120 module interrupt. Also, as you mentioned, the host API looks to be stateless (mostly, apart from when it sets a status following an error) so speed of response or jitter shouldn't be a problem.

Colin

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

> Obviously I'd like to work out the root cause rather than put the sticking plaster over the issue.

I think we should focus on finding the root cause here. Changing the host driver to work around the issue you're having is not a good solution and also makes it difficult for you to update the host driver in the future.

Before investigating your setup further, have you had any luck reproducing these issues on the MSP432 launchpad + CC3120 boost? If you have, give me the steps to reproduce so I can debug here.

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

Hi Jesu,

I've reduced the issue to just getting the device version every second. I expect this is a trivial thing for the module to reply to, so it heavily points at the hardware having noise on the SPI bus (IMHO) regardless of all the EMC testing it had. We will solder wires on (there aren't any test points on the board for the SPI bus) and capture what is happening. Fingers crossed that the scope's analogue trace is long enough to catch a problem (I'll trigger it from software using a GPIO)

Colin

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Sounds good. If you believe you are having HW issues I can redirect this query to the HW team and they will be able to assist you better.

Let me know how you want to proceed.

Good luck.

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

Hi Jesu,

Tracked down what seemed to be a hardware problem to the Tek scope High Res mode being a sampling mode that was averaging and misleading us. So the SPI bus connection looks good.

Now I have a scope and a logic analyser attached to the SPI bus, using a GPIO to trigger when the software thinks there is an issue. The software change is in _SlDrvRxHdrRead, where it repeatedly reads 4 bytes waiting for a change. Once that reaches 60 reads, the GPIO is triggered (60 is an arbitrary number based on looking at some captures: enough reads for the module to respond, but not so many that a sync error will have overfilled the scope and analyser capture buffers).
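A trigger of this kind could be sketched as follows. This is not the actual change made in _SlDrvRxHdrRead; the class name `StuckReadDetector` and the `trigger` callback (which on target would toggle the GPIO) are hypothetical, and the sketch counts identical consecutive reads, which is one possible interpretation of "waiting for a change".

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <utility>

using Word = std::array<std::uint8_t, 4>;

// Counts consecutive identical 4-byte reads and fires a trigger once
// when the configured limit is reached.
class StuckReadDetector {
public:
    StuckReadDetector(unsigned limit, std::function<void()> trigger)
        : limit_(limit), trigger_(std::move(trigger)) {}

    // Feed one 4-byte read. Returns true while at least `limit` identical
    // reads in a row have been seen.
    bool feed(const Word& word) {
        if (count_ > 0 && word == last_) {
            ++count_;
        } else {
            // Data changed (or first read): restart the count and re-arm.
            last_ = word;
            count_ = 1;
            fired_ = false;
        }
        if (count_ >= limit_ && !fired_) {
            fired_ = true;
            trigger_();  // on target: toggle the GPIO that triggers the scope
        }
        return count_ >= limit_;
    }

private:
    unsigned limit_;
    std::function<void()> trigger_;
    Word last_{};
    unsigned count_ = 0;
    bool fired_ = false;
};
```

Firing only once per run of identical reads keeps the scope from being re-triggered while the bus is still stuck.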

Captured a failed transfer. The transmits from host to CC3120 are

x21x43x34x12

x01 5ms x94x04x00

x02x01x06x00

4.2ms

x53x43x34x12

The CC3120 responds every time with xBFxDCxCDxAB (over 200ms of that being received)

So the only oddity there was a delay between the x01 being sent and the remaining x94x04x00, probably due to another task cutting in for 5ms. The trace of the bytes is fine and the logic analyser didn't report an issue, but maybe the CC3120 gets upset by the pause and times out internally? I'll try capturing some more to see if there is a common theme.

Do you know of any CC3120 module internal timeouts? (5ms is quite a long time but then again it is a clocked interface)

Colin

0 Colin Grant1 over 5 years ago in reply to Colin Grant1

Intellectual 260 points

I can confirm that the 5ms delay within 4 bytes of data being transferred (all within CS) seems to coincide with the read not updating.

I'm investigating further. It could be coincidence

Colin

0 Colin Grant1 over 5 years ago in reply to Colin Grant1

Intellectual 260 points

Hi,

I've changed the SPI driver to force a 6ms delay between the first and second byte being sent to the CC3120 module with the result that the first access always fails. So I'm working on the basis that the CC3120 module has a hidden timeout internally; changing the SPI driver to be interrupt driven to avoid any delays happening.
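To check captures for this kind of pause, byte timestamps exported from a logic analyser can be scanned for the largest inter-byte gap. The sketch below assumes timestamps in microseconds in ascending order; the names `GapReport` and `check_gaps` are made up, and the limit is a parameter because the 5ms figure is only a guess at this point.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct GapReport {
    std::uint64_t max_gap_us;  // largest gap between consecutive bytes
    std::size_t after_index;   // index of the byte that precedes the largest gap
    bool exceeds;              // true when max_gap_us is above the limit
};

// byte_times_us holds the start time of each byte within one chip-select frame,
// in microseconds and in ascending order (assumption).
inline GapReport check_gaps(const std::vector<std::uint64_t>& byte_times_us,
                            std::uint64_t limit_us) {
    GapReport report{0, 0, false};
    for (std::size_t i = 1; i < byte_times_us.size(); ++i) {
        const std::uint64_t gap = byte_times_us[i] - byte_times_us[i - 1];
        if (gap > report.max_gap_us) {
            report.max_gap_us = gap;
            report.after_index = i - 1;
        }
    }
    report.exceeds = report.max_gap_us > limit_us;
    return report;
}
```

For example, `check_gaps({0, 5100, 5108, 5116}, 5000)` flags the pause after the first byte, while evenly spaced bytes at 8 microseconds apart (one byte time at 1 MHz) pass.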

It would be really useful if the internal workings of the CC3120 module SPI bus could be confirmed as having a timeout and what that timeout is (I'm guessing at 5ms)

Colin

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

1. To make sure I understand, regarding the SPI communication between your host and the CC3120, you want to understand if there is a timeout mechanism built into the CC3120 in case the host does not respond in time?

2. And if so, what is that timeout so you may adjust your SPI driver to meet the SPI requirements of the CC3120?

If this is the case I will have to check with R&D team since they are the ones that would have this information. Please answer both questions so I can better assist you.

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

Hi Jesu,

Yes, I want to know if the CC3120 module has a timeout if the 4 bytes being sent to it (for example a sync pattern or the command bytes) contain a 5.1ms delay between bytes, leading to the module not responding when being read for the response.

Yes, I would like to know what the timeout is so I know what timing the SPI driver has to meet.

If there are other timeouts in the protocol then that is important information too.

Colin

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

Understood. I'm not sure if RnD team will have an answer for me but I can try. Just as a sanity check, do you get any issues if the host driver task has the highest priority? This would at least confirm there is a scheduling issue.

Also, you have probably seen this already, but in case you haven't, check section 4.13.5 in the CC3120 datasheet since it talks about host SPI timing requirements.

Have a great weekend. Let's sync again next week.

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

Hi Jesu,

Yes I increased the sl_Task thread to run as the highest priority thread in the system. Timers and interrupts still happen, so occasionally there is a 5ms pause in the SPI send.

The SPI interface description basically says max 20MHz and talks about duty cycle for the clock, setup and hold times. Nothing about the inter byte timing.

Have a good weekend.

Colin

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

I'm meeting with RnD tomorrow. I will ask what is the expected SPI response time on the CC3120 if any. Let me know if you have any last minute questions.

Do you ever get any fatal error events in the SimpleLinkFatalErrorEventHandler?

I would imagine you would be getting one of the following events if you are having SPI problems:

SL_DEVICE_EVENT_FATAL_CMD_TIMEOUT
SL_DEVICE_EVENT_FATAL_SYNC_LOSS
SL_DEVICE_EVENT_FATAL_NO_CMD_ACK
SL_DEVICE_EVENT_FATAL_DRIVER_ABORT
SL_DEVICE_EVENT_FATAL_DEVICE_ABORT

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

Hi Jesu,

There are no fatal error events because _SlDrvRxHdrRead without slcb_GetTimestamp never returns; the read just repeats and repeats the same bytes forever.

'response time' sounds like how quickly the CC3120 module responds to a command; this isn't the issue. The issue is whether or not the CC3120 module times out in some way if there is a gap between the bytes being sent to it. So sending x21 x43 x34 x12 sync word with a 5 millisecond pause between the x21 and the x43.

My testing using an interrupt driven SPI bus is sending reliably with no delays between the bytes and I've not encountered an issue since... yet. However, if there is one undocumented timeout in the protocol (that's my impression) then there might be another. I'd like to confirm my findings about the inter byte delay causing an issue and to hear in what other situations timing is critical too.
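The general shape of an interrupt driven transmit is sketched below. It is not the actual driver from this project: `IsrDrivenTx`, the `write_byte` hook (which would store one byte in the SPI data register) and the "transmit complete" interrupt are assumptions. The point is that each following byte is loaded from the interrupt handler, so a thread being pre-empted cannot open a gap in the middle of a 4-byte transfer.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

class IsrDrivenTx {
public:
    // write_byte is a hypothetical hook that writes one byte to the SPI data register.
    explicit IsrDrivenTx(std::function<void(std::uint8_t)> write_byte)
        : write_byte_(std::move(write_byte)) {}

    // Called from thread context with chip select already asserted.
    // Sends the first byte; the remaining bytes are sent from the interrupt.
    bool start(const std::uint8_t* data, std::size_t len) {
        if (busy_.load() || data == nullptr || len == 0) {
            return false;
        }
        data_ = data;
        len_ = len;
        next_ = 1;
        busy_.store(true);
        write_byte_(data_[0]);
        return true;
    }

    // Called from the (assumed) "transmit complete" interrupt handler.
    void on_tx_complete() {
        if (!busy_.load()) {
            return;
        }
        if (next_ < len_) {
            write_byte_(data_[next_++]);  // load the next byte immediately
        } else {
            busy_.store(false);           // last byte has gone out
        }
    }

    bool busy() const { return busy_.load(); }

private:
    std::function<void(std::uint8_t)> write_byte_;
    const std::uint8_t* data_ = nullptr;
    std::size_t len_ = 0;
    std::size_t next_ = 0;
    std::atomic<bool> busy_{false};
};
```

The calling thread would wait for `busy()` to clear (or for a semaphore released from the interrupt) before releasing chip select.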

Colin

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

Okay, yea I had the wrong idea then. It seems like your question is more in regards to timing expectations during a sequence (e.g. what happens if there is an interruption in the byte transmission during the exchange and based on what happens what delay in the sequence is required to cause this?).

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

Hi Jesu,

Exactly. The bytes being sent are all the right ones, wrapped up nicely inside chip select, but there is a pause between bytes which seems to make the protocol sad. SPI is a synchronous serial interface so, as far as it goes electrically, I can go off and make a cup of tea between bytes and it should be valid. So there's probably something in the CC3120 module layers that has expectations on the timing and resets itself to waiting for a sync pattern again. I'd like to confirm that. I'd also like to know if there are any other timing gotchas waiting to be uncovered.

Colin

0 Jesu over 5 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

So if I had to guess, when the interrupt occurs the SPI clock stops because otherwise you would see invalid data on the sync word and fail. Some colleagues got back to me today and informed me there is a tight <5ms timing requirement on the CC3120. I don't have exact numbers on that but at least that confirms the suspicions we've been having.

EDIT: Also, we are not aware of any other "timing gotchas". The rest of the timing considerations are on a much longer interval and don't apply to something in the middle of sending a command.

Jesu

0 Colin Grant1 over 5 years ago in reply to Jesu

Intellectual 260 points

Hi Jesu,

> So if I had to guess, when the interrupt occurs the SPI clock stops because otherwise you would see invalid data on the sync word and fail

The SPI being written out was valid and the new interrupt driven SPI transmit is 100% valid - confirmed on logic analyser for many many hours now. The microcontroller is a Cortex M4 with a SPI controller that completely handles a single byte (other SPI peripheral instances have a multi byte FIFO but as luck would have it the CC3120 is attached to the one that doesn't) so the clock never pauses within a byte. No problems there.

> Some colleagues got back to me today and informed me there is a tight <5ms timing requirement on the CC3120. I don't have exact numbers on that but at least that confirms the suspicions we've been having

I'm happy to work with 5ms as a ballpark timing but I really need to know where the timing requirement applies. Is it only on the 4 bytes of sync words (there are two varieties of sync words, one for writing to the CC3120 and another for reading back the results)? Is there a timing requirement on how quickly the commands are written? The commands I've captured are 3 sets of 4 bytes written to the CC3120 (each wrapped in chip select). Some commands have additional data being sent too (variable length). Is there perhaps a 5ms max requirement between those separate transfers of 4 bytes?

Thanks for confirming the timing requirement. I hope you can get a few more details and suggest that tight timing requirements are fed back in to the documentation or errata.

Thanks

Colin

0 Jesu over 4 years ago in reply to Colin Grant1

TI__Mastermind 22935 points

Hi Colin,

No problem. I can ask for more details but could you provide a logic analyzer waveform for both cases (good and bad)? That way when I communicate this to the relevant engineers they have all the detail they need.

Thanks

Jesu
