CC3120MOD: SL_DEVICE_EVENT_FATAL_DEVICE_ABORT code

Part Number: CC3120MOD
Other Parts Discussed in Thread: CC3120, UNIFLASH

Hello,

I'm using multiple CC3120 modules to communicate from our products (7 of them) to an iOS device.
One of the CC3120s is configured as an access point, and the iOS device is always connected to it (static IP, no DHCP running). The other devices connect one by one to the AP, communicate with the iOS device, then disconnect.
I'm investigating multiple issues: the iOS device being kicked off (the sniffer shows "no reason" for the deauth frame), the AP crashing after a couple of minutes, etc.

Currently I'm investigating a SL_DEVICE_EVENT_FATAL_DEVICE_ABORT event between 2 HTTP requests from the AP to the iOS device.

I'm having a hard time finding what it is, as the documentation lacks info on these events. Structure says
typedef struct
{
    _u32 Id;
    SlDeviceFatalData_u Data;
}SlDeviceFatal_t;


typedef struct
{
    _u32 Code;
    _u32 Value;
} SlDeviceFatalDeviceAssert_t;

But there is no mention whatsoever of what this "code" means: no enum, no note, no comment, no link to a document, nothing (and this is true for all similar structures). The documentation says:
        For pSlDeviceFatal->Id = SL_DEVICE_EVENT_FATAL_DEVICE_ABORT
                Indicates a severe error occured and the device stopped
        Use pSlDeviceFatal->Data.DeviceAssert fields
                    - Code: An idication of the abort type
                    - Value: The abort data

This is not helping. Could you please share information about this "idication"?
Below is some info about the versions:

CHIP 822083584

MAC 31.2.1.0.1

PHY 2.2.0.5

NWP 3.3.99.1

ROM 0
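
For reference, here is a minimal sketch of how the two fields quoted above could be formatted into a single line for a support request. The struct definitions are stand-ins that mirror the quoted ones so the snippet compiles without the SDK, `EXAMPLE_FATAL_DEVICE_ABORT_ID` is a placeholder value, and `format_fatal_abort` is a hypothetical helper, not part of the host driver.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in types mirroring the structures quoted above, so the sketch
 * compiles without the SDK. In a real project they come from the
 * SimpleLink host driver headers. */
typedef struct
{
    uint32_t Code;
    uint32_t Value;
} SlDeviceFatalDeviceAssert_t;

typedef union
{
    SlDeviceFatalDeviceAssert_t DeviceAssert;
} SlDeviceFatalData_u;

typedef struct
{
    uint32_t Id;
    SlDeviceFatalData_u Data;
} SlDeviceFatal_t;

/* ASSUMPTION: placeholder value, use the constant from the SDK headers. */
#define EXAMPLE_FATAL_DEVICE_ABORT_ID 1u

/* Hypothetical helper: writes "Code: 0x... Value: 0x..." into buf.
 * Returns the number of characters written, or 0 if the event is not a
 * device abort or an argument is missing. */
int format_fatal_abort(const SlDeviceFatal_t *ev, char *buf, size_t len)
{
    if (ev == NULL || buf == NULL || ev->Id != EXAMPLE_FATAL_DEVICE_ABORT_ID)
    {
        return 0;
    }
    return snprintf(buf, len, "Code: 0x%08lX Value: 0x%08lX",
                    (unsigned long)ev->Data.DeviceAssert.Code,
                    (unsigned long)ev->Data.DeviceAssert.Value);
}
```

In an application this would typically be called from the fatal error event handler, with the resulting string sent to whatever debug output is available.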

Thank you

  • Hi,

    Is there a reason you are using a very old servicepack?

    SP 3.3.99.1 is more than 5 years old (many fixes since then) and, more worryingly, it is not an official one, since it has the .99 mark in it.

    It would be great if you could use the latest one and retest.

    Regarding your question, you are right. There is no documentation for it, since these are internal error codes for debugging (with the crashing line number, for example, but the code is NWP-internal so it would not tell you much). We keep them so you can send us the error codes and we can debug further and understand the root cause.

    But, can you please check with latest SP first?

    Thanks,

    Shlomi

  • Hi,

    It seems that there is no ServicePack inside the module. Please try the latest ServicePack (available in the CC32xx SDK).

    Jan

  • Hi, sorry for the delayed response, but I cannot log in with my other account. I've been struggling for weeks (yesterday was some kind of miracle) and had to make a new account, but the confirmation link was broken... (hopefully, this will work)

    Anyway, thank you for your replies.
    Shlomi, I updated with the SP I had on my computer (thanks to your tool); here are the new versions:
    CHIP 822083584
    MAC 31.2.0.0.0 (<-- seems to be lower than the previous one?)
    PHY 2.2.0.7
    NWP 3.16.0.1
    ROM 0

    The tests are worse now: the iOS device is being disconnected far more often (disconnecting and reconnecting 4 or 5 times in a row sometimes), which leads the app to think it is completely disconnected, and I still get the same issue as in the original post.

    Jan, not sure what you mean by "no service pack" ? Is it possible to clarify ?

    Thanks

  • Hi,

    NWP version 3.3.99.1 is a ROM version of the CC3120MOD firmware. That means there is no service pack uploaded into the module.

    Your current NWP version 3.16.0.1 is almost two years old (1 Jul 2020). Please test with SP 3.22.0.1. You can find this SP in the CC32xx SDK (\simplelink_cc32xx_sdk_6_10_00_05\tools\cc32xx_tools\servicepack-cc3x20\).
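
    Dotted version strings are easy to misread by eye (3.3.x is lower than 3.16.x), so here is a small sketch for comparing them. It assumes every field is a plain decimal number and that the leftmost field is the most significant; that ordering is an assumption for illustration, not a statement about how TI numbers its releases. `parse_version` and `compare_versions` are hypothetical helper names.

```c
#include <stdio.h>

/* Parses "a.b.c.d" into out[4]. Returns 1 on success, 0 on malformed input
 * (too few fields, or trailing characters after the fourth field). */
int parse_version(const char *text, unsigned out[4])
{
    char trailing;
    return text != NULL &&
           sscanf(text, "%u.%u.%u.%u%c",
                  &out[0], &out[1], &out[2], &out[3], &trailing) == 4;
}

/* Returns <0 if a is lower than b, 0 if equal, >0 if a is higher. */
int compare_versions(const unsigned a[4], const unsigned b[4])
{
    for (int i = 0; i < 4; ++i)
    {
        if (a[i] != b[i])
        {
            return (a[i] < b[i]) ? -1 : 1;
        }
    }
    return 0;
}
```

    For example, parsing "3.16.0.1" and "3.22.0.1" and passing them to `compare_versions` in that order gives a negative result.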

    Jan

  • Hi Jan,

    Thanks for pointing to the correct SP, I flashed it:
    CHIP 822083584
    MAC 31.2.7.0.0
    PHY 2.2.0.7
    NWP 3.22.0.1
    ROM 0

    Still get the same error between 2 http requests after a while:

    SL_DEVICE_EVENT_FATAL_DEVICE_ABORT
    Code: 0x00000000
    Value: 0x20085B3E

    The code and value have changed though. How can I get more info about this?
    Thanks

  • Hi,

    Please wait for an answer from Shlomi. In the meantime, you can try to capture the NWP log; see SWRU455, chapter 20.1.

    Jan

  • Hi,
    You can find the logs here: https://pastebin.com/ABzsVMex
    Thanks

  • Hi,

    The NWP log needs to be in binary format, not HEX format.

    Jan

  • Please download it from here, or let me know how to attach a file to this post, if that is possible

    Thanks

  • Hi,

    Now the NWP log seems to have the right format. Please wait for the analysis from Shlomi, because I don't have a tool for analysing NWP logs.

    Jan

  • Hi,

    this specific address is related to writing a minidump to the serial flash so I believe it is less relevant to your crash (and merely a by-product).

    can you please repeat the logging but this time on another pin (firmware capture and not NWP capture)? 

    the pin number is #60 instead of #62. the procedure is the same (binary mode).

    Regards,

    Shlomi

  • Hi,
    Please, download the file from here

    I captured on the pin #50 (which is called TEST_60 on CC3120MOD).

    The error was:
    SL_DEVICE_EVENT_FATAL_DEVICE_ABORT
    Code: 0x00000000
    Value: 0x20085B3E

    Let me know if you need anything else
    Thank you

  • Hi,

    At least on the MAC firmware side I do not see any crashes or asserts and it behaves as expected.

    The MAC firmware gets the commands from NWP (connect, disconnect, reset, etc) and I can see on the previous NWP log that stations are getting connected and disconnected. You mentioned that the stations are connected 1-by-1 and then disconnect.

    Can you describe the procedure exactly? I.e., how many stations are connected at the same time? How many different stations do you have? Do you get a disconnection event on iOS without triggering it from the device?

    Regards,

    Shlomi

  • Ok so, there are 7 devices + 1 iPad.
    1 device is the access point (AP), the iPad connects to it, then 1 device connect to the AP, sends its data to the iPad, disconnects, a 2nd device connects to the AP, sends its data...
    So there are maximum 2 devices connected to the AP at any time (1 iPad + 1 device), never more.
    Meanwhile, the AP asks the ipad (HTTP request) if it is over. iPad replies 403 if not over, 200 when it's done. After a while (couple minutes maybe), between 2 HTTP requests, the error pops 'SL_DEVICE_EVENT_FATAL_DEVICE_ABORT', crashing the AP.
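
    The status check described above boils down to a small decision function. This is only a sketch of that logic; `PollResult` and `interpret_status` are hypothetical names, and treating every other status code as an error is an assumption, since the post only mentions 403 and 200.

```c
/* Outcome of one status poll from the AP to the iPad. */
typedef enum
{
    POLL_KEEP_WAITING, /* iPad answered 403: not over yet */
    POLL_DONE,         /* iPad answered 200: done */
    POLL_ERROR         /* ASSUMPTION: anything else is treated as an error */
} PollResult;

/* Maps the HTTP status code of a poll response to the next action. */
PollResult interpret_status(int http_status)
{
    switch (http_status)
    {
        case 200:
            return POLL_DONE;
        case 403:
            return POLL_KEEP_WAITING;
        default:
            return POLL_ERROR;
    }
}
```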

    During these logs, the iPad didn't disconnect multiple times as I mentioned in the first post. That is a separate issue we also need to fix. I will run some other tests and capture more logs if there wasn't anything in this one. Maybe I stopped logging too early?

    Thanks

  • I see.

    I do see that there are times where more than one station is connected but it should not make a difference as the AP can host up to 4 clients.

    do you also have an air sniffer by any chance?

  • You mean 1 ipad + 2 devices ? I will double check maybe there is some overlap.

    I have an old USB dongle based on an RTL8187L (ALFA Network AWUS036H, I think), but I couldn't capture anything but broadcast frames on Kali (in monitor mode). Do you have any known working reference you could share?

    In the meantime, do you need any more logs when it happens?  I reproduce 100% of the time

    Thanks

  • yes, 2 devices + ipad.

    but, i don't think this is the issue.

    the sniffer is not mandatory for now.

    i will take a second look at the logs and see if I can conclude more.

    Shlomi

  • Hi,

    can you please rerun the test and capture NWP logs again (pin #52) with the attached debug servicepack?

    Shlomi

    sp_3.23.99.0_2.7.0.0_2.2.0.7.bin

  • Hi Shlomi,

    Please find the logs here.
    The error was:
     SL_DEVICE_EVENT_FATAL_DEVICE_ABORT
    Code: 0x00000000
    Value: 0x20085B46

    Thanks

  • Hi,

    Can you try the new one? it has more debug messages.

    Shlomi

    8461.sp_3.23.99.0_2.7.0.0_2.2.0.7.bin

  • Sure, there are 2 files this time. In the first one, the iPad kept being kicked out of the network (disconnected/reconnected), so I'm sharing it with you as well in case you find something. Here.

    Thanks

  • Hi,

    From the logs it seems that at some point, for an unknown reason, messages are pushed to a message queue that is supposed to be pulled by the hostap, but I do not see the messages being pulled, and thus it eventually crashes.
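
    To illustrate the failure mode described here (this is not the NWP code, which is internal, just a generic sketch): a fixed-size queue whose consumer has stopped pulling fills up, and every further push fails. `MsgQueue`, `QUEUE_CAPACITY` and the function names are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4u /* arbitrary size, for illustration only */

typedef struct
{
    int    items[QUEUE_CAPACITY];
    size_t head;  /* index of the oldest stored item */
    size_t count; /* number of items currently stored */
} MsgQueue;

void queue_init(MsgQueue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Returns false when the queue is full, i.e. the consumer has fallen behind. */
bool queue_push(MsgQueue *q, int msg)
{
    if (q->count == QUEUE_CAPACITY)
    {
        return false;
    }
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = msg;
    q->count++;
    return true;
}

/* Returns false when there is nothing to pull. */
bool queue_pull(MsgQueue *q, int *msg)
{
    if (q->count == 0)
    {
        return false;
    }
    *msg = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

    With no calls to `queue_pull`, the push after the fourth one returns false. What the producer does at that point (drop, block, or abort) is a design choice of the system in question.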

    Let me see if I can add some more debug messages to try and catch the root cause.

    Regards,

    Shlomi

  • meanwhile, would be good if you can check if this issue exists with no servicepack at all.

    for this you need to clear the servicepack field in Uniflash.

  • Hello,
    Good to hear we're heading in the right direction.
    Regarding the no-SP test, wasn't that what Jan D pointed out at the very beginning? That there was no SP?
    Thanks

  • no, you had a very old SP programmed.

    please eliminate completely the SP (and you would be able to see the ROM version when printing it).

    You should see something like:

  • Ok, I now see:
    CHIP 822083584
    MAC 2.0.0.0
    PHY 2.2.0.0
    NWP 3.0.1.4
    ROM 0

    I couldn't reproduce the bug because the app kept failing (I guess one callback was never called so the AP never sent its file to the iPad, iPad was kicked over and over etc..). The iPad couldn't stay logged in long enough for the bug to appear. Here are the logs just in case.

    Thanks

  • so maybe it happens because the SP has many fixes inside and you lose stability if you don't use the fixes.

    The main purpose of testing it is to see if a bug was introduced in one of the fixes along the way but we cannot seem to verify it.

    please find a new SP to test with.

    I am sorry to bug you but this is the fastest way to debug.

    sp_3.23.98.0_2.7.0.0_2.2.0.7.bin

  • No problem, thanks for helping!
    Here are the logs.

    Thanks

  • This one was harsh, lots of disconnections, but it did crash eventually. Here are the logs.

    Thanks

  • Hi,

    from the logs it does seem that the hostap task no longer pulls the messages and at some point it crashes. It is not clear why.

    it is hard to debug this location with patches but I intend to get back to it early next week (Sunday as my weekend is Friday-Saturday).

    Regards,

    Shlomi

  • Well noted, have a nice weekend.

  • Hi,

    I went over it again and I have another debug candidate to test with.

    please note that in order for it not to crash, you need to disable mDNS in station mode!!!

    you can do it from Uniflash as follows:

    The servicepack is also attached.

    Regards,

    Shlomi

    sp_3.23.96.0_2.7.0.0_2.2.0.7.bin

  • Hi Shlomi,

    I disabled mDNS in station mode (only) and flashed on all the devices. It crashed again. Here are the logs.

    Thanks

  • sorry, had a minor mistake in the SP.

    see new one 4452.sp_3.23.96.0_2.7.0.0_2.2.0.7.bin

  • Here are the logs.
    Just to be sure, do I need to update all of the devices, or only the AP ? (all were up to date here)

    Thanks

  • Still not much that I can see.

    this one is really hard to debug.

    can you tell please how do you disconnect from the AP? do you do it manually via an API from the application running on each simplelink connected device?

    if so, do you use profiles, auto/fast connect or manually connect to the AP from each device?

  • Ok, so on the station side, they connect using (simplified):

    ipV4.Ip = STAMODE_IP_ADDR + position;
    ipV4.IpMask = APMODE_IP_SUBNET;
    ipV4.IpGateway = APMODE_IP_GATEWAY;
    ipV4.IpDnsServer = APMODE_IP_DNS;
    
    if(sl_NetCfgSet(SL_NETCFG_IPV4_STA_ADDR_MODE, SL_NETCFG_ADDR_STATIC, sizeof(SlNetCfgIpV4Args_t), (unsigned char *)&ipV4))
    {
      return RET_ERROR;
    }
    
    if(sl_Stop(0) < 0)
    {
      return RET_ERROR;
    }
    
    if(sl_Start(NULL, NULL, NULL) < 0)
    {
      return RET_ERROR;
    }
    
    secParams.Key = (_i8 *)wpakey;
    secParams.KeyLen = strlen(wpakey);
    secParams.Type = SL_WLAN_SEC_TYPE_WPA_WPA2;
    sl_WlanConnect((_i8 *)ssid, strlen(ssid), 0, &secParams, 0);

    When their file is sent or if there is an error, they disconnect using (simplified here too):

    HTTPClient_disconnect(hHttp);
    HTTPClient_destroy(hHttp);
    sl_WlanDisconnect();
    sl_Stop(255);

    I don't think there is any profile used.

  • Thanks for the details.

    what about the AP side? what is the high level flow there?

    There are two more tests I would do to see if it can shed some light:

    • you mentioned they connect 1-by-1 although I could sometimes see 2 stations connected.
      is it possible, just for the test, to limit the number of stations to 2 (the iPad and another station) and test if it makes any difference? you can do it by:

    _u8 max_ap_stations = 2;
    sl_WlanSet(SL_WLAN_CFG_AP_ID, SL_WLAN_AP_OPT_MAX_STATIONS, sizeof(max_ap_stations), (_u8 *)&max_ap_stations);

    • don't send any HTTP data. just connect, ping and disconnect.

    I am trying to simplify the tests (so maybe I can reproduce) and see whether it happens in the simplest setup.

    Regards,

    Shlomi

  • you mentioned they connect 1-by-1 although I could sometimes see 2 stations connected.

    • Indeed, it was a mistake. There is only 1 station allowed until the AP has finished sending its file, then 2 stations are allowed at the same time. I will try allowing only 2 stations max but I'm not sure I can because the iPad is actually requesting station to connect, and the app dev was outsourced.

    • So I thought the issue was happening when the file was large enough to make the process last for a while (at least it did with the first SPs). But now it seems that even with a very small file (a couple of kB), it crashes after 2 stations have disconnected. The third station (not counting the iPad) never connects.

    Regarding the code on the AP side:

     if(sl_Start(0, 0, 0) < 0)
      {
        return RET_ERROR;
      }
    
      if(sl_WlanSetMode(ROLE_AP) < 0)
      {
        return RET_ERROR;
      }
    
      ipV4.Ip = APMODE_IP_ADDR;
      ipV4.IpMask = APMODE_IP_SUBNET;
      ipV4.IpGateway = APMODE_IP_GATEWAY;
      ipV4.IpDnsServer = APMODE_IP_DNS;
    
      sl_NetCfgSet(SL_NETCFG_IPV4_AP_ADDR_MODE, SL_NETCFG_ADDR_STATIC, sizeof(SlNetCfgIpV4Args_t), (unsigned char *)&ipV4);
    
      if(sl_Stop(CC3120_STOP_TIMEOUT) < 0)
      {
        return RET_ERROR;
      }
    
      CC3120App_Events_Init();
    
      if(sl_Start(0, 0, 0) < 0)
      {
        return RET_ERROR;
      }
    
      while(is_ip_acquired == false)
      {
        _SlTaskEntry();
      }
    
      if(sl_NetAppStop(SL_NETAPP_DHCP_SERVER_ID) < 0)
      {
        return RET_ERROR;
      }
    
      if(sl_Stop(CC3120_STOP_TIMEOUT) < 0)
      {
        return RET_ERROR;
      }
    
      CC3120App_Events_Init();
    
      if(sl_Start(0, 0, 0) < 0)
      {
        return RET_ERROR;
      }
      while(is_ip_acquired == false)
      {
        _SlTaskEntry();
      }
    
      /* Configure the Security parameter the AP mode */
      if( pwd && pwd_len)
      {
        SecType = SL_WLAN_SEC_TYPE_WPA_WPA2;
        if(sl_WlanSet(SL_WLAN_CFG_AP_ID, SL_WLAN_AP_OPT_SECURITY_TYPE, 1, (_u8 *)&SecType) < 0)
        {
          return RET_ERROR;
        }
    
        if(sl_WlanSet(SL_WLAN_CFG_AP_ID, SL_WLAN_AP_OPT_PASSWORD, (_u16)pwd_len, (_u8 *)pwd) < 0)
        {
          return RET_ERROR;
        }
      } else 
      {
        SecType = SL_WLAN_SEC_TYPE_OPEN;
        if(sl_WlanSet(SL_WLAN_CFG_AP_ID, SL_WLAN_AP_OPT_SECURITY_TYPE, 1, (_u8 *)&SecType) < 0)
        {
          return RET_ERROR;
        }
      }
      /** Configure the SSID */
      if(sl_WlanSet(SL_WLAN_CFG_AP_ID, SL_WLAN_AP_OPT_SSID, (_u16)ssid_len, (_u8 *)ssid) < 0)
      {
        return RET_ERROR;
      }
    
      if(sl_WlanSet(SL_WLAN_CFG_AP_ID, SL_WLAN_AP_OPT_CHANNEL, 1, &channel) < 0)
      {
        return RET_ERROR;
      }
    
      if(sl_WlanSet(SL_WLAN_CFG_GENERAL_PARAM_ID, SL_WLAN_GENERAL_PARAM_OPT_COUNTRY_CODE, 2, str) < 0)
      {
        return RET_ERROR;
      }
    
      if(sl_WlanSet(SL_WLAN_CFG_GENERAL_PARAM_ID, SL_WLAN_GENERAL_PARAM_OPT_AP_TX_POWER, 1, (_u8 *)&appower) < 0)
      {
        return RET_ERROR;
      }
    
      /** Restart */
      if(sl_Stop(CC3120_STOP_TIMEOUT) < 0)
      {
        return RET_ERROR;
      }
    
      CC3120App_Events_Init();
    
      mode = sl_Start(0, 0, 0);
      if(mode < 0)
      {
        return RET_ERROR;
      }
    
      if(ROLE_AP == mode)
      {
        /* If the device is in AP mode, we need to wait for this event before doing anything */
        while(is_ip_acquired == false)
        {
          _SlTaskEntry();
        }
      }
      else
      {
        return RET_ERROR;
      }
    
      return RET_OK;


    FW dev was also outsourced, so hopefully I didn't forget anything but it seems overly complicated to just start an AP, with all the stops and starts.

    Once the file is sent, the AP just polls the iPad every 10 sec with GET request on /status.

    Is there a way to simply ping ?

    Thanks

  • what if you don't really use the app running on the ipad and simply have it connected to the AP and then the devices connect and ping (since the IP addresses are static and known anyway)?

    pinging from SL device is done by sl_NetAppPing() API (you can see an example on the netapp.h header file of how to use it).

    the point, as I mentioned, is that the simpler the setup is, the better, since it would be easier to reproduce.

  • Sure I will do that, probably not today, but I'll get back to you when it's done

    Thanks

  • great, thanks.

  • Hi,
    I tried to make a simple example. 2 devices, 1 AP, 1 STA.
    I start the AP, then I start the STA, which connects to the AP and, once connected, sends a ping to the AP (using the ping example). It crashes immediately on the station side. Logs here.

    static void PingTest()
    {
        SlNetAppPingReport_t report;
        SlNetAppPingCommand_t pingCommand;
    
        pingCommand.Ip = SL_IPV4_VAL(10,0,1,1);      // destination IP address is 10.0.1.1
        pingCommand.PingSize = 150;                   // size of ping, in bytes
        pingCommand.PingIntervalTime = 100;           // delay between pings, in milliseconds
        pingCommand.PingRequestTimeout = 100;        // timeout for every ping in milliseconds
        pingCommand.TotalNumberOfAttempts = 5;       // max number of ping requests. 0 - forever
        pingCommand.Flags = 1;                        // 1 = report after every ping (0 = report only when finished)
    
        sl_NetAppPing( &pingCommand, SL_AF_INET, &report, pingRes );
    }
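
    The `pingRes` callback passed above is not shown. As a rough sketch of what such a callback might compute from the report, here is a packet-loss helper. `ExamplePingReport` is a stand-in type so the snippet compiles without the SDK; the two field names are assumed to match the ping report structure in netapp.h and should be checked against your SDK version. `ping_loss_percent` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the SDK ping report, mirroring only the two counters used
 * below. ASSUMPTION: field names follow netapp.h, verify in your SDK. */
typedef struct
{
    uint32_t PacketsSent;
    uint32_t PacketsReceived;
} ExamplePingReport;

/* Hypothetical helper: packet loss in percent (0..100), or -1 if nothing
 * was sent or the counters are inconsistent. */
int ping_loss_percent(const ExamplePingReport *report)
{
    if (report == NULL || report->PacketsSent == 0u ||
        report->PacketsReceived > report->PacketsSent)
    {
        return -1;
    }
    uint32_t lost = report->PacketsSent - report->PacketsReceived;
    return (int)(((uint64_t)lost * 100u) / report->PacketsSent);
}
```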


    What am I doing wrong ..?
    Thanks

  • I am not sure why, but I cannot interpret the log, which looks corrupted. From the station log, though, the abort is probably related to mDNS not being disabled on the station side.

    can you please check you did it?

  • I re-generated the image with mDNS disabled, flashed, and added the following line at the beginning to be sure:
    sl_NetAppSet(SL_NETAPP_MDNS_ID, SL_NETAPP_MDNS_CONT_QUERY_OPT,0 , 0);
    Here are the logs

    Also, on the AP side, is there any reason to get multiple events SL_NETAPP_EVENT_IPV4_ACQUIRED as you can see in the logs above ?
    Thanks

  • maybe the log is not complete, but all I can see is the first station connecting, hence an IP acquired event, and then the ping API being started. That's it.

    Has it reproduced with this setup?

    I don't see any issue with this log (but again it is probably cut).

  • Weird, I stopped the log well after the crash, it shouldn't be missing the end ..
    Yes it still happens, I got:
    (14:43:17.776) Send ping
    (14:43:18.076) SL_DEVICE_EVENT_FATAL_DEVICE_ABORT
    (14:43:18.092) Code: 0x00000000
    (14:43:18.105) Value: 0x000D2DBC

    So 300 ms after calling the ping function with these params:

        pingCommand.PingIntervalTime = 100;           // delay between pings, in milliseconds
        pingCommand.PingRequestTimeout = 100;        // timeout for every ping in milliseconds
        pingCommand.TotalNumberOfAttempts = 5;       // max number of ping requests. 0 - forever
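
    The 300 ms figure comes from subtracting the two log timestamps (14:43:18.076 minus 14:43:17.776). A small sketch for doing that on terminal timestamps of this form follows. `timestamp_to_ms` is a hypothetical helper; it assumes the "HH:MM:SS.mmm" layout with exactly three millisecond digits and does not handle a capture that crosses midnight.

```c
#include <stdio.h>

/* Converts "HH:MM:SS.mmm" to milliseconds since midnight.
 * Returns -1 on malformed input. Field ranges are not validated. */
long timestamp_to_ms(const char *text)
{
    unsigned h, m, s, ms;
    if (text == NULL || sscanf(text, "%2u:%2u:%2u.%3u", &h, &m, &s, &ms) != 4)
    {
        return -1;
    }
    return (((long)h * 60L + (long)m) * 60L + (long)s) * 1000L + (long)ms;
}
```

    Subtracting the result for "14:43:17.776" from the result for "14:43:18.076" gives 300.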

  • the 0xD2DBC crash is only because of the mdns.

    this started happening when i created servicepack version 3.23.96.0.

    have you disabled mdns on both, AP and station?

    if so and you still get it, maybe it is better to roll back to servicepack version 3.23.97.0 just to see if it still happens with only ping.