CC3200: EAP reauthentication fails

Part Number: CC3200

Hi.

We had seen this issue in the past in another customer's setup and now we are re-investigating this as it happened to another setup. 

It appears that our device will lose connectivity after about 500 minutes (written 500' below).

Here is what we saw:
- After 500' the device does an EAP re-authentication
- This succeeds according to the radius logs, plus there are no retries, so apparently all parties involved (radius, AP, device) know that this succeeded.
- The device is still connected to the AP. This is depicted in both the AP's client status and the device's wifi status
- There is no TCP/IP communication. The device knows that websocket is not connected and tries to connect to ws. This is shown in the device's statistics
- After 600 connection retries the device will trigger the wifi to reconnect and everything will get back to normal. The same thing will happen if the AP disconnects the device.
Although we could make the retries shorter and avoid the long websocket disconnect periods, we need TI's help to determine why communication stops even though the re-authentication succeeds.
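The retry-then-reconnect behaviour described above can be sketched as a small counter. This is only an illustration: `link_watchdog_t`, `link_watchdog_init` and `link_watchdog_report` are hypothetical names, not part of the SimpleLink API, and the actual Wi-Fi reconnect call is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a SimpleLink API): counts consecutive failed
 * websocket connection attempts and tells the caller when to force a
 * Wi-Fi reconnect. */
typedef struct {
    uint32_t consecutive_failures;
    uint32_t max_failures;  /* 600 in the setup described above */
} link_watchdog_t;

static void link_watchdog_init(link_watchdog_t *wd, uint32_t max_failures)
{
    assert(max_failures > 0);
    wd->consecutive_failures = 0;
    wd->max_failures = max_failures;
}

/* Record the outcome of one connection attempt.
 * Returns true when the caller should disconnect and reconnect the WLAN. */
static bool link_watchdog_report(link_watchdog_t *wd, bool connected)
{
    if (connected) {
        wd->consecutive_failures = 0;  /* any success clears the count */
        return false;
    }
    wd->consecutive_failures++;
    if (wd->consecutive_failures >= wd->max_failures) {
        wd->consecutive_failures = 0;  /* start over after the reconnect */
        return true;
    }
    return false;
}
```

Lowering `max_failures` shortens the outage after a failed reauthentication, but it only works around the symptom; it does not explain why traffic stops.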
The 8 hours seem to come from this:


3.5.  Exported and Calculated Key Lifetimes

   The following mechanisms are available for communicating the lifetime
   of keying material between the EAP peer, server, and authenticator:

      AAA protocols  (backend authentication server and authenticator)
      Lower-layer mechanisms (authenticator and peer)
      EAP method-specific negotiation (peer and server)

   Where the EAP method does not support the negotiation of the lifetime
   of exported EAP keying material, and a key lifetime negotiation
   mechanism is not provided by the lower layer, it is possible that
   there will not be a way for the peer to learn the lifetime of keying
   material.  This can leave the peer uncertain of how long the
   authenticator will maintain keying material within the key cache.  In
   this case the lifetime of keying material can be managed as a system
   parameter on the peer and authenticator; a default lifetime of 8
   hours is RECOMMENDED.

Some questions on our side:
- Does TI trigger the EAP reauthentication?
- If yes, is there any way to specify when this will happen?
- Does it handle the reauthentication regardless of who triggers it?
- Are the new keys derived from the reauthentication being used?
- Is there some way to know that this happened?

Thank you




  • Hi George,

    Firstly your questions:

    - Does TI trigger the EAP reauthentication?
    Yes, we can if our keys expire.
    - If yes, is there any way to specify when this will happen?
    No, there is a set timeout for this. I will get back to you on the exact timeout value.
    - Does it handle the reauthentication regardless of who triggers it?
    Yes, the AP can also trigger the reauth.
    - Are the new keys derived from the reauthentication being used?
    New keys are derived during the reauth process.
    - Is there some way to know that this happened?
    No, not currently.

    From your initial post, it seems that your issue isn't with the reauth event, but with your inability to reach your server afterwards. Do you have sniffer logs of this happening? Is there any way on the AP to see if those packets are getting out of the local network?

    Best Regards,

    Vince 

  • Vince,

    please find attached a sniffer capture over wifi. This shows single-sided communication. At the same time we had another device which does not exhibit the issue. It is still under observation, though: although it has passed the 500' mark, its timeout may be different. As this device is at about 600', this shows that the reauthentication is not AP triggered.

    After the EAP reauthentication there is no TCP/IP communication, ping also stops at that moment which is why I consider it an EAP issue. Even if DHCP expires after that there is no renewal (at the moment this happens DHCP is still valid).

    The good news is that I have one more clue: there is a SL_DEVICE_FATAL_ERROR_EVENT created. Unfortunately I was not logging the full event info, but we should have it after about 8h.

    Best regards

    GZeap_tls_fail.zip

  • Hi.

    It turns out the SL_DEVICE_FATAL_ERROR_EVENT was not triggered at the EAP reauthentication. It was due to DHCP failing afterwards (SL_ERR_SENDER_DHCP_CLIENT).

    Here is the SlDeviceEvent (bytes in decimal, little-endian):

    1 0 0 0 156 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    So at the time of EAP reauth the process seems to succeed but there is no TCP/IP communication.

    My suspicion is new keys not being applied correctly. Have you been able to dig into this any further?

    Thanks

    GZ

  • George,

    Can you convert your SlDeviceEvent and extract just the sl_DeviceReport struct? You should get an int8 for the status and a sender corresponding to who sent the error:

    /* Send types */
    typedef enum
    {
        SL_ERR_SENDER_HEALTH_MON,
        SL_ERR_SENDER_CLI_UART,
        SL_ERR_SENDER_SUPPLICANT,
        SL_ERR_SENDER_NETWORK_STACK,
        SL_ERR_SENDER_WLAN_DRV_IF,
        SL_ERR_SENDER_WILINK,
        SL_ERR_SENDER_INIT_APP,
        SL_ERR_SENDER_NETX,
        SL_ERR_SENDER_HOST_APD,
        SL_ERR_SENDER_MDNS,
        SL_ERR_SENDER_HTTP_SERVER,
        SL_ERR_SENDER_DHCP_SERVER,
        SL_ERR_SENDER_DHCP_CLIENT,
        SL_ERR_DISPATCHER,
        SL_ERR_NUM_SENDER_LAST = 0xFF
    } SlErrorSender_e;

    I'm still looking at how we can debug further. Are you able to get NWP logs? Your conclusion is plausible: if the EAP rekey fails, or uses an old key, DHCP would fail. I might be able to create something that tells us if we are reusing the old key, but the info would be printed in the NWP logs.

    Best Regards,

    Vince

  • Also,

    The sniffer capture doesn't appear to have much usable data. The data is encrypted, and other than the QoS data being sent out constantly I don't see anything that stands out. Do you have the ability to get an unencrypted sniffer capture?

    Regards,

    Vince 

  • Vince,

    the sender is 12, the DHCP client, and the status is 156 decimal.
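    The decoding above can be reproduced with a short sketch. It assumes (this is not confirmed in the thread) that the dump is a sequence of little-endian 32-bit words holding the event ID, the status and the sender, in that order. `decoded_event_t`, `read_le32` and `decode_device_event` are illustrative names, not SimpleLink APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: word 0 = event ID, word 1 = status (low byte),
 * word 2 = sender (index into SlErrorSender_e). */
typedef struct {
    uint32_t event;
    int8_t   status;  /* status byte reinterpreted as a signed 8-bit value */
    uint32_t sender;
} decoded_event_t;

/* Assemble a 32-bit value from four little-endian bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static decoded_event_t decode_device_event(const uint8_t *raw, size_t len)
{
    decoded_event_t out;
    int status_byte;

    assert(len >= 12);  /* three 32-bit words are needed */
    out.event = read_le32(raw);
    status_byte = raw[4];
    /* Two's-complement interpretation of the byte: 156 becomes -100. */
    out.status = (int8_t)(status_byte >= 128 ? status_byte - 256 : status_byte);
    out.sender = read_le32(raw + 8);
    return out;
}

/* First 12 bytes of the dump posted earlier in the thread. */
static const uint8_t posted_dump[12] = { 1, 0, 0, 0, 156, 0, 0, 0, 12, 0, 0, 0 };
```

    Under that assumption the posted bytes decode to event ID 1, a status byte of 156 (which is -100 when read as an int8), and sender 12, the position of SL_ERR_SENDER_DHCP_CLIENT in the enum quoted above.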

    I do not have an unencrypted sniffer capture, but the fact that it is single-sided and the device tries to broadcast something which apparently is not understood by the AP is what leads me to believe the keys do not match.

    I will let you know of any more information we can gather. Have you been able to replicate this on your end? Do you see the same EAP reauthentication on your setup? Does it succeed?

    Best regards

    GZ

  • George,

    What specific EAP protocol are you using? EAP-TLS?

    Trying to reproduce now.

    Best Regards,

    Vince 

  • Right, EAP-TLS.

    Thank you

    GZ

  • Thanks George,

    Haven't been able to reproduce the issue. Can you get the server config from the enterprise setup? Want to make sure our setup is the same.

    Best Regards,

    Vince 

  • Hi Vince,

    we have been able to reproduce the issue in our customer's lab environment. I have attached the resulting NWP log file produced by a device that did the reauthentication and failed to connect to our server afterwards. The device did also participate in the pattern described in another E2E ticket that we opened recently (see link below).

    Please take a look at these logs and get back to us with results as soon as you can. Thanks!

    7633.nwpLog0000.bin.zip

  • Hi Lutz,

    Thanks for providing the log. I am able to see the issue you are describing. From the logs, I see the reauth from the AP and I see us perform the reauth successfully, but all TCP traffic after this fails. You then close the socket and do a gethostbyname(), which fails every time because of a retry timeout. After X retries, you reset and reconnect to the AP and are able to send data successfully.

    I'm working with Michael from the other E2E thread on recreating and diagnosing the issue. Any information you can provide to Michael on how the Radius server in your lab setup works that would be very helpful. We should be able to recreate it once we get the Radius server configured correctly.

    If you have a sniffer capture of this issue as well, that would be helpful in case we can't reproduce. We will get back to you with our findings sometime early next week.

    Best Regards,
    Vince 

  • Hi Vince,

    this particular issue does not actually require a special setup. I was able to see this exact pattern when I used an off-the-shelf FreeRadius installation as Radius server. I ran that server using the default configuration in my local network. The AP that I used was an older TP-Link consumer model (TL-WR941ND).

    The test devices that I used were configured to do eap-peap0-mschapv2. In an effort to narrow down the cause of the problem I ran the same test using 2 different service pack versions: 2.9.0.0 and 2.12.99.6 (aka the engineering service pack that we received from you earlier in 2019).

    The result of this testing was that ALL devices were doing reauthentications after ~500 minutes. This is our first problem. We are seeing this same pattern across different customer sites and local tests. Can you tell us if this time interval is used by the CC3200 as a default when neither the AP nor the AAA server provides an authentication timeout? We need to figure out a way to remove the periodic reauthentications altogether, at least for installations where they are not desired by our customers.

    The second finding of our tests is that the TCP/IP connection issue only shows when using the engineering service pack (2.12.99.6). Please note that for all of these tests we have disabled the server certificate verification:

    uint8_t t = wifi.secServerVerify;
    sl_WlanSet(SL_WLAN_CFG_GENERAL_PARAM_ID, 19, 1, &t);

    I have taken sniffer captures for all of the above in my local setup. The problem is that all communication during and after the reauthentication is happening within a secure tunnel. So the only thing that you can see in these captures is a change of the communication pattern (QoS packets). Not the contents of the actual reauthentication events.

    Please get back to us as soon as you can and let us know in case you need any more information.

    Thanks,

    Lutz

  • Lutz,

    We are still trying to reproduce this issue locally. I have a few questions for you:

    1. Can you describe more how 2.9 Service pack does not exhibit the issue? This service pack would have none of the changes we had made in the engineering service pack, thus skip server verification would not be persistent. You should then be getting disconnected on a reauth event, and having to reconfigure the skip server verification and perform a wlan connect. Is this the case?

    2. Can you describe your test setup for the freeradius enterprise network? If possible, please let us know the AP you are using, and send us the configuration files you modified in freeradius to get the reauth to occur.

    Thanks,

    Vince 

  • Hi Vince.

    We observed that with EAP TLS so I do not think that the server verification applies and in any case the certificates all matched.

    I do not think the timeout comes from freeradius. I actually tried setting Session-timeout = 60 in the freeradius configuration, but it didn't seem to make a difference, so the reauthentication must be triggered by something else. If it is not the device itself, then maybe it is the AP. In our setup we used TP-Link APs. I am not sure what the customers are using.

    Lutz do you happen to know?

    Best regards

    GZ

  • Hi Vince,

    1. What I've seen with version 2.9.0.0 is that my test devices started a reauthentication after being connected for ~500 minutes. As far as I could see, the reauthentication succeeded and did not cause any noticeable downtime. That means that the device stayed connected to our websocket and we did not count any reconnects at the time of the reauthentication. Since the skip server verification setting does not persist with this service pack, this is indeed a puzzling outcome.

    2. I have attached the full FreeRadius folder as installed on my Mac for your review. The AP that's used is an old TP-Link consumer model (TL-WR941ND).

    Is there any news that you can share regarding the periodic reauthentications? Since we're seeing that these occur across different networks and AP vendors we suspect that they could be a hard coded default in your NWP stack. Can you confirm?

    Thanks,

    Lutz

    freeradius-server.zip

  • Hi Lutz,

    The reauths that I see are due to either the Radius server or AP settings. On the Radius side, you'll find a session-timeout parameter = 500; this should reauth all devices every 500 mins.

    Once this is configured, the AP has to be configured to accept session configuration options from the Radius server. In our Cisco router, that was under security, advanced security, and then timers. You can either use the Radius server config or set a new interval of your own on the AP. I would check these parameters at the different networks to see if this is the cause of the reauths.

    BR,

    Vince

  • Hi George,

    I saw the same behavior with the configuration not working. The AP has to be configured to accept the session timeout parameter for this to work.

    BR,

    Vince 

  • Hi Lutz, George,

    We have identified a possible cause of the issue, and we are currently planning on testing the engineering service pack before providing it to you. I'll provide this to you over email as soon as possible.

    BR,

    Vince