CC2538: Occasionally (very rarely) a device “stops communicating”

Part Number: CC2538
Other Parts Discussed in Thread: Z-STACK

Hello,

We have been running a Zigbee network based on the CC2538 for more than 2 years (stack 1.2.0). In the last month we have been experiencing problems with a very small number of devices that stopped sending packets and stopped responding to remote requests.

These devices are installed at “test” customers, not in our lab, so we have little feedback on the exact conditions under which the loss of communication occurred.

We were able to reproduce the symptoms reported by customers (devices stop responding to remote orders), but not under the same conditions. To reproduce them we had to use an autotransformer to lower, zero and raise the line voltage while sending massive traffic to the devices, until they stopped responding to remote orders. With the problem present, the devices keep working locally (local tasks run, keys are read, …). We are almost sure that the customers' devices did not suffer a reset of any kind (power-down, watchdog, brownout), because they keep a local log in an external flash.

Note that we already have code that forbids writing to the NV memory when VDD is lower than 2.7 V.

This is what we observed after a “line voltage attack” plus massive communication to a device:

Symptom A - The device constantly reads a wrong NWKKEY (0x00, 0xFF…) from the NV memory. The device transmits packets, but other devices reject them. A normal power-down/power-up of the device “solves” the problem and the NWKKEY is read correctly again.

Symptom B - The device restarts its frame counter. The device transmits packets, but other devices reject them because the frame counter is old. A normal power-down/power-up does not solve the problem. All other devices must be switched off so that they forget the frame counter and start accepting the packets.

Symptom C - The device sends INTERPAN packets (we do not use INTERPAN). The device sends packets with its own ID (correct PANID and correct short address), but instead of sending a message to the coordinator in the same PANID, it sends to a coordinator with PANID=0xFFFE. Power-down/power-up does not solve the problem. The device must be removed from the network and joined again.

(We attached a sniffer log of our device, short address 0x060E, trying to communicate with PANID=0xFFFE; see packet number 47 and following.)

Symptom D - The device thinks it is out of the network, so it does not transmit any packets (devstate = DEV_HOLD). After a power-down/power-up the device is OK, as if nothing had happened. One of the customer devices behaved like this.
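
To illustrate why Symptom B persists until the other devices forget their stored state, here is a minimal sketch of a receiver-side frame counter check. It is a simplified illustration, not the Z-Stack implementation: the `neighbor_entry_t` type and the `frame_counter_accept()` function are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-neighbor record (not a Z-Stack structure). */
typedef struct {
    uint32_t last_frame_counter; /* highest counter accepted so far */
    bool     seen;               /* false until the first frame is accepted */
} neighbor_entry_t;

/* Accept a frame only if its counter is strictly newer than the last
 * accepted one. A sender whose counter restarts from a low value is
 * rejected until the receiver's record is cleared. */
bool frame_counter_accept(neighbor_entry_t *n, uint32_t rx_counter)
{
    if (n->seen && rx_counter <= n->last_frame_counter) {
        return false; /* looks like a replay or a restarted counter */
    }
    n->last_frame_counter = rx_counter;
    n->seen = true;
    return true;
}
```

Under this assumption, a sender that drops from counter 1001 back to 5 keeps being rejected, and clearing the receiver's record (for example by power-cycling it) is what makes the low counter acceptable again.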

 

Of the 4 devices reported so far, none has shown the same problem again after being power-cycled or rejoined (1 month ago). The 4 symptoms described above were provoked by us.

Has anyone experienced similar problems where a device “stops communicating”? Does anyone know whether there are functions in the Z-Stack code that can cause this “loss of communication”? Can anyone help with our analysis of this problem?

 

Best Regards

 

nalves

Dispositivo fora de rede apos sair de PROG.psd

  • Hi Nalves,

    We are looking into this, I will post any updates here.
  • Hi Nalves,

    The Z3.0 release notes say there was a fix for the NV items holding the NWK key on the CC2538. Does symptom A happen after flashing the device? If so, the Z3.0 fix may solve this.

    The related change for this fix is in osal_nv.c:

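    static uint8 writeItem( uint8 pg, uint16 id, uint16 len, void *buf, uint8 flag )
    {
    
     ...
    
          // Continue only if the checksum calculated over the data in flash equals chk.
          if ( chk == calcChkF( pg, datOff, len ) )
          {
            // Write the checksum into the item header, then read the header back.
            hdrData[0] = chk;
            flashWrite(OSAL_NV_PAGE_TO_PTR(pg) + hdrOff + OSAL_NV_HDR_CHK,
                       OSAL_NV_HDR_ITEM, (uint8 *)(hdrData));
            readHdr( pg, hdrOff, (uint8 *)(&hdr) );
    
            // Report success only if the checksum read back from the header matches.
            if ( chk == hdr.chk )
            {
              hotItemUpdate(pg, datOff, hdr.id);
              rtrn = TRUE;
            }
          }
        }
    
    ...
    
    }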

  • Hello Luis,

    Glad to know that someone else has apparently had the same problem. Did you see the same problem yourself to reach that conclusion? I tested the fix, but the problem still happens, although less frequently. When it happens, the NV read of ZCD_NV_NWK_ACTIVE_KEY_INFO (0x003A) returns 0x00, 0xFF ... and I don't know why it has this value.

    Best regards
    Nalves
  • Hi Nalves,

    Are you saying that even with the fix that Luis proposed, you're still seeing the same issue with the NWK key in NV?
  • Hi Jason,

    The fix proposed by Luis did not solve Symptom A, but I did notice an improvement: the problem became harder to reproduce.

    However, this proposal made me realize that Z-Stack does not cope well when osal_nv_item_init() and osal_nv_write() return NV_OPER_FAILED because the OSAL_NV_CHECK_BUS_VOLTAGE check fails (bus voltage below the minimum value).

    To fix it, I replaced the condition “if ( !OSAL_NV_CHECK_BUS_VOLTAGE ) return NV_OPER_FAILED;” with:

    while ( !OSAL_NV_CHECK_BUS_VOLTAGE );

    //if ( !OSAL_NV_CHECK_BUS_VOLTAGE )
    //{
    //  return NV_OPER_FAILED;
    //}

    at the first line of the functions osal_nv_item_init(), osal_nv_write() and osal_nv_read(). With this change, any NV memory access only starts once the minimum-voltage check "OSAL_NV_CHECK_BUS_VOLTAGE" passes.

    With this change, I can no longer reproduce any of the problems described in the first post. However, I don't understand how a bad start, with a wrongly initialized NV item, would only have an effect a few days after the device starts up. I say this because in the field, the devices that had problems only showed the malfunction after a few days of normal operation.

    best regards,

    nalves
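
The blocking wait described in this post spins until the voltage check passes. As a hedged variation, the wait could be bounded so that a supply that never recovers does not block the caller indefinitely. The sketch below only illustrates that idea and is not taken from Z-Stack: `nv_wait_for_voltage()`, the `NV_*_SKETCH` codes, and the simulated supply (`sim_low_polls_remaining`, `sim_voltage_ok()`) are all hypothetical names; in real firmware the callback would wrap the actual voltage check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch (not Z-Stack values). */
#define NV_OK_SKETCH      0u
#define NV_FAILED_SKETCH  1u

/* Callback returning true when the supply is high enough for NV access. */
typedef bool (*voltage_ok_fn)(void);

/* Poll the voltage check at most max_polls times.
 * Returns NV_OK_SKETCH as soon as a poll succeeds,
 * NV_FAILED_SKETCH if every poll reports low voltage. */
uint8_t nv_wait_for_voltage(voltage_ok_fn voltage_ok, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (voltage_ok()) {
            return NV_OK_SKETCH;
        }
    }
    return NV_FAILED_SKETCH;
}

/* Simulated supply for demonstration only: reports "low" for the
 * next sim_low_polls_remaining polls, then "ok". */
uint32_t sim_low_polls_remaining = 0;

bool sim_voltage_ok(void)
{
    if (sim_low_polls_remaining > 0) {
        sim_low_polls_remaining--;
        return false;
    }
    return true;
}
```

With the simulated supply set to stay low for 3 polls, `nv_wait_for_voltage(sim_voltage_ok, 10)` succeeds on the fourth poll. If the supply stays low for longer than the budget, the call returns the failure code, and the caller then has to handle that failure explicitly (for example by retrying later) rather than continuing with uninitialized data.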

  • Hi Jason,

    In addition to this information, I checked that in ZDSecMgrNwkKeyInit(), if the initialization of any of the following items fails (returns NV_OPER_FAILED)

    - osal_nv_item_init( ZCD_NV_NWKKEY, sizeof(nwkActiveKeyItems), (void *)&keyItems );
    - osal_nv_item_init( ZCD_NV_NWK_ACTIVE_KEY_INFO, sizeof(nwkKey), &nwkKey);
    - osal_nv_item_init( ZCD_NV_NWK_ALTERN_KEY_INFO, sizeof(nwkKey), &nwkKey );

    which are the hot items (hotPg[], hotOff[]), their entries will be set to 0x00.

    The NV memory can be corrupted with hotPg[] and hotOff[] equal to 0x00, and since these values are never initialized again, the device never recovers communication with the network without a restart!

    Best regards
    Nalves
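
One way to picture the concern raised in this post is a location cache that treats a zero entry as "not located yet" and retries the lookup on the next access, instead of using the zero as a valid location. This is a minimal sketch under stated assumptions, not Z-Stack code: `hot_cache_t`, `hot_item_lookup()`, `HOT_INVALID_PG`, and the simulated locator (`sim_nv_available`, `sim_locate()`) are hypothetical, and treating page 0 as invalid is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define HOT_ITEM_COUNT 3u
#define HOT_INVALID_PG 0u /* assumption: page 0 means "not located" */

/* Hypothetical cache of item locations, modeled on the hotPg[]/hotOff[] idea. */
typedef struct {
    uint8_t  pg[HOT_ITEM_COUNT];
    uint16_t off[HOT_ITEM_COUNT];
} hot_cache_t;

/* Callback that finds an item in NV; returns false if NV is unavailable. */
typedef bool (*locate_fn)(uint8_t idx, uint8_t *pg, uint16_t *off);

/* Look up a cached location. If the entry was never filled in (for example
 * because initialization failed at low voltage), try to locate it again
 * rather than returning the zero entry as if it were valid. */
bool hot_item_lookup(hot_cache_t *c, uint8_t idx, locate_fn locate,
                     uint8_t *pg, uint16_t *off)
{
    if (idx >= HOT_ITEM_COUNT) {
        return false;
    }
    if (c->pg[idx] == HOT_INVALID_PG) {
        uint8_t  p;
        uint16_t o;
        if (!locate(idx, &p, &o)) {
            return false; /* still unavailable: report it, cache stays empty */
        }
        c->pg[idx]  = p;
        c->off[idx] = o;
    }
    *pg  = c->pg[idx];
    *off = c->off[idx];
    return true;
}

/* Simulated NV locator for demonstration only. */
bool sim_nv_available = false;

bool sim_locate(uint8_t idx, uint8_t *pg, uint16_t *off)
{
    if (!sim_nv_available) {
        return false;
    }
    *pg  = (uint8_t)(idx + 1u);
    *off = (uint16_t)(0x0100u + 0x10u * idx);
    return true;
}
```

In this sketch a failed first lookup leaves the cache empty and reports the failure, and a later lookup, once the simulated NV is available, fills the entry in without needing a restart.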
  • Hi Nalves,

    Can you clarify what outstanding issues you still have? Just want to make sure I fully understand before I answer again.
  • Hello Jason,

    OK, let me clarify. We are developing a system based on Zigbee communication between devices using the HA profile. We are using the CC2538 and Z-Stack 1.2.0. This solution has been working almost fine for 3 years. A few weeks ago we detected that some devices apparently stop communicating with other devices in the network (they stop sending packets and stop responding to remote requests). In our lab we tried to reproduce the reported symptoms, and we could do so by constantly turning the device on and off while sending packet requests at the same time; after a few seconds, the device stops communicating. Internally the device works fine (it responds to key presses, turns the local actuator on and off, etc.), but on the network it is dead.


    We made some changes to Z-Stack; one of them was the minimum supply voltage required to initialize and write information in the NV memory. We changed it from 2.0 V to 2.5 V.


    Our impression is that there are problems with NV memory operation, because we see different symptoms (described in the first post) for the same problem, and we think they are related to parameters that are read from or written to NV.
    Hope this helps. If you still have doubts, please feel free to ask me.

    Best regards
    Nalves

  • Hi Nalves,

    We have one other patch available that deals with low voltage situations and NV operations. I will need to send it to you via a private message, so I will send you an e2e friendship request so we can proceed.