SimpliciTI Join/Link problem

Hi,

I designed and coded a SimpliciTI-based system that is configured for an AP and 32 EDs. It was a requirement that the AP initiate a join/link when a new device is to be added to the system. The join/link mechanism is closely coupled, and I discovered that if many EDs request a join/link simultaneously, the protocol cannot handle it and the AP sometimes loses join or link requests. A lost join is not a problem, because the AP replies to subsequent joins. However, if a link request is lost, the AP never releases the linkid for that lost link; it keeps the entry in a joined-but-not-linked state. Since the device is already joined, further joins have no effect, and the semaphore is not passed up to the application, so the application is never notified. Further link requests after the link has timed out on the AP are ignored, as they are no longer part of the join/link mechanism. I did include a random backoff in the EDs in response to the broadcast join/link trigger, but once in a while they still collide and cause the problem.

Just wondering if someone clever at TI has a suggestion for this, rather than hacking the join/link code to discard table entries for failed join/links?

Regards

Simon Buchwald

  • Hi Simon,

    Your question was not really clear.

    You mentioned:

    simon buchwald said:

    It was a requirement that the AP initiate a join/link when a new device was to be added to the system. 

    Does this mean that the AP is the one that calls SMPL_Link() and the ED calls SMPL_LinkListen() during initialization? What does the AP actually do to initiate the join/link?

    However you mentioned also:

    simon buchwald said:

    I discovered that if many ED's make a request to join/link simultaneously

    I am guessing it is the ED that sends the Link Request; however, I am not sure I understand why the AP would not send a Link Response if the first Link Request from an ED collides with a message from another ED.

  • Hi Leo,

    Thanks for your response. It is all rather complicated and I tried to explain it as tersely as possible.

    For a SimpliciTI system, it is the EDs that make the join/link requests. In this system, the EDs are enclosed in a housing with no buttons, so there is no way to initiate a join/link manually. I configured the AP to keep joining disabled in its usual mode and then, on a user menu command, to open up join/link in the AP and send a broadcast message to all EDs. All EDs are always powered. The EDs receive the broadcast joinlink control message, process it, and then initiate a joinlink poll to the waiting AP. I copied and modified the randomdelay backoff function to give 1 of 128 delays at 10 ms granularity, and the EDs then poll the AP. Watching the packets play out on the sniffer, I can see that all EDs always send their join and then link requests; however, the AP sometimes does not respond to a link request. When this occurs, the waitforlink code in the AP always times out after the default configured 5-second timeout.
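
A minimal sketch of the backoff described above, assuming the delay is derived from a single random byte. The names `backoff_delay_ms`, `BACKOFF_SLOTS` and `BACKOFF_SLOT_MS` are hypothetical and are not taken from the SimpliciTI sources or from the modified randomdelay function.

```c
#include <assert.h>
#include <stdint.h>

#define BACKOFF_SLOTS   128u  /* number of distinct delay slots (assumed) */
#define BACKOFF_SLOT_MS 10u   /* width of one slot in milliseconds (assumed) */

/* Map a random byte to a delay of 0..1270 ms in 10 ms steps.
 * The random byte is expected to come from whatever random source
 * the platform provides. */
static uint16_t backoff_delay_ms(uint8_t random_byte)
{
    uint16_t slot = (uint16_t)(random_byte % BACKOFF_SLOTS);
    assert(slot < BACKOFF_SLOTS);
    return (uint16_t)(slot * BACKOFF_SLOT_MS);
}
```

With 32 EDs each drawing one of 128 slots, a birthday-style estimate says that at least two of them will very probably pick the same slot, which is consistent with occasional collisions even when the backoff is in place.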

    And yes, hence my question to TI: why does the AP not send a Link Response to an ED whose request notionally "collides" with a message from another ED, from the AP's point of view? It is clearly something going on in the AP, since the sniffer can always see the missed packet. As yet I have not been able to figure it out, but I can make it happen easily if I take out the random backoff and make many EDs initiate join/link almost simultaneously in response to the AP's broadcast joinlink radio packet command.

    (Actually the system is somewhat more complex. I modified the frequency table to have 6 channels, 100 to 125: one as a control channel for the join/link, and the other 5 as user-selectable data channels, as there are to be multiple APs within reasonable proximity, each configured to one of the clear channels. However, the missed link packet issue also occurred on my initial system, which does not flip between different frequencies. I did this because I discovered that two APs with many EDs in close proximity interfered with the usability of the system, since the AP was always running radio code, presumably discarding unwanted packets, rather than responding to user interface commands and displays. I did not use AGILITY, as it always does a frequency poll before join/link and I did not want this to interfere further, so I implemented a manually selectable channel scheme.)

    Hope that helps clarify my question.

    Regards

    Simon Buchwald

  • Hi Leo,

    Did my reply give you more detail? How do I go about getting assistance on this issue? 

    Regards

    Simon Buchwald

  • Hi Simon,

    Sorry for the late response. I understand the problem better now, based on your last reply.

    Looking into the SimpliciTI code v1.1.1/v1.2.0, I think I have found the cause of this problem.

    Look at smpl_send_link_reply() in nwk_link.c. The node that called SMPL_LinkListen() uses this function to send the LinkReply when it receives a LinkRequest from a node that called SMPL_Link(). The LinkReply is sent with the TX option MRFI_TX_TYPE_FORCED; look for nwk_sendFrame(pOutFrame, MRFI_TX_TYPE_FORCED).

    There are basically two conditions under which smpl_send_link_reply() sends the LinkReply:

    - First, if the LinkRequest is a duplicate (the link is already established, i.e. the connection state is CONNSTATE_CONNECTED from the local node's point of view). This is coded as follows:

    /* is this a duplicate request? */
    remotePort = *(MRFI_P_PAYLOAD(frame)+F_APP_PAYLOAD_OS+L_RMT_PORT_OS);
    if (pCInfo=nwk_isLinkDuplicate(MRFI_P_SRC_ADDR(frame), remotePort))
    {
      .............

      if (pOutFrame = nwk_buildFrame(SMPL_PORT_LINK, msg, sizeof(msg), MAX_HOPS-(GET_FROM_FRAME(MRFI_P_PAYLOAD(frame),F_HOP_COUNT))))
      {
        /* destination address is the source adddress of the received frame. */
        memcpy(MRFI_P_DST_ADDR(&pOutFrame->mrfiPkt), MRFI_P_SRC_ADDR(frame), NET_ADDR_SIZE);
    #if defined(SMPL_SECURE)
        nwk_setSecureFrame(&pOutFrame->mrfiPkt, sizeof(msg), 0);
    #endif /* SMPL_SECURE */
        nwk_sendFrame(pOutFrame, MRFI_TX_TYPE_FORCED);
      }
      return SENT_REPLY;
    }

    - Second, if the LinkRequest is a completely new one:

    if (pCInfo)
    {
      /* yes there's room and it's not a dup. address. */
      memcpy(&pCInfo->peerAddr, MRFI_P_SRC_ADDR(frame), NET_ADDR_SIZE);

      .....................

      if (pOutFrame = nwk_buildFrame(SMPL_PORT_LINK, msg, sizeof(msg), MAX_HOPS-(GET_FROM_FRAME(MRFI_P_PAYLOAD(frame),F_HOP_COUNT))))
      {
        /* destination address is the source adddress of the received frame. */
        memcpy(MRFI_P_DST_ADDR(&pOutFrame->mrfiPkt), MRFI_P_SRC_ADDR(frame), NET_ADDR_SIZE);
    #if defined(SMPL_SECURE)
        nwk_setSecureFrame(&pOutFrame->mrfiPkt, sizeof(msg), 0);
    #endif
        if (SMPL_SUCCESS != nwk_sendFrame(pOutFrame, MRFI_TX_TYPE_FORCED))
        {
          /* better release the connection structure */
          nwk_freeConnection(pCInfo);
        }
      }
      else
      {
        /* better release the connection structure */
        nwk_freeConnection(pCInfo);
      }
    }

    /* we're done with the packet */
    return SENT_REPLY;

    }

    In your case, I believe it should be the second one. Could you check, e.g. by placing a breakpoint in debug mode, whether my assumption is correct?

    If my assumption is true, then the problem happens because the LinkReply is sent with the MRFI_TX_TYPE_FORCED option, so the node doesn't care whether the message collides with other messages on the air.

    The easiest way to solve this problem, in my opinion, is to change the transmit option from MRFI_TX_TYPE_FORCED to MRFI_TX_TYPE_CCA. Could you please try that and let me know the result?
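
To illustrate the difference between the two transmit options, here is a small self-contained model. It is only a sketch: `tx_type_t`, `tx_result_t`, `channel_t`, `send_frame` and `CCA_MAX_TRIES` are hypothetical stand-ins invented for this example and are not the MRFI API, and the mock channel simply reports "busy" for a fixed number of samples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch only; not the MRFI API. */
typedef enum { TX_FORCED, TX_CCA } tx_type_t;
typedef enum { TX_OK, TX_COLLIDED, TX_CCA_FAILED } tx_result_t;

#define CCA_MAX_TRIES 4u  /* assumed retry limit */

/* Mock channel: reports busy for the next `busy_checks` samples. */
typedef struct { uint8_t busy_checks; } channel_t;

static bool channel_is_busy(channel_t *ch)
{
    if (ch->busy_checks > 0u) {
        ch->busy_checks--;
        return true;
    }
    return false;
}

static tx_result_t send_frame(channel_t *ch, tx_type_t type)
{
    assert(ch != NULL);

    if (type == TX_FORCED) {
        /* Forced: transmit immediately, whatever is on the air. */
        return channel_is_busy(ch) ? TX_COLLIDED : TX_OK;
    }

    /* CCA: sample the channel first and retry while it is busy. */
    for (uint8_t attempt = 0u; attempt < CCA_MAX_TRIES; ++attempt) {
        if (!channel_is_busy(ch)) {
            return TX_OK;
        }
        /* A real radio would wait a backoff period here. */
    }
    return TX_CCA_FAILED;
}
```

In this model a forced send into a busy channel collides, while a CCA send defers until the channel clears or gives up after the retry limit. A CCA failure is still a failed send, so the caller has to handle it.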

  • Hi Leo,

    Thanks for your reply, I was on vacation when you posted it and have now just got back to work.

    I put the breakpoint in where you requested, and it does stop there. However, I do not see how this gives any useful information, since every valid link reply should stop there when there is no duplicate address. I wrote a "delete all" function so that the sPersistInfo table is empty at the start, so the code executes this case for every device joining. The problem, which you will see in the log, is that the AP does not reply to some link requests.

    I tried the suggestion, and in fact changed both of those calls to MRFI_TX_TYPE_CCA, but there is no difference in the performance. I took a sniffer log of the over-the-air transactions and have attached it for you to consider further.

    Regards

    Simon Buchwald

    JoinLinkProblem.psd
  • Hi Leo,

    I spent the last 12 hours debugging this, not a pretty sight, there are actually several problems:

    Firstly:

    I put debug statements, "Got Here N", after each logical block in smpl_send_link_reply. What I found is that the code is synchronised with an unprotected semaphore, sListenActive. Basically, smpl_send_link_reply is executed in the receive interrupt, and LinkListen sets and resets sListenActive asynchronously to the interrupt. What I suspect is that LinkListen and smpl_send_link_reply get out of sync somehow, and smpl_send_link_reply ditches the reply because sListenActive is not set. Well, that's the debug behaviour I found anyway. There is some strange stuff going on: of 6 joinlink requests, if the second one is missed, it comes out at the end and LinkListen fails. To get around this I commented out the following code: if (!sListenActive) return SENT_NO_REPLY; and I have not seen the issue since.

    This is OK in my system, as I explicitly enable the join window when joining and disable it when finished. Nevertheless, it's a nasty bug.
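
The suspected behaviour can be modelled in a few lines. This is a simplified sketch, not library code: the names mirror those in the thread, but `conn_state_t`, `reply_t`, `listen_window_open`, `listen_window_close` and `on_link_request` are hypothetical, and the real handler does far more work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names mirror the thread but this is not library code. */
typedef enum { CONN_FREE, CONN_JOINED, CONN_CONNECTED } conn_state_t;
typedef enum { SENT_NO_REPLY, SENT_REPLY } reply_t;

/* Written by the main code, read in interrupt context. */
static volatile bool sListenActive = false;

static void listen_window_open(void)  { sListenActive = true;  }
static void listen_window_close(void) { sListenActive = false; }

/* Stands in for the link-request handling done in the receive interrupt. */
static reply_t on_link_request(conn_state_t *entry)
{
    assert(entry != NULL);

    if (!sListenActive) {
        /* Request dropped: no reply, and the entry stays as it was. */
        return SENT_NO_REPLY;
    }
    *entry = CONN_CONNECTED;
    return SENT_REPLY;
}
```

A request that arrives while the flag is clear gets no reply and leaves the entry joined but not linked, which matches the stuck state described at the start of the thread. Whether the flag is really clear at that moment in the stack is the suspicion being raised here, not something this model proves.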

    Secondly:

    I also put debug output at the beginning of the RX IRQ and printf the received data length, so I know that the AP is getting the join/link requests, because the length is correct every time. Sometimes the receive interrupt ditches the link request before it even gets to smpl_send_link_reply. I have not actually debugged this yet because it is hard to reproduce, but I have a SimpliciTI log and serial debug output that match.

    Any chance someone at TI responsible for SimpliciTI can confirm my suspicions, please?

    Regards

    Simon Buchwald

  • Hi Leo,

    Any chance you could take a look at this again sometime soon please?

    Regards

    Simon Buchwald

  • Further to this, the change did not work for the ED code, as multiple EDs joined to each other when they shouldn't have.

    I made the check a conditional compile, so it is only done for EDs, while the AP is always listening. That way there is no window in which the main code turns off listening and a join/link could be attempted but missed.
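
A minimal sketch of such a conditional compile, assuming the AP image is built with an `ACCESS_POINT` define and the ED image without it. The helper `link_request_allowed` is hypothetical; in the real code the check sits inline in the link-request handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: defined for the AP build only; remove for an ED build. */
#define ACCESS_POINT

/* Decide whether an incoming link request may be answered. */
static bool link_request_allowed(bool listen_active)
{
#if defined(ACCESS_POINT)
    (void)listen_active;   /* AP: always answer, no listen window */
    return true;
#else
    return listen_active;  /* ED: answer only inside the listen window */
#endif
}
```

Keeping the check on the EDs preserves the guard against EDs answering each other's link requests, while the AP no longer depends on the timing of the listen flag.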

    It would be very nice if TI could possibly find some spare CPU cycles to attend to this issue.

    Regards

    Simon Buchwald

  • I think I am the guy who really understands you and your problem, because I have the same one!

    I found the cause, modified the nwk.c file, and it works fine now. Try it.

    I think this is a bug that must be fixed. It occurs very easily when the AP is busy and loses the first Link request.

    The EDs will then never be Linked, but they are always Joined.

    nwk.c line 379 is the key point, I think:
    /******************************************************************************
    * @fn nwk_isLinkDuplicate
    *
    * @brief Help determine if the link has already been established.. Defense
    * against duplicate link frames. This file owns the data structure
    * so the comparison is done here.
    *
    * input parameters
    * @param addr - pointer to address of linker in question
    * @param remotePort - remote port number provided by linker
    *
    * output parameters
    *
    * @return Returns pointer to connection entry if the address and remote Port
    * match an existing entry, otherwise 0.
    */
    connInfo_t *nwk_isLinkDuplicate(uint8_t *addr, uint8_t remotePort)
    {
    #if NUM_CONNECTIONS > 0
      uint8_t i;
      connInfo_t *ptr = sPersistInfo.connStruct;

      for (i=0; i<NUM_CONNECTIONS; ++i,++ptr)
      {
        /* Original check, which is the cause of the problem: */
        /* if (CONNSTATE_CONNECTED == ptr->connState) */
        /* Fix: also match entries in CONNSTATE_JOINED. */
        if (CONNSTATE_CONNECTED == ptr->connState || CONNSTATE_JOINED == ptr->connState)
        {
          if (!(memcmp(ptr->peerAddr, addr, NET_ADDR_SIZE)) &&
              (ptr->portTx == remotePort))
          {
            return ptr;
          }
        }
      }
    #endif

      return (connInfo_t *)NULL;
    }
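
The effect of the changed condition can be checked off-target with a stand-alone model of the lookup. This is a sketch under assumptions: `conn_info_t`, `find_duplicate` and the `match_joined` switch are hypothetical, the table is passed in rather than read from sPersistInfo, and the values of `NET_ADDR_SIZE` and `NUM_CONNECTIONS` are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NET_ADDR_SIZE   4  /* placeholder value for this sketch */
#define NUM_CONNECTIONS 4  /* placeholder value for this sketch */

typedef enum { CONNSTATE_FREE, CONNSTATE_JOINED, CONNSTATE_CONNECTED } conn_state_t;

/* Reduced, hypothetical version of a connection table entry. */
typedef struct {
    conn_state_t connState;
    uint8_t      peerAddr[NET_ADDR_SIZE];
    uint8_t      portTx;
} conn_info_t;

/* Return the entry matching addr and remote_port, or NULL.
 * match_joined = false models the original check,
 * match_joined = true models the modified one. */
static conn_info_t *find_duplicate(conn_info_t *table, const uint8_t *addr,
                                   uint8_t remote_port, bool match_joined)
{
    assert(table != NULL && addr != NULL);

    for (uint8_t i = 0; i < NUM_CONNECTIONS; ++i) {
        conn_info_t *e = &table[i];
        bool candidate = (e->connState == CONNSTATE_CONNECTED) ||
                         (match_joined && e->connState == CONNSTATE_JOINED);
        if (candidate &&
            memcmp(e->peerAddr, addr, NET_ADDR_SIZE) == 0 &&
            e->portTx == remote_port) {
            return e;
        }
    }
    return NULL;
}
```

With an entry left in CONNSTATE_JOINED, the original condition reports no duplicate and the modified one returns that entry. What the caller then does with a matched entry decides whether a reply is actually sent, so the calling code is worth checking alongside this change.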

  • Dear Mr Jiang,

    Could you give me your email? I would like to make friends with you. My email is 22626031@qq.com

  • Hi Leo,

    Our system is based on an AP and a number of EDs. We use nv_obj_write_nwk_cfg () to save the connection information 'sPersistInfo' to the flash of the CC1110. This works for "simple peer to peer", but for 'AP as Data Hub' the system does not work. The bug is as follows:

    Even if the AP and EDs have stored 'sPersistInfo', at the next power cycle an ED cannot receive any message from the AP unless SMPL_Init () succeeds.

    We hope that once our ED and AP have successfully joined and linked, we do not need a successful SMPL_Init () and a re-join and re-link at the next power cycle.

    In this case, in addition to storing 'sPersistInfo', what other information do we need to store?

    Regards

  • Hi Watson

    Did you ever find a solution for this error?

    I'm using SimpliciTI 1.1.1 and also have this join/link problem. I changed my code in nwk.c as suggested, but I still find that all of the EDs join but only some link. I can see the link request received by the AP, but it always detects a duplicate and then SENT_NO_REPLY is returned.

    Any help will be much appreciated

  • Hi there Simon Buchwald and Jize Jiang,

    Did you ever get this to work seamlessly? I'm currently struggling with the same issue and can't seem to get it to work without a hitch. I will give more project details as soon as you reply.

    Regards