EK-TM4C1294XL: EEPROM writes issue, revision 3 MPU second Exosite client http 204 Sync fails.



Hi,

We going crazy then discover the EEPROM is not writing the new CIK from Exosite. It forever reads the initial CIK written to EEPROM by the factory-loaded software.

After flashing replacement SW (with mods), EEPROM writes using code from TivaWare_C_Series-2.1.0.12573 fail.

Resolve:

(Programmer: Added a test to stop recursive EEPROM writes during looping reset cycles of the MPU, which were also writing the same meta data repeatedly.)

4/2/15: The second Exosite device MAC (EK-TM4C1294XLi3) receives response http 204 after a server read request; it should receive http 200.

  • The new CIK assignment from Exosite never gets written into EEPROM and the old CIK remains.

    Code 9, HTTP status 401 (NoAuth), is returned when the client then tries to authenticate.

    void
    Exosite_SetCIK(char * pCIK)
    {
      if (!exosite_initialized) {
        status_code = EXO_STATUS_INIT;
        return;
      }
      exosite_meta_write((unsigned char *)pCIK, CIK_LENGTH, META_CIK);
      status_code = EXO_STATUS_OK;
      return;
    }

  • Hello BP101,

    Structurally the EEPROM Write/Erase/Reads have no issues. However, in the context of the application, as you just pointed out, it may be an issue.

    Regards
    Amit
  • Hi Amit,
    I admit I was tempted to disable further writes to EEPROM once a CIK was written by (exoHAL_WriteMetaItem()) - out of wisdom I opted not to disable further writes for this very reason.

    Has something possibly changed in the EEPROM write timing? The revision 1 silicon MPU writes EEPROM with the build code, but seemingly the revision 3 chip fails to re-write once it has been written. Vanilla firmware version 2.1.0.12573 will not write the EEPROM either.

    Possibly with an EEPROM write in revision 3 silicon we only get a one-time burn and then further writes are locked out?
  • Appears as if the programmer guy didn't disable writes to EEPROM, yet qualified that the Ethernet client be enabled prior to burning the EEPROM. Funny thing at first thought anyone would suspect that he in disabled false without the authenticated CIK.

    Not sure why the IOT vanilla code was not assigning a CIK, since it doesn't test this flag; suspect it was because after 5:45pm receive packet drops were killing the (tStat) SYNC cycle. When it rains it pours!

    Now I recall putting in a test to stop repeated burning of the EEPROM when the MPU was in a constant reset on each and every Exosite Post. The MPU resets were a symptom caused by the LWIP memory pool heap being corrupted.

    void
    exoHAL_WriteMetaItem(unsigned char * pucBuffer, int iLength,
            int iOffset)
    {
    	//
    	// Check the client has been initialized prior to writing CIK.
    	//
    	if (HWREGBITW(&g_sExosite.ui32Flags, FLAG_ENET_INIT) == 1)
    	{
    		return;
    	}
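
The test described above, which stops the same meta data from being burned into EEPROM on every reset, can be sketched as a read-compare-write guard. This is only a minimal sketch, not the qs_iot or Exosite implementation: the EEPROM is simulated with a RAM array, and `sim_eeprom_read`, `sim_eeprom_program` and `meta_write_if_changed` are hypothetical names standing in for whatever the real HAL uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_EEPROM_SIZE 256u   /* assumed size of the simulated meta area */

static uint8_t  sim_eeprom[SIM_EEPROM_SIZE];  /* stands in for the real EEPROM */
static unsigned sim_write_count;              /* counts actual program operations */

/* Hypothetical stand-in for the HAL read routine. */
static void sim_eeprom_read(uint8_t *dst, size_t offset, size_t len)
{
    memcpy(dst, &sim_eeprom[offset], len);
}

/* Hypothetical stand-in for the HAL program routine. */
static void sim_eeprom_program(const uint8_t *src, size_t offset, size_t len)
{
    memcpy(&sim_eeprom[offset], src, len);
    sim_write_count++;
}

/*
 * Write only when the stored bytes differ from the new ones.
 * Returns 1 if a write was performed, 0 if it was skipped because the
 * contents already match, and -1 if the range is out of bounds.
 */
int meta_write_if_changed(const uint8_t *src, size_t offset, size_t len)
{
    uint8_t current[SIM_EEPROM_SIZE];

    assert(src != NULL);
    if (offset > SIM_EEPROM_SIZE || len > SIM_EEPROM_SIZE - offset) {
        return -1;
    }

    sim_eeprom_read(current, offset, len);
    if (memcmp(current, src, len) == 0) {
        return 0;   /* same data: skip the burn */
    }

    sim_eeprom_program(src, offset, len);
    return 1;
}
```

With a guard of this shape a reset loop that keeps presenting the same CIK costs reads only, while a genuinely new CIK from the server still gets written, so there is no need to gate the write on an unrelated flag.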
    

  • Hi Amit,

    Now having issues with PCBs being freed from the memory pool heap after transmitting packets into LWIP. Whether using the default (memory allocator) or (mem_free), both methods corrupt the heap. It appears to be the same MPU reset condition in revision 3 silicon whenever the PBUF heap becomes corrupted.

    Since we now have a CIK in both LaunchPads' EEPROMs, Exosite SYNC with the exact same flashed SW succeeds on revision 1 silicon yet fails on the revision 3 LaunchPad.

    Attached log shows the default LWIP memory allocator method: after transmitting packets, the HTTP server response is 204 (OK). Then immediately after the socket failure (-1), TCP debug posts an error as it should via LWIP (stats). We reduced the circular receive buffer from 4096 down to 1024 and the transmit buffer from 8192 down to 4096 and still have the same issue. MAC changed for security - please remove this LOG for Exosite security reasons.

    The odd thing is that a DHCP event is being triggered just after reading the EEPROM meta data; I suspect having no free PCBs is causing the EMAC to freak out.

    Any idea what would cause a heap crash that blocks further receives (Server didn't respond) in all 10 retries? The HTTP 204 response must be buffered, right?

    Many Thank U's :)

     

  • Hello BP101,

    I think you need to share the log with Exosite, as it suggests a server-side problem and not a client-side one. Also, qs_iot has not been giving any issues so far in the testing we have done after the last set of fixes. I am not sure, but how come you are getting "all the errors"?

    Regards
    Amit
  • Hi Amit,

    First of all, would it not be incorrect for TI to assume the entire TCP stack belongs to Exosite port 80 alone? Some kind of additional application layer is often a given in Cloud environments. The entire Exosite concept falls apart if we cannot use part of the TCP stack for other network functions on the application layer. This amounts to stress testing the TCP stack with other PCB bindings. The TM4C1294 is only at 23% MPU utilization with the added Telnet TCP/UDP port 23 binding active and serving the client.

    The TCP heap is built from (.bss) memory and is controlled by DMA access via the EMAC, so it has little to do with the Exosite host server at TI. According to past contact with the Exosite support team, they suggest it is TI's responsibility to support the server side, but in my opinion this is not server related.

    Below we can see the heap crash halts or locks the MPU at around the 2000 second count, often as high as 5000. Very few packet drops are noticed here when the heap, after several (mem_free) calls, mysteriously overflows. The heap becomes unstable when it is put under stress and appears to panic under numerous (TIME_WAIT) states; it never actually frees the PCB during (tcp_close), so heap (max:) grows very rapidly.

    TCP
    xmit: 22226
    recv: 22780
    fw: 0
    drop: 56
    chkerr: 0
    lenerr: 0
    memerr: 0
    rterr: 0
    proterr: 56
    opterr: 0
    err: 0
    cachehit: 0
    
    MEM HEAP
    avail: 32768
    used: 3468
    max: 6932
    err: 0
    
    MEM RAW_PCB
    avail: 0
    use00us0Tm000m000eP0a0exoHAL_ExositeEnetEvents(): << Event DHCP (Break) >>
    
    connect_to_exosite(): << Continue: Socket Closed -1 >>
    

  • Better to explain in a bit more detail what that log is showing.

    The EMAC receiver of the revision 3 MPU appears to shut down or block shortly after the heap blows up, almost as if the DMA arbiter has lost priority & focus on the (pcb) in memory. There appears to be heap corruption right after the first transmission of packets on TCP port 80.

    Notice that a few times in the logs, and in the log above near the top, there are 4 RAW (pcb). There should never be any RAW (pcb) to begin with when (LWIP_RAW) is not enabled.



    MEM RAW_PCB avail: 4 used: 0 max: 0 err: 0
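
On the RAW (pcb) question: as far as I know, the opt.h in lwIP 1.4.1 defaults `LWIP_RAW` to 1 and `MEMP_NUM_RAW_PCB` to 4 when lwipopts.h does not override them, which would produce a `RAW_PCB avail: 4` pool line even if no raw sockets are ever opened. The fragment below is an assumed lwipopts.h excerpt with illustrative values, not the qs_iot file; it shows how the pool could be compiled out while keeping the stats display. Verify the option names and defaults against the opt.h in your own TivaWare tree before relying on it.

```c
/* Assumed lwipopts.h excerpt (illustrative, not taken from qs_iot). */
#define LWIP_RAW            0   /* no raw API, so no RAW_PCB pool is built */
#define LWIP_STATS          1   /* collect statistics */
#define LWIP_STATS_DISPLAY  1   /* allow the statistics to be printed */
#define MEM_STATS           1   /* heap counters: avail / used / max / err */
#define MEMP_STATS          1   /* per-pool counters such as TCP_PCB */
```

If the pool line is simply the default pool, a count of 4 with `used: 0` would be expected and not by itself a sign of corruption.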

     

  • BP101 said:
    We going crazy

    If the attempt here is to portray, "urban dialect" should not, "We be going crazy" prove  more normal/customary?

  • Blending protocol - being part of a mass globalization effort!

    Perhaps TI should have tested the SW at a slower 2MBPS download speed.

    9000 seconds later, while logged on FB talking with friends, wham-O: TCP crashes & burns and never returns.

    MEM HEAP
    avail: 32768
    used: 17858
    max: 48526
    err: 0
    
    MEM RAW_PCB
    avau0a0Tm0u00eAm0u00ePm00000exoHAL_ExositeEnetEvents(): << Event DHCP (Break) >>
    
    ASSERT FAIL at line 650 of C:/Software/Tivaware/TivaWare_C_Series-2.1.0.12573/third_party/lwip-1.4.1/src/core/pbuf.c: pbuf_free: p->ref > 0
    connect_to_exosite(): << Continue: Socket Closed -1 >>
    exoHAL_SocketOpenTCP(Next): << The Exosite http server connection FAILED >> 
    
    ASSERT FAIL at line 339 of C:/Software/Tivaware/TivaWare_C_Series-2.1.0.12573/third_party/lwip-1.4.1/src/core/mem.c: mem_free: mem->used
    exoHAL_SocketOpenTCP(-1): << eth_client_lwip: Assert TCP Disconnect >>
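
Both assert messages in this log (`pbuf_free: p->ref > 0` and `mem_free: mem->used`) are the kind that fire when a buffer that was already released gets released again. Whether that is what happens here is not established by the log; the sketch below only illustrates the usual guard, using a mock reference-counted buffer rather than the real lwIP `struct pbuf`. `mock_buf`, `mock_buf_free` and `release_and_clear` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of a reference-counted buffer; NOT the lwIP struct pbuf. */
struct mock_buf {
    uint16_t ref;   /* number of owners still holding the buffer */
};

/*
 * Drop one reference. Returns 1 when the last reference was dropped
 * (the buffer would be deallocated), 0 otherwise. A NULL pointer is
 * treated as "nothing to free".
 */
int mock_buf_free(struct mock_buf *p)
{
    if (p == NULL) {
        return 0;
    }
    assert(p->ref > 0);   /* same condition the log shows failing */
    p->ref--;
    return p->ref == 0;
}

/*
 * Release the buffer and clear the caller's pointer, so a second call
 * on the same variable becomes a harmless no-op instead of a double free.
 */
int release_and_clear(struct mock_buf **pp)
{
    int freed;

    assert(pp != NULL);
    freed = mock_buf_free(*pp);
    *pp = NULL;
    return freed;
}
```

The design choice is that ownership is tracked through the pointer itself: once a path has released the buffer, no other path can reach it through the same variable.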

  • Hi Amit,
    It appears the Exosite server is sending an incorrect http 204 response to the second connected client, which just happens to have revision 3 silicon. After a read request and receive-bytes cycle the client should be getting an http 200 response. That status was not appearing when the heap crashed, as it often does, during the server's http response cycle, yet it was seemingly an incorrect response.

    Thanks for the time and effort to get the answer to why, when and how this http failure occurs.

  • Hello BP101,

    We have been discussing this with the Exosite Team and working with them to address such issues. We see responses that are not expected.

    Regards
    Amit
  • Support is arguing the point that http 204 status should be treated as (2xx), yet 200 is received from the server in most every Sync cycle. The read flag is triggered on http 200, not 204, so the 2nd client will never sync. Note I say most every cycle: often a 204 will be received by the client and halt the Sync cycle. Added 2 debug headers into the Post cycle to enable the Exosite debuggers so support can follow the http transactions.

    Should the http status variable ever turn out to be a phantom artifact, implying the receive packets never actually enter the ring buffer yet a status is falsely reported for each of all 10 retries of the http transaction, that would be a true conundrum.
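
The 200 versus 204 disagreement above comes down to how the client classifies the status line. A minimal sketch, assuming the client has the raw status line available as a C string: any 2xx counts as success, while 204 (No Content) is flagged separately because by HTTP semantics it carries no body to read. `http_status_from_line`, `http_is_success` and `http_has_body` are hypothetical helpers, not part of the Exosite library.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a status line such as "HTTP/1.1 204 No Content".
 * Returns the numeric status code, or -1 if the line is malformed.
 */
int http_status_from_line(const char *line)
{
    int major, minor, code;

    assert(line != NULL);
    if (sscanf(line, "HTTP/%d.%d %d", &major, &minor, &code) != 3) {
        return -1;
    }
    if (code < 100 || code > 599) {
        return -1;
    }
    return code;
}

/* Any 2xx status means the request itself succeeded. */
int http_is_success(int code)
{
    return code >= 200 && code <= 299;
}

/* 204 is a success with no body, so there is nothing to parse. */
int http_has_body(int code)
{
    return http_is_success(code) && code != 204;
}
```

A read handler built on these could treat 204 as "request accepted, no data returned" and log it distinctly, rather than retrying as if the server had not responded.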
  • Exosite support genius discovered the location data port Alias was set to (town, state) instead of (location); the host server was returning http 204, often crashing the heap. When the heap was stable, the Exosite server connect retry count (10) would exhaust normally.

    The LWIP TCP/IP debug logs did not show this happening, or we just missed it, but that data port Alias has now been corrected, changed back after being modified incorrectly a week ago.

    Oddly this issue did not affect the first device because its location Alias was correct. Either way, a user-added Alias name missing on a server data port does not seem to stop Sync, yet the (location) Alias will and does.