EK-TM4C1294XL shows 46% CPU utilization when used with the IoT project.

Other Parts Discussed in Thread: EK-TM4C1294XL

I have noticed very high CPU utilization when this IoT project connects to, or is connected to, the Exosite cloud endpoint. The reason and a fix are posted in the code shown below.

I have also found that the exoHAL_ExositeEnetEvents() function has several issues: its socket connection determination loops run constantly. The IoT project code fails to mark the endpoint as connected and drives CPU utilization to 46% even after it has connected to the cloud endpoint once. This ping-pong event triggering is transparent and goes mostly unnoticed until you add numerous UART messages, printed to COM3, showing each step in the linear Exosite connection tree.

This project is meant as a skeleton for adding other triggered events for cloud endpoints, yet it makes it extremely difficult, if not impossible, to share the TCP stack with another protocol such as a UDP client and still establish a valid Exosite connection to verify the CIK saved in EEPROM. The problem is that the Exosite endpoint connection flow does not tolerate being interrupted or left idle just after DHCP returns an IP address to the target. Not everyone wants to connect to the Exosite host immediately every time DHCP assigns an IP address. This issue by itself defeats the reason for using Exosite, which is to have the LaunchPad report variable events to the cloud.

This IoT project also performs a master interrupt disable (MID) in (request.c), which will utterly disrupt all running timers that keep track of externally connected hardware. For sending Exosite sync (tStat) events gathered from additional variables, this MID is counterproductive.

Could TI PLEASE do some more work on this IoT package? It has good bones, yet the (tStat) entries fail to compile when additional events or functions are added to (qs_iot.c). The user is then forced to move all the (tStat) extern prototypes into (stats.h) to get a successful build without 10 "structure not defined" errors, one for each of the 10 (tStat) entries.

  • Hello,

    Thanks for the feedback.  We will take a look at the examples and the suggestions you have provided.

    Regards,

    -c

  • Hi Craig,

    Just found an incorrectly named variable in the exported prototype. That might explain some of the TCP socket connection issues, since the connection uses this same connect flag. It seems the client should bind to TCP only once per connection and tear down during the reset. Might that account for the very high CPU usage?

  • This link has some screenshots of the (tStat) structure causing the project build to fail with errors after adding other functions, or after moving the Exosite startup sequence into a console command function that can be invoked any time after DHCP returns a valid IP address to the client.

    http://e2e.ti.com/support/development_tools/compiler/f/343/p/337819/1181389.aspx#1181389

  • These UART prints show the flow in real time when Exosite is invoked from a console command (callhome). The code crashes the MAC and resets the controller near the bottom. I don't believe it is supposed to do that. I have increased some of the delay times in the TCP sockets to reflect the 5 ms SysTick-triggered LWIP timers, which are now called outside of (eth_client_lwip.c).

  • Have been trying for over a week to get to the bottom of this issue, and finally there is some light at the end of the tunnel. Hopefully this input helps find why the Exosite sync appears to be corrupting the stack pointer. I also found and removed a misplaced return under the first (else) in exoHAL_SocketOpenTCP(), which effectively pops the stack with nothing being done there, and it is not documented. The compiler lets the return slip through, but it was not causing the reset.

    The exoHAL_SocketConnect test now correctly returns (1); however, the line below executes after the CIK is read from metadata. The socket number, wrapped in the return value via (exoHAL_ServerConnect(i32Connected)), is used during the next test.

    if(SyncWithExosite(g_psDeviceStatistics))

    This is after moving the external (tStat) function prototypes and bool SyncWithExosite(tStat **psStats) to (stats.h). There is no actual per-stat transfer; it appears to be more of a transfer of the entire structure.

  • BP,

    Great investigative work. However, I am not sure that I follow all the changes you made, or, in the end, which were experiments and which was really the final fix.

    Can you please summarize the code fixes you made so we can capture them and hopefully put them in the next release.  

    For sure I see that passing the entire array of structs rather than passing a pointer was undesirable. What other fixes were needed in the end?

    Did you resolve the high CPU utilization?

    Dexter

  • Hi Dexter, exoHAL_SocketOpenTCP() was not returning the true socket value because the exported prototype variable was incorrectly prefixed with UL. Unknown to me at first, DNS name resolution actually takes a few passes through exoHAL_ExositeInit(); while looping it returns -5 (ERR_INPROGRESS) and then 0 (ERR_OK). I first believed the flow was a linear progression, not a loop-intensive one during the DNS phase. That looping leaves the event trigger handler repeatedly traversing TCPconnect() in (lwiplib.c) after a connection is established, hence the 46% CPU usage reported.

    Possibly setting the connect flag = 1 after DNS resolves, testing this flag, and not clearing the DNS/WAIT flags from (1) to (0) until after the socket is closed might stop the looping through TCPconnect().

    In my opinion, the only way to troubleshoot this flow effectively in real time is to add visible feedback messages reporting the values returned in every callback. So far I was able to connect to the server, but sync fails or hangs for a few minutes; then acquiring a new CIK is attempted after first closing the socket and resetting the connection. Acquiring a new CIK then returns HTTP status 0, prints connection help messages, and reports some garbled event messages, as if the stack were suddenly released.

    So much for circumventing the Exosite logon procedure by first acquiring an IP address and later issuing a console command to sync with Exosite.
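
    The declaration behind the UL-prefixed variable is not shown in this thread, so the following is only a sketch of one way such a mismatch could hide an error. It assumes the UL prefix means the socket result was carried in an unsigned type; the helper names handle_valid_unsigned and handle_valid_signed are hypothetical. A negative lwIP-style status such as -5 then looks like a large positive handle and passes a "greater than zero" check.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Status value quoted in the thread for "DNS still in progress". */
    #define STATUS_IN_PROGRESS (-5)

    /* Hypothetical check on a result carried in an unsigned variable:
     * -5 converts to 4294967291, which is > 0, so it looks valid. */
    bool handle_valid_unsigned(uint32_t handle)
    {
        return handle > 0u;
    }

    /* Same check on a signed result: negative status codes are rejected. */
    bool handle_valid_signed(int32_t handle)
    {
        return handle > 0;
    }
    ```

    With a signed type the in-progress status is rejected, so the caller can wait instead of treating the value as an open socket.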

  • CPU usage is now 23% without stopping the callback from clearing the DNS busy flag to (0). I had tried a test in exoHAL_ExositeInit() that returns if connect flag == 1 (a previously established connection). The problem with that is that DNS was once again taking 2 passes to resolve the name (-5, then 0), and I had stopped DNS before it completed its mission. The usage was held to less than 1%, but the client could not effectively connect to the Exosite server.

    These more recent screen shots of the Exosite connect to server flow have the changes shown above.

    Wait several minutes for this response:

  • Open Socket handle now receives the correct callback and works to connect:

  • DNS was the issue: it required several passes to complete a host name resolve. The Exosite client was progressing even when DNS had a -5 (in progress) status. Now the client waits for DNS status 0 and then moves on, but it sets the INIT and WAIT flags and only clears them on socket close.
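
    The wait-for-DNS behavior described here can be sketched as a small state machine. This is a minimal illustration, not the code from the project: the type client_t, the states, and the functions client_step and client_on_close are hypothetical names, and only the status values 0 and -5 come from the thread. The point is that the connect call is issued once, when DNS returns 0, and is not repeated until the socket is closed.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define DNS_OK          0    /* resolver finished (status 0 in the thread) */
    #define DNS_IN_PROGRESS (-5) /* resolver still working (status -5)        */

    typedef enum { ST_IDLE, ST_DNS_WAIT, ST_CONNECTING, ST_CONNECTED } client_state_t;

    typedef struct {
        client_state_t state;
        uint32_t connect_calls; /* counts how often a TCP connect was issued */
    } client_t;

    /* Advance the client by one event-loop pass. */
    void client_step(client_t *c, int dns_status, bool tcp_up)
    {
        switch (c->state) {
        case ST_IDLE:
            c->state = ST_DNS_WAIT;       /* start name resolution */
            break;
        case ST_DNS_WAIT:
            if (dns_status == DNS_OK) {
                c->connect_calls++;       /* issue the connect exactly once */
                c->state = ST_CONNECTING;
            } else if (dns_status != DNS_IN_PROGRESS) {
                c->state = ST_IDLE;       /* any other status: start over */
            }                             /* -5: stay here and poll again */
            break;
        case ST_CONNECTING:
            if (tcp_up) {
                c->state = ST_CONNECTED;
            }
            break;
        case ST_CONNECTED:
            break;                        /* no reconnect while connected */
        }
    }

    /* Flags are only reset when the socket is closed. */
    void client_on_close(client_t *c)
    {
        c->state = ST_IDLE;
    }
    ```

    Because the connected state does nothing on further passes, repeated event triggers after the connection is up no longer walk the connect path.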

     

  • Made a few enhancements to the events handler where DNS and the Exosite connect are diligently instantiated.

    The difficult part is the test for DNS status -5. Status 0 actually implies a returned ACK, though testing for only 0 would not work at all.

  • Activate CIK is now getting HTTP status 6179 returned from the Exosite server after the client sends an ACK shortly after receiving 50 bytes. 50 bytes is an odd number; I assumed 52 bytes (40-byte CIK + 12-byte HTTP header). 59 bytes are sent to the server. Strangely, the key request changes whenever the DHCP lease expires on the client.

    Does anyone have an idea what HTTP status 6179 indicates? There appears to be no handler for this status code in (exosite.c), (Exosite_Read()) or (get_http_status()).

    BTW: I had to add a short delay after the socket close and TCP reset, or the events would trigger a new connection before TCP had actually disconnected, instantly restarting the client to provision a new CIK.
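
    One non-blocking way to get such a delay is a hold-off gate checked by the event handler before it starts a new connection. This is a sketch under assumptions: the names reconnect_gate_t, gate_on_close and gate_may_connect are hypothetical, the 250 ms value is an arbitrary placeholder (the thread does not give the delay used), and now_ms stands for any free-running millisecond counter.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define RECONNECT_HOLDOFF_MS 250u /* placeholder value, tune as needed */

    typedef struct {
        bool     closing;      /* true while the hold-off is running     */
        uint32_t closed_at_ms; /* timestamp of the socket close / reset  */
    } reconnect_gate_t;

    /* Call when the socket is closed and TCP is reset. */
    void gate_on_close(reconnect_gate_t *g, uint32_t now_ms)
    {
        g->closing = true;
        g->closed_at_ms = now_ms;
    }

    /* Returns true once a new connection attempt is allowed. */
    bool gate_may_connect(reconnect_gate_t *g, uint32_t now_ms)
    {
        if (!g->closing) {
            return true;
        }
        /* Unsigned subtraction stays correct across counter wrap-around. */
        if ((uint32_t)(now_ms - g->closed_at_ms) >= RECONNECT_HOLDOFF_MS) {
            g->closing = false;
            return true;
        }
        return false;
    }
    ```

    Compared with a busy-wait delay, the gate keeps the event loop running, so other timers and protocols sharing the stack are not stalled while TCP tears down.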

     

  • Once again it appears that moving all the (tStat) prototypes for the Exosite structure into (requests.h) allows the 5.1.4 compiler to build clean. Doing that somehow breaks the stats index pointer's ability to sequence the total number of stats in the write buffer sent to the server. Oddly, the EEPROM is read into metadata with all stats in the array, as shown in the top box.

    That HTTP status 6179 is either the local socket port number or the Exosite server's error return for an invalid CIK; at times this value is HTTP status (0). The system delay timing of the (Exosite_Read()) data variable has some link to this status. The default SysTick interval for the LWIP timers loop in the IoT project was set to (/10), far too slow an interval. The LWIP SysTick interval is now (/200 = 5 ms) for UDP (*puc) bindings, providing real-time Ethernet GUI updates, and statistical data will be reported to the cloud at 1-second intervals for community review.

    Could really use some answers as to why the TI compiler has issues building clean when additional (tStat) entries or other functions are added into or around (qs_iot.c).
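
    The (/10) and (/200) figures can be checked with a little arithmetic. The sketch below assumes the divisor is applied to the system clock to produce the SysTick period in clock counts, and that the clock is 120 MHz; the exact call is not shown in this thread, and the function names are hypothetical. It also checks the result against the 24-bit range of the Cortex-M SysTick counter.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* The Cortex-M SysTick counter is 24 bits wide. */
    #define SYSTICK_MAX_COUNTS 0x01000000u /* 16,777,216 */

    /* Clock counts per tick for a given tick rate in Hz (the divisor). */
    uint32_t systick_period_counts(uint32_t sysclk_hz, uint32_t ticks_per_second)
    {
        return sysclk_hz / ticks_per_second;
    }

    /* Tick period in microseconds for a given tick rate in Hz. */
    uint32_t systick_period_us(uint32_t ticks_per_second)
    {
        return 1000000u / ticks_per_second;
    }

    /* True if the period can be programmed into the 24-bit counter. */
    bool systick_period_fits(uint32_t counts)
    {
        return counts >= 1u && counts <= SYSTICK_MAX_COUNTS;
    }
    ```

    Under these assumptions a divisor of 200 gives 600,000 counts, or 5 ms per tick, and a divisor of 10 gives 12,000,000 counts, or 100 ms per tick; both fit in the 24-bit counter.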

     

  • At this point I believe that your code is significantly different from the example application that we publish.  It is therefore very difficult to assess what is happening in your application.

    If your primary purpose is to add a tStat and publish it to Exosite using qs-iot as the baseline, have you looked at our wiki article, which describes exactly how to do this?  http://processors.wiki.ti.com/index.php/CLP_Exosite_Data_Sources

    In developing this wiki article I have not experienced any compile-time issues and have not heard of issues like the ones you describe.

    It may be useful to go through this exercise starting with the qs-iot application unmodified and then come back to your application.

    Dexter

  • I have added a few tStat alerts to the basic IoT project with success. However, the IoT project is vanilla code with no user application. There is now an application assembly located at the top of (qs_iot.c), which may be the reason the 10 tStat structs had to be moved. The compiler insisted on the assembly being at the top, above the struct references that were once there.

    One big problem arises when a user initializes the Exosite TCP stack and binds UDP during or shortly after DHCP assigns an IP address. Not everyone wants to connect to the Exosite server just after the target is assigned an IP address. Executing only exoHAL_ExositeInit() to assign the LaunchPad an IP address triggers a few events in the process, so the linear progression of the Exosite logon sequence, which wants to verify a CIK immediately, gets interrupted. The basic IoT code cannot gracefully handle re-entrance without triggering ENET events. That confuses the logon progression when the user issues Exosite_Init("texasinstruments", "ek-tm4c1294xl", IF_ENET, 0); from the beginning of the IoT server logon code sequence.

    Tracing the bytes sent to the server and received back suggests something is not correct in how the stats index pointer for the server response is built. Possibly (ringbuffer.c) is having issues with the faster 120 MHz processor and with retrieving flash metadata around the 5 ms LWIP timers?

    In the vanilla IoT project, the SysTick servicing the Eth0 LWIP timers was set at 1-second intervals, versus 5 ms now. That 1-second interval was far too slow for the UDP 23 real-time GUI client. The statistical graphs moved at a snail's pace at a 1-second SysTick interval.

     

  • Not sure how this was even working without locking up the NVIC. It seems that sending real-time Ethernet data packets at 5 ms has timing problems when the interrupt in the vector handler is not cleared. That locks the EMAC receiver register, or others, and freezes any connected GUI. This screenshot is from the IoT software, where an Ethernet interrupt priority of 0xe0 was set for the EMAC0-triggered interrupt. The yellow box shows the added interrupt clear for the HW vector, which was not being cleared by the ui32Status clear.

    The green box un-pends the SW-triggered interrupt; in the past it was a HW-enabled interrupt vector set in (Stellariswarenif.c). Some may argue about the data sheet text regarding un-pending of any NVIC interrupt.

  • I am seeing lots of TI code where SW interrupts do not clear the PEND. That is not a problem if the interrupt handler can execute every instruction before the NVIC is interrupted again. In this case there is a SysTick LWIP timer of 5 ms and a (lwip_interval_timer) set to 200 ms. On several occasions during MPU reset, something was causing the EMAC to hang during DHCP IP address assignment. I never did get a conclusive answer as to what was causing the hang, but it hasn't returned since un-pending this interrupt.