Hello!
I appear to have a spurious (unintended, unserviced) interrupt bringing my AM3358 to its knees, and if nested interrupts are turned on, it overflows the HWI/SWI stack with endlessly nested, unserviced interrupts. I need help tracking down what it is and where it is coming from, and possibly how to turn it off (since I don't THINK I turned it on!).
What I'm working with: Win7-64-bit,
Dev Env: CCS 6.1.0, CCS 6.1.2 and CCS 6.1.3 (I started with 6.1.3 and then reverted back to see if it would remedy the problem -- it hasn't.)
Platform: Custom board with MYIR brand MCC-AM335X-Y board with AM3358, 250MB RAM and other electronics that seem to be working perfectly.
Packages: SYS/BIOS 6.45.1.29, UIA 2.0.5.50, AM335x PDK 1.0.3
History:
In order to fill in some of the gaps in the AM335X TRM regarding its touchscreen controller, I did a lot of silicon testing with a small application under RTOS with 2 tasks: Idle (doing no custom steps yet), and UiTask() which I am developing into a user-interface thread with an 800x480 TFT LCD panel. To do this, I basically logged into an 8MB array of bytes in RAM, and then after a certain number of ADC events, I halted touchscreen/ADC activity and tabulated the results. This was very successful and I never ran into any problems other than determining how the touchscreen controller was behaving in certain aspects that weren't fully spelled out in the TRM (the results of this are published in another Forum posting). I captured the interrupt with two HWIs set up to capture Interrupt #16 (TSC_ADC_GEN), and another to capture Interrupt #115 (TSC_ADC_PEN), and both of the (separate) ISRs were well tested, did their thing and exited quickly. (Only loops were to read the FIFOs [2 or 3 deep, depending on the FIFO].) The tests included testing with only #16 enabled, only #115 enabled, and both enabled at the same time (which interestingly produced a cleaner set of interrupts with the PEN UP interrupt being the last of the PEN interrupts, but ONLY when both were enabled -- the other configuration produced a dozen or so spurious PEN-ASYNC and PEN-SYNC interrupts after the PEN-UP interrupt).
Roll forward to adding a great deal of IP in terms of a windowing system (I left the 8MB array there in the .bss section) and adding 14 full-size screen buffers (an array of uint32_t arrays of about 20 MB in size). I kept running into the ROV tool complaining about stack overflow, which didn't seem right because I have no recursive logic and my call-stack depth is maybe 10 at most (and this windowing system has been well tested elsewhere).
I tried increasing stack sizes to no avail, since this apparently was not the right target. Stacks increased: 1) SYS/BIOS > Runtime > System (Hwi and Swi) stack size, increased to 32768 (Heap size 65536, though I am not using any heap right now -- all RAM is defined statically at the moment). 2) UiTask() stack size 8192 (should be far more than enough -- I figured I could cut it back when I saw how much unused stack space there was).
I have the HWI module set up to [x] Initialize stack, and [x] Check for stack overflow, and the Dispatcher set up to enable interrupt nesting. This was true during the testing above, though I know for sure my 8MB "log" array was not being overwritten -- more on this below.
Increasing the stack size only seemed to add more delay until the program hit an exception following which the ROV tool showed a large number of errors under ROV > BIOS > Scan for errors. And the first one in the list was, of course, stack overflow. (I believe all that REALLY means is that something has overwritten the initialized (unused) stack values, which COULD be a real stack overflow or a runaway pointer.)
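To make that parenthetical concrete, here is a minimal host-runnable sketch of how a fill-pattern stack check behaves. It is an illustration only, not the SYS/BIOS source: the fill byte 0xBE, the buffer size and the function names (paint_stack, stack_high_water, stack_top_clobbered) are assumptions chosen for the example. The stack is assumed to grow downward from the highest address toward the base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xBEu /* assumed fill pattern for this sketch */

/* Paint the whole stack region with the fill pattern ("Initialize stack"). */
static void paint_stack(uint8_t *base, size_t size)
{
    assert(base != NULL);
    memset(base, FILL_BYTE, size);
}

/* Estimate bytes used by a stack growing downward from base+size:
 * count untouched fill bytes starting at the lowest address. */
static size_t stack_high_water(const uint8_t *base, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && base[untouched] == FILL_BYTE) {
        untouched++;
    }
    return size - untouched;
}

/* What a simple overflow checker sees: the lowest byte of the
 * stack region no longer holds the fill pattern. */
static int stack_top_clobbered(const uint8_t *base)
{
    return base[0] != FILL_BYTE;
}
```

If a 1024-byte buffer is painted and only its top 100 bytes are dirtied, stack_high_water() reports 100 and the sentinel is intact. A single stray write to the lowest byte then makes the same checks report 1024 bytes used and a clobbered sentinel, even though the stack never grew that far, which is why the ROV error alone cannot distinguish a real overflow from a runaway pointer.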
I placed a breakpoint in the Interrupt #115 ISR, but observed all this happened and the breakpoint was not being hit!
I started cutting back the program, to less and less code actually being executed, and now I have it back to even SIMPLER than the test environment that I was using to test the touchscreen controller: Idle task and UiTask() that ONLY does the set-up steps for the touchscreen controller and turns it on (with interrupts). And then it goes into this endless loop just for testing purposes (to eliminate other causes). In fact, the following is the TOTALITY of the application in the UiTask() (again Idle task is empty):
    UART_printf("UiTask: launched.\n");
    TAM_Initialize();  // Initializes touchscreen controller and turns on its interrupts.
                       // This is virtually unchanged from successful testing, except PEN-SYNC
                       // interrupt is no longer enabled in favor of PEN-ASYNC.
    for (;;) {
        UART_printf("AFTER TOUCH init endless loop...\n");
        Task_sleep(1000);
    }
Just to get things going, I have large delays between touchscreen steps, pretty much the same as when I was testing. Writing into the 8MB array was removed, so that 8MB array is just sitting there.
Revelation 1: I went into RAM to confirm the 8MB array was all 0's, and to my surprise, it wasn't! Filled with some other binary value (all 8 MB!). The behavior is similar to an erring recursive function where the stack grows and overwrites everything -- OR -- something like a run-away pointer in an endless loop, which as you can see (above) is not coming from my application.
Revelation 2: Just before this array is a "count" variable that I was using as an index into the 8MB array during TSC testing. It too was being overwritten. So I set a HARDWARE WATCHPOINT to detect when this value was being written to, and re-tested. Sure enough, the routine that initializes the BSS to 0's wrote a zero to it, and the next time it was overwritten was at the ENTRY POINT TO the Hwi.c::Hwi_dispatchIRQC() function! And by the time this HW WATCHPOINT stopped execution in the debugger, all 8 MB of the array had already been overwritten. (Apparently the HWI/SWI stack grows downward.)
Revelation 3: If I SIMPLY UNCHECKED the ENABLE AT STARTUP for the HWI #115 (the one I'm working with currently), then all of the above problems go away! No stack overflow. No array or RAM being overwritten. And the for (;;) {} loop spelled out above executes forever without any problems. When paused, ROV > BIOS > Scan for errors shows no errors, and the message "AFTER TOUCH init endless loop..." rolls out forever at 1-second intervals.
Revelation 4: On a hunch, I turned back on [x] Enable at Startup for the HWI #115 and UNCHECKED [ ] Enable interrupt nesting. Voila! Suddenly, the breakpoint inside the #115 ISR is now being hit! Yay! Again, no stack overflow. No array or RAM being overwritten. And the for (;;) {} loop spelled out above executes forever without any problems. When paused, ROV > BIOS > Scan for errors shows no errors.
So it would appear that there is some spurious interrupt that is bringing the AM3358 to its knees (apparently by endlessly nesting an unserviced interrupt).
I see in the Hwi.c::Hwi_dispatchIRQC() function that just before the ISR is called, there is a Log_write5(), then interrupt nesting is enabled, then the ISR is called, and after the ISR there is a Log_write1() call before the rest of the function executes. I am not at all sure where the data from the Log_write5() is going. Is it possible to place a breakpoint in this dispatcher function and look at variables or registers to determine where the trouble is coming from (the dispatcher is optimized), or to track down the contents of the Log_write5() to determine where the spurious interrupt is coming from?
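One way I can think of to answer the "which interrupt is this?" question from a breakpoint in the dispatcher is to look at the AM335x interrupt controller registers directly instead of the Log data. The sketch below decodes a raw INTC_SIR_IRQ value and lists the lines set in a snapshot of the four INTC_PENDING_IRQn registers. The register offsets and bit fields in the comments are as I understand the INTC chapter of the TRM and should be verified against your TRM revision; the struct and function names are made up for this example. Values are passed in as plain integers (for instance, copied from the CCS memory browser), so the code runs on a host PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* As I read the AM335x TRM (please verify):
 *   INTC base           0x48200000
 *   INTC_SIR_IRQ        base + 0x40  (ACTIVEIRQ = bits 6:0,
 *                                     SPURIOUSIRQFLAG = bits 31:7)
 *   INTC_PENDING_IRQn   base + 0x98 + 0x20 * n, n = 0..3
 */
typedef struct {
    unsigned active_irq; /* interrupt line number, 0..127 */
    int      spurious;   /* nonzero if any spurious-flag bit is set */
} SirIrqInfo;

/* Split a raw INTC_SIR_IRQ value into its two fields. */
static SirIrqInfo decode_sir_irq(uint32_t sir)
{
    SirIrqInfo info;
    info.active_irq = (unsigned)(sir & 0x7Fu);
    info.spurious   = (sir & 0xFFFFFF80u) != 0u;
    return info;
}

/* Write the line numbers of all set bits in four 32-bit pending
 * snapshots into out[], up to out_cap entries. Returns the number
 * of entries written. */
static size_t list_pending(const uint32_t pending[4],
                           unsigned out[], size_t out_cap)
{
    size_t n = 0;
    assert(pending != NULL && out != NULL);
    for (unsigned bank = 0; bank < 4; bank++) {
        for (unsigned bit = 0; bit < 32; bit++) {
            if (((pending[bank] >> bit) & 1u) && n < out_cap) {
                out[n++] = bank * 32u + bit;
            }
        }
    }
    return n;
}
```

With interrupts #16 and #115 pending, the snapshot would have bit 16 set in bank 0 and bit 19 set in bank 3 (115 = 3 * 32 + 19). Any other line number that turns up here, or an active IRQ other than 16 or 115, would be a candidate for the unserviced interrupt.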
I would really like to get to the bottom of this, because obviously it has cost me a great deal of time up to this point, and in the end, I WOULD like to nest interrupts. Technically, I MAY not need to, but the above indicates I have some interrupt that is firing that I am not aware of, and THAT needs to get handled!
Reverting back to the stack-overflowing configuration, after arming the touchscreen interrupt, the 'irp' argument to the dispatcher function arrives once with the value 0x80011b84 (the address of the TAM_Initialize() function in the MAP), and then repeatedly with 0x8001D9B4 (with the breakpoint in my touchscreen ISR NOT BEING HIT). The closest thing to that value in the MAP is the address of the ti_sysbios_family_arm_a8_intcps_Hwi_dispatchIRQC function at 0x8001d838. So I'm not sure what this means. Return address? I'm hoping it is a clue as to what the spurious (or unintentional, unserviced) interrupt is.
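A quick way to interpret those 'irp' values is to find the nearest MAP symbol at or below each address and look at the offset. The sketch below uses only the two addresses quoted above; the table layout and the name nearest_symbol are made up for illustration. Since I have no function sizes listed here, a small offset only suggests (it does not prove) that the address lies inside that function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    addr; /* start address from the MAP file */
} MapSymbol;

/* Only the two symbols quoted in this post; a real MAP has many more. */
static const MapSymbol k_symbols[] = {
    { "TAM_Initialize",                                   0x80011b84u },
    { "ti_sysbios_family_arm_a8_intcps_Hwi_dispatchIRQC", 0x8001d838u },
};

/* Return the symbol with the highest start address <= addr, or NULL
 * if addr is below every symbol. On success, *offset (if non-NULL)
 * receives addr minus the symbol's start address. */
static const MapSymbol *nearest_symbol(const MapSymbol *tab, size_t count,
                                       uint32_t addr, uint32_t *offset)
{
    const MapSymbol *best = NULL;
    assert(tab != NULL);
    for (size_t i = 0; i < count; i++) {
        if (tab[i].addr <= addr && (best == NULL || tab[i].addr > best->addr)) {
            best = &tab[i];
        }
    }
    if (best != NULL && offset != NULL) {
        *offset = addr - best->addr;
    }
    return best;
}
```

For 0x8001D9B4 this yields the dispatcher symbol with an offset of 0x17C (380 bytes). If the dispatcher is at least that long, the value would be consistent with 'irp' being an address inside Hwi_dispatchIRQC itself, i.e. an IRQ taken while the dispatcher was already running with nesting enabled. That reading is an assumption until the function size is confirmed from the MAP.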
Help!
Kind regards,
Vic