Hi All,
I am trying to debug an issue where I expect to read data in the SSI ISR handler when a receive timeout interrupt is triggered, but no data is available. When we read the masked interrupt status, it is 0. This should indicate that no interrupt was triggered, so why did the processor jump to the interrupt handler?
What we're trying to do:
We use the Timer 0A interrupt to start an SSI transfer using the TivaWare SSIDataPutNonBlocking API. The SSI speed is set to 1MHz and the receive timeout interrupt is enabled, so ~48uS later (16 bits SSI + 32 clock timeout) the SSI interrupt triggers. I can verify the timing works as expected with my debugger. The issue is very random, we can do 1000s of data transfers without issue and then one will fail. We can run one system for days without issue and another system will only work for a few minutes.
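The ~48uS figure is just the 16 data bits plus the 32-bit-period receive timeout, all at the 1MHz bit rate. A minimal sketch of that arithmetic follows; `ssiRxTimeoutLatencyUs` is a hypothetical helper, not a TivaWare function, and it assumes the timeout counter starts right after the last bit of the frame is received.

```c
#include <stdint.h>

/* Hypothetical helper (not part of TivaWare): estimated time in
 * microseconds from the start of a frame to the receive time-out
 * interrupt. Assumes the time-out counter runs for timeoutBitPeriods
 * bit periods after the last data bit has been shifted in. */
static uint32_t ssiRxTimeoutLatencyUs(uint32_t frameBits,
                                      uint32_t timeoutBitPeriods,
                                      uint32_t baudHz)
{
    /* Total bit periods, converted to microseconds (rounded down). */
    uint64_t totalBits = (uint64_t)frameBits + timeoutBitPeriods;
    return (uint32_t)((totalBits * 1000000u) / baudHz);
}
```

With `frameBits = 16`, `timeoutBitPeriods = 32` and `baudHz = 1000000` this evaluates to 48, which matches the timing seen in the debugger.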
Debugger Screenshots:
Relevant Code snippets:
#define ADC141_GPIO_PERIPH      SYSCTL_PERIPH_GPIOH
#define ADC141_PERIPH           SYSCTL_PERIPH_SSI2
#define ADC141_GPIO_BASE        GPIO_PORTH_BASE
#define ADC141_SSI_BASE         SSI2_BASE
#define ADC141_SCLK_PIN         GPIO_PIN_4
#define ADC141_CS_PIN           GPIO_PIN_5
#define ADC141_SDO              GPIO_PIN_6
#define ADC141_GPIO_SSICLK_CFG  GPIO_PH4_SSI2CLK
#define ADC141_GPIO_SSIRX_CFG   GPIO_PH6_SSI2RX
#define ADC141_GPIO_SSICS_CFG   GPIO_PH5_SSI2FSS
#define ADC141_BAUD             1000000
#define ADC141_PROTOCOL         SSI_FRF_MOTO_MODE_3
#define ADC141_MODE             SSI_MODE_MASTER
#define ADC141_DATAWIDTH        16

/* Initialization code */
SysCtlPeripheralEnable(ADC141_GPIO_PERIPH);
SysCtlPeripheralEnable(ADC141_PERIPH);
GPIOPinConfigure(ADC141_GPIO_SSICLK_CFG);
GPIOPinConfigure(ADC141_GPIO_SSIRX_CFG);
GPIOPinConfigure(ADC141_GPIO_SSICS_CFG);
GPIOPinTypeSSI(ADC141_GPIO_BASE, ADC141_SCLK_PIN | ADC141_SDO | ADC141_CS_PIN);
SSIConfigSetExpClk(ADC141_SSI_BASE, SysCtlClockGet(), ADC141_PROTOCOL,
                   ADC141_MODE, ADC141_BAUD, ADC141_DATAWIDTH);
SSIIntEnable(ADC141_SSI_BASE, SSI_RXTO);
IntPrioritySet(INT_SSI2, 0xE0);
IntEnable(INT_SSI2);
SSIEnable(ADC141_SSI_BASE);

void TIMER0A_Handler(void)
{
    TimerIntClear(TIMER0_BASE, TimerIntStatus(TIMER0_BASE, false));
    SSIDataPutNonBlocking(ADC141_SSI_BASE, 0);
}

void SSI2_Handler(void)
{
    uint32_t countSamples = 0;
    int32_t recv = 0;
    uint32_t ssiStatusRegister = HWREG(ADC141_SSI_BASE + SSI_O_SR);
    uint32_t ssiRawInterruptStatus = HWREG(ADC141_SSI_BASE + SSI_O_RIS);
    uint32_t ssiMaskedInterruptStatus = HWREG(ADC141_SSI_BASE + SSI_O_MIS);

    if((ssiStatusRegister & SSI_SR_RNE) == 0)
    {
        __asm("BKPT 0");
    }

    // Read from the SSI receive FIFO, returns number of elements read
    countSamples = (uint32_t)SSIDataGetNonBlocking(ADC141_SSI_BASE, (uint32_t *)&recv);
    SSIIntClear(ADC141_SSI_BASE, SSIIntStatus(ADC141_SSI_BASE, false));

    /* snip */
}
I suspect that the problem is you are clearing the SSI interrupt status register at the very end of your interrupt service routine. Writes to peripherals are buffered so it is possible that the interrupt request has not been cleared before interrupts are re-enabled by the return. Adding a dummy read of SSIIntStatus should fix the problem as that read will not happen until after the previous peripheral write has completed.
Hi Bob,
I removed code from the end of my ISR handler that I didn't think was relevant to the issue. The System Analyzer window should have also shown two SSI interrupts if it was re-entered. I have re-run the test anyway with your suggestion and it did not resolve the issue. The full unedited SSI handler code is here:
void SSI2_Handler(void)
{
    uint32_t countSamples = 0;
    int32_t recv = 0;
    uint32_t ssiStatusRegister = HWREG(ADC141_SSI_BASE + SSI_O_SR);
    uint32_t ssiRawInterruptStatus = HWREG(ADC141_SSI_BASE + SSI_O_RIS);
    uint32_t ssiMaskedInterruptStatus = HWREG(ADC141_SSI_BASE + SSI_O_MIS);

    if((ssiStatusRegister & SSI_SR_RNE) == 0)
    {
        __asm("BKPT 0");
    }

    // Read from the SSI receive FIFO, returns number of elements read
    countSamples = (uint32_t)SSIDataGetNonBlocking(ADC141_SSI_BASE, (uint32_t *)&recv);
    SSIIntClear(ADC141_SSI_BASE, SSIIntStatus(ADC141_SSI_BASE, false));
    (void)SSIIntStatus(ADC141_SSI_BASE, true);

    // If no samples available from FIFO, then something is seriously wrong.
    // Assert will spin in infinite loop and indirectly cause WDG reset
    if (countSamples == 0)
    {
        __asm("BKPT 0");
    }

    /* Sign extend differential ADC reading. The sign bit is in bit 14, so we
     * shift up and then down to do signed extend 14 bit reading to 32 bits */
    recv <<= 18;
    recv >>= 18;
    ADCSample.sampleData[ADCPort] = recv;

    /* On the interrupt for the last input channel to be sampled, we send the
     * structure containing samples for all multiplexed input channels to the
     * queue for the Gen2 CS task */
    if (ADCPort == (ADC141_NUM_CHANNELS - 1))
    {
        (void)OSALQueueSendFromISR(adc141SampleQueue, &ADCSample);
    }

    // Select the next ADCPort to read in round-robin fashion and after
    // the last channel wrap around to first channel
    ADCPort++;
    if (ADCPort >= ADC141_NUM_CHANNELS)
    {
        ADCPort = 0;
    }
    MAX4052Select(ADCPort);
}
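A side note on the sign-extension step in the handler: in standard C, left-shifting a signed value so that a 1 lands in the sign bit is undefined behavior, and right-shifting a negative value is implementation-defined, even though the shift pair tends to work with common ARM compilers. A portable alternative might look like the sketch below. `signExtend14` is a hypothetical name, and the sketch assumes the 14-bit two's-complement reading sits in the low 14 bits of the received frame, as the shift by 18 implies.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend a 14-bit two's-complement reading
 * held in the low 14 bits of raw. Any higher bits are ignored. */
static int32_t signExtend14(uint32_t raw)
{
    /* Keep only the 14 data bits; the result is in 0..0x3FFF. */
    int32_t v = (int32_t)(raw & 0x3FFFu);

    /* Flip the sign bit (bit 13), then subtract its weight. Values with
     * bit 13 set end up negative; no shifts of signed values are needed. */
    return (v ^ 0x2000) - 0x2000;
}
```

In the handler this would replace the two shift statements, for example `ADCSample.sampleData[ADCPort] = signExtend14((uint32_t)recv);`.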
OK, that is interesting. Can you move the breakpoint that is within the "countSamples == 0" before the call to SSIIntClear() and actually stop on the breakpoint and then capture an image of the SSI2 registers?
It seems the peripheral view window is not a reliable way to read the registers. I am relying on storing them to local variables and checking them in the watch window.
Here are two screenshots. The first shows the values I expect in the registers: Status = 0x7, Raw Interrupt Status = 0xA, Masked Interrupt Status = 0x2. The second shows the registers when the countSamples==0 breakpoint is hit: Status = 0x3, Raw Interrupt Status = 0x8, Masked Interrupt Status = 0x0.
Eric Wilson85 said: It seems the peripheral view window is not a reliable way to read the registers
That's not good. What debugger are you using?
Are you running an RTOS? Could any other thread be writing to SSI2 registers?
What is the timing of the TimerA interrupt? Is it possible that you ever start a second SSI2 transmit before you have had the chance to read the SSI RX FIFO? (Not sure if that would be an issue, but this looks like some sort of corner case condition.) Does the problem occur at code startup, or after several successful transmissions and receptions? I noticed all of our SSI examples clear the receive FIFO before starting. Sorry for all of the questions, but this problem has me stumped so I am trying to get more information that might lead to some solution.
Bob Crosby said: That's not good. What debugger are you using?
We're using Keil uVision with a uLink Pro. I guess I will need to contact Keil to understand why the peripheral viewer is not giving me the results I expect.
Bob Crosby said: Are you running an RTOS? Could any other thread be writing to SSI2 registers?
We are running FreeRTOS. Searching the code base for any other mention of SSI2 does not show any other code accessing it. There could, of course, be something accessing memory that it's not supposed to but I'm struggling to come up with anything that would do exactly what we're seeing.
Bob Crosby said: What is the timing of the TimerA interrupt? Is it possible that you ever start a second SSI2 transmit before you have had the chance to read the SSI RX FIFO? (Not sure if that would be an issue, but this looks like some sort of corner case condition.) Does the problem occur at code startup, or after several successful transmissions and receptions? I noticed all of our SSI examples clear the receive FIFO before starting. Sorry for all of the questions, but this problem has me stumped so I am trying to get more information that might lead to some solution.
TimerA triggers every 250uS. We expect the SSI transfer + timeout to complete in ~48uS - 50uS and I can confirm this timing through the System Analyzer window and trace data. This leaves an extra 200uS headroom where some other interrupts could come in, but we're just not seeing that. Anecdotally the issue is more frequent within a few minutes of startup, typically within the first 100k or so samples. If it was an issue of clearing the FIFO I would expect the failure to be on the first sample. It doesn't hurt to try, though.
Greetings,
We can only agree w/vendor Bob's assessment of 'Interesting.' And the detail (especially the logic flow) presented by (both) poster & Bob must be applauded.
We are impressed that such issue was reported upon 'multiple systems' (not just one) - as the bane of diagnosticians is, 'Single Board/Device Anomaly.'
For the sake of 'completeness': (most issue analysis here (perhaps) 'underplays the 'big-picture' - this (may) be a 'system (not just an MCU issue!')
Our small tech group prefers to, 'Reduce our code to, "Only the issue at hand" - (while especially disabling any/all Noise, Power Drive or 'known' RF sources) - and then "Launching anew & carefully observing, 'Frequency (i.e. recurrence) of the offense(s)." (on multiple occasions - such 'discipline' (along w/the 'awareness' of system issues) has enabled efficient issue resolution...)
Since you are using SSI2 as a master, you should get a receive timeout interrupt after each transmission you make. Can you tell if we are getting an extra interrupt, or we are getting the right number of SSI2 interrupts, but the FIFO is empty? If you don't already know, maybe you can add a global signed variable that increments each time you transmit and decrements each time you receive a byte. That variable should alternate between 0 and 1. If it starts incrementing, then we know the receive timeout interrupt was correct, but the FIFO was previously emptied.
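The bookkeeping suggested above could be sketched roughly as follows. The names are illustrative only: the counter goes up by one where the timer ISR starts a transfer and down by the number of frames the SSI ISR actually pulled from the FIFO (the value `SSIDataGetNonBlocking` returns).

```c
#include <stdbool.h>
#include <stdint.h>

/* Outstanding transfers: +1 per transmit, -1 per frame actually read.
 * volatile because it is shared between two interrupt handlers. */
static volatile int32_t ssiTxRxCounter = 0;

/* Call from the timer ISR right after starting an SSI transfer. */
static void noteTransmit(void)
{
    ssiTxRxCounter++;
}

/* Call from the SSI ISR with the number of frames read from the FIFO
 * (0 when the FIFO turned out to be empty). */
static void noteReceive(int32_t framesRead)
{
    ssiTxRxCounter -= framesRead;
}

/* In a healthy system the counter only ever alternates between 0 and 1. */
static bool transfersBalanced(void)
{
    return (ssiTxRxCounter >= 0) && (ssiTxRxCounter <= 1);
}
```

A breakpoint or pin toggle on `!transfersBalanced()` in the timer ISR then separates the two cases: a counter that creeps upward means frames went missing before the SSI ISR could read them, while a counter that stays in range alongside empty-FIFO interrupts points to an extra interrupt.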
Where IS the '***LIKE***' Button?
Yet - the variability of failure occurrence argues (perhaps strongly) for potential 'System Issues.' (earlier noted...)
Bob Crosby said: maybe you can add a global signed variable that increments each time you transmit and decrements each time you receive a byte. That variable should alternate between 0 and 1. If it starts incrementing, then we know the receive timeout interrupt was correct, but the FIFO was previously emptied.
Well now this is interesting! I have done two things:
1. I have massively stripped down the system. At this point only timer0A and SSI2 interrupts are running. FreeRTOS is removed and all that is left after initializing is a while(1) at the end of main().
2. Added the variable as you suggested.
What I see is that the counter slowly increments. I set a breakpoint in the timer interrupt if the counter is greater than a small threshold and I can see from the system analyzer that the SSI interrupt did not occur. But how could the FIFO be emptied already? I have removed every other part of the application.
cb1_mobile said: For the sake of 'completeness': (most issue analysis here (perhaps) 'underplays the 'big-picture' - this (may) be a 'system (not just an MCU issue!')
- should there be, 'Mention of the 'remote SPI device' and its 'connection, separation distance & adequacy of powering?'
- and - at least temporarily - the switching to another of the MCU's SPI Channels (to determine if this (unusual) effect persists?)
- Further - the randomness of this distress (may) suggest that 'Power, Noise, or (unwanted) RF levels' prove suspect. All should be, 'Reduced as much as is reasonably possible.'
- Poster's MCU appears (Not) that populated w/in the 'LPad.' Might the '123 LPad' be pressed into service - as a 'quick/eased' means - to determine the 'extent of this issue?' (i.e. might this an 'MCU Class' issue?)
The remote SPI device is an ADC141S626. The chip select, clock and MCU receive pins are connected; the MCU transmit pin is not connected, as the ADC only needs the clock train to transmit its data to the MCU. The chip is ~1 inch from the MCU. Determining if the power is adequate will require that I connect my scope, which will take some work.
RF is a possibility, though I have no in-house equipment to determine the levels on my particular test board.
I think I have a 123 LaunchPad or two lying around. I will try to dig them out to test if I can eliminate some other possible issues first.
Could this somehow be related to reads by the Keil debugger? Is it possible to run this stripped down version toggling a pin if the count gets larger than 1 and then run it without the debugger attached monitoring the pin with a scope?
Bob Crosby said: Could this somehow be related to reads by the Keil debugger? Is it possible to run this stripped down version toggling a pin if the count gets larger than 1 and then run it without the debugger attached monitoring the pin with a scope?
Indeed it is! If I simply close the SSI2 peripheral view window prior to starting the application then I can read >2M samples without issue. As soon as I open the window again the breakpoint to check that the interrupts are balanced is hit. So it is a "system" issue of sorts.
Now the question I need to answer is: if I can't trust this debugger not to interfere with the system, is there one I can trust? Or will every debugger for this MCU present the same basic problem?
I don't know about the Keil debugger, but I know that TI's Code Composer Studio can suspend the CPU to do a read. The issue there is that the system is no longer real time. Using the DAP (debug access port), the debugger can read memory (including peripherals) without affecting the CPU execution. However, you need to be careful of any consequences of the read.
Excellently Diagnosed Greetings,
And bravo to those who diagnosed 'System Issue.'
To Bob - I recall that in the past (w/the LM3S devices) our use of the 'Register View' (from w/in IAR) caused similar 'errant readings.' IIRC this occurred (especially) w/in the ADC Register Group. Staff checked earlier - we could find, 'No such cautions' involving this (potentially) Error Generating (yet highly unexpected) condition. And ... importantly - this 'errant reading condition' occurred w/ALL debuggers (at least FTDI, Keil, IAR, Segger & under CCS). Again - there existed 'special warnings of 'Unique Register Sensitivity' (if you 'monitor that/those Registers - you 'spoil/invalidate' the data) - I believe that these have lessened w/time - and staff after nearly an hour of checking - could find no MCU Manual mention!
Might you know - and inform/advise - 'When & Where Normal Debug Usage can be expected to 'Wreak such Havoc?' (It appeared 'Not Everywhere' - at least in the past...) Such would be very much appreciated!
What remains concerning (still) is the fact that our past issue was 'Invariable' (i.e. always occurred). Yet poster here presents an issue which reveals 'Thousands of Successes' - thus (some) 'FUD' (necessarily) remains - does it not?
Yet more...
Follows two key notings from this poster:
"The issue is very random, we can do 1000s of data transfers without issue and then one will fail. We can run one system for days without issue and another system will only work for a few minutes."
As I recall now - it was the 'Past Usage of the Watch Window' - which 'Spoiled (SOME yet not ALL) of the values being watched!' (i.e. 'some' registers were deemed 'sensitive!') I recall this under (both) IAR & Keil - we were (never) big fans of (other) than a Vendor Agnostic IDE (CCS was thus disallowed by our key clients & investors...)
May we ask, 'Is THIS our thread's issue?' And - does the guidance to, 'Exercise extreme caution w/Use of the Watch Window - continue today?' And - are 'certain' Registers (especially) prone to 'such sensitivity?'
Just maybe - might this subject deserve, 'Making it into your (planned/teased) 'Update of the API?' (as a residual guideline...)
CB1,
You raise some very good questions. I will try to explain what I think is going on based on my knowledge of what's "under the hood" of the TM4C, but I am speculating on the actual implementation on IAR and Keil debuggers.
Using JTAG, there are two ways to read anything in the address space of the TM4C devices. The first method is to scan the M4 CPU, jamming in instructions to do reads or writes and then restoring the initial state of the CPU. This is done with a global "suspend" signal asserted that keeps the peripherals from changing state (there are some programmable exceptions). This is the method TI's Code Composer normally uses, and all debuggers use this method to read the internal CPU registers. The problem with this method is that it necessarily stops the CPU for a period of time.
The second method is to use a feature of the ARM DAP (debug access port). In this mode the DAP acts as a bus master which is controlled by the JTAG scan chains. Think of it as a mini DMA for JTAG. Using the DAP, the JTAG interface can sneak read and write cycles without stopping the CPU. The downside is that these read and write cycles are done without "suspend" being asserted so they have the same impact as if the uDMA had done them. In this case, a DAP read of the SSIDR popped a value out of the SSI RX FIFO. I suspect this is the mode being used by the Keil and IAR debuggers.
So why the random nature of the issue seen by Eric? I suspect that the Keil debugger used by Eric was periodically updating the watch windows while his code was running. If it just so happened that the debugger read the SSIDR register in the short time between when a transmission was received and when the FIFO was read by the CPU in the interrupt routine, it would inadvertently clear the FIFO. How often that happens would be a function of the interrupt latency and the update rate of the debugger.
What registers can be affected? Any register that is changed by a read could be affected. The obvious candidates are receive FIFOs.
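To make the read-sensitive behavior concrete, here is a small host-side model of a receive FIFO whose data register pops an entry on every read. It is only a sketch of the mechanism described above, not TI's implementation; the type and function names are hypothetical, and the depth of 8 reflects the 8-entry SSI FIFOs on these parts.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_FIFO_DEPTH 8u

/* Hypothetical software model of a read-sensitive receive FIFO. */
typedef struct {
    uint16_t data[RX_FIFO_DEPTH];
    uint32_t head;   /* index of the oldest entry */
    uint32_t count;  /* number of valid entries   */
} RxFifoModel;

/* Hardware side: a received frame lands in the FIFO (dropped if full). */
static bool modelReceiveFrame(RxFifoModel *f, uint16_t frame)
{
    if (f->count == RX_FIFO_DEPTH) {
        return false;
    }
    f->data[(f->head + f->count) % RX_FIFO_DEPTH] = frame;
    f->count++;
    return true;
}

/* Any bus read of the data register pops one entry, no matter who issued
 * it: the CPU, the uDMA, or a debugger going through the DAP.
 * Returns false when the FIFO was already empty. */
static bool modelReadDataRegister(RxFifoModel *f, uint16_t *out)
{
    if (f->count == 0) {
        return false;
    }
    *out = f->data[f->head];
    f->head = (f->head + 1u) % RX_FIFO_DEPTH;
    f->count--;
    return true;
}
```

If a frame arrives and the "debugger" calls `modelReadDataRegister` before the "ISR" does, the ISR's own read finds the FIFO empty, which is the failure seen in this thread.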
(One day after ...) Holiday Greetings Bob,
Thanks much - very insightful - your "under the hood" description is (VERY) much appreciated. (especially by 'young staff (escaped) from their colleges - to hone their tech-skills (for college credit (of course) while 'extracting' (yet more) from our (bit miniaturized) Ft. Knox...)
We remain 'unsettled' by the huge variation (in error occurrence) this poster has reported. Again - reported was, "Running for days w/out issue" AND "Worked for only a few minutes!" We are not 'so sure' that this issue can be 'hung upon' (only) the Debugger.
Several of the Register Descriptions (w/in the 'cascade of specific Register Details' - w/in an MCU Peripheral) provide 'special note of Register Sensitivity!' (of course we've not yet been able to find - & concretely present for your comment and/or 'under the hood' flash-lighting...)
If our poster still lurks - were, 'All such tests conducted w/the Debugger ALWAYS ATTACHED?' And - if so - was that (and more importantly) IS THAT a 'Proper Procedure' - in light of Bob's keen findings? (Might 'Bob' kindly/when able - respond as well as the poster? Much thanks...)
[edit] 09:48 CST Staff reports that IAR's 'LIVE WATCH' enables the 'Live (i.e. REAL TIME) Viewing' of those Registers we deem 'KEY' - and appears to do so while AVOIDING the (dreaded) Register Alteration! (we have NOT (ever) attempted to 'LIVE WATCH' any of the FIFOs - we have today launched 'deep dive' into paid IAR's description of the various Register Watch Mechanisms ... and shall report...) Hopefully there may exist 'findings' which prove beneficial to MANY (Not just IAR users)...
cb1_mobile said: We remain 'unsettled' by the huge variation (in error occurrence) this poster has reported. Again - reported was, "Running for days w/out issue" AND "Worked for only a few minutes!" We are not 'so sure' that this issue can be 'hung upon' (only) the Debugger.
I suspect that it was different watch windows that were open in these two runs. I think Eric is the only one who can comment on the difference.
Hi All,
I am back in the office today after some days off and can report on some things.
When the unit "ran for days without issue" the unit was not connected to a debugger.
When the unit "only worked for a few minutes" the unit was always connected to a debugger with the SSI registers watch window open and the "Periodic window update" function turned on. I believe Bob's conclusion is correct that the debugger's timing just happened to line up perfectly with the interrupt timing and read the data register before the code had a chance.
HOWEVER, I left the unit over the holiday with a debugger attached, the SSI watch window closed and the periodic update function off. I have landed on the countSamples == 0 breakpoint again after ~678M samples (about 48 hours of runtime). From the trace data it appears that the SSI handler was called twice; the second call understandably did not have any data in the FIFO. I will investigate further and post another reply when I find something.
Hi Again,
I am now quite certain about our second issue. If the SSI interrupt is cleared prior to reading the data from the receive FIFO then the interrupt may re-trigger. We had experimented with moving the SSIIntClear() call around originally but did not see a difference because of the issue with the debugger. Based on my understanding of the datasheet, I didn't expect this behavior. As I understood, the receive timeout counter should start on the EMPTY -> not-EMPTY transition yet it seems that the counter runs any time there is both data in the FIFO and the interrupt is not pending.
The ISR code I am experimenting with is below. If SSIIntClear is called early the breakpoint will be hit after a few seconds because of the interrupt re-triggering. Any other point and the breakpoint will not be hit, including if SSIIntClear is not called at all.
#define SSI_INT_CLEAR_NONE        0
#define SSI_INT_CLEAR_EARLY       1
#define SSI_INT_CLEAR_AFTER_READ  2
#define SSI_INT_CLEAR_LATE        3
#define SSI_CLEAR_POINT           SSI_INT_CLEAR_LATE

void SSI2_Handler(void)
{
    uint32_t countSamples = 0;
    int32_t recv = 0;

#if SSI_CLEAR_POINT == SSI_INT_CLEAR_EARLY
    SSIIntClear(ADC141_SSI_BASE, SSIIntStatus(ADC141_SSI_BASE, false));
    (void)SSIIntStatus(ADC141_SSI_BASE, false);
    busyWait_US(15);
#endif

    // Read from the SSI receive FIFO, returns number of elements read
    countSamples = (uint32_t)SSIDataGetNonBlocking(ADC141_SSI_BASE, (uint32_t *)&recv);

#if SSI_CLEAR_POINT == SSI_INT_CLEAR_AFTER_READ
    SSIIntClear(ADC141_SSI_BASE, SSIIntStatus(ADC141_SSI_BASE, false));
    (void)SSIIntStatus(ADC141_SSI_BASE, false);
#endif

    // If no samples available from FIFO, then something is seriously wrong.
    // Assert will spin in infinite loop and indirectly cause WDG reset
    if (countSamples == 0)
    {
        __asm("BKPT 0");
    }
    else
    {
        ssiTxRxCounter--;
    }

    /* Sign extend differential ADC reading. The sign bit is in bit 14, so we
     * shift up and then down to do signed extend 14 bit reading to 32 bits */
    recv <<= 18;
    recv >>= 18;
    ADCSample.sampleData[ADCPort] = recv;

    /* On the interrupt for the last input channel to be sampled, we send the
     * structure containing samples for all multiplexed input channels to the
     * queue for the Gen2 CS task */
    if (ADCPort == (ADC141_NUM_CHANNELS - 1))
    {
        (void)OSALQueueSendFromISR(adc141SampleQueue, &ADCSample);
    }

    // Select the next ADCPort to read in round-robin fashion and after
    // the last channel wrap around to first channel
    ADCPort++;
    if (ADCPort >= ADC141_NUM_CHANNELS)
    {
        ADCPort = 0;
    }
    MAX4052Select(ADCPort);

#if SSI_CLEAR_POINT == SSI_INT_CLEAR_LATE
    SSIIntClear(ADC141_SSI_BASE, SSIIntStatus(ADC141_SSI_BASE, false));
    (void)SSIIntStatus(ADC141_SSI_BASE, false);
#endif
}
Page 956 of the datasheet indicates that the receive time-out interrupt should be cleared just after the FIFO is read.
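For anyone landing here later, the ordering this points to is: read the FIFO, then clear the time-out interrupt, then read the status back so the buffered clear has completed before the handler returns. Below is a minimal sketch of just that ordering. So that it builds off-target, the three driver functions are host-side stand-ins with the same signatures as the TivaWare calls; they, the call log, and the name `ssiHandlerReadThenClear` are assumptions for illustration, not the real driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* ---- Host-side stand-ins (assumptions) so the sketch builds off-target.
 * On the MCU, delete this section and use the TivaWare headers instead. */
#define SSI2_BASE  0x4000A000u   /* only passed through to the stubs  */
#define SSI_RXTO   0x00000002u   /* receive time-out interrupt flag   */

static uint32_t stubFifoCount;   /* frames waiting in the fake RX FIFO */
static uint32_t stubRawStatus;   /* fake raw interrupt status bits     */
static char     callLog[8];      /* order of calls: 'G', 'C', 'S'      */
static uint32_t callLogLen;

static void logCall(char c)
{
    if (callLogLen < sizeof(callLog) - 1u) {
        callLog[callLogLen++] = c;
    }
}

static int32_t SSIDataGetNonBlocking(uint32_t base, uint32_t *data)
{
    (void)base;
    logCall('G');
    if (stubFifoCount == 0) {
        return 0;
    }
    stubFifoCount--;
    *data = 0x1234u;             /* arbitrary fake sample */
    return 1;
}

static void SSIIntClear(uint32_t base, uint32_t flags)
{
    (void)base;
    logCall('C');
    stubRawStatus &= ~flags;
}

static uint32_t SSIIntStatus(uint32_t base, bool masked)
{
    (void)base;
    (void)masked;
    logCall('S');
    return stubRawStatus;
}

/* ---- The ordering under discussion ---- */
static int32_t ssiHandlerReadThenClear(uint32_t *sample)
{
    /* 1. Drain the FIFO first, so no data is left behind that could
     *    re-trigger the time-out (the behavior reported above). */
    int32_t count = SSIDataGetNonBlocking(SSI2_BASE, sample);

    /* 2. Only now clear the receive time-out interrupt. */
    SSIIntClear(SSI2_BASE, SSI_RXTO);

    /* 3. Read the status back so the clear has taken effect before
     *    the handler returns. */
    (void)SSIIntStatus(SSI2_BASE, false);

    return count;
}
```

This corresponds to the SSI_INT_CLEAR_AFTER_READ variant in the experiment above, with the clear limited to the time-out flag.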