Tool/software: TI-RTOS
Hello all,
I'm using the FPU in a SWI context (priority 15). A comparison between 2 floats that are always expected to be equal occasionally fails (about once in a few million comparisons).
A hardware breakpoint shows that S0, and sometimes also S1, holds an incorrect value.
I've attached a screenshot where CPU is halted after comparison fails.
There S0 and S1 are loaded from addresses contained in R0 and R1, both locations contain the correct values, 30.152075 that is 0x41F13773. However, in S0 now I see 0 which is the reason breakpoint is hit.
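As a cross-check of the hex value quoted above, here is a minimal C sketch that converts between a raw 32-bit register pattern and the float it represents. The helper names bits_to_float and float_to_bits are hypothetical and not part of TI-RTOS; the sketch assumes float is a 32-bit IEEE-754 single, as it is on the Cortex-M4F.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern (as shown in an S register) as a float.
 * memcpy is used instead of a pointer cast to avoid strict-aliasing issues.
 * Assumes float is a 32-bit IEEE-754 single. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Inverse helper: the raw bit pattern of a float, as a debugger displays it. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

With these helpers, bits_to_float(0x41F13773u) comes out at roughly 30.152075, and a register reading of all zero bits corresponds to 0.0f, which matches what the breakpoint shows in S0.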
By tracing the flow with a homemade RAM logger, I see that the problem occurs only if this SWI function is interrupted by a HWI, which I would not expect to use the floating-point unit.
I'm using tirtos_tivac_2_14_00_10.
All I can think of is that the FPU registers are not being saved at a context switch, so something else could be using them.
Is there any way to tell OS to save them when preempting SWIs?
BR.
Lorenzo.
Hi Todd,
thank you for your quick response.
The Hwi is managed by the kernel, actually the interrupting HWIs are 2: (INT_CAN0_TM4C129 and INT_CAN1_TM4C129); their priority is defined below.
#define HWI_PRIORITY_CAN0INT (3 << 5)
#define HWI_PRIORITY_CAN1INT (4 << 5)
There are 2 other HWIs, constructed at runtime through Hwi_construct(...): UART and ADC, with priorities (7 << 5) and (1 << 5) respectively. So no HWI is a zero-latency interrupt, since Hwi_disablePriority is equal to 32.
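The priority reasoning above can be sketched in C as follows. This is only an illustration under two assumptions: that the device implements 3 priority bits held in bits 7:5 of the 8-bit priority field (which is what the `<< 5` in the defines suggests), and that a Hwi counts as zero-latency when its priority value is numerically lower than Hwi_disablePriority. The function names are hypothetical, not TI-RTOS APIs.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: 3 implemented priority bits, located in bits 7:5. */
#define PRIORITY_SHIFT 5u

/* Encode a priority level 0..7 into the 8-bit priority field. */
static uint8_t hwi_priority(unsigned level)
{
    return (uint8_t)((level & 0x7u) << PRIORITY_SHIFT);
}

/* Assumption: a Hwi whose priority value is numerically below
 * disable_priority is not masked by Hwi_disable(), i.e. it is
 * treated as a zero-latency interrupt. */
static bool is_zero_latency(uint8_t priority, uint8_t disable_priority)
{
    return priority < disable_priority;
}
```

Under these assumptions the ADC Hwi at (1 << 5) = 32 sits exactly at Hwi_disablePriority = 32 and is therefore not zero-latency; only a priority of 0 would be.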
There is no ISR registered outside TI-RTOS.
When the breakpoint is hit, and at any other time, the FPCCR register shows the ASPEN bit clear. Should it be set, or is that OK?
I've checked the disassembly of ti_sysbios_family_arm_m3_Hwi_dispatch__I and conditional instructions under __TI_VFP_SUPPORT__ are really present.
I'll try to recap in order to be as clear as possible.
One last addendum: in the CAN HWIs I traced S0 and S1 by calling the following 2 functions at function entry and exit:
.global __get_S0
__get_S0:
vmov r0,s0
bx lr
.global __get_S1
__get_S1:
vmov r0,s1
bx lr
Both of them reported always the correct value (0x41F13773 = 30.152075)
BR
Lorenzo.
Hi Lorenzo,
First, can you confirm which compiler you are using. From the snapshot it looks like IAR, but please confirm.
Also, can you bump the priority up from (1 << 5) to (2 << 5)? I know it should be fine since Hwi_disablePriority is 32, but it's an easy test :)
Based on your description, we think your CAN ISR is corrupting the stack. What are the local variables in the ISR? Maybe a buffer is being overwritten and corrupting the D0-D7 registers that are saved on the stack? An easy test is to add a char buffer[32] as the first local variable. Initialize all the elements to some known value (e.g. 0xa5) first thing in the ISR. At the end of the ISR, confirm the buffer values are still that value. Note: you may need to bump up the system stack in case you are close to the top of it. You can check ROV->Hwi->Module to see the peak to see if you are close.
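The guard-buffer test described above could look roughly like the following C sketch. All names here (guard_init, guard_intact, example_isr, corruption) are hypothetical; in a real ISR the guard array would be the first local variable and the body would be the existing interrupt handling code.

```c
#include <stdbool.h>
#include <stddef.h>

#define GUARD_SIZE 32u
#define GUARD_FILL 0xA5u

/* Set by the ISR when the guard pattern has been overwritten.
 * A hardware breakpoint can be placed on the write to this flag. */
volatile bool corruption = false;

/* Fill the guard with the known pattern; call first thing in the ISR. */
static void guard_init(volatile unsigned char *guard)
{
    for (size_t i = 0; i < GUARD_SIZE; ++i) {
        guard[i] = GUARD_FILL;
    }
}

/* Return true if every guard byte still holds the pattern. */
static bool guard_intact(const volatile unsigned char *guard)
{
    for (size_t i = 0; i < GUARD_SIZE; ++i) {
        if (guard[i] != GUARD_FILL) {
            return false;
        }
    }
    return true;
}

/* Hypothetical ISR skeleton: 'body' stands in for the real handler code. */
void example_isr(void (*body)(void))
{
    volatile unsigned char guard[GUARD_SIZE];  /* first local variable */

    guard_init(guard);
    if (body != NULL) {
        body();
    }
    if (!guard_intact(guard)) {
        corruption = true;  /* breakpoint here */
    }
}
```

Note that this check only catches overwrites that happen while the ISR body runs. Corruption that happens after the check, for example in the dispatcher on the way out, will not set the flag.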
Todd
Hi Todd,
the compiler I'm using is TI ARM v5.2.2; the snapshot is taken from Trace32 Lauterbach which is my debug system.
I've moved priority from 32 to 64.
I've added also the 32 bytes guard buffer as first local variable.
The local variables were:
CanDrvChannelId canDrvChannelId = (CanDrvChannelId) arg;
CanDrvChannelStructPtr canDrvChannelPtr;
UInt32 canDrvChannelBase;
CanMsgObjId canMsgObjId;
UInt32 canMsgObjInts;
where:
/* CAN driver channel identifiers */
typedef enum {
    CAN_DRV_CHANNEL_PROCESS = 0,
    CAN_DRV_CHANNEL_SERVICE,
    CAN_DRV_CHANNEL_ID_MAX
} CanDrvChannelId, *CanDrvChannelIdPtr;
CanMsgObjId is another enum ranging from 0 to 31 and
CanDrvChannelStructPtr is a pointer to a structure containing the base address of the relevant CAN peripheral and also some counter to be updated by HWI to provide outside some statistics.
Of course, I've checked I'm not running out of stack.
I also added an automatic check of the guard buffer at the end of the HWI. This way I expect to catch the problem with a hardware breakpoint at the point where corruption is asserted.
for (i = 0; i <= 7; ++i) {
    if (buffer[i] != 0xA5A5A5A5) {
        corruption = TRUE;
    }
}
The new FW is just up and running. Since a failure typically happens in 4-24 hrs, I'll let you know something more tomorrow.
Lorenzo.
Hi Todd,
the problem occurred again, please find the attached presentation.
In my check at the end of the Hwi, the unsigned long buffer[8] was found intact. However, when the problem occurs I see those values partially corrupted: only 3 words remain instead of 8. Looking at what is next to the remaining A5 words, I think the missing 5 are the ones at the higher addresses (0x200014D8-0x200014EB). I recognized TIMER1_BASE in that area.
At 0x20001530 I see LR pointing to my interrupted SWI, (please find disassembly on the right); that is fully compatible with a corruption of S0-S7 in FPU context: in fact at this point S1 has still to be written, instead S0 is going to be recovered from corrupted stacked copy.
I also found in yellow what in my opinion was the stacked fpu context.
In light blue is the current SP.
The only positive side is that the problem occurs systematically within a few hours, so I can collect all the data needed once the CPU is halted at the breakpoint.
Just a remark, I use nested interrupt.
regards.
Lorenzo.
Lorenzo,
“By the way, when problem occurs I see those values as partially corrupted:”
I interpret this statement to mean that something within the body of the Hwi function is corrupting the stack.
Am I understanding this correctly?
Alan
Hello,
I meant that after control returns to the SWI, the 0xA5A5A5A5 words are corrupted. But I don't think the corruption happens inside the body of the HWI, since that function ends with:
for (i = 0; i <= 7; ++i) {
    if (buffer[i] != 0xA5A5A5A5) {
        corruption = TRUE;
    }
}
and I set a breakpoint on the corruption assignment statement. The problem occurred without stopping there.
So, my guess is that part of memory is corrupted later in the dispatcher or somewhere before restoring SWI. Might be that RTOS can behave that way if not configured correctly?
Lorenzo.
You mention that zero-latency interrupts are NOT being used so this maybe won't help, but does applying the code change listed in "TIVA SYS/BIOS FPU context switch corruption with zero latency interrupts" prevent the problem?
Lorenzo Verniani said: I'm using tirtos_tivac_2_14_00_10.
All I can think about is something that FPU regs are not saving at context switch and someone could use them.
SYSBIOS-208 was raised for the bug in the referenced thread, which was fixed in SYS/BIOS 6.46.00.23. tirtos_tivac_2_14_00_10 uses SYS/BIOS 6.42.01.20 and so has that bug.
Alan DeMars said:Lorenzo,
You say you have nested Hwis. Are all of the Hwi functions instrumented with the corruption buffer check? Perhaps the condition only appears when Hwi nesting occurs.
Alan
Chester,
I'll try to apply the change to the Hwi dispatcher and let you know. As I posted earlier, the next test will run on 7th January.
Regards.
Hello,
I've tried to insert guard buffer also in Uart and ADC hwis.
More importantly, I changed the section of the Hwi dispatcher that stacks the FPU registers, as suggested by Alan in the post linked by Chester,
from
    .if __TI_VFP_SUPPORT__
        vstmdb {d0-d7}, r1!     ; push vfp scratch regs on appropriate stack
        vmrs r2, fpscr          ; push fpscr too
        str r2, [r1, #-8]!      ; (keep even align)
        tst lr, #4              ; context on PSP?
        ite NE
        msrne psp, r1           ; update appropriate SP
        moveq sp, r1
    .endif
to
    .if __TI_VFP_SUPPORT__
        sub r2, r1, #72         ; back up by 9*8 bytes
        tst lr, #4              ; context on PSP?
        ite NE
        msrne psp, r2           ; update appropriate SP before pushing
        moveq sp, r2
        vstmdb {d0-d7}, r1!     ; push vfp scratch regs on appropriate stack
        vmrs r2, fpscr          ; push fpscr too
        str r2, [r1, #-8]!      ; (keep even align)
    .endif
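The difference between the two orderings can be illustrated with a small host-side C model. This is only a sketch of the general hazard of writing below the stack pointer before moving it; the thread does not establish that this exact window is what caused the corruption here. The model is word-based, uses a simplified 8-word exception frame, and all names and fill patterns are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define STACK_WORDS   64
#define FPU_WORDS     18  /* d0-d7 (16 words) + fpscr + 1 pad word = 72 bytes */
#define FRAME_WORDS   8   /* simplified hardware exception frame */
#define FPU_PATTERN   0x41F13773u
#define FRAME_PATTERN 0xDEADBEEFu

typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;               /* index of lowest in-use word; stack grows down */
} Stack;

/* A nested exception stacks its frame just below the current SP.
 * After it returns, SP is back where it was, but the words stay written. */
static void nested_exception(Stack *s)
{
    for (int i = 1; i <= FRAME_WORDS; ++i) {
        s->mem[s->sp - i] = FRAME_PATTERN;
    }
}

/* Save the FPU context below r1 and let a nested exception arrive right
 * after the stores. Returns true if the saved context is still intact. */
static bool fpu_save_survives(bool move_sp_first)
{
    Stack s;
    memset(s.mem, 0, sizeof s.mem);
    s.sp = STACK_WORDS;
    int r1 = s.sp;                      /* working pointer, like r1 */

    if (move_sp_first) {
        s.sp = r1 - FPU_WORDS;          /* modified order: reserve space first */
    }
    for (int i = 1; i <= FPU_WORDS; ++i) {
        s.mem[r1 - i] = FPU_PATTERN;    /* models the vstmdb and str stores */
    }
    nested_exception(&s);               /* interrupt arrives in the window */
    if (!move_sp_first) {
        s.sp = r1 - FPU_WORDS;          /* original order: SP updated last */
    }

    for (int i = 1; i <= FPU_WORDS; ++i) {
        if (s.mem[r1 - i] != FPU_PATTERN) {
            return false;
        }
    }
    return true;
}
```

In this model fpu_save_survives(false) reports corruption, because the nested frame lands on top of the freshly saved registers, while fpu_save_survives(true) does not, because the space was reserved before anything was stored.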
After more than 28 hrs of test problem has not yet occurred. Typically 6-10 hrs were enough to see it.
By the way, I still can't figure out what happens: I'm not using zero-latency interrupts. I only have a SWI (using the FPU) that can be preempted by 2 nested CAN HWIs.
I'll continue the test and keep you updated.
BR
Lorenzo.
Hi Todd,
it's still working perfectly!
I'm going to keep testing till Monday.
Then I'll mark the post as resolved.
Thanks for now.
P. S. Is there any planned date for releasing a new TI_RTOS for Tiva? ☺️
Lorenzo.
Lorenzo Verniani said: it's still working perfectly!
Great!
Lorenzo Verniani said: Then I'll mark the post as resolved.
Thanks
Lorenzo Verniani said: P. S. Is there any planned date for releasing a new TI_RTOS for Tiva? ☺️
Unfortunately no. We don't have any patch release scheduled for TI-RTOS for Tiva. You'll need to maintain the bug fix in the version you have.
Todd