Hi all,
I'm looking for some troubleshooting help.
I'm debugging part of our existing code base that I did not write, so I need some insight into an error.
We have an SD Card connected via SPI.
Running through the code I am able to read and write from multiple files. I can read and write from a persistent log file.
In my code I write a 1024-byte array to a measurement file, 0:/LOG/V0000000.TXT, and everything works fine.
I then loop through my code again (there are other subroutines and separate tasks that I've commented out) and rerun my measurement. Everything seems to happen just fine.
Then I go to remount the SD card, and I get a HWI exception, as seen below.
I assume I have some sort of reference error, though I am not sure how to parse the information below. Using the debugger, I've tracked the error to the first time I touch the SD card after the program loops. I've verified that I'm not touching the SPI lines or other GPIO elements. I do make a call to the UART as part of the measurement, and that all works fine. I've tried not unmounting the SD card and just opening a file, and I get the same error. Prior to this file write I write to other existing files, and they seem fine (though their file names are constant). I'm at my wits' end with this. It was working fine until I updated a different part of the program (which doesn't touch the SPI bus).
I've run out of ideas for what might be causing it. Can anyone shed light on what this error suggests the issue is?
ti.sysbios.family.arm.m3.Hwi: line 1148: E_hardFault: FORCED
ti.sysbios.family.arm.m3.Hwi: line 1225: E_busFault: PRECISERR: Immediate Bus Fault, exact addr known, address: 06050611
Exception occurred in background thread at PC = 0x00011016.
Core 0: Exception occurred in ThreadType_Task.
Task name: {unknown-instance-name}, handle: 0x20009850.
Task stack base: 0x200098a0.
Task stack size: 0x1000.
R0 = 0x06050605 R8 = 0xffffffff
R1 = 0x00000000 R9 = 0x00000000
R2 = 0x00000000 R10 = 0xffffffff
R3 = 0x2000a5d0 R11 = 0xffffffff
R4 = 0x00000000 R12 = 0x20009710
R5 = 0x2000a598 SP(R13) = 0x2000a568
R6 = 0x00000000 LR(R14) = 0x0000258b
R7 = 0x06050605 PC(R15) = 0x00011016
PSR = 0x01000000
ICSR = 0x00418803
MMFSR = 0x00
BFSR = 0x82
UFSR = 0x0000
HFSR = 0x40000000
DFSR = 0x00000001
MMAR = 0x06050611
BFAR = 0x06050611
AFSR = 0x00000000
I've tried looking through the documentation, but it hasn't been much help. If anyone can point me in the right direction for troubleshooting, it would be of great help.
Hi,
I have forwarded this to the relevant engineer. We will get back to you as soon as possible. Please bear with us.
Thanks,
PM
Hi,
If your SD card code was working before, and you didn't change any of the SD card code then the issue might lie not with any of the SD card functionality but with the underlying RTOS state. Are you using TIRTOS? If so, I suggest you use the RTOS Object Viewer to examine the state of the RTOS and see if there is anything obviously wrong that you can identify. Some instructions on how to launch it can be found here:
https://e2e.ti.com/support/wireless-connectivity/wifi/f/968/p/733294/2707429#2707429
In particular, I suggest you examine your task stack sizes and heap memory usage. If you added extra code to your program, you could have exceeded the amount of memory available, causing memory corruption and thus the bus fault that you see once the MCU tries to access an illegal memory address.
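As a rough first check on the stack, the numbers in the exception dump can be combined by hand. The sketch below is a host-side calculation only, not TI-RTOS code. It assumes the SP(R13) value in the dump is the faulting task's stack pointer at the time of the exception, and that the stack grows downward from base + size, as is normal on Cortex-M. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* Values copied from the exception dump in the original post. */
#define TASK_STACK_BASE 0x200098a0u
#define TASK_STACK_SIZE 0x1000u
#define FAULT_SP        0x2000a568u

/* Highest address of the stack; a full-descending stack starts here. */
uint32_t stack_top(uint32_t base, uint32_t size)
{
    return base + size;
}

/* Returns 1 if sp lies inside [base, base + size], otherwise 0. */
int sp_in_stack(uint32_t sp, uint32_t base, uint32_t size)
{
    return sp >= base && sp <= stack_top(base, size);
}

/* Bytes between the top of the stack and the current stack pointer. */
uint32_t stack_bytes_in_use(uint32_t sp, uint32_t base, uint32_t size)
{
    assert(sp_in_stack(sp, base, size));
    return stack_top(base, size) - sp;
}

/* Hypothetical helper: prints a short report for the dump values. */
void print_stack_report(void)
{
    uint32_t used = stack_bytes_in_use(FAULT_SP, TASK_STACK_BASE,
                                       TASK_STACK_SIZE);
    printf("stack top : 0x%08" PRIx32 "\n",
           stack_top(TASK_STACK_BASE, TASK_STACK_SIZE));
    printf("in use    : %" PRIu32 " bytes\n", used);
    printf("remaining : %" PRIu32 " bytes\n", TASK_STACK_SIZE - used);
}
```

With the values from the dump this gives a stack top of 0x2000a8a0 and 0x338 (824) bytes in use at the moment of the fault. That only describes the instant of the exception; it does not rule out a deeper peak earlier on, which is why the peak stack figures in the object viewer are still worth checking.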
Please let me know what you see with the RTOS object viewer.
Regards,
Michael
Hi Michael,
I have tracked the heap memory, but that didn't seem to be the issue. (See Image Below.)
However, further troubleshooting has found that there was an NVM write occurring that seemed to be the cause of the failure. By commenting out the NVM writes and reads, I was able to prevent the error from occurring, running some 200 iterations of the read/write cycle on the SD card without issue.
But I am still troubled.
After 2 days of adding everything else back, with the exception of those NVM writes (they store adaptive settings for our instrument), the functionality continued without incident for over 1000 cycles. So I added the NVM writes and reads back with the intention of troubleshooting the failure. My issue is that now I am unable to reproduce the failures. I'm still attempting different combinations of scenarios to see if I can identify why this failure occurred in the first place.
We have at least 4 documented occurrences of this failure in field testing units over the past year. It seems to me that it is something that occurs when there is a confluence of more than one factor, but what those factors are, I'm unable to isolate at the moment. I am not confident whether the NVM write is directly responsible for the error (I will continue to verify that the code is behaving), or whether by removing the NVM writes I simply and inadvertently altered either a memory location or a persistent failure.
So I have two main questions in my attempt to troubleshoot this further.
1) Do you know of a mechanism by which an NVM write could cause a failure such that an error occurs when the SD driver attempts to mount the file system, but not prior to this? (Multiple UART communications and memory writes occur before then to store the data from the instrument.)
2) Where or how might I get an idea of which elements are stored in which parts of memory, so that a bus fault like the one in my original question can be diagnosed? I tried looking at previous forum threads, but the suggestions there don't seem to be applicable (i.e., I can't find a file hw_memmap.h, etc.).
I apologize if either of these has an obvious answer. I was dropped on this issue due to a change in manpower, and embedded is not my primary area of expertise.
Cheers,
GE
Hi,
It's good to hear that you've narrowed down the issue you were seeing to the NVM accesses.
It's not so good to hear that adding the NVM accesses back in doesn't result in the issue reproducing itself again. This makes me think that the root cause of the issue doesn't have to do with the NVM accesses.
For these NVM accesses, are you using the NVS driver? Also, are you using the internal flash, an external flash device, or something else as your NVM?
I suspect that when you reintroduced your NVM code, the compiled output changed to the point where the failure might not occur in the same way as before. With the sort of bus fault you are getting, it is possible that in the new binary the same error case is occurring, but the address read is a legal address within the memory map of the device. This executable memory map can be seen within the MSP_EXP432P401R_TIRTOS.cmd linker file, and within section 1.4 of the TRM: ti.com/lit/slau356
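To illustrate what "legal address" means here, the sketch below classifies a few addresses from the exception dump against a small region table. The region bounds are assumptions (256 KB of main flash at 0x00000000, 64 KB of SRAM at 0x20000000, and the generic Cortex-M peripheral region), so please take the real values from your linker command file and the TRM. The type and function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

typedef enum {
    REGION_FLASH,
    REGION_SRAM,
    REGION_PERIPHERAL,
    REGION_UNMAPPED
} region_t;

typedef struct {
    uint32_t    start;   /* first address in the region */
    uint32_t    end;     /* last address in the region (inclusive) */
    region_t    kind;
    const char *name;
} mem_region_t;

/* ASSUMED bounds: verify against the .cmd linker file and the TRM. */
static const mem_region_t k_regions[] = {
    { 0x00000000u, 0x0003FFFFu, REGION_FLASH,      "main flash (assumed)" },
    { 0x20000000u, 0x2000FFFFu, REGION_SRAM,       "SRAM (assumed)" },
    /* Architectural region; not every address here is backed by hardware. */
    { 0x40000000u, 0x5FFFFFFFu, REGION_PERIPHERAL, "peripheral region" },
};

/* Returns the kind of the first region containing addr, if any. */
region_t classify_address(uint32_t addr)
{
    size_t count = sizeof(k_regions) / sizeof(k_regions[0]);
    for (size_t i = 0; i < count; i++) {
        if (addr >= k_regions[i].start && addr <= k_regions[i].end) {
            return k_regions[i].kind;
        }
    }
    return REGION_UNMAPPED;
}

/* Hypothetical helper: prints which region a labeled address falls in. */
void print_classification(const char *label, uint32_t addr)
{
    static const char *const names[] = {
        "flash", "SRAM", "peripheral", "unmapped"
    };
    assert(label != NULL);
    printf("%-5s 0x%08" PRIx32 " -> %s\n", label, addr,
           names[classify_address(addr)]);
}
```

Under these assumed bounds, the PC (0x00011016) lands in flash and the SP (0x2000a568) lands in SRAM, while BFAR (0x06050611) lands in none of the listed regions, which is consistent with a precise bus fault. If the corrupted value had happened to fall inside SRAM instead, the same bug could read or write silently without faulting.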
One thing you could do is look at the PC and LR of your error case, and see if you can identify what functions may be causing the bus fault directly. If you look at your project's debug folder with the .out, there should also be a .map file. This file will map every compiled object to a memory address. When you combine this with looking at the raw assembly code, you should be able to track down which instruction in which function is causing the fault. From there, you can look at the register values to see how your MSP432 got into its bus fault state.
If you still have your old .out binary, you can load it manually through the debugger. See this E2E post for pointers: https://e2e.ti.com/support/legacy_forums/embedded/tirtos/f/355/t/557599?how-to-re-establish-debug-connection-after-hibernate-reset
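As a starting point for that lookup, the fault registers in the dump can be decoded by hand. The sketch below is a host-side helper, not device code. The bit positions are the ARMv7-M definitions for BFSR and HFSR; the function names are made up for illustration. It also clears the Thumb bit of LR, since on Cortex-M a return address in LR has bit 0 set and the .map file lists the even address. Note that LR only points back to the caller if the faulting function has not overwritten it, so treat it as a hint.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* BusFault Status Register bits (ARMv7-M). */
#define BFSR_IBUSERR     (1u << 0)
#define BFSR_PRECISERR   (1u << 1)
#define BFSR_IMPRECISERR (1u << 2)
#define BFSR_UNSTKERR    (1u << 3)
#define BFSR_STKERR      (1u << 4)
#define BFSR_BFARVALID   (1u << 7)

/* HardFault Status Register bits (ARMv7-M). */
#define HFSR_VECTTBL     (1u << 1)
#define HFSR_FORCED      (1u << 30)

/* Code address to search for in the .map file (Thumb bit cleared). */
uint32_t thumb_code_address(uint32_t lr)
{
    return lr & ~1u;
}

/* 1 if the fault was precise and BFAR holds the faulting address. */
int bfsr_precise_addr_valid(uint8_t bfsr)
{
    return (bfsr & BFSR_PRECISERR) && (bfsr & BFSR_BFARVALID);
}

/* 1 if the hard fault is an escalated configurable fault. */
int hfsr_forced(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0;
}

/* Offset of the faulting address from a candidate base register. */
uint32_t bfar_offset_from_base(uint32_t bfar, uint32_t base_reg)
{
    assert(bfar >= base_reg);
    return bfar - base_reg;
}

/* Hypothetical helper: summarizes the values from the original post. */
void print_fault_summary(void)
{
    printf("precise, BFAR valid : %d\n", bfsr_precise_addr_valid(0x82));
    printf("forced hard fault   : %d\n", hfsr_forced(0x40000000u));
    printf("LR lookup address   : 0x%08" PRIx32 "\n",
           thumb_code_address(0x0000258bu));
    printf("BFAR - R0           : 0x%" PRIx32 "\n",
           bfar_offset_from_base(0x06050611u, 0x06050605u));
}
```

For the dump in the original post, BFSR = 0x82 decodes to PRECISERR with BFARVALID set, and HFSR = 0x40000000 is the FORCED bit, which matches the two messages printed by the Hwi module. BFAR is exactly 0xC above the value held in R0 and R7 (0x06050605), so the instruction at PC = 0x00011016 is probably a load or store at offset 12 from a pointer held in one of those registers. The value 0x06050605 looks more like data bytes than a pointer, which would fit a structure pointer having been overwritten, though that is only a guess until you check the disassembly.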
Once your old binary is loaded, you can look through the memory browser and assembly code to try to trace back what went wrong. If you cannot reproduce the error with your new binary, then that is probably the most straightforward method to debug your issue.
Regards,
Michael
Hi,
I assume that you have resolved your issue since I have not heard back from you. If not, feel free to post a response to this thread, or open a new thread regarding this issue.
Regards,
Michael