We have been working on the DM6446 EVM. The silicon revision of the DM6446 processor is 1.2.
We have been running our own operating system, networking stack and application on the board using Code Composer Studio and an XDS510 emulator. While testing our application over long durations, we have been seeing a problem for quite some time: the program counter jumps to an undesired memory location, which eventually results in an exception. The error can take anywhere from a few minutes to a few hours to appear. The test application is a simple networking application that continuously transfers data at high throughput using the TI EMAC Ethernet controller.
While investigating, we have found that at the time of the crash, the program counter holds a distorted version of the expected value. For example, if the expected program counter value was 0x80011270, the program counter contains 0x82011270. We have thoroughly verified that it is not our code that is changing the program counter in this fashion. As in any operating system, there are stack pop operations that modify the program counter, but the memory locations from which those values are popped contain the correct values.
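To make the corruption in this example concrete, the sketch below XORs the expected and observed program counter values and reports which bits differ. It is only an illustration of the arithmetic: the helper names are made up for this post and are not part of our operating system or of any TI library. For the two values quoted above, the difference mask is 0x02000000, i.e. a single flipped bit (bit 25).

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Bits that differ between the expected and the observed program counter. */
static uint32_t pc_diff_mask(uint32_t expected, uint32_t observed)
{
    return expected ^ observed;
}

/* Number of set bits in a mask; 1 means exactly one bit was flipped. */
static int count_set_bits(uint32_t mask)
{
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Index of the lowest set bit (0..31), or -1 if the mask is zero. */
static int lowest_set_bit(uint32_t mask)
{
    int i;
    for (i = 0; i < 32; i++) {
        if (mask & ((uint32_t)1 << i)) {
            return i;
        }
    }
    return -1;
}
```

Calling pc_diff_mask(0x80011270, 0x82011270) and passing the result to the other two helpers gives a count of 1 and a bit index of 25. Logging the same three numbers for every crash would show whether the corruption always hits the same bit.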
Furthermore, we have found that once the program counter has this incorrect value, if we manually set it back to the correct value (0x80011270) and then try to execute the instruction, the program counter again returns to the incorrect value (0x82011270). However, if we rewrite the memory location at 0x80011270 (i.e. update the opcode) or flush the instruction cache at this point, the processor is able to execute the instruction at 0x80011270.
It seems that something is wrong with either the DDR2 memory or the SDRAM controller that drives it. We are using the GEL files available from the Spectrum Digital website at:
http://c6000.spectrumdigital.com/davincievm/revf/
We have also tried the older GEL file located at:
http://c6000.spectrumdigital.com/davincievm/revd/
We would like to know whether something in the hardware configuration, or something related to the processor itself, could cause this issue.
The processor errata document does not mention any issue that looks similar. We would appreciate any help in this regard.