DM6446 EVM DDR2 Memory Undefined Behaviour

We have been working on the DM6446 EVM. The silicon revision of the DM6446 processor is 1.2.

We have been running our own operating system, networking stack, and application on the board using Code Composer Studio and an XDS510 emulator. While testing our application over long durations, we have been seeing a problem for quite some time: the program counter jumps to an unintended memory location, which eventually results in an exception. The error can take anywhere from a few minutes to a few hours to appear. The test application is a plain networking application that continuously transfers data at high throughput using the TI EMAC Ethernet controller.

While investigating, we found that at the time of the crash, the program counter holds a distorted version of the expected value. For example, if the expected program counter value was 0x80011270, the program counter contains 0x82011270. We have thoroughly verified that it is not our code that changes the program counter in this fashion. As in any operating system, there are stack pop operations that modify the program counter, but the memory locations from which those values are popped contain correct values.

Furthermore, we found that when the program counter has this incorrect value, if we reset it to the correct one (0x80011270) and then try to execute the instruction, the program counter again returns to the incorrect value (0x82011270). However, if we update the memory location pointed to by 0x80011270 (i.e., rewrite the opcode) or flush the instruction cache at this point, the processor is able to execute the instruction at 0x80011270.
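
As a quick host-side check of the two program counter values quoted above, the following minimal Python sketch XORs them and lists the bit positions that differ. It is only an illustration run on a PC, not on the target; the helper name `pc_bit_diff` is hypothetical.

```python
def pc_bit_diff(expected, observed):
    """Return the 32-bit XOR mask and the bit positions that differ."""
    mask = (expected ^ observed) & 0xFFFFFFFF
    bits = [bit for bit in range(32) if mask & (1 << bit)]
    return mask, bits


expected_pc = 0x80011270  # expected value quoted in the post
observed_pc = 0x82011270  # distorted value quoted in the post

mask, bits = pc_bit_diff(expected_pc, observed_pc)
print(f"XOR mask: 0x{mask:08X}, differing bits: {bits}")
# prints: XOR mask: 0x02000000, differing bits: [25]
```

The two values differ in exactly one bit (bit 25), which is why the corrupted address looks like a shifted copy of the expected one rather than a random value.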

It seems that something is wrong with either the DDR2 memory or the SDRAM controller that drives it. We are using the GEL files available from the Spectrum Digital website at:

http://c6000.spectrumdigital.com/davincievm/revf/

We have also tried the older GEL file placed at:

http://c6000.spectrumdigital.com/davincievm/revd/

We would like to know whether something in the hardware configuration, or something related to the processor itself, can cause this issue.

The processor errata document does not mention any issue that looks similar. We would appreciate any help in this regard.

  • I just went through Advisory 1.3.14 in the processor errata document (http://www.ti.com/litv/pdf/sprz241n) and found that the ARM and the EMAC are both at the same priority (4) by default. When I lowered the EMAC priority to 5, the issue became more visible, occurring every few seconds.

    Changing the priority to 3 seems more stable, but I can only be sure once the tests have run for at least another 12 hours without a crash.

    Does anybody have an idea what the reason could be? Is a conflict between the two bus masters over RAM access somehow causing this stall?

  • An update: the issue still occurs if I set the EMAC master priority to 3 and the ARM CFG priority to 4. So the issue occurs less frequently (after minutes to hours) when the EMAC priority is higher than or equal to that of the ARM CFG master, and it occurs within a few seconds when the EMAC priority is lower than that of the ARM CFG master.

    I am interested in finding out what is going on inside the System Interconnect that causes this issue. What we are seeing looks like a stall caused by some sort of conflict or race condition between the bus masters, but only someone with insight into the hardware internals can tell for sure.

  • I also found Advisory 2.1.2, "Bus Priority Inversion Can Affect DDR2 Throughput".

    I have tried reducing the PBBPR value to 0x20. In one case, I configured the EMAC and ARM masters at the same priority. The application still crashed after 5 hours.

    In a previous attempt, I tried a PBBPR value of 0x20 with the EMAC at a higher priority than the ARM CFG master. In this case, the system became more unstable than before, with random halts and exceptions.

    Right now, I am experimenting with PBBPR set to 0x0. Ideally no starvation should take place after this change but if it does not work, I shall be left without any other tuning option.

     

  • Hi,

    Have you tried slowing down the DDR frequency to determine whether the problem is DDR related or system related (for example, bus priority)?

    Best regards, Zegeye

  • Hi Zegeye,

    I have tried running at DDR frequencies of 162 MHz, 135 MHz, 126 MHz, and 80 MHz. The issue appears at all of them.

  • Adding the 54 MHz DDR frequency to the list of failures as well.

  • Hi,

    Thanks for running these rounds of experiments. I have discussed this issue with an expert, and he believes it has to do with improper EMIF configuration, which has been the issue for many customers. I was informed that a sample configuration should be available in the DM6446 documentation.

    Best regards, Zegeye

     

     

  • Hi Zegeye,

    We are using the GEL file supplied by Spectrum Digital, which means that the EMIF is set to the default values in this file. I will investigate whether the values need to be changed.

    I am interested in knowing how EMIF settings could cause this problem, since we are running our code from DDR2 RAM, which is connected to the DDR controller rather than the EMIF. Moreover, we use Code Composer Studio to connect to and initialize the target, thereby bypassing the flash-based boot code. Could an incorrect EMIF configuration somehow affect the DDR2 controller as well?

     

  • It seems that I confused EMIF with EMIFA. Since the DDR2 controller and EMIFA are both components of the EMIF, I assume that you are pointing to incorrect DDR2 initialization, since only that is applicable here.

  • Hi,

    I have discussed your issue with others and this is what they are proposing.

    First, this is a very mature device, and they are surprised that you are running into issues like this. For this reason, they suggest that the issue you are facing is most likely software related rather than hardware related. You will need to make sure that you are using the latest board support package, SDK, etc.

    You may also want to find out whether there is a TI office nearby for local support.

    Your other avenue is to contact Spectrum Digital support.

    Best regards, Zegeye

  • Hi Zegeye,

    I believe that I have been able to solve the issue, though the real cause remains unknown. We have developed our own BSP, so we could not use the board support packages from TI, but I consulted the UBL and U-Boot code for the board in order to use the same PLL settings, DDR2 timings, and EMIF configuration as these bootloaders. That didn't solve our problem.

    In the end we found that we were placing our interrupt vectors in the SRAM at address 0x02000000. Moving the interrupt vectors to DTCM at 0x00008000 resolved the problem, and the crash no longer appears when the test is run for long durations (we are still testing it, though).

    That raises the question of why the issue occurred with the interrupt vectors in SRAM. Is it because the SRAM is an external chip connected to the EMIFA interface and hence slower than we would expect an internal SRAM to be? I hope we do not end up finding that the issue was merely hidden by the timing difference between the memories and can still appear during a long run.
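
For reference, the addresses mentioned in this thread can be sorted into memory regions with a small host-side Python sketch. The DTCM and SRAM base addresses are the ones quoted above; the DDR2 base and all region sizes are assumptions made for illustration and should be checked against the DM6446 data sheet. The names `REGIONS` and `region_of` are hypothetical.

```python
# (name, base, size). Bases for DTCM and SRAM are quoted in the thread;
# the DDR2 base and every size below are assumptions, not confirmed values.
REGIONS = [
    ("DTCM", 0x00008000, 16 * 1024),          # size assumed
    ("SRAM", 0x02000000, 32 * 1024 * 1024),   # size assumed
    ("DDR2", 0x80000000, 256 * 1024 * 1024),  # base and size assumed
]


def region_of(addr):
    """Return the name of the region containing addr, or 'unmapped'."""
    for name, base, size in REGIONS:
        if base <= addr < base + size:
            return name
    return "unmapped"


for addr in (0x00008000, 0x02000000, 0x80011270, 0x82011270):
    print(f"0x{addr:08X} -> {region_of(addr)}")
```

Under these assumptions, both the expected and the distorted program counter values fall inside the DDR2 window, while the old and new interrupt vector locations fall in SRAM and DTCM respectively.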