RM48L950: RM48L952: Strange issue related to CAN/DMA system Continued

Westin Sykes

Part Number: RM48L950
Other Parts Discussed in Thread: HALCOGEN

This is a copy of a post which did not get a resolution and then was locked: e2e.ti.com/.../2331703

I'm having a really strange issue which I have been debugging for over a week now, so I thought I would see if anyone here had any ideas. I will explain the relevant parts of my system then explain what I am seeing.

I am using DMA together with CAN to receive CAN messages. In this setup, all of the hardware mailboxes are dedicated to an individual CAN message except for the last one. The last one is set to trigger DMA to transfer the received message into a buffer which is periodically processed. This DMA channel has another DMA channel in its chain register. This secondary channel writes an indicator of which index in the CAN message buffer was written last.

We have an external real time clock which feeds a 32kHz tick into the NHET module. We use the HTU to retrieve the current time.

This system has been tested and working for a long time, but we have one piece of code which has issues when any message gets into the last CAN mailbox and should be transmitted with DMA. When it receives one of these messages, the message is not transmitted, the time is never retrieved, and the memory browser in the debugger shows BAD0BAD0 for all values when refreshed until the code execution is paused.

This is the code we use to get the time. After receiving a CAN message and DMA starts processing it, it gets stuck at the while loop.

htuREG2->GC = htuREG2->GC & 0xfffeffff;  // Disable transfer unit
while((htuREG2->BUSY0) & 0x01000000);  // Wait for the busy flag to clear; execution gets stuck here
SubSecTimestamp t = RTCBuffer;
htuREG2->GC = htuREG2->GC | 0x00010000;  // re-enable the transfer unit
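While debugging a hang like this, it can help to bound the wait so the code reports a timeout instead of spinning forever. The following is only a minimal sketch, not the code used in this thread: the helper name `wait_while_set` and the spin limit are hypothetical, and the register is passed in as a plain pointer so the idea can be tried off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: spin while any bit in `mask` is still set in *reg,
 * but give up after `max_spins` polls.
 * Returns true if the bits cleared, false if the wait timed out.
 */
static bool wait_while_set(volatile const uint32_t *reg,
                           uint32_t mask,
                           uint32_t max_spins)
{
    while ((*reg & mask) != 0u) {
        if (max_spins == 0u) {
            return false;   /* still busy: report the timeout to the caller */
        }
        max_spins--;
    }
    return true;            /* busy bits cleared within the limit */
}
```

On the target, the call would look something like `wait_while_set(&htuREG2->BUSY0, 0x01000000u, limit)`, with `limit` chosen by the caller. A false return then marks the point where the HTU stopped responding.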

In addition to this, the CAN message never shows up in the destination buffer. From what I can tell from the DMA registers, it thinks the CAN message has been transferred to the buffer and it is waiting on the chained channel to complete. There are no other DMA channels active, and it should only copy one byte, so it shouldn't take any time.

This behaviour happens when interrupts are enabled or disabled, so I don't think some interrupt is causing issues when the CAN message is received.

The thing that is very strange about this to me is that DMA and the HTU both seem to be affected, even though I don't know of any way they are linked.

Here are register states for DMA before and after sending the CAN message.

dma_before_crash.txt

dma_after_crash.txt

The nhet and htu registers both look the same before and after the crash while waiting in the while loop I mentioned above.

nhet_htu_after_crash.txt

Thanks in advance. Let me know if any other information is helpful.

over 7 years ago

0 Omid (TI) over 7 years ago

TI__Intellectual 1470 points

Chuck Davenport

0 Westin Sykes over 7 years ago in reply to Omid (TI)

Expert 1315 points

Since this post, I did find some new information. In order to debug the code, I started removing every non-vital piece to keep the CAN code working. After significantly stripping it down, I commented out a line of unreachable code, and the problem resolved itself. This line that was commented was the last piece to be compiled in a compilation unit, so I assumed that the compilation unit was removed causing things in flash to move around and somehow affect the issue.

Later on, I found a workaround that we are not comfortable using without knowing what the root cause of the issue was. I found that changing DMA to use 32 bit reads and writes instead of 64 bit seemed to make the problem go away. I prefer to use 64 bit mode for performance, but I don't think 32 bit mode could break anything.

0 Chuck Davenport over 7 years ago in reply to Westin Sykes

TI__Guru 59540 points

Hello Westin,

My apologies for this getting overlooked previously. I am going to bring an associate into this as well who may also have some insight or advice to offer.

I will discuss it in more detail with him as well so we can collaborate to see if we can figure out what is happening.

0 Westin Sykes over 7 years ago in reply to Chuck Davenport

Expert 1315 points

Okay, thanks. It looks like it was partially my fault for not commenting with more information when I had it.

Westin

0 Chuck Davenport over 7 years ago in reply to Westin Sykes

TI__Guru 59540 points

Westin,

I wanted to ping this thread to let you know that I haven't forgotten about it. I have been reviewing it and trying to think of what might be causing it. The pieces involved seem unrelated, so the cause and effect looks unusual to me.

The only thing that seems to make sense is that it is related to DMA accesses that aren't aligned to 64 bit boundaries. This would explain why changes in the number of messages or from 64 bit transfers to 32 bit transfers seem to resolve the issue. However, the fact that you are able to remove some unreachable code and it changes the behavior is even more unusual since program space isn't really involved at all in the DMA access or CAN messages.

Let me know if there are any new discoveries or insights on your end and I will keep looking at this and discussing possibilities with my colleagues on my end.

0 NeilBerry_at_Parker over 7 years ago in reply to Westin Sykes

Expert 1995 points

Can I ask why you are using DMA and not HalCoGen CAN Mailboxes?

0 Westin Sykes over 7 years ago in reply to Chuck Davenport

Expert 1315 points

Chuck,

When DMA is in 64 bit write mode, must the address it is writing to be 64 bit aligned? I will need to look to see if it's possible it was misaligned. I think that could explain why commenting out the unreachable code fixed it. If the compiler was not optimizing out the unreachable code, but it was the last thing used from the compilation unit, commenting it out could have removed the compilation unit. If the compilation unit had global variables, removing it could have shifted the buffer that DMA wrote to. If that happened to push the buffer onto a 64 bit boundary and alignment was the problem, it would have fixed the problem.
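A quick way to test this theory is to check the buffer address against an 8 byte boundary, either in the linker map file or at runtime. The following is a minimal sketch of the runtime check; the helper name `is_aligned` is hypothetical and not part of HALCoGen or the original code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when `ptr` lies on an
 * `alignment`-byte boundary. `alignment` must be a power of two,
 * for example 8 for 64 bit DMA accesses.
 */
static bool is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (uintptr_t)(alignment - 1u)) == 0u;
}
```

An address ending in 0x0 or 0x8 passes the 8 byte check, while one ending in 0x4 or 0xC is only 32 bit aligned and fails it.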

I will look into it more when I get a chance.

Thanks,

Westin

0 Westin Sykes over 7 years ago in reply to NeilBerry_at_Parker

Expert 1315 points

We did not want to be limited by the number of hardware mailboxes the MCU has. In order to get around this, we use hardware mailboxes assuming there are enough and then start using DMA if we need more. As far as I know, the HalCoGen CAN mailboxes could not give us this feature.

0 NeilBerry_at_Parker over 7 years ago in reply to Westin Sykes

Expert 1995 points

You might be able to set up just two mailboxes (one for TX and one for RX). It takes more coding than just the setup using HalCoGen, but it is possible to do. I have it working that way on an RM46 HDK.

0 Westin Sykes over 7 years ago in reply to NeilBerry_at_Parker

Expert 1315 points

If we continue having issues with our system, I might look into that. For now, when using 32 bit DMA, everything seems to work very well and takes advantage of the performance gains from using all the hardware mailboxes.

0 Westin Sykes over 7 years ago in reply to Westin Sykes

Expert 1315 points

After some testing, it looks like the issue was the alignment of the DMA buffer. It apparently always happened to be on a 64 bit alignment during my earlier tests. I found that the code with the CAN issue had the buffer on an address that was only 32 bit aligned. I was able to fix it by using the alignment pragma. Thanks for the help.
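For anyone who finds this later, here is a minimal sketch of forcing the buffer onto a 64 bit boundary. It uses the portable C11 `alignas` specifier rather than the exact pragma from the fix, which is not shown in this thread. With the TI compiler the equivalent is presumably the `DATA_ALIGN` pragma. The buffer name, its size, and the helper function are all hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define CAN_DMA_BUFFER_WORDS 64u   /* hypothetical size, in 32 bit words */

/*
 * Force the DMA destination buffer onto an 8 byte (64 bit) boundary.
 * Assumption: with the TI ARM compiler the same effect would come from
 *     #pragma DATA_ALIGN(can_dma_buffer, 8)
 * placed before the definition.
 */
static alignas(8) uint32_t can_dma_buffer[CAN_DMA_BUFFER_WORDS];

/* Keep the buffer length a whole number of 64 bit elements as well. */
static_assert(sizeof(can_dma_buffer) % 8u == 0u,
              "DMA buffer size must be a multiple of 8 bytes");

/* Runtime sanity check that could be called once at startup. */
static bool can_dma_buffer_is_64bit_aligned(void)
{
    return ((uintptr_t)can_dma_buffer & 7u) == 0u;
}
```

Pinning the alignment in the declaration means the buffer no longer depends on whatever else the linker happens to place in front of it, which is what made the problem come and go with unrelated code changes.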
