PROCESSOR-SDK-AM437X: Randomly occurring Exceptions relating to Null Pointer in DMA Callback Function

paul jacomb

Hi All,

We are using PDK1.0.4 with selected files from PDK1.0.7 to add support for DMA SPI receive, which the PDK1.0.4 release did not support. (TI-RTOS, not Linux.)

The reason we have not adopted a later PDK version in its entirety is that we made significant special modifications to the PDK1.0.4 release, which effectively means that for the most part we are stuck with this release.

We have the SPI DMA transmit on one processor working nicely, and now have the SPI DMA receive working on the other processor connected via SPI.

Everything seems fine for an apparently random time between a few seconds and 70 minutes (this is the longest we've seen the system run before crashing). We execute the DMA callback routine many, many times (normal rate in our test code is 10Hz) so in the case where it fails after 70 minutes that's 42,000 calls completed successfully before failing.

No other events are occurring which could be affecting this.

Are there any known problems with PDK1.0.7 which might be causing this behaviour?

Any pointers or suggestions would be most welcome.

regards

Paul Jacomb

over 5 years ago

0 Rahul Prabhu over 5 years ago

TI Guru 115880 points

Paul,

Could you please indicate whether the driver is set up in SPI master mode or slave mode? Also, does the issue occur at a certain SPI speed? Please describe the master/slave setup and what device/hardware is used in it.

I am checking for known issues with SPI DMA in the version you indicated. I recommend that you look at the commits for SPI_v1.c in the git repo to see if your version may need one of those fixes.

git.ti.com/.../SPI_v1.c

Hope this helps.

Regards,
Rahul

0 paul jacomb over 5 years ago in reply to Rahul Prabhu

Prodigy 60 points

Hi Rahul,

Thanks for showing an interest! Both the master and slave ends have been observed crashing in this way. We have changed the SPI clock rate from 3 MHz up to 16 MHz and this doesn't appear to affect the outcome.

We are using AM4377BZDNA80 processors on each end. Further investigation shows that the crash only seems to occur when we have both SPI0 and SPI1 active: SPI0 has one processor as master, and SPI1 has the other processor as master.

hope this helps.

regards

Paul Jacomb

0 Rahul Prabhu over 5 years ago in reply to paul jacomb

TI Guru 115880 points

Paul,

I am working with the developer to evaluate whether this behavior has previously been reported. While I didn't come across an issue with the SPI driver, it does appear that such an issue was reported with our UART driver recently:

e2e.ti.com/.../2814893

This makes me think that there may be a similar issue that we need to check for in other drivers. I will let you know if this is confirmed. In the meantime, can you look at the fix provided for the UART driver and see if that works for you with MCSPI? The fix required the driver to record the TCC being used with each instance.
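To illustrate what recording the TCC with each driver instance can look like, here is a minimal C sketch. It is not the UART fix and not PDK code: `spi_instance_t`, `spi_dma_bind`, `dma_dispatch`, `spi_rx_done`, and the 64-entry table are hypothetical names and assumptions made for illustration only. The point it shows is that the completion handler looks up the callback by the TCC value reported on completion, and refuses to call through an empty slot instead of dereferencing a null pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TCC 64u /* assumption: TCC values range from 0 to 63 */

typedef void (*dma_done_fn)(void *arg);

/* Hypothetical per-instance driver state, not the PDK's real structure. */
typedef struct {
    uint32_t rx_tcc;     /* TCC recorded when the DMA channel was set up */
    uint32_t done_count; /* number of completions seen by the callback */
} spi_instance_t;

typedef struct {
    dma_done_fn fn;  /* NULL means no instance owns this TCC */
    void       *arg; /* instance passed back to the callback */
} tcc_slot_t;

static tcc_slot_t tcc_table[NUM_TCC];

/* Record which TCC belongs to which instance. Fails if the TCC is out of
 * range or already owned, so two instances can never share a slot. */
int spi_dma_bind(spi_instance_t *inst, uint32_t tcc, dma_done_fn fn)
{
    if (inst == NULL || fn == NULL || tcc >= NUM_TCC ||
        tcc_table[tcc].fn != NULL) {
        return -1;
    }
    inst->rx_tcc = tcc;
    tcc_table[tcc].fn = fn;
    tcc_table[tcc].arg = inst;
    return 0;
}

/* Called by the completion handler with the TCC that finished.
 * An unknown or unbound TCC is reported, never called through. */
int dma_dispatch(uint32_t tcc)
{
    if (tcc >= NUM_TCC || tcc_table[tcc].fn == NULL) {
        return -1;
    }
    tcc_table[tcc].fn(tcc_table[tcc].arg);
    return 0;
}

/* Example callback: counts completions for the owning instance. */
void spi_rx_done(void *arg)
{
    spi_instance_t *inst = arg;
    assert(inst != NULL);
    inst->done_count++;
}
```

In this sketch each SPI instance would call `spi_dma_bind` once during setup, and a spurious completion shows up as a `-1` return from `dma_dispatch` that can be logged rather than as an exception.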

Regards,
Rahul

0 paul jacomb over 5 years ago in reply to Rahul Prabhu

Prodigy 60 points

Hi Rahul,

Thanks for your contribution. We are still experiencing the problem, which is still randomly occurring after between 0 and about 90 minutes. Further investigation has shown that the problem always occurs on the receive side rather than the transmitter.

We were using SPI0 and SPI1 simultaneously, with CPU-A being master on SPI0 and slave on SPI1, and vice versa for CPU-B. Whilst experimenting with both SPIs active, we saw these exceptions occurring on both ends. However, by restricting activity to one SPI only, and disabling the other, we find that the problem only occurs on the receiving end.

Can you please expand on what the TCC is?

Many thanks

Paul Jacomb

0 Rahul Prabhu over 5 years ago in reply to paul jacomb

TI Guru 115880 points

Paul,

TCC is the transfer completion code used in the EDMA: the value set for each transfer that identifies which completion interrupt is raised when the data transfer finishes.

Regards,
Rahul

0 paul jacomb over 5 years ago in reply to Rahul Prabhu

Prodigy 60 points

Rahul,

Thanks for your contribution to this problem.

We have resolved the issue. The problem was found to be a circular buffer (used for supplying messages to the SPI) which was not being handled correctly. This resulted in occasional messages of illegal length being generated, which caused the DMA to write to areas of memory that were vital to the normal operation of the system.
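As an illustration of this failure mode (not the actual code from this project), the following C sketch shows a message ring that validates every length before a message is handed on to a transfer. The names `msg_ring_t`, `ring_push`, and `ring_pop`, the one-byte length header, and the `RING_SIZE` and `MAX_MSG_LEN` values are all assumptions made for the example. It is also single-context: a real system would need to protect the indices against concurrent access from interrupts or other tasks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE   256u /* assumed ring capacity in bytes */
#define MAX_MSG_LEN 64u  /* assumed largest legal SPI message */

typedef struct {
    uint8_t  data[RING_SIZE];
    uint32_t head; /* next write index */
    uint32_t tail; /* next read index */
    uint32_t used; /* bytes currently stored */
} msg_ring_t;

static void ring_put_byte(msg_ring_t *r, uint8_t b)
{
    r->data[r->head] = b;
    r->head = (r->head + 1u) % RING_SIZE; /* wrap at the end of the array */
    r->used++;
}

static uint8_t ring_get_byte(msg_ring_t *r)
{
    uint8_t b = r->data[r->tail];
    r->tail = (r->tail + 1u) % RING_SIZE;
    r->used--;
    return b;
}

/* Queue one message as [length byte][payload]. A message that is empty,
 * too long, or does not fit is rejected rather than overwriting data. */
int ring_push(msg_ring_t *r, const uint8_t *msg, uint32_t len)
{
    assert(r != NULL && msg != NULL);
    if (len == 0u || len > MAX_MSG_LEN || len + 1u > RING_SIZE - r->used) {
        return -1;
    }
    ring_put_byte(r, (uint8_t)len);
    for (uint32_t i = 0u; i < len; i++) {
        ring_put_byte(r, msg[i]);
    }
    return 0;
}

/* Copy the next message into out. The length header is checked against
 * the legal maximum, the caller's buffer, and the bytes actually stored
 * before anything is copied, so a corrupt length never sizes a transfer. */
int ring_pop(msg_ring_t *r, uint8_t *out, uint32_t out_cap)
{
    assert(r != NULL && out != NULL);
    if (r->used == 0u) {
        return -1;
    }
    uint32_t len = r->data[r->tail]; /* peek at the header */
    if (len == 0u || len > MAX_MSG_LEN || len > out_cap ||
        len + 1u > r->used) {
        return -1;
    }
    (void)ring_get_byte(r); /* consume the header */
    for (uint32_t i = 0u; i < len; i++) {
        out[i] = ring_get_byte(r);
    }
    return (int)len;
}
```

The value returned by `ring_pop` is the only length that would be passed to the DMA setup in this sketch, so an illegal length is caught as a `-1` return instead of becoming an out-of-bounds write.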

Our board has been running now for nearly 18 hours without any fault.

Many thanks

Paul Jacomb
