Hi,
I kept thinking about this issue and I'm not sure it's a mem-to-mem copy problem. I have a feeling that some event occurs at that moment, that it just happens to fall during the loading (copying) of the PRU code, and that the combination of the two may cause the DSP to hang.
Coming back to the toggling of the port pin: since I started using this method, I've seen that after approx 128 toggles there's a period of approx 1.5 us where the toggling stops; it then restarts and continues until the end of the copy of the PRU code table. It is during this "stall" period that the DSP may hang. I thought this might be a cache issue, so before the loop I invalidated all of L1P and the PRU code table in L2 (both calls wait until invalidation is complete). These calls add approx 8.5 us of processing time before the loop starts. I then observe that the "stall" is not shifted in time by +8.5 us and no longer falls at about 128 iterations; instead it occurs approx 39 us after the call to the first of the two cache invalidate functions. In other words, I don't think the "stall" is a function of the number of loop iterations but a function of time since DSP reset.
To continue on this path, I added "nops" in the loop so that every iteration would take longer. I observe that the "stall" still occurs at about the same instant after reset and is not coupled to the number of iterations.
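For anyone wanting to repeat this experiment, the instrumented copy loop could look roughly like the sketch below. The original code was not posted, so every name here (`copy_with_toggle`, `gpio_out`, `pin_mask`, `pad`) is hypothetical, and the GPIO output register is passed in as a pointer so the same function can also be run on a host for checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy `words` 32-bit words from src to dst, toggling one GPIO pin per
 * iteration so the loop can be watched on a scope.
 *
 * gpio_out : pointer to the GPIO output data register (hypothetical; on
 *            target this would be the device register address, on a host
 *            it can be an ordinary variable).
 * pin_mask : bit of the pin to toggle.
 * pad      : number of empty delay iterations per copy step, standing in
 *            for the "nops" used to stretch each iteration.
 */
void copy_with_toggle(volatile uint32_t *dst, const uint32_t *src,
                      size_t words, volatile uint32_t *gpio_out,
                      uint32_t pin_mask, unsigned pad)
{
    size_t i;
    volatile unsigned spin;   /* volatile so the delay loop is not optimized away */

    assert(dst != NULL && src != NULL && gpio_out != NULL);

    for (i = 0; i < words; i++) {
        dst[i] = src[i];          /* one word of the PRU code table */
        *gpio_out ^= pin_mask;    /* one scope edge per iteration */
        for (spin = 0; spin < pad; spin++) {
            /* padding only */
        }
    }
}
```

If the gap in the toggling stays at the same time after reset when `pad` is changed, it is tied to time and not to the iteration count, which is what was observed here.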
I went on to do more tests, each with a different compilation of my DSP application. This has the effect of changing the moment after reset at which the application calls the PRU load function. If the load function is called after the "stall", there are no problems.
I don't have a precise measurement but this "stall" (I don't know how else to qualify it) seems to occur about 400 to 430 ms after the start of my DSP application. My app does not use the watchdog.
Note also that I have my own bootloader that reads a serial SPI flash containing the DSP app in AIS format, loads it and finally jumps to it. This custom bootloader exists because it also allows the reprogramming of the SPI flash. The bootloader is loaded in Shared RAM (L3), L1P and L1D memories remain in full cache configuration. The bootloader does not use the watchdog but uses one interrupt for SPI Rx. Interrupts are disabled before jumping into the DSP application.
Could there be something left running or not correctly reset/configured in the bootloader that can cause this problem? Any other ideas?
Thanks,
SC
This is going to be a difficult problem to diagnose over email :).
A few suggestions/questions:
1) Do you think the issue would be reproducible if you were to replace the PRU code load with just a simple DSP mem-to-mem copy, to see if the problem still exists?
2) When you say you are not able to reproduce this with the emulator, what do you mean by that? With the emulator connected, the problem does not exist?
3) Once the toggling stops, have you tried to connect to the processor via the emulator to see where the DSP program counter is and/or whether it is truly in the weeds? Does it give you an error connecting over JTAG? If there is an error, what is it?
4) The difference between L2, Shared RAM and SDRAM could simply reflect a difference in behavior due to access latency (L2 being the shortest and SDRAM the longest) or to caching. Can you see if decreasing the speed of the processor has any effect on the failure?
5) Can you try disabling caching for Shared RAM and EMIFB? This can be done by controlling the MAR bits for those memory regions.
6) Are you absolutely sure the interrupts/exceptions are all disabled?
7) What is the DSP doing at the time of the failure? Are there events/data transfers happening in the vicinity of the hang?
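As a rough illustration of suggestion 5, here is a minimal sketch of clearing the cacheability bit in the MAR registers. It assumes the usual C674x layout (MAR0 at 0x01848000, one 32-bit MAR per 16 MB of address space, bit 0 = PC, "permit caching"), and the example addresses (Shared RAM at 0x80000000, EMIFB SDRAM at 0xC0000000) are assumptions as well; please check all of them against the megamodule guide and the datasheet for your device. The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAR_BASE_ADDR     0x01848000u  /* assumed address of MAR0 */
#define MAR_REGION_SHIFT  24u          /* each MAR covers 16 MB   */
#define MAR_PC            0x1u         /* "permit caching" bit    */

/* Index of the MAR register that covers a given byte address. */
unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_REGION_SHIFT);
}

/*
 * Clear the PC bit for every 16 MB region touched by [start, start+size).
 * `mar` points at MAR0; it is a parameter so the logic can be checked on
 * a host with a plain array. The range must not wrap past 0xFFFFFFFF.
 */
void mar_disable_caching(volatile uint32_t *mar, uint32_t start,
                         uint32_t size_bytes)
{
    unsigned first, last, i;

    assert(mar != NULL && size_bytes > 0u);

    first = mar_index(start);
    last  = mar_index(start + (size_bytes - 1u));
    for (i = first; i <= last; i++) {
        mar[i] &= ~MAR_PC;
    }
}
```

On the target this would be called as, for example, `mar_disable_caching((volatile uint32_t *)MAR_BASE_ADDR, 0xC0000000u, 0x02000000u);` for a 32 MB SDRAM. It is probably safest to write back and invalidate the caches for the region before changing its MAR bit.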
Regards
Mukul
In addition,
Cheers,
Gagan
Hi,
Thank you both for your suggestions. I tried replying here in the two weeks following your replies but my message could not be posted (nothing happened when I clicked "post").
I finally tracked down the problem. There was a bug in the bootloader where interrupts were not always disabled. The main application code was then booted and all required peripherals were configured with interrupts still enabled... with the effect of unexpected behaviour!
I fixed the bootloader and added interrupt disabling at the beginning of the application code.
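For reference, the kind of guard described above might look like the sketch below; this is not the actual code from the project. It assumes the TI C6000 compiler (`_TMS320C6X` predefined, `c6x.h` providing `CSR`, `IER`, `ICR` and the `_disable_interrupts()` intrinsic). The function name `app_quiesce_interrupts` and the host-side stand-in registers are invented for illustration so the logic can be checked off-target.

```c
#include <assert.h>

#define CSR_GIE        0x0001u   /* global interrupt enable bit in CSR */
#define MASKABLE_INTS  0xFFF0u   /* INT4..INT15 in IER/IFR/ICR         */

#ifdef _TMS320C6X
#include <c6x.h>                 /* control registers and intrinsics */
static void global_int_disable(void) { (void)_disable_interrupts(); }
static void clear_pending(unsigned int mask) { ICR = mask; }
#else
/* Host-side stand-ins: state a careless bootloader might leave behind. */
static unsigned int CSR = 0x0001u;   /* GIE still set           */
static unsigned int IER = 0xFFF2u;   /* everything enabled      */
static unsigned int IFR = 0x0030u;   /* two interrupts pending  */
static void global_int_disable(void) { CSR &= ~CSR_GIE; }
static void clear_pending(unsigned int mask) { IFR &= ~mask; }
#endif

/*
 * Call first thing in the application (and just before the bootloader
 * jumps to it): mask everything globally, disable each maskable
 * interrupt, and drop anything already pending.
 */
void app_quiesce_interrupts(void)
{
    global_int_disable();            /* clear GIE                      */
    IER &= ~MASKABLE_INTS;           /* disable INT4..INT15, keep NMIE */
    clear_pending(MASKABLE_INTS);    /* discard latched interrupts     */
    assert((CSR & CSR_GIE) == 0u);   /* sanity check                   */
}
```

Doing this on both sides of the jump means the application does not depend on the bootloader having cleaned up correctly.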
Best regards,
SC