Hello,
For our project we have implemented our own SPI driver (GSPI).The driver works as a slave and receives data from an external board using a clock speed of 20Mhz. The implementation uses the DMA in Ping-Pong mode, since we have other tasks running in the system and wanted to avoid unncessary overhead. The environment is FreeRTOS with the latest service pack and SDK.
The use-case that fails involves a transfer of 1440 bytes. The primary DMA channel is set up to transfer 1024 bytes and the alternate to transfer 416 bytes, then the peripheral is enabled. We ensure the master does not start sending before the setup is performed, the SPI peripheral is enabled, and the DMA is configured (the master starts sending only after the CC3220S sets a pin high). The initial data is present in the buffers, and the primary DMA channel always succeeds in transferring its data from the bus.
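For reference, the descriptor setup just described looks roughly like this with the driverlib uDMA API. This is only a sketch, not our actual driver code: GSPI_RX_DMA_CH and GSPI_RX_REG are placeholders for the real GSPI RX channel macro and RX FIFO register address, and the include paths assume the CC32xx SDK layout.

```c
/* Sketch only -- driverlib uDMA calls; GSPI_RX_DMA_CH and GSPI_RX_REG are
 * placeholders for the real GSPI RX channel macro and RX FIFO address. */
#include <ti/devices/cc32xx/driverlib/udma.h>
#include <ti/devices/cc32xx/driverlib/rom_map.h>

#define FRAME_LEN 1440u
#define PRI_LEN   1024u
#define ALT_LEN   (FRAME_LEN - PRI_LEN)   /* 416 bytes */

static unsigned char rxBuf[FRAME_LEN];

void setupPingPongRx(void)
{
    /* 8-bit items, no source increment (reading a FIFO register),
     * arbitration size matching the RX FIFO trigger level. */
    MAP_uDMAChannelControlSet(GSPI_RX_DMA_CH | UDMA_PRI_SELECT,
        UDMA_SIZE_8 | UDMA_SRC_INC_NONE | UDMA_DST_INC_8 | UDMA_ARB_8);
    MAP_uDMAChannelControlSet(GSPI_RX_DMA_CH | UDMA_ALT_SELECT,
        UDMA_SIZE_8 | UDMA_SRC_INC_NONE | UDMA_DST_INC_8 | UDMA_ARB_8);

    /* Primary covers the first 1024 bytes, alternate the remaining 416;
     * the PRI -> ALT switch is done by the hardware, no ISR needed. */
    MAP_uDMAChannelTransferSet(GSPI_RX_DMA_CH | UDMA_PRI_SELECT,
        UDMA_MODE_PINGPONG, (void *)GSPI_RX_REG, rxBuf, PRI_LEN);
    MAP_uDMAChannelTransferSet(GSPI_RX_DMA_CH | UDMA_ALT_SELECT,
        UDMA_MODE_PINGPONG, (void *)GSPI_RX_REG, rxBuf + PRI_LEN, ALT_LEN);

    MAP_uDMAChannelEnable(GSPI_RX_DMA_CH);
    /* ...then enable the SPI peripheral and raise the "ready" pin. */
}
```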
Sometimes, the alternate channel misses some bytes (1-400 bytes, varying from execution to execution).
If I enable the SPI overflow ISR, I can see the interrupt is triggered in the cases where the alternate channel fails to perform the transfer. The SPI FIFO contains data in these cases.
The TX FIFO is disabled, the TX DMA is not used, we are only interested in reading data from the bus for this use-case.
If, before starting the transfer, I suspend all FreeRTOS tasks and busy-wait for the DMA to finish, the transfer will succeed almost always (fails ~1 in 20000). If I don't suspend the tasks, it fails about 1 in 80 tries.
There are no other participants connected on the SPI bus, just the master and one slave (CC3220S). The DMA error ISR is not triggered. Checking the DMA registers 1 second after the transfer is started (the timeout we implemented for a transfer) or during the RX overflow ISR shows the following:
The current theory we have is that some other task is running, accessing the AHB and taking priority over the DMA (the manual states the DMA uses idle bus cycles and that the CPU always has priority). This leads to FIFO overflow and then loss of data. The dramatic improvement we get by suspending other tasks seems to support this. However, the other tasks we have running perform no significant AHB access (taking 4 ADC samples every second, very short UART prints). There is also the SL task, where it looks like some SPI polling transfers are performed that might keep the AHB busy.
In order to address this we have increased the DMA arbitration size to 8 words (a word is 1 byte here) and the SPI RX FIFO trigger level to 8 as well, so they match. We have tried setting the UDMA_ATTR_HIGH_PRIORITY property on the DMA channel and also UDMA_ATTR_USEBURST. These have shown no significant improvement. We have also tried lowering the speed to 10 MHz, also with no improvement.
Could you provide some insight on points we might have missed?
Is it actually feasible that the DMA loses arbitration to the CPU often enough for the SPI RX FIFO to overflow?
Hi Radu,
In my testing I have not experienced data loss due to suspected bus contention with 20 MHz SPI + DMA. This was despite me running a video streaming application where not only was a video encoder sending the CC3220 SPI data at 20 MHz, but the same RX data was also concurrently passed to the NWP for Wi-Fi TX over the internal 20 MHz SPI. However, it is possible that you could be seeing bus contention due to the scheduler or some other FreeRTOS interrupt preventing the DMA peripheral from performing the ping-pong buffer switch in time to prevent the FIFO from overflowing. My tests were done on TI-RTOS, with very little functionality other than video streaming.
Some thoughts I have on the scenario you describe:
Regards,
Michael
Hi Michael,
Thank you for the quick reply!
We actually also have HD video streaming working with the same driver, but in another use-case. In the video use-case we get the data from SPI (also 20 MHz) and then send it to the user over a socket, so it's also going to the NWP. That use-case works without issues. However, it does not run concurrently, as you have mentioned (more about this at the end).
However, it is possible that you could be seeing bus contention due to the scheduler or some other FreeRTOS interrupt preventing the DMA peripheral from performing the ping pong buffer switch in time to prevent the FIFO from overflowing.
I don't think this can be the case: when transferring 1440 bytes, we set up both the primary and alternate DMA structures before the transfer starts. No further buffer switch needs to be done by software, because the entire frame fits in the primary plus alternate; the software just waits for the DMA to process both structures and then the transfer is done. The switch from primary to alternate is performed automatically by the hardware. It is a good question whether having interrupts disabled (for some critical section in the code) would affect the DMA peripheral's internal switch from primary to alternate, but I don't expect that to be the case. If the transfer exceeded 2*1024 words, then it would become relevant whether the ISR is serviced in time to reconfigure the primary or alternate structure, whichever has completed transferring its chunk of data.
1. The code is completely custom; I can provide it privately.
2. I'm not sure I follow the point. The TX FIFO is disabled, so there are no ISRs, not even pending ones; the TX ISRs are disabled. The SPI peripheral just outputs the value found in the TX register repeatedly for as long as the clock is present. Normally that value is 0, but I changed it to a known value and I can see it on the bus.
3. Actually, there is a pattern; I'll attach some files so you can compare. I've rearranged the first failed packet to make it easier to see, and I'll also add some comments in those files. The pattern always shows one byte getting corrupted, some correct data, some missing data, and then some correct data until the end, where there's some corruption again. In the case of a corrupted transfer there will be some data from the previous transfer at the end. This is because the buffer is not cleared between successive reads of 1440 bytes, so you can ignore that part. Also, we normally abort after the first failure, but I disabled the abort so the pattern is more visible in the logs.
4. The CS is asserted for the entire transfer. I actually checked the integrity of the bus signals and they are ok. Considering the failure rate drops dramatically by suspending tasks on the CC3220S, and that the video streaming part works without issue, I think we can rule this out. I can still take the traces if you think it's necessary.
5. Not without considerable effort
6. When we started developing this project, we actually tried to use the stock driver, but it didn't work out for us. We hit some bugs (I've seen some fixes related to this in the SDK release notes in the meantime), but we also needed a flow-control mechanism tightly integrated in the driver (to initiate transfers from both slave and master). We were also worried about performance, so we went with the ping-pong approach.
The ping-pong approach worked well for a while because we were busy-waiting in a high-priority task for the SPI transfer to finish (almost the same as suspending all other tasks, since we were waiting in a high-priority task). When the busy-wait was removed, the issue I'm describing surfaced.
Actually, since I've started this thread, I've found out an event that is directly correlated to this failure:
The SPI driver will set up the two DMA transfers for the PRI and ALT DMA structures and then wait in a non-blocking manner using OS mechanisms (this happens in the SPI task context). We use an IoT connectivity solution that receives a keep-alive message every 10 seconds and then sends a reply (all this over a UDP socket, ~160 bytes). The IoT code runs in its own task and basically waits on a socket for keep-alive messages to arrive. Whenever a keep-alive message arrives, the SPI transfers immediately after will fail as described.
I suppose there is some data exchange over the LSPI in this case because of the IoT task's socket read/write. When you mentioned "concurrently", did you mean that both the NWP SPI (LSPI) and the user SPI (GSPI) were transferring data at the same time, or was there some sort of serialization?
If, instead of suspending all tasks, I just suspend the IoT task, the failure becomes very infrequent (same as suspending all tasks).
In our video streaming use-case (the one that's working), reading data over the GSPI and the IoT solution's network access are serialized by the architecture, so the IoT solution doesn't perform network access while the SPI is transferring data.
In case there really is a bus arbitration conflict, is there a way to pause all NWP interactions? There is a hardware spinlock driver, but it's not documented in the TRM and the implementation seems to be focused on the SPI used for the flash.
Another question would be whether the memory layout is relevant. The RAM is composed of 4 banks and it seems each bank is connected to the AHB separately. Did you experience any congestion issues that were solved by controlling for factors such as the RAM bank used for concurrent reads/writes?
Regards,
Radu
Hi Radu,
With regards to the hardware traces, I only wanted to make sure that you have performed that sanity check on your end. If you do not see any irregularities on the hardware SPI signals that correlate with the corrupt segments, then there is no need to provide those captures to me.
Also, the zip file that you have provided seems to be inaccessible. If you could reupload that so I can confirm the nature of the corrupt data pattern, that would be great.
One other thought that has come to mind: while you are using your custom driver for the GSPI, are you using it for the LSPI as well, or is the LSPI connecting the MCU to the NWP still using the stock SPI driver from the SDK?
As for the NWP behaviors you bring up at the end of your post, the test I did was effectively serialized. The data in from the video encoder would be passed to the NWP, and there wasn't anything else the NWP was doing other than send this video data over UDP, so it's somewhat different from your case, where the NWP could do some data transfer independent of the GSPI.
There is no built-in way to pause NWP interactions. You would need to modify the host driver yourself to block on some custom mutex you insert that would prohibit the host MCU from using the LSPI during your GSPI transfers.
I don't know if the RAM layout of the device is relevant to the issue, as I do not have the needed expertise with the device architecture. I suppose that may be something that you could test on your end, simply by creating a new memory region in your linker settings and assigning your GSPI buffers to that memory region.
Regards,
Michael
Hello,
We use the custom driver for GSPI only, the rest is stock.
I have reattached the archive.
Thanks,
Radu

(Attachment: data_fail.zip)
Hi Radu,
Thanks for resending, I'll take a look.
Another question: for the FIFO trigger and the DMA ARB size, have you tried setting it to 1 byte like how it is in the original SPI driver? Does that make the corruption more or less likely?
Regards,
Michael
Hello Michael,
I have tried. We actually had it like that when we found the issue. I changed it to 8 at some point along the way, and also tried to use the burst attribute of the DMA in the hope that it would clear potential arbitration issues, but it didn't prove to be a solution.
I would say 1 was slightly better than 8, but I don't have precise metrics to support this. At this point it seems that suspending the task that is reading from the socket works around the SPI issue, but this causes functional issues that won't be acceptable.
Thank you for your support on this!
(I'll try to send you the driver code in private)
Radu
Hi Radu,
Thanks for sending me your code. I'll review that and see if there is anything concerning. I'll let you know what questions I have for you once I finish reviewing that.
Regards,
Michael
Hi Radu,
I reviewed the code, and do not see anything that may be the cause of your issue.
As you mentioned, given that this issue occurs only rarely, it is most likely not some obvious misconfiguration. If it were, the code would fail much more often.
Something interesting is that the failure in your case happens at the DMA primary -> alternate switch. In my testing, the type of corruption you observe in your buffers, where only a few bytes total seem to be missing and the rest of the data is shifted, generally happened when the data transferred was greater than 2 DMA buffer sizes, i.e. the SPI ISR handler had to reset and queue additional DMA transfers. As such, in my case it was due to the ISR not loading subsequent transfers in time.
Another oddity is that you only ever see corruption on the alternate channel, and never on the primary. I wonder if the priority of the alternate channels is lower than that of the primary channels? If that were the case, then while the primary channel is running on the GSPI it would have priority over the LSPI, but not when the alternate channel is being used. This may be testable if you were to run a software DMA transfer between two RAM locations instead of using Wi-Fi. If you were to have this software DMA transfer running continuously and the same type of corruption was seen, it would be a strong indication of that prioritization issue. A further test would be to have the software transfer use DMA ping-pong as well, and see if the issue only occurred when the alternate of the SPI DMA and the primary of the software channel were active.
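A rough sketch of that software-channel load generator with driverlib calls, in case it helps. SW_DMA_CH is a placeholder for whichever free software-request channel you pick, and the include paths assume the CC32xx SDK layout:

```c
/* Sketch: a software uDMA channel copying RAM to RAM in AUTO mode to
 * generate continuous bus load. SW_DMA_CH is a placeholder for a free
 * software-request channel on your device. */
#include <ti/devices/cc32xx/driverlib/udma.h>
#include <ti/devices/cc32xx/driverlib/rom_map.h>

static unsigned long srcBlk[256], dstBlk[256];

void kickSwDmaLoad(void)
{
    /* 32-bit items, incrementing both addresses, arbitrate every 8 items */
    MAP_uDMAChannelControlSet(SW_DMA_CH | UDMA_PRI_SELECT,
        UDMA_SIZE_32 | UDMA_SRC_INC_32 | UDMA_DST_INC_32 | UDMA_ARB_8);
    MAP_uDMAChannelTransferSet(SW_DMA_CH | UDMA_PRI_SELECT,
        UDMA_MODE_AUTO, srcBlk, dstBlk, 256);
    MAP_uDMAChannelEnable(SW_DMA_CH);
    MAP_uDMAChannelRequest(SW_DMA_CH);
    /* Re-arm from a low-priority task loop (or the uDMA done ISR)
     * to keep the load continuous. */
}
```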
On another note, assuming that the DMA ping-pong issues cannot be resolved, would workarounds like implementing a checksum per packet work?
Finally, is it possible to make the CC3220 the SPI master? In master mode the CC3220 will have control over the clocking of the interface and so should stop clocking the interface if the FIFO is full, avoiding this issue.
Regards,
Michael
Hello Michael,
Thank you for your support on this. In the meantime, we have added trace points in the code to determine what exactly the task listening on the socket is doing when the SPI transfer fails.
As mentioned, the task listening on the socket is running an IoT solution and there's a keep-alive arriving every 10 seconds. This keep-alive message also contains a timestamp, which we use to update the RTC time, but not on every message: only once every 10 hours or if the RTC's deviation is too large.
The result is that it's not the LSPI transfer that's causing the issue. We assumed the LSPI was causing it because two SPIs running at 20 MHz seemed like a lot of load, and the nature of the uDMA (using idle bus cycles) would make it vulnerable to arbitration issues. It turns out that, after the socket unblocks, the socket task is calling:
unsigned long ulSecs;
unsigned short usMsec;
MAP_PRCMRTCGet(&ulSecs, &usMsec);
Every time this call is made, the transfer gets corrupted. We ended up using a mutex to make sure this is never called while a GSPI transfer is in progress and, while long-term stress testing is still running, after ~24h of testing it seems the issue is gone. Just commenting out this function call makes the transfers no longer fail, but we used the mutex because we need the functionality.
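The guard we ended up with is essentially the following. It's sketched with a pthread mutex so it can run anywhere; on the target it's a FreeRTOS mutex, and rtc_get() is a stand-in for MAP_PRCMRTCGet():

```c
#include <pthread.h>
#include <assert.h>
#include <stdbool.h>

/* hib_lock serializes GSPI transfers against RTC reads.
 * rtc_get() is a stand-in for MAP_PRCMRTCGet(); the assert documents
 * the invariant we need: no RTC read while a GSPI transfer runs. */
static pthread_mutex_t hib_lock = PTHREAD_MUTEX_INITIALIZER;
static bool spi_active = false;

static void rtc_get(unsigned long *secs, unsigned short *msec)
{
    assert(!spi_active);       /* must never run during a GSPI transfer */
    *secs = 123; *msec = 456;  /* placeholder values */
}

void spi_transfer(void)
{
    pthread_mutex_lock(&hib_lock);  /* block RTC reads for the duration */
    spi_active = true;
    /* ... run the GSPI ping-pong transfer here ... */
    spi_active = false;
    pthread_mutex_unlock(&hib_lock);
}

void timestamp_update(unsigned long *secs, unsigned short *msec)
{
    pthread_mutex_lock(&hib_lock);  /* waits until no transfer is active */
    rtc_get(secs, msec);
    pthread_mutex_unlock(&hib_lock);
}
```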
I couldn't find anything in the spec to justify this (maybe access to the hibernate controller registers is slower?), but so far it seems to fix the issue.
So, the current wild theory is that access to the HIB controller registers is somehow breaking the GSPI. We'll run tests and I'll come back with the results. Do you know if there is some limitation we should know about when accessing these non-volatile registers?
For example, it seems PRCMHIBRegRead has a built-in 200 µs busy wait; I wonder what the reason for it is (can it fail to fulfill its goal in multi-threaded environments?).
The hibernate controller documentation focuses mostly on API presentation; maybe you have access to more details about this and can draw a better conclusion.
EDIT: Performing a CRC check and retry was our go-to solution unless the true cause was found. For now, it seems we don't need to do it if this mutex solution works. As for making the CC3220S the SPI master, this is excluded because the other chip on the board has a broader set of issues running as an SPI slave.
Regards,
Radu
Hi Radu,
Thank you for your debugging effort on this issue, and for your work in finding that PRCMRTCGet() interaction with the GSPI.
I unfortunately do not have any information regarding the hibernate controller or the PRCM in general beyond what is in the TRM. Thus, I cannot say why the read to the RTC register seems to interact with the GSPI.
If controlling the PRCMRTCGet() calls works for your system, then I suggest you keep using that for the time being.
Let me know if you re-encounter the GSPI issue, or if you had more questions on this topic.
Regards,
Michael
Hello,
For the time being, the mentioned mutex seems to solve our specific issue. We still have some strange issues that we experience very rarely and basically couldn't reproduce in a systematic way in order to analyze, so I'll come back with some questions when we do, if needed.
However, I have found this post. It's related to the CC3200 SimpleLink Wi-Fi SDK v1.0.0 Release Notes, where it is stated:
"Hardware Registers (32 bit wide) located in MCU memory map from addresses 0x4402 F800 to 0x4402 FC94 , by virtue of their connection on the device bus matrix, require a much longer access cycle compared to all other registers in the MCU memory map. Uncontrolled back to back accesses mapping the above mentioned registers could interfere with the data path efficiency / stability of the device".
The memory range seems to be the same for CC3220S and serves a similar function as it did on CC3200.
The way I understand it, no two consecutive accesses should be made to that area closer together than the 200 µs implemented as a busy wait in the register read/write functions (PRCMHIBRegRead/PRCMHIBRegWrite). The busy wait does not guarantee that minimum spacing in a multi-threaded environment (and we actually perform such accesses from multiple tasks).
What is the correct interpretation of that text? Can you share your thoughts on this?
Would you suggest we protect that access with a critical section ?
Many thanks,
Radu
Hi Radu,
Yes, if you are attempting to access those hibernate registers, then you should ensure that the access-time restriction is met. The 200 µs in the driverlib file should be sufficient.
You are correct in your assessment that in an RTOS environment it is possible for back-to-back accesses to occur despite the built-in busy wait. Thus, you should use a mutex or similar RTOS object to ensure that does not happen.
Regards,
Michael