CC2538: CPU heavy load effects on SPI speed

Part Number: CC2538

Hi team,


My customer would like to confirm whether the SPI speed can be affected when the CPU is heavily loaded,
for example the SPI CLK slowing down or the data arriving late relative to the CLK.

It seems like they found this phenomenon on their system.

 

best regards,

Kenley

  • Hi Kenley,

    Here is our previous E2E thread.  Is this the same project, and if so, is the CC2538 still operating as the SPI slave?  I would not expect the SPI CLK speed to be affected, since it should be driven by the camera master, but SPI throughput can certainly be reduced when the CPU is so loaded that it cannot refill the SSI FIFO with more data, which the customer may perceive as "late".  Can they provide logic analyzer or oscilloscope screenshots of the SPI CLK slowing down or of the out-of-sync data?  They could consider implementing critical sections around their SPI operations, but this would affect the processing times of the other tasks creating the heavy load.

    Regards,
    Ryan

  • Hi Ryan

    Thank you for your support.

    Yes, it is the same project and the CC2538 is still operating as the SPI slave.

    They are loading data into the TX FIFO by DMA. The communication frequency is 3 MHz.

    ① Sometimes the transmission is out of sync by one byte. What are the possible causes, and what countermeasures can they apply?

    ② If the CC2538 does not keep up with the master clock, is it possible for the RX FIFO contents to become misaligned by one bit?
    If so, is there a way to avoid this without lowering the master clock?

     

    Best regards,

    Kenley

  • Can the customer provide a logic analyzer or oscilloscope screenshot of the issue, or at least further describe what is received versus what is expected? Is this a byte or a single bit out of sync, and are subsequent bytes affected? Is only RX affected and not TX?  What is the failure rate, and can it be reliably replicated?  Is there any correlation between when the failure occurs and what else is running in the application/core at that time?

    The customer could try wrapping their SPI operations inside of critical sections, raising the priority of the SPI task, or reducing the overall loading.  If these are not suitable then the customer may need to consider mitigation by catching the issue and alerting the SPI controller through status or CRC bytes to re-transmit.

    Regards,
    Ryan

  • Hi Ryan

    Thank you for your support.

    Let me talk with the customer and ask them to provide data that we can analyze.

    At the same time, the customer wants to confirm the SSI clock settings for slave mode.

    Previously you said that, "For slave mode, the system clock or the PIOSC must be at least six times faster than the SSIClk."

    But the CC2538 Peripheral Driver Library User's Guide (Rev. A)

    states that the clock should be 12 times faster for slave mode:

    The ui32BitRate parameter defines the bit rate for the SSI. This bit rate must satisfy the following clock ratio criteria:
    FSSI >= 2 ∗ bit rate (master mode)
    FSSI >= 12 ∗ bit rate (slave modes)
    where FSSI is the frequency of the clock supplied to the SSI module.

    A: To run up to the 5.33 MHz limit given in swru319c.pdf, does the customer need to avoid SSIConfigSetExpClk and make additional manual register adjustments without the driver?
    The customer understands that the CC2538 as an SPI slave is theoretically capable of 5.33 MHz.
    They would like to communicate at 3 MHz this time, so if additional settings are needed for that, please let us know.

    B: Why are the limits different between the two manuals? The two documents appear to contradict each other, so the customer would like to know the reason for, and the intent behind, each description.

    Best regards,

    Kenley

  • Hi Kenley,

    This difference between the TRM and the Peripheral Driver Library User's Guide could possibly be the difference between the maximum tested at the physical device peripheral register level and the overhead of the software library implementation.  This is not the first time I recall a software implementation having a slightly lower maximum speed than what is defined in the datasheet or TRM.  The CC2538 is a legacy device, so it is more difficult to track and find existing bug reports or tickets.

    32 MHz / 12 is 2.667 MHz, which is fairly close to the 3 MHz they are currently using.  Would the behavior improve or disappear if they wrapped SPI receive operations in a critical section, possibly based on when the CS pin is driven active?  This could, however, introduce errors for blocked tasks.
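
    To make the arithmetic concrete, here is a minimal sketch that computes the highest slave-mode bit rate each rule allows and checks a requested rate against it. It assumes a 32 MHz SSI module clock; the function names are hypothetical and not part of the driver library.

```c
#include <stdint.h>

/* Highest bit rate (Hz) allowed for a given SSI module clock and a
 * required clock-to-bit-rate ratio (6 per the earlier quote, 12 per the
 * Peripheral Driver Library User's Guide). Integer division rounds down. */
static uint32_t ssi_max_slave_bitrate(uint32_t ssi_clk_hz, uint32_t ratio)
{
    return ssi_clk_hz / ratio;
}

/* Returns 1 if FSSI >= ratio * bit rate holds, otherwise 0. */
static int ssi_bitrate_ok(uint32_t ssi_clk_hz, uint32_t ratio,
                          uint32_t bitrate_hz)
{
    return (uint64_t)ratio * bitrate_hz <= ssi_clk_hz;
}
```

    Under that assumption, 3 MHz and 2.857 MHz both satisfy the 6x rule (limit about 5.33 MHz) but not the 12x rule (limit about 2.667 MHz).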

    Regards,
    Ryan

  • Hi Ryan,

    The failure rate is 2-3 out of 10,000 transfers.

    In addition to SPI communication (with DMA), radio communication (with DMA), GPIO, timer interrupts, and the associated application activity run periodically.
    Although periodic, these run asynchronously to SPI, so I think they may coincide with an SPI transfer.

    It seems this happens not only on RX but on TX too. For TX, it happens as follows.

    The slave side exchanges the following three frames with the master
    1. Master -> Slave frame
    2. Slave -> Master frame
    3. Slave -> Master frame

    The clock count increases by 1 bit due to noise or false detection when receiving the frame in 1.
    When frame 2 is transmitted, the first byte is empty data (0x00).
    When frame 3 is transmitted, the first byte is the last byte that should have been sent in frame 2.
    Everything is sent offset by one byte.

    Thank you in advance.

    Best regards,

    Kenley

  • Hi Ryan,

    The customer also has additional questions based on your previous feedback.

    ① Is there a problem with the current settings for operating up to the 5.33 MHz upper limit?

    • Data length 8 bits, SPI Mode 3 (SPO = 1, SPH = 1)
    • Communication frequency: 3 MHz
    • The master pauses every 16 bits
    • RAM for DMA transfer is 256 bytes
    • DMA transfer mode is Basic

    ② What exactly should they do if there is a problem?

    ③ This time, they would like to communicate at 3 MHz (2.857 MHz to be precise). How exactly should they configure this?

    ④ Please tell the customer what limits to expect when operating at 32 MHz/12 (for example, the number of interrupts that can otherwise be serviced).

    ⑤ What is the main cause of the software library load that turns 32 MHz/6 into 32 MHz/12? I think the library's processing load is equivalent to setting registers.

    ⑥ Under what conditions is the "BSY" flag set? It seems that BSY is raised after the SSI and DMA are set up, even if DR has not been written, DMA transfer is enabled, and no data is actually sent or received. The customer would like to use BSY to check whether any data is still in the shift register.


    Thank you in advance.

    Best regards,

    Kenley

  • Given the low failure rate and the nature of the behavior (one entire byte missed), I propose that the issue could involve refilling the DMA buffer.  The DMA buffer is refilled by the application, and if a higher-priority task is being processed, this could delay the refill past the next byte transfer request from the SPI controller, causing a one-byte delay.  This could be addressed by using ping-pong DMA mode instead, which supports a continuous data flow to or from a peripheral.  You can find more in Section 10.3.6.4 of the TRM.
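
    As a rough illustration of why ping-pong mode helps, here is a host-side model of the idea, not driver library code. Two buffer halves alternate: while the consumer (standing in for the DMA feeding the SSI) drains one half, software can re-arm the other, so an underrun only occurs if the next half has not been re-armed by the time the active one runs out. All names are hypothetical. On the device the µDMA primary and alternate control structures play the role of the two halves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *buf[2]; /* data for each half */
    size_t len[2];         /* length of each half */
    int ready[2];          /* 1 if the half is armed with data */
    int active;            /* half currently being drained */
    size_t pos;            /* read position within the active half */
} pingpong_model;

static void pp_init(pingpong_model *pp) { memset(pp, 0, sizeof *pp); }

/* Software side: arm one half. Fails if that half is still in use. */
static int pp_submit(pingpong_model *pp, int half,
                     const uint8_t *data, size_t len)
{
    if (half < 0 || half > 1 || pp->ready[half] || len == 0) return 0;
    pp->buf[half] = data;
    pp->len[half] = len;
    pp->ready[half] = 1;
    return 1;
}

/* Consumer side: fetch the next byte. Returns 0 on underrun. */
static int pp_next_byte(pingpong_model *pp, uint8_t *out)
{
    int h = pp->active;
    if (!pp->ready[h]) return 0;      /* nothing armed: underrun */
    *out = pp->buf[h][pp->pos++];
    if (pp->pos == pp->len[h]) {      /* half finished: switch over */
        pp->ready[h] = 0;
        pp->pos = 0;
        pp->active = 1 - h;
    }
    return 1;
}
```

    In this model the software has the whole duration of one half to re-arm the other, whereas a single Basic-mode transfer must be re-armed in the gap after it completes.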

    I believe that the customer has full knowledge of the SPI registers based on our previous E2E thread.  BSY is set if the transmit FIFO is not empty.

    Regards,
    Ryan

  • Hi Ryan

    Here is the customer's feedback.

    Do you mean the following?

    1. DMA transfer in progress: the DMA stops as soon as it has sent 256 bytes.

    2. DMA stopped: the remaining bytes are being drawn into the shift register (the FIFO is not empty yet, but refilling has stopped).

    3. The CPU has to restart the DMA for the next transfer before the FIFO is exhausted; otherwise the SSI sends 0 when the FIFO is empty, which shifts the data by one byte.

    The customer also wants to confirm the following.

    1. The data to be transferred by DMA has already been prepared in RAM, and the uDMAChannelTransferSet() function is called with ui32TransferSize=256 (or 16 or 8, depending on the frame being transferred).

    Their understanding is that this transfer is not affected by high-priority software processing (there are no tasks as such, because there is no OS).

    By contrast, if they sent 256 bytes of frame data by calling uDMAChannelTransferSet() with ui32TransferSize=8 repeatedly in a for loop, they understand that the transfer could be delayed by high-priority processing.

    If any of this understanding is wrong, please correct us.

    2. Regarding BSY, the customer wants to confirm the following.
    Their assumption is that if only a single byte is written to the TX FIFO, it is transferred to the shift register immediately because SSIFss is active, so TFE in SSI_SR will never read "0: Transmit FIFO is not empty."

    Is the customer's understanding correct?

    Best regards,

    Kenley

  • I think the customer has understood what I've attempted to explain about the DMA operation.  Once again, using ping-pong mode instead of basic would help mitigate the risk.

    If I understand the SSI Module Block Diagram correctly, the TX FIFO is in between the SSIDR and transmit logic, and data is stored in the FIFO until transmitted.  So with a single byte the FIFO would not be empty (TFE: 0) until the byte was transmitted.

    19.4.2.1 Transmit FIFO

    The common TX FIFO is a 16-bit-wide, 8-location-deep, first-in first-out memory buffer. The CPU writes data to the FIFO by writing the SSI Data (SSI_DR) register (see SSI_DR), and data is stored in the FIFO until it is read out by the transmission logic. When configured as a master or a slave, parallel data is written into the TX FIFO before serial conversion and transmission to the attached slave or master, respectively, through the SSITx pin. In slave mode, the SSI transmits data each time the master initiates a transaction. If the TX FIFO is empty and the master initiates, the slave transmits the eighth most-recent value in the transmit FIFO. If less than eight values are written to the TX FIFO since the SSI module clock was enabled using the SSI bit in the SYS_CTRL_RCGCSSI register, then 0 is transmitted. Take care to ensure that valid data is in the FIFO as needed. The SSI can be configured to generate an interrupt or a µDMA request when the FIFO is empty.
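
    To illustrate the underrun behavior quoted above, here is a small host-side model. It is one possible reading of the TRM text, not a description of the actual silicon: it assumes the FIFO is an 8-slot circular store that is zeroed when the module clock is enabled, and that on underrun the read pointer stays put and the stale slot content is shifted out. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define TX_FIFO_DEPTH 8u

typedef struct {
    uint16_t slot[TX_FIFO_DEPTH]; /* storage, zeroed at "clock enable" */
    unsigned wr;                  /* next slot to write */
    unsigned rd;                  /* next slot to read */
    unsigned count;               /* number of valid entries */
} tx_fifo_model;

static void tx_fifo_reset(tx_fifo_model *f) { memset(f, 0, sizeof *f); }

/* Models a write to SSI_DR. Returns 0 if the FIFO is already full. */
static int tx_fifo_push(tx_fifo_model *f, uint16_t v)
{
    if (f->count == TX_FIFO_DEPTH) return 0;
    f->slot[f->wr] = v;
    f->wr = (f->wr + 1u) % TX_FIFO_DEPTH;
    f->count++;
    return 1;
}

/* Models the master clocking out one word from the slave. */
static uint16_t tx_fifo_shift_out(tx_fifo_model *f)
{
    uint16_t v = f->slot[f->rd];  /* stale content if the FIFO is empty */
    if (f->count > 0) {           /* assumed: pointer only moves on valid data */
        f->rd = (f->rd + 1u) % TX_FIFO_DEPTH;
        f->count--;
    }
    return v;
}
```

    In this model, a word clocked out before any data has been written is 0, and a word clocked out after eight values have been written and drained repeats the value written eight writes earlier, which matches the two cases the TRM describes.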

    Regards,
    Ryan

  • Hi Ryan

    The customer said that they do not check the state of the SSI transmit/receive FIFO, but there is enough room for each frame to be processed, so they believe there should be no problem with the DMA refill.

    The flow of frame transmission and reception is repeated as follows.

    1. All data to be sent is set to RAM, and all areas of RAM for receiving are cleared to 0
    2. Call the below two functions in the order of transmit and receive, and start the DMA transfer (because the DMA was in StopMode at the last transmit and receive). 

      uDMAChannelTransferSet

      uDMAChannelEnable

    3. After the DMA transfer starts, the TX FIFO should be full, so the following case that you mentioned should not occur:
      "If less than eight values are written to the TX FIFO since the SSI module clock was enabled using the SSI bit in the SYS_CTRL_RCGCSSI register, then 0 is transmitted"
    4. Allow CLK transmission on the master side (by setting the ready signal high)
    5. Wait for both the DMA transmit-complete and receive-complete interrupts
    6. Disable CLK transmission on the master side (by setting the ready signal low)

    Best regards,

    Kenley

  • Hi Ryan

    Let me share their configuration.

    Please let us know if there is any order need to be modified.

    Initialization:

    Order | Function | Arguments
    1 | SysCtrlPeripheralDisable | SYS_CTRL_PERIPH_SSI0
    2 | SysCtrlPeripheralReset | SYS_CTRL_PERIPH_SSI0
    3 | SysCtrlPeripheralEnable | SYS_CTRL_PERIPH_SSI0
    4 | GPIOPinTypeSSI | base address of GPIO, ( SCK pin | MOSI pin | MISO pin )
    5 | IOCPadConfigSet | base address of GPIO, ( SCK pin | MOSI pin ), IOC_OVERRIDE_PDE
    6 | IOCPinConfigPeriphInput | base address of GPIO, SCK pin, IOC_CLK_SSIIN_SSI0
    7 | IOCPinConfigPeriphOutput | base address of GPIO, MISO pin, IOC_MUX_OUT_SEL_SSI0_TXD
    8 | IOCPinConfigPeriphInput | base address of GPIO, MOSI pin, IOC_SSIRXD_SSI0
    9 | SSIDisable | 0x40008000(SSI_CR0)
    10 | SSIClockSourceSet | 0x40008000(SSI_CR0), SSI_CLOCK_PIOSC
    11 | SSIConfigSetExpClk | 0x40008000(SSI_CR0), SysCtrlIOClockGet(), SSI_FRF_MOTO_MODE_3, SSI_MODE_SLAVE, SysCtrlClockGet()/2, 8bit
    12 | SSIIntRegister | 0x40008000(SSI_CR0), pointer to the function
    13 | SSIDMAEnable | 0x40008000(SSI_CR0), ( SSI_DMA_RX | SSI_DMA_TX )
    14 | SSIEnable | 0x40008000(SSI_CR0)
    15 | uDMAChannelAssign | UDMA_CH11_SSI0TX
    16 | uDMAChannelAttributeDisable | UDMA_CH11_SSI0TX, UDMA_ATTR_ALL
    17 | uDMAChannelControlSet | ( UDMA_CH11_SSI0TX | UDMA_PRI_SELECT ), ( UDMA_SIZE_8 | UDMA_SRC_INC_8 | UDMA_DST_INC_NONE | UDMA_ARB_4 )
    18 | uDMAChannelAssign | UDMA_CH10_SSI0RX
    19 | uDMAChannelAttributeDisable | UDMA_CH10_SSI0RX, UDMA_ATTR_ALL
    20 | uDMAChannelControlSet | ( UDMA_CH10_SSI0RX | UDMA_PRI_SELECT ), ( UDMA_SIZE_8 | UDMA_SRC_INC_8 | UDMA_DST_INC_NONE | UDMA_ARB_4 )

    By setting a GPIO before and after each frame, the slave tells the master whether CLK transmission is permitted, and so controls the timing of CLK from the master.

    Here is the sequence the application follows for transmit/receive.

    Order | Function | Arguments
    1 | uDMAChannelIsEnabled | UDMA_CH11_SSI0TX
    2 | uDMAChannelTransferSet | ( UDMA_CH11_SSI0TX | UDMA_PRI_SELECT ), UDMA_MODE_BASIC, source address of TxDataRAM , 0x40008008(SSI_DR), 16 or 256 or 8
    3 | uDMAChannelEnable | UDMA_CH11_SSI0TX
    4 | uDMAChannelIsEnabled | UDMA_CH10_SSI0RX
    5 | uDMAChannelTransferSet | ( UDMA_CH10_SSI0RX | UDMA_PRI_SELECT ), UDMA_MODE_BASIC, 0x40008008(SSI_DR), source address of RxDataRAM, 16 or 256 or 8
    6 | uDMAChannelEnable | UDMA_CH10_SSI0RX

    No pin is assigned to SSIFss on the CC2538; it is left as configured at system startup.
    In this state, transmission/reception starts when the SSI is enabled, SSI_SR shows BSY, and data transfer starts when CLK is received, so SSIFss is treated as always active (low). The customer does not know whether the register ever actually transitions from idle, because they found no description of the case where SSIFss is not assigned.

    Here is the mode they are using:

  • Is the sequence always MOSI 16 bytes, MOSI 256 bytes (once or multiple), MISO 8 bytes?  And where in this sequence is the failed RX FIFO bit likely to occur?  I assume it could be in the middle of the MOSI 256 bytes, or is it always at the very beginning or very end of one of the processes?  The same question for the TX FIFO.  Do they have logic analyzer screenshots of where the failure occurs on the SPI line, and is anything out of place for the byte in question?  Have they considered using burst mode, DMAUSEBURSTSET, and also increasing the source increment to UDMA_SRC_INC_32 (UDMA_ARB_4 would become UDMA_ARB_1)?  This appears to be supported in the TRM and would be a valuable data point.

    Regards,
    Ryan

  • Hi Ryan

    Yes, the sequence is always MOSI 16 bytes, MOSI 256 bytes (once or multiple), MISO 8 bytes.
    There are three phenomena, and they always occur at the very beginning or very end of one of the processes.

    1. The last byte of the 256-byte frame is not sent.

    2. The last byte that was not sent in the 256-byte frame is sent in the next 8-byte frame.

    3. The clock count increases by 1 bit due to noise or false detection when receiving the 16-byte frame. When the 256-byte frame is then transmitted, the first byte is empty data (0x00). When the 8-byte frame is sent, the first byte is the last byte that should have been sent in 2. After that, data is transmitted offset by one byte.

    I have asked for the logic analyzer waveform but have had no response yet.

    May I know the mechanism by which burst mode, DMAUSEBURSTSET, and increasing the source increment to UDMA_SRC_INC_32 (UDMA_ARB_4 would become UDMA_ARB_1) could improve this?

     

    Best regards,

    Kenley

  • "once or multiple"

    This was a clarifying question in my previous message; I will assume once until informed otherwise.

    Are 1, 2, and 3 mutually inclusive?  As in, is the last byte not being sent (1) always caused by the clock noise that makes the first byte empty data (3)?  None of the examples provided show byte errors at the beginning without supposed clock false positives.

    I would expect that a clock increase by 1 bit should result in all data being bit-shifted, not one entire 0x00 byte shift.  Nevertheless, the behaviors observed are more indicative of clocking issues from the host than of the DMA/SPI implementation on the CC2538.  You mentioned that there is no CS, which is also contributing to the problem in this case.  This is why start of frame (SoF) and cyclic redundancy checks (CRC) are recommended for every data packet, to confirm where the message actually begins and to verify its contents.  If the byte errors occur at the very beginning or end, then a workaround seems more plausible than for a byte missing in the middle.  The DMAUSEBURSTSET comments concern different approaches to DMA transfers, which may not be effective if the error occurs at the beginning or end, depending on further analysis of the behaviors.
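
    As a minimal sketch of the SoF plus CRC idea, the receiver can scan for a start-of-frame marker and accept a frame only if the CRC over the payload matches, which lets it re-synchronize after a one-byte offset. The frame layout (one SoF byte, payload, one CRC byte), the marker value 0xA5, the CRC-8 polynomial 0x07, and all function names are assumptions for illustration; the thread does not specify any of them.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SOF 0xA5u /* hypothetical start-of-frame marker */

/* CRC-8, polynomial 0x07, initial value 0x00, no reflection. */
static uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0x00;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u)
                                : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* Scans rx for [SOF][payload_len bytes][CRC8 of payload].
 * Returns the offset of the first valid frame, or -1 if none is found. */
static int frame_find(const uint8_t *rx, size_t rx_len, size_t payload_len)
{
    size_t frame_len = payload_len + 2u;
    for (size_t off = 0; off + frame_len <= rx_len; off++) {
        if (rx[off] != FRAME_SOF) continue;
        if (crc8(&rx[off + 1], payload_len) == rx[off + 1 + payload_len])
            return (int)off;
    }
    return -1;
}
```

    A frame preceded by a stray 0x00 byte is still found at offset 1, while a frame with a corrupted payload byte is rejected, at which point the slave could report a status byte asking the master to re-transmit.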

    Regards,
    Ryan

  • Hi Ryan

    Thank you for your support.

    They do not have a screenshot of where the failure occurs.

    The ready signal is the yellow trace in the attached capture, and the time between granting permission and the camera side starting to emit CLK is 15.64 μs.

    Do you think this is enough time to fill the FIFO by DMA?

    Best regards,

    Kenley

  • "Do you think this is enough time to fill the FIFO by DMA?"

    This would depend on the priority of the SPI task and the existing usage of other tasks from the radio, application, other peripherals, etc.  Wouldn't one prepare the RX DMA buffer configuration before sending the ready signal?
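
    For a sense of scale, here is a small sketch that converts the 15.64 μs window into clock cycles and compares it with the time one 8-bit word takes on the wire. It assumes a 32 MHz system clock and ignores interrupt latency and bus arbitration, so it gives only an upper bound on the cycles available; the function names are hypothetical.

```c
#include <stdint.h>

/* Time on the wire for one word, in nanoseconds (rounded down). */
static uint64_t word_time_ns(uint32_t bitrate_hz, uint32_t bits_per_word)
{
    return ((uint64_t)bits_per_word * UINT64_C(1000000000)) / bitrate_hz;
}

/* Whole clock cycles that fit in a time window given in nanoseconds. */
static uint64_t cycles_in_window(uint32_t clk_hz, uint64_t window_ns)
{
    return ((uint64_t)clk_hz * window_ns) / UINT64_C(1000000000);
}
```

    With these assumptions, 15640 ns corresponds to about 500 cycles at 32 MHz, and one byte at 2.857 MHz takes about 2800 ns, so the window is roughly five to six byte times long.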

    Regards,
    Ryan

  • Hi Ryan

    Thank you for your support.

    I am going to have a meeting with the customer tomorrow.

    There is a lot of information that I need to confirm and organize before asking for your further support.

    Best regards,

    Kenley

  • Hi Ryan

    Let me reorganize the situation based on the information I received in today's meeting.

    Phenomena:

    1. The last byte of the 256-byte frame is not sent.

    2. The last byte that was not sent in the 256-byte frame is sent in the next 8-byte frame.

    3. The clock count increases by 1 bit due to noise or false detection when receiving the 16-byte frame. When the 256-byte frame is then transmitted, the first byte is empty data (0x00). When the 8-byte frame is sent, the first byte is the last byte that should have been sent in 2. After that, data is transmitted offset by one byte.

    → Nos. 1 and 2 are mutually inclusive: when 1 happens, 2 happens.

    The customer wants to focus on fixing this phenomenon first.

    For this phenomenon to occur, the last byte that was not sent in the 256-byte frame must still be in the TX FIFO or shift register, so that when the next 8-byte frame is sent, its first byte is the unsent last byte of the previous frame.

    Do you agree with this?

    Here are their questions.

    1. Can they check, using any register, whether data is still in the TX FIFO or shift register?

    2. Can they clear the TX FIFO or shift register using any available library function?

    3. What triggers the TX FIFO to pass data to the shift register? Is it the slave receiving the master clock, or some other trigger?
    It would be appreciated if you could share the hardware logic behind that.
    They want to confirm whether the unsent data is stuck in the TX FIFO or the shift register.

    As you can see below, they refill the DMA before sending the ready signal that lets the master send the clock.
    To prevent phenomena 1 and 2, they want to clear the TX FIFO or shift register after the 256-byte frame, so that the previous frame does not affect the data when the 8-byte frame starts.

    Please let me know if I have misunderstood anything or if you need any further information.

     

    Best regards,

    Kenley

  • I agree with their understanding.

    1. We previously discussed the TFE bit of the SSI_SR register, which I believe applies to this case.

    2. I don't see any SSI registers for clearing the TX FIFO.  You could have the master device request additional bytes through the clock line and determine whether any additional valid data is received.

    3. What I have available is from the TRM: "In slave mode, the SSI transmits data each time the master initiates a transaction. If the TX FIFO is empty and the master initiates, the slave transmits the eighth most-recent value in the transmit FIFO. If less than eight values are written to the TX FIFO since the SSI module clock was enabled using the SSI bit in the SYS_CTRL_RCGCSSI register, then 0 is transmitted. Take care to ensure that valid data is in the FIFO as needed. The SSI can be configured to generate an interrupt or a µDMA request when the FIFO is empty...If the SSI is enabled and valid data is in the TX FIFO, the start of transmission is signified by the SSIFss master signal going low. The master SSITx output pad is enabled. After an additional one-half SSIClk period, both master and slave data are enabled onto their respective transmission lines. At the same time, SSIClk is enabled with a falling edge transition. Data is then captured on the rising edges and propagated on the falling edges of the SSIClk signal. In the case of a single word transmission, after all bits are transferred the SSIFss line is returned to its idle high state one SSIClk period after the last bit is captured. For continuous back-to-back transmissions, the SSIFss pin remains in its active low state until the final bit of the last word is captured and then returns to its idle state as previously described. For continuous back-to-back transfers, the SSIFss pin is held low between successive data words, and termination is the same as that of the single-word transfer"

    I think it would be best for them to have the master activate the clock one or two bytes past the existing 256 to make sure that the TX FIFO has been cleared. 
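
    As a sketch of how the status bits could be interpreted once SSI_SR has been read, the helpers below decode a raw status word. The bit positions are an assumption based on the usual PL022-style layout (TFE bit 0, TNF bit 1, RNE bit 2, RFF bit 3, BSY bit 4) and should be confirmed against the TRM and the driver library's hw_ssi.h; the macro and function names are hypothetical. Given the BSY behavior the customer reported without SSIFss assigned, BSY may not be reliable in their setup.

```c
#include <stdint.h>

/* Assumed SSI_SR bit positions (PL022-style); verify against the TRM. */
#define MODEL_SR_TFE 0x01u /* transmit FIFO empty */
#define MODEL_SR_TNF 0x02u /* transmit FIFO not full */
#define MODEL_SR_RNE 0x04u /* receive FIFO not empty */
#define MODEL_SR_RFF 0x08u /* receive FIFO full */
#define MODEL_SR_BSY 0x10u /* SSI busy */

/* 1 if the TX FIFO is empty and the SSI reports idle, otherwise 0. */
static int ssi_tx_drained(uint32_t sr)
{
    return (sr & MODEL_SR_TFE) && !(sr & MODEL_SR_BSY);
}

/* 1 if at least one word is waiting in the RX FIFO, otherwise 0. */
static int ssi_rx_has_data(uint32_t sr)
{
    return (sr & MODEL_SR_RNE) != 0;
}
```

    If TFE reads 0 after the master has stopped clocking a frame, at least one word is still queued and would be sent at the start of the next frame.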

    Regards,
    Ryan

  • Hi Ryan

    Thank you for the feedback.

    Could you please elaborate on your recommendation?
    How can the slave side ask the master device to clock one or two additional bytes?
    By changing ui32TransferSize=256 to ui32TransferSize=257 or 258?

    "You could have the master device request additional bytes through the clock line and determine whether any additional valid data is received."

    "I think it would be best for them to have the master activate the clock one or two bytes past the existing 256 to make sure that the TX FIFO has been cleared. "

    Regarding Q3, we cannot find anything in the TRM that explains what triggers the TX FIFO to pass data to the shift register.

    Regards,

    Kenley

  • Hi Kenley,

    Yes, they can increase the TransferSize of the master to 257 or 258.  The extra bytes can be repeats of byte 256.  You reminded me that the 256 bytes are coming from the master (MOSI), so I'm not sure why we are discussing the TX FIFO on the slave (CC2538) instead of the RX FIFO.  Do you understand my confusion based on the diagram that has been provided?  From what I can tell, the CC2538 slave only sends 8 bytes (MISO) through the TX FIFO at any time.  I concur that the TRM does not have much to say about the shift register operation.

    Regards,
    Ryan

  • Hi Ryan

    Sorry for the confusion

    You are right, I missed that one.

    Let me check with customer.

    Regards,

    Kenley