TMS320F28388S: Optimizing SCI-A Communication (up to 70 bytes) to Reduce CPU Overhead

Ninad Lomte

Part Number: TMS320F28388S
Other Parts Discussed in Thread: C2000WARE

Hello,

I am developing a communication layer using the SCI-A module on my TMS320F28388S microcontroller module. We handle a custom serial protocol where data frames can reach up to 70 bytes. We are seeing significant CPU overhead due to high interrupt frequency and context switching.

Current Implementation & Issue:

RX Setup: SCI-A RX FIFO is configured to trigger an interrupt at a 1-word level (RX FIFO = 1).
RX Logic: The RX ISR reads a single character using SCI_readCharNonBlocking() and processes it.
TX Logic: The TX ISR polls the TX FIFO status and writes one byte at a time using SCI_writeCharNonBlocking().
Problem: For 70-byte frames, the constant interrupt servicing significantly impacts the system's real-time control loop.

Desired Goal: Eliminate CPU Overhead using DMA

I want to offload the entire 70-byte transfer process to the DMA controller.

My Assumption & Request for Verification:

RX DMA Trigger Constraint: Based on the TRM for my device, I assume that the SCI-A RX FIFO level is not available as a direct hardware peripheral trigger source for the DMA module. This means a standard DMA block transfer (triggered only once after a full frame) is not feasible. Is this assumption correct for my device, or is there an undocumented way to configure the SCI RX status (e.g., level threshold, or idle line) as a DMA trigger?
TX DMA Solution: I believe the SCI TX empty status is a viable DMA trigger. What is the recommended, lowest-overhead method for setting up the DMA to send 70 bytes from RAM to the SCI TX register, using the TX FIFO as the trigger?
Alternative RX Solutions (If DMA is not viable): Given the limitations, are there any lower-overhead methods for the RX side?
- Could increasing the RX FIFO interrupt level (e.g., to RX FIFO = 16) reduce the overhead significantly enough?
- Is there a recommended peripheral-to-peripheral trigger mechanism to emulate a DMA trigger for the SCI RX?

Any guidance on maximizing transfer efficiency for high-frequency, multi-byte SCI communications would be greatly appreciated.

Thank you,
Ninad Lomte

6 months ago

0 Delaney Woodward 6 months ago

TI__Mastermind 25250 points

Hi Ninad,

Unfortunately, the SCI module does not have DMA access, as you've mentioned. It cannot be used to trigger the DMA, nor can the DMA access its registers.

However, the F2838x device has a different UART-capable module, simply called UART and accessible from the CM core, that does have DMA access. There is a UART/DMA example at the following SDK path that I would suggest using as a starting point: [C2000ware install]/driverlib/f2838x/examples/cm/uart/uart_ex2_loopback_udma. You can use IPC to communicate between the C28x and the CM core; see the example here: [C2000ware install]/driverlib/f2838x/examples/c28x_cm/ipc/ip_ex1_basic_XX.

Another fix would indeed be increasing the FIFO level. Is the amount of data received by the F2838x always fixed at 70 bytes? If yes, you can set the FIFO level to 10/16 and read 10 bytes per ISR to reduce the number of ISRs by a factor of 10. Or, you could set the RX FIFO level to 16/16, read 16 bytes in every ISR, then read the remaining 6 bytes in the main loop.
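The 16/16 option above can be sketched as follows. This is a minimal host-runnable illustration, not TI code: the `sim_fifo_*` helpers are hypothetical stand-ins for the hardware FIFO (on the target, the driverlib calls SCI_getRxFIFOStatus() and SCI_readCharNonBlocking() would presumably take their place), and `rx_drain`, `rx_frame` and `rx_len` are invented names. Characters are stored as uint16_t because the C28x has no 8-bit type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_FIFO_DEPTH 16u
#define FRAME_MAX     70u

/* Hypothetical host-side stand-in for the SCI-A RX FIFO. */
static uint16_t sim_fifo[RX_FIFO_DEPTH];
static uint16_t sim_head;
static uint16_t sim_count;

/* Simulates the hardware receiving one character; false means overflow. */
bool sim_fifo_push(uint16_t ch)
{
    if (sim_count >= RX_FIFO_DEPTH) {
        return false;
    }
    sim_fifo[(sim_head + sim_count) % RX_FIFO_DEPTH] = ch;
    sim_count++;
    return true;
}

/* Number of characters currently waiting in the simulated FIFO. */
uint16_t sim_fifo_level(void)
{
    return sim_count;
}

/* Removes and returns the oldest character; the FIFO must not be empty. */
uint16_t sim_fifo_pop(void)
{
    assert(sim_count > 0u);
    uint16_t ch = sim_fifo[sim_head];
    sim_head = (uint16_t)((sim_head + 1u) % RX_FIFO_DEPTH);
    sim_count--;
    return ch;
}

uint16_t rx_frame[FRAME_MAX];   /* assembled frame */
uint16_t rx_len;                /* characters stored so far */

/*
 * Moves everything currently in the FIFO into rx_frame and returns the
 * number of characters moved. The same routine can serve as the ISR body
 * (16 characters per interrupt) and as the main-loop poll that collects a
 * tail shorter than the interrupt level.
 */
uint16_t rx_drain(void)
{
    uint16_t moved = 0u;
    while (sim_fifo_level() > 0u && rx_len < FRAME_MAX) {
        rx_frame[rx_len++] = sim_fifo_pop();
        moved++;
    }
    return moved;
}
```

For a 70-byte frame this gives four calls from the ISR that move 16 characters each, plus one call from the main loop that picks up the remaining 6.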

Best Regards,

Delaney

0 Ninad Lomte 6 months ago in reply to Delaney Woodward

Prodigy 10 points

Hello Delaney,

Thank you for your response! I will surely look into the feasibility of using the UART module from the CM core and using DMA access.
To answer your question regarding the frame size: it is not fixed at 70 bytes. We are currently developing a custom protocol in which the frame size varies, and the maximum length of a frame can be as high as 70 bytes.

1. According to the protocol, we use delimiters at the start and end of the frame.
2. We use the RX1 FIFO interrupt level for our SCI RX ISR, so that a state machine inside the ISR reads a single character at a time and separates the delimiters from the main payload without complex logic.
3. Your suggestion of using a FIFO level of 10/16 would help us reduce the CPU overhead, but we would need to store that data in a temporary buffer, and our state machine would then have to process those 10/16 bytes one by one to correctly identify the sequence boundary.

Given the necessity of maintaining this byte-by-byte delimiter state machine, what is the recommended practice when moving from RX1 to RX16/RX10 for protocols with internal delimiters?

Is the recommended approach simply to increase the FIFO level (RX16) and move our entire byte-by-byte state machine logic into a loop inside the ISR? (We will then rely on the CRC check in the mainline code to handle any remaining ambiguity).
Alternatively, is there a simpler, library-supported way to handle this streaming state analysis on a fixed-level FIFO without significantly complexifying the ISR?
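The "state machine in a loop" option asked about above might look like the sketch below. It is only an illustration under stated assumptions: the delimiter values (0x02 and 0x03), the absence of any escaping inside the payload, and all names (`parser_t`, `parser_feed`, `parser_feed_batch`) are hypothetical and not taken from the actual protocol or from any TI library. Characters are held as uint16_t because the C28x has no 8-bit type. Note also that with a fixed interrupt level above 1, the last bytes of a variable-length frame may sit in the FIFO below the threshold, so something such as a main-loop poll would still have to collect them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_START 0x02u   /* hypothetical start delimiter */
#define FRAME_END   0x03u   /* hypothetical end delimiter   */
#define PAYLOAD_MAX 70u

typedef enum { PARSER_IDLE, PARSER_IN_FRAME } parser_state_t;

typedef struct {
    parser_state_t state;
    uint16_t payload[PAYLOAD_MAX];
    uint16_t len;
} parser_t;

void parser_init(parser_t *p)
{
    assert(p != NULL);
    p->state = PARSER_IDLE;
    p->len = 0u;
}

/* Feeds one character; returns true when that character closed a frame. */
bool parser_feed(parser_t *p, uint16_t ch)
{
    assert(p != NULL);
    switch (p->state) {
    case PARSER_IDLE:
        if (ch == FRAME_START) {
            p->len = 0u;
            p->state = PARSER_IN_FRAME;
        }
        return false;
    case PARSER_IN_FRAME:
        if (ch == FRAME_END) {
            p->state = PARSER_IDLE;
            return true;
        }
        if (p->len < PAYLOAD_MAX) {
            p->payload[p->len++] = ch;
        } else {
            p->state = PARSER_IDLE;     /* oversize: drop and resynchronize */
        }
        return false;
    }
    return false;
}

/*
 * Runs the same per-character state machine over a batch read from the FIFO.
 * Stops right after the character that completes a frame, so the caller can
 * hand the payload over before the next frame overwrites it. Returns the
 * number of characters consumed from buf.
 */
uint16_t parser_feed_batch(parser_t *p, const uint16_t *buf, uint16_t n,
                           bool *frame_ready)
{
    assert(buf != NULL && frame_ready != NULL);
    uint16_t i = 0u;
    *frame_ready = false;
    while (i < n) {
        if (parser_feed(p, buf[i++])) {
            *frame_ready = true;
            break;
        }
    }
    return i;
}
```

The per-character logic is unchanged from the RX1 case; only the caller changes, from one call per interrupt to a loop over whatever the FIFO held.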

Thank you again for your original response. Any advice on switching to a different RX FIFO level while ensuring parsing safety would be greatly appreciated!

0 AndyP 6 months ago in reply to Ninad Lomte

Expert 1806 points

Hi Ninad,

what I would recommend:

1. Reduce ISR latency with #pragma INTERRUPT(SCI_xxxxxx, HPI);
2. Move the ISR to RAM
3. If you don't use VCU in the SCI interrupts set --isr_save_vcu_regs=off to further reduce ISR latency (on a file level)
4. Heavily optimize the ISR as much as you can (with the help of analyzing generated assembly code)

What baudrate do you have?

Best regards,
Andy

0 Delaney Woodward 6 months ago in reply to AndyP

TI__Mastermind 25250 points

Hi Ninad,

In order to increase the RX FIFO level, yes, a larger buffer would need to be used to store more than one byte. Usually for time critical code, the RX ISR just moves this data over to the buffer and then post processing of the data is done in the main loop. This also requires a state variable to make sure the ISR operation and main loop operation are synchronized (we don't want to overwrite the buffer in the ISR before it has been processed by the main loop or vice versa). If your byte-by-byte state machine logic is currently in the background loop, then it is recommended to keep it there. You would just need to modify the logic to read from the buffer one at a time rather than a single global variable.
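One common way to get the ISR/main-loop split described above is a single-producer, single-consumer ring buffer, where the head and tail indices play the role of the state variable. The sketch below is an assumption about how this could be done, not TI-provided code; `ring_t`, `ring_put` and `ring_get` are hypothetical names, and the scheme relies on 16-bit index reads and writes being atomic on the target. Characters are uint16_t because the C28x has no 8-bit type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 128u  /* power of two, comfortably above a 70-byte frame */

typedef struct {
    volatile uint16_t head;     /* written only by the RX ISR    */
    volatile uint16_t tail;     /* written only by the main loop */
    uint16_t data[RING_SIZE];
} ring_t;

/* ISR side: stores one character; returns false if the buffer is full. */
bool ring_put(ring_t *r, uint16_t ch)
{
    assert(r != NULL);
    uint16_t next = (uint16_t)((r->head + 1u) & (RING_SIZE - 1u));
    if (next == r->tail) {
        return false;           /* full: refuse rather than overwrite */
    }
    r->data[r->head] = ch;
    r->head = next;             /* publish only after the data is stored */
    return true;
}

/* Main-loop side: fetches one character; returns false if none is pending. */
bool ring_get(ring_t *r, uint16_t *out)
{
    assert(r != NULL && out != NULL);
    if (r->tail == r->head) {
        return false;
    }
    *out = r->data[r->tail];
    r->tail = (uint16_t)((r->tail + 1u) & (RING_SIZE - 1u));
    return true;
}
```

The ISR would call ring_put() once per character read from the FIFO, and the background loop would call ring_get() until it returns false, feeding each character to the existing byte-by-byte state machine.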

Best Regards,

Delaney

0 Ninad Lomte 6 months ago in reply to AndyP

Prodigy 10 points

Hi Andy,

Thank you for your response. I will take your suggestions into consideration and look into it. Our current baud rate is 57600.
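For a sense of scale at this baud rate, the interrupt spacing can be estimated with a small calculation. This is a rough sketch that assumes 8N1 framing (10 bits on the wire per byte), which the thread does not state; the function names are made up for illustration.

```c
#include <assert.h>

/* Assumption: 1 start + 8 data + 1 stop bit = 10 bits per byte (8N1). */
#define BITS_PER_BYTE_8N1 10u

/* Microseconds between RX interrupts while data arrives back to back. */
double rx_interrupt_period_us(unsigned long baud, unsigned fifo_level)
{
    assert(baud > 0ul && fifo_level >= 1u && fifo_level <= 16u);
    return (1.0e6 * BITS_PER_BYTE_8N1 * fifo_level) / (double)baud;
}

/* RX interrupts raised while one frame arrives; a tail shorter than the
 * FIFO level raises no interrupt and is not counted. */
unsigned rx_interrupts_per_frame(unsigned frame_len, unsigned fifo_level)
{
    assert(fifo_level >= 1u);
    return frame_len / fifo_level;
}
```

Under that assumption, 57600 baud gives one interrupt roughly every 174 µs at level 1, every 1.74 ms at level 10, and every 2.78 ms at level 16. A 70-byte frame then costs 70, 7, or 4 interrupts respectively, with 6 bytes left over in the last case.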

0 Ninad Lomte 6 months ago in reply to Delaney Woodward

Prodigy 10 points

Hi Delaney,

Thank you for the response. In our code, the ISR handles the state machine and the byte-by-byte processing, and the main loop acts only once the whole response has been parsed into a buffer, with a flag variable synchronizing the ISR and the main loop, as you mentioned. I think changing the main code logic could be complex, but I will take your suggestions into consideration!

0 Aishwarya Rajesh 6 months ago

TI__Mastermind 20225 points

Hi,

Please note that due to the holiday season, there may be some delay in responses.

Best Regards,
Aishwarya

0 Delaney Woodward 6 months ago in reply to Aishwarya Rajesh

TI__Mastermind 25250 points

Hi Ninad,

Sounds good, let me know if you have any other questions about what I recommended.

For the RX FIFO level, if the total number of bytes is not fixed at 70, is there instead a way to know the length of a frame after the first couple of bytes are received, based on the custom protocol? For example, each frame in the protocol could start with a start byte, then a command byte, and the full frame length would be known from which command was received. In this case, the FIFO level can be initialized to 2/16 and adjusted after each command byte is received, depending on what the command byte is.
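The adaptive-level idea can be sketched as below. Everything protocol-specific here is hypothetical: the command codes, the frame lengths, and the function names are invented for illustration. On the target, the level returned by `next_rx_fifo_level()` would presumably be applied with driverlib's SCI_setFIFOInterruptLevel(); that call is left out so the sketch runs on a host.

```c
#include <assert.h>
#include <stdint.h>

#define RX_FIFO_DEPTH 16u
#define HEADER_LEN    2u    /* start byte + command byte */

/* Hypothetical command table: total frame length, header included.
 * Returns 0 for an unknown command. */
uint16_t frame_len_for_command(uint16_t cmd)
{
    switch (cmd) {
    case 0x10u: return 8u;
    case 0x20u: return 24u;
    case 0x30u: return 70u;
    default:    return 0u;
    }
}

/* Interrupt level to program next, given how many bytes of the current
 * frame are still expected. With nothing left, re-arm for the next header. */
uint16_t next_rx_fifo_level(uint16_t bytes_remaining)
{
    if (bytes_remaining == 0u) {
        return HEADER_LEN;
    }
    return (bytes_remaining < RX_FIFO_DEPTH) ? bytes_remaining
                                             : (uint16_t)RX_FIFO_DEPTH;
}

/* Counts the RX interrupts one frame would need under this scheme:
 * one for the header, then one per programmed level until the frame ends. */
uint16_t rx_interrupts_for_frame(uint16_t frame_len)
{
    assert(frame_len >= HEADER_LEN);
    uint16_t remaining = (uint16_t)(frame_len - HEADER_LEN);
    uint16_t count = 1u;
    while (remaining > 0u) {
        remaining = (uint16_t)(remaining - next_rx_fifo_level(remaining));
        count++;
    }
    return count;
}
```

With these made-up lengths, a 70-byte frame needs 6 interrupts (header, four groups of 16, then 4 bytes), and because the last level matches the tail exactly, no bytes are left waiting below the threshold.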

Best Regards,

Delaney
