LP-AM243: Timing Question When Implementing the "PRU - FSI Bandwidth Optimization" Document

Bolt Hsieh

Part Number: LP-AM243

Hi experts,

I plan to use the PRU to transmit FSI data, so I am referencing this document: /cfs-file/__key/communityserver-components-multipleuploadfilemanager/716d8558_2D00_1385_2D00_47b7_2D00_b953_2D00_308af7ebc898-549268-complete/FSI-Bandwidth_2D00_Optimization-for-Multi_2D00_axis-Servo-Control.pdf

I also received the example code "FSI_PRU_Handler_20221220" from a TI FAE.

When I implemented the .asm code to transmit FSI data on the PRU, based on the example code, I ran into a timing issue.

I tested three methods for FSI data exchange. My test results are shown in the table below.

Method                                      PRU processing time (ticks)   Data sampled in
DMA, example flow                           32                            previous period
DMA, corrected flow (wait for XOUT done)    174                           current period
memcpy                                      176                           current period

As the table shows, the original example code has the shortest execution time, but it cannot get the current data immediately; it has to wait one period.

If the application needs the current data, either the "DMA, wait for XOUT done" method or the "memcpy" method works. However, both take about 5 times longer than the original example code (174 and 176 ticks vs. 32 ticks), because both methods always wait for the current data transfer to complete.
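
For reference, the "about 5 times" figure can be checked directly against the tick counts in the table. A minimal C sketch of that arithmetic; the enum and function names are illustrative and not taken from the example code:

```c
/* Measured PRU ticks per FSI data exchange, copied from the table above. */
enum {
    TICKS_EXAMPLE_DMA   = 32,  /* DMA, example flow (data from previous period) */
    TICKS_CORRECTED_DMA = 174, /* DMA, waits for XOUT done (current period)     */
    TICKS_MEMCPY        = 176  /* memcpy (current period)                       */
};

/* Ratio of a method's tick count to a baseline tick count. */
double slowdown(unsigned ticks, unsigned baseline_ticks)
{
    return (double)ticks / (double)baseline_ticks;
}
```

slowdown(TICKS_CORRECTED_DMA, TICKS_EXAMPLE_DMA) evaluates to about 5.44 and slowdown(TICKS_MEMCPY, TICKS_EXAMPLE_DMA) to 5.5, so both methods that return current data cost roughly 5.4 to 5.5 times the example flow.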

So, regarding the R5F vs. PRU test-result comparison in the FSI bandwidth optimization document: I think the PRU data lags the R5F data by one period, but the document does not mention this.

My question for the experts: to achieve the FSI bandwidth optimization, my application must tolerate a one-period delay. Is that correct?

In other words, to get the current DMA data I have to wait for the XOUT to complete or use memcpy. Can you suggest ways to reduce the data movement time?

Best Regards

Bolt

over 2 years ago

Dhaval Khandla over 2 years ago

Bolt Hsieh said:
I will use PRU to transmit FSI data, so I reference the document : /cfs-file/__key/communityserver-components-multipleuploadfilemanager/716d8558_2D00_1385_2D00_47b7_2D00_b953_2D00_308af7ebc898-549268-complete/FSI-Bandwidth_2D00_Optimization-for-Multi_2D00_axis-Servo-Control.pdf

This link is not working. Can you please upload it again?

Regards
Dhaval

Bolt Hsieh over 2 years ago in reply to Dhaval Khandla

Hi Khandla,

The paper is available here: https://www.ti.com/lit/an/snoaa89/snoaa89.pdf?ts=1696989029580

Its title is "FSI Bandwidth-Optimization for Multi-axis Servo Control".

Best Regards

Bolt

Chen Gao over 2 years ago

Hi Bolt,

The code you attached only moves data from TCM to the XFR2VBUS DMA. RD_DATA is read by the XIN instruction, not XOUT; XOUT only configures the data size/address. See chapter 6.4.6 of the Technical Reference Manual (https://www.ti.com/lit/pdf/spruim2) for details.

Also, did you check PRU registers R2-R17, which hold the DMA data?

The FSI handler transmits data along this path: TCM --> XFR2VBUS DMA --> FSI TX buffer

BR,

Chen

Bolt Hsieh over 2 years ago in reply to Chen Gao

Hi Chen,

I know the overall path for transmitting FSI data; the code I pasted only shows the part of the flow that is wrong.

When I check PRU registers R2-R17, the data was sampled in the previous period.

I think I found the error in the flow while reading chapter 6.4.6.3.1.6 of the Technical Reference Manual (https://www.ti.com/lit/pdf/spruim2).

That chapter shows the "Read Model" flow, in which a wait on RD_BUSY (written as WR_BUSY in the TRM) must be executed after the XOUT of R18/R19. So the FSI handler skips this step, and that causes the data error, doesn't it?

Also, your description confuses me. My understanding of the XOUT and XIN instructions is as follows: the XOUT instruction sets R18[0] to enable "Read Auto Mode", which makes the XFR2VBUS module read data from the external memory address. So the flow needs to wait for the XFR2VBUS transfer to finish after the XOUT instruction. The XIN instruction then just reads the data from the XFR2VBUS module.
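
To make the difference between the two flows concrete, here is a host-side C model written under stated assumptions: the XFR2VBUS DMA is modeled as capturing the source data when triggered and landing it in the FIFO a fixed number of RD_BUSY polls later, and a transfer left pending is assumed to finish in the background before the next control period. All names (xfr_model, xfr_trigger_read, read_no_wait, and so on) are hypothetical; this is not PRU code or a TI API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the XFR2VBUS RX read flow. XFR_WORDS = 16 stands
 * for PRU registers R2..R17 (16 x 32 bits = 64 bytes). */
#define XFR_WORDS 16

typedef struct {
    uint32_t source[XFR_WORDS];  /* external memory being sampled          */
    uint32_t pending[XFR_WORDS]; /* data captured by the in-flight DMA     */
    uint32_t fifo[XFR_WORDS];    /* data an XIN would return right now     */
    int rd_busy;                 /* 1 while the modeled DMA is in flight   */
    unsigned latency;            /* polls until the modeled DMA finishes   */
    unsigned countdown;          /* polls remaining for the current DMA    */
} xfr_model;

/* Modeled DMA completion: captured data becomes visible to XIN. */
static void xfr_finish(xfr_model *m)
{
    memcpy(m->fifo, m->pending, sizeof m->fifo);
    m->rd_busy = 0;
}

/* Models the XOUT that starts a read. A transfer still pending from the
 * previous period is assumed to have completed in the background. */
void xfr_trigger_read(xfr_model *m)
{
    if (m->rd_busy)
        xfr_finish(m);
    memcpy(m->pending, m->source, sizeof m->pending);
    m->rd_busy = 1;
    m->countdown = m->latency;
}

/* Models one poll of the RD_BUSY status; returns 1 while still busy. */
int xfr_rd_busy(xfr_model *m)
{
    if (m->rd_busy) {
        if (m->countdown == 0)
            xfr_finish(m);
        else
            m->countdown--;
    }
    return m->rd_busy;
}

/* Models "XIN R2-R17": copies whatever the FIFO holds at this moment. */
void xfr_xin(const xfr_model *m, uint32_t regs[XFR_WORDS])
{
    memcpy(regs, m->fifo, sizeof m->fifo);
}

/* Flow as in the example handler: XIN right after the trigger. */
void read_no_wait(xfr_model *m, uint32_t regs[XFR_WORDS])
{
    xfr_trigger_read(m);
    xfr_xin(m, regs);
}

/* Corrected flow: wait for RD_BUSY to clear before XIN. */
void read_with_wait(xfr_model *m, uint32_t regs[XFR_WORDS])
{
    xfr_trigger_read(m);
    while (xfr_rd_busy(m)) {
        /* spin until the modeled DMA finishes */
    }
    xfr_xin(m, regs);
}
```

Calling read_no_wait once per period returns the data captured in the previous period, while read_with_wait returns the current period's data at the cost of the polling loop, which mirrors the one-period delay discussed in this thread.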

Best Regards

Bolt

Chen Gao over 2 years ago in reply to Bolt Hsieh

Hi Bolt,

You are correct regarding the XIN and XOUT instructions for the XFR2VBUS RX widget. Apologies for the confusion. What I meant is that "XOUT R18" triggers the DMA, while "XIN R2-R17" reads the data from the FIFO.

I checked the code again, and it does require a one-period delay for the XFR2VBUS read. The FSI handler source code is missing the wait, which caused the data error. Thank you very much for finding this.

I also measured the TX processing time with and without the wait. The wait adds about 200 ns (see the following figures; the purple curve is PRU_GPO toggling and the blue curve is FSI CLK). Compared to R5F control (polling the FSI buffer for new data), FSI_PRU_handler still saves more than half of the processing time.

How did you measure the PRU ticks?

Regards,

Chen

Bolt Hsieh over 2 years ago in reply to Chen Gao

Hi Chen,

Thanks for your response.

Because my PRU application mixes C and .asm code, I read CT_PRU0_CTRL.CYCLE_COUNT_bit.CYCLECOUNT in the C code to record the PRU ticks around the XFR2VBUS .asm code. I measured the tick-recording code itself, and it costs 10 ticks.
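
One way to turn two raw cycle-counter readings into a net duration is sketched below in plain C. It assumes the counter runs continuously between the two reads, subtracts the 10-tick measurement cost mentioned above, and takes the PRU core clock frequency as a parameter because the thread does not state it. The function names are hypothetical.

```c
#include <stdint.h>

/* Net PRU cycles between two cycle-counter reads, minus the fixed cost of
 * the reads themselves (10 ticks in the setup described above). On the
 * target, start/end would be read from the PRU cycle counter; here they are
 * plain arguments. Assumes the counter was not stopped or reset in between. */
uint32_t net_ticks(uint32_t start, uint32_t end, uint32_t overhead)
{
    uint32_t raw = end - start;
    return raw > overhead ? raw - overhead : 0;
}

/* Convert ticks to nanoseconds for a PRU core clock given in MHz.
 * The clock value is an assumption; take it from the device configuration. */
uint32_t ticks_to_ns(uint32_t ticks, uint32_t core_clock_mhz)
{
    return (uint32_t)((uint64_t)ticks * 1000u / core_clock_mhz);
}
```

For example, net_ticks(100, 284, 10) gives 174 ticks, and at an assumed 250 MHz core clock (4 ns per tick) ticks_to_ns(174, 250) gives 696 ns.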

Do you have timing measurements for a memcpy implementation of FSI communication?

Also, can you measure the FSI Rx part with the wait included?

I would like to compare them with my own measurements.

Best Regards

Bolt

Chen Gao over 2 years ago in reply to Bolt Hsieh

Hi Bolt,

I don't have timing measurements for the memcpy implementation, but you can easily measure it by toggling a PRU_GPO pin, e.g. "set r30, r30, 1" (drive PRUx_GPO1 high) and "clr r30, r30, 1" (drive PRUx_GPO1 low). I will let you know the test results before next Tuesday.
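
The same marker technique can be written from the C side of a mixed C/.asm project. A minimal sketch, assuming the TI PRU C compiler's R30 register variable on the target; a plain variable (r30_stand_in, hypothetical) replaces it here so the bit logic can be checked on a host. The macro and function names are also illustrative.

```c
#include <stdint.h>

/* Bit 1 of R30 drives PRUx_GPO1, matching "set r30, r30, 1" and
 * "clr r30, r30, 1". */
#define TIMING_GPO_BIT 1u

/* On the PRU, the TI C compiler exposes R30 as
 *     volatile register unsigned int __R30;
 * A plain variable stands in for it here. */
static uint32_t r30_stand_in;

/* Drive the timing marker high (equivalent of "set r30, r30, 1"). */
static inline void timing_marker_high(void)
{
    r30_stand_in |= (1u << TIMING_GPO_BIT);
}

/* Drive the timing marker low (equivalent of "clr r30, r30, 1"). */
static inline void timing_marker_low(void)
{
    r30_stand_in &= ~(1u << TIMING_GPO_BIT);
}
```

Bracketing the code under test with timing_marker_high() and timing_marker_low() produces a pulse on PRUx_GPO1 whose width, measured on an oscilloscope, is the processing time plus a small overhead for the marker writes. Only bit 1 changes, so other GPO outputs driven from R30 are left untouched.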

Regards,

Chen

Chen Gao over 2 years ago in reply to Chen Gao

Bolt,

Please find the test results below. Compared to the data in the app note, the Tx processing time increased from 460 ns to 666 ns (~200 ns) and the Rx processing time increased from 280 ns to 758 ns (~480 ns) for a 64-byte data movement.

Regards,

Chen

Bolt Hsieh over 2 years ago in reply to Chen Gao

Hi Chen,

Thanks for your response.

Regarding PRU_GPO toggling: our circuit board does not route these pins out, so I can't use the toggling method in my test setup.

In my FSI measurements, the Tx processing time increased from 99 ns to 244 ns (~145 ns) for a 32-byte data movement, and the Rx processing time increased from 105 ns to 580 ns (~475 ns) for a 64-byte data movement. So I think my results are consistent with yours.
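
The quoted increases can be recomputed from the raw numbers in the last two posts. A small C table of those measurements; the structure and names are illustrative:

```c
/* Processing time without and with the XFR2VBUS busy wait, in ns,
 * as quoted in this thread. */
typedef struct {
    const char *label;
    unsigned ns_without_wait;
    unsigned ns_with_wait;
} wait_cost;

static const wait_cost measurements[] = {
    {"Chen, Tx", 460, 666},
    {"Chen, Rx", 280, 758},
    {"Bolt, Tx",  99, 244},
    {"Bolt, Rx", 105, 580},
};

/* Extra time added by waiting for the transfer to complete. */
unsigned added_ns(const wait_cost *m)
{
    return m->ns_with_wait - m->ns_without_wait;
}
```

The Rx increases agree closely (478 ns vs. 475 ns). The Tx increases differ (206 ns vs. 145 ns); one possible factor is that Bolt's Tx figure is for a 32-byte transfer, but the thread does not establish the cause.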

I look forward to your timing results for memcpy on the PRU.

Best Regards

Bolt

Chen Gao over 2 years ago in reply to Bolt Hsieh

Hi Bolt,

The memcpy_PRU application takes 166 ns for a 32-byte data movement. Please find the test result and the code under test below.

Regards,

Chen

Bolt Hsieh over 2 years ago in reply to Chen Gao

Hi Chen,

Thanks for your response.

My test result is similar to yours.

So your test result shows that memcpy is as efficient as, or more efficient than, XFR2VBUS.

Do you agree with this summary? If you have other suggestions or conclusions, please let me know. Thanks.

Best Regards

Bolt

Chen Gao over 2 years ago in reply to Bolt Hsieh

Hi Bolt,

I agree with you for the FSI handler case, since the FSI buffer size is 16 words. However, XFR2VBUS is faster than memcpy when the data size grows to 64 bytes. Please find the test results below.
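
For scale: assuming 16-bit FSI words, the 16-word FSI buffer is 32 bytes, half of the 64 bytes that one XIN into R2-R17 covers. Below is a minimal sketch of a memcpy-style copy of one buffer's worth of data into a memory-mapped TX buffer. The function name and the 16-bit word width are assumptions, and this is not the memcpy_PRU code tested above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the FSI data buffer holds 16 words of 16 bits (32 bytes),
 * which matches the 32-byte transfers measured above. */
#define FSI_BUF_WORDS 16u
#define FSI_BUF_BYTES (FSI_BUF_WORDS * sizeof(uint16_t))

/* Word-by-word copy into a memory-mapped FSI TX buffer. A loop is used
 * instead of memcpy() because the destination is volatile device memory. */
void fsi_tx_copy(volatile uint16_t *tx_buf, const uint16_t *src, size_t words)
{
    if (words > FSI_BUF_WORDS)
        words = FSI_BUF_WORDS; /* never write past the 16-word buffer */
    for (size_t i = 0; i < words; i++)
        tx_buf[i] = src[i];
}
```

On the target, tx_buf would be the FSI TX buffer address from the device documentation; the volatile qualifier keeps the compiler from merging or dropping the device writes, which is also why memcpy() is not called directly.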

For the FSI handler, I don't have further ideas to make it faster. For other cases where PRU I/O is used to transmit data, the broadside (BS) RAM and the IPC scratch pad memory may help increase bandwidth.

Regards,

Chen
