I want to copy the data in the V4L2 buffer filled by the CSI RX driver on AM6x. However, the copy is very slow. A 4MB copy takes about 38 msec. Why is that and how can I improve the memcpy performance?
Other parts: AM62A3, AM62A7-Q1, AM62A3-Q1, AM68A, AM69A
Reason for slow memcpy:
On AM6x SoCs (AM62, AM62A, AM68A, etc.), the Linux CSI RX driver fills the V4L2 buffer in DDR through DMA. The DMA buffers for the CSI RX driver are configured as cache non-coherent by default.
When the CSI RX driver fills the provided V4L2 buffer via DMA, any cached copy of that buffer becomes stale, so the DMA driver marks the cache lines covering the V4L2 buffer as invalid. When memcpy() then reads the V4L2 buffer, the data is no longer in the cache and has to be brought in from DDR. This is why memcpy() takes a long time.
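To reproduce a measurement like the one in the question (4 MB in about 38 ms, on the order of 100 MB/s), the copy can be timed directly. The following is a minimal sketch, not code from the SDK; the helper names timed_memcpy_ms and throughput_mb_per_s are hypothetical, and the source pointer is assumed to be the address at which the application has mapped the V4L2 buffer.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy len bytes and return the elapsed time in ms. */
static double timed_memcpy_ms(void *dst, const void *src, size_t len)
{
    struct timespec t0, t1;

    assert(dst != NULL && src != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}

/* Hypothetical helper: convert bytes and ms to MB/s (1 MB = 1e6 bytes). */
static double throughput_mb_per_s(size_t len, double ms)
{
    return (len / 1e6) / (ms / 1e3);
}
```

Running the same helper on an ordinary malloc() buffer of the same size gives a baseline to compare against the V4L2 buffer.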
Solution:
To avoid the long memcpy time, it is recommended not to use memcpy at all: use DMA for the copy, or pass the buffer pointer to the consumer instead. If memcpy is unavoidable, the CSI RX DMA buffers can be changed to cache coherent. This is done in the device tree as shown below (using k3-am62a-main.dtsi for AM62A in SDK 9.2 as an example).
diff --git a/arch/arm64/boot/dts/ti/k3-am62a-main.dtsi b/arch/arm64/boot/dts/ti/k3-am62a-main.dtsi
index 1b3b492c7..0d35a6225 100644
--- a/arch/arm64/boot/dts/ti/k3-am62a-main.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-am62a-main.dtsi
@@ -971,9 +971,9 @@
 	ti_csi2rx0: ticsi2rx@30102000 {
 		compatible = "ti,j721e-csi2rx";
-		dmas = <&main_bcdma_csi 0 0x5000 0>, <&main_bcdma_csi 0 0x5001 0>,
-			<&main_bcdma_csi 0 0x5002 0>, <&main_bcdma_csi 0 0x5003 0>,
-			<&main_bcdma_csi 0 0x5004 0>, <&main_bcdma_csi 0 0x5005 0>;
+		dmas = <&main_bcdma_csi 0 0x5000 15>, <&main_bcdma_csi 0 0x5001 15>,
+			<&main_bcdma_csi 0 0x5002 15>, <&main_bcdma_csi 0 0x5003 15>,
+			<&main_bcdma_csi 0 0x5004 15>, <&main_bcdma_csi 0 0x5005 15>;
 		dma-names = "rx0", "rx1", "rx2", "rx3", "rx4", "rx5";
 		reg = <0x00 0x30102000 0x00 0x1000>;
 		power-domains = <&k3_pds 182 TI_SCI_PD_EXCLUSIVE>;
Refer to the device tree binding for DMA, Documentation/devicetree/bindings/dma/ti/k3-bcdma.yaml, for more information about DMA properties.
Additional Information:
The device tree modification above changes the ASEL value, which is the last cell of each dmas entry, from 0 to 15. The meanings of the ASEL values are:
References:
ARM Architecture - Accelerator Coherency Port
Note: ASEL values 14 and 15 are not available on AM62P, so applications on that device should be designed to avoid memcpy.
For other common problems related to CSI camera, please refer to this FAQ: