SK-AM62A-LP: how to support big endian in DMA

Eric Chen

Prodigy 30 points

Part Number: SK-AM62A-LP

Hi TI experts,

We use the SK-AM62A-LP platform with SDK 09.02.

We use the GPMC interface to connect an OLED module (i80 mode), and use DMA to copy data from the framebuffer to the OLED's GRAM.

The DMA works and the OLED can display a photo, but the colors are wrong: red shows up as blue. The reason is that the OLED's GRAM is big-endian, while the pixel data in memory is little-endian and the DMA transfers it from low addresses to high addresses. How can I fix this issue?
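The red-to-blue effect can be reproduced with a small userspace sketch. It assumes RGB565 pixels stored in the CPU's little-endian order and a panel that assembles each pixel from two bus bytes, high byte first; the helper names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* RGB565 channel fields: 5 bits red, 6 bits green, 5 bits blue. */
struct rgb565 {
    unsigned r, g, b;
};

static struct rgb565 decode_rgb565(uint16_t px)
{
    struct rgb565 c;

    c.r = (px >> 11) & 0x1F;
    c.g = (px >> 5) & 0x3F;
    c.b = px & 0x1F;
    return c;
}

/*
 * Pixel value the panel reassembles from a pixel stored little-endian in
 * memory. The DMA walks addresses upwards, so the low byte (lower address)
 * goes out first; a panel that expects the high byte first therefore ends
 * up with the two bytes swapped.
 */
static uint16_t panel_view_of_le_pixel(uint16_t px)
{
    uint8_t sent_first = (uint8_t)(px & 0xFF); /* low byte, lower address */
    uint8_t sent_second = (uint8_t)(px >> 8);  /* high byte, higher address */

    return (uint16_t)((sent_first << 8) | sent_second);
}
```

For example, pure red (0xF800) goes out as 0x00 and then 0xF8, so `decode_rgb565(panel_view_of_le_pixel(0xF800))` gives r = 0, g = 7, b = 24: the panel shows a mostly blue pixel instead of red.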

I referred to the patch file 3386.udma-dev-to-mem-1019.diff from the thread below and applied it to drivers/dma/ti/k3-udma.c:

AM62A3: Linux DMA Identifiers for GPMC Device Register to Memory Transfer - Processors forum - Processors - TI E2E support forums

The DTS and code look like this:

&gpmc0 {
	status = "okay";
	pinctrl-names = "default";
	pinctrl-0 = <&gpmc0_pins_default>;
	assigned-clock-parents = <&k3_clks 80 2>;	/* 1 MCLK = 10 ns */
	assigned-clock-rates = <33333333>; /* 33.33 MHz, 30 ns period */
	ranges = <1 0 0x00 0x50000000 0x01000000>; /* CS1 space, size = 16 MiB */

	oled@1,0 {
		compatible = "atmel,unication-oled-module";
		reg = <1 0 0x00000004>;
		reg-names = "data";

		oled_rs = <22>;  /* pin index for LCD_RS */
		madctl = <0x00>;	// bit3: 0 = RGB, 1 = BGR; bit6: 1 = mirror X
		oled_type = <0x00>;	// 1: RM69092, 0: FT2308
		rotate_degree = <270>;	// clockwise; valid values: 0, 90, 180, 270
		atmel,oled-has-dma;
		atmel,oled-bank-width = <8>; /* 8bit */
		dmas = <&main_bcdma 1 0 0>;
		dma-names = "tx";

		oled_reset-gpios = <&main_gpio0 38 GPIO_ACTIVE_HIGH>;
		oled_panel_pw_ctl-gpios = <&main_gpio0 14 GPIO_ACTIVE_HIGH>;
		interrupt-parent = <&main_gpio0>;
		interrupts = <36 IRQ_TYPE_EDGE_FALLING>; // Tearing Effect (TE) signal


	//assigned-clock-parents = <&k3_clks 80 2>;	/*1mclk=10ns*/
	//assigned-clock-rates = <33333333>; /*33333333  ,30 ns*/
		// can read the 3208 ID, can display R, G, B

		// mclk=30 ns
		
		/* CONFIG1*/
		bank-width = <1>;
		/*gpmc,sync-read;*/
		/*gpmc,sync-write;*/
		/*gpmc,clk-activation-ns = <0>;*/
		gpmc,mux-add-data = <0>;

		/* CONFIG2*/
		/*gpmc,sync-clk-ps = <0>;*/
		gpmc,cs-on-ns = <30>;
		gpmc,cs-rd-off-ns = <900>;
		gpmc,cs-wr-off-ns = <228>;

		/* CONFIG3*/
		/*gpmc,adv-on-ns = <35>;*/
		/*gpmc,adv-rd-off-ns = <9>;*/
		/*gpmc,adv-wr-off-ns = <9>;*/
		
		/* CONFIG4*/
		gpmc,we-on-ns = <53>;
		gpmc,we-off-ns = <129>;
		gpmc,oe-on-ns = <53>;
		gpmc,oe-off-ns = <454>;
		
		/* CONFIG5*/
		gpmc,access-ns = <454>;
		gpmc,rd-cycle-ns = <930>;
		gpmc,wr-cycle-ns = <227>;
		gpmc,wr-access-ns = <129>;


	};
	
};

static inline int set_dma_config(struct oledfb_dev *oled_dev)
{
    int ret;
    struct dma_slave_config slave_config;

    oled_dev->dma_chan = dma_request_slave_channel(oled_dev->dev, "tx");
    if (!oled_dev->dma_chan) {
        dev_err(oled_dev->dev, "%s: failed to request DMA channel\n", __func__);
        return -EINVAL;
    }
    dev_info(oled_dev->dev, "oled using %s for DMA transfer\n",
             dma_chan_name(oled_dev->dma_chan));

    memset(&slave_config, 0, sizeof(slave_config));
    slave_config.direction = DMA_MEM_TO_DEV;
    /* 8-bit GPMC bus: one byte per access, one access per burst */
    slave_config.dst_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
    slave_config.dst_maxburst = 1;
    slave_config.dst_addr = (dma_addr_t)oled_dev->oled_data_phy_addr;

    ret = dmaengine_slave_config(oled_dev->dma_chan, &slave_config);
    if (ret) {
        dev_err(oled_dev->dev, "error in DMA configuration\n");
        goto err;
    }
    return 0;

err:
    dma_release_channel(oled_dev->dma_chan);
    oled_dev->dma_chan = NULL;
    return ret;
}

/* Defined further below; declared here because it is used as the DMA callback. */
static void dma_complete_callback_function(void *dma_async_param);

static inline int prepare_dma_transfer(struct oledfb_dev *oled_dev)
{
    struct scatterlist *sgl = &oled_dev->sgl[0];
    struct dma_async_tx_descriptor *desc;
    struct dma_chan *chan;
    dma_addr_t p;

    if (!oled_dev->dma_chan) {
        if (set_dma_config(oled_dev) < 0) {
            dev_err(oled_dev->dev, "%s: set_dma_config failed\n", __func__);
            return -ENODEV;
        }
    }
    /* Read the channel only after set_dma_config() may have requested it */
    chan = oled_dev->dma_chan;

    /*
     * The maximum transfer length per scatterlist entry is 0xFFFF bytes.
     * The logo data is 448 * 368 * 2 = 329728 bytes, so at least 6 entries
     * are needed: 0xFFFF (65535) * 4 + 33794 * 2 = 329728 bytes.
     */

    /*
     * ARM memory is little-endian, while the driver IC needs to be fed
     * big-endian data, hence the workaround is to set up the DMA to transfer
     * from the buffer's tail to its head, with decreasing addresses.
     * Note that the entries below still walk the buffer upwards.
     */

    sg_init_table(sgl, 6);

    p = oled_dev->rotated_framebuf_phy_start;
    sg_dma_address(&sgl[0]) = p;
    sg_dma_len(&sgl[0]) = 0xFFFF;

    p = p + 0xFFFF;
    sg_dma_address(&sgl[1]) = p;
    sg_dma_len(&sgl[1]) = 0xFFFF;

    p = p + 0xFFFF;
    sg_dma_address(&sgl[2]) = p;
    sg_dma_len(&sgl[2]) = 0xFFFF;

    p = p + 0xFFFF;
    sg_dma_address(&sgl[3]) = p;
    sg_dma_len(&sgl[3]) = 0xFFFF;

    p = p + 0xFFFF;
    sg_dma_address(&sgl[4]) = p;
    sg_dma_len(&sgl[4]) = 33794;

    p = p + 33794;
    sg_dma_address(&sgl[5]) = p;
    sg_dma_len(&sgl[5]) = 33794;

    desc = dmaengine_prep_slave_sg(chan, sgl, 6, DMA_MEM_TO_DEV,
                                   DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
    if (!desc) {
        dev_err(oled_dev->dev, "failed to get DMA descriptor\n");
        return -EIO;
    }

    oled_dev->desc_tx = desc;
    desc->callback = dma_complete_callback_function;
    desc->callback_param = oled_dev;

    /*
     * The descriptor is submitted and issued by the caller,
     * update_oled_contents_by_dma(), after it has sent the Memory Write
     * command; submitting it here as well would queue it twice.
     */
    return 0;
}


static int update_oled_contents_by_dma(struct oledfb_dev *oled_dev)
{
    struct dma_chan *txchan;
    dma_cookie_t cookie_tx;
    int ret;

    dev_info(oled_dev->dev, "update_oled_contents_by_dma ENTER.\n");

    if (oled_dev->desc_tx) {
        dev_info(oled_dev->dev, "The previous screen update request has not completed yet!\n");
        return 0;
    }

    ret = prepare_dma_transfer(oled_dev);
    if (ret < 0) {
        dev_err(oled_dev->dev, "%s: prepare_dma_transfer failed: %d\n", __func__, ret);
        return ret;
    }

    if (oled_dev->oled_id == OLED_ID_FT2308) {
        /* Tearing Effect Line OFF */
        oled_ft2308_cmd(oled_dev, 0x34);
        /* Memory Write */
        oled_ft2308_cmd(oled_dev, 0x2C);
    } else {
        /* Tearing Effect Line OFF */
        oled_rm69092_cmd(oled_dev, 0x3400);
        /* Memory Write */
        oled_rm69092_cmd(oled_dev, 0x2C00);
    }

    /* Start the DMA transmission and send the RGB data */
    txchan = oled_dev->dma_chan;
    cookie_tx = dmaengine_submit(oled_dev->desc_tx);
    if (dma_submit_error(cookie_tx)) {
        dev_err(oled_dev->dev, "DMA submit failed: %d\n", cookie_tx);
        oled_dev->desc_tx = NULL;
        return -EIO;
    }
    dma_async_issue_pending(txchan);
    dev_info(oled_dev->dev, "update_oled_contents_by_dma END. cookie_tx=%d\n", cookie_tx);

    return 0;
}


/* DMA completion callback: runs when the whole frame has been sent */
static void dma_complete_callback_function(void *dma_async_param)
{
    struct oledfb_dev *oled_dev = dma_async_param;

    async_tx_ack(oled_dev->desc_tx);
    oled_dev->desc_tx = NULL;

    oled_dev->refresh_complete = 1;
}

9 months ago

Bin Liu 9 months ago

TI__Guru**** 171491 points

Hi Eric,

I doubt the problem is in the DMA; it is rather the endianness mismatch between the AM62Ax and the OLED module. I suspect the problem would still exist if you used memcpy() instead of DMA to move the data to the GPMC.

Eric Chen 9 months ago in reply to Bin Liu

Maybe you're right. I'll try it.

I ported the OLED driver from the Atmel SAMA5D3 platform, which fixed this issue by modifying its DMA driver. Could you give me some help with the DMA driver k3-udma.c? I can't figure out how the DMA works, for example: 1. whether the DMA address increases or decreases; 2. how to set icnt0, icnt1, icnt2 and icnt3. Do you have any DMA documentation (PDF) covering this? Thanks.

The SAMA5D3 platform fixes it like this:

Eric Chen 9 months ago in reply to Eric Chen

I confirmed that it works with memcpy.

I used memcpy first, but it is slow when playing video, so I need to use DMA mode.

Bin Liu 9 months ago in reply to Eric Chen

Eric,

Glad to see you found the patch "3386.udma-dev-to-mem-1019.diff" on the other e2e thread and it almost works for your use case.

We don't have any public DMA documentation other than the information in the TRM.

Eric Chen said:
1. the dma address increase or decrease, 2. icnt0,1,2,3, how can i set it.

TRM section 11.1.3.3.2.1 "Linear Addressing" demonstrates how icnt0/1/2/3 are used in a DMA transfer, and Table 11-38 "Transfer Request Fields" describes all the fields.
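As a rough illustration of how icnt0..icnt3 and dim1..dim3 combine, here is a simplified userspace model of one transfer request (TR). It assumes byte-sized elements, icnt0 as the innermost loop, and each dimN being a signed byte offset from the TR start address; check that against the TRM section above before relying on it. The struct and function are hypothetical, not the driver's `tr_req` type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a 4-D linear-addressing TR (assumed semantics). */
struct tr_model {
    uint64_t addr;            /* start address */
    uint16_t icnt0, icnt1, icnt2, icnt3;
    int32_t dim1, dim2, dim3; /* signed strides of the outer loops */
};

/*
 * Store the address of every byte the TR touches into out[], in issue
 * order. Returns the number of addresses written (at most max).
 * Unsigned wrap-around makes adding a negative stride behave as a
 * subtraction, as long as the final address is not below zero.
 */
static size_t tr_addresses(const struct tr_model *tr, uint64_t *out, size_t max)
{
    size_t n = 0;

    for (uint32_t i3 = 0; i3 < tr->icnt3; i3++)
        for (uint32_t i2 = 0; i2 < tr->icnt2; i2++)
            for (uint32_t i1 = 0; i1 < tr->icnt1; i1++)
                for (uint32_t i0 = 0; i0 < tr->icnt0; i0++) {
                    if (n == max)
                        return n;
                    out[n++] = tr->addr + i0 +
                               (int64_t)i1 * tr->dim1 +
                               (int64_t)i2 * tr->dim2 +
                               (int64_t)i3 * tr->dim3;
                }
    return n;
}
```

With icnt0 = 4, icnt1 = 2 and dim1 = 4 the model walks eight bytes upwards; with icnt0 = 1, icnt2 = 8 and dim2 = -1 it visits the start address and then the seven bytes below it, which is the kind of decrementing walk a negative stride produces.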

It seems the SAMA5D3 platform uses the non-standard "DMA_MEM_DEC_TO_DEV" transfer to solve the problem for you. But can you please explain what the ideal solution would be for you: DMA support for big-endian, or decrementing the address after transferring each element? Can you also explain the data layout in memory?
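For comparison, one alternative that is not discussed further in this thread, offered only as a sketch: if the driver's CPU pass that fills `rotated_framebuf` also stored every RGB565 pixel high byte first, the unmodified ascending DMA would deliver bytes in the order the panel expects, without reversing the pixel order. In the kernel, `cpu_to_be16()` would do the per-pixel conversion; the userspace sketch below spells it out with shifts, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy npix RGB565 pixels into dst as big-endian byte pairs (high byte at
 * the lower address), independent of the CPU's own byte order.
 */
static void rgb565_to_be_bytes(uint8_t *dst, const uint16_t *src, size_t npix)
{
    for (size_t i = 0; i < npix; i++) {
        dst[2 * i] = (uint8_t)(src[i] >> 8);       /* high byte first */
        dst[2 * i + 1] = (uint8_t)(src[i] & 0xFF); /* then the low byte */
    }
}
```

The trade-off is one CPU pass per frame, unless the conversion can be folded into a copy the driver already makes.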

Eric Chen 9 months ago in reply to Bin Liu

Thanks for your reply.

Decrementing the address after transferring each element works for me; the SAMA5D3 platform uses this method.

I would like to set up the DMA to transfer from the buffer's tail to its head, with decreasing addresses.

The OLED resolution is 448*368, with RGB16 (2 bytes) per pixel, so we have 448*368*2 = 329728 bytes to send.

The data layout in memory looks like this:

1. The data to be sent:

const unsigned char gImage_logome[329728] = {
0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,
0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,
0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,
..........
..........
0X8F,0X5B,0X4E,0X53,0XCC,0X42,0XED,0X42,0X2E,0X4B,0X0D,0X4B,0X2D,0X4B,0X4E,0X53,
0X4E,0X53,0X2E,0X4B,0XF1,0X63,0X72,0X74,0X31,0X6C,0X93,0X7C,0XD0,0X63,0X4E,0X53,
};

2. I would like to send it in this order: gImage_logome[329728], gImage_logome[329727], gImage_logome[329726], ..., gImage_logome[0]

This method can fix the issue:

    sg_init_table(sgl, 6);

    /* Start sending from tail to head:
     * p = &gImage_logome[329727], i.e. the last byte (SCREEN_SIZE - 1)
     */
    p = oled_dev->rotated_framebuf_phy_start + SCREEN_SIZE - 1;
    sg_dma_address(&sgl[0]) = p;
    sg_dma_len(&sgl[0]) = 0xFFFF;

    p = p - 0xFFFF;
    sg_dma_address(&sgl[1]) = p;
    sg_dma_len(&sgl[1]) = 0xFFFF;

    p = p - 0xFFFF;
    sg_dma_address(&sgl[2]) = p;
    sg_dma_len(&sgl[2]) = 0xFFFF;

    p = p - 0xFFFF;
    sg_dma_address(&sgl[3]) = p;
    sg_dma_len(&sgl[3]) = 0xFFFF;

    p = p - 0xFFFF;
    sg_dma_address(&sgl[4]) = p;
    sg_dma_len(&sgl[4]) = 0xFFFF;

    p = p - 0xFFFF;
    sg_dma_address(&sgl[5]) = p;
    sg_dma_len(&sgl[5]) = 2053;  /* 329728 - 5 * 0xFFFF = 2053 bytes remain */

    desc = dmaengine_prep_slave_sg(chan, sgl, 6,
                DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT|DMA_CTRL_ACK);
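The hand-written tail-to-head segments can also be computed in a loop. This is a userspace sketch assuming, as the driver comment says, at most 0xFFFF bytes per scatterlist entry; the struct and function names are hypothetical. For the 329728-byte frame it yields five 0xFFFF-byte segments plus a final 2053-byte one that ends exactly at the first byte of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_MAX 0xFFFFu /* per-entry limit quoted in the driver comment */

struct seg {
    uint64_t start; /* highest address of the segment (the DMA walks down) */
    uint32_t len;   /* bytes in the segment */
};

/*
 * Split [base, base + size) into segments for a tail-to-head transfer.
 * Each segment starts at its highest byte and covers len bytes downwards;
 * the next one starts just below it. Returns the number of segments, or 0
 * if more than max_segs would be needed. size must be non-zero.
 */
static size_t split_tail_to_head(uint64_t base, uint64_t size,
                                 struct seg *segs, size_t max_segs)
{
    uint64_t remaining = size;
    uint64_t p = base + size - 1; /* last byte of the buffer */
    size_t n = 0;

    assert(size > 0);
    while (remaining > 0) {
        uint32_t len = remaining > SEG_MAX ? SEG_MAX : (uint32_t)remaining;

        if (n == max_segs)
            return 0;
        segs[n].start = p;
        segs[n].len = len;
        n++;
        p -= len;
        remaining -= len;
    }
    return n;
}
```

In the driver, `segs[i].start` and `segs[i].len` would go into `sg_dma_address()` and `sg_dma_len()` of scatterlist entry i, with the DMA patched to decrement as discussed in this thread.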

Eric Chen 9 months ago in reply to Eric Chen

Sorry for the mistake: start with gImage_logome[329727], not gImage_logome[329728].

Bin Liu 9 months ago in reply to Eric Chen

Hi Eric,

Just for prototyping, please try the following kernel DMA patch to see if it transfers from memory to GPMC correctly.

diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index 208f4c120eab..1255b12f5654 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -3133,16 +3133,16 @@ udma_prep_slave_sg_triggered_tr(struct udma_chan *uc, struct scatterlist *sgl,
                        tr_req[tr_idx].icnt1 = tr_cnt1;
                        tr_req[tr_idx].icnt2 = tr0_cnt2;
                        tr_req[tr_idx].icnt3 = tr0_cnt3;
-                       tr_req[tr_idx].dim1 = tr_cnt0;
-                       tr_req[tr_idx].dim2 = trigger_size;
-                       tr_req[tr_idx].dim3 = trigger_size * tr0_cnt2;
+                       tr_req[tr_idx].dim1 = (-1) * tr_cnt0;
+                       tr_req[tr_idx].dim2 = (-1) * trigger_size;
+                       tr_req[tr_idx].dim3 = (-1) * trigger_size * tr0_cnt2;
 
                        tr_req[tr_idx].daddr = dev_addr;
                        tr_req[tr_idx].dicnt0 = tr_cnt0;
                        tr_req[tr_idx].dicnt1 = tr_cnt1;
                        tr_req[tr_idx].dicnt2 = tr0_cnt2;
                        tr_req[tr_idx].dicnt3 = tr0_cnt3;
-                       tr_req[tr_idx].ddim1 = (-1) * tr_cnt0;
+                       tr_req[tr_idx].ddim1 = 0;
                }
 
                tr_idx++;
@@ -3179,15 +3179,15 @@ udma_prep_slave_sg_triggered_tr(struct udma_chan *uc, struct scatterlist *sgl,
                                tr_req[tr_idx].icnt1 = tr_cnt1;
                                tr_req[tr_idx].icnt2 = tr1_cnt2;
                                tr_req[tr_idx].icnt3 = 1;
-                               tr_req[tr_idx].dim1 = tr_cnt0;
-                               tr_req[tr_idx].dim2 = trigger_size;
+                               tr_req[tr_idx].dim1 = (-1) * tr_cnt0;
+                               tr_req[tr_idx].dim2 = (-1) * trigger_size;
 
                                tr_req[tr_idx].daddr = dev_addr;
                                tr_req[tr_idx].dicnt0 = tr_cnt0;
                                tr_req[tr_idx].dicnt1 = tr_cnt1;
                                tr_req[tr_idx].dicnt2 = tr1_cnt2;
                                tr_req[tr_idx].dicnt3 = 1;
-                               tr_req[tr_idx].ddim1 = (-1) * tr_cnt0;
+                               tr_req[tr_idx].ddim1 = 0;
                        }
                        tr_idx++;
                }

Eric Chen 9 months ago in reply to Bin Liu

Thanks, I'll try it and get back to you later.

Eric Chen 9 months ago in reply to Eric Chen

Thanks, Bin. You helped me a lot.

It works perfectly.

Bin Liu 9 months ago in reply to Eric Chen

Hi Eric,

Awesome! Thanks for the update.

Would you mind explaining how the data (gImage_logome) is generated, such that it requires the DMA to transfer in decreasing order? As I said earlier, the kernel DMAEngine doesn't support something like "DMA_MEM_DEC_TO_DEV", but if you have a valid use case, we might have to extend DMAEngine to support this type of decrementing transfer.

Eric Chen 9 months ago in reply to Bin Liu

1. How is the data (gImage_logome) generated?

----> It comes from logo.bmp, 448*368 pixels, RGB16 (2 bytes) per pixel. I use Img2Lcd.exe, which converts the logo into the gImage_logome array; it is just for testing the OLED.

2. Why do I need the DMA to transfer in decreasing order?

The data in CPU memory looks like this: the first pixel value is 0x5418 (16-bit), whose low byte is 0x18 and high byte is 0x54. Normally the low byte 0x18 is sent first, then 0x54.

But the OLED GRAM is big-endian, with the low and high bytes reversed, and the OLED bus width is 8 bits, so it needs to receive 0x54 first and then 0x18.

When I use DMA to send the data from tail to head in decreasing order, it sends the data like this: 0x53, 0x4e, 0x63, 0xd0, ......, 0x54, 0x18, which matches the OLED data format.

const unsigned char gImage_logome[329728] = {
0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,
0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,
0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,0X18,0X54,
..........
..........
0X8F,0X5B,0X4E,0X53,0XCC,0X42,0XED,0X42,0X2E,0X4B,0X0D,0X4B,0X2D,0X4B,0X4E,0X53,
0X4E,0X53,0X2E,0X4B,0XF1,0X63,0X72,0X74,0X31,0X6C,0X93,0X7C,0XD0,0X63,0X4E,0X53,
};
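The byte order described in point 2 can be checked against the listing: reversing the last 16 bytes of gImage_logome gives 0x53, 0x4E, 0x63, 0xD0, ... as stated. This is a userspace sketch; `reverse_bytes` and `logo_tail` are illustrative names, and `logo_tail` only copies the final line of the array above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse n bytes of src into dst: dst[0] = src[n - 1], and so on. */
static void reverse_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
}

/* Last 16 bytes of gImage_logome, copied from the listing above. */
static const uint8_t logo_tail[16] = {
    0x4E, 0x53, 0x2E, 0x4B, 0xF1, 0x63, 0x72, 0x74,
    0x31, 0x6C, 0x93, 0x7C, 0xD0, 0x63, 0x4E, 0x53,
};
```

Each reversed pair comes out high byte first: the last pixel, stored as 0x4E, 0x53 (value 0x534E), is sent as 0x53 then 0x4E.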

3. I have a question: when I play a video (Jellyfish_448x368_h265.mp4) on the OLED with DMA, it is NOT smooth, but it is smooth with memcpy. I'm not sure whether the cause is the DMA or not.

root@am62axx-evm:/opt/edgeai-gst-apps# gst-launch-1.0 filesrc location=Jellyfish_448x368_h265.mp4 ! qtdemux name=demux demux.video_0 ! queue ! decodebin ! videoconvert ! videoscale ! fbdevsink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Redistribute latency...
Redistribute latency...
Pipeline is PREROLLED ...0 %)
Setting pipeline to PLAYING ...
Redistribute latency...
New clock: GstSystemClock
WARNING: from element /GstPipeline:pipeline0/GstFBDEVSink:fbdevsink0: A lot of buffers are being dropped.
Additional debug info:
../gstreamer-1.20.7/libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstFBDEVSink:fbdevsink0:
There may be a timestamping problem, or this computer is too slow.
WARNING: from element /GstPipeline:pipeline0/GstFBDEVSink:fbdevsink0: A lot of buffers are being dropped.
Additional debug info:
../gstreamer-1.20.7/libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstFBDEVSink:fbdevsink0:
There may be a timestamping problem, or this computer is too slow.
WARNING: from element /GstPipeline:pipeline0/GstFBDEVSink:fbdevsink0: A lot of buffers are being dropped.
Additional debug info:
../gstreamer-1.20.7/libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstFBDEVSink:fbdevsink0:
There may be a timestamping problem, or this computer is too slow.
WARNING: from element /GstPipeline:pipeline0/GstFBDEVSink:fbdevsink0: A lot of buffers are being dropped.
Additional debug info:
../gstreamer-1.20.7/libs/gst/base/gstbasesink.c(3143): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstFBDEVSink:fbdevsink0:
There may be a timestamping problem, or this computer is too slow.
Got EOS from element "pipeline0".
Execution ended after 0:00:12.192273104
Setting pipeline to NULL ...
Freeing pipeline ...
root@am62axx-evm:/opt/edgeai-gst-apps#

Eric Chen 9 months ago in reply to Eric Chen

The drawback of memcpy is high CPU usage: 60%~80%.

Bin Liu 9 months ago in reply to Eric Chen

Hi Eric,

Eric Chen said:
when i use dma to send data for tail to head decrementally, it will send data like this: 0x53, 0x4e, 0x63, 0xd0, ......, 0x54, 0x18. it match the oled data format.

Do you mean the last two bytes in the buffer (0x53, 0x4e) is for the first pixel (top-left corner) on the display panel, and the first two bytes in the buffer (0x54, 0x18) is for the last pixel (bottom-right corner)?

If so, how do you convert video frame data to this layout when playing back a video? As far as I know, video frame data in memory has the reversed order: the beginning of the buffer is for the first pixel, while the end of the buffer is for the last pixel.

Eric Chen said:
3. I got a question, i play a video (Jellyfish_448x368_h265.mp4) on the oled with dma ,it is NOT fluency.

You would have to profile/debug the problem to know the root cause.

Eric Chen 9 months ago in reply to Bin Liu

1. I don't care about the top-left and bottom-right corners; the OLED driver IC can rotate the image. What I care about is only big-endian order, sending the high byte first. That matches the OLED GRAM's mode.

2. A video is really a sequence of frames: the video stream is written to the OLED GRAM one frame at a time.

3. I think I'll measure how long the DMA takes to move the data.

Bin Liu 9 months ago in reply to Eric Chen

This is really not a rotation, rather flipping diagonally. Good to hear the OLED IC can handle this transformation!

You might want to first measure the time the DMA takes to transfer one frame, from the call to dmaengine_submit() until the DMA completion callback function is called.
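On the geometry of the tail-to-head transfer: for a row-major frame of W x H two-byte pixels, reversing the whole byte buffer does two things at once. It swaps the two bytes inside every pixel, and it moves the pixel at (row, col) to (H - 1 - row, W - 1 - col), so the top-left corner ends up bottom-right (a point reflection through the frame centre, the same mapping as a 180-degree rotation). The userspace sketch below checks this on a tiny 3 x 2 frame; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse n bytes in place. */
static void reverse_in_place(uint8_t *buf, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        uint8_t t = buf[i];
        buf[i] = buf[j];
        buf[j] = t;
    }
}

/* Little-endian 16-bit pixel at (row, col) of a frame w pixels wide. */
static uint16_t pixel_le(const uint8_t *buf, size_t w, size_t row, size_t col)
{
    size_t off = 2 * (row * w + col);
    return (uint16_t)(buf[off] | (buf[off + 1] << 8));
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Reverse a copy of a w x h RGB565 frame and check that every pixel
 * (r, c) of the copy equals the byte-swapped original pixel at
 * (h - 1 - r, w - 1 - c). scratch must hold 2 * w * h bytes.
 */
static int reversal_is_swap_and_180(const uint8_t *frame, uint8_t *scratch,
                                    size_t w, size_t h)
{
    size_t n = 2 * w * h;

    for (size_t i = 0; i < n; i++)
        scratch[i] = frame[i];
    reverse_in_place(scratch, n);

    for (size_t r = 0; r < h; r++)
        for (size_t c = 0; c < w; c++)
            if (pixel_le(scratch, w, r, c) !=
                swap16(pixel_le(frame, w, h - 1 - r, w - 1 - c)))
                return 0;
    return 1;
}
```

This is why the panel can show a correct picture once its own orientation setting compensates for the reversed pixel order, while the byte swap inside each pixel provides the big-endian order.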

Eric Chen 9 months ago in reply to Bin Liu

The DMA takes 80 ms from dmaengine_submit() to the DMA completion, and memcpy takes 85~100 ms. That can't explain the video behavior above.

I have no idea why. The DMA taking 80 ms surprised me; it's almost the same as memcpy.

I tested it like this:

static int update_oled_contents_by_dma(struct oledfb_dev *oled_dev)
{
    int ret;

	if(oled_dev->desc_tx) {
		dev_info(oled_dev->dev, "The old screen update request has not been executed yet!\n");
		return 0;
	}

    ret = prepare_dma_transfer(oled_dev);
    if (ret < 0) {
        dev_err(oled_dev->dev, "%s: prepare_dma_transfer fail: %d\n", __FUNCTION__, ret);
        return ret;
    }


    if(oled_dev->oled_id == OLED_ID_FT2308)
    {
        /* Tearing Effect Line OFF */
 //       oled_ft2308_cmd(oled_dev, 0x34);
        /* Memory Write */
        oled_ft2308_cmd(oled_dev, 0x2C);
    }
    else
    {
        /* Tearing Effect Line OFF */
//        oled_rm69092_cmd(oled_dev, 0x3400);
        /* Memory Write */
        oled_rm69092_cmd(oled_dev, 0x2C00);
    }
    struct dma_chan *txchan = oled_dev->dma_chan;
	dma_cookie_t cookie_tx;
	dev_info(oled_dev->dev,"update_oled_contents dma ENTER.222\n");
#ifdef DBG_GPO_TIMING
	gpiod_set_value(oled_dev->dbg_gpo_timing, 1);
#endif

    /* Start DMA transmission, send RGB data */
    cookie_tx = dmaengine_submit(oled_dev->desc_tx);
    dma_async_issue_pending(txchan);

    return 0;
}


/* DMA completion callback: runs when the whole frame has been sent */
static void dma_complete_callback_function(void *dma_async_param)
{
    struct oledfb_dev *oled_dev = (struct oledfb_dev *)dma_async_param;

#ifdef DBG_GPO_TIMING
	gpiod_set_value(oled_dev->dbg_gpo_timing, 0);
#endif
	dev_info(oled_dev->dev, "%s(%d): ENTER2b.\n",__func__, __LINE__);

    async_tx_ack(oled_dev->desc_tx);
    oled_dev->desc_tx = NULL;

    oled_dev->refresh_complete = 1;
}

//k3-am62a-oled.dtsi
//add dbg_gpo_timing gpio

		dbg_gpo_timing-gpios = <&main_gpio1 10 GPIO_ACTIVE_HIGH>;

The memcpy test code looks like this:

static int update_oled_contents(struct oledfb_dev *oled_dev)
{
    int i;
    char *pcontent = oled_dev->rotated_framebuf;
	dev_info(oled_dev->dev,"update_oled_contents ENTER.\n");

    oled_prepare_to_write_pixel(oled_dev);
#ifdef DBG_GPO_TIMING
	gpiod_set_value(oled_dev->dbg_gpo_timing, 1);
#endif

    /* RGB contents are written to the OLED from tail to head: for each
     * pixel, the upper byte (at i + 1) first, then the lower byte (at i). */
    for (i = OLED_MAX_X_PIXELS * OLED_MAX_Y_PIXELS * 2 - 2; i >= 0; i -= 2) {
        oled_ft2308_write(oled_dev, *(pcontent + i + 1)); // upper byte
        oled_ft2308_write(oled_dev, *(pcontent + i));     // lower byte
    }
#ifdef DBG_GPO_TIMING
	gpiod_set_value(oled_dev->dbg_gpo_timing, 0);
#endif

    oled_dev->refresh_complete = 1;

    return 0;
}
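The tail-to-head write loop can be modelled in userspace to check its bounds. In this sketch (hypothetical names, with `out` standing in for the GPMC data writes), `i` is one past the end of the current pixel, so the first access is the buffer's last byte and the last access is byte 0; the emitted sequence is the same as reversing the whole buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of the tail-to-head PIO loop: walk the RGB565 pixels of
 * buf (n bytes, n even) from last to first and emit each pixel's upper
 * byte before its lower byte. out receives the bytes in bus order.
 */
static void emit_tail_to_head(const uint8_t *buf, size_t n, uint8_t *out)
{
    size_t k = 0;

    assert(n % 2 == 0);
    /* i is one past the end of the current pixel: n, n - 2, ..., 2 */
    for (size_t i = n; i > 0; i -= 2) {
        out[k++] = buf[i - 1]; /* upper byte of the pixel starting at i - 2 */
        out[k++] = buf[i - 2]; /* lower byte */
    }
}
```

Using an unsigned "one past the end" index avoids both reading past the end of the buffer on the first iteration and skipping the first pixel on the last one.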

DMA test timing:

memcpy test timing:

Bin Liu 9 months ago in reply to Eric Chen

Eric Chen said:
Dma takes 80ms from dmaengine_submit() to the DMA completion, memcpy takes 85~100ms. That can't explain the video cation above.

I have no idea about it. and dma takes 80ms, it suprised me, almost equal to memcpy.

Since the times are very similar (and the CPU is not fully loaded in memcpy()), I think the bottleneck is likely the GPMC interface, which only runs at 33 MHz and receives one byte per access.

329728 bytes / 80ms = 4.1MB/sec.
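The arithmetic can be cross-checked against the GPMC timing in the device tree above. This back-of-the-envelope sketch assumes one byte per write access, `gpmc,wr-cycle-ns = <227>` as the full access time, a 33.33 MHz GPMC functional clock (30 ns tick) to which the driver rounds timings up, and no extra bus turnaround or cycle-to-cycle delay; none of this was confirmed in the thread. Under those assumptions one 329728-byte frame needs about 74.8 ms (unrounded) to 79.1 ms (rounded to 8 ticks), close to the measured 80 ms, which would cap the refresh rate at roughly 12.6 to 13.4 frames per second.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second for a frame of frame_bytes moved in elapsed_ms. */
static double throughput_bytes_per_s(uint32_t frame_bytes, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return frame_bytes / (elapsed_ms / 1000.0);
}

/*
 * Estimated time in ms to write one frame over an 8-bit GPMC bus, one byte
 * per access, if each access takes wr_cycle_ns rounded up to whole
 * functional-clock ticks of tick_ns (assumption: the driver rounds up).
 * Passing tick_ns = 1 gives the unrounded estimate.
 */
static double gpmc_frame_ms(uint32_t frame_bytes, uint32_t wr_cycle_ns,
                            uint32_t tick_ns)
{
    uint32_t ticks = (wr_cycle_ns + tick_ns - 1) / tick_ns;

    return (double)frame_bytes * ticks * tick_ns / 1e6;
}
```

`throughput_bytes_per_s(329728, 80.0)` reproduces the 4.1 MB/s figure, and `gpmc_frame_ms(329728, 227, 30)` gives about 79.1 ms.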

Eric Chen 9 months ago in reply to Bin Liu

You're right; I agree with you about the cause of the DMA timing.

1. I tested the CPU usage while playing video, using this command:

gst-launch-1.0 filesrc location=Jellyfish_448x368_h265.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h265parse ! v4l2h265dec capture-io-mode=dmabuf ! videoconvert ! videoscale ! fbdevsink

Here are the test results:

memcpy mode: gst-launch-1.0 CPU usage is 80%, vpu_irq_thread is 2%. Playback is smooth.

DMA mode: gst-launch-1.0 is 40%, vpu_irq_thread is 35%. Playback is NOT smooth.

2. Playing the video with this command:

gst-launch-1.0 filesrc location=Jellyfish_448x368_h265.mp4 ! qtdemux name=demux demux.video_0 ! queue ! decodebin ! videoconvert ! videoscale ! kmssink driver-name=tidss -v

Here are the test results:

gst-launch-1.0 CPU usage is 4.6%, vpu_irq_thread is 2.3%. Playback is smooth.

My question is: the video is decoded by the hardware decoder, so why is the CPU usage so high when outputting to fbdevsink?

Bin Liu 9 months ago in reply to Eric Chen

Hi Eric,

I am checking with our Multimedia expert for comments.

Suren Porwar 9 months ago in reply to Bin Liu

TI__Mastermind 30265 points

Hi Eric,

Yes, the decoder should be HW accelerated. But in your pipeline I see you are using SW plugins like videoconvert and videoscale, which increase the CPU usage. Is it possible to run the file with the command below and see if it plays fine?

gst-launch-1.0 -v filesrc location=<filename>.mp4 ! qtdemux ! h264parse ! v4l2h264dec ! fpsdisplaysink video-sink="kmssink driver-name=tidss sync=false"

I'd appreciate it if you could attach your file as well.

Best Regards,

Suren

Eric Chen 9 months ago in reply to Suren Porwar

Bin, Suren, thanks.

I'll try your command. Here is my video file.

Eric Chen 9 months ago in reply to Eric Chen

Jellyfish_448x368_h265.zip

video file zip

Eric Chen 9 months ago in reply to Suren Porwar

When the following two commands are used for playback on the large HDMI screen, the CPU usage is not high, approximately between 10% and 28%.

gst-launch-1.0 -v filesrc location=Jellyfish_448x368_h265.mp4 ! qtdemux ! h265parse ! v4l2h265dec ! fpsdisplaysink video-sink="kmssink driver-name=tidss sync=false"
gst-launch-1.0 filesrc location=Jellyfish_448x368_h265.mp4 ! qtdemux name=demux demux.video_0 ! queue ! decodebin ! videoconvert ! videoscale ! kmssink driver-name=tidss -v

What I'm mainly concerned about is playback to fbdevsink. The CPU usage is extremely high when the playback output is directed to fbdevsink. However, without the videoconvert element you mentioned, it seems v4l2h265dec cannot output directly to fbdevsink, so I need a command like this:

gst-launch-1.0 filesrc location=Jellyfish_448x368_h265.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h265parse ! v4l2h265dec ! videoconvert ! fbdevsink

Is there any command that can output to fbdevsink without using the SW plugins you mentioned? Thank you.

Suren Porwar 9 months ago in reply to Eric Chen

Hi Eric,

Thanks for sharing the file.

I ran the GStreamer command below on my AM62A board with the released 10.1 SDK software.

gst-launch-1.0 -v filesrc location=Jellyfish_448x368_h265.mp4 ! qtdemux ! h265parse ! v4l2h265dec capture-io-mode=dmabuf ! kmssink driver-name=tidss sync=false plane_id=31

and observed a CPU usage in htop of 1.3%-2% with kmssink.

Also, on our SoC we support kmssink and waylandsink; fbdevsink is not currently supported. I assume fbdevsink uses the CPU, which is why you might be seeing a high CPU load. In addition, converting to the required format with the SW plugins videoconvert and videoscale costs a lot of CPU.

Best Regards,

Suren

Eric Chen 9 months ago in reply to Suren Porwar

Hi Suren,

I have connected the OLED to the GPMC interface, and it supports fbdevsink. The Edge AI application is displayed on it at startup.

Suren Porwar 9 months ago in reply to Eric Chen

Hi Eric,

I did try running the same pipeline with fbdevsink on AM62A, but it requires videoconvert (a SW plugin that costs CPU) to produce the BGRx format that fbdevsink supports.

gst-launch-1.0 -v filesrc location=Jellyfish_448x368_h265.mp4 ! qtdemux ! h265parse ! v4l2h265dec capture-io-mode=dmabuf ! videoconvert ! fbdevsink device=/dev/fb1

I see CPU utilization of ~60-70%.

Best Regards,

Suren

Eric Chen 9 months ago in reply to Suren Porwar

Hi Suren,

Thanks.

Yes, you're right.

I also just tested it: it must use videoconvert. I checked the src and sink capabilities with gst-inspect-1.0 fbdevsink and gst-inspect-1.0 v4l2h265dec.

Suren Porwar 9 months ago in reply to Eric Chen

Hi Eric,

Have you reached out to the community about fbdevsink? AFAIK, it's deprecated, and the advice is to use kmssink/waylandsink.

Also, the fbdevsink sink caps say RGB is supported, but when using ticolorconvert it fails to accept RGB input and only accepts BGRx.

Best Regards,

Suren

Eric Chen 8 months ago in reply to Suren Porwar

Hi Suren,

I'm still on vacation, so I hadn't seen your reply. I'll get back to you once I have new progress. Thank you.