DMA transfer over keystone PCIe

Hi. Consider a very simple scenario: a PC acting as the PCIe RC and a TMDSEVM6678 as the EP. The DSP board is running the PCIe bootloader; no custom code is loaded. I wrote a simple device driver that writes test data to BAR 3 (one of the memories) and reads it back. In PIO mode (reading and writing the PCIe space directly) it works as expected. However, if I try to read DSP memory using DMA on the PC side, every 32-bit word read has the value 0xDEADBEEF. At first I suspected a bug in my code. With the same code I tried to read the PCIe registers from BAR 0; this time I got a data vector with the values 0xFFFFFFFF, 0xFFFFFFFE, 0xFFFFFFFD and so on. Could you please tell me whether there are important constraints on DMA over PCIe on the C6678 DSP? To repeat, I was using the host DMA, not the DSP EDMA.

  • Hi,

    There are a few things we do to use DMA over PCIe, related to programming the outbound registers with the OB window sizes. BTW, we use the DSP's EDMA for transfers.

    Please download the Desktop Linux SDK (available at: http://software-dl.ti.com/sdoemb/sdoemb_public_sw/desktop_linux_sdk/latest/index_FDS.html) and look into /sdk/pciedrv/src/pciedrv.c; the functions of interest are pciedrv_open() and resource_init().

    There is a top level example called "filetestdemo" (under demos/ folder) that demonstrates the use case of moving data and control messages between host and DSP.

    You can find additional details @ http://processors.wiki.ti.com/index.php/Desktop-linux-sdk_01.00.00_Getting_Started_Guide and http://processors.wiki.ti.com/index.php/Desktop-linux-sdk_01.00.00_Development_Guide

    Regards,

    Vivek

  • Hi Vivek,

    Thanks for the materials you pointed to. My intention is NOT to use outbound transfers, because with buggy DSP software, outbound writes to the wrong areas could affect software running on the host machine. Bus mastering is DISABLED in my case. I am interested instead in DMA transfers performed by the host DMA engine between the PCIe BAR window and host memory. Today I found the solution. Previously I was using virt_to_page for both memory regions. That function works for kernel (kmalloc) memory, but not for virtually mapped (vmalloc) memory or an ioremapped PCIe BAR. For the PCIe BAR I changed it to vmalloc_to_page, and now it works as expected. In short, it was my bug.

    So the code that copies data from the BAR to memory looks as follows:

    /* At the beginning, once ... */

    dma_cap_mask_t mask;

    dma_cap_zero(mask);

    dma_cap_set(DMA_MEMCPY, mask);

    dma = dma_request_channel(mask, NULL, NULL);

    /* vmalloc_to_page is more expensive as it must traverse vm structures. So let's do it just once here. */

    bar1_first_page = vmalloc_to_page(bar_virtual_address);

    /* And later... */

    /* Arguments: channel, destination page and offset, source page and offset, length. */

    cookie = dma_async_memcpy_pg_to_pg(dma, virt_to_page(memory_virtual_address), offset_in_page(memory_virtual_address), bar1_first_page, bar_offset, size);

    r = dma_sync_wait(dma, cookie);

    The solution works with kernel memory regions (allocated with "kmalloc", not "vmalloc"). For buffers of a few kB or more, throughput is more than 100x higher than with an ordinary software memcpy from the ioremapped BAR to memory.
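
    To illustrate the offset arithmetic used in the call above, here is a minimal user-space sketch (not kernel code). The names SKETCH_PAGE_SIZE, sketch_offset_in_page and sketch_pages_spanned are hypothetical, and the 4096-byte page size is an assumption; in the kernel, PAGE_SIZE depends on the architecture. It shows how an address splits into a page and an in-page offset, and how many pages a transfer touches. A transfer that spans more than one page from a single starting page relies on the pages being physically contiguous, which is the case for kmalloc buffers but not for vmalloc ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)

/* Byte offset of an address within its page: addr & (PAGE_SIZE - 1). */
static uintptr_t sketch_offset_in_page(uintptr_t addr)
{
    return addr & (SKETCH_PAGE_SIZE - 1);
}

/* Number of pages touched by a transfer of `size` bytes starting at `addr`. */
static uintptr_t sketch_pages_spanned(uintptr_t addr, size_t size)
{
    uintptr_t first_page;
    uintptr_t last_page;

    if (size == 0)
        return 0;

    first_page = addr / SKETCH_PAGE_SIZE;
    last_page = (addr + size - 1) / SKETCH_PAGE_SIZE;
    return last_page - first_page + 1;
}
```

    For example, a 32-byte transfer starting 16 bytes before a page boundary touches two pages, so the source and destination regions both have to be contiguous across that boundary for a single page-to-page copy to be valid.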

    Paul

  • Paul,

    Thanks for the update. Glad that it worked for you.

    Regards,

    Vivek