Linux/TDA2PXEVM: TDA2P custom board - NVMe SSD - DMA Usage

Part Number: TDA2PXEVM


Tool/software: Linux

Hi TI,

We are having issues while writing files to an SSD (Intel Optane 900P) over NVMe using our custom TDA2P board. The write speed is good, but the CPU load is around 80 percent. We would like to check whether DMA/eDMA is being used for this transfer.

The software running on the TDA2P is Linux (SDK).

Do you have any suggestions for lowering this CPU load?

Regards,

Stefan.

  • Hi Stefan,

    Can you please share the commands used to write to the SSD? I ask because commands such as dd can involve memory copies, which increase the load. Also, can you provide a snapshot of top taken while the writes are in progress?

    DMA can be used to write to the SSD. However, because the SSD acts as an endpoint (EP), the DMA transfers must be initiated by the SSD itself (most endpoints initiate their own DMA reads/writes; the host only programs registers to trigger events such as a DMA copy).

    Regards
    Shravan
  • Hey Shravan,

    We preallocate an 8 GB file (with fallocate) and then fill it using plain write() system calls.

    root@dra7xx-evm:/# top

    top - 12:01:23 up 6 min, 2 users, load average: 1.50, 0.66, 0.27
    Tasks: 106 total, 3 running, 103 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.2 us, 85.4 sy, 0.0 ni, 12.1 id, 2.1 wa, 0.0 hi, 0.2 si, 0.0 st
    KiB Mem : 1819728 total, 133908 free, 58388 used, 1627432 buff/cache
    KiB Swap: 0 total, 0 free, 0 used. 1715276 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    1214 root 20 0 1208 456 288 R 75.0 0.0 0:35.31 a.out
    75 root 20 0 0 0 0 R 70.1 0.0 0:08.41 kworker/u4:3
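
The write pattern described above (preallocate, then plain write() calls) can be sketched roughly as follows. This is only an illustration, not the actual test program (a.out): the function name, the zero-filled payload, and the chunk size are assumptions made for the example. Each os.write() call copies the buffer from user space into the kernel, which is the kind of copy that shows up as system time.

```python
import os


def preallocate_and_write(path, total_bytes, chunk_size):
    """Preallocate `path` to total_bytes, then fill it with plain write() calls.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Reserve the space up front, as fallocate does.
        os.posix_fallocate(fd, 0, total_bytes)

        # Placeholder payload (zeros); real data would come from the application.
        view = memoryview(bytes(chunk_size))

        written = 0
        while written < total_bytes:
            remaining = total_bytes - written
            # write() may write fewer bytes than requested, so count what it reports.
            written += os.write(fd, view[: min(chunk_size, remaining)])
        return written
    finally:
        os.close(fd)
```

A call such as preallocate_and_write("/path/on/ssd/test.bin", 8 * 1024**3, 1024 * 1024) would mirror the 8 GB case, with the path being a placeholder.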

    Regards,
    Stefan.
  • Hi Stefan,

    Can you confirm that your A15 is running at 1.8 GHz? Please set the scaling governor to "performance" by running the command below.

    echo "performance" > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
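
To double-check that the setting took effect, the governor and current frequency can be read back from the same sysfs directory. The sketch below assumes the standard cpufreq files scaling_governor and scaling_cur_freq are present under policy0; the helper name is made up for the example.

```python
import os


def read_cpufreq(policy_dir="/sys/devices/system/cpu/cpufreq/policy0"):
    """Return (governor, current frequency in MHz) for one cpufreq policy.

    Hypothetical helper; assumes the usual cpufreq sysfs layout.
    """
    def read(name):
        with open(os.path.join(policy_dir, name)) as f:
            return f.read().strip()

    governor = read("scaling_governor")
    cur_khz = int(read("scaling_cur_freq"))  # sysfs reports the frequency in kHz
    return governor, cur_khz / 1000.0
```

On the target, read_cpufreq() should report the "performance" governor and a value close to 1800 MHz if the A15 is really running at 1.8 GHz.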

    Regards

    Shravan

  • Hi Shravan,

    We have modified our .dts file so that the MPU runs at 1800 MHz. Please note the bolded part of the following command output.

    root@dra7xx-evm:~# omapconf show opp
    OMAPCONF (rev v1.73-17-g578778b built Thu Aug 31 13:16:54 IST 2017)

    HW Platform:
      Generic DRA74X (Flattened Device Tree)
      DRA76X ES1.0 GP Device (STANDARD performance (1.0GHz))
    Error: I2C Read failed
    Error: I2C Read failed
    Error: I2C Read failed
      UNKNOWN POWER IC

    SW Build Details:
      Build:
        Version:  _____                    _____           _         _
      Kernel:
        Version: 4.4.84
        Author: root@rtrkn096-lin
        Toolchain: gcc version 5.3.1 20160113 (Linaro GCC 5.3-2016.02)
        Type: #5 SMP PREEMPT
        Date: Thu Aug 16 14:47:18 CEST 2018

    |-----------------------------------------------------------------------------------|
    |                        | Temperature | Voltage | Frequency      | OPerating Point |
    |-----------------------------------------------------------------------------------|
    | VDD_CORE / VDD_CORE0   | 42C / 107F  | NA      |                | NOM             |
    |   L3                   |             |         |  266  MHz      |                 |
    |   DMM                  |             |         |  266  MHz      |                 |
    |   EMIF1                |             |         |  266  MHz      |                 |
    |   EMIF2                |             |         |  266  MHz      |                 |
    |     LP-DDR2            |             |         |  666  MHz      |                 |
    |   L4                   |             |         |  266  MHz      |                 |
    |   IPU1                 |             |         | (2128 MHz) (1) |                 |
    |     Cortex-M4 Cores    |             |         | (1064 MHz) (1) |                 |
    |   IPU2                 |             |         |  2128 MHz      |                 |
    |     Cortex-M4 Cores    |             |         |  1064 MHz      |                 |
    |   DSS                  |             |         |  192  MHz      |                 |
    |   BB2D                 |             |         | (2128 MHz) (1) |                 |
    |                        |             |         |                |                 |
    | VDD_MPU / VDD_CORE1    | 43C / 109F  | NA      |                | PLUS            |
    |   MPU (CPU1 ON)        |             |         |  1800 MHz      |                 |
    |                        |             |         |                |                 |
    | VDD_GPU / VDD_CORE2    | 42C / 107F  | NA      |                | HIGH            |
    |   GPU                  |             |         |  532  MHz      |                 |
    |                        |             |         |                |                 |
    | VDD_DSPEVE / VDD_CORE3 | 41C / 105F  | NA      |                | NOM             |
    |   DSP1                 |             |         |  750  MHz      |                 |
    |   DSP2                 |             |         |  750  MHz      |                 |
    |   EVE1                 |             |         |  535  MHz      |                 |
    |   EVE2                 |             |         |  535  MHz      |                 |
    |                        |             |         |                |                 |
    | VDD_IVA / VDD_CORE4    | 43C / 109F  | NA      |                | HIGH            |
    |   IVA                  |             |         |  532  MHz      |                 |
    |                        |             |         |                |                 |
    |-----------------------------------------------------------------------------------|

    Notes:
      (1) Module is disabled, rate may not be relevant.

    Regards,

    Stefan.

  • Hi Stefan,

    I've had a look at the driver, and below are some observations / comments:

    1. I don't think using DMA will decrease the system load. Since the SSD acts as an endpoint, DMA is initiated by the SSD (and not by the TDA2P board).
    2. The load in the system could be due to the copies between user space and kernel space.

    To avoid user-space copies, you can use the splice system call. In your final use case, you want to write camera streams to the SSD, and the data from the camera streams is exported to Linux as a DMA-buf file descriptor (refer to Documentation/virt-mem-export.txt in the Linux kernel and /hlos/src/links/ipcIn/ipcInLink_drv.c in VSDK). Since the input is also a file, splice is a classic way to move data between two files without copying it between user space and kernel space. You can find more information here:

    blog.plenz.com/.../so-you-want-to-write-to-a-file-real-fast.html

    Please note that the output file (the file written to the SSD) still needs to be pre-allocated using fallocate (in fact, a comparison with and without fallocate is given in the above blog post).

    Regards
    Shravan