DRA756: Declink_h264DecodeFrame sometimes takes more than 100 ms (peak)

Part Number: DRA756
Other Parts Discussed in Thread: SYSBIOS

Hi experts,

Issue:

"NULLSRC-2: No emptyQue" is observed and the video output is poor:

[IPU1-0]    217.960714 s: NULLSRC-2: No emptyQue, frameId-2254, chId-0

 

I can observe that Declink_h264DecodeFrame sometimes takes more than 40 ms when decoding 720p@25fps.

What I did to find the issue:

/* Timestamp taken just before the codec is activated */
tm1 = Utils_getCurGlobalTimeInUsec();

fxns->algActivate((IALG_Handle) handle);
error = handle->fxns->ividdec3.process((IVIDDEC3_Handle) handle,
                                       inputBufDesc,
                                       outputBufDesc,
                                       (IVIDDEC3_InArgs *) inArgs,
                                       (IVIDDEC3_OutArgs *) outArgs);
fxns->algDeactivate((IALG_Handle) handle);
/* Elapsed time of activate + process + deactivate, in microseconds */
tm1 = Utils_getCurGlobalTimeInUsec() - tm1;

/* Report any frame that took longer than 10 ms */
if (tm1 > (10 * 1000))
{
    Vps_printf("ividdec3 fid=%d tm %lld  ***** \n", pReqObj->InBuf->frameId, tm1);
}

log:

[HOST] [IPU1-0] 272.541603 s: ividdec3 fid=3051 tm 12872 error!!!! ***** 
[HOST] [IPU1-0] 272.602055 s: ividdec3 fid=3052 tm 28549 error!!!! ***** 
[HOST] [IPU1-0] 272.638443 s: ividdec3 fid=3053 tm 15067 error!!!! ***** 
[HOST] [IPU1-0] 272.830354 s: ividdec3 fid=3055 tm 45233 error!!!! *****

[HOST] [IPU1-0] 272.883822 s: ividdec3 fid=3056 tm 11315 *****
[HOST] [IPU1-0] 272.911517 s: ividdec3 fid=3057 tm 13207 *****
[HOST] [IPU1-0] 272.980814 s: ividdec3 fid=3058 tm 19703 *****
[HOST] [IPU1-0] 273.131977 s: ividdec3 fid=3059 tm 119899 *****
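
For reference, at 25 fps each frame has a budget of 40 ms. The following is a minimal C sketch, not part of the Vision SDK, that checks the eight decode times logged above against that budget. The helper names `frame_budget_us` and `count_over_budget` are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode times in microseconds, copied from the log above. */
static const uint32_t logged_us[] = {
    12872, 28549, 15067, 45233, 11315, 13207, 19703, 119899
};

/* Per-frame time budget in microseconds for a given frame rate. */
static uint32_t frame_budget_us(uint32_t fps)
{
    assert(fps > 0);
    return 1000000u / fps;
}

/* Count how many measured times exceed the budget. */
static size_t count_over_budget(const uint32_t *times_us, size_t n,
                                uint32_t budget_us)
{
    size_t over = 0;
    for (size_t i = 0; i < n; i++) {
        if (times_us[i] > budget_us) {
            over++;
        }
    }
    return over;
}
```

With these values, two of the eight logged frames (45233 us and 119899 us) exceed the 40000 us budget of a 25 fps stream.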

Environment:

The decoder version is h264vdec_02_00_17_01_production.

Vision SDK 3.2

Our use case on IPU1:

7-channel capture of 720p

Our use case on A15:

Take 4 YUV frames from ipcout_surroundView to do video fusion through the GPU.

Take 1 YUV frame from ipcout_capture to do FD through the ARM CPU.

 

Our EMIF test result:

M4 statCollector:

SCI_EMIF1 RD+WR maximum: 2374.584718 MB/s

SCI_EMIF2 RD+WR maximum: 3109.538576 MB/s

Attachment with the detailed result:

Statistics Collector,

STATISTIC Avg Data Peak Data
COLLECTOR MB/s MB/s
--------------------------------------------------
SCI_EMIF1 RD+WR | 1175.080023 2374.584718
SCI_EMIF2 RD+WR | 1444.636093 3109.538576
SCI_EMIF1 RD ONLY | 814.471451 1525.882800
SCI_EMIF1 WR ONLY | 360.761496 878.057164
SCI_EMIF2 RD ONLY | 1081.784603 2234.331402
SCI_EMIF2 WR ONLY | 363.088525 912.946885
SCI_MA_MPU_P1 | 104.285875 567.077803
SCI_MA_MPU_P2 | 220.883907 1117.839188
SCI_DSS | 428.426779 473.662801
SCI_IPU1 | 34.330070 53.968017
SCI_VIP1_P1 | 29.553571 38.795876
SCI_VIP1_P2 | 64.905693 75.743583
SCI_VPE_P1 | 115.062491 339.556288
SCI_VPE_P2 | 115.070836 339.609072
SCI_DSP1_MDMA | 257.788436 467.796699
SCI_DSP1_EDMA | 0.000000 0.000000
SCI_DSP2_MDMA | 254.011228 439.289042
SCI_DSP2_EDMA | 0.000000 0.000000
SCI_EVE1_TC0 | 0.000000 0.000000
SCI_EVE1_TC1 | 0.000000 0.000000
SCI_EVE2_TC0 | 0.000000 0.000000
SCI_EVE2_TC1 | 0.000000 0.000000
SCI_EDMA_TC0_RD | 0.005387 0.060739
SCI_EDMA_TC0_WR | 0.005387 0.060739
SCI_EDMA_TC1_RD | 0.022858 0.185149
SCI_EDMA_TC1_WR | 0.045729 0.370298
SCI_VIP2_P1 | 8.796014 17.754530
SCI_VIP2_P2 | 20.510495 41.427237
SCI_VIP3_P1 | 17.917204 35.509060
SCI_VIP3_P2 | 40.548533 82.854474
SCI_EVE3_TC0 | 0.000000 0.000000
SCI_EVE3_TC1 | 0.000000 0.000000
SCI_EVE4_TC0 | 0.000000 0.000000
SCI_EVE4_TC1 | 0.000000 0.000000
SCI_IVA | 107.746195 650.960122
SCI_GPU_P1 | 523.765875 957.638688
SCI_GPU_P2 | 531.234214 948.841269
SCI_GMAC_SW | 0.000000 0.000000
SCI_OCMC_RAM1 | 0.000000 0.000000
SCI_OCMC_RAM2 | 0.000000 0.000000
SCI_OCMC_RAM3 | 0.000000 0.000000

Performance from the document:

I read the following in DRA74x_75x and DRA72x Performance (SPRAC46A):

Interleaved (Two 32-Bit Memory)

EMIF1 Bandwidth | EMIF2 Bandwidth
3650.88         | 3642.88

 

 

Clocks of our hardware setup:

533 MHz on each EMIF interface.

EMIF1 has 2 x 256 MiB chips connected.

EMIF2 has 2 x 512 MiB chips connected.

The EMIF has an ECC chip connected.

The DMM_LISA_MAP_i registers are attached below in case you want to check whether the EMIFs are interleaved.

~ # omapconf read 0x4E000040
00000000
~ # omapconf read 0x4E000044
80640300
~ # omapconf read 0x4E000048
C0500220
~ # omapconf read 0x4E00004c
FF020100

 

 

Formula for the EMIF payload:

Interleaved: 32 bits * 1066 MHz * 2 (interleaved EMIFs) * 0.65 (actual rate) / 8 = 5,543.2 MB/s

Non-interleaved: 5,543.2 MB/s / 2 = 2,771.6 MB/s

These figures are far from the test result.

Please tell me whether the bandwidth is overrun or not.

 

any suggestion will be appreciated.
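
The payload formula in this post can be written as a small C sketch. This is only an illustration of the arithmetic; the function name `emif_payload_mbps` is hypothetical, and the 0.65 efficiency factor is the assumption stated in the post, not a measured value.

```c
#include <assert.h>

/*
 * Theoretical payload in MB/s:
 *   bus width (bits) * data rate (MT/s) * number of interleaved EMIFs
 *   * efficiency / 8 bits per byte
 */
static double emif_payload_mbps(unsigned bus_bits, double data_rate_mtps,
                                unsigned emif_count, double efficiency)
{
    assert(efficiency > 0.0 && efficiency <= 1.0);
    return bus_bits * data_rate_mtps * emif_count * efficiency / 8.0;
}
```

Calling `emif_payload_mbps(32, 1066.0, 2, 0.65)` reproduces the interleaved figure of about 5543.2 MB/s, and passing 1 as the EMIF count gives the non-interleaved figure of about 2771.6 MB/s.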

 

  • Hi Wen,

    IVAHD is capable of 1920x1080@60fps which means it should be able to finish decoding within 33ms.

    Are you seeing this performance drop for a specific stream and for specific frames always?

    Please check if IVAHD is configured for 532MHz from "omapconf show opp"  command.

    The IVA average DDR access is about 107 MB/s, which means very little access.

    Please refer to these app notes.

    http://www.ti.com/lit/an/sprabx1a/sprabx1a.pdf

    http://www.ti.com/lit/an/sprabx0/sprabx0.pdf

    1) You can try to increase the priority of IVAHD first

    2) Bandwidth regulation and bandwidth limitation can  also be tried.

    Thanks

    RamPrasad

  • RamPrasad, thanks for your advice, which is very valuable.

    The result of the "omapconf show opp" command is:

    OMAPCONF (rev v1.73-17-g578778b built Thu Dec 28 05:15:12 IST 2017)

    HW Platform:
    Generic DRA74X (Flattened Device Tree)
    DRA75X ES2.0 GP Device (STANDARD performance (1.0GHz))
    TPS659038 ES2.2

    SW Build Details:
    Build:
    release_details_get(): could not open /etc/issue.net file?!
    Version: UNKNOWN
    Kernel:
    Version: 4.4.84+
    Author: ******
    Toolchain: gcc version 5.3.1 20160113 (Linaro GCC 5.3-2016.02)
    Type: #1 SMP PREEMPT
    Date: Fri Nov 29 13:17:40 CST 2019

    |-----------------------------------------------------------------------------------|
    |                        | Temperature | Voltage | Frequency      | OPerating Point |
    |-----------------------------------------------------------------------------------|
    | VDD_CORE / VDD_CORE0   | 71C / 159F  | 1.060 V |                | NOM             |
    |   L3                   |             |         |  266  MHz      |                 |
    |   DMM                  |             |         |  266  MHz      |                 |
    |   EMIF1                |             |         |  266  MHz      |                 |
    |   EMIF2                |             |         |  266  MHz      |                 |
    |     LP-DDR2            |             |         |  532  MHz      |                 |
    |   L4                   |             |         |  266  MHz      |                 |
    |   IPU1                 |             |         |  425  MHz      |                 |
    |     Cortex-M4 Cores    |             |         |  212  MHz      |                 |
    |   IPU2                 |             |         |  425  MHz      |                 |
    |     Cortex-M4 Cores    |             |         |  212  MHz      |                 |
    |   DSS                  |             |         |  192  MHz      |                 |
    |   BB2D                 |             |         | (2128 MHz) (1) |                 |
    |                        |             |         |                |                 |
    | VDD_MPU / VDD_CORE1    | 72C / 161F  | 1.100 V |                | NOM             |
    |   MPU (CPU1 ON)        |             |         |  1000 MHz      |                 |
    |                        |             |         |                |                 |
    | VDD_GPU / VDD_CORE2    | 72C / 161F  | 1.060 V |                | NOM             |
    |   GPU                  |             |         |  425  MHz      |                 |
    |                        |             |         |                |                 |
    | VDD_DSPEVE / VDD_CORE3 | 70C / 158F  | 1.060 V |                | OVERDRIVE       |
    |   DSP1                 |             |         |  700  MHz      |                 |
    |   DSP2                 |             |         |  700  MHz      |                 |
    |   EVE1                 |             |         | (0    MHz) (1) |                 |
    |   EVE2                 |             |         | (0    MHz) (1) |                 |
    |                        |             |         |                |                 |
    | VDD_IVA / VDD_CORE4    | 72C / 161F  | 1.060 V |                | NOM             |
    |   IVA                  |             |         |  388  MHz      |                 |
    |-----------------------------------------------------------------------------------|

    Does "IVA 388 MHz" refer to IVAHD here? If so, how can I change it to 532 MHz?

    EMIF1/2 is shown at 266 MHz, which is half of 533 MHz. Can I trust this result?

    Based on our use case, please advise how I can optimize our system, both in the clocks and in the priorities of the basic modules of each link/chain.

    thanks in advance.

  • Hi Wen,

    Which version of visionSDK is used here?

    I am seeing all Operating points at NOM and it should have been HIGH. 

    omapconf shows the device is DRA74x which means IVA's OPP HIGH is 532MHz .

    Thanks

    RamPrasad

  • hi, RamPrasad

    we are using vision_sdk_3.2.

    After a quick search for "IVA NOM" in the forum, I found:

    https://e2e.ti.com/support/processors/f/791/p/549323/2006872?tisearch=e2e-sitesearch&keymatch=IVA%25252525252525252520nom#2006872

    I looked into the u-boot implementation in PSDKLA 3.0 and found the following configuration in the file “/u-boot/configs/dra7xx_evm_defconfig”:

    CONFIG_DRA7_IVA_OPP_HIGH=y

    After modifying the config file, the operating point of IVA is now HIGH:

    | VDD_IVA / VDD_CORE4    | 80C / 176F  | 1.060 V |                | HIGH            |
    |   IVA                  |             |         |  532  MHz      |                 |

    The peak decode time is still about 70 ms, so I think there is only a tiny difference from the old result:

    71677
    69572
    66309

    any ideas?

    By the way, EMIF1/2 is shown at 266 MHz, which is half of 533 MHz. Can I trust this result?

    thanks in advance.

  • hi, RamPrasad

    any update?

  • Hi Wen,

    I have a VisionSDK 3.06 setup and I am seeing the attached OPP information.

    In your case no IP is set to HIGH.

    Can you check what this command shows in the u-boot source tree?

    cd ti_components/os_tools/linux/u-boot/u-boot

    cat .config | grep IVA

    My observation is 

    # CONFIG_DRA7_IVA_OPP_NOM is not set
    # CONFIG_DRA7_IVA_OPP_OD is not set
    CONFIG_DRA7_IVA_OPP_HIGH=y

    Thanks

    RamPrasad

  • opp.txt
    root@dra7xx-evm:~# omapconf show opp
    OMAPCONF (rev v1.73-17-g578778b built Wed Jan 2 18:53:32 IST 2019)
    
    HW Platform:
      Generic DRA74X (Flattened Device Tree)
      DRA75X ES1.0 GP Device (STANDARD performance (1.5GHz))
      TPS659038  ES1.0
    
    SW Build Details:
      Build:
        Version:  _____                    _____           _         _
      Kernel:
        Version: 4.4.84-00034-gaa42c96
        Author: x0038811@udx0038811
        Toolchain: gcc version 5.3.1 20160113 (Linaro GCC 5.3-2016.02)
        Type: #1 SMP PREEMPT
        Date: Mon Dec 30 13:10:14 IST 2019
    
    [   41.865639] random: nonblocking pool is initialized
    |-----------------------------------------------------------------------------------|
    |                        | Temperature | Voltage | Frequency      | OPerating Point |
    |-----------------------------------------------------------------------------------|
    | VDD_CORE / VDD_CORE0   | 46C / 114F  | 1.030 V |                | NOM             |
    |   L3                   |             |         |  266  MHz      |                 |
    |   DMM                  |             |         |  266  MHz      |                 |
    |   EMIF1                |             |         |  266  MHz      |                 |
    |   EMIF2                |             |         |  266  MHz      |                 |
    |     LP-DDR2            |             |         |  532  MHz      |                 |
    |   L4                   |             |         |  266  MHz      |                 |
    |   IPU1                 |             |         | (2128 MHz) (1) |                 |
    |     Cortex-M4 Cores    |             |         | (1064 MHz) (1) |                 |
    |   IPU2                 |             |         |  2128 MHz      |                 |
    |     Cortex-M4 Cores    |             |         |  1064 MHz      |                 |
    |   DSS                  |             |         |  192  MHz      |                 |
    |   BB2D                 |             |         | (2128 MHz) (1) |                 |
    |                        |             |         |                |                 |
    | VDD_MPU / VDD_CORE1    | 48C / 118F  | 1.060 V |                | NOM             |
    |   MPU (CPU1 ON)        |             |         |  1000 MHz      |                 |
    |                        |             |         |                |                 |
    | VDD_GPU / VDD_CORE2    | 44C / 111F  | 1.250 V |                | HIGH            |
    |   GPU                  |             |         |  532  MHz      |                 |
    |                        |             |         |                |                 |
    | VDD_DSPEVE / VDD_CORE3 | 48C / 118F  | 1.250 V |                | NOM             |
    |   DSP1                 |             |         |  750  MHz      |                 |
    |   DSP2                 |             |         |  750  MHz      |                 |
    |   EVE1                 |             |         |  535  MHz      |                 |
    |   EVE2                 |             |         |  535  MHz      |                 |
    |                        |             |         |                |                 |
    | VDD_IVA / VDD_CORE4    | 40C / 104F  | 1.250 V |                | HIGH            |
    |   IVA                  |             |         |  532  MHz      |                 |
    |                        |             |         |                |                 |
    |-----------------------------------------------------------------------------------|
    
    Notes:
      (1) Module is disabled, rate may not be relevant.
    

  • hi, RamPrasad

    I modified the u-boot config file as described in my previous post:

    boot/configs/dra7xx_evm_defconfig

    CONFIG_DRA7_IVA_OPP_HIGH=y

    As a result, IVA now runs at HIGH, but this does not affect decode performance.

    Here is what I got from .config:

    cat .config| grep IVA
    # CONFIG_DRA7_IVA_OPP_NOM is not set
    # CONFIG_DRA7_IVA_OPP_OD is not set
    CONFIG_DRA7_IVA_OPP_HIGH=y
    CONFIG_HAVE_PRIVATE_LIBGCC=y
    # CONFIG_USE_PRIVATE_LIBGCC is not set

    Regards, wen

  • hi, RamPrasad

    any update about this issue?

  • Hi Wen,

    I would recommend using OPP_HIGH for DSP and GPU as well and then checking the behavior. The default u-boot configuration doesn't set the OPP to NOM, so I am not sure why your configuration sets NOM or OD for all IPs.

    Alternatively, please try to use u-boot from visionSDK 3.07, which is the latest visionSDK.

    Please try with these commands one by one and check the behavior. These commands set the priorities of IPU2 and IVAHD to 1.

    omapconf write 0x4e000624 0x09000000
    omapconf write 0x4e00062C 0x00000090

    Please refer section 4.1.1 of http://www.ti.com/lit/an/sprabx1a/sprabx1a.pdf on setting priorities 

    Please also refer section 4.2 of this doc 

    Thanks

    RamPrasad

  • hi, RamPrasad

    I now have GPU, DSP1/2 and IVA running at HIGH, and the priorities of IPU2, IPU1 and IVAHD set to 1, but the result differs only slightly from the original one.

    65119 us is observed after the above modifications.

    0601.opp.txt
    ~ # omapconf show opp
    OMAPCONF (rev v1.73-17-g578778b built Thu Dec 28 05:15:12 IST 2017)
    
    HW Platform:
      Generic DRA74X (Flattened Device Tree)
      DRA75X ES2.0 GP Device (STANDARD performance (1.0GHz))
      TPS659038  ES2.2 
    
    SW Build Details:
      Build:
    release_details_get(): could not open /etc/issue.net file?!
        Version: UNKNOWN
      Kernel:
        Version: 4.4.84+
        Author: yang_yong@ubuntu
        Toolchain: gcc version 5.3.1 20160113 (Linaro GCC 5.3-2016.02)
        Type: #1 SMP PREEMPT
        Date: Fri Nov 29 13:17:40 CST 2019
    
    |-----------------------------------------------------------------------------------|
    |                        | Temperature | Voltage | Frequency      | OPerating Point |
    |-----------------------------------------------------------------------------------|
    | VDD_CORE / VDD_CORE0   | 69C / 156F  | 1.060 V |                | NOM             |
    |   L3                   |             |         |  266  MHz      |                 |
    |   DMM                  |             |         |  266  MHz      |                 |
    |   EMIF1                |             |         |  266  MHz      |                 |
    |   EMIF2                |             |         |  266  MHz      |                 |
    |     LP-DDR2            |             |         |  532  MHz      |                 |
    |   L4                   |             |         |  266  MHz      |                 |
    |   IPU1                 |             |         |  425  MHz      |                 |
    |     Cortex-M4 Cores    |             |         |  212  MHz      |                 |
    |   IPU2                 |             |         |  425  MHz      |                 |
    |     Cortex-M4 Cores    |             |         |  212  MHz      |                 |
    |   DSS                  |             |         |  192  MHz      |                 |
    |   BB2D                 |             |         | (2128 MHz) (1) |                 |
    |                        |             |         |                |                 |
    | VDD_MPU / VDD_CORE1    | 71C / 159F  | 1.100 V |                | NOM             |
    |   MPU (CPU1 ON)        |             |         |  1000 MHz      |                 |
    |                        |             |         |                |                 |
    | VDD_GPU / VDD_CORE2    | 70C / 158F  | 1.060 V |                | HIGH            |
    |   GPU                  |             |         |  532  MHz      |                 |
    |                        |             |         |                |                 |
    | VDD_DSPEVE / VDD_CORE3 | 69C / 156F  | 1.060 V |                | HIGH            |
    |   DSP1                 |             |         |  750  MHz      |                 |
    |   DSP2                 |             |         |  750  MHz      |                 |
    |   EVE1                 |             |         | (0    MHz) (1) |                 |
    |   EVE2                 |             |         | (0    MHz) (1) |                 |
    |                        |             |         |                |                 |
    | VDD_IVA / VDD_CORE4    | 71C / 159F  | 1.060 V |                | HIGH            |
    |   IVA                  |             |         |  532  MHz      |                 |
    |                        |             |         |                |                 |
    |-----------------------------------------------------------------------------------|
    
    Notes:
      (1) Module is disabled, rate may not be relevant.
    
    ~ # omapconf write 0x4e000624 0x09000000
    ~ # omapconf write 0x4e00062C 0x00000090
    ~ # omapconf write 0x4e00062C 0x00000099
    

    i will try to modify the Bandwidth Regulator and let you know the result.

  • hi, RamPrasad

    I have added a BWRegulator for IVA, but the system just failed to start. The log reads:

    [HOST] [IPU1-0] 72634.231867 s: CHAINS: Application Started !!!
    [HOST] [IPU1-0] 72634.235161 s: CHAINS: Sv + Adas Sysbios Started !!!
    [HOST] [IPU1-0] 72634.235283 s: AppCtrl_setSystemL3DmmPri!
    [HOST] [IPU1-0] 72634.235679 s:
    [HOST] [IPU1-0] 72634.235710 s: ### XDC ASSERT - ERROR CALLBACK START ###
    [HOST] [IPU1-0] 72634.235801 s:
    [HOST] [IPU1-0] 72634.235923 s: E_hardFault: FORCED
    [HOST] [IPU1-0] 72634.235984 s:
    [HOST] [IPU1-0] 72634.236045 s: ### XDC ASSERT - ERROR CALLBACK END ###
    [HOST] [IPU1-0] 72634.236106 s:
    [HOST] [IPU1-0] 72634.236808 s:
    [HOST] [IPU1-0] 72634.236838 s: ### XDC ASSERT - ERROR CALLBACK START ###
    [HOST] [IPU1-0] 72634.236899 s:
    [HOST] [IPU1-0] 72634.237052 s: E_busFault: IMPRECISERR: Delayed Bus Fault, exact addr unknown, address: e000ed38
    [HOST] [IPU1-0] 72634.237143 s:
    [HOST] [IPU1-0] 72634.237204 s: ### XDC ASSERT - ERROR CALLBACK END ###
    [HOST] [IPU1-0] 72634.237265 s:
    [HOST] [HOST ] 72637.616665 s: SYSTEM: System A15 Init in progress !!!
    [HOST] [HOST ] 72637.616726 s: SYSTEM: Waiting DSP1 Init SUCCESS !!!
    [HOST] [HOST ] 72637.616726 s: SYSTEM: Waiting DSP2 Init SUCCESS !!!
    [HOST] [HOST ] 72637.616757 s: SYSTEM: Waiting IPU2 Init SUCCESS !!!
    [HOST] [HOST ] 72639.812450 s: SYSTEM: Waiting to [IPU1-0] ... state = 65537
    [HOST] [HOST ] 72642.007442 s: SYSTEM: Waiting to [IPU1-0] ... state = 65537
    [HOST] [HOST ] 72644.173793 s: SYSTEM: Waiting to [IPU1-0] ... state = 65537
    [HOST] [HOST ] 72646.358903 s: SYSTEM: Waiting to [IPU1-0] ... state = 65537

     

    What I have done in the source code:

    Chains_main:

    Vps_printf(" CHAINS: Sv + Adas Sysbios Started !!!");

    gChains_usecaseCfg.numLvdsCh = 7;

    Vps_printf(" AppCtrl_setSystemL3DmmPri!");
    AppCtrl_setSystemL3DmmPri();                                     //add here

    //...

    Void AppCtrl_setSystemL3DmmPri()
    {
        /* Assert Mflag of DSS to give DSS highest priority */
        Utils_setDssMflagMode(UTILS_DSS_MFLAG_MODE_FORCE_ENABLE);

        /* enable usage of Mflag at DMM */
        Utils_setDmmMflagEmergencyEnable(TRUE);

        /* Set DSS as highest priority at DMM and EMIF */
        Utils_setDmmPri(UTILS_DMM_INITIATOR_ID_DSS, 0);

        Utils_setBWLimiter(UTILS_DMM_INITIATOR_ID_GPU_P1, 1000);
        Utils_setBWLimiter(UTILS_DMM_INITIATOR_ID_GPU_P2, 1000);

        Utils_setBWRegulator(UTILS_DMM_INITIATOR_ID_IVA, 1000);      //add here
    }

    Please advise me how to do this correctly.

     

  • Hi Wen,

    I suggest you apply these changes using the omapconf write command before starting the use case.

  • hi, RamPrasad

    It works after a small modification:

    Chains_main:

    Utils_setAppInitState(SYSTEM_IPU_PROC_PRIMARY, CORE_APP_INITSTATUS_IPU_PRIMARY_ALL_DONE);

    status = System_linkControl(SYSTEM_LINK_ID_APP_CTRL,
                                APP_CTRL_LINK_CMD_SET_DMM_PRIORITIES,
                                NULL,
                                0,
                                TRUE);

    Vps_printf("SYSTEM_LINK_ID_APP_CTRL %d|", status);

    #ifdef IPU_PRIMARY_CORE_IPU1
    IPU1_0_main(NULL, NULL);

    status comes out as 0, so the call succeeded, but there is no difference in performance compared to the original result.

    regards, wen.

  • hi, RamPrasad

    In the first post, SCI_MA_MPU_P2 is using too much EMIF bandwidth:

    SCI_MA_MPU_P2 | 220.883907 1117.839188

    But I don't think we can use the DMM to control the MPU's EMIF requests (the Memory Subsystem Overview in the device datasheet shows that the MPU accesses the EMIF directly, without going through the DMM).

    If decode performance is related to the peak data of SCI_MA_MPU_P2 in the Statistics Collector, the only thing left that we can do is set the priority. Am I right?

    regards, wen

  • hi, RamPrasad.

    I need to apologize: I sent the wrong data in the first post (the decoder was not running while the Statistics Collector was running).

    Please analyze this result instead:

    Statistics Collector,

    STATISTIC Avg Data Peak Data
    COLLECTOR MB/s MB/s
    --------------------------------------------------
    SCI_EMIF1 RD+WR | 566.365281 3090.439422
    SCI_EMIF2 RD+WR | 823.858270 2155.900998
    SCI_EMIF1 RD ONLY | 309.901695 1671.054464
    SCI_EMIF1 WR ONLY | 260.515843 1420.081511
    SCI_EMIF2 RD ONLY | 527.971667 3128.853649
    SCI_EMIF2 WR ONLY | 296.546599 1424.560207
    SCI_MA_MPU_P1 | 71.570719 302.704705
    SCI_MA_MPU_P2 | 183.343419 850.119303
    SCI_DSS | 401.901217 437.338080
    SCI_IPU1 | 33.107404 42.318894
    SCI_VIP1_P1 | 39.070339 51.558000
    SCI_VIP1_P2 | 57.687818 65.021202
    SCI_VPE_P1 | 136.401427 436.873507
    SCI_VPE_P2 | 136.411739 437.003035
    SCI_DSP1_MDMA | 5.430534 5.737301
    SCI_DSP1_EDMA | 0.000000 0.000000
    SCI_DSP2_MDMA | 5.564167 5.989887
    SCI_DSP2_EDMA | 0.000000 0.000000
    SCI_EVE1_TC0 | 0.000000 0.000000
    SCI_EVE1_TC1 | 0.000000 0.000000
    SCI_EVE2_TC0 | 0.000000 0.000000
    SCI_EVE2_TC1 | 0.000000 0.000000
    SCI_EDMA_TC0_RD | 0.003822 0.065946
    SCI_EDMA_TC0_WR | 0.003823 0.065946
    SCI_EDMA_TC1_RD | 0.000000 0.000000
    SCI_EDMA_TC1_WR | 0.000000 0.000000
    SCI_VIP2_P1 | 9.483616 43.096384
    SCI_VIP2_P2 | 21.902342 98.974537
    SCI_VIP3_P1 | 27.107120 153.236135
    SCI_VIP3_P2 | 34.161385 211.268457
    SCI_EVE3_TC0 | 0.000000 0.000000
    SCI_EVE3_TC1 | 0.000000 0.000000
    SCI_EVE4_TC0 | 0.000000 0.000000
    SCI_EVE4_TC1 | 0.000000 0.000000
    SCI_IVA | 200.005473 1374.099988
    SCI_GPU_P1 | 118.212548 1725.597023
    SCI_GPU_P2 | 118.672237 1732.792698
    SCI_GMAC_SW | 0.002630 0.174879
    SCI_OCMC_RAM1 | 0.000000 0.000000
    SCI_OCMC_RAM2 | 0.000000 0.000000
    SCI_OCMC_RAM3 | 0.000000 0.000000

    I simply cannot understand why GPU_P1 and GPU_P2 peak so high, at about 1725 MB/s. After a little digging in the forum, I found:

    GPU is being limited to 1.3GBps with these two instructions.

    Utils_setBWLimiter(UTILS_DMM_INITIATOR_ID_GPU_P1, 1000);
    Utils_setBWLimiter(UTILS_DMM_INITIATOR_ID_GPU_P2, 1000);

    WHY?

     

    Attachment with the detailed information: Statistics Collector.txt

  • hi, RamPrasad

    any update here ?

  • Hi Wen,

    Utils_setBWLimiter(UTILS_DMM_INITIATOR_ID_GPU_P1, 1000);
    Utils_setBWLimiter(UTILS_DMM_INITIATOR_ID_GPU_P2, 1000);

    were added to limit the GPU bandwidth for a specific use case. You can comment out these two calls and allow the GPU to go beyond 1 GB/s.

    Can you try applying following settings one by one?

    This is to increase EMIF priority. This has worked for similar issue.

    1. No change of EMIF system port priority, increase priorities of IPU2, IVA and VPE
    omapconf write 0x4e000624 0x0A000000
    omapconf write 0x4e00062C 0x000000A0
    omapconf write 0x4e000630 0xA0000000


    2. No change of EMIF system port priority, increase priorities of IPU2, IVA, VPE, GPU and BB2D
    omapconf write 0x4e000624 0x0A000000
    omapconf write 0x4e00062C 0x000000A0
    omapconf write 0x4e000630 0xA0000000
    omapconf write 0x4e000634 0x00000BBB


    3. Increase EMIF system port priority, increase priorities of IPU2, IVA and VPE
    omapconf write 0x40D00040 0xC0000000
    omapconf write 0x4e000624 0x0A000000
    omapconf write 0x4e00062C 0x000000A0
    omapconf write 0x4e000630 0xA0000000


    4. Increase EMIF system port priority, increase priorities of IPU2, IVA, VPE, GPU and BB2D
    omapconf write 0x40D00040 0xC0000000
    omapconf write 0x4e000624 0x0A000000
    omapconf write 0x4e00062C 0x000000A0
    omapconf write 0x4e000630 0xA0000000
    omapconf write 0x4e000634 0x00000BBB

    Thanks

    RamPrasad

  • hi, RamPrasad

    I have applied this through a debug command:

    Utils_setBWLimiter(UTILS_DMM_INITIATOR_ID_GPU_P1, 1000);
    Utils_setBWLimiter(UTILS_DMM_INITIATOR_ID_GPU_P2, 1000);

    Utils_setBWRegulator(UTILS_DMM_INITIATOR_ID_IVA, 800);

    But I can still observe IVA and the GPU peaking at about 2 GB/s:

    SCI_IVA | 204.034240 2003.259853
    SCI_GPU_P1 | 140.557736 2242.502267
    SCI_GPU_P2 | 141.078969 2248.949424

    I will update you later with the results of the above commands.

  • hi, RamPrasad

    This is what I did when testing your script:

    1. Start the system; the use case is running.

    2. Run the script you posted (online, while the use case is running).

    3. Check the log, summarize, and reboot the system.

    This is what I got:

    1. 66065 us was observed after a short test.

    2. 66065 us was observed after a short test.

    3. 73385 us was observed after a short test.

    4. 66065 us was observed after a short test.

    Please tell me if anything is wrong.

    regards, wen

  • hi, RamPrasad

    any ideas ?

    regards, wen

  • hi, RamPrasad

    It seems that the CPU load is too high:

    LOAD: CPU: 95.4% HWI: 12.3%, SWI:26.1%

    LOAD: TSK: SYSTEM_MSGQ : 4.3%
    LOAD: TSK: DUP0 : 3.1%
    LOAD: TSK: DUP1 : 1.8%
    LOAD: TSK: DUP2 : 0.4%
    LOAD: TSK: DUP3 : 0.3%
    LOAD: TSK: IPC_OUT_0 : 3.0%
    LOAD: TSK: IPC_OUT_2 : 2.6%
    LOAD: TSK: IPC_OUT_3 : 4.7%
    LOAD: TSK: IPC_OUT_4 : 0.9%
    LOAD: TSK: IPC_OUT_5 : 1.0%
    LOAD: TSK: IPC_OUT_6 : 0.9%
    LOAD: TSK: MERGE0 : 2.2%
    LOAD: TSK: MERGE2 : 2.0%
    LOAD: TSK: MERGE3 : 1.6%
    LOAD: TSK: NULL_SRC2 : 0.2%
    LOAD: TSK: SELECT0 : 1.6%
    LOAD: TSK: SYNC0 : 1.8%
    LOAD: TSK: SYNC1 : 1.5%
    LOAD: TSK: DISPLAY0 : 1.1%
    LOAD: TSK: ENC0 : 2.9%
    LOAD: TSK: DEC0 : 0.7%
    LOAD: TSK: CAPTURE : 2.1%
    LOAD: TSK: VPE0 : 4.9%
    LOAD: TSK: SWOSD_0 : 5.8%
    LOAD: TSK: DEC_PROCESS_TSK_0 : 1.5%
    LOAD: TSK: ENC_PROCESS_TSK_0 : 1.9%
    LOAD: TSK: STAT_COLL : 0.6%
    LOAD: TSK: MISC : 1.6%

    TSK:

    4.3 + 3.1 + 1.8 + 0.4 + 0.3 + 3.0 + 2.6 + 4.7 + 0.9 + 1.0 + 0.9 + 2.2 + 2.0 + 1.6 + 0.2 + 1.6 + 1.8 + 1.5 + 1.1 + 2.9 + 0.7 + 2.1 + 4.9 + 5.8 + 1.5 + 1.9 + 0.6 + 1.6 = 57.0

    HWI + SWI + TSK:

    12.3 + 26.1 + 57.0 = 95.4

    Any ideas? Can we cut down the SWI load?
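
The load breakdown in this post can be cross-checked with a short C sketch. The loads are stored in tenths of a percent so that the sum is exact integer arithmetic; the array and the helper name `sum_tenths` are hypothetical and only illustrate the calculation.

```c
#include <assert.h>
#include <stddef.h>

/* Per-task loads from the LOAD: TSK lines above, in tenths of a percent. */
static const unsigned task_load_tenths[] = {
    43, 31, 18,  4,  3, 30, 26, 47,  9, 10,  9, 22, 20, 16,
     2, 16, 18, 15, 11, 29,  7, 21, 49, 58, 15, 19,  6, 16
};

/* Sum n load values given in tenths of a percent. */
static unsigned sum_tenths(const unsigned *loads, size_t n)
{
    unsigned total = 0;
    assert(loads != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += loads[i];
    }
    return total;
}
```

The task loads sum to 570 tenths (57.0%). Adding HWI (12.3%) and SWI (26.1%) gives 95.4%, which matches the reported total CPU load.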

  • hi, RamPrasad,

    I have optimized the system and the load is now: CPU: 78.3% HWI: 10.9%, SWI:20.3%

    But I can still sometimes observe 40 ms.

    Please advise!

    regards, wen

  • hi, RamPrasad,

    I have also checked the encode performance (running at the same time as decode) with 720p@25, and it does not exceed 10 ms per frame.

    any update on thread?
  • hi, RamPrasad,

    any update ?
    regards, wen
  • Hi Wen,

    Do you see this issue of the decoder taking >40ms with the default NullSrc->Decode->VPE->Display use case, or only with your complex chain-link use case?

    I haven't seen the issue for 1920x1080 streams either.

    Is there a way to bypass the encode part in your link, test the use case and get the decode time?

    As there is only one IVAHD, encoding and decoding happen in sequence, so the decode time may include time spent waiting for IVAHD. To confirm this, please check if you can bypass encode.

    Thanks

    RamPrasad

  • hi, RamPrasad,

    it's good to hear from you.

    The decoder taking >40 ms is observed only with my complex chain-link use case.

    By the way, I have optimized the CPU load to 70%, and the decoder now takes about 10-30 ms for most frames (not every frame).

    I have confirmed that the encoder is disabled while decoding H.264, which still takes 10-40 ms per frame.

    This is what I have tried:

    1. Tried decoding H.264 frames that come from the J6 h264enc, but nothing changed.

    regards, wen
  • hi, RamPrasad,

    Sorry for the long delay. I have successfully created the simple use case that you mentioned in the previous post:

    UseCase: chains_decVpeDisplay (the VPE link drops all the data from the previous link)

    NullSource_dec -> Decode -> VPE -> IPCOut_Capture

    After a little while, I found that the cost is <8113 us, which is quite good.

    What should I do to find the reason why I get such a high cost with the complex use case?

    This post also discusses DDR bandwidth when VPE is in de-interlacing mode. Should I check it?

    regards, wen

  • Hi Wen,

    Yes, an 8 ms decode time is correct for 720p resolution. As I mentioned earlier, your use case involves multiple IPs with multiple tasks, which causes IVAHD to contend for DDR, and this results in the performance drop.

    Unfortunately, you need to do multiple trial-and-error experiments to adjust bandwidth regulation and limitation plus prioritization (as mentioned in the app note I pointed to).

    Thanks

    RamPrasad

  • hi, RamPrasad,

    I understand that finding the correct regulation and prioritization configuration will take a long time, and I will start digging deeper with your guidance.

    But for now, I'd like to find data evidence that the root cause is the DDR bandwidth restriction. For your information, the result of the M4 statCollector in the original post cannot tell the cause (maybe I omitted something or used a wrong configuration).

    many thanks.

    regards, wen