How to do profiling with video codec library on C6678?

Hi expert, 

We need to do function-level performance profiling of a video codec (the H.264 HP encoder) on C6678. We have tried to run it on the "Cycle Approximate Simulator" so that we can use the profiler shipped with CCSv5.3, but the test application always gets stuck in __TI_decompress_rle_core and never reaches main(). How can we profile in the simulator to find the critical parts of the library?

Thanks a lot!

 B.R.

Sunzhao

  • Hi Sunzhao,

    Before main() is reached, the startup code initializes any sections that need it. That can take a long time if the sections are huge. You can modify your linker command file to add "type = NOINIT" to the sections where appropriate. One example: .codecScratch > DDR_CACHED PAGE 0 type = NOINIT

    Also please note that the Cycle Approximate Simulator is usually slow. Another way to profile is to read TSCL in the code, recompile, and then run on the real target. The cycle counts read from TSCL can be recorded in arrays and then saved through CCS.

    Example:

    TSCL_Begin = TSCL;

    process_call();

    TSCL_End   = TSCL;

    The number of cycles taken by process_call() is (TSCL_End - TSCL_Begin).

    Thanks,

    Hongmei

  • Hi Sunzhao,

    Sorry, I forgot to mention the initialization of TSCL when using it for profiling. Please add "TSCL = 0;" at the beginning of your application so that TSCL is initialized before it is used later. TSCL can be declared as "extern cregister volatile unsigned int TSCL;".
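    To make the idea of recording TSCL readings in an array more concrete, here is a minimal sketch. It is only an illustration: process_call(), profile_init(), profile_once(), cycle_log, sample_count and MAX_SAMPLES are hypothetical names, and the clock() fallback is an assumption added so that the code also builds on a PC. On the C66x target it uses the TSCL declaration given above.

```c
#include <stdint.h>

#ifdef _TMS320C6X
/* On the C66x target: the declaration suggested in this thread. */
extern cregister volatile unsigned int TSCL;
#define READ_CYCLES()   ((uint32_t)TSCL)
#define START_COUNTER() (TSCL = 0)
#else
/* Off-target fallback (an assumption) so the sketch also builds on a PC. */
#include <time.h>
#define READ_CYCLES()   ((uint32_t)clock())
#define START_COUNTER() ((void)0)
#endif

#define MAX_SAMPLES 64u

/* Globals so they can be viewed or saved from CCS after the run. */
uint32_t cycle_log[MAX_SAMPLES];
uint32_t sample_count;

static volatile uint32_t sink;

/* Hypothetical stand-in for the codec call being measured. */
static void process_call(void)
{
    uint32_t i;
    for (i = 0; i < 1000u; i++) {
        sink += i;
    }
}

/* Call once at the beginning of the application. */
void profile_init(void)
{
    START_COUNTER();
    sample_count = 0;
}

/* Times one process_call() and logs the elapsed count while space remains. */
uint32_t profile_once(void)
{
    uint32_t begin, end, elapsed;

    begin = READ_CYCLES();
    process_call();
    end = READ_CYCLES();

    /* Unsigned subtraction stays correct across a single 32-bit wrap. */
    elapsed = end - begin;
    if (sample_count < MAX_SAMPLES) {
        cycle_log[sample_count++] = elapsed;
    }
    return elapsed;
}
```

    Call profile_init() once at startup and profile_once() around each frame or call of interest. After the run, cycle_log can be inspected or saved from CCS.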

    Thanks,

    Hongmei

  • Hi Hongmei, 

            Thanks for the reply.

    We want to do function-level profiling and then list the functions in order of cycles consumed, so that we can find the critical part of the code. Inserting TSCL reads may not help much, because it is hard to add them to every function in the code. I would prefer to profile on the cycle simulator. Can you give some advice on how to find out which section takes so long for the simulator to initialize? I have added .type = "NOINIT" to each large section in the .cfg file. I have attached the .cfg file and the map file here; please help me check them.

    BTW, is it possible to speed up the cycle simulator?

    Thanks!

    0755.h264hpvenc.cfg

    3750.h264hpvenc_ti_c66x.map.txt

    Sunzhao 

  • Hi Sunzhao,

    Thanks for the files. From your map file, it looks like there are still huge sections which need to be initialized:

    SEGMENT ALLOCATION MAP

    run origin load origin length init length attrs members
    ---------- ----------- ---------- ----------- ----- -------

    ...

    80000000 80000000 1901250d 1901250d rw- .external_cached_mem

    ...

    9d3ee700    9d3ee700    00cc7800   00cc7800    rw- .outPutBuffMem

    In your .cfg file, there is

    Program.sectMap[".external_cached_mem"].type = "NOINIT"  

    It should be

    Program.sectMap[".external_cached_mem"].type = "NOINIT";

    We may be missing ";" on all the lines where you are trying to add NOINIT. Can you please modify your .cfg file accordingly and recheck the map file? With a correct NOINIT, the "init length" should be 0 for the corresponding section.

    Thanks,

    Hongmei

  • Hongmei,

    I added ";" in the cfg file as you suggested, but the map file is almost the same.

    I also found one strange thing in the map file. With the same expressions in the cfg file (with or without ";"), the init length of the external_cached_mem section is always 1901250d, but for inputbuffer_mem and shared_mem_DDR2 it is always zero. This is strange, because I cannot find any place other than the cfg file where the section attributes are set. Can you help check this issue and work out how to resolve it?

    Thanks a lot!

    Sunzhao

  • Hi Sunzhao,

    There should be a linker command file which is generated from the cfg file. Can you please provide that?

    Also, can you please provide the source file(s) which define the globals placed in sections .outPutBuffMem and .external_cached_mem?

    Thanks,

    Hongmei

  • Hi Hongmei,

    The generated .cmd file and the whole project are attached here: 7128.linker.rar

    5270.Client.rar

    Sunzhao

  • Hi Sunzhao,

    Thanks for the files. We looked into the two huge sections: .outPutBuffMem and .external_cached_mem. It looks like some globals in these two sections are defined with initialized values. That is why the "init length" is not 0 for these two sections.

    .outPutBuffMem

    XDAS_Int32  nalSizes[IVIDMC_TI_MAXCORES][8160] = {0};

    .external_cached_mem

    unsigned char *pInternalDataMemory = internalDataMemory;
    unsigned int internalDataMemorySize = INTERNAL_DATA_MEM_SIZE;
    XDAS_Int8 read_core_team_once = 1;
    XDAS_Int32 CORE_TEAM_MAPPING[IVIDMC_TI_MAXCORES] =
    {
    IVIDMC_CORE_NOT_USED, IVIDMC_CORE_NOT_USED, IVIDMC_CORE_NOT_USED,
    IVIDMC_CORE_NOT_USED, IVIDMC_CORE_NOT_USED, IVIDMC_CORE_NOT_USED,
    IVIDMC_CORE_NOT_USED, IVIDMC_CORE_NOT_USED
    };
    volatile XDAS_Int8 set_init_once = 1;
    volatile XDAS_Int8 init_completed = 0;

    So, to get an "init length" of 0, please remove the initializers from these definitions and assign the values at run time in the code instead. Please also check which other sections are appropriate for NOINIT.

    Using .outPutBuffMem as the example, if we define "XDAS_Int32  nalSizes[IVIDMC_TI_MAXCORES][8160];" without an initial value, we get the following in the map file:

    run origin load origin length init length attrs members
    ---------- ----------- ---------- ----------- ----- -------

    ......

    9d3ee700 9d3ee700 00cc7800 00000000 rw-
    9d3ee700 9d3ee700 00cc7800 00000000 rw- .outPutBuffMem
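    As a rough sketch of what "initialize in the code" could look like for a few of the globals listed above: the typedefs and the two macro values below are stand-ins I am assuming for illustration (the real ones come from the codec headers), init_codec_globals() is a hypothetical name, and the DATA_SECTION pragmas only show one way the variables might be placed in these sections.

```c
#include <stdint.h>
#include <string.h>

/* Assumed stand-ins; the real definitions come from the codec headers. */
typedef int32_t XDAS_Int32;
typedef int8_t  XDAS_Int8;
#define IVIDMC_TI_MAXCORES   8     /* assumed from the 8 initializers above */
#define IVIDMC_CORE_NOT_USED (-1)  /* placeholder value */

/* No initializers, so these objects add nothing to "init length". */
#ifdef _TMS320C6X
#pragma DATA_SECTION(nalSizes, ".outPutBuffMem")
#pragma DATA_SECTION(CORE_TEAM_MAPPING, ".external_cached_mem")
#pragma DATA_SECTION(read_core_team_once, ".external_cached_mem")
#endif
XDAS_Int32 nalSizes[IVIDMC_TI_MAXCORES][8160];
XDAS_Int32 CORE_TEAM_MAPPING[IVIDMC_TI_MAXCORES];
XDAS_Int8  read_core_team_once;

/* Call once, before any of these globals is read. */
void init_codec_globals(void)
{
    int core;

    /* A NOINIT section is not cleared at startup, so zero it explicitly. */
    memset(nalSizes, 0, sizeof nalSizes);

    for (core = 0; core < IVIDMC_TI_MAXCORES; core++) {
        CORE_TEAM_MAPPING[core] = IVIDMC_CORE_NOT_USED;
    }
    read_core_team_once = 1;
}
```

    The remaining globals in the list can be handled the same way. In a multi-core application you would also need to decide which core performs this initialization.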

    Thanks,

    Hongmei

  • Hi Hongmei,

           Thank you for your suggestion.

    Although I reduced the initialized size of the two huge sections, my application still cannot reach main() in the cycle simulator. The generated .map file is attached here: 4571.h264hpvenc_ti_c66x.rar

    The program is always still in the function __TI_zero_init. It seems that the cycle simulator is still too slow to run...

    So do you have any suggestions on how to list the functions in order of cycles consumed, and how I can find the critical part of my application? This is important for deciding the priority of the optimization work.

    Thanks !

    Sunzhao

  • Hi Sunzhao,

    It looks like the "NOINIT" added in the .cfg file is not taking effect. One workaround is to create an application linker command file (with the content below), place it in the same directory as h264hpvenc.cfg, comment out these sections in h264hpvenc.cfg, and rebuild the application.

    SECTIONS{
      .external_cached_mem > ERAM type = NOINIT
      .inputbuffer_mem > ERAM type = NOINIT
      .outPutBuffMem > ERAM type = NOINIT
      .shared_mem_DDR2 > ERAM type = NOINIT
    }

    With the above change, we have verified that the application can run to main() in the CA simulator. Please give it a try and let us know if there are still issues. Meanwhile, I am following up on the issue with adding NOINIT in the .cfg file.

    Thanks,

    Hongmei

  • Hongmei and Sunzhao,
    the right way to use Program.sectMap is
    Program.sectMap["outPutBuffMem"] = new Program.SectionSpec();
    Program.sectMap["outPutBuffMem"].loadSegment = "ERAM";
    Program.sectMap["outPutBuffMem"].type = "NOINIT";

    Program.sectMap entries must be SectionSpec instances, if a user wants to define a type, different load and run addresses, etc for sections. Initially, Program.sectMap entries were String instances, and a user could only define a name of the memory object to which the section should be allocated. At the time when we introduced the type SectionSpec, we had too many scripts using Strings and we didn't want to break them so we disabled type checking for sectMap entries.
    The script code from this thread is assigning the value "NOINIT" to a property "type" of a String, but when our code recognizes a String, we only read the textual content of the string and ignore any other properties that might be attached to it. 

    We can do more to detect this issue and I hope we'll have some deprecation notice to warn users to switch to using SectionSpec entries.