Multi-DSP HEVC encoder hangs.

Hi,

The HEVC encoder (1.0.0.44) works well for me on a single DSP: it has been encoding a live stream for a few days without any issues. On multiple DSPs, however, it produces incorrect HEVC video and hangs after a few process calls.

Here you can find logs and output (MPEG-TS stream) of HEVC running on 16 cores:

drive.google.com/.../view

Notes about the implementation:

- I implemented all the synchronization primitives requested through the multicore API and required by the multi-DSP algorithm.
- I tested all the buffers shared between both DSPs: they really point to the same memory, but their addresses differ between DSPs because different OB registers are used.
- The output buffer is likewise shared between both DSPs.
- In logs you can see all keyCreate callbacks and their results.
- In logs you can see beginning of each process call: process inputID=1 outBuf=@64b9cbc4
- In logs you can see all shm sync callbacks.
- In logs you can see all read and write mailbox callbacks.

Questions:

1) Do you see any problem that could cause the encoder to hang?
2) Why are there no shm sync callbacks for the memory shared between DSPs (DDR_CACHED)?
3) A mailbox write initiates an interrupt on another DSP. The interrupt callback interrupts the process call task on that DSP and queries/reads the mailboxes. Is any additional synchronization required between the process call task and the interrupt callback?

Regards,
Andrey Lisnevich

  • Hi Andrey,

    We are analyzing the log file. Can you please try multi-tile encoding on multi-chip by setting the enableTiles flag to 1? In the multi-tile scenario there is less inter-chip communication.

    Can you please share the elementary stream instead of the MPEG-TS stream?

    1) From the description, the setup looks fine.
    2) Communication between adjacent chips goes through inter-chip memory, and the data is updated through DMA, so syncing is not required. The other memory used for communication is remote uncached memory, for which sync callbacks are not required either.
    3) No additional synchronization is required. Interrupts are not used in tile encoding, so it is better to get tile encoding working before moving to single-tile encoding.

    Regards

    Kuladeepak

  • Hi Kuladeepak,

    With tiles enabled, the encoder hangs in the first process call.

    enableTiles = 1;
    numTileColumns = 1;
    numTileRows = 2;

    Logs: drive.google.com/.../view

    core#0 waits in a barrier for core#8 (i.e. core#0 of DSP1).
    cores#1-7 are done with the process call.
    cores#8-15 wait forever, repeatedly acquiring and releasing a lock.

    Also I have another issue, probably related to this one - artifacts even on single DSP when tiles are enabled - e2e.ti.com/.../406319

    Regards,
    Andrey Lisnevich
  • Hi Andrey,

    It would be better to take a newer version of the encoder.

    Can you please share the config?

    Regards

    Kuladeepak
  • Hi Kuladeepak,

    HEVC 01.00.00.44 is the latest officially released encoder.

    The configuration:

    params.videnc2Params.encodingPreset= XDM_USER_DEFINED;
    params.videnc2Params.rateControlPreset = IVIDEO_STORAGE;
    params.videnc2Params.maxWidth = 704;
    params.videnc2Params.maxHeight = 576;

    params.videnc2Params.dataEndianness = XDM_BYTE;
    params.videnc2Params.maxInterFrameInterval = 1;
    params.videnc2Params.maxBitRate= 600000;
    params.videnc2Params.minBitRate= 600000;
    params.videnc2Params.inputChromaFormat = XDM_YUV_420P;
    params.videnc2Params.inputContentType = IVIDEO_PROGRESSIVE;
    params.videnc2Params.operatingMode = IVIDEO_ENCODE_ONLY;
    params.videnc2Params.profile = IH265_MAIN_PROFILE;
    params.videnc2Params.level = IH265_LEVEL_41;
    params.videnc2Params.inputDataMode = IVIDEO_ENTIREFRAME;
    params.videnc2Params.outputDataMode= IVIDEO_ENTIREFRAME;
    params.videnc2Params.numInputDataUnits = 1;
    params.videnc2Params.numOutputDataUnits= 1;

    int i;
    for (i = 0; i < IVIDEO_MAX_NUM_METADATA_PLANES; i++) {
        params.videnc2Params.metadataType[i] = IVIDEO_METADATAPLANE_NONE;
    }

    params.scalingMatrixPreset = IH265_SCALINGMATRIXPRESET_DEFAULT;
    params.decRefreshType = 0;
    params.decRefreshInterval = 1;
    params.enableTransQuantBypass = 0;
    params.maxPoc = 256;
    params.enableTransformSkip = 0;
    params.maxIntraFrameInterval = 120;
    params.enableWPP = 0;
    params.maxNumRefFrames = 1;
    params.enableVirtualTile = 0;
    params.debugTraceLevel = 0;
    params.lastNFramesToLog = 0;

    params.rateControlParams.rateControlParamsPreset = IH265_RATECONTROLPARAMS_USERDEFINED;
    params.rateControlParams.rcAlgo = 0;
    params.rateControlParams.qpI = -1;
    params.rateControlParams.qpMaxI = 40;
    params.rateControlParams.qpMinI = 12;
    params.rateControlParams.qpP = 28;
    params.rateControlParams.qpMaxP = 51;
    params.rateControlParams.qpMinP = 12;
    params.rateControlParams.qpOffsetB = 4;
    params.rateControlParams.qpMaxB = 51;
    params.rateControlParams.qpMinB = 12;
    params.rateControlParams.enableFrameSkip = 0;
    params.rateControlParams.enablePartialFrameSkip = 0;
    params.rateControlParams.qualityFactorIP = 0;
    params.rateControlParams.cbQPIndexOffset = 2;
    params.rateControlParams.crQPIndexOffset = 2;
    params.rateControlParams.initialBufferLevel = 1200000;
    params.rateControlParams.hrdBufferSize = 1200000;
    params.rateControlParams.enableHRDComplianceMode = 0;
    params.rateControlParams.maxFrameSkipCnt = 0;
    params.rateControlParams.SubFrameRC = 1;
    params.rateControlParams.maxDeltaQP = 0;
    params.rateControlParams.enablePRC = 0;

    params.loopFilterParams.loopFilterParamsPreset = IH265_SLICECODINGPRESET_USERDEFINED;
    params.loopFilterParams.enableDeblockFilter = 1;
    params.loopFilterParams.enableSaoFilter = 1;
    params.loopFilterParams.enableLoopFilterSliceBoundary = 0;
    params.loopFilterParams.enableLoopFilterTileBoundary = 0;
    params.loopFilterParams.separateCbCrSAO = 0;
    params.loopFilterParams.offsetLoopFilterInPPSFlag = 0;
    params.loopFilterParams.offsetDeblockBetaDiv2 = 0;
    params.loopFilterParams.offsetDeblockTcDiv2 = 0;

    params.gopCntrlParams.gopCntrlParamsPreset = IH265_GOPCTRLPRESET_DEFAULT;

    params.sliceCodingParams.sliceCodingPreset = IH265_SLICECODINGPRESET_USERDEFINED;
    params.sliceCodingParams.sliceCodingMode = 0;
    params.sliceCodingParams.sliceCodingArg = 0;
    params.sliceCodingParams.enableTiles = 1;
    params.sliceCodingParams.numTileColumns = 1;
    params.sliceCodingParams.numTileRows = 2;
    params.sliceCodingParams.enableDependentSlice = 0;

    params.intraCodingParams.intraCodingPreset = IH265_INTRACODINGPRESET_USERDEFINED;
    params.intraCodingParams.intraRefreshMethod = 0;
    params.intraCodingParams.intraRefreshRate = 0;
    params.intraCodingParams.constrainedIntraPredEnable = 0;
    params.intraCodingParams.enableStrongIntraSmoothing = 1;
    params.intraCodingParams.matchYCbCrIntraMode = 0;
    params.intraCodingParams.enableLumaIntra4x4Mode = 0;
    params.intraCodingParams.enableLumaIntra8x8Mode = 0;
    params.intraCodingParams.enableLumaIntra16x16Mode = 0;
    params.intraCodingParams.enableLumaIntra32x32Mode = 0;
    params.intraCodingParams.enableChromaIntra4x4Mode = 0;
    params.intraCodingParams.enableChromaIntra8x8Mode = 0;
    params.intraCodingParams.enableChromaIntra16x16Mode = 0;

    params.interCodingParams.interCodingPreset = IH265_INTERCODINGPRESET_USERDEFINED;
    params.interCodingParams.enableTmvp = 0;
    params.interCodingParams.searchRangeHorP = 144;
    params.interCodingParams.searchRangeVerP = 32;
    params.interCodingParams.searchRangeHorB = 144;
    params.interCodingParams.searchRangeVerB = 32;
    params.interCodingParams.interCodingBias = 0;
    params.interCodingParams.skipMVCodingBias = 0;
    params.interCodingParams.numMergeCandidates = 3;
    params.interCodingParams.enableBiPredMode = 0;
    params.interCodingParams.enableFastIntraAlgo = 1;

    params.vuiCodingParams.vuiCodingPreset = IH265_VUICODINGPRESET_DEFAULT;
    params.vuiCodingParams.aspectRatioInfoPresentFlag = 1;
    params.vuiCodingParams.aspectRatioIdc = IH265_ASPECTRATIO_EXTENDED;
    params.vuiCodingParams.videoSignalTypePresentFlag = 0;
    params.vuiCodingParams.videoFormat = 0;
    params.vuiCodingParams.videoFullRangeFlag = 0;
    params.vuiCodingParams.colourDescriptionPresentFlag = 0;
    params.vuiCodingParams.colourPrimaries = 0;
    params.vuiCodingParams.transferCharacteristics = 0;
    params.vuiCodingParams.matrixCoefficients = 0;
    params.vuiCodingParams.timingInfoPresentFlag = 0;

    params.seiParams.enableSeiFlag = 0;

    params.ctbCodingParams.maxCTBSize = 64;
    params.ctbCodingParams.maxCUDepth = 3;

    dynamicParams.videnc2DynamicParams.forceFrame = IVIDEO_NA_FRAME;
    dynamicParams.videnc2DynamicParams.generateHeader = XDM_ENCODE_AU;
    dynamicParams.videnc2DynamicParams.ignoreOutbufSizeFlag = XDAS_FALSE;
    dynamicParams.videnc2DynamicParams.inputWidth = 704;
    dynamicParams.videnc2DynamicParams.inputHeight = 576;
    dynamicParams.videnc2DynamicParams.interFrameInterval = 1;
    dynamicParams.videnc2DynamicParams.intraFrameInterval = 120;
    dynamicParams.videnc2DynamicParams.mvAccuracy = IVIDENC2_MOTIONVECTOR_QUARTERPEL;
    dynamicParams.videnc2DynamicParams.putDataFxn = NULL;
    dynamicParams.videnc2DynamicParams.putDataHandle = 0;
    dynamicParams.videnc2DynamicParams.getDataFxn = NULL;
    dynamicParams.videnc2DynamicParams.getDataHandle = 0;
    dynamicParams.videnc2DynamicParams.getBufferFxn = NULL;
    dynamicParams.videnc2DynamicParams.getBufferHandle = 0;
    dynamicParams.videnc2DynamicParams.refFrameRate = 25000;
    dynamicParams.videnc2DynamicParams.targetFrameRate = 25000;
    dynamicParams.videnc2DynamicParams.sampleAspectRatioWidth = 1;
    dynamicParams.videnc2DynamicParams.sampleAspectRatioHeight = 1;
    dynamicParams.videnc2DynamicParams.targetBitRate = 600000;

    memcpy(&dynamicParams.rateControlParams, &params.rateControlParams, sizeof(params.rateControlParams));
    memcpy(&dynamicParams.loopFilterParams, &params.loopFilterParams, sizeof(params.loopFilterParams));
    memcpy(&dynamicParams.intraCodingParams, &params.intraCodingParams, sizeof(params.intraCodingParams));
    memcpy(&dynamicParams.interCodingParams, &params.interCodingParams, sizeof(params.interCodingParams));
    memcpy(&dynamicParams.ctbCodingParams, &params.ctbCodingParams, sizeof(params.ctbCodingParams));
    memcpy(&dynamicParams.sliceCodingParams, &params.sliceCodingParams, sizeof(params.sliceCodingParams));

    dynamicParams.enableTransQuantBypass = params.enableTransQuantBypass;
    dynamicParams.enableTransformSkip = params.enableTransformSkip;
    dynamicParams.enableROI = 0;

    Regards,
    Andrey Lisnevich

  • Hi Kuladeepak,

    Do you have any news on this?
    Do you see from the logs why the encoder hangs?

    Regards,
    Andrey Lisnevich
  • Hi,

    I've finally integrated the HEVC 2.0.0.0 encoder, but I still have issues running it on multiple DSPs.

    I am trying to run HEVC encoder 704x576 on 2 DSPs with 2 tiles enabled:

              enableTiles = 1;
              numTileColumns = 2;
              numTileRows = 1;
              subFrameRC = 0;

    But it produces incorrect output and eventually hangs. A sample of the output (raw HEVC bitstream): 3730.out.123

    At the same time, the single-DSP configuration built from the same code base produces a correct bitstream.

    I validate that the input buffers are the same on all cores and DSPs by computing a checksum on each core.

    I validate that the output buffer, located in x86 memory, is the same on all cores and DSPs. The address of the output buffer differs between DSPs because of different OB register mappings, but I check that it points to the same x86 memory on all the DSPs.

    The addresses of the RMT* shm memory blocks differ between DSPs because of different OB register mappings, but I check that they point to the same x86 memory on all the DSPs.

    I also validated the other multi-DSP (barrier, shm) and single-DSP (barrier, shm, lock) keys and their functionality.

    Logs with details: 1346.logs.zip

    I failed to find the reason. Can you please help to find out what is wrong?


    Regards,

    Andrey Lisnevich

  • Hi Kuladeepak,

    HEVC 2.0.0.0 is integrated, but it still doesn't work correctly for me when it runs on multiple DSPs. Details are in the post above.
    Can you please help to find out what is wrong?

    Regards,
    Andrey Lisnevich
  • Hi Andrey,
    After analysing the bitstream, it looks like the data from the other chip (Chip-1) is not updated in the output buffer.
    In the multi-chip, multi-tile scenario, each chip stores its bitstream in local DDR. After emulation, the bitstream is again stored in local DDR. The local master then DMAs the bitstream to the x86 output memory.

    Can you check the bitstream at these locations at the end of the process call?
    shmem name = "shared_mem_bitstreamXX" -> each chip updates the bitstream in this memory.
    shmem name = "shared_mem_SwappedStreamXX" -> after emulation, each chip updates the bitstream in this memory.

    The bitstream offset for each chip is updated in RMT_Uncached memory by that chip, and the bitstream is transferred from these offsets.
    shmem name = "shared_mem_chip2chip" -> RMT_Uncached memory where each chip updates its bitstream size.
    Looking at the log, this memory address is different on each chip, but it should be the same.

    Regards
    Kuladeepak

  • Hi Kuladeepak,

    I modified the code so that the OB register mapping is the same on Chip-0 and Chip-1. Now I get a different result that is still not correct but looks better: 4237.hevc.123

    So I may already assume that the current HEVC encoder requires the x86 output buffer to be mapped to the same address (i.e. the same OB register) on all DSPs. That in turn does not seem right, and it makes dynamic x86 buffer management much harder for apps that dynamically create and delete encoders.

    Can you confirm this?

    Regards,

    Andrey Lisnevich

  • Hi Kuladeepak,

    Here are more detailed logs with dumps of shared_mem_chip2chip, shared_mem_bitstream, shared_mem_SwappedStream and the output buffer after each process call on core#0 of each DSP.

    The archive also contains logs of all keyCreate calls with details. Before each process call it prints the in and out buffers. For the in buffers it computes a CRC32 to ensure that the input is the same.

    Generated HEVC bitstream is also in the archive.

    Do you see the problems there?


    6523.hevc_dump.zip

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    Yes. The x86 output buffer needs to be mapped to the same address.

    Regards

    Kuladeepak

  • Hi Kuladeepak,

    1) Will this limitation be removed in the release version?

    I believe it is rather easy to fix in the encoder, and it would make it easier for dynamic applications to use OB registers on DSPs. For example, imagine a transcoding cluster with a load balancer. If you need to run HEVC on 8 DSPs, the balancer has to find 8 DSPs with the same OB registers available, which makes the balancer logic much harder. You also need to create a custom driver for that, because your default driver doesn't let you choose specific OB registers: it allocates OBs one by one.

    2) Even when the x86 buffer is mapped to the same address, I get incorrect output. Can you please check the logs with dumps above?

    3) What other addresses should be the same? I hope that the addresses of the input buffers can be different.

    Thanks in advance,
    Andrey Lisnevich

  • Hi Andrey,
    I didn't find any issue in the memory allocations.
    Also, the shared bitstreams are decodable without error, which rules out any bitstream issue.

    For more debugging, I need your test app for 2 chips. Without it I cannot find anything of concern.
    Can you please share the multi-chip test app solution?

    Regards
    Kuladeepak

  • Hi Kuladeepak,

    Do you have answers to these two questions?


    1) Do the input buffer addresses have to be the same on all chips, as the output buffer address does?

    2) Can I hope that future releases will remove the requirement for the same output buffer address on all chips?

    Regarding the demo, I will try to prepare it. Is it OK if the demo runs on Linux and uses a custom driver based on Advantech Lightning (not CMEM)?

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    1) Yes, the input buffer address also needs to be handled the same way as the output buffer address.
    2) As of now we have no plans to change how buffer addresses are handled. We would need to analyse what changes the codec requires for a new buffer management design.

    TI's MCSDK also runs on Linux; I am not sure about Advantech Lightning, but we can work it out.

    Regards
    Kuladeepak

  • Hi Kuladeepak,

    I created a demo:

    Instructions on how to build and run it are in the README file. Feel free to ask any questions.

    Regards,

    Andrey Lisnevich

  • Hi Andrey Lisnevich,

    When I tried to build the host application with the option -DDESKTOP_LINUX_SDK_PATH, the make command did not recognize the option "DDESKTOP_LINUX_SDK_PATH" (make: invalid option -D).

    Please let me know if there is any prerequisite for running the make command.

    Thanks and Regards,

    Palachandra M V

  • Hi,

    My fault. Run cmake and then make:

    3) Run CMake with path to your Desktop Linux SDK:
    cmake -DCMAKE_BUILD_TYPE=Debug -DDESKTOP_LINUX_SDK_PATH=/path/to/desktop/linux/sdk .
    4) Run make:
    make

    Regards,
    Andrey Lisnevich
  • Hi Andrey,

    I have a few queries regarding execution of the hevc demo solution.

    Execution of hevc demo on single chip:
    1. When I built and ran the solution on a single chip (hevc_demo 1) as suggested in the README file, an error and a warning were displayed on the console:
    Error message "Input error: DSP input failed"
    Warning message "BFD: dsp.out: warning: sh_link not set for section `.c6xabi.exidx'"

    2. Error message logged in one of the log files, console-0-7.log:
    ERROR: [0x0 0] Raw video input cyclic buffer error (incorrect header size)

    3. Can you please let me know whether I am missing any dependency?

    Execution of solution using ccs:
    1. To run code in CCS, we reset all the cores and download the .out file before running the solution. Can you please let me know where the .out file is actually downloaded to the cores?

    Thanks and Regards,
    Palachandra M V

  • Hi,

    BFD: dsp.out: warning: sh_link not set for section `.c6xabi.exidx' -- this is a bfd library warning. You can ignore it; my .out file is not the only one that triggers it.

    This demo doesn't require running code through CCS. It is hardcoded to load the dsp.out file to the DSPs using the Desktop Linux SDK libraries/driver. Is that OK?

    In host/bin there is already a pre-built dsp.out file, but you can build your own dsp.out by importing and building the CCS project in the 'dsp' directory.

    If you see:
    ERROR: [0x0 0] Raw video input cyclic buffer error (incorrect header size)

    it means that you ran the application correctly, the I/O buffers were configured, you can see console output from the DSP, and the application works and generates the out.265 file. But something went wrong (probably you attempted to run something else on the DSP through CCS while the application was already running there).

    Can you share the full log and let me know whether out.265 is generated?

    Regards,
    Andrey Lisnevich
  • Hi Andrey,

    Execution of hevc demo on single chip:

    I tried running the demo with the default .out file provided with the solution and with a .out file built using CCS. In both scenarios out.265 is generated with zero file size, and an error message appears in the log file console-0-7.log. Attaching log files for reference.

    Execution of solution using ccs:

    For step-by-step execution we reset the cores, download the solution to the cores and then run it from CCS. Can you please let me know the steps for debugging with the demo solution?

    Thanks and Regards,

    Palachandra M V

    7266.logs.7z

  • Hi Palachandra,

    I got your point and will try to prepare a solution that runs in CCS.

    Regarding the logs: I do not see why it fails. The 8 MiB mapped buffer for communication with each DSP (commands and consoles) works correctly: you can see in the DSP logs how the encoder and raw video input components are created. But for some reason communication through the 8 MiB YUV input buffer fails. I will double-check this.

    Regards,
    Andrey Lisnevich
  • Hi Palachandra,

    You can find the updated demo at this URL: drive.google.com/.../view

    Both the DSP and host applications were slightly updated so that the code can be run and debugged through CCS.

    The README file contains updated instructions. Just run "./hevc_demo 1" for single-DSP mode or "./hevc_demo 2" for two-DSP mode, and then run the DSP application on all the DSPs (i.e. 4 DSPs in the case of DSPC-8681) using CCS.

    DSP logs are duplicated in console-?-?.log and in the CCS console output.
    out.265 contains the generated HEVC bitstream. It contains artifacts in the two-DSP scenario.

    Regards,
    Andrey Lisnevich
  • Hi Andrey,

    When I run the updated hevc demo application in CCS, I get logs on the CCS console and in the .log files.

    In the single-chip scenario, out.265 is created with zero size and there is an error in [core 7]:
    - Error Message: "Raw video input cyclic buffer error (incorrect header size)".

    In the two-chip scenario the same error message appears in [core 7] and [core 15].

    I am using a DSPC-8682 to run the demo application.

    I am attaching the log files and solution. Can you please help me resolve the error?

    Can you please share the out.265 files generated in the one-chip and two-chip scenarios?

    Thanks and Regards,

    Palachandra M V

    7725.logs.7z

  • Hi Palachandra,

    I tested the demo with your dsp.out and it works OK.

    Please check the video showing how I 1) run your dsp.out and 2) debug the CCS project:

    drive.google.com/.../view

    For better quality, download the video before watching it.

    Regards,
    Andrey Lisnevich

  • Hi Andrey,

    Thank you for sharing the video clip. We both follow the same process to load the solution into CCS.

    It looks like there is an issue with structure alignment in my scenario.

    The sizes of all the elements of the structure InputPacketHeader sum to 48 bytes.

    However, since the element inputPacketHeader.pts is of type int64_t (long long), it is aligned to an 8-byte boundary, so sizeof(inputPacketHeader) is 56:

    - &(inputPacketHeader.marker) = 0x008489A0

    - &(inputPacketHeader.pts) = 0x008489A8 (without padding it would be 0x008489A4, since inputPacketHeader.marker is of type uint32_t)

    Since CyclicBuffer_getMessage returns bytesRead = 48 while sizeof(inputPacketHeader) is 56, the error "incorrect header size" is reported.

    Because of the 4 bytes of alignment padding, all the values from address &(inputPacketHeader.pts) onward are shifted by 4 bytes after the memcpy in CyclicBuffer_getMessage.

    Attaching a screenshot for reference: the right side of the image shows the memory addresses of the inputPacketHeader elements and the (inputPacketHeader.parameters) values.

    Can you please suggest a change to correct the alignment of the structure inputPacketHeader?

    Since you are observing the artifact in the two-chip scenario, please share the out.265 files generated in the one-chip and two-chip scenarios.

    Thanks and Regards,

    Palachandra M V

  • Hi Palachandra,

    Yes, it looks like the structure member alignment done by the c66x compiler differs from that of your host compiler. Strange, it was always the same for me. I use 64-bit Ubuntu.

    This patch should fix the issue:

    typedef struct {
       uint32_t marker __attribute__((aligned(8)));
       int64_t pts __attribute__((aligned(8)));
       int32_t size __attribute__((aligned(8)));
       VideoStreamParameters parameters __attribute__((aligned(8)));
    } InputPacketHeader;

    typedef struct {
       int32_t frameType __attribute__((aligned(8)));
       int32_t size __attribute__((aligned(8)));
       int64_t pts __attribute__((aligned(8)));
       int64_t dts __attribute__((aligned(8)));
    } OutputPacketHeader;

    Both GCC and the c66x compiler should support the aligned attribute.

    These are the output HEVC elementary streams for the 1-DSP and 2-DSP scenarios:

    7851.out.265.zip

    Regards,

    Andrey Lisnevich

  • Hi Andrey,

    We are able to reproduce the artifact with your demo setup in the 2-chip scenario with the older library version.

    It looks like the issue is addressed in the latest release version.

    Please find the latest release library attached.

    Please let me know if the artifact persists with the latest release version.

    7587.h265venc_ti.7z

    Thanks and Regards,

    Palachandra M V

  • Hi Palachandra,

    I am still integrating and testing the updated library. No artifacts now.

    But not all configurations work. When I try to run the library on 4 DSPs without tiles enabled, I get conflicts in the names of multi-DSP keys:

    DSP0:
    keyCreate shmem name=shared_mem_CABAC_Context03 user_id=0 num_users=2 user_ids=0,24 type=DDR_CACHED size=1600 alignment=128

    DSP3:
    keyCreate shmem name=shared_mem_CABAC_Context03 user_id=8 num_users=2 user_ids=8,16 type=DDR_CACHED size=1600 alignment=128

    As you can see, different multi-DSP keys are created with the same name. Since the multi-DSP key registry is common to all DSPs (as it should be), I get a conflict.

    Another similar unresolved conflict is described at this URL: e2e.ti.com/.../409584
    In this case the HEVC encoder creates 2 keys with the same name on the same DSP.

    Regards,
    Andrey Lisnevich
  • Hi Andrey,


    Please find my findings inline.

    Query 1: Conflicts in names of multi-DSP keys:

    [Palachandra]:

    - Keys used for data sharing in inter-chip communication have the same names.

    - Although the memory is allocated on different chips, the keys virtually refer to the same memory between two neighboring chips, in cyclic fashion.

    - When I run the encoder in the 4-chip configuration without tiles, I observe that

    DSP0:
    keyCreate shmem name=shared_mem_CABAC_Context03 user_id=0 num_users=2 user_ids=0,24 type=DDR_CACHED size=1600 alignment=128

    DSP3:
    keyCreate shmem name=shared_mem_CABAC_Context03 user_id=24 num_users=2 user_ids=0,24 type=DDR_CACHED size=1600 alignment=128

    Here shared_mem_CABAC_Context03 on DSP0 indicates backward memory access key creation, whereas shared_mem_CABAC_Context03 on DSP3 indicates forward memory access key creation. Although the allocation is on two different chips, they refer to the same memory.

    Query 2: Unresolved naming conflict:

    [Palachandra M V] : I have tried to answer the query in the original thread.

    Thanks and Regards,

    Palachandra M V

  • Thanks Palachandra,

    My fault: wrong DSP numbers in my previous post. It should be:

    DSP0:
    keyCreate shmem name=shared_mem_CABAC_Context03 user_id=0 num_users=2 user_ids=0,24 type=DDR_CACHED size=1600 alignment=128

    DSP1:
    keyCreate shmem name=shared_mem_CABAC_Context03 user_id=8 num_users=2 user_ids=8,16 type=DDR_CACHED size=1600 alignment=128

    Of course, user_id=0 means the first core of the first DSP and user_id=8 means the first core of the next DSP. So the problem with key naming exists.

    Regards,
    Andrey Lisnevich
  • Hi Andrey,

    Please find my findings inline:

    Query 1 : Conflicts in names of multi-DSP keys :

    [Palachandra] :

    - We are not observing an issue with conflicts in the names of multi-DSP keys.

    - I have made minor modifications to key naming to resolve the naming conflict for multi-DSP keys.

    - Keys used for data sharing in inter-chip communication share the same key name.

    When I run the encoder in the 4-chip configuration without tiles (with the key naming modifications), I observe that

    - Memory allocated in the forward direction on one chip is virtually the same as memory allocated in the backward direction on the next chip, hence they share the same name.

    - Apart from this, all keys allocated on different chips have unique names.



    Memory Allocations in Forward Direction:

    DSP0:
    keyCreate shmem name=shared_mem_CABAC_Context00 user_id=0 num_users=2 user_ids=0,8 type=DDR_CACHED size=1600 alignment=128

    DSP1:
    keyCreate shmem name=shared_mem_CABAC_Context01 user_id=8 num_users=2 user_ids=8,16 type=DDR_CACHED size=1600 alignment=128

    DSP2:
    keyCreate shmem name=shared_mem_CABAC_Context02 user_id=0 num_users=2 user_ids=16,24 type=DDR_CACHED size=1600 alignment=128

    DSP3:
    keyCreate shmem name=shared_mem_CABAC_Context03 user_id=8 num_users=2 user_ids=0,24 type=DDR_CACHED size=1600 alignment=128



    Memory Allocations in Backward Direction:


    DSP1:
    keyCreate shmem name=shared_mem_CABAC_Context00 user_id=8 num_users=2 user_ids=0,8 type=DDR_CACHED size=1600 alignment=128
    (Virtually same location as DSP0 Forward Allocation)

    DSP2:
    keyCreate shmem name=shared_mem_CABAC_Context01 user_id=0 num_users=2 user_ids=8,16 type=DDR_CACHED size=1600 alignment=128
    (Virtually same location as DSP1 Forward Allocation)

    DSP3:
    keyCreate shmem name=shared_mem_CABAC_Context02 user_id=8 num_users=2 user_ids=16,24 type=DDR_CACHED size=1600 alignment=128
    (Virtually same location as DSP2 Forward Allocation)

    DSP0:
    keyCreate shmem name=shared_mem_CABAC_Context03 user_id=0 num_users=2 user_ids=0,24 type=DDR_CACHED size=1600 alignment=128
    (Virtually same location as DSP3 Forward Allocation)


    - I have attached the updated library.

    - Can you please check whether the conflicts in the naming of multi-DSP keys are resolved?

    6505.h265venc_ti.7z


    Thanks and Regards,

    Palachandra M V

  • Hi Andrey,

    Can you please let us know whether the changes in the updated library have resolved the conflict in the names of multi-DSP keys?

    Regards
    Palachandra M V
  • Thanks! It solved my issues.