• Resolved

Multi-DSP HEVC encoder hangs.

Hi,

The HEVC encoder (1.0.0.44) works well for me on a single DSP: it has run a live stream for a few days without any issues. On multiple DSPs, however, it produces incorrect HEVC video and hangs after a few process calls.

Here you can find logs and output (MPEG-TS stream) of HEVC running on 16 cores:

drive.google.com/.../view

Notes about the implementation:

- I implemented all the synchronization primitives requested through the multicore API and required by the multi-DSP algorithm.
- I tested all buffers shared between the two DSPs: they really point to the same memory, but their addresses differ between DSPs because different OB registers are used.
- The output buffer is likewise shared between both DSPs.
- In the logs you can see all keyCreate callbacks and their results.
- In the logs you can see the beginning of each process call: process inputID=1 outBuf=@64b9cbc4
- In the logs you can see all shm sync callbacks.
- In the logs you can see all mailbox read and write callbacks.
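
The "same memory, different addresses" check can be expressed by translating each DSP-local address back to the host address through that DSP's outbound (OB) window. This is a minimal sketch only: `ob_window_t`, `ob_to_host` and `same_host_buffer` are hypothetical names, and the real OB register layout is not given in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one outbound (OB) window on one DSP. */
typedef struct {
    uint32_t dsp_base;   /* DSP-side start address of the window */
    uint64_t host_base;  /* host (x86) address the window maps to */
    uint32_t size;       /* window size in bytes */
} ob_window_t;

/* Translate a DSP-local address to a host address.
 * Returns false if the address lies outside the window. */
static bool ob_to_host(const ob_window_t *w, uint32_t dsp_addr,
                       uint64_t *host_addr)
{
    if (dsp_addr < w->dsp_base || dsp_addr - w->dsp_base >= w->size) {
        return false;
    }
    *host_addr = w->host_base + (dsp_addr - w->dsp_base);
    return true;
}

/* True when two DSP-local addresses resolve to the same host address. */
static bool same_host_buffer(const ob_window_t *w0, uint32_t addr0,
                             const ob_window_t *w1, uint32_t addr1)
{
    uint64_t h0, h1;
    return ob_to_host(w0, addr0, &h0) &&
           ob_to_host(w1, addr1, &h1) &&
           h0 == h1;
}
```

Two buffers pass this check whenever their offsets inside windows with the same host base match, even if the DSP-side window bases differ.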

Questions:

1) Do you see any problems that could cause the encoder to hang?
2) Why are there no shm sync callbacks for the memory shared between DSPs (DDR_CACHED)?
3) A mailbox write raises an interrupt on the other DSP. The interrupt callback interrupts the process call task on that DSP and queries/reads the mailboxes. Is any additional synchronization required between the process call task and the interrupt callback?

Regards,
Andrey Lisnevich

  • Hi Andrey,

    We are analyzing the log file. Can you please try multi-tile encoding on multi-chip by setting the enableTiles flag to 1? In the multi-tile scenario there is less inter-chip communication.

    Can you please share the elementary stream instead of the MPEG-TS format?

    1) From the description, the setup looks fine.
    2) Communication between adjacent chips goes through inter-chip memory, and the data is updated through DMA, so syncing is not required. The other memory used for communication is remote uncached memory, where sync callbacks are not required.
    3) No additional synchronization is required. Interrupts are not used in tile encoding, so it is better to get tile encoding working before moving to single-tile encoding.

    Regards

    Kuladeepak

  • In reply to Kuladeepak Gowda:

    Hi Kuladeepak,

    With tiles enabled, the encoder hangs in the first process call.

    enableTiles = 1;
    numTileColumns = 1;
    numTileRows = 2;

    Logs: drive.google.com/.../view

    core#0 waits in a barrier for core#8 (i.e. core#0 of DSP1).
    cores#1-7 are done with the process call.
    cores#8-15 wait forever for something in lock acquire/release.
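
    A hang of this kind is easier to localize if every core records its arrival at a barrier, so that the waiting core can report which participants never showed up. The following is a minimal sketch using C11 atomics. `barrier_trace_t` and its helper functions are hypothetical and not part of the encoder's multicore API, and on the DSPs the mask would have to live in memory that is coherent between all participating cores.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bookkeeping: one bit per core that reached the barrier. */
typedef struct {
    atomic_uint_fast32_t arrived;
} barrier_trace_t;

/* Clear the arrival mask before a barrier is (re)used. */
static void barrier_trace_reset(barrier_trace_t *t)
{
    atomic_init(&t->arrived, 0);
}

/* Called by a core right before it starts waiting in the barrier. */
static void barrier_trace_arrive(barrier_trace_t *t, unsigned core_id)
{
    atomic_fetch_or(&t->arrived, (uint_fast32_t)1 << core_id);
}

/* Returns the mask of expected cores that have not arrived yet. */
static uint32_t barrier_trace_missing(barrier_trace_t *t, uint32_t expected)
{
    return expected & ~(uint32_t)atomic_load(&t->arrived);
}
```

    For a barrier between core#0 and core#8 the expected mask would be 0x0101; if only core#0 has arrived, the missing mask is 0x0100, which points at core#8.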

    I also have another issue that is probably related to this one: artifacts even on a single DSP when tiles are enabled - e2e.ti.com/.../406319

    Regards,
    Andrey Lisnevich
  • In reply to Andrey Lisnevich:

    Hi Andrey,

    It would be better to use a newer version of the encoder.

    Can you please share the config?

    Regards

    Kuladeepak
  • In reply to Kuladeepak Gowda:

    Hi Kuladeepak,

    HEVC 01.00.00.44 is the latest officially released encoder.

    The configuration:

    params.videnc2Params.encodingPreset= XDM_USER_DEFINED;
    params.videnc2Params.rateControlPreset = IVIDEO_STORAGE;
    params.videnc2Params.maxWidth = 704;
    params.videnc2Params.maxHeight = 576;

    params.videnc2Params.dataEndianness = XDM_BYTE;
    params.videnc2Params.maxInterFrameInterval = 1;
    params.videnc2Params.maxBitRate= 600000;
    params.videnc2Params.minBitRate= 600000;
    params.videnc2Params.inputChromaFormat = XDM_YUV_420P;
    params.videnc2Params.inputContentType = IVIDEO_PROGRESSIVE;
    params.videnc2Params.operatingMode = IVIDEO_ENCODE_ONLY;
    params.videnc2Params.profile = IH265_MAIN_PROFILE;
    params.videnc2Params.level = IH265_LEVEL_41;
    params.videnc2Params.inputDataMode = IVIDEO_ENTIREFRAME;
    params.videnc2Params.outputDataMode= IVIDEO_ENTIREFRAME;
    params.videnc2Params.numInputDataUnits = 1;
    params.videnc2Params.numOutputDataUnits= 1;

    int i;
    for (i = 0; i < IVIDEO_MAX_NUM_METADATA_PLANES; i++) {
        /* No metadata planes are used. */
        params.videnc2Params.metadataType[i] = IVIDEO_METADATAPLANE_NONE;
    }

    params.scalingMatrixPreset = IH265_SCALINGMATRIXPRESET_DEFAULT;
    params.decRefreshType = 0;
    params.decRefreshInterval = 1;
    params.enableTransQuantBypass = 0;
    params.maxPoc = 256;
    params.enableTransformSkip = 0;
    params.maxIntraFrameInterval = 120;
    params.enableWPP = 0;
    params.maxNumRefFrames = 1;
    params.enableVirtualTile = 0;
    params.debugTraceLevel = 0;
    params.lastNFramesToLog = 0;

    params.rateControlParams.rateControlParamsPreset = IH265_RATECONTROLPARAMS_USERDEFINED;
    params.rateControlParams.rcAlgo = 0;
    params.rateControlParams.qpI = -1;
    params.rateControlParams.qpMaxI = 40;
    params.rateControlParams.qpMinI = 12;
    params.rateControlParams.qpP = 28;
    params.rateControlParams.qpMaxP = 51;
    params.rateControlParams.qpMinP = 12;
    params.rateControlParams.qpOffsetB = 4;
    params.rateControlParams.qpMaxB = 51;
    params.rateControlParams.qpMinB = 12;
    params.rateControlParams.enableFrameSkip = 0;
    params.rateControlParams.enablePartialFrameSkip = 0;
    params.rateControlParams.qualityFactorIP = 0;
    params.rateControlParams.cbQPIndexOffset = 2;
    params.rateControlParams.crQPIndexOffset = 2;
    params.rateControlParams.initialBufferLevel = 1200000;
    params.rateControlParams.hrdBufferSize = 1200000;
    params.rateControlParams.enableHRDComplianceMode = 0;
    params.rateControlParams.maxFrameSkipCnt = 0;
    params.rateControlParams.SubFrameRC = 1;
    params.rateControlParams.maxDeltaQP = 0;
    params.rateControlParams.enablePRC = 0;

    params.loopFilterParams.loopFilterParamsPreset = IH265_SLICECODINGPRESET_USERDEFINED;
    params.loopFilterParams.enableDeblockFilter = 1;
    params.loopFilterParams.enableSaoFilter = 1;
    params.loopFilterParams.enableLoopFilterSliceBoundary = 0;
    params.loopFilterParams.enableLoopFilterTileBoundary = 0;
    params.loopFilterParams.separateCbCrSAO = 0;
    params.loopFilterParams.offsetLoopFilterInPPSFlag = 0;
    params.loopFilterParams.offsetDeblockBetaDiv2 = 0;
    params.loopFilterParams.offsetDeblockTcDiv2 = 0;

    params.gopCntrlParams.gopCntrlParamsPreset = IH265_GOPCTRLPRESET_DEFAULT;

    params.sliceCodingParams.sliceCodingPreset = IH265_SLICECODINGPRESET_USERDEFINED;
    params.sliceCodingParams.sliceCodingMode = 0;
    params.sliceCodingParams.sliceCodingArg = 0;
    params.sliceCodingParams.enableTiles = 1;
    params.sliceCodingParams.numTileColumns = 1;
    params.sliceCodingParams.numTileRows = 2;
    params.sliceCodingParams.enableDependentSlice = 0;

    params.intraCodingParams.intraCodingPreset = IH265_INTRACODINGPRESET_USERDEFINED;
    params.intraCodingParams.intraRefreshMethod = 0;
    params.intraCodingParams.intraRefreshRate = 0;
    params.intraCodingParams.constrainedIntraPredEnable = 0;
    params.intraCodingParams.enableStrongIntraSmoothing = 1;
    params.intraCodingParams.matchYCbCrIntraMode = 0;
    params.intraCodingParams.enableLumaIntra4x4Mode = 0;
    params.intraCodingParams.enableLumaIntra8x8Mode = 0;
    params.intraCodingParams.enableLumaIntra16x16Mode = 0;
    params.intraCodingParams.enableLumaIntra32x32Mode = 0;
    params.intraCodingParams.enableChromaIntra4x4Mode = 0;
    params.intraCodingParams.enableChromaIntra8x8Mode = 0;
    params.intraCodingParams.enableChromaIntra16x16Mode = 0;

    params.interCodingParams.interCodingPreset = IH265_INTERCODINGPRESET_USERDEFINED;
    params.interCodingParams.enableTmvp = 0;
    params.interCodingParams.searchRangeHorP = 144;
    params.interCodingParams.searchRangeVerP = 32;
    params.interCodingParams.searchRangeHorB = 144;
    params.interCodingParams.searchRangeVerB = 32;
    params.interCodingParams.interCodingBias = 0;
    params.interCodingParams.skipMVCodingBias = 0;
    params.interCodingParams.numMergeCandidates = 3;
    params.interCodingParams.enableBiPredMode = 0;
    params.interCodingParams.enableFastIntraAlgo = 1;

    params.vuiCodingParams.vuiCodingPreset = IH265_VUICODINGPRESET_DEFAULT;
    params.vuiCodingParams.aspectRatioInfoPresentFlag = 1;
    params.vuiCodingParams.aspectRatioIdc = IH265_ASPECTRATIO_EXTENDED;
    params.vuiCodingParams.videoSignalTypePresentFlag = 0;
    params.vuiCodingParams.videoFormat = 0;
    params.vuiCodingParams.videoFullRangeFlag = 0;
    params.vuiCodingParams.colourDescriptionPresentFlag = 0;
    params.vuiCodingParams.colourPrimaries = 0;
    params.vuiCodingParams.transferCharacteristics = 0;
    params.vuiCodingParams.matrixCoefficients = 0;
    params.vuiCodingParams.timingInfoPresentFlag = 0;

    params.seiParams.enableSeiFlag = 0;

    params.ctbCodingParams.maxCTBSize = 64;
    params.ctbCodingParams.maxCUDepth = 3;

    dynamicParams.videnc2DynamicParams.forceFrame = IVIDEO_NA_FRAME;
    dynamicParams.videnc2DynamicParams.generateHeader = XDM_ENCODE_AU;
    dynamicParams.videnc2DynamicParams.ignoreOutbufSizeFlag = XDAS_FALSE;
    dynamicParams.videnc2DynamicParams.inputWidth = 704;
    dynamicParams.videnc2DynamicParams.inputHeight = 576;
    dynamicParams.videnc2DynamicParams.interFrameInterval = 1;
    dynamicParams.videnc2DynamicParams.intraFrameInterval = 120;
    dynamicParams.videnc2DynamicParams.mvAccuracy = IVIDENC2_MOTIONVECTOR_QUARTERPEL;
    dynamicParams.videnc2DynamicParams.putDataFxn = NULL;
    dynamicParams.videnc2DynamicParams.putDataHandle = 0;
    dynamicParams.videnc2DynamicParams.getDataFxn = NULL;
    dynamicParams.videnc2DynamicParams.getDataHandle = 0;
    dynamicParams.videnc2DynamicParams.getBufferFxn = NULL;
    dynamicParams.videnc2DynamicParams.getBufferHandle = 0;
    dynamicParams.videnc2DynamicParams.refFrameRate = 25000;
    dynamicParams.videnc2DynamicParams.targetFrameRate = 25000;
    dynamicParams.videnc2DynamicParams.sampleAspectRatioWidth = 1;
    dynamicParams.videnc2DynamicParams.sampleAspectRatioHeight = 1;
    dynamicParams.videnc2DynamicParams.targetBitRate = 600000;

    memcpy(&dynamicParams.rateControlParams, &params.rateControlParams, sizeof(params.rateControlParams));
    memcpy(&dynamicParams.loopFilterParams, &params.loopFilterParams, sizeof(params.loopFilterParams));
    memcpy(&dynamicParams.intraCodingParams, &params.intraCodingParams, sizeof(params.intraCodingParams));
    memcpy(&dynamicParams.interCodingParams, &params.interCodingParams, sizeof(params.interCodingParams));
    memcpy(&dynamicParams.ctbCodingParams, &params.ctbCodingParams, sizeof(params.ctbCodingParams));
    memcpy(&dynamicParams.sliceCodingParams, &params.sliceCodingParams, sizeof(params.sliceCodingParams));

    dynamicParams.enableTransQuantBypass = params.enableTransQuantBypass;
    dynamicParams.enableTransformSkip = params.enableTransformSkip;
    dynamicParams.enableROI = 0;
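
    For orientation, the picture size and CTB size in this configuration determine how many CTBs each tile covers. The sketch below computes tile sizes in CTBs with the uniform-spacing rule from the HEVC specification. Whether this encoder actually uses uniform spacing is an assumption, and the helper names are illustrative.

```c
#include <stdint.h>

/* Number of CTBs needed to cover `pixels` with CTBs of `ctb_size` pixels. */
static uint32_t ctbs_covering(uint32_t pixels, uint32_t ctb_size)
{
    return (pixels + ctb_size - 1u) / ctb_size;
}

/* Size in CTBs of tile `i` (0-based) out of `num_tiles` along one axis,
 * following the HEVC uniform-spacing rule (integer division). */
static uint32_t uniform_tile_size(uint32_t total_ctbs, uint32_t num_tiles,
                                  uint32_t i)
{
    return ((i + 1u) * total_ctbs) / num_tiles - (i * total_ctbs) / num_tiles;
}
```

    With inputWidth = 704, inputHeight = 576 and maxCTBSize = 64 this gives an 11x9 CTB grid. Under the uniform-spacing assumption, numTileRows = 2 splits the 9 CTB rows into tiles of 4 and 5 rows.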

    Regards,
    Andrey Lisnevich

  • In reply to Andrey Lisnevich:

    Hi Kuladeepak,

    Do you have any news on this?
    Can you see from the logs why the encoder hangs?

    Regards,
    Andrey Lisnevich
  • In reply to Andrey Lisnevich:

    Hi,

    I've finally integrated the HEVC 2.0.0.0 encoder but am still having issues running it on multiple DSPs.

    I am trying to run the HEVC encoder at 704x576 on 2 DSPs with 2 tiles enabled:

              enableTiles = 1;
              numTileColumns = 2;
              numTileRows = 1;
              subFrameRC = 0;

    But it produces incorrect output and eventually hangs. Sample of the output (raw HEVC bitstream): out.123

    At the same time, the single-DSP configuration built from the same code base produces a correct bitstream.

    I validate that the input buffers are the same on all cores and DSPs by computing a checksum on each core.

    I validate that the output buffer, located in x86 memory, is the same on all cores and DSPs. The address of the output buffer differs between DSPs because of different OB register mappings, but I verify that it points to the same x86 memory on all DSPs.

    The addresses of the RMT* shm memory blocks also differ between DSPs because of different OB register mappings, but I verify that they point to the same x86 memory on all DSPs.

    I also validated the other multi-DSP (barrier, shm) and single-DSP (barrier, shm, lock) keys and their functionality.

    Logs with details: 7750.logs.zip

    I failed to find the cause. Can you please help find out what is wrong?


    Regards,

    Andrey Lisnevich

  • In reply to Kuladeepak Gowda:

    Hi Kuladeepak,

    HEVC 2.0.0.0 is integrated, but it still doesn't work correctly for me when run on multiple DSPs. Details are in the post above.
    Can you please help find out what is wrong?

    Regards,
    Andrey Lisnevich
  • In reply to Andrey Lisnevich:

    Hi Andrey,
    After analysing the bitstream, it looks like the data from the other chip (Chip-1) is not updated in the output buffer.
    In the multi-chip, multi-tile scenario, each chip stores its bitstream in local DDR. After emulation, the bitstream is also stored in local DDR. The local master then DMAs the bitstream to the x86 output memory.

    Can you check the bitstream at these locations at the end of the process?
    shmem name = "shared_mem_bitstreamXX" -> each chip updates the bitstream in this memory.
    shmem name = "shared_mem_SwappedStreamXX" -> after emulation, each chip updates the bitstream in this memory.

    The bitstream offset for each chip is updated in RMT_Uncached memory by that chip, and the bitstream is transferred from these offsets.
    shmem name = "shared_mem_chip2chip" -> RMT_Uncached memory where each chip updates the bitstream size.
    According to the log, this memory address is different on each chip, but it should be the same.
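
    The flow described above (each chip writes its bitstream locally and publishes its size, then the local master copies the pieces into the x86 output buffer at the right offsets) can be sketched as follows. The `chip_stream_t` layout and `assemble_output` are hypothetical, and `memcpy` merely stands in for the DMA transfer; the encoder's real shared-memory layout is not documented in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one chip's contribution to the output. */
typedef struct {
    const uint8_t *data;  /* chip-local bitstream */
    size_t size;          /* size published by that chip */
} chip_stream_t;

/* Concatenate the per-chip bitstreams in chip order.
 * Returns the total number of bytes written, or 0 if `out` is too small. */
static size_t assemble_output(const chip_stream_t *chips, size_t num_chips,
                              uint8_t *out, size_t out_capacity)
{
    size_t offset = 0;
    for (size_t i = 0; i < num_chips; i++) {
        if (chips[i].size > out_capacity - offset) {
            return 0;
        }
        /* Stand-in for the DMA from chip-local memory to the x86 buffer. */
        memcpy(out + offset, chips[i].data, chips[i].size);
        offset += chips[i].size;
    }
    return offset;
}
```

    In such a scheme, if the master reads a size from a different location than the one the other chip wrote to, that chip's part never reaches the output buffer, which would be consistent with the missing Chip-1 data.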

    Regards
    Kuladeepak

  • In reply to Kuladeepak Gowda:

    Hi Kuladeepak,

    I modified the code so that the OB register mapping is the same on Chip-0 and Chip-1. Now I get a different result that is still not correct but looks better: hevc.123

    So I may already assume that the current HEVC encoder requires the x86 output buffer to be mapped to the same address (i.e. the same OB register) on all DSPs. That, in turn, does not seem right, and it makes dynamic x86 buffer management much harder for apps that dynamically create and delete encoders.

    Can you confirm this?

    Regards,

    Andrey Lisnevich

  • In reply to Kuladeepak Gowda:

    Hi Kuladeepak,

    Here are more detailed logs with dumps of shared_mem_chip2chip, shared_mem_bitstream, shared_mem_SwappedStream and the output buffer after each process call on core#0 of each DSP.

    They also contain logs of all keyCreate calls with details. Before each process call, the in and out buffers are printed. For the in buffers, a CRC32 is computed to ensure that the input is the same.
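
    An input check of this kind can be done with the standard CRC-32 (reflected polynomial 0xEDB88320, as used by zlib and Ethernet). Below is a minimal bitwise sketch; the function name is illustrative, and the checksum code actually used to produce these logs is not shown in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320, init and final XOR
 * of 0xFFFFFFFF). Slow but table-free, which is enough for debugging. */
static uint32_t crc32_buf(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the dropped bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

    Each core can log `crc32_buf(inBuf, inSize)` before the process call; matching values on all cores and DSPs indicate that every core sees the same input data.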

    The generated HEVC bitstream is also in the archive.

    Do you see any problems there?


    hevc_dump.zip

    Regards,

    Andrey Lisnevich