AM5728: SGX buffers in TILER

Part Number: AM5728

Dear Team,

from our customer:

We ran into a problem while running several GStreamer video pipelines. Some of the running applications were terminated and we got this error:

[ 1111.109174] omapdrm omapdrm.0: could not remap: -12 (3)

We've referred to PSDKLA-3753 on this site: http://processors.wiki.ti.com/index.php/Processor_SDK_Linux_Automotive_Post_Release_Fixes

This explains the observed behaviour, because buffers are allocated in TILER, which has a maximum of 128MB available.
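For reference, the -12 in the log line is a negated Linux errno value, and errno 12 is ENOMEM, which fits the explanation that the TILER space ran out. Below is a minimal sketch for decoding such lines from a saved kernel log. It assumes a Linux host; the helper name `parse_remap_error` and the regular expression are illustrative only, and the number in parentheses is kept as an opaque value because its meaning is not stated in this thread.

```python
import errno
import os
import re

# Pattern for the omapdrm message quoted above. The meaning of the number in
# parentheses is not stated in this thread, so it is returned as an opaque "detail".
REMAP_RE = re.compile(r"omapdrm \S+: could not remap: (-?\d+) \((\d+)\)")


def parse_remap_error(line):
    """Return (errno_name, description, detail), or None if the line does not match."""
    match = REMAP_RE.search(line)
    if match is None:
        return None
    code = abs(int(match.group(1)))  # the kernel reports errno values as negatives
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code), int(match.group(2))


line = "[ 1111.109174] omapdrm omapdrm.0: could not remap: -12 (3)"
print(parse_remap_error(line))
```

Filtering the kernel log with a helper like this makes it easy to correlate each failure with the tiler_map contents captured at the same time.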

Indeed, we were able to check TILER memory usage with cat /sys/kernel/debug/dri/0/tiler_map, and everything was OK until we exceeded the limit.

After exceeding the limit we got the error mentioned above. However, we do not use PSDKLA because, as far as I know, it is intended for a different platform (DRA7xx). We use PROCESSOR-SDK-LINUX-AM57X, which is intended for AM57X. We checked the latest release, 04_01_00_06, and it does not seem to cover the same functionality as PSDKLA-3753. It updates omapdrm (kernel space) and libdrm-omap (user space), but not the SGX part, as this commit in PSDKLA does: http://arago-project.org/git/projects/?p=meta-glsdk.git;a=commit;h=c2c93adce12d24f124ff9b8c979115227d555dce

Therefore, all SGX buffers are still allocated in TILER (including Weston).

  • Do you also plan to update the SGX part?
  • I've read that it is necessary to allocate memory in TILER for the encoder/decoder part, but is it possible to allocate buffers for VPE in the CMA region?


Thank you for your support.

  • The software team have been notified. They will respond here.
  • Hi Bartosz,
    Yes, this fix is available only for the DRA7xx device with the kernel v4.4 based SDK,
    which is why the recipe file arago-project.org/.../
    mentions the branch name ti-img-sgx/1.14.3699939_k4.4.

    Kernel 4.9 doesn't have all the kernel patches for this; one patch is missing.
    In this link, arago-project.org/.../
    the 3rd patch is not available on k4.9 because it is a hack and is not upstreamable.

    You can try applying the 3rd patch to the kernel together with the SGX recipe patch; it should also work on the k4.9 based SDK (but you may miss a few sgx-ddk-um bugfixes from branch ti-img-sgx/1.14.3699939, which is the current branch for your SDK).
    If there is no dependency for SGX userspace libraries, this should work.

    With this, all SGX allocations, including Weston, will come from the CMA region.

    VPE buffers can also work with CMA, but with GStreamer this is not possible because there is a common dumb allocator
    for ducati and VPE.
    The OMAP_BO_MEM_CONTIG flag needs to be set when allocating a buffer in the CMA region, and this flag cannot currently be passed
    for VPE.

    Thanks
    Ramprasad
  • Hi Ramprasad,

    thank you, this helps. We'd have some questions:

    1. It is described that it should also work with sources from a different SDK, but there is no direct answer to the question of whether you also plan to update the SGX part for the SDK intended for AM57XX. This fix has S2-Major severity for the mentioned PSDKLA.

    2.

    Ramprasad said:

    You can try applying the 3rd patch to the kernel together with the SGX recipe patch; it should also work on the k4.9 based SDK (but you may miss a few sgx-ddk-um bugfixes from branch ti-img-sgx/1.14.3699939, which is the current branch for your SDK).
    If there is no dependency for SGX userspace libraries, this should work.

     Could you kindly elaborate/clarify the meaning of "no dependency for SGX userspace libraries"?

    Thank you,

    Regards,

    Bartosz

  • Hi Ramprasad,

    can we count on your feedback regarding the above two questions?

    Thank you,
    Bartosz

  • Dear Team,

    any feedback on the above 2 questions?

    Thanks,

    Regards

  • Hello,

    Let me check with the experts and get back to you.


    Best Regards,
    Margarita

  • What is your actual GStreamer pipeline and use case? What are you using SGX for, and who owns it? We will need this information to direct you to the right path and approach.

  • Hi Manisha,

    The use case is displaying video from IP cameras (a CCTV system). Therefore, it is necessary to launch multiple GStreamer pipelines. Example:

    gst-launch-1.0 rtspsrc latency=200 location="rtsp://192.168.35.212/axis-media/media.amp?videocodec=h264&h264profile=main&resolution=1024x768&fps=25" ! rtph264depay ! h264parse ! ducatih264dec ! vpe ! 'video/x-raw, width=1024, height=768' ! waylandsink

    The VPE with a capsfilter is necessary because without it the video image gets stuck, as I reported earlier.
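Since one such pipeline is needed per camera, the command lines can be generated rather than typed by hand. The following is only a sketch: the second camera address is hypothetical (only 192.168.35.212 appears in this post), the helper name `build_pipeline` is made up for illustration, and the caps string is written without spaces so that it stays a single argument.

```python
import shlex


def build_pipeline(rtsp_url, width=1024, height=768, latency=200):
    """Return the gst-launch-1.0 argument list for one camera."""
    return [
        "gst-launch-1.0",
        "rtspsrc", f"latency={latency}", f"location={rtsp_url}",
        "!", "rtph264depay",
        "!", "h264parse",
        "!", "ducatih264dec",
        "!", "vpe",
        "!", f"video/x-raw,width={width},height={height}",
        "!", "waylandsink",
    ]


# Hypothetical camera list; only the first address appears in the post.
CAMERAS = ["192.168.35.212", "192.168.35.213"]
QUERY = "videocodec=h264&h264profile=main&resolution=1024x768&fps=25"

commands = [
    build_pipeline(f"rtsp://{host}/axis-media/media.amp?{QUERY}")
    for host in CAMERAS
]
for argv in commands:
    # shlex.join quotes the URL and the "!" separators so that each printed
    # line can be pasted into a shell unchanged.
    print(shlex.join(argv))
```

Each argument list can also be handed to `subprocess.Popen` directly, which avoids shell quoting altogether.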

    As far as we know, memory for Weston, waylandsink, the decoder and VPE currently comes from TILER, and this is quite limiting for our use case. Every instance "eats" a lot of TILER memory. Weston alone occupies more than 20MB of TILER (the fixed size is 128MB).

    Thanks for your help.

    Regards,

    Bartosz

  • Hi Bartosz,

    As a quick hack, can you try disabling DMM in the dts file? This will make DMM unavailable, and the DRM driver will fall back to using the default CMA. Let me know how it goes for you.

    You can find the dmm entry in the dra7.dtsi file in the Linux source tree: arch/arm/boot/dts/dra7.dtsi

    Regards,
    Manisha
  • Hi Manisha,

    wish you all the best in 2018. Coming back to our discussion:

    1. The hack of disabling DMM in dra7.dtsi causes the following error during GStreamer plugin initialization:

    MmRpc_create: Error: connect failed
    ../git/libdce.c:416: dce_ipc_init ERROR: Failed eError == DCE_EOK error val -4../git/libdce.c:479:

    Do you have any guidance on why this happens and how to overcome it?

    2. We've found that we could use less TILER memory if we changed the GStreamer plugin as described in the codec documentation:

    It looks like the decoder allocates large buffers to handle reordering of frames, which isn't needed as it is already handled by the source/jitterbuffer element. The documentation in ipumm/extrel/ti/ivahd_codecs/packages/ti/sdo/codecs/h264vdec/docs has a section about decreasing DDR use by reducing that. The GStreamer plugin just sets those parameters to auto. Could that be added to the GStreamer plugin?

    Thanks for your help,
    Regards, Bartosz

  • Thanks Bartosz! Wish you Happy New Year too!!

    Bartosz Marcinkowski said:

    The hack of disabling DMM in dra7.dtsi causes the following error during GStreamer plugin initialization:

    MmRpc_create: Error: connect failed
    ../git/libdce.c:416: dce_ipc_init ERROR: Failed eError == DCE_EOK error val -4../git/libdce.c:479:

    Do you have any guidance on why this happens and how to overcome it?

    Does this work for a pipeline without the ducati codec?

    Bartosz Marcinkowski said:
    It looks like the decoder allocates large buffers to handle reordering of frames, which isn't needed as it is already handled by the source/jitterbuffer element. The documentation in ipumm/extrel/ti/ivahd_codecs/packages/ti/sdo/codecs/h264vdec/docs has a section about decreasing DDR use by reducing that. The GStreamer plugin just sets those parameters to auto. Could that be added to the GStreamer plugin?

    I don't think the jitterbuffer element can take care of the buffer reordering imposed by the H.264 standard. If you know the configuration of the encoder and know that the encoder doesn't use any long-term reference frames, then you can reduce the decoder buffer requirement. If the application is supposed to behave as a universal decoder, able to decode any H.264 stream for a given profile and level, then you will need to obey the decoder's requirement for that buffer.

  • As I am now responsible for creating the port on the customer side, I will reply directly.

    With the hack I also get a Wayland crash; I confirmed that only recently. Previously I used the X branch built from source, which gives that error, and a prebuilt Wayland branch. With Wayland I was not sure whether the crash was caused by an incompatible kernel. Compiling the Wayland branch to see that the same problem occurs took a while.

    As for the second part, GStreamer is a highly configurable tool used to build specialized pipelines, and one could assume some control over the encoder.

    For each type of source there could be a different pipeline to handle requirements on latency, resolution, quality, scaling and rotation, a caption with the camera name, etc.

    Here the issue is that when using the baseline profile there are no B frames, but the ducati plugin documentation says only meaningful for codecs with B frames. However, there is no difference when I launch the pipeline and look at TILER usage.
    In our case we found that the camera uses GOP=32 by default, which leads to large buffer usage. Decreasing that is a workaround that isn't ideal because it will increase network load.

    The jitterbuffer handles network retransmission/reordering of the RTP stream. After that, frames have the same ordering as they had from the encoder; if data doesn't arrive in time, the frame is marked as lost.
  • Ondrej Bilka said:
    With the hack I also get a Wayland crash; I confirmed that only recently. Previously I used the X branch built from source, which gives that error, and a prebuilt Wayland branch. With Wayland I was not sure whether the crash was caused by an incompatible kernel. Compiling the Wayland branch to see that the same problem occurs took a while.

    By "hack", are you referring to the hack of disabling DMM in the dts file? Which application did you run to confirm that Wayland is crashing? What kind of error log do you see? What do you mean by "Wayland branch"?

    Ondrej Bilka said:
    The jitterbuffer handles network retransmission/reordering of the RTP stream. After that, frames have the same ordering as they had from the encoder; if data doesn't arrive in time, the frame is marked as lost.

    Network data reordering is different from display delay/reordering. Possibly you are confusing the two.

    There are two parameters that control the H.264 decoder's DDR memory footprint:

    IVIDDEC3_Params::displayDelay -> Applies when the encoded order of frames (and hence the decoded order) is different from the display order.

    IH264VDEC_Params::dpbSizeInFrames -> The maximum number of (past) reference frames the encoder can refer to when encoding the current frame. For level 5.1 this is specified as 16, but not all encoders use that many. Most encoders use at most 2 to 3 reference frames. Maybe this wiki page can help build the understanding.
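To put the difference between 16 and 2 to 3 reference frames into numbers, here is a rough back-of-the-envelope sketch. It assumes the decoder output is NV12 (12 bits per pixel) and counts only the linear picture size; codec padding and TILER container alignment are ignored, so real usage will be higher. The function names are illustrative.

```python
def nv12_frame_bytes(width, height):
    """Bytes in one NV12 frame: a full-size luma plane plus a half-size interleaved chroma plane."""
    return width * height * 3 // 2


def dpb_bytes(width, height, frames):
    """Linear size of `frames` decoded pictures, ignoring codec padding and TILER alignment."""
    return nv12_frame_bytes(width, height) * frames


MIB = 1024 * 1024
for frames in (16, 3):
    size = dpb_bytes(1024, 768, frames)
    print(f"{frames:2d} frames at 1024x768: {size / MIB:.1f} MiB")
```

Even with this optimistic estimate, the picture buffer for a single 1024x768 stream shrinks by a factor of about five when 3 frames are kept instead of 16, and that saving applies to every running pipeline.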

      

    Ondrej Bilka said:
    In our case we found that the camera uses GOP=32 by default, which leads to large buffer usage. Decreasing that is a workaround that isn't ideal because it will increase network load.

    It is not the GOP size but the GOP structure that matters for reducing the footprint. From Appendix F of the H.264 video codec user guide:

    "For simple GOP structures, typically, it is enough if dpbSizeInFrames is configured as M+1. And display delay is set as M, where M = SPS -> max_num_ref_frames."

    You could check max_num_ref_frames in the SPS NAL unit of the encoded stream and set the viddecParams field values accordingly.
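As an illustration of that check, the sketch below reads max_num_ref_frames from an SPS NAL unit and applies the M+1 / M rule quoted from Appendix F. It is not taken from the ducati plugin: the function names are made up, SPS variants with scaling lists are deliberately not handled, and the example bytes are a hand-built baseline SPS rather than data captured from a camera.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def bits(self, count):
        value = 0
        for _ in range(count):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

    def ue(self):
        """Unsigned Exp-Golomb code, ue(v) in the H.264 specification."""
        zeros = 0
        while self.bits(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.bits(zeros)

    def se(self):
        """Signed Exp-Golomb code, se(v)."""
        k = self.ue()
        return (k + 1) // 2 if k % 2 else -(k // 2)


# profile_idc values whose SPS carries the extra chroma and bit-depth fields.
HIGH_PROFILES = {100, 110, 122, 244, 44, 83, 86, 118, 128, 138, 139, 134, 135}


def max_num_ref_frames(sps_nal):
    """Return max_num_ref_frames from an SPS NAL unit (start code already removed)."""
    rbsp = bytearray()
    zeros = 0
    for byte in sps_nal:
        if zeros >= 2 and byte == 3:  # emulation prevention byte: 00 00 03
            zeros = 0
            continue
        zeros = zeros + 1 if byte == 0 else 0
        rbsp.append(byte)
    reader = BitReader(bytes(rbsp))
    if reader.bits(8) & 0x1F != 7:
        raise ValueError("not an SPS NAL unit (nal_unit_type != 7)")
    profile_idc = reader.bits(8)
    reader.bits(16)  # constraint flags, reserved bits and level_idc
    reader.ue()  # seq_parameter_set_id
    if profile_idc in HIGH_PROFILES:
        if reader.ue() == 3:  # chroma_format_idc
            reader.bits(1)  # separate_colour_plane_flag
        reader.ue()  # bit_depth_luma_minus8
        reader.ue()  # bit_depth_chroma_minus8
        reader.bits(1)  # qpprime_y_zero_transform_bypass_flag
        if reader.bits(1):  # seq_scaling_matrix_present_flag
            raise NotImplementedError("scaling lists are not handled in this sketch")
    reader.ue()  # log2_max_frame_num_minus4
    poc_type = reader.ue()  # pic_order_cnt_type
    if poc_type == 0:
        reader.ue()  # log2_max_pic_order_cnt_lsb_minus4
    elif poc_type == 1:
        reader.bits(1)  # delta_pic_order_always_zero_flag
        reader.se()  # offset_for_non_ref_pic
        reader.se()  # offset_for_top_to_bottom_field
        for _ in range(reader.ue()):  # num_ref_frames_in_pic_order_cnt_cycle
            reader.se()  # offset_for_ref_frame[i]
    return reader.ue()


def decoder_params(m):
    """Apply the Appendix F rule for simple GOP structures to M = max_num_ref_frames."""
    return {"dpbSizeInFrames": m + 1, "displayDelay": m}


# Hand-built baseline SPS for illustration only: profile 66, level 3.1,
# pic_order_cnt_type 2, max_num_ref_frames 3.
example_sps = bytes.fromhex("6742001fd900")
m = max_num_ref_frames(example_sps)
print(m, decoder_params(m))
```

For a real stream, the SPS can be taken from the output of h264parse or from the sprop-parameter-sets field of the camera's SDP, and the resulting values are only safe if the camera never switches to a configuration with more reference frames.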

  • Yes, it was the DMM hack. Wayland crashed on startup and the system was only accessible through the serial port/console. Perhaps next time you suggest such a hack you should test it to see if the machine boots correctly.

    I am not confusing anything. The issue is that your code to determine the number for display reorder simply does not work in the GStreamer plugin. You should read posts more carefully, as I wrote:
    Here the issue is that when using the baseline profile there are no B frames, but the ducati plugin documentation says only meaningful for codecs with B frames.

    There is no reordering needed for the baseline profile, and a software decoder plays the same stream with sub-200ms latency. Only your hardware decoder needs 500ms to work at all.
  • Hi Ondrej,

    For situations where our out-of-box Processor SDK Linux offerings don't meet the exact needs of customers and we are not able to pull the request into our SDK offerings, we make most of the source code available via external git repositories and provide a setup to build it via the Arago build setup. For any custom need or requirement outside the Processor SDK offerings, customers can work with the source code, make modifications, and build via the Arago build setup to see whether it fits the need. Sometimes we can provide suggestions or ideas to help customers enable their custom need. That doesn't necessarily mean we always validate the suggestion. If we had to validate every suggestion and could not fit in the time needed for validation, it would prevent us from providing the suggestion in the first place, and many customers find such suggestions useful.

    It would have helped me to understand your post better if this was mentioned in quotes " only meaningful for codecs with B frames. " The statement lost its meaning without it.

    Anyway, my suggestion was to modify the ducatih264dec GStreamer plugin source code for num_ref_frames (only if you are sure that the incoming encoded stream won't use more than the N reference frames that you intend to set) and try that out. The source code can be found at the link below; look for num_ref_frames in it. Again, I would like to highlight that I have not tried this suggestion. Also note that this suggestion only helps with managing the codec memory requirement, not with latency.

    git.ti.com/.../gstducatih264dec.c