TDA4VH-Q1: Parallel Batch Inference on TDA4 Board

Part Number: TDA4VH-Q1
Other Parts Discussed in Thread: TDA4VH

Hello TI,

I have an ONNX model with an input shape of 6xCxHxW. When importing this model with the TIDL importer, I set the numbatchs parameter to 6. I am using SDK version 08.05 for TDA4VH.
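
For reference, the model was exported with a fixed batch dimension of 6 along the lines below; the stand-in network and file names are only illustrative.

```python
import torch
import torch.nn as nn

# Stand-in network purely for illustration; the real model is the trained
# network used in the application.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()

# Fixed batch of 6: no dynamic_axes, so the exported input stays 6xCxHxW.
dummy = torch.randn(6, 3, 224, 224)
torch.onnx.export(model, dummy, "model_batch6.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)
```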

I would like to know whether exporting with a batch size of 6 results in parallel inference on the TDA4 board, or whether the inference runs 6 times sequentially. In other words, is the reported inference time the cumulative total of 6 individual runs, or a single time that covers all 6 inputs?

Thank you for your assistance.

  • Hi,

    The 8.5 release supports batch processing by internally converting the model into an equivalent single-batch model and running it on a single core. In most cases this gives better performance than running the batches sequentially, but not as good as running them in parallel. Performance also depends on the model, so running a multi-batch model may or may not be optimal.

    The upcoming SDK 9.0 release will support running models up to a batch size of 4 using parallel execution across cores on TDA4VH.
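
    A simple way to check which behavior you are getting is to time one inference of the 6-batch model against six inferences of an equivalent 1-batch model on the target. A minimal sketch using onnxruntime is below; the model file names are illustrative, and on the board the sessions would be created with TI's onnxruntime build and the TIDL execution provider as shown in the SDK examples.

    ```python
    import time
    import numpy as np
    import onnxruntime as ort

    # Illustrative file names; on target, create these sessions with the TIDL
    # execution provider and the compiled artifacts from the importer.
    sess_batch6 = ort.InferenceSession("model_batch6.onnx")   # input 6xCxHxW
    sess_batch1 = ort.InferenceSession("model_batch1.onnx")   # input 1xCxHxW

    C, H, W = 3, 224, 224                                     # example shape
    x6 = np.random.rand(6, C, H, W).astype(np.float32)

    def avg_time(fn, n=20):
        fn()                                                  # warm-up run
        t0 = time.perf_counter()
        for _ in range(n):
            fn()
        return (time.perf_counter() - t0) / n

    name6 = sess_batch6.get_inputs()[0].name
    name1 = sess_batch1.get_inputs()[0].name

    t_batch = avg_time(lambda: sess_batch6.run(None, {name6: x6}))
    t_seq = avg_time(lambda: [sess_batch1.run(None, {name1: x6[i:i + 1]})
                              for i in range(6)])

    print(f"6-batch model, one run : {t_batch * 1e3:.2f} ms")
    print(f"1-batch model, six runs: {t_seq * 1e3:.2f} ms")
    ```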

    Regards,

    Anand

  • Hi Anand,

    "The upcoming SDK 9.0 release will support running models up to a batch size of 4 using parallel execution across cores on TDA4VH."

    Does this mean that in SDK 9.0 I can only run in parallel with a maximum batch size of 4? If my requirement is a batch size of 8, does that mean it is not supported?

    Thank you for your assistance.

  • Hi 

    I see the following aspects of running batch processing on TDA4VH:

    (1) TDA4VH has 4 DSP cores, so a model with a batch size of 4 can be run with parallel compute. If the application requires a batch size greater than 4, this feature can still be used by running multiple iterations of the 4-batch model (e.g., to process a batch size of 8 you would run inference twice on the 4-batch model, since only 4 batches can be parallelized).

    (2) The batch handling from the 8.5 release ("supports batch processing by internally converting the model into an equivalent single-batch model and running it on a single core; in most cases this gives better performance than running the batches sequentially, but not as good as running them in parallel") is still available.

    Both of the above features can be benchmarked, and the more optimal one for your application can be used. A simple way to split a larger batch across multiple 4-batch runs, as described in (1), is sketched below.
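
    As an illustration of (1), a batch of 8 can be processed with two inferences of the compiled 4-batch model and the outputs concatenated. The host-side sketch below shows the idea; the model file name and session setup are assumptions, and on target the session would be created with the TIDL execution provider and the compiled artifacts as in the SDK examples.

    ```python
    import numpy as np
    import onnxruntime as ort

    # Illustrative file name; on target, create the session with the TIDL
    # execution provider and the 4-batch compiled artifacts.
    sess = ort.InferenceSession("model_batch4.onnx")
    in_name = sess.get_inputs()[0].name

    def infer_large_batch(x, sub_batch=4):
        """Process a batch larger than the compiled batch size by splitting
        it into sub-batches and concatenating the outputs."""
        assert x.shape[0] % sub_batch == 0, "pad input to a multiple of sub_batch"
        outs = []
        for i in range(0, x.shape[0], sub_batch):
            # Assumes a single model output for simplicity.
            y = sess.run(None, {in_name: x[i:i + sub_batch]})[0]
            outs.append(y)
        return np.concatenate(outs, axis=0)

    C, H, W = 3, 224, 224                                    # example shape
    x8 = np.random.rand(8, C, H, W).astype(np.float32)       # batch of 8
    y8 = infer_large_batch(x8)                                # two 4-batch runs
    print(y8.shape)
    ```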

    I hope that answers the question. Let me know if something is still unclear.

    Regards,

    Anand