Hi, we need TDA4VM to achieve an end-to-end processing time of under 10 ms. This budget covers image pre-processing, model inference, and post-processing steps such as tracking. The backbone is currently ResNet18, and the input image size is 640 x 384. The main tasks are lane prediction and 3D object detection (3DOD). The final output size is approximately 14N + 32000, where N is the number of detected objects. Do you have any recommendations for meeting these requirements?
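To make the constraints concrete, here is a minimal Python sketch that evaluates the stated output-size formula and checks whether a set of per-stage timings fits the 10 ms end-to-end target. The helper names (`output_size`, `fits_budget`) and the stage keys are hypothetical and do not come from any TI SDK. The unit of the output size (elements or bytes) is not stated above, so it is kept unitless. The stages are assumed to run sequentially for one input, so their latencies are summed.

```python
# Hypothetical helpers for sanity-checking the requirements above.
# Assumption: pre-processing, inference and post-processing run sequentially,
# so the end-to-end latency is the sum of the per-stage latencies.

LATENCY_BUDGET_MS = 10.0  # end-to-end target: strictly less than 10 ms


def output_size(num_objects: int) -> int:
    """Approximate output size: 14 per detected object plus a fixed 32000 (unit unspecified)."""
    if num_objects < 0:
        raise ValueError("num_objects must be non-negative")
    return 14 * num_objects + 32000


def fits_budget(stage_times_ms: dict[str, float], budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """Return True if the summed per-stage latencies stay strictly below the budget."""
    return sum(stage_times_ms.values()) < budget_ms
```

In practice, `stage_times_ms` would be filled with values measured on the target, for example keys such as `"preprocess"`, `"inference"` and `"postprocess_tracking"`.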
Additionally, would running model inference with TIDL-RT be more beneficial than using OSRT?