Hi,
We have a lot of custom C7X kernels, and most of them use between 2 and 6 DRU channels for block processing. The issue is that when we try to run kernels that require more than 2 channels on C7X_1 after TIDL has been initialized, the DMA initialization fails. We have found that we cannot use more than 2 DRU channels while TIDL is running, because TIDL uses 14 and does not seem to share its handles.
I do want to note that those kernels work perfectly together when TIDL is off or when running on C7X_2.
Here's our flow:
- Init TIDL through OnnxRunTime
class Tidl {
public:
    bool initialize();
    bool runInference() const;
    /* ... */
private:
    bool initTidl(Ort::SessionOptions& sessionOptions);
    bool m_verbose;
    std::string m_artifactsPath;
    std::string m_weights;
    Ort::SessionOptions m_sessionOptions;
    Ort::Env m_env;
    std::unique_ptr<Ort::Session> m_session;
    std::unique_ptr<Ort::IoBinding> m_bindings;
    /* ... */
};

bool Tidl::initialize() {
    m_env = Ort::Env(m_verbose ? ORT_LOGGING_LEVEL_VERBOSE : ORT_LOGGING_LEVEL_WARNING, "tidl");
    m_sessionOptions.SetIntraOpNumThreads(1);
    if (!initTidl(m_sessionOptions)) {
        return false;
    }
    m_sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);

    // Initialize session
    try {
        m_session = std::unique_ptr<Ort::Session>(new Ort::Session(m_env, m_weights.c_str(), m_sessionOptions));
    } catch (const Ort::Exception& e) {
        LOG_ERROR(e.what());
        return false;
    }
    m_bindings = std::unique_ptr<Ort::IoBinding>(new Ort::IoBinding(*m_session));
    return true;
}

bool Tidl::initTidl(Ort::SessionOptions& sessionOptions) {
    // Setup session using TIDL
    c_api_tidl_options options;
    OrtStatus* status = OrtSessionsOptionsSetDefault_Tidl(&options);
    if (status != nullptr) {
        LOG_ERROR("Failed to set default TIDL options.");
        return false;
    }

    fs::path weightsPath(m_weights);
    if (!fs::exists(weightsPath)) {
        LOG_ERROR("The following weight file does not exist : %s\n", m_weights.c_str());
        return false;
    }

    m_artifactsPath = weightsPath.parent_path().string();
    if (m_artifactsPath.size() > sizeof(options.artifacts_folder)) {
        LOG_ERROR("Artifacts path too long.");
        return false;
    }
    std::strncpy(options.artifacts_folder, m_artifactsPath.c_str(), m_artifactsPath.size());

    // Debug level undocumented. Assumed to be 0 to 3, where 0 is error only, 3 is the most verbose.
    // Debug level 3 introduces very high latencies.
    options.debug_level = m_verbose ? 2 : 0;

    status = OrtSessionOptionsAppendExecutionProvider_Tidl(sessionOptions, &options);
    if (status != nullptr) {
        LOG_ERROR("Failed to set TIDL execution provider.");
        return false;
    }
    return true;
}
- Run TIDL inference
bool Tidl::runInference() const {
    auto run_options = Ort::RunOptions();
    try {
        m_session->Run(run_options, *m_bindings);
    } catch (const Ort::Exception& e) {
        LOG_ERROR(e.what());
        return false;
    }
    return true;
}
- C7X kernel creation (called through vxCreateGraph())
static vx_status MyCustomKernel_Create(tivx_target_kernel_instance kernel, tivx_obj_desc_t *objDesc[], uint16_t numParams, void* privArg) {
    /* ... */
    for (int ch = 8; ch <= 10; ch++) {
        if (VX_SUCCESS != dma_acquire(handles[i], ch)) {
            VX_PRINT(VX_ZONE_ERROR, "Failed to acquire channel %u!\n", ch);
            success = false;
        }
    }
    /* ... */
}

static vx_status dma_acquire(app_udma_ch_handle_t& handle, const uint32_t ch) {
    vx_status status = VX_SUCCESS;
    handle = appUdmaCopyNDGetHandle(ch);
    if (nullptr == handle) {
        VX_PRINT(VX_ZONE_ERROR, "appUdmaCopyNDGetHandle() returned nullptr!\n");
        status = VX_FAILURE;
    }
    return status;
}
- Error
[C7x_1 ] 1026752.990518 s:
[C7x_1 ] 1026752.990529 s:
[C7x_1 ] 1026752.990541 s:
[C7x_1 ] 1026752.990547 s:
[C7x_1 ] 1026752.990554 s: UDMA : ERROR: UDMA channel open failed!!
[C7x_1 ] 1026752.990584 s: VX_ZONE_ERROR:[dma_acquire:1188] appUdmaCopyNDGetHandle() returned nullptr!
[C7x_1 ] 1026752.990606 s: VX_ZONE_ERROR:[MyCustomKernel_Create:109] Failed to acquire channel 10!
[C7x_1 ] 1026752.990622 s: UDMA : ERROR: NULL Pointer!!!
[C7x_1 ] 1026752.990633 s: UDMA : ERROR: Unable to delete channel 10 handle!!!
As you can see from the log, only the third channel initialization (channel 10) failed. I have tried using other channels, but it's always the third one that fails. I think it's because of how the driver handles the requests.
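To make the hypothesis concrete, here is a minimal host-side sketch of the behavior we think we are seeing. It is not driver code: `FakeDruPool` and `probeChannels` are hypothetical names, and the pool size of 16 is only an assumption inferred from "TIDL uses 14, we can get 2". In the sketch, the open call fails as soon as the pool is exhausted, no matter which channel ID is requested, and everything opened so far is released on failure.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the DRU channel pool. The total of 16 is an
// assumption derived from the numbers in this post, not read from the driver.
constexpr std::uint32_t kTotalChannels = 16;

struct FakeDruPool {
    std::uint32_t inUse;  // channels currently held by any owner (e.g. TIDL)

    // Mimics the observed behavior: the request fails once the pool is
    // exhausted, regardless of which channel ID was requested.
    bool open(std::uint32_t /*requestedId*/) {
        if (inUse >= kTotalChannels) {
            return false;
        }
        ++inUse;
        return true;
    }

    void close() {
        assert(inUse > 0);
        --inUse;
    }
};

// Tries to open `wanted` channels starting at `firstId`. On the first failure
// it releases every channel opened by this call (all-or-nothing) and returns
// how many had opened successfully before the failure.
std::uint32_t probeChannels(FakeDruPool& pool, std::uint32_t firstId,
                            std::uint32_t wanted, bool& allAcquired) {
    std::uint32_t opened = 0;
    for (; opened < wanted; ++opened) {
        if (!pool.open(firstId + opened)) {
            break;
        }
    }
    allAcquired = (opened == wanted);
    if (!allAcquired) {
        for (std::uint32_t i = 0; i < opened; ++i) {
            pool.close();
        }
    }
    return opened;
}
```

With `FakeDruPool pool{14}`, a call to `probeChannels(pool, 8, 3, ok)` opens two channels, fails on the third, and rolls back. That matches what we observe with channels 8 to 10 and with any other starting ID.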
I was recently told by someone from TI that this was not expected behavior. Do you see something wrong with our flow?
Thank you,
Fred