PROCESSOR-SDK-J784S4: topk indices (int64) -> add yields wrong/zero result on emulator/DSP

Ghassen souissi

Hello,

I’m using the J784S4 with RTOS SDK 10_01_00_04 and TIDL version 10_01_00_01. My ONNX version is 1.15.0. I have a tiny model that contains a TopK node and an Add node.
The Add node takes the indices output of TopK as input together with a constant of type int64.

The model has 3 outputs: the values and the indices outputs of TopK, and the output of Add.
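For reference, the expected behavior of such a model can be sketched in plain Python. This is only an illustration of the intended semantics, not the actual model: the 1-D input, `k`, and the `offset` constant are hypothetical values chosen for the example, and the tie-breaking rule (lower index first) follows the ONNX TopK definition.

```python
def topk_reference(data, k):
    """Return (values, indices) of the k largest entries, largest first."""
    # Sort positions by descending value; ties go to the lower index.
    order = sorted(range(len(data)), key=lambda i: (-data[i], i))[:k]
    return [data[i] for i in order], order


def model_reference(data, k, offset):
    """Mimic the three outputs: TopK values, TopK indices, indices + constant."""
    values, indices = topk_reference(data, k)
    added = [i + offset for i in indices]  # Python ints never overflow
    return values, indices, added


# Hypothetical input and constant, chosen only for illustration.
values, indices, added = model_reference([0.1, 0.9, 0.4, 0.7], k=2, offset=100000)
print(values, indices, added)
```

Any backend that preserves the full integer range should reproduce the third output exactly.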

When I run inference on the emulator, the TopK output is correct but the Add output is wrong.
On the DSP, the TopK output is also correct but the Add output becomes all zeros.

For TopK, I used TIDL_refTopK for both the DSP and reference implementation.
After the slice node that was added for the indices, I added a reshape node to change the element type to float, followed by a slice node that acts as an identity node, because the DSP does not support a type-changing reshape as the last node for TopK.

After some debugging, I found that the Add node always treats its inputs as int16 (even though the data is float) and produces int16 output. Even when it takes its input as float, it still generates int16 output. This leads to wrong results because there is a mismatch between the input range (32 bits) and the output range (16 bits). That seems to be the reason why I get zeros on the DSP and wrong results on the emulator.
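To make the range mismatch concrete, here is a minimal sketch of what happens when a value that needs more than 16 bits is forced into int16. The index values and the constant are hypothetical, and whether TIDL wraps or saturates in this path is not stated anywhere in this thread, so both are shown as possibilities.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def wrap_to_int16(x):
    """Keep the low 16 bits of x and read them as signed two's complement."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x


def saturate_to_int16(x):
    """Clamp x to the representable int16 range."""
    return max(INT16_MIN, min(INT16_MAX, x))


# Hypothetical TopK indices and Add constant, for illustration only.
topk_indices = [1, 3]
offset = 100000

exact = [i + offset for i in topk_indices]        # what int32/int64 would give
wrapped = [wrap_to_int16(v) for v in exact]       # if the value wraps around
saturated = [saturate_to_int16(v) for v in exact] # if the value is clamped

print(exact, wrapped, saturated)
```

In either case the int16 result no longer matches the exact sum, which is consistent with the wrong output seen on the emulator. Neither case by itself explains an all-zero output.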

Can you please confirm if this is expected behavior?
Do you have any suggestions for a workaround that would allow a node to correctly consume inputs in int32 or int64 format and also produce its output in int32 or int64 (instead of int16)? The goal is to preserve the indices values without losing precision, since reducing them to int16 causes truncation/overflow and leads to wrong results on both emulator and DSP.
How can the indices be carried in a wider integer type that TIDL supports?
What is the recommended workaround to resolve this problem?

You can find the model here: add_topk.zip

The model artifacts (from compilation and from inference on the emulator) are here: add_topk_artifacts.zip

Thanks in advance.

24 days ago

Christina Kuruvilla 24 days ago

TI__Expert 5210 points

Hi Ghassen,

We have some fixes for TopK in the upcoming release, but I will double-check whether this issue is fixed in the latest version.

I will update you after I investigate further. Since the issue initially seems to be due to the Add layer, I will need to verify that.

Warm regards,

Christina

Ghassen souissi 21 days ago in reply to Christina Kuruvilla

Prodigy 120 points

Hi Christina,

My TopK works fine: it generates the values and the indices. The problem is in TIDL’s logic when those indices are fed into any EltWise node. TopK produces valid indices as a standalone output, but when I feed them into an EltWise node (Add), that node treats them as zeros and produces a wrong output on the emulator and all zeros on the DSP.
I believe this happens because TopK produces the indices as float while EltWise nodes are executed as int16, so the indices get reduced to (or treated as) int16; that is why I get zeros.

Do you have any suggestions for a workaround that would allow a node to correctly consume inputs in int32 or int64 format and also produce its output in int32 or int64 (instead of int16)? The goal is to preserve the indices values without losing precision, since reducing them to int16 causes truncation/overflow and leads to wrong results on both emulator and DSP.

Christina Kuruvilla 21 days ago in reply to Ghassen souissi

TI__Expert 5210 points

Hi Ghassen,

What were the model configurations you used to run this?

For TopK, the index values are always stored as int32, and most other layers are executed in int16 or int8 (if set). Usually a TopK is followed by a Gather layer, which can take in the int32 indices properly.
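As a rough illustration of the TopK followed by Gather pattern, the sketch below uses the indices only for lookup, so no arithmetic is ever performed on them. The helper names and the data are hypothetical, and this is plain Python rather than TIDL or ONNX code.

```python
def topk_indices(scores, k):
    """Indices of the k largest scores, largest first (ties: lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def gather_1d(data, indices):
    """Select entries of data at the given positions (Gather along axis 0)."""
    return [data[i] for i in indices]


# Hypothetical scores and per-element payload, for illustration only.
scores = [0.2, 0.8, 0.5, 0.9]
labels = ["a", "b", "c", "d"]

idx = topk_indices(scores, k=2)
picked = gather_1d(labels, idx)
print(idx, picked)
```

Because the indices are consumed as positions rather than as operands, their numeric range matters only to the Gather itself.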

As I try to understand your model so I can advise better, I wanted to ask what the slice after the indices is for. Are you trying to implement Gather layer functionality using the Slice and Add?

Also, just to confirm: did you see ONNXRT produce the correct output for this model?

Warm regards,

Christina
