TDA4VM: MMA Load C Matrix

Axel Farrugia

Part Number: TDA4VM

Hi TI,

I'm using the MMA through the intrinsics.

I think I have a good understanding of how to use it now, but I still have not managed to load the C matrix properly.

Let's say I have 64x64 32b values (all ones for this example) that I want to load into the C matrix.
I would then set up the MMA as below (only showing the relevant parameters):

As the LDC instruction only handles 512b vectors, loading an entire C matrix row (64 x 32b = 2048b) takes 4 LDC calls.
Therefore, the swap and reset counters should be set to 4 x 64 (256) to fill the entire C matrix.
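The counter arithmetic above can be written out as a small sketch in plain C. The helper names `ldc_loads_per_row` and `ldc_loads_per_matrix` are hypothetical and only illustrate the calculation; they are not MMA intrinsics or part of any TI header.

```c
#include <assert.h>
#include <stdint.h>

/* Number of LDC vector loads needed for one C matrix row.
 * Rounds up in case a row is not a whole number of vectors. */
static inline uint32_t ldc_loads_per_row(uint32_t cols, uint32_t elem_bits,
                                         uint32_t vec_bits)
{
    assert(vec_bits > 0); /* guard against division by zero */
    return (cols * elem_bits + vec_bits - 1u) / vec_bits;
}

/* Total LDC loads for the whole matrix: the value the post wants to
 * put in the swap and reset counters. */
static inline uint32_t ldc_loads_per_matrix(uint32_t rows, uint32_t cols,
                                            uint32_t elem_bits,
                                            uint32_t vec_bits)
{
    return rows * ldc_loads_per_row(cols, elem_bits, vec_bits);
}
```

For the case described here, `ldc_loads_per_row(64, 32, 512)` gives 4 and `ldc_loads_per_matrix(64, 64, 32, 512)` gives 256.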
However, when I read the C matrix rows back using the XFER and then RCV intrinsics, I get the following result:

As we can (barely) see in the picture above, only the first 16 elements of the first row ended up set to 1.
This result is consistent, though, as the CLSWPER and CLRSTPER counters are 8b registers.
Setting their values to 256 is like setting them to 0, leading to this behavior.
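A minimal sketch of why 256 cannot be stored, assuming the field simply keeps the low 8 bits of whatever is written to it (that is how the behavior is described here; the hardware documentation is not quoted). The helpers `field8` and `elems_covered` are hypothetical names used for illustration only.

```c
#include <stdint.h>

/* Value that survives when a wider number is written into an 8-bit
 * field: only the low 8 bits are kept (assumption, see lead-in). */
static inline uint8_t field8(uint32_t requested)
{
    return (uint8_t)(requested & 0xFFu);
}

/* Number of matrix elements covered by a given count of vector loads. */
static inline uint32_t elems_covered(uint32_t loads, uint32_t vec_bits,
                                     uint32_t elem_bits)
{
    return loads * (vec_bits / elem_bits);
}
```

Under that assumption `field8(256)` is 0, and the largest storable value, 255, covers `elems_covered(255, 512, 32)` = 4080 of the 4096 elements, which is 16 elements (one 512b vector) short of the full 64x64 matrix.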

I have tried setting the load counters to 255 but, as one might expect, I got zeros in the last 16 elements:

I have also tried changing the LDDST register, but it did not solve my issue: I can't load the entire C matrix with 32b elements.

Since I read in the documentation that loading the whole C matrix using the LDDST_X1 option is possible,
I am pretty sure that I am missing something, but I can't find out what exactly...

Best regards,

Axel

over 1 year ago

0 Praveen Rao over 1 year ago

TI__Mastermind 48033 points

Hello Axel,

Can you give us some background on why you are trying to use the MMA, and are you using it in the context of TIDL? If not, can you detail the motivation for this?

Thanks.

0 Axel Farrugia over 1 year ago in reply to Praveen Rao

Prodigy 30 points

Hello Praveen,

As I understand it, TIDL needs an OS, but my application has to be bare-metal.
I am trying to create a custom kernel for a specific layer that does not match any of the layers offered by MMALIB.

So I developed a prototype using the MMA intrinsics.

The only thing left for my custom kernel to work properly is a way to load data into the C matrix.

Thanks a lot for your support,

Axel

0 Suman Anna over 1 year ago in reply to Axel Farrugia

TI__Guru 111495 points

Hi Axel,

Thanks for the background.

The assigned engineer is currently not available, and we will respond back on this next week.

regards

Suman

0 Asha Bhandarkar over 1 year ago

TI__Genius 10170 points

Hi Axel,

Your observations are correct. Since the C_CLSWPER and C_CLRSTPER fields are limited to 8 bits, it is not possible to set them to 256, even though the matrix you have described should fit in a memory array in the C matrix. These 8-bit fields are an architecture limitation, which the documentation does not make clear. I am currently raising an issue internally so that this is made clear in future documentation. Otherwise, your understanding of how to set the C matrix configuration parameters is correct.

With our current support strategy for MMA, we are not supporting the development of custom kernels or the use of MMA outside the context of MMALIB and TIDL, so I will be able to provide only limited support for your use case. However, I have outlined some potential workarounds that you can try to implement yourself:

1. If the range of your input data allows loading 8-bit data into the C matrix, you can use X4 mode so that the hardware promotes the data to 32-bit. You would then set your configuration accordingly: C_CLSWPER and C_CLRSTPER = ((8*64)/512) * 64 = 64, which fits in that register field.
2. To load the data directly as 32-bit, you could perform C = A x B and load A and B accordingly to obtain the C matrix you want.
3. Alternatively, still loading the data directly as 32-bit, you could load the first half of the C matrix, close, reopen with CLROW = 32, and load the second half of the matrix.
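As a rough check of the numbers behind these workarounds, the sketch below compares the required load count with the 8-bit field limit. It is plain C arithmetic only; `ldc_period`, `fits_period_field`, and the two macros are hypothetical names, and the figure for the split load assumes each 32-row half is counted the same way as the full matrix.

```c
#include <stdbool.h>
#include <stdint.h>

#define LDC_VEC_BITS     512u /* width of one LDC vector load */
#define PERIOD_FIELD_MAX 255u /* largest value an 8-bit field can hold */

/* Vector loads needed to fill `rows` rows of `cols` elements when each
 * element is loaded with `load_elem_bits` bits (rounded up per row). */
static inline uint32_t ldc_period(uint32_t rows, uint32_t cols,
                                  uint32_t load_elem_bits)
{
    uint32_t loads_per_row =
        (cols * load_elem_bits + LDC_VEC_BITS - 1u) / LDC_VEC_BITS;
    return rows * loads_per_row;
}

/* True when the load count can be stored in the 8-bit period field. */
static inline bool fits_period_field(uint32_t period)
{
    return period <= PERIOD_FIELD_MAX;
}
```

With these helpers, the direct 32-bit load of 64x64 needs 256 loads and does not fit; loading 8-bit data needs 64 loads and fits; and each 32-row half loaded as 32-bit needs 128 loads, which also fits.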

Thank you for your patience when I was not available last week.

Best,

Asha

0 Axel Farrugia over 1 year ago in reply to Asha Bhandarkar

Prodigy 30 points

Hi Asha,

Thank you for your quick answer!

We eventually figured out your 3rd solution, which suits our needs quite well.
In any case, I am glad to know that I am not missing some key feature.

Thanks again,

Best regards,

Axel
