Efficient way to transpose matrix?

Hi,

I would like to transpose (exchange columns and rows) a 64x64 matrix, which contains 1-byte elements (so 8x8 registers).

All solutions I have come up with so far are horribly inefficient and require a huge number of instructions, so maybe there are some special tricks that would make it more efficient.
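To make the problem concrete, here is a minimal sketch of one possible approach, not a tuned solution. It transposes the matrix in 8x8 byte tiles, each held in eight 64-bit words, using three mask-and-shift swap stages per tile. Everything here is an assumption for illustration: the names `transpose8x8` and `transpose64x64` are hypothetical, the matrix is taken to be row-major, the target is taken to be little-endian, and the code is plain portable C rather than any device-specific intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 64  /* assumed: N x N matrix of bytes, stored row-major */

/* Transpose an 8x8 byte tile held in eight 64-bit words.
 * Assumes byte j of r[i], counted from the least significant byte,
 * holds element (i, j); this is what memcpy gives on a little-endian target. */
static void transpose8x8(uint64_t r[8])
{
    static const uint64_t mask[3] = {
        0x00000000FFFFFFFFULL,  /* stage d=4: selects columns 0-3        */
        0x0000FFFF0000FFFFULL,  /* stage d=2: selects columns 0-1, 4-5   */
        0x00FF00FF00FF00FFULL   /* stage d=1: selects the even columns   */
    };
    int d = 4;
    for (int s = 0; s < 3; s++, d >>= 1) {
        const int shift = 8 * d;
        const uint64_t m = mask[s];
        for (int i = 0; i < 8; i++) {
            if (i & d)
                continue;  /* row i is the upper row of each (i, i+d) pair */
            uint64_t a = r[i], b = r[i + d];
            /* Exchange the upper-right block of a with the lower-left block of b. */
            r[i]     = (a & m) | ((b & m) << shift);
            r[i + d] = ((a >> shift) & m) | (b & ~m);
        }
    }
}

/* Out-of-place transpose; dst and src must not overlap. */
void transpose64x64(uint8_t dst[N * N], const uint8_t src[N * N])
{
    for (int bi = 0; bi < N; bi += 8) {
        for (int bj = 0; bj < N; bj += 8) {
            uint64_t r[8];
            /* Load tile rows bi..bi+7, columns bj..bj+7. */
            for (int k = 0; k < 8; k++)
                memcpy(&r[k], &src[(bi + k) * N + bj], 8);
            transpose8x8(r);
            /* Store the transposed tile at the mirrored tile position. */
            for (int k = 0; k < 8; k++)
                memcpy(&dst[(bj + k) * N + bi], &r[k], 8);
        }
    }
}
```

Each stage swaps one bit of the row index with the same bit of the column index, so a tile costs 12 row-pair swaps instead of 64 single-byte moves. Whether that is actually cheaper depends on the target's 64-bit shift and mask throughput, which I have not measured.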

Thank you in advance, Clemens