Hi,
I would like to transpose (i.e. swap the rows and columns of) a 64x64 matrix whose elements are 1 byte each (so 8x8 registers).
All the solutions I have come up with so far are horribly inefficient and require a huge number of instructions, so maybe there are some special tricks that would make it more efficient.
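For a concrete picture of the kind of trick being asked about, here is a minimal sketch of one common SWAR approach. It is not the poster's code. It assumes the matrix is stored row-major as `uint8_t[64][64]` and is transposed out of place, and the names `transpose64`, `transpose8x8` and `swap_fields` are hypothetical. Each 8x8 byte tile is packed into eight 64-bit words (one word per tile row), transposed in registers with three rounds of masked "delta swaps" (shifts of 32, 16 and 8 bits), and written to the mirrored tile position.

```c
#include <assert.h>
#include <stdint.h>

#define N 64  /* matrix dimension from the question */
#define B 8   /* tile size: 8 one-byte elements fill one 64-bit word */

static_assert(N % B == 0, "N must be a multiple of the tile size");

/* Delta swap: exchange the bits of (*a >> shift) selected by mask
   with the bits of *b selected by the same mask. */
static void swap_fields(uint64_t *a, uint64_t *b, int shift, uint64_t mask)
{
    uint64_t t = ((*a >> shift) ^ *b) & mask;
    *b ^= t;
    *a ^= t << shift;
}

/* Transpose an 8x8 byte tile held in 8 words.
   Element (i, j) is byte j of r[i], i.e. bits 8j..8j+7. */
static void transpose8x8(uint64_t r[B])
{
    /* Swap the two off-diagonal 4x4 blocks. */
    for (int i = 0; i < 4; i++)
        swap_fields(&r[i], &r[i + 4], 32, 0x00000000FFFFFFFFULL);
    /* Inside each 4x4 block, swap the off-diagonal 2x2 blocks. */
    for (int i = 0; i < B; i++)
        if ((i & 2) == 0)
            swap_fields(&r[i], &r[i + 2], 16, 0x0000FFFF0000FFFFULL);
    /* Inside each 2x2 block, swap the off-diagonal bytes. */
    for (int i = 0; i < B; i += 2)
        swap_fields(&r[i], &r[i + 1], 8, 0x00FF00FF00FF00FFULL);
}

/* Out-of-place transpose: dst[x][y] = src[y][x]. */
void transpose64(uint8_t src[N][N], uint8_t dst[N][N])
{
    for (int bi = 0; bi < N; bi += B)
        for (int bj = 0; bj < N; bj += B) {
            uint64_t r[B];
            /* Pack tile (bi, bj): row i becomes one 64-bit word. */
            for (int i = 0; i < B; i++) {
                r[i] = 0;
                for (int j = 0; j < B; j++)
                    r[i] |= (uint64_t)src[bi + i][bj + j] << (8 * j);
            }
            transpose8x8(r);
            /* The transposed tile belongs at position (bj, bi). */
            for (int i = 0; i < B; i++)
                for (int j = 0; j < B; j++)
                    dst[bj + i][bi + j] = (uint8_t)(r[i] >> (8 * j));
        }
}
```

The packing loops use shifts so the sketch does not depend on byte order; on a little-endian machine a `memcpy` of each 8-byte tile row into a `uint64_t` does the same job. The 8x8 core needs 12 delta swaps per tile instead of moving 64 bytes individually, and SIMD byte-interleave instructions (for example the SSE2 unpack instructions) are another common way to do the same reshuffling.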
Thank you in advance, Clemens