TI experts-
Is there an app note, benchmark doc, or wiki that covers optimizing C66x core arithmetic operations (add, xor, shift), for example using 8-byte data operands and performing multiple operations in each cycle in order to fully utilize the pipeline?
I found the SLAA547 doc (C Implementation of Cryptographic Algorithms), but it is chip-agnostic.
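
To make the question concrete, here is a minimal sketch of the kind of loop I have in mind. It is plain portable C, not taken from SLAA547 or any TI document, and the names (mix64, mix_block) and the rotate amount are hypothetical, chosen only for illustration. The assumption is that four independent add/xor/shift chains on 8-byte operands per iteration, plus restrict-qualified pointers, give the compiler room to schedule several operations per cycle; whether that actually happens on C66x is exactly what I am hoping a doc would address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mixing step: one add, one xor, and a shift-based rotate,
   all on 8-byte (64-bit) operands. */
static inline uint64_t mix64(uint64_t x, uint64_t k)
{
    x += k;                          /* add */
    x ^= k;                          /* xor */
    return (x << 13) | (x >> 51);    /* rotate left by 13 via two shifts */
}

/* Process n 64-bit words. The main loop handles four words per iteration;
   the four results do not depend on each other, so they can in principle
   be scheduled in parallel. restrict tells the compiler that dst, src,
   and key do not alias. */
void mix_block(uint64_t *restrict dst,
               const uint64_t *restrict src,
               const uint64_t *restrict key,
               size_t n)
{
    size_t i;

    for (i = 0; i + 4 <= n; i += 4) {
        uint64_t a = mix64(src[i],     key[i]);
        uint64_t b = mix64(src[i + 1], key[i + 1]);
        uint64_t c = mix64(src[i + 2], key[i + 2]);
        uint64_t d = mix64(src[i + 3], key[i + 3]);

        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }

    /* Remainder when n is not a multiple of 4. */
    for (; i < n; i++) {
        dst[i] = mix64(src[i], key[i]);
    }
}
```

What I am looking for is guidance on how far this kind of C-level restructuring gets on C66x, and where intrinsics or other chip-specific techniques become necessary.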
Thanks.
-Jeff
Signalogic