The other day there was a question about multiplication, which made me recall a certain issue with the C6400 code generator. Consider this:
unsigned long long foo(unsigned int a, unsigned int b)
{ return (unsigned long long)a * b; }
The C6400+ code generator (note the plus) compiles it as a single MPY32U instruction, which is totally cool. Now, given that the C6400 code generator (note the lack of plus) compiles a 32x32=32 multiplication as three 16-bit multiplications plus a shift and a pair of additions (totally cool), one would expect a 32x32=64 multiplication to be compiled as four 16-bit ones plus shifts and additions. But that's not what happens :-( Instead it generates a function call to _mpyll, which apparently performs a 64x64=64 multiplication with 10 16-bit multiplications and numerous shifts and additions. This is wasteful and naturally bad for performance...
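For reference, here is roughly what I mean by "four 16-bit ones plus shifts and additions", written in plain C. This is only a sketch: the name mul32x32_64 is made up for illustration, it is not a TI runtime routine, and I have not checked what the C6400 compiler actually emits for it.

```c
#include <stdint.h>

/* Hypothetical helper (not part of any TI library): unsigned 32x32=64
 * multiplication built from four 16x16=32 partial products. */
uint64_t mul32x32_64(uint32_t a, uint32_t b)
{
    /* Split both operands into 16-bit halves. */
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    /* Four 16x16=32 partial products; none can overflow 32 bits. */
    uint32_t ll = al * bl;
    uint32_t lh = al * bh;
    uint32_t hl = ah * bl;
    uint32_t hh = ah * bh;

    /* Middle column (bits 16..31 of the result) plus its carry.
     * The sum of three 16-bit values always fits in 32 bits. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    uint32_t lo = (ll & 0xFFFFu) | (mid << 16);
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}
```

That is four multiplications and a handful of shifts, masks and additions, compared with the 10 multiplications that _mpyll apparently performs.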