Hello,
We measured the performance of C6678 DSPs for image registration, using a derivative-based approach that can register arbitrary data, such as biological tissue in medical image processing. For example, we were able to register two 4096x4096 pixel images within 93 ms while offloading the CPU by a factor of 20 and requiring 3.12 times less electrical energy.
The preprint of the paper is available on my webpage, along with links to the (open-source) software. The software may also be a good starting point for any project that needs to DMA-transfer data over PCIe to a C6678 or DSPC-8681: http://embedded-software-architecture.com/?page_id=11
The final publication is available at link.springer.com: http://link.springer.com/article/10.1007/s11554-014-0457-3
Best regards,
Roelof Berg