What does TI recommend for making the best use of the multiple cores in the F28377D for parallel processing? What software would you recommend? Do we support OpenMP?
The biggest opportunity for us to parallelize would be a simple parallel for loop, where each iteration performs an N-point FFT (the optimized assembly code from your library, CFFT_F32()). We may wish to compute, say, 8 variants (different inputs) and compare the 8 FFT outputs. If we can split them across the cores, the whole batch should take close to the time it takes to compute 4 on one core.
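To make the idea concrete, here is a minimal sketch of the static split we have in mind. It is plain C, not TI library code: `jobs_for_core`, `run_jobs_on_core`, and the `run_job` callback are hypothetical names made up for illustration, and the callback stands in for one CFFT_F32() call on one input. Starting the second core, synchronizing the two cores, and passing results between them are not shown.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 2u   /* assumption: both CPUs run the same loop */
#define NUM_JOBS  8u   /* the 8 FFT variants from the question */

/* Half-open range [begin, end) of job indices owned by one core. */
typedef struct {
    size_t begin;
    size_t end;
} job_range_t;

/* Split num_jobs into contiguous blocks, one per core. If the count does
 * not divide evenly, the lowest-numbered cores each take one extra job. */
job_range_t jobs_for_core(size_t core_id, size_t num_cores, size_t num_jobs)
{
    assert(num_cores > 0 && core_id < num_cores);

    size_t base  = num_jobs / num_cores;   /* jobs every core gets */
    size_t extra = num_jobs % num_cores;   /* leftover jobs to spread */

    job_range_t r;
    r.begin = core_id * base + (core_id < extra ? core_id : extra);
    r.end   = r.begin + base + (core_id < extra ? 1u : 0u);
    return r;
}

/* Hypothetical callback: runs one N-point FFT on input number job_index. */
typedef void (*job_fn_t)(size_t job_index);

/* Each core calls this with its own core_id and runs only its share.
 * Returns the number of jobs this core executed. */
size_t run_jobs_on_core(size_t core_id, job_fn_t run_job)
{
    job_range_t r = jobs_for_core(core_id, NUM_CORES, NUM_JOBS);
    for (size_t i = r.begin; i < r.end; ++i) {
        run_job(i);
    }
    return r.end - r.begin;
}
```

With 8 jobs and 2 cores, core 0 would take inputs 0 to 3 and core 1 would take inputs 4 to 7, which is where the "time of 4 FFTs on one core" estimate comes from. That estimate ignores the time needed to hand inputs to the second core and collect its outputs.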
Is there any existing optimized code we can use, or do we need to write our own custom code to make efficient use of the second core?