F28377D Multicore parallel processing example code

What does TI recommend for making the best use of the multiple cores in the F28377D for parallel processing? What software would you recommend? Is OpenMP supported?

Our biggest opportunity for parallelization is a simple parallel for loop in which each iteration performs an N-point FFT using the optimized assembly routine from your library, CFFT_F32(). We may wish to compute, say, 8 variants (different inputs) and compare the 8 FFT outputs. If we can split them across the cores, the whole job should take close to the time needed to compute 4 FFTs on one core.

Is there existing optimized code we can use, or do we need to write our own custom code to use the second core efficiently?

  • Hi Charles,

    Since you can't run something like embedded Linux on a C2000 device (at least to my knowledge), I don't think you will be able to use OpenMP. Usually you would either run code on the bare metal or use a real-time operating system. It is probably worth posting in the TI-RTOS forum to see if there are any automatic mechanisms for distributing tasks between cores: e2e.ti.com/.../

    For multi-core on C2000 devices you typically wouldn't see the type of symmetric multiprocessing that you are describing. Instead, the cores (which are heterogeneous, since you also have 2 CLAs on F2837xD) would each perform a specialized task. One CPU might be completely free to do FFT processing while the other CPU juggles multiple communications interfaces, and the CLAs control some tight ADC-to-ePWM real-time control loops.

    That isn't to say, however, that you couldn't split the FFT processing in the way you describe. If I understand correctly, you have 8 different input sets that all need FFTs computed for them? Computing the FFTs in this case falls into the class of "embarrassingly parallel" problems, so you probably don't need something as sophisticated as OpenMP to accomplish this. Just load the 4 datasets for a given core into memory that the core can access, and then let both cores compute their FFTs in parallel. Comparing the results, on the other hand, may be difficult to do in parallel, so this may have to be done on a single core.
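
    As a rough sketch of the static split described above (not TI-provided code), the index bookkeeping could look something like this in plain C. Everything here is hypothetical and for illustration only: `work_range_t`, `partition`, `run_core_share`, `dataset_fn`, and `mark_done` are made-up names, and `mark_done` is only a stand-in for the place where each CPU would call its FFT routine on buffers it owns. Inter-core signalling and memory placement are deliberately left out.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define NUM_DATASETS 8u  /* from the question: 8 input variants */
    #define NUM_CORES    2u  /* assumption: only the two C28x CPUs share the work */

    /* Half-open range [begin, end) of dataset indices owned by one core. */
    typedef struct {
        size_t begin;
        size_t end;
    } work_range_t;

    /* Static block partition: each core gets one contiguous slice.
     * Any remainder is handed out one item at a time to the lowest core ids. */
    static work_range_t partition(size_t n_items, size_t n_cores, size_t core_id)
    {
        work_range_t r;
        size_t base;
        size_t extra;

        assert(n_cores > 0 && core_id < n_cores);
        base  = n_items / n_cores;
        extra = n_items % n_cores;
        r.begin = core_id * base + (core_id < extra ? core_id : extra);
        r.end   = r.begin + base + (core_id < extra ? 1u : 0u);
        return r;
    }

    /* Hypothetical per-dataset worker signature. */
    typedef void (*dataset_fn)(size_t dataset_index, void *ctx);

    /* Runs the worker on every dataset assigned to core_id and
     * returns how many datasets that core processed. */
    static size_t run_core_share(size_t core_id, dataset_fn fn, void *ctx)
    {
        work_range_t r = partition(NUM_DATASETS, NUM_CORES, core_id);
        size_t i;

        for (i = r.begin; i < r.end; ++i) {
            fn(i, ctx);
        }
        return r.end - r.begin;
    }

    /* Stand-in worker: records that a dataset was handled.
     * On the device, the FFT call for dataset idx would go here instead. */
    static void mark_done(size_t idx, void *ctx)
    {
        ((unsigned char *)ctx)[idx] = 1u;
    }
    ```

    With this layout, the code on one CPU would call `run_core_share(0, ...)` and the code on the other would call `run_core_share(1, ...)`, so each handles 4 of the 8 datasets. How the second CPU reports completion (for example through the IPC flags) and which shared RAM blocks hold the input and output buffers are device-specific details not shown here. The final comparison would then run on whichever CPU collects all 8 results.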

    It is also worth noting that the F2837xD devices have an accelerator/ISA extension "VCU-II" that is specifically targeted towards increased FFT performance.