Implementation of parallel programming based on the C6678 architecture

dariush karami

Hi
I want to design an architecture to implement data-parallel programming on the C6678.
I want EDMA3 to service cores 1 to 7 and to turn the L2 SRAM of each of those cores (all except Core_0) into a ping-pong buffer.
To that end, I designed the following architecture:

1) I allocated Core_0 to network handling and to EDMA3 control and management.
2) All the cores communicate with Core_0 through some registers defined in MSMC memory.
3) Each core (1 to 7) has its own region in DDR3 memory.
4) If a core (1 to 7) needs to use EDMA3, it must ask Core_0 to do so.

I should mention that the same task (program) runs on cores 1 to 7.
But I think this architecture may decrease performance in practice, because whenever a core wants to use EDMA3, it has to ask Core_0.
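The request path in points 2) and 4) can be sketched as a small mailbox with one slot per worker core. This is only an illustration under stated assumptions: the names dma_request_t, worker_post, worker_done and manager_service are hypothetical, memcpy stands in for the real EDMA3 transfer, and on the device the mailbox would sit in MSMC memory and need the appropriate cache maintenance and memory ordering, which this host-side sketch omits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_WORKERS 7   /* cores 1..7; Core_0 is the manager */

/* Hypothetical slot states for one worker's request. */
enum { REQ_IDLE = 0, REQ_PENDING = 1, REQ_DONE = 2 };

/* Hypothetical request slot; on the device this would live in MSMC memory. */
typedef struct {
    volatile uint32_t state;   /* REQ_IDLE, REQ_PENDING or REQ_DONE */
    const void       *src;
    void             *dst;
    uint32_t          bytes;
} dma_request_t;

static dma_request_t mailbox[NUM_WORKERS];

/* Worker side: post a transfer request.
 * Returns 0 on success, -1 if the core number is invalid or the slot is busy. */
int worker_post(unsigned core, const void *src, void *dst, uint32_t bytes)
{
    if (core < 1 || core > NUM_WORKERS)
        return -1;
    dma_request_t *r = &mailbox[core - 1];
    if (r->state == REQ_PENDING)
        return -1;
    r->src   = src;
    r->dst   = dst;
    r->bytes = bytes;
    r->state = REQ_PENDING;    /* publish the request last */
    return 0;
}

/* Worker side: returns 1 once the manager has completed the request. */
int worker_done(unsigned core)
{
    if (core < 1 || core > NUM_WORKERS)
        return 0;
    return mailbox[core - 1].state == REQ_DONE;
}

/* Core_0 side: service every pending request once.
 * memcpy is a stand-in for programming and triggering an EDMA3 transfer.
 * Returns the number of requests serviced. */
unsigned manager_service(void)
{
    unsigned serviced = 0;
    for (unsigned i = 0; i < NUM_WORKERS; ++i) {
        dma_request_t *r = &mailbox[i];
        if (r->state == REQ_PENDING) {
            assert(r->src != NULL && r->dst != NULL);
            memcpy(r->dst, r->src, r->bytes);
            r->state = REQ_DONE;
            ++serviced;
        }
    }
    return serviced;
}
```

Every transfer here costs a round trip through Core_0's polling loop, which is exactly the overhead the post worries about.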

Another way is to use Core_0 for network management and cores 1 to 7 for OpenMP, but this is very complicated for me because of problems such as the following:
1) I don't know how to run OpenMP on cores 1 to 7, given that the master thread ID would no longer be 0.
2) I would have to maintain cache coherency across cores 1 to 7 when using OpenMP, which is very complicated, especially when several functions are involved in the OpenMP programming.

It seems that using EDMA3 is much easier than OpenMP, because OpenMP is not user-friendly at all. (I have already used OpenMP in Visual Studio on a desktop processor.)

Do you have any better suggestions? Please guide me.

over 9 years ago

0 dariush karami over 9 years ago

Genius 3165 points

Hi
Please answer me, I'm waiting.

0 Sivaraj Kuppuraj over 9 years ago in reply to dariush karami

TI__Mastermind 35645 points

Hi,

Thanks for your post.

In general, the C6678 is designed to do as many transfers as possible in parallel. Transfers that cannot strictly run in parallel are scheduled automatically as quickly as possible, so they may still appear to run in parallel.

Please refer to the example that comes with the EDMA3 LLD in the MCSDK package. The example is named edma3_driver.

According to the “Multicore Programming Guide” below, there are three models for implementing parallel tasks: a) Master/Slave, b) Data Flow, and c) OpenMP. Kindly refer to section 2.1 of the document below for more information:

http://www.ti.com.cn/cn/lit/an/sprab27b/sprab27b.pdf

Also, kindly check Appendix B.1 of the EDMA3 user guide below for the typical steps involved in setting up a transfer:

http://www.ti.com/lit/ug/sprugs5b/sprugs5b.pdf

Thanks & regards,

Sivaraj K

0 dariush karami over 9 years ago in reply to Sivaraj Kuppuraj

Genius 3165 points

Dear Sivaraj

I studied the documents that you referred to (sprab27b.pdf and sprugs5b.pdf) and I now have some questions.

Please point me to an application or example code that addresses the questions below.

Question 1:

How can each core use EDMA3 independently and simultaneously?

Question 2:

How can I use and program the hardware semaphore?

Best Regards

0 Sivaraj Kuppuraj over 9 years ago in reply to dariush karami

TI__Mastermind 35645 points

Hi,

Thanks for your update.

To address Question#1,

Even though you trigger multiple reads/writes from separate cores and the data goes to separate cores, these are not per-core transfers. They are separate EDMA transfers, and the EDMA is independent of the CorePacs.

You have 3 EDMA3 modules that operate independently from each other. Each of those has multiple Transfer Controllers that all operate independently from each other, even on the same EDMA3 instance.

The C6678 is designed to do as many transfers as possible in parallel. Transfers that cannot strictly run in parallel are scheduled automatically as quickly as possible, so they may still appear to run in parallel.

As for the EDMA sample application you asked for:

There are EDMA-based SYS/BIOS examples under the EDMA3 LLD folder of MCSDK 2.0. They contain test/demo code that demonstrates the EDMA3 driver functionality, including an EDMA3 memory-to-memory data copy test case and an EDMA3 ping-pong-buffer data copy test case on BIOS6. The examples can be found at the path below:

~\ti\mcsdk_2_01_02_06\edma3_lld_02_11_05_02\examples\edma3_driver\evm6678\sample_app

Likewise, there are CSL-based EDMA examples at the location below:

~\ti\mcsdk_2_01_02_06\pdk_C6678_1_1_2_6\packages\ti\csl\example\edma\edma_test.c

The above is EDMA example test code that uses the EDMA CSL functional layer for the C6678 device.
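To give an idea of what a ping-pong data flow looks like before opening those examples, here is a minimal host-side sketch. It is not taken from the LLD or CSL examples: fetch_block, process_block and run_ping_pong are hypothetical names, BLOCK is an arbitrary size, and memcpy stands in for an EDMA3 transfer that on the device would run in the background while the core processes the other buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 4   /* samples per buffer half; kept tiny for illustration */

/* Stand-in for starting an EDMA3 transfer. Here the copy is synchronous;
 * on the device the transfer would proceed while the core keeps working. */
static void fetch_block(int32_t *dst, const int32_t *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
}

/* Example per-block work: sum the samples. */
static int64_t process_block(const int32_t *buf, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += buf[i];
    return sum;
}

/* Process `blocks` consecutive blocks of BLOCK samples from `input`,
 * alternating between two local buffers, and return the overall sum. */
int64_t run_ping_pong(const int32_t *input, size_t blocks)
{
    static int32_t buf[2][BLOCK];   /* buf[0] = ping, buf[1] = pong */
    int64_t total = 0;

    if (blocks == 0)
        return 0;
    assert(input != NULL);

    fetch_block(buf[0], input, BLOCK);            /* prime the ping buffer */
    for (size_t i = 0; i < blocks; ++i) {
        size_t cur = i & 1;                       /* buffer to process now */
        size_t nxt = cur ^ 1;                     /* buffer to fill next   */
        if (i + 1 < blocks)
            fetch_block(buf[nxt], input + (i + 1) * BLOCK, BLOCK);
        /* On the device: wait here until the transfer into buf[cur] is done. */
        total += process_block(buf[cur], BLOCK);
    }
    return total;
}
```

The point of the scheme is that the fetch into one half overlaps with the processing of the other half, so the core rarely waits for data.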

To address Question #2,

The LLD uses a semaphore as a mutex when allocating resources: you create the semaphore using your OS and then pass its handle to the LLD. The semaphore then serves as a mutex for resource protection within the EDMA driver.

To program the EDMA3 using the LLD, see:

http://processors.wiki.ti.com/index.php/Programming_the_EDMA3_using_the_Low-Level_Driver_%28LLD%29

http://processors.wiki.ti.com/images/5/5e/EDMA3_LLD.pdf

Thanks & regards,

Sivaraj K

0 dariush karami over 9 years ago in reply to Sivaraj Kuppuraj

Genius 3165 points

Hi

Thanks for your answers

It is true that the EDMA3 hardware is independent of the CorePacs, but some things confuse me. To program and run EDMA3, the Channel Controller (PaRAM) has to be programmed, and there are only 3 channel controllers while there are 8 CorePacs. So it seems that not all CorePacs can access EDMA3 at the same time.

So I wonder: how can 8 cores use and program 3 Channel Controllers at the same time?

Question 1:

Since there are 3 EDMA3 instances, does that mean the programmer can program 9 channel controllers and the problem is solved? Am I thinking right?

Or is it related to the 8 shadow regions?

Sections 5.2 and 5.3 (page 26) of "sprab27b_Multicore_Programming_Guide.pdf" describe the difference between hardware and OS semaphores. I want to use the hardware semaphore for arbitrating between cores, but I don't know how to do that.

Question 2:

I think that I should use the hardware semaphore for arbitrating between cores. Am I thinking right? How should I do that?

Best Regards

0 dariush karami over 9 years ago in reply to dariush karami

Genius 3165 points

Please answer me, I'm waiting.

Regards

0 Sivaraj Kuppuraj over 9 years ago in reply to dariush karami

TI__Mastermind 35645 points

Hi,

To address Question #1,

It is not related to shadow registers. There are 3 EDMA Channel Controllers on the C6678 DSP: EDMA3CC0, EDMA3CC1, and EDMA3CC2.

• EDMA3CC0 has two transfer controllers: EDMA3TC1 and EDMA3TC2.

• EDMA3CC1 has four transfer controllers: EDMA3TC0, EDMA3TC1, EDMA3TC2, and EDMA3TC3.

• EDMA3CC2 has four transfer controllers: EDMA3TC0, EDMA3TC1, EDMA3TC2, and EDMA3TC3.

So, in total there are 3 channel controllers and 10 transfer controllers (2 + 4 + 4) on the C6678.
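As a quick cross-check of the per-controller counts listed above, the small table below adds them up and maps a flat index onto a (channel controller, position in that controller's TC list) pair. The type edma3_cc_t and the functions total_tcs and tc_locate are hypothetical helpers for illustration only, not part of any TI library. One possible use, purely as an assumption, would be to give worker core k the flat index (k - 1) so that each core works through a different transfer controller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one channel controller. */
typedef struct {
    const char *name;
    unsigned    num_tcs;   /* transfer controllers attached to it */
} edma3_cc_t;

/* Counts as listed in the reply above. */
static const edma3_cc_t c6678_edma3[] = {
    { "EDMA3CC0", 2 },
    { "EDMA3CC1", 4 },
    { "EDMA3CC2", 4 },
};

#define NUM_CCS (sizeof c6678_edma3 / sizeof c6678_edma3[0])

/* Total number of transfer controllers across all channel controllers. */
unsigned total_tcs(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < NUM_CCS; ++i)
        total += c6678_edma3[i].num_tcs;
    return total;
}

/* Map a flat index (0 .. total_tcs()-1) to a channel controller index and a
 * position within that controller's TC list.
 * Returns 0 on success, -1 if the flat index is out of range. */
int tc_locate(unsigned flat, unsigned *cc, unsigned *pos)
{
    assert(cc != NULL && pos != NULL);
    for (size_t i = 0; i < NUM_CCS; ++i) {
        if (flat < c6678_edma3[i].num_tcs) {
            *cc  = (unsigned)i;
            *pos = flat;
            return 0;
        }
        flat -= c6678_edma3[i].num_tcs;
    }
    return -1;
}
```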

To address Question #2,

Yes, your thinking is correct: hardware semaphores are required when arbitrating between cores. Within a single core, a global flag or an OS semaphore can be used.
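The acquire/release pattern can be sketched on a host machine as follows. This is an emulation with C11 atomics, not device code: sem_try_acquire, sem_release and guarded_increment are hypothetical names and NUM_SEMS is an arbitrary count. On the device, the CSL semaphore layer provides the equivalent operations (as far as I recall, CSL_semAcquireDirect and CSL_semReleaseSemaphore in csl_semAux.h; please verify against your PDK version).

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_SEMS 8   /* illustrative count, not the device's actual number */

/* 0 = free, 1 = owned. Static storage gives a valid zero (free) state. */
static atomic_int sem_state[NUM_SEMS];

/* Try once to take semaphore n.
 * Returns 1 if it was free and is now owned by the caller, 0 otherwise. */
int sem_try_acquire(unsigned n)
{
    int expected = 0;
    if (n >= NUM_SEMS)
        return 0;
    return atomic_compare_exchange_strong(&sem_state[n], &expected, 1) ? 1 : 0;
}

/* Release semaphore n so that another core can take it. */
void sem_release(unsigned n)
{
    if (n < NUM_SEMS)
        atomic_store(&sem_state[n], 0);
}

/* Shared resource protected by a semaphore; stands in for anything the
 * cores must not touch concurrently (for example, shared EDMA3 setup). */
static int shared_counter;

/* Spin until semaphore n is granted, update the shared resource, release.
 * Returns the new counter value. */
int guarded_increment(unsigned n)
{
    assert(n < NUM_SEMS);
    while (!sem_try_acquire(n)) {
        /* busy-wait; another core currently holds the semaphore */
    }
    int value = ++shared_counter;
    sem_release(n);
    return value;
}
```

Each core would run the same acquire, update, release sequence around the shared resource, agreeing beforehand on which semaphore number guards which resource.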

Thanks & regards,

Sivaraj K
