I am trying to do some image processing on a C6678. With only one core I get a correct result for a 1920x1080 image.
When I use 8 cores for the same processing, the result has some defects, as marked in the following image.
I think it could be a cache coherency problem, so I searched the forum and used Cache_inv / Cache_wb to keep the data coherent. But I still cannot solve the problem. My steps are as follows:
1. Write the image raw data to DDR, named pucInData.
2. For each core,
Cache_inv(pucInData, nImgHeight*nImgWidth, Cache_Type_ALL, TRUE); //Invalidate image data to make sure I see the right data in each core.
//////////////////////////////////////////////
//do the image processing
Sobel(pucInDataPtr, pucDataOutPtr, nImgWidth, nProcessLength); //pucInDataPtr = pucInData + an offset to the proper start position.
//////////////////////////////////////////////
Cache_wb(pucDataOutPtr, nProcessLength, Cache_Type_ALL, TRUE); //write the data back to make sure I can get the right data in Core 0.
3. In Core 0, after every core has completed its operation,
Cache_inv(pucDataOut, nImgWidth*nImgHeight, Cache_Type_ALL, TRUE); //Invalidate the data to make sure I can see the right data.
pucDataOutPtr is the address where each core writes its result; I aligned it to 16 bytes (128 bits).
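Block cache operations act on whole cache lines, so the memory a write-back or invalidate touches can be wider than the pointer and byte count passed in. Below is a minimal sketch of that rounding, assuming the 128-byte L2 line size of the C66x (L1D lines are 64 bytes). `LINE_SIZE`, `line_floor`, `line_ceil`, and `touched_range` are hypothetical names for illustration only, not part of the TI Cache API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 128-byte L2 cache line on the C66x (L1D lines are 64 bytes). */
#define LINE_SIZE 128u

/* Start address of the cache line that contains addr. */
static uint32_t line_floor(uint32_t addr)
{
    return addr & ~(LINE_SIZE - 1u);
}

/* First cache-line boundary at or after addr. */
static uint32_t line_ceil(uint32_t addr)
{
    return (addr + LINE_SIZE - 1u) & ~(LINE_SIZE - 1u);
}

/*
 * Hypothetical helper: compute the line-granular range [*first, *end)
 * covered by a block operation on the byte range [start, start + len).
 */
static void touched_range(uint32_t start, uint32_t len,
                          uint32_t *first, uint32_t *end)
{
    assert(len > 0u);
    *first = line_floor(start);
    *end   = line_ceil(start + len);
}
```

For a range that starts at 0x810407A0 with length 258720 (Core 1 in the printout below), this gives 0x81040780 to 0x8107FA80, so the touched lines extend 32 bytes before the range and 64 bytes past its end.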
I printed out some information:
Core 0 Process Length=258720 Index from 0 ( 0, 0) to 3F2A0 (1440, 134) Address of pucDataOutPtr from 81001500 to 810407A0
Core 1 Process Length=258720 Index from 3F2A0 (1440, 134) to 7E540 ( 960, 269) Address of pucDataOutPtr from 810407A0 to 8107FA40
Core 2 Process Length=258720 Index from 7E540 ( 960, 269) to BD7E0 ( 480, 404) Address of pucDataOutPtr from 8107FA40 to 810BECE0
Core 3 Process Length=258720 Index from BD7E0 ( 480, 404) to FCA80 ( 0, 539) Address of pucDataOutPtr from 810BECE0 to 810FDF80
Core 4 Process Length=258720 Index from FCA80 ( 0, 539) to 13BD20 (1440, 673) Address of pucDataOutPtr from 810FDF80 to 8113D220
Core 5 Process Length=258720 Index from 13BD20 (1440, 673) to 17AFC0 ( 960, 808) Address of pucDataOutPtr from 8113D220 to 8117C4C0
Core 6 Process Length=258720 Index from 17AFC0 ( 960, 808) to 1BA260 ( 480, 943) Address of pucDataOutPtr from 8117C4C0 to 811BB760
Core 7 Process Length=258720 Index from 1BA260 ( 480, 943) to 1F9500 ( 0, 1078) Address of pucDataOutPtr from 811BB760 to 811FAA00
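To see where these start addresses fall relative to cache-line boundaries, the offset of each one within a line can be computed. This is a minimal sketch, assuming 64-byte L1D lines and 128-byte L2 lines on the C66x; `line_offset` and `print_offsets` are hypothetical helper names, and the addresses are copied from the printout above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Offset of addr within its cache line; 0 means addr is on a line boundary. */
static uint32_t line_offset(uint32_t addr, uint32_t line_size)
{
    /* line_size must be a non-zero power of two */
    assert(line_size != 0u && (line_size & (line_size - 1u)) == 0u);
    return addr & (line_size - 1u);
}

/* Start addresses of pucDataOutPtr for Core 0 .. Core 7, from the printout. */
static const uint32_t core_start[8] = {
    0x81001500u, 0x810407A0u, 0x8107FA40u, 0x810BECE0u,
    0x810FDF80u, 0x8113D220u, 0x8117C4C0u, 0x811BB760u
};

/* Print each start address with its offset inside a 64- and 128-byte line. */
static void print_offsets(void)
{
    for (int core = 0; core < 8; ++core) {
        printf("Core %d start 0x%08X: offset %u (64-byte line), %u (128-byte line)\n",
               core,
               (unsigned)core_start[core],
               (unsigned)line_offset(core_start[core], 64u),
               (unsigned)line_offset(core_start[core], 128u));
    }
}
```

With these addresses only Core 0 and Core 4 start on a 128-byte boundary; the other cores start 32, 64, or 96 bytes into a line, because the process length 258720 leaves a remainder of 32 when divided by 128.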
Then I guessed that the required alignment might be 128 bytes, so I aligned pucDataOutPtr in each core to 128 bytes. But the result is still the same.
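A minimal sketch of one way to do such a 128-byte alignment: round each core's share up to a whole number of cache lines, so that every core's start offset is a multiple of the line size. `CACHE_LINE` and `core_chunk` are hypothetical names, the 128-byte line size is an assumption about the C66x L2 cache, and the offsets are relative to a base address that must itself be 128-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 128u /* assumed C66x L2 cache line size in bytes */

/*
 * Hypothetical helper: byte range [*offset, *offset + *length) that one
 * core processes, relative to a 128-byte-aligned base address.
 */
static void core_chunk(size_t total, unsigned num_cores, unsigned core,
                       size_t *offset, size_t *length)
{
    assert(num_cores > 0u && core < num_cores);

    /* Even share per core, rounded up to a whole number of cache lines. */
    size_t share = (total + num_cores - 1u) / num_cores;
    size_t chunk = (share + CACHE_LINE - 1u) / CACHE_LINE * CACHE_LINE;

    /* Clamp to the buffer so trailing cores get the shorter remainder. */
    size_t start = (size_t)core * chunk;
    if (start > total) {
        start = total;
    }
    size_t end = start + chunk;
    if (end > total) {
        end = total;
    }

    *offset = start;
    *length = end - start;
}
```

For the 2069760 bytes processed here (8 x 258720), this gives seven chunks of 258816 bytes and a last chunk of 258048 bytes. The end of the last chunk is not on a line boundary unless the buffer itself is padded to a multiple of 128 bytes.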
So, where is my mistake? How can I fix it? Thanks.