Hello,
I am asking my question again because I have not yet received a satisfactory answer. I am using the C6678 and have run matrix-matrix multiplication on it in both single precision and double precision. I have tried keeping the data in L1, L2, MSMC, and DDR3, using CCS and RTSC. What I want now is to measure performance, i.e. cycle counts, with the data placed as close to the core as possible: in the registers, nothing else. I am not asking about large matrices; I am only interested in the smallest sizes, say 2x2 or even 3x3.
Do I need to write linear assembly for this? I am not sure. I have my own C code for matrix-matrix multiplication. Can anybody please help me with this? I have been trying to find a solution, but nothing I have found so far is satisfactory. If you do not have a solution, please point me to some links (please do not provide RTSC links) or documents. I just want to put my data in registers and compute the matrix multiplication there.
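To make the goal concrete, here is a minimal sketch of the kind of thing I have in mind, not my actual code. It assumes the TI compiler (which defines _TMS320C6X and provides the TSCL time-stamp counter through c6x.h) and falls back to clock() on any other compiler so it still builds on a PC. The names read_ticks, mat2x2_mul, and time_mat2x2_mul are made up for illustration. Copying the operands into local scalars only invites the compiler to keep them in registers; whether it really does depends on the optimization level and would have to be checked in the generated assembly. The measured count also includes the initial loads and the final stores.

```c
#include <stdint.h>
#include <time.h>

#ifdef _TMS320C6X
#include <c6x.h>   /* TI header that declares the TSCL time-stamp counter */
#endif

/* Read a tick counter: TSCL (CPU cycles) on C6000 targets,
 * clock() ticks as a portable fallback on any other compiler. */
static uint32_t read_ticks(void)
{
#ifdef _TMS320C6X
    return TSCL;
#else
    return (uint32_t)clock();
#endif
}

/* 2x2 single-precision multiply, c = a * b, row-major storage.
 * All eight operands are copied into local scalars first, so an
 * optimizing compiler is free to hold them in registers. */
static void mat2x2_mul(const float *a, const float *b, float *c)
{
    const float a00 = a[0], a01 = a[1], a10 = a[2], a11 = a[3];
    const float b00 = b[0], b01 = b[1], b10 = b[2], b11 = b[3];

    c[0] = a00 * b00 + a01 * b10;
    c[1] = a00 * b01 + a01 * b11;
    c[2] = a10 * b00 + a11 * b10;
    c[3] = a10 * b01 + a11 * b11;
}

/* Return the tick difference measured around one multiply. */
static uint32_t time_mat2x2_mul(const float *a, const float *b, float *c)
{
    uint32_t t0, t1;

#ifdef _TMS320C6X
    TSCL = 0;      /* the first write to TSCL starts the counter */
#endif
    t0 = read_ticks();
    mat2x2_mul(a, b, c);
    t1 = read_ticks();

    return t1 - t0; /* unsigned subtraction tolerates one wrap-around */
}
```

Calling time_mat2x2_mul twice in a row and keeping the second result should reduce the effect of cold instruction cache on the number.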
I hope I will get some solutions this time.
Thanks and Regards,
Arun