I have an image in DDR2, captured by the video port, in which each row is much longer than each column. If I process the image by row, there is a good chance of cache thrashing, since only a few long rows fit in L1D cache. If I process by column instead, many more columns fit in cache and I could reuse cached data much more efficiently. I don't believe I can use the video port to swap the x and y dimensions as the image is being captured... although by all means correct me if I'm wrong. I do know that I can use the EDMA to swap the array dimensions as I move the data from DDR2 to L2 SRAM, from which it can then be cached into L1D for processing.
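To pin down what I mean by swapping the dimensions, here is a minimal CPU-side sketch of the index arithmetic I would expect the DMA transfer to perform. It is plain C, not EDMA configuration code; `transpose_tile` and its parameters are hypothetical names I made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU model of a transposing 2D copy (hypothetical helper, not a TI API).
 * src:       top-left byte of the tile inside the source image
 * src_pitch: bytes from the start of one source row to the next
 * dst:       buffer of rows * cols bytes; column c of the source tile
 *            ends up as row c of the destination */
static void transpose_tile(const uint8_t *src, size_t src_pitch,
                           uint8_t *dst, size_t rows, size_t cols)
{
    assert(src_pitch >= cols);  /* the tile must fit inside one source row */

    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            /* source walks along a row, destination walks down a column */
            dst[c * rows + r] = src[r * src_pitch + c];
        }
    }
}
```

After the copy, each source column is contiguous in the destination, so a column-wise pass over the original image becomes a sequential pass over `dst`.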
I am blissfully ignorant of the details of DDR2 operation, although I'm pretty sure that transfers to sequential addresses are more efficient. Can anyone shed some light on the 'optimum' access pattern for DDR2? For example, if I transfer 1 byte from each row at a time, what kind of overhead am I going to pay? Is there a magic number of sequential bytes per access that gives maximum bandwidth? Any answers, or pointers to where to look, would be appreciated. I can't seem to find a definite answer in the data sheets I have been perusing.
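To make the overhead question concrete, this is the kind of back-of-envelope model I have in mind. It assumes the memory always moves data in fixed-size bursts, that every run of sequential bytes starts on a burst boundary, and that separate runs never share a burst. The burst size is a parameter I would have to look up for my device; the 32 bytes used in the note below is only a placeholder, and the model ignores row activate/precharge and refresh costs entirely. `bytes_moved` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes the memory has to move to deliver `count` runs of `run_bytes`
 * sequential bytes each, when data only moves in whole bursts.
 * ASSUMPTIONS: every run starts burst-aligned and no two runs share a
 * burst. burst_bytes depends on the device and bus width. */
static size_t bytes_moved(size_t count, size_t run_bytes, size_t burst_bytes)
{
    assert(run_bytes > 0 && burst_bytes > 0);

    /* round each run up to a whole number of bursts */
    size_t bursts_per_run = (run_bytes + burst_bytes - 1) / burst_bytes;
    return count * bursts_per_run * burst_bytes;
}
```

Under these assumptions, fetching 1 byte from each of 480 rows with a 32-byte burst moves 480 * 32 = 15360 bytes to deliver 480 useful ones, while a single 480-byte sequential run moves exactly 480. What I can't tell from the data sheets is whether this model is even close to how the controller behaves.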