Does anyone have any metrics on when to use DMA for a copy versus just calling memcpy? In other words, how large does a buffer need to be before DMA becomes more efficient? We can assume that the data is aligned on 32-bit boundaries and consists of whole 32-bit words. With DMA, several registers need to be programmed before the transfer starts, and polling for completion adds further overhead.
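
To make the question concrete, here is a rough sketch of the break-even model I have in mind. It assumes a simple linear cost: a CPU copy costs a fixed number of cycles per word, while DMA costs a fixed setup-plus-polling overhead and then a smaller number of cycles per word. The function name `dma_break_even_words` and all cycle counts are hypothetical placeholders, not measurements from any real part. The real numbers would have to be measured on the target.

```c
#include <stdint.h>

/*
 * Hypothetical linear cost model (all values are assumptions to be measured):
 *   CPU copy cost = n * cpu_cycles_per_word
 *   DMA copy cost = overhead_cycles + n * dma_cycles_per_word
 * where overhead_cycles covers register setup plus completion polling.
 *
 * Returns the smallest word count n at which DMA is no slower than the
 * CPU copy, or 0 if DMA never catches up under this model.
 */
static uint32_t dma_break_even_words(uint32_t overhead_cycles,
                                     uint32_t cpu_cycles_per_word,
                                     uint32_t dma_cycles_per_word)
{
    /* If DMA is not faster per word, the fixed overhead is never repaid. */
    if (cpu_cycles_per_word <= dma_cycles_per_word) {
        return 0;
    }

    uint32_t gain_per_word = cpu_cycles_per_word - dma_cycles_per_word;

    /* Ceiling division: overhead_cycles / gain_per_word, rounded up. */
    return overhead_cycles / gain_per_word
         + (overhead_cycles % gain_per_word != 0 ? 1u : 0u);
}
```

For example, with made-up values of 200 cycles of overhead, 4 cycles per word for the CPU copy, and 2 cycles per word for DMA, `dma_break_even_words(200, 4, 2)` gives 100 words (400 bytes). This model only counts elapsed cycles while the CPU busy-waits on the transfer. It ignores bus contention and cache maintenance, and it does not credit DMA for any work the CPU could do in parallel if it were not polling.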