Hi,
I have a code segment like the one below:
void test_func(uint rows, fixed mat_data[], uint mat_idx[], uint mptr[], mat *vec, mat *dest)
{
    uint i, j;
    for (i = 0; i < rows; i++) {
        for (j = mptr[i]; j < mptr[i + 1]; j++) {
            fixed *ptr1 = &mat_data[j];
            uint  *ptr2 = &mat_idx[j];
            fixed w = F_ADD(MAT_GET(dest, i),
                            F_MUL((signed short)__data20_read_short((unsigned long int)ptr1),      // read from FRAM
                                  MAT_GET(vec, (uint)__data20_read_short((unsigned long int)ptr2)) // read from FRAM and FRAM2
                                 ));
            MAT_SET(dest, w, i); // write back to FRAM
        }
    }
}
The following variables lie in FRAM/SRAM as follows:
- "dest" : lower FRAM
- "mat_data", "mat_idx" : FRAM2 (after the 0x10000 boundary), hence need to use __data20_read_short() intrinsic for access
- rest of the variables are in SRAM
The above section of code works as intended but is quite slow. I suspect this is because of the repeated reads/writes to/from FRAM/FRAM2 inside the inner loop.
I read in slaa498b.pdf that the DMA can be used to make data access faster. Would it help in this scenario? Do I need to use assembly code (as given in the document) to get the DMA working, or is there any sample C code I can look at?
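For reference, this is roughly what I imagine a software-triggered DMA block copy would look like in C. Register and bit names are taken from the MSP430FR5xxx headers and TI's code examples, and this is an untested sketch, so please correct me if the setup is wrong:

```c
#include <msp430.h>
#include <stdint.h>

/* Untested sketch: block-copy 'count' words from src (e.g. FRAM2,
   above 0x10000) to dst (SRAM) using DMA channel 0 with a software
   trigger. Register/bit names are from the FR5xxx headers; check
   the family user's guide for your exact part. */
static void dma_copy_words(const void *src, void *dst, uint16_t count)
{
    DMACTL0 = DMA0TSEL__DMAREQ;        /* channel 0: software trigger (DMAREQ) */
    /* DMA0SA/DMA0DA are 20-bit registers, hence the __data20 writes */
    __data20_write_long((unsigned long)&DMA0SA, (unsigned long)src);
    __data20_write_long((unsigned long)&DMA0DA, (unsigned long)dst);
    DMA0SZ  = count;                   /* transfer size, in words */
    DMA0CTL = DMADT_1                  /* block transfer mode */
            | DMASRCINCR_3             /* increment source address */
            | DMADSTINCR_3             /* increment destination address */
            | DMAEN;                   /* enable channel */
    DMA0CTL |= DMAREQ;                 /* start; CPU is halted until the block completes */
}
```

My understanding is that in block mode the CPU is halted for the duration of the transfer, so the win (if any) would come from the DMA moving words faster than an equivalent CPU copy loop, not from overlap with computation.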
Also, would it be faster to first pre-buffer a chunk of data from FRAM into SRAM and work directly out of SRAM in a micro-batch fashion? Or would the copy overhead outweigh working directly off FRAM?
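To make the micro-batch idea concrete, here is a host-runnable sketch of what I have in mind. memcpy stands in for the DMA/__data20 block move, plain int16_t arithmetic stands in for F_MUL/F_ADD, and plain arrays stand in for the mat type, so only the access pattern is representative:

```c
#include <stdint.h>
#include <string.h>

#define CHUNK 16  /* nonzeros copied into SRAM per batch (tuning knob) */

typedef int16_t fixed;  /* stand-in for the fixed-point type */

/* Copy up to CHUNK nonzeros at a time into (SRAM-resident) buffers,
   then compute entirely out of those buffers. Note the row sum is
   also accumulated in a local variable and written back once per row,
   instead of one FRAM read + write per nonzero as in the original. */
void spmv_prebuffered(unsigned rows, const fixed mat_data[],
                      const unsigned mat_idx[], const unsigned mptr[],
                      const fixed vec[], fixed dest[])
{
    fixed    buf_data[CHUNK];
    unsigned buf_idx[CHUNK];
    unsigned i, j, k, n;

    for (i = 0; i < rows; i++) {
        fixed acc = dest[i];                 /* one read per row */
        for (j = mptr[i]; j < mptr[i + 1]; j += n) {
            n = mptr[i + 1] - j;
            if (n > CHUNK) n = CHUNK;
            /* memcpy stands in for the FRAM2 -> SRAM block copy */
            memcpy(buf_data, &mat_data[j], n * sizeof buf_data[0]);
            memcpy(buf_idx,  &mat_idx[j],  n * sizeof buf_idx[0]);
            for (k = 0; k < n; k++)
                acc += buf_data[k] * vec[buf_idx[k]];  /* F_ADD/F_MUL stand-ins */
        }
        dest[i] = acc;                       /* one write-back per row */
    }
}
```

Even independently of the buffering, hoisting the accumulator out of the inner loop like this should cut the per-nonzero FRAM traffic on dest, if I am reading my own code correctly.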
As you may have guessed, F_MUL and F_ADD perform fixed-point multiplication and addition, so it is probably worthwhile moving these operations to the LEA, but that FRAM access bottleneck would still be there.
Are there any other ways to make the FRAM read/write access faster?
Many thanks
Rosh