c6747 External Memory Touch Program

Harikrishna Vuppaladhadiam

Expert 1855 points

I have tested a memory touch program on external memory with cache enabled, and obtained the results below.

Questions:

1) When should the touch function be used? How can the access times be optimized?

2) Is direct access to external memory (with cache enabled) better than touching before access?

3) The results show that touch + access together take more cycles than access without touch. What is the reason?

4) The read and write access times are different. Why is this so?

5) Can I rely on these cycle counts (the optimization level in the build options is 0)? Relating them to accesses, 26,667 cycles / 8K bytes is approximately 3 cycles per byte for reads, and similarly a little less than 3 cycles per byte for writes. How should these results be interpreted?

SDRAM: Read
                With Touch    Without Touch
Iteration       Cycles        Cycles
touch           9,030         -
1               26,667        33,223
2               26,667        26,667
3               26,667        26,667
touch           140           -

SDRAM: Write
                With Touch    Without Touch
Iteration       Cycles        Cycles
touch           9,073         -
1               22,577        27,282
2               22,575        22,575
3               22,575        22,575
touch           140           -

Cache Settings

L1D = 16K

L1P = 16K

L2D=128K

Reference L1D access for the same function:

L1D: Read
Iteration       Cycles
1               26,667
2               26,667
3               26,667
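As a rough check of the arithmetic behind questions 3 and 5, the figures from the tables can be worked through as in the sketch below. The helper names are made up for illustration, and the 64-byte L1D line size is an assumption (it matches the 128-byte, two-line stride used by the touch routine posted further down the thread).

```c
#include <assert.h>

/* Constants taken from the post; the line size is an assumption. */
enum {
    BUF_BYTES      = 1024 * 8, /* SIZE_OF_ARR */
    L1D_LINE_BYTES = 64        /* assumed L1D cache line size */
};

/* Cycles per byte, scaled by 100 and rounded, using integer math only. */
static long cycles_per_byte_x100(long cycles, long bytes)
{
    assert(bytes > 0);
    return (cycles * 100 + bytes / 2) / bytes;
}

/* Extra cycles of the first (cold) pass over a steady-state pass,
 * divided by the number of cache lines in the buffer (truncated). */
static long miss_cycles_per_line(long first_pass, long steady,
                                 long bytes, long line_bytes)
{
    assert(line_bytes > 0 && bytes >= line_bytes);
    return (first_pass - steady) / (bytes / line_bytes);
}

/* Net cost of touching first:
 * (touch + first pass after touch) - (first pass without touch). */
static long touch_net_cost(long touch, long touched_pass, long untouched_pass)
{
    return touch + touched_pass - untouched_pass;
}
```

With the posted numbers this gives about 3.26 cycles per byte for reads (26,667 / 8,192) and about 2.76 for writes (22,575 / 8,192). The cold read pass costs 6,556 cycles more than a warm one, roughly 51 cycles per assumed 64-byte line, and touching first leaves a net cost of 2,474 cycles for reads and 4,368 for writes. The warm SDRAM read count equals the L1D reference count, which suggests that at optimization level 0 the per-byte figure mostly reflects loop overhead rather than memory latency.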

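#include <std.h>    /* DSP/BIOS headers (assumed; not shown in the original post) */
#include <bcache.h>

#define SIZE_OF_ARR (1024*8)

#pragma DATA_ALIGN(Externbuf, 256)
#pragma DATA_SECTION(Externbuf, ".DDRData:Externbuf")
char Externbuf[SIZE_OF_ARR];

/* Assembly routine posted later in the thread. */
extern void touch(const void *array, int length);

/* Nonzero for the "With Touch" runs (declaration not shown in the original post). */
int touchenable;

#pragma CODE_SECTION(testWrite, ".L1Code:testWrite")
void testWrite(char *pBuf, int len)
{
    int i;
    char val = 0; /* fill value is not shown in the original post; 0 is a placeholder */

    for (i = 0; i < len; i++)
    {
        pBuf[i] = val;
    }
}

#pragma CODE_SECTION(testRead, ".L1Code:testRead")
int testRead(char *pBuf, int len)
{
    int i;
    int sum = 0;

    for (i = 0; i < len; i++)
    {
        sum = pBuf[i]; /* as posted: keeps only the last byte read */
    }
    return sum;
}

void test(void)
{
    /* Read benchmark: start from a cold cache. */
    BCACHE_wbInvAll();                          // 8011 cycles
    BCACHE_inv(Externbuf, SIZE_OF_ARR, TRUE);   // 2811 cycles for 8K
    if (touchenable) touch(Externbuf, SIZE_OF_ARR);
    testRead(Externbuf, SIZE_OF_ARR);
    if (touchenable) touch(Externbuf, SIZE_OF_ARR);

    /* Write benchmark: start from a cold cache again. */
    BCACHE_wbInvAll();                          // 8011 cycles
    BCACHE_inv(Externbuf, SIZE_OF_ARR, TRUE);   // 2811 cycles for 8K
    if (touchenable) touch(Externbuf, SIZE_OF_ARR);
    testWrite(Externbuf, SIZE_OF_ARR);
    if (touchenable) touch(Externbuf, SIZE_OF_ARR);
}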

over 14 years ago

0 jc-ti over 14 years ago

TI__Mastermind 27585 points

Can you give more details on what a "touch memory program" is and does? What is the functionality of the touch() function?

Jeff

0 Harikrishna Vuppaladhadiam over 14 years ago in reply to jc-ti

Expert 1855 points

It is taken from one of the TI documents on cache.

.global _touch
.sect ".text"

_touch:
B .S2 loop ; Pipe up the loop
|| MVK .S1 128, A2 ; Step by two cache lines
|| ADDAW .D2 B4, 31, B4 ; Round up # of iters

B .S2 loop ; Pipe up the loop
|| CLR .S1 A4, 0, 6, A4 ; Align to cache line
|| MV .L2X A4, B0 ; Twin the pointer

B .S1 loop ; Pipe up the loop
|| CLR .S2 B0, 0, 6, B0 ; Align to cache line
|| MV .L2X A2, B2 ; Twin the stepping constant

B .S2 loop ; Pipe up the loop
|| SHR .S1X B4, 7, A1 ; Divide by 128 bytes
|| ADDAW .D2 B0, 17, B0 ; Offset by one line + one word

[A1] BDEC .S1 loop, A1 ; Step by 128s through array
|| [A1] LDBU .D1T1 *A4++[A2], A3 ; Load from [128*i + 0]
|| [A1] LDBU .D2T2 *B0++[B2], B4 ; Load from [128*i + 68]
|| SUB .L1 A1, 7, A0

loop:
[A0] BDEC .S1 loop, A0 ; Step by 128s through array
|| [A1] LDBU .D1T1 *A4++[A2], A3 ; Load from [128*i + 0]
|| [A1] LDBU .D2T2 *B0++[B2], B4 ; Load from [128*i + 68]
|| [A1] SUB .L1 A1, 1, A1
BNOP .S2 B3, 5 ; Return
.end
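
For readers less familiar with C6000 assembly, the idea behind the routine can be sketched in C: read one byte from every cache line the buffer overlaps so that each line gets allocated. This is not TI's code. The name touch_c is hypothetical, the 64-byte line size is an assumption taken from the "two cache lines = 128 bytes" comment above, and unlike the assembly it issues one load at a time and never reads before the start of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define L1D_LINE_BYTES 64u /* assumed L1D cache line size */

/* Hypothetical C counterpart of touch(): performs one volatile byte read
 * in each cache line overlapped by [array, array + length).
 * Returns the number of lines read. */
size_t touch_c(const void *array, size_t length)
{
    const volatile unsigned char *base = (const volatile unsigned char *)array;
    size_t misalign = (size_t)((uintptr_t)array % L1D_LINE_BYTES);
    size_t offset = 0;
    size_t lines = 0;
    unsigned char sink;

    while (offset < length)
    {
        sink = base[offset]; /* volatile read, so the load is not removed */
        (void)sink;
        lines++;

        if (offset == 0)
        {
            /* Jump from the first byte to the next line boundary. */
            offset = L1D_LINE_BYTES - misalign;
        }
        else
        {
            offset += L1D_LINE_BYTES;
        }
    }
    return lines;
}
```

For the 8K buffer in the test, which is aligned to 256 bytes, this would read 128 lines. The assembly version gets its speed from issuing two loads per cycle in a tight loop so that misses can overlap, which a plain C loop at optimization level 0 will not reproduce.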

0 kcastille over 14 years ago in reply to Harikrishna Vuppaladhadiam

TI__Guru 55427 points

Harikrishna,

Referencing section 3.1.2 of http://www.ti.com/lit/ug/sprug82a/sprug82a.pdf:

In the 2 level DSP cache system, the L1D->L2 interface supports "pipelining of read misses". However, the L2->External interface does not. This is the reason you're not seeing a marked advantage between the two software implementations (with and without touch loop).

In order to speed up the benchmark, you should enable compiler optimization (via the -o3 flag). The touch loop is attempting to bring the relevant data into the L1D level. At that point, the DSP should be able to perform 2 loads per cycle as long as the accesses hit in L1D.
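
One thing to watch when enabling optimization: the read loop as posted only keeps the last byte it loads, so an optimizing compiler is free to drop most of the loads and the cycle count would no longer measure memory access. A minimal sketch of a read kernel whose result depends on every byte is shown below. The name testReadSum is hypothetical and not part of the original test.

```c
#include <stddef.h>

/* Hypothetical variant of the posted testRead(): accumulating makes the
 * return value depend on every byte, so every byte must be loaded. */
unsigned int testReadSum(const unsigned char *pBuf, size_t len)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
    {
        sum += pBuf[i];
    }
    return sum;
}
```

The caller should also use the returned value (for example by storing it in a global), otherwise the whole call may still be optimized away.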

In addition, to see a real benefit from the touch loop, you may try to manually copy the data into L2 SRAM via the EDMA. This will allow the touch loop to pipeline accesses between the L1D and L2 SRAM.
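
The staging idea can be sketched roughly as follows: process the external buffer in chunks, copying each chunk into an on-chip buffer first and running the compute loop only on the on-chip copy. This is only an illustration. memcpy stands in for the EDMA transfer (no EDMA API is shown here), and the names sumViaStaging, stage and STAGE_BYTES, as well as the chunk size, are assumptions.

```c
#include <stddef.h>
#include <string.h>

#define STAGE_BYTES 1024u /* hypothetical chunk size */

/* On the target this buffer would be placed in on-chip L2 SRAM,
 * for example with a DATA_SECTION pragma. */
static unsigned char stage[STAGE_BYTES];

/* Sums an external buffer by staging it chunk by chunk on chip. */
unsigned int sumViaStaging(const unsigned char *ext, size_t len)
{
    unsigned int sum = 0;
    size_t done = 0;
    size_t chunk;
    size_t i;

    while (done < len)
    {
        chunk = len - done;
        if (chunk > STAGE_BYTES)
        {
            chunk = STAGE_BYTES;
        }

        /* Stand-in for an EDMA transfer from external memory to L2 SRAM. */
        memcpy(stage, ext + done, chunk);

        /* On the target, touch(stage, chunk) could be called here so the
         * L1D misses to L2 SRAM are pipelined before the compute loop. */
        for (i = 0; i < chunk; i++)
        {
            sum += stage[i];
        }
        done += chunk;
    }
    return sum;
}
```

With a real DMA engine, two staging buffers are typically alternated so that the transfer of the next chunk overlaps with processing of the current one.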

Regards
Kyle
