L2 of ARM A8 as SRAM issue

Joey Lin

Hi, supporters:

When processing images on A8, I may need to use on-chip memory (L2) to speed up the performance.

Could the A8's 512 KB L2 be used as SRAM for DMA + ping-pong processing, just like on the DSP?

Is there a register or memory-map setting to do this?

over 11 years ago

Pavel Botev, over 11 years ago

Joey,

Joey Lin said:
Could the A8's 512 KB L2 be used as SRAM for DMA + ping-pong processing, just like on the DSP?

No, I do not think you can use the Cortex-A8 L2 cache as RAM for DMA transfers. The DSP L2 is documented as cache and/or RAM and is mapped at start address 0x40800000/0x00800000, while the Cortex-A8 L2 is documented only as cache and has no start address in the memory map.

See also this E2E thread:

http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/243883.aspx

There is also RAM inside the Cortex-A8, mapped at start address 0x402F0400, but I think you cannot use it for DMA transfers, since this RAM is small (64 KB) and documented as internal to the Cortex-A8 (accessible only by the Cortex-A8).

Joey Lin said:
When processing images on A8, I may need to use on-chip memory (L2) to speed up the performance.

You could try the OCMC L3 SRAM (128 KB), mapped at start address 0x40300000.
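As a rough illustration of ping-pong processing in the OCMC SRAM, the 128 KB region at 0x40300000 could be split into two equal halves: DMA fills one while the CPU processes the other, then they swap. The base and size are the figures given above; the even split and the helper names are hypothetical, and this sketch only computes addresses (actually touching the memory, e.g. from Linux, would also require mapping the physical range).

```c
#include <assert.h>
#include <stdint.h>

/* OCMC L3 SRAM base and size as given in this thread. */
#define OCMC_BASE 0x40300000u
#define OCMC_SIZE (128u * 1024u)

/* Hypothetical split into two ping-pong halves of 64 KB each. */
#define PP_HALF_SIZE (OCMC_SIZE / 2u)

/* Physical start address of ping-pong buffer 0 or 1. */
static uint32_t pp_buffer_addr(unsigned idx)
{
    assert(idx < 2u);
    return OCMC_BASE + idx * PP_HALF_SIZE;
}

/* The buffer DMA should fill while the CPU works on buffer idx. */
static unsigned pp_other(unsigned idx)
{
    return idx ^ 1u;
}
```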

Regards,
Pavel

Matthijs van Duin, over 11 years ago

Joey Lin said:

When processing images on A8, I may need to use on-chip memory (L2) to speed up the performance.

Could the A8's 512 KB L2 be used as SRAM for DMA + ping-pong processing, just like on the DSP?

Certainly. You can use L2 cache lockdown to effectively turn part of the L2 cache into local RAM (in steps of one-eighth of the total L2 cache, i.e. one of its eight ways, which is 64 KB of a 512 KB L2), and then use the PreLoad Engine (PLE) to move data in and out of the cache in the background while you process the previously loaded data in parallel.
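To make the one-eighth granularity concrete: with 8 ways in a 512 KB L2, each locked way reserves 64 KB. The sketch below assumes the L2 Cache Lockdown Register holds one lock bit per way in bits [7:0] (check TRM section 3.2.54 for the exact layout before relying on it). The helper names are hypothetical and nothing here touches the hardware; it only does the mask and size arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define L2_SIZE_BYTES (512u * 1024u)
#define L2_WAYS       8u
#define L2_WAY_BYTES  (L2_SIZE_BYTES / L2_WAYS)   /* 64 KB per way */

/* Assumed layout: bit n set = way n locked. Locks the lowest nways ways. */
static uint32_t l2_lock_mask(unsigned nways)
{
    assert(nways <= L2_WAYS);
    return (1u << nways) - 1u;
}

/* Bytes of L2 held in locked ways, i.e. usable like local RAM. */
static uint32_t l2_locked_bytes(unsigned nways)
{
    assert(nways <= L2_WAYS);
    return nways * L2_WAY_BYTES;
}
```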

More information can be found in the ARM Cortex-A8 Technical Reference Manual: section 3.2.54 for the details of cache lockdown, section 8.4 for an overview of the preload engine, and sections 3.2.59-3.2.67 for its registers. Useful background information is also in the ARMv7-A/R Architecture Reference Manual, for example section B.2.2 on caches in general.
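The overlap described above (preloading the next block while processing the current one) can be sketched as a double-buffer loop. This is only a host-side model: `preload_start()` and `preload_wait()` are hypothetical stand-ins that simply copy with memcpy, where real code would program a PLE channel and poll for its completion. The block size and the "processing" (a byte sum) are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 256u  /* hypothetical block size in bytes */

/* Hypothetical stand-in for starting a background preload into buf.
 * Here it copies synchronously; on hardware a PLE channel would do this. */
static void preload_start(uint8_t *buf, const uint8_t *src, size_t n)
{
    assert(n <= BLOCK);
    memcpy(buf, src, n);
}

/* Hypothetical stand-in for waiting until that preload has completed. */
static void preload_wait(void)
{
    /* nothing to wait for in this synchronous model */
}

/* Placeholder for the real per-block image processing. */
static uint32_t process(const uint8_t *buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Process len bytes from src in BLOCK-sized pieces using two buffers:
 * while block k is processed from one buffer, block k+1 is preloaded
 * into the other. */
static uint32_t ping_pong(const uint8_t *src, size_t len)
{
    static uint8_t buf[2][BLOCK];
    uint32_t total = 0;
    size_t nblocks = (len + BLOCK - 1) / BLOCK;

    if (nblocks == 0)
        return 0;

    preload_start(buf[0], src, len < BLOCK ? len : BLOCK);
    for (size_t k = 0; k < nblocks; k++) {
        size_t off = k * BLOCK;
        size_t n = (len - off < BLOCK) ? len - off : BLOCK;

        preload_wait();  /* block k is now in buf[k & 1] */
        if (k + 1 < nblocks) {
            size_t next = off + BLOCK;
            size_t nn = (len - next < BLOCK) ? len - next : BLOCK;
            preload_start(buf[(k + 1) & 1], src + next, nn);
        }
        total += process(buf[k & 1], n);
    }
    return total;
}
```

With a single PLE channel the order per iteration is: wait for block k, start block k+1, process block k, so the transfer and the computation overlap.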

Joey Lin, over 11 years ago, in reply to Matthijs van Duin

Hi, Matthijs:

Thank you for your information. I have been reading the ARM Cortex-A8 Technical Reference Manual (DDI0344K_cortex_a8_r3p2_trm.pdf) for quite some time.

I am currently stuck on modifying the CP15 registers to configure the PLE. It seems to me that I need to write assembly code to achieve this. Do you think I should start with an ARM assembly tutorial, or can I resolve this issue without knowing assembly?

Thank you very much,

Joey from Altek

Matthijs van Duin, over 11 years ago, in reply to Joey Lin

You don't really need to learn ARM assembly in any detail for this: the TRM explicitly shows the instructions needed, and you can use them in GCC inline assembly. Some (untested) examples:

```c
#include <stdint.h>
typedef uint32_t u32;

void ple_examples( void )
{
        // get bitmap of channels running
        u32 running;
        asm volatile( "mrc p15, 0, %0, c11, c0, 2" : "=r" (running) );

        // get and set current channel
        u32 channel;
        asm volatile( "mrc p15, 0, %0, c11, c2, 0" : "=r" (channel) );
        asm( "mcr p15, 0, %0, c11, c2, 0" : : "r" (channel) );

        // start engine
        asm( "mcr p15, 0, %0, c11, c3, 1" : : "r" (0) );
}
```

I often use the clang compiler, which has intrinsics for mrc and mcr. With a tiny wrapper class, these let me access coprocessor registers as if they were global variables:

```cpp
// Requires clang targeting 32-bit ARM (__builtin_arm_mrc/__builtin_arm_mcr).
using uint = unsigned int;

// Template parameters: coprocessor p, CRn n, opc1 op1, CRm m, opc2 op2
template< uint p, uint n, uint op1, uint m, uint op2 >
class cp {
public:
        cp() {}
        // read: mrc p<p>, op1, <Rt>, c<n>, c<m>, op2
        operator uint () {
                return __builtin_arm_mrc( p, op1, n, m, op2 );
        }
        // write: mcr p<p>, op1, <Rt>, c<n>, c<m>, op2
        uint operator = ( uint val ) {
                __builtin_arm_mcr( p, op1, val, n, m, op2 );
                return val;
        }
        void operator |= ( uint val ) { *this = *this | val; }
        void operator &= ( uint val ) { *this = *this & val; }
        void operator ^= ( uint val ) { *this = *this ^ val; }
};

// PLE registers (r- = read-only, -w = write-only, rw = read/write)
static cp<15,11,0, 0,0> ple_present;    //r-
static cp<15,11,0, 0,2> ple_running;    //r-
static cp<15,11,0, 0,3> ple_stopping;   //r-
static cp<15,11,0, 1,0> ple_useraccess; //rw
static cp<15,11,0, 2,0> ple_select;     //rw
static cp<15,11,0, 3,0> ple_stop;       //-w
static cp<15,11,0, 3,1> ple_start;      //-w
static cp<15,11,0, 3,2> ple_clear;      //-w
static cp<15,11,0, 4,0> ple_control;    //rw
static cp<15,11,0, 5,0> ple_vaddr;      //rw
static cp<15,11,0, 7,0> ple_size;       //rw
static cp<15,11,0, 8,0> ple_status;     //rw
static cp<15,11,0,15,0> ple_context;    //rw
```

Joey Lin, over 11 years ago, in reply to Matthijs van Duin

Hi, Matthijs:

Thank you for replying. The class wrapper is quite neat, but unfortunately I have not used C++ for quite some time, so I am currently writing a C API function for each operation. I am now trying to figure out how to get the virtual address held in a C variable into r0 for setting the start address, as below. Could you provide your version for reference?

```c
int func( int start_add, int byte_count /* , ... */ ) {

        asm(" LDR r0, =start_add ");  // <== Load virtual address to r0?

        asm(" MCR p15, #0, r0, c11, c5, #0 ");  // Write PLE Internal Start Address Register

        /* ... */

}
```

Best regards,

Joey from Altek

Matthijs van Duin, over 11 years ago, in reply to Joey Lin

Look more closely at my examples: the value being written to a coprocessor register is a C expression, and likewise when reading a coprocessor register you simply name the variable where you want the result to end up. The "%0" in the assembly instruction will be replaced with the register which the compiler allocated for the argument.

So you don't need any assembly instructions other than mrc and mcr; just:

```c
asm( "mcr p15, 0, %0, c11, c5, 0" : : "r" (start_add) );
```
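Applied to the function from the question, a fuller C version might look like the sketch below: the pointer is passed as an ordinary C value and the compiler picks the register, so no LDR into r0 is needed. The macro and function names are hypothetical; the register encodings (c11, c2, 0 for channel select and c11, c5, 0 for the internal start address) are the ones used in this thread. The `#else` branch is a host-only fake made of plain variables so the code compiles and runs off-target; it does not model the PLE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__arm__)
/* Generate name_read()/name_write() for PLE register p15, 0, c11, c<crm>, <op2>.
 * volatile keeps every access; call _read/_write only where the TRM allows. */
#define PLE_REG(name, crm, op2)                                             \
    static inline uint32_t name##_read(void)                                \
    {                                                                       \
        uint32_t v;                                                         \
        __asm__ volatile("mrc p15, 0, %0, c11, c" #crm ", " #op2           \
                         : "=r"(v));                                        \
        return v;                                                           \
    }                                                                       \
    static inline void name##_write(uint32_t v)                             \
    {                                                                       \
        __asm__ volatile("mcr p15, 0, %0, c11, c" #crm ", " #op2           \
                         : : "r"(v));                                       \
    }
#else
/* Host-only fake (not a PLE model): plain variables stand in for registers. */
static uint32_t fake_c11[16][8];
#define PLE_REG(name, crm, op2)                                             \
    static inline uint32_t name##_read(void) { return fake_c11[crm][op2]; } \
    static inline void name##_write(uint32_t v) { fake_c11[crm][op2] = v; }
#endif

PLE_REG(ple_select, 2, 0) /* c11, c2, 0: channel select (rw) */
PLE_REG(ple_vaddr, 5, 0)  /* c11, c5, 0: internal start address (rw) */

/* Hypothetical helper: select a channel and set its start address from a
 * C pointer; the compiler chooses the register for the mcr operand. */
static void ple_set_start(uint32_t channel, const void *p)
{
    assert(p != NULL);
    ple_select_write(channel);
    ple_vaddr_write((uint32_t)(uintptr_t)p);
}
```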

Some more notes:

Mind the double colon when using mcr versus the single colon when using mrc. This is because the general format is: asm( "..." : output operands : input operands : clobbers ); so an mcr, which has no outputs, leaves the first section empty.

The preload engine affects memory from the compiler's point of view, so you need to tell the compiler this, to prevent it from e.g. moving memory loads/stores across a PLE operation. You can do this by placing a "compiler barrier"
- right before starting an eviction (L2 -> memory), and
- after completion of a preload, before accessing the data.
The syntax for such a barrier is:
```
asm( "" : : : "memory" );  // compiler barrier
```
You can also mark an instruction itself as "affecting memory", but since in this case it's not really any single instruction which is affecting memory, I think using a barrier is clearer.

A little detail about GCC inline asm which can be important to know: if an asm statement has one or more outputs but you do not use any of them, the optimizer will think the instruction wasn't needed and is allowed to remove it as dead code. Likewise, it may merge repeated reads that have the same inputs, which would break a loop that polls a PLE status register. You can use asm volatile( ... ); to prevent this. If the instruction has no outputs, then it is implicitly marked volatile.
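Putting the last two notes together, a wait-for-preload helper might look like the sketch below: the status read is volatile so a polling loop re-reads the register every time, and a compiler barrier follows the loop so reads of the preloaded data are not hoisted above it. The function and macro names are hypothetical, the channel-running encoding (c11, c0, 2) is the one from the examples above, and the `#else` branch is a host-only fake that reports channel 0 as running for three polls.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__arm__)
/* Read the PLE channel-running bitmap (p15, 0, c11, c0, 2). volatile forces
 * a fresh mrc on every call instead of letting the compiler reuse a value. */
static inline uint32_t ple_running_read(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c11, c0, 2" : "=r"(v));
    return v;
}
#else
/* Host-only fake: reports channel 0 as running for three polls, then idle. */
static unsigned fake_polls_left = 3;
static inline uint32_t ple_running_read(void)
{
    if (fake_polls_left > 0) {
        fake_polls_left--;
        return 1u;
    }
    return 0u;
}
#endif

/* Compiler barrier: emits no instruction, but the compiler may not move
 * memory accesses across it. */
#define COMPILER_BARRIER() __asm__ volatile("" : : : "memory")

/* Hypothetical helper: poll until channel ch has stopped running, then place
 * a barrier so reads of the preloaded data stay after the wait.
 * Returns the number of busy polls (for illustration only). */
static unsigned ple_wait_channel(unsigned ch)
{
    unsigned polls = 0;
    assert(ch < 32u);
    while (ple_running_read() & (1u << ch))
        polls++;
    COMPILER_BARRIER();
    return polls;
}
```

For an eviction, the same COMPILER_BARRIER() would go right before the write that starts the engine, so pending stores to the buffer are not moved past it.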

Matthijs van Duin, over 11 years ago, in reply to Matthijs van Duin

Note that all this assumes you are using GCC. I have no experience with TI's own compiler for ARM.

Joey Lin, over 11 years ago, in reply to Matthijs van Duin

Hi, Matthijs:

Thank you very much for your note and explanation. Indeed, the TI compiler does not accept C arguments in inline assembly the way GCC does.

I will start a new thread for this issue.

Best regards,

Joey from Altek
