TMS320F28377D: Not enough LSxRAM for CLA tasks

Part Number: TMS320F28377D

Hello

I would like to use CLA tasks to run jobs in parallel with the main C28x core, but I'm facing one problem: I don't have enough RAM for the CLA program. Is there any solution for this, for instance some kind of dynamic memory assignment for CLA instructions?

Thank you 

Maite

  • Hi Maite,

    The CLA has access to LSxRAM only, but the CPU has other RAMs. I hope you are not also sharing the LSxRAM with the CPU?

    Regards,

    Vivek Singh

  • Hi Vivek,

    I'm using RAMLS0 to RAMLS4 for the CLA program and RAMLS5 for CLA data; the CPU is not using this memory. I've tried to optimize my code, but it still does not fit in the available RAM.

    Thanks

    Maite

  • If you are writing C code, which compiler switch are you using for optimization? Also, how much more RAM do you need? If it's a small amount of data RAM, you could use the CLAtoCPU MSG RAM, provided it's not already used in your application.

    Regards,

    Vivek Singh 

  • Hi Vivek,

    It's much more RAM than the CLAtoCPU MSG RAM provides. I've tried all optimization levels, but the code still does not fit. I've checked that the same code compiled for CPU1 needs only around a third of the RAM it needs when compiled for the CLA. I've also checked in the assembly that the CLA compiler adds a lot of MNOP instructions.

    What I'm trying to do now is load only part of the code from flash to RAM, since not all of it will be used, depending on my final device setup. In the .cmd file I will use a UNION to assign the same RAM to two different CLA program sections. I think it could work...
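    The overlay idea can be sketched in C roughly as follows. This is only a host-side illustration under stated assumptions, not the actual project code: the arrays stand in for the flash load images and the shared LSx run region, and every name (`flash_image_a`, `flash_image_b`, `cla_run_region`, `cla_load_overlay`, `CLA_RUN_WORDS`) is hypothetical. On the device, the source, destination and size would instead come from the load/run symbols the linker generates for the sections placed in the UNION.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size of the shared run region, in 16-bit words. */
#define CLA_RUN_WORDS 8u

/* Stand-ins for two CLA program sections stored in flash (load addresses).
   The contents are dummy values, not real CLA opcodes. */
static const unsigned short flash_image_a[] = {0xA001, 0xA002, 0xA003};
static const unsigned short flash_image_b[] = {0xB001, 0xB002, 0xB003, 0xB004};

/* Stand-in for the single RAM region both sections run from (the UNION). */
static unsigned short cla_run_region[CLA_RUN_WORDS];

typedef enum { CLA_OVERLAY_A, CLA_OVERLAY_B } cla_overlay_t;

/* Copy the selected image into the shared run region.
   Returns the number of words copied, or 0 if the selection is unknown
   or the image does not fit.
   Assumption: the CPU is allowed to write the region at copy time. */
size_t cla_load_overlay(cla_overlay_t which)
{
    const unsigned short *src;
    size_t words;

    switch (which) {
    case CLA_OVERLAY_A:
        src = flash_image_a;
        words = sizeof flash_image_a / sizeof flash_image_a[0];
        break;
    case CLA_OVERLAY_B:
        src = flash_image_b;
        words = sizeof flash_image_b / sizeof flash_image_b[0];
        break;
    default:
        return 0;
    }

    if (words > CLA_RUN_WORDS) {
        return 0; /* image larger than the shared region */
    }

    memcpy(cla_run_region, src, words * sizeof src[0]);
    return words;
}
```

    At start-up, the application would call `cla_load_overlay()` once with the variant that matches the device setup, before any CLA task that runs from that region is triggered. Only one of the two sections is resident at a time, which is what saves the RAM.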

    Thank you

    Maite

  • Hi Vivek

    Loading only some parts of the code from flash to RAM, using a UNION in the .cmd file, is working. But now I'm facing another problem: the same code takes around 1.5 times as long to execute on CLA1 as on CPU1. As I mentioned in a previous post, the CLA .asm file has a lot of MNOP operations... could that be the reason?

    I also see in the .asm files that the CLA instructions are tagged CPU_FPU (it is, in fact, a floating-point unit), while the CPU1 instructions are tagged CPU_ALU.

    This is an example of CLA .asm file lines:

    	.dwpsn	file "../AFQ_CLA_funcs.cla",line 206,column 5,is_stmt,isa 0
            MMOVZ16   MR0,@_adapt+3         ; [CPU_FPU] |206| 
            MMOVIZ    MR1,#65535            ; [CPU_FPU] |206| 
            MLSL32    MR0,#16               ; [CPU_FPU] |206| 
            MMOVXI    MR1,#65535            ; [CPU_FPU] |206| 
            MASR32    MR0,#16               ; [CPU_FPU] |206| 
            MCMP32    MR1,MR0               ; [CPU_FPU] |206| 
            MNOP      ; [CPU_FPU] 
            MNOP      ; [CPU_FPU] 
            MNOP      ; [CPU_FPU] 
            MBCNDD    $C$L239,NEQ           ; [CPU_FPU] |206| 
            MNOP      ; [CPU_FPU] 
            MNOP      ; [CPU_FPU] 
            MNOP      ; [CPU_FPU] 

    The same in CPU is:

            MOVW      DP,#_adapt+3          ; [CPU_ARAU] 
            CMP       @_adapt+3,#-1         ; [CPU_ALU] |649| 
            B         $C$L4,NEQ             ; [CPU_ALU] |649| 
    

    Is there a solution for this? Are these MNOPs needed?

    Thank you

  • Hi,

    Sorry for the late reply. Yes, each MNOP takes an additional cycle, so execution takes longer. The MNOPs may be needed for correct operation.

    Regards,

    Vivek Singh