Instruction QMACL P,loc32,*XAR7++

Hi,

I tested the code below:

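          MOVL  XAR2,#_buf1            ; XAR2 -> buf1 (read through data space)
          MOVL  XAR7,#_buf2            ; XAR7 -> buf2 (read through program space)
          SPM   -5                     ; set the product shift mode (PM)
          ZAPA                         ; zero ACC and P
          RPT   #31                    ; repeat the next instruction 32 times
       || QMACL P,*XAR2++,*XAR7++      ; ACC += P<<PM; P = upper 32 bits of the product
          ADDL  ACC,P<<PM              ; add the final product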

This code takes 71 cycles when XAR2 = 0x90C0 and XAR7 = 0x9100, but only 39 cycles when XAR2 = 0x90C0 and XAR7 = 0x8400.

Why is that?

What are the general rules for achieving single-cycle execution of QMACL P,loc32,*XAR7++?

  • Ari Mendes dos Santos said:

             This code takes 71 cycles when XAR2 = 0x90C0 and XAR7 = 0x9100

              and takes only 39 cycles when XAR2 = 0x90C0 and XAR7 = 0x8400

    Pipeline stalls. The RAM on 28x devices is single-access RAM (SARAM), so a block can service only one access per cycle. The instruction reads two values per iteration: one through XAR2 in data space and one through XAR7 in program space. When both addresses fall in the same physical block, the memory stalls the pipeline so that only one read takes place at a time. Moving one of the inputs to a different physical block removes the stall.

    Ari Mendes dos Santos said:
     What are the general rules for achieving single-cycle execution of QMACL P,loc32,*XAR7++?

    For single-cycle execution, the opcode itself (the program fetch) and each of the two operands should all be in separate physical memory blocks.

    -Lori

  • One correction: since this is a repeated (RPT) instruction, the opcode is fetched only once, so executing the code from a separate block is less critical.
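The two measurements in the question are consistent with one extra stall cycle per repeated instruction: RPT #31 executes QMACL 32 times, and 71 - 39 = 32. A minimal Python sketch of that arithmetic follows. The function name and the fixed overhead of 7 cycles are assumptions inferred from the measured totals (39 - 32), not figures taken from TI documentation.

```python
def rpt_qmacl_cycles(repeat_count, stall_per_iteration, overhead=7):
    """Estimate the cycle count of the measured sequence.

    repeat_count: the RPT operand (#31 means the instruction runs 32 times).
    stall_per_iteration: extra cycles per iteration caused by a memory conflict.
    overhead: fixed cycles outside the repeated instruction. The default of 7
              is an assumption inferred from the 39-cycle measurement.
    """
    iterations = repeat_count + 1
    return overhead + iterations * (1 + stall_per_iteration)


print(rpt_qmacl_cycles(31, 0))  # operands in different blocks -> 39
print(rpt_qmacl_cycles(31, 1))  # operands in the same block   -> 71
```

With no conflict the model gives 39 cycles, and with one stall per iteration it gives 71, matching both reported figures.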

  • Lori,

     What defines a block?

     For example, is 0x8000 to 0x8FFF one block and 0x9000 to 0x9FFF another block?

     Do only the instructions that load two values from memory have this issue?

     Ari

  • Ari Mendes dos Santos said:

    Lori,

     What defines a block?

     For example, is 0x8000 to 0x8FFF one block and 0x9000 to 0x9FFF another block?

     Do only the instructions that load two values from memory have this issue?

     Ari

    I don't know which device you are using, so let's take the 28035 as an example. The memory map is in the data manual: www.ti.com/lit/SPRS584 for the 2803x family of devices.

    Each block shown in the memory map is a physically separate block (L0 is one physical block of SARAM on the device, L1 is another, and so on).

    The memory map of the device you are using may be different.

    -Lori
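To illustrate what "block" means here, the following Python sketch maps an address to a block name and checks whether two operand addresses share a block. The L0 to L3 address ranges are assumed values for the 28035 and should be checked against the memory map in the data manual. The table and function names are hypothetical.

```python
# Assumed 28035 L-block ranges as (start, end), end exclusive.
# ASSUMPTION: verify these against the memory map in the data manual.
ASSUMED_28035_BLOCKS = {
    "L0": (0x8000, 0x8800),
    "L1": (0x8800, 0x8C00),
    "L2": (0x8C00, 0x9000),
    "L3": (0x9000, 0xA000),
}


def block_of(addr, blocks=ASSUMED_28035_BLOCKS):
    """Return the name of the block containing addr, or None if unmapped."""
    for name, (start, end) in blocks.items():
        if start <= addr < end:
            return name
    return None


def operands_conflict(addr_a, addr_b, blocks=ASSUMED_28035_BLOCKS):
    """True when both addresses fall inside the same known block."""
    block_a = block_of(addr_a, blocks)
    return block_a is not None and block_a == block_of(addr_b, blocks)


print(block_of(0x90C0), block_of(0x9100), operands_conflict(0x90C0, 0x9100))
print(block_of(0x90C0), block_of(0x8400), operands_conflict(0x90C0, 0x8400))
```

Under this assumed map, 0x90C0 and 0x9100 would share a block while 0x8400 would not, which would be consistent with the 71-cycle and 39-cycle measurements if the device is a 28035. The blocks in this map are not uniform 0x1000-sized ranges, so address arithmetic alone is not enough to tell them apart.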

  • Lori,

          So to get the maximum performance from the F28035, should the code run from L0 SARAM, with data allocated in L1 DPSARAM and constants allocated in L2 DPSARAM?

            Is there a risk of pipeline stalls if code and data are both allocated in L0 SARAM? For example, code (ramfuncs) from 0x8000 to 0x8400 and data (.ebss section) from 0x8400 to 0x8800.

            Ari.

  • Ari Mendes dos Santos said:

          So to get the maximum performance from the F28035, should the code run from L0 SARAM, with data allocated in L1 DPSARAM and constants allocated in L2 DPSARAM?

            Is there a risk of pipeline stalls if code and data are both allocated in L0 SARAM? For example, code (ramfuncs) from 0x8000 to 0x8400 and data (.ebss section) from 0x8400 to 0x8800.

    The general guideline is that if code accesses (reads or writes) data such as constants or variables, there will likely be a pipeline stall when the code and that data are in the same physical memory block. Ideally, a new instruction is fetched every cycle. If a previous instruction enters the Read or Write stage and accesses the same physical memory block as the opcode fetch, the fetch is stalled.

    Take a look at the pipeline information in www.ti.com/lit/spru430 for more information.

    If the data is unrelated to the code (i.e., the code doesn't access that data), then there is no stall.

    -Lori
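The guideline above can be sketched as a simple check on address ranges. This is a rough Python illustration, not a model of the actual pipeline: the function names are hypothetical, and the L0 range of 0x8000 to 0x87FF is an assumed value that should be checked against the data manual.

```python
def blocks_touched(start, end, blocks):
    """Names of the blocks that overlap the half-open range [start, end)."""
    return {
        name
        for name, (b_start, b_end) in blocks.items()
        if start < b_end and b_start < end
    }


def fetch_may_stall(code_range, data_range, code_accesses_data, blocks):
    """True if the code reads or writes data that shares a block with the code."""
    if not code_accesses_data:
        return False  # unrelated data: no stall per the guideline
    shared = blocks_touched(*code_range, blocks) & blocks_touched(*data_range, blocks)
    return bool(shared)


# ASSUMPTION: L0 spans 0x8000-0x87FF (end exclusive below).
assumed_blocks = {"L0": (0x8000, 0x8800)}

code = (0x8000, 0x8400)  # ramfuncs, from the example in the question
data = (0x8400, 0x8800)  # .ebss, from the example in the question

print(fetch_may_stall(code, data, True, assumed_blocks))   # code uses that data
print(fetch_may_stall(code, data, False, assumed_blocks))  # code never touches it
```

Under the assumed L0 range, the layout from the question puts code and data in the same block, so the check flags a possible stall only when the code actually accesses that data.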