AM3358: PRU constant table entry McASP0 DMA

Part Number: AM3358
Other Parts Discussed in Thread: PRU-CGT

I am playing around with some PRU code on the AM3358, using the PRU Software Support Package as a starting point. According to the system memory map, TRM Section 2.1, McASP0 is exposed at two different memory addresses. Additionally, the PRU dedicates a constant table entry to MCASP0 in TRM Section 4.4.1.1, which points to the L3 data port. I can define the following in code:

volatile far uint8_t CT_MCASP0 __attribute__((cregister("MCASP0_DMA", near), peripheral));
#define MCASP0_CT  (*((volatile uint32_t *) &CT_MCASP0))
#define MCASP0_DAT (*((volatile uint32_t *) 0x46000000))
#define MCASP0_CFG (*((volatile uint32_t *) 0x48038000))

The McASP Rev register is at offset zero, so reading from any of the above defines should return the same known value. However I am seeing MCASP0_CT and MCASP0_DAT always return zero. Only MCASP0_CFG returns the correct result. I do have the OCP master port enabled.

CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;

Is there a reason why accessing McASP0 through the data port and the constant table would fail? Does the PRU not actually have access to the McASP through the constants table?

  • What software is running on the ARM? Have you ensured that McASP0 is not used from ARM side?
  • I am running Linux on the ARM side. McASP0 is not being used by the ARM core. Since the McASP REV register is read only shouldn't the PRU read the correct value regardless of ARM core activity?

  • Hello Andrew,

    The TRM section "22.3.10.1 Data Transmission and Reception" makes it sound like the data port is not intended to be used for configuration, so perhaps it only allows access to a limited register set. If that's the case, it would make sense that MCASP0_DAT does not yield the values in the REV register, while the MCASP0_CFG does. Can you access XBUF and RBUF registers through the data port?

    Let me know if you need additional clarification on anything and I'll do some poking around on this end.

    Regards,
    Nick
  • That seems like a reasonable explanation and I will give it a shot.

    If that is the case however I would question the typical PRU linker command file from the PRU Support Package. The linker command from the constants table example, examples/am335x/PRU_access_const_table/AM335x_PRU.cmd, defines the following:

    MCASP0_DMA : org = 0x46000000 len = 0x00000100 CREGISTER=8

    To me that would imply that the McASP constant table entry is set up to use the configuration registers. Pretty much all of the McASP configuration registers are at offsets below 0x100. With the length set to 0x100 you can't even reach the buffers, given that XBUF and RBUF are at offsets 0x200 and 0x280, right?
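The reachability concern in the post above comes down to simple arithmetic. Here is a minimal host-side sketch in plain C (not PRU code), using the len value and the XBUF/RBUF offsets quoted in the post. The helper name `offset_in_region` and the macro names are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the thread; the macro names are illustrative only. */
#define MCASP0_DMA_LEN  0x100u  /* len from AM335x_PRU.cmd */
#define XBUF_OFFSET     0x200u
#define RBUF_OFFSET     0x280u

/* Hypothetical helper: true if a 4-byte access starting at `offset`
 * lies entirely inside a region that is `len` bytes long. */
bool offset_in_region(uint32_t offset, uint32_t len)
{
    return offset < len && (len - offset) >= 4u;
}

/* Usage example: neither buffer register is inside a 0x100-byte region. */
void check_buffer_reachability(void)
{
    assert(offset_in_region(0x0u, MCASP0_DMA_LEN));          /* REV at offset 0 */
    assert(!offset_in_region(XBUF_OFFSET, MCASP0_DMA_LEN));
    assert(!offset_in_region(RBUF_OFFSET, MCASP0_DMA_LEN));
}
```

This only models the linker's view of the region length. It says nothing about what the hardware returns at those addresses.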
  • Hello Andrew,

    That is a fantastic point. I am not sure where the length value of 0x100 came from, and I have not tested to see what happens when you try to access a location beyond the length. I will look into why the linker command file was written the way it was and get back to you.

    Regards,
    Nick
  • Hello Andrew,

    Here's what I've found so far:
    It looks like the LBCO assembly instruction allows for an offset from an immediate value <256 OR from a register.

    The first generation of PRU programmers I talked to only used immediate values <256 for the offset from the CREGISTER address, so any register set that was longer than 0x100 (like McASP0_DMA) was just listed as len=0x100 in the linker command file.

    The next generation of programmers went through and changed the linker command file len values to the actual length of the register set - that's why your linker command file has CREGISTER entries like PRU_IEP len=0x31C. It looks like they just forgot to give MCASP0_DMA its actual length.

    So what's the difference between len<=0x100 and len>0x100? I was told in your line
    volatile far uint8_t CT_MCASP0 __attribute__((cregister("MCASP0_DMA", near), peripheral));
    the second argument passed into cregister should be "near" if len<=0x100, and "far" if len>0x100. You can see this difference in pru-icss-5.1.0/include/am335x/pru_ecap.h and pru_iep.h. At the bottom of the file, pru_ecap.h uses near, while pru_iep uses far.


    I have not been able to test the compiler's behavior quite yet, so I'm not sure what the PRU C compiler does with "near" vs "far" - e.g., toggle between LBCO and a different instruction, toggle between an immediate value offset and an offset stored in a register, etc. I'm also not sure what happens if your C code tries to access a register location with an offset of greater than 0x100, but the declaration is near, with len=0x100.

    Regards,
    Nick
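The near/far rule described in the post above can be written down as a small host-side sketch. This is plain C, not PRU code, and it only encodes the rule as it was relayed in the post (len<=0x100 gives "near", anything longer gives "far"). The helper name `cregister_qualifier` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the second cregister argument suggested by
 * the rule quoted in the post (len <= 0x100 -> "near", otherwise "far"). */
const char *cregister_qualifier(uint32_t region_len)
{
    return (region_len <= 0x100u) ? "near" : "far";
}

/* Usage example with lengths mentioned in the post. */
void check_qualifiers(void)
{
    assert(strcmp(cregister_qualifier(0x100u), "near") == 0);  /* MCASP0_DMA as shipped */
    assert(strcmp(cregister_qualifier(0x31Cu), "far") == 0);   /* PRU_IEP */
}
```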

  • Nick,

    Thank you for the detailed response. Based on your reply I attempted a little experiment.

    Modify the linker to:

    MCASP0_DMA : org = 0x46000000 len = 0x00001010 CREGISTER=8

    Define the following in my source code:

    volatile far uint8_t CT_MCASP0 __attribute__((cregister("MCASP0_DMA", far), peripheral));
    #define RBUF_OFFSET (0x280 / 4)
    #define MCASP0_CT_RBUF (*(((volatile uint32_t *) &CT_MCASP0) + RBUF_OFFSET))

    I then assign some variable to equal MCASP0_CT_RBUF. Looking at the assembly generated by the compiler I see:

    LDI32     r0, ||CT_MCASP0||     ; [ALU_PRU] |723| CT_MCASP0
    LDI       r1.w0, 0x0280         ; [ALU_PRU] |723|
    LBBO      &r14, r0, r1.w0, 4    ; [ALU_PRU] |723|

    Based on that I am thinking the constant table might be a no go for McASP0 when using the pru-cgt. I am sure you could inline some assembly if you absolutely needed it. For my application the cycle count difference isn't going to matter. Thanks again Nick. I do appreciate your help in understanding what I was seeing.
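The macro in the post divides 0x280 by 4 because adding to a `uint32_t` pointer steps in 4-byte words, which is why the generated assembly loads the byte offset 0x0280. Below is a minimal host-side sketch of that arithmetic in plain C. It uses ordinary memory as a stand-in for the register block, and the names `byte_offset_of_word` and `fake_regs` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RBUF_OFFSET (0x280 / 4)   /* word index, as in the post */

/* Byte distance between a uint32_t base pointer and base + word_index. */
size_t byte_offset_of_word(const uint32_t *base, size_t word_index)
{
    return (size_t)((const uint8_t *)(base + word_index) - (const uint8_t *)base);
}

/* Usage example: word index 0x280 / 4 corresponds to byte offset 0x280. */
void check_rbuf_offset(void)
{
    /* Stand-in buffer in host memory, not the McASP peripheral. */
    static uint32_t fake_regs[RBUF_OFFSET + 1];
    assert(byte_offset_of_word(fake_regs, RBUF_OFFSET) == 0x280);
}
```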

  • Note: Updated 5/30/2018 in accordance with comments here

    Hello Andrew,

    Actually, it looks like the constant table should work for you! 

    Summary: Create a structure like the templates in include/am335x/. (Updated 5/30/2018: my earlier suggestion, to pass "near" rather than "far" for any offset of 16 bits or less, was wrong.) If len<=0x100, pass "near" into cregister. If len>0x100, pass "far" into cregister.

    Here's my test code. Tests assume linker command file has updated the MCASP0_DMA len value to 0x1010:

    /* Test code by Nick Saulnier May 18 2018 */
    /* update 5/30/2018: test 1 is appropriate for len > 0x100, test 2 is appropriate for len<=0x100 */
    /* testing CREGISTER C compiler behavior */
    /* tested on MCSPI0 and MCASP0_DMA, but only MCASP0_DMA code shown here: */
    /* these tests are for e2e post http://e2e.ti.com/support/arm/sitara_arm/f/791/t/687937 */
    /* */
    /* tests assume linker command file has updated the MCASP0_DMA len value to 0x1010 */

    #include <stdint.h>
    #include <stdio.h>
    #include <pru_cfg.h>
    #include <pru_intc.h>
    #include <rsc_types.h>
    #include "resource_table_0.h"

    /* this didn't work, try defining struct like in header files instead */
    /*volatile __far uint32_t CT_MCSPI0 __attribute__((cregister("MCSPI0",near),peripheral));*/
    /*#define MCSPI0_CT (*((volatile uint32_t *) &CT_MCSPI0)) */

    /* McASP0_DMA - Test 1 (use far instead of near) --------------- */

    /* MCASP0_DMA register set */
    typedef struct {
        /* REV register bit field */
        union {
            volatile uint32_t REV;
        }; // 0x0

        uint32_t rsvd4[159]; // 0x4 - 0x27C

        /* RBUF_0 register bit field */
        union {
            volatile uint32_t RBUF_0;
        }; // 0x280
    } McASP0;

    /* note use of "far" in test 1 */
    /*volatile __far McASP0 CT_MCASP0_DMA __attribute__((cregister("MCASP0_DMA", far), peripheral));*/

    /* McASP0_DMA - Test 2 (use near) ---------------------------------- */

    /* note use of "near" in test 2 */
    volatile __far McASP0 CT_MCASP0_DMA __attribute__((cregister("MCASP0_DMA", near), peripheral));

    /*
     * main.c
     */
    void main(void)
    {
        /* Allow OCP master port access by the PRU so the PRU can read external memories */
        CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;

        /* tests registers both less than and greater than 0x100 from the CREGISTER value */
        uint32_t rev = CT_MCASP0_DMA.REV;
        uint32_t rbuf0 = CT_MCASP0_DMA.RBUF_0;
    }

    Here's the generated assembly (copied from two different assembly files):

    .global	__PRU_CREG_PRU_CFG
    	.global	__PRU_CREG_PRU_INTC
    	.global	__PRU_CREG_MCASP0_DMA
    	.weak	||CT_CFG||
    ||CT_CFG||:	.usect	".creg.PRU_CFG.noload.near",68,1
    	.weak	||CT_INTC||
    ||CT_INTC||:	.usect	".creg.PRU_INTC.noload.far",5380,1
    	.weak	||CT_MCASP0_DMA||
    ||CT_MCASP0_DMA||:	.usect	".creg.MCASP0_DMA.noload.far",644,1
    
    ||main||:
    ;* --------------------------------------------------------------------------*
    ;* r0_0  assigned to $O$C1
            LBCO      &r0, __PRU_CREG_PRU_CFG, $CSBREL(||CT_CFG||+4), 4 ; [ALU_PRU] |110| CT_CFG
            CLR       r0, r0, 0x00000004    ; [ALU_PRU] |110| 
            SBCO      &r0, __PRU_CREG_PRU_CFG, $CSBREL(||CT_CFG||+4), 4 ; [ALU_PRU] |110| CT_CFG
    
    ; these lines generated from test 1
            LDI32     r0, ||CT_MCASP0_DMA|| ; [ALU_PRU] |117| $O$C1,CT_MCASP0_DMA
            LBBO      &r1, r0, 0, 4         ; [ALU_PRU] |117| $O$C1
            LDI       r1.w0, 0x0280         ; [ALU_PRU] |118| 
            LBBO      &r0, r0, r1.w0, 4     ; [ALU_PRU] |118| $O$C1
    
    ; these lines generated from test 2
    ; Update 5/30/2018:
    ; note that $CSBREL(||CT_MCASP0_DMA||+640) will try to use an immediate value of 640
    ; if this doesn't cause compiler errors, it is expected to cause linker errors
            LBCO      &r0, __PRU_CREG_MCASP0_DMA, $CSBREL(||CT_MCASP0_DMA||+0), 4 ; [ALU_PRU] |114| CT_MCASP0_DMA
            LBCO      &r0, __PRU_CREG_MCASP0_DMA, $CSBREL(||CT_MCASP0_DMA||+640), 4 ; [ALU_PRU] |115| CT_MCASP0_DMA
            JMP       r3.w2                 ; [ALU_PRU]

    Note that using "near" will use LBCO, while using "far" will result in LDI32, then LBBO (or LDI, then LBBO). (updated 5/30/2018) Note that different compile options (-o0, -o2, etc) may result in different assembly instructions generated for test 1. LBCO should result in a savings of several clock cycles per call, but I have not stepped through the code in CCS yet to verify. (update 5/30/2018: still haven't tested this yet) If true, I need to update our header files in the SW Support Package from using "far" to using "near" to actually take advantage of the constant table cycle savings.

    Thanks for bringing this up Andrew!

    Regards, 

    Nick
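The register structure in the test code above relies on the reserved array to place RBUF_0 at 0x280. That layout can be checked at compile time on a host compiler. Here is a minimal sketch in plain C11 (not PRU code) that copies the structure from the post and verifies the offsets in its comments, plus the 644-byte size that appears in the `.usect` directive of the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the McASP0 test structure in the post
 * (anonymous unions require C11). */
typedef struct {
    union { volatile uint32_t REV; };      /* 0x0 */
    uint32_t rsvd4[159];                   /* 0x4 - 0x27C */
    union { volatile uint32_t RBUF_0; };   /* 0x280 */
} McASP0;

/* Compile-time layout checks: the build fails if any of these is false. */
static_assert(offsetof(McASP0, REV) == 0x0, "REV must sit at offset 0x0");
static_assert(offsetof(McASP0, RBUF_0) == 0x280, "RBUF_0 must sit at offset 0x280");
static_assert(sizeof(McASP0) == 644, "size should match the 644 bytes in .usect");
```

Checks like these catch an off-by-one in a reserved array (for example `rsvd[158]`) before the code ever reaches the PRU.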

  • This is working for me. Thanks Nick!

  • Just as a follow up. I wasn't seeing this earlier, but once I started integrating the rest of my code I now get the following compiler error:

    | "pru-mcasp.c", line 206: error #17003-D: relocation from function "mcasp_read"
    |    to symbol "CT_MCASP0_DMA" overflowed; the 10-bit relocated address 0x280 is
    |    too large to encode in the 8-bit unsigned field (type =
    |    'R_PRU_FRDSO_U8_C32_So16s8_P0XFFFFFFFF' (17), file = "gen/pru-mcasp.object",
    |    offset = 0x00000008, section = ".text:mcasp_read")

    I am guessing this error is the result of the compiler trying to use LBCO with an immediate 8-bit value instead of a register offset.
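The numbers in the error message can be reproduced with a few lines of arithmetic. Below is a minimal host-side sketch in plain C that checks how many bits 0x280 needs and whether it fits an 8-bit unsigned field. The helper names `bits_needed` and `fits_u8_field` are hypothetical, and the sketch models only the range check, not the linker's actual relocation handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of bits needed to represent `value` (0 is treated as 0 bits). */
unsigned bits_needed(uint32_t value)
{
    unsigned bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* True if `offset` fits an 8-bit unsigned field like the one in the error. */
bool fits_u8_field(uint32_t offset)
{
    return offset <= 0xFFu;
}

/* Usage example with the RBUF_0 offset from the error message. */
void check_rbuf_relocation(void)
{
    assert(bits_needed(0x280u) == 10);  /* "10-bit relocated address 0x280" */
    assert(!fits_u8_field(0x280u));     /* too large for the 8-bit field */
    assert(fits_u8_field(0x0u));        /* REV at offset 0 fits */
}
```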

  • Hmm, that's interesting. I have not seen that on my end yet, I'll dig into it a bit more. Your linker command file did not get replaced with one that has len=0x100, right? Any other potential clues you noticed?

    Regards,
    Nick
  • I have the linker set as follows:

    MCASP0_DMA		: org = 0x46000000 len = 0x00001010	CREGISTER=8

    Constant table access to RBUF_0 is set up like this:

    typedef struct {
    	uint32_t rsvd[160];
    	union {
    		volatile uint32_t RBUF_0;
    	};	// 0x280
    } McASP0;
    
    volatile __far McASP0 CT_MCASP0_DMA __attribute__((cregister("MCASP0_DMA", near), peripheral));

    The only oddity I can think of is that I had to disable optimization on some functions as the compiler was reordering writes to volatile memory locations. It kind of wrecked my McASP initialization sequence. I thought reordering memory access was frowned on but it may be perfectly legal. I'm not up on all of the standards. Regardless I am using a few of the following pragma statements to get around reordered register writes.

    #pragma FUNCTION_OPTIONS (function_name, "--opt_level=0")

    I don't think this is causing the error though as I get the same behavior when these statements are commented out.

  • I had a chance to play around with this a little more. I can reliably reproduce the error when compiling multiple source files and CT_MCASP0_DMA is used in the file that does not contain main(). The error does not appear when using CT_MCASP0_DMA in the file that contains main().

  • Sorry to keep adding to this thread. But I was looking at the post I marked as the solution again. Isn't the generated assembly from your test case 2 still wrong?

    According to the PRU Assembly Instruction User Guide, LBCO uses either an 8-bit immediate count or a 16-bit register count. Using format 1 with an immediate count should be invalid since 0x280 > 0xFF. This is essentially what the compiler produced for your test case 2. This is also the instruction produced in my code that the linker is complaining about.

    LBCO      &r2, c8, 0x280, 4

    To use the constant table with an offset greater than 0xff but less than 0x10000 I would expect the compiler to use the LBCO instruction format 2 which uses a register count.

    LDI       r1.w0, 0x0280
    LBCO      &r2, c8, r1.w0, 4

    This should still be faster than the code generated by your test case 1.

    LDI32     r0, ||CT_MCASP0_DMA|| ; [ALU_PRU] |117| $O$C1,CT_MCASP0_DMA
    LBBO      &r1, r0, 0, 4         ; [ALU_PRU] |117| $O$C1
    LDI       r1.w0, 0x0280         ; [ALU_PRU] |118| 
    LBBO      &r0, r0, r1.w0, 4     ; [ALU_PRU] |118| $O$C1

  • Hello Andrew,

    Thank you for keeping us updated! I suppose one part of the puzzle is that I'm not sure what $CSBREL() is doing in my generated code (if it results in an immediate value rather than the value in a register, I would certainly expect it to give errors during compile time).

    LBCO      &r0, __PRU_CREG_MCASP0_DMA, $CSBREL(||CT_MCASP0_DMA||+640), 4

    I have passed your debug notes on to the compiler team. I will let you know when I learn anything new.

    Regards, 

    Nick

  • Hello Andrew,

    I talked with the compiler team, and my hypothesis earlier in the thread was wrong: if the cregister region is greater than 255, "far" must be used. If the cregister region is less than 255, "near" should be used. I will update the previous post appropriately. 

    You are right: it looks like if we use "near" with a cregister that has a register offset > 255, the compiler might not give an error when it generates LBCO. However, this would cause linker issues later on.

    Thank you for helping to run this to ground!

    Regards, 

    Nick