AM6442: PRU accessing HW-Spinlocks with __xin __xout

Part Number: AM6442

Hello,

I am trying to use the PRU_ICSSG spinlocks (of the AM6442) to protect a critical section that is shared between PRU0 and PRU1.

The code I'm using right now is:

#define SPINLOCK_DEV 0x90  /* PRU XFR device for spinlocks */
#define BASE_REGISTER 1
#define USE_REMAPPING 0

static inline void spinlock_acquire(uint8_t lock_id) {
    uint32_t obj = lock_id & 0x3FU;
    do {
        __xin(SPINLOCK_DEV, BASE_REGISTER, USE_REMAPPING, obj);
        // result bit is in R1.b3
        if (((obj >> 24) & 0x01) == 1) {
            // acquired
            return;
        }
        // not acquired
    } while (1);
}

static inline void spinlock_release(uint8_t lock_id) {
    uint32_t obj = (lock_id & 0x3F) << 24; // why do I have to shift here???
    __xout(SPINLOCK_DEV, BASE_REGISTER, USE_REMAPPING, obj);
}

The question is already in the code: I have no idea why I need to shift to release the lock. If I don't shift, it doesn't work. Maybe I just misunderstand Table 6-98 (p. 632) in the TRM, but the comment there states: "This assertion will clear the flag selected by R1.b0/own_req_vector".

Also I find it very difficult to find (C-) sample code for that topic.

 

Thank you for your help!

  • I created the issue "Create HW spinlock usage examples both assembly/C in open-pru/academy" (Issue #120 in TexasInstruments/open-pru) and added an assembly-based example.
    Summary

    • R1.b0 shall hold the spinlock ID (defined by firmware; any value from 0 to 63). The spinlock HW widget snoops R1.b0 (lower 6 bits) to select the spinlock instance.
    • XIN and XOUT shall use R1.b3 for acquiring and releasing (this explains why the modification is needed for your example to work).
  • Thank you for your quick reply and the example code.

    My problem is the understanding of the uint32_t to the R1.bX mapping. I assume it is
    uint32_t val << 0 = b0, val << 8 = b1, val << 16 = b2 and val << 24 = b3.

    My code and your code both acquire a spinlock by writing the lock ID to b0. The status is returned in b3 (read as val >> 24). I can see that working with my code, although I cannot find the part of the TRM that states that you have to write lock_id to b0. I did find: "The result of this arbitration is returned to R1.b3[0]", which does work: the result can be read with val >> 24 (= b3).

    To release the lock, I cannot write lock_id to b0, because this does not work. Instead, I write the lock_id to any other byte b1, b2 or b3. That works. I shift (at least) << 8.

    But the TRM says: "This assertion will clear the flag selected by R1.b0", which would mean:
    uint32_t value = (lock_id & 0x3F);    //  <- no shifting happens -> b0 (as in your example code)
    __xout(SPINLOCK_DEV, SPINLOCK_BASE_REGISTER, SPINLOCK_USE_REMAPPING, value);

    I tested that for a long time, and it doesn't work.

    So either my assumption about which byte of the uint32_t value goes to which byte of the register is wrong (at least for the release operation), or the documentation is wrong (or maybe I have a completely wrong view of how this works).

    You say: "xin and xout shall use R1.b3 for acquiring and releasing", but in your acquire example you write to b0:
    r1.b0 = SPINLOCK_ID

    And as a hint you write: "Must be called after M_SPINLOCK_ACQUIRE". Where does this come from? My application releases all the locks on startup to establish the initial state. I do not check beforehand whether a lock is set, and I have not found any problem with that.

    Thanks again

  • Hello Stefan,

    I am checking with Pratheesh about the note about calling M_SPINLOCK_RELEASE after M_SPINLOCK_ACQUIRE.

    To confirm: does Pratheesh's code work for you? 

    I need to dig more into the hardware design to interpret the information in the TRM, and potentially fix some wording. But if Pratheesh's code works, that is a good starting point for us.

    For future readers, you can see the full explanation here:
    https://github.com/TexasInstruments/open-pru/issues/120

    But the code summary is

    ; first, acquire the spinlock
    M_SPINLOCK_ACQUIRE .macro
    
    ; do NOT need to XOUT to R1.b0, just set R1.b0
    LDI R1.b0, SPINLOCK_ID ; can range from 0-63, fixed in PRU firmware
    
    ; XIN to request ownership of spinlock
    ; 0 means did not get ownership, so loop
    $1:
    XIN INT_SPIN_XID, &R1.b3, 1
    QBBC $1, R1.b3, 0
    .endm
    
    ; later, release the spinlock with XOUT
    M_SPINLOCK_RELEASE .macro
    XOUT INT_SPIN_XID, &R1.b3, 1
    .endm

    About the register.byte notation

    Yes, the register notation is [r1.b3][r1.b2][r1.b1][r1.b0]. More information in the OpenPRU's PRU Assembly Instruction Cheat Sheet:
    https://github.com/TexasInstruments/open-pru/blob/main/docs/PRU%20Assembly%20Instruction%20Cheat%20Sheet.md
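
The byte-field layout can be checked on any host compiler. The following is only a minimal sketch of the [r1.b3][r1.b2][r1.b1][r1.b0] notation; the helper names reg_byte and reg_with_byte are hypothetical and do not come from any TI header.

```c
#include <assert.h>
#include <stdint.h>

/* Extract byte field bN (N = 0..3) from a 32-bit register value.
 * b0 is bits 7:0 and b3 is bits 31:24. */
static inline uint8_t reg_byte(uint32_t reg, unsigned n)
{
    assert(n < 4u);
    return (uint8_t)(reg >> (8u * n));
}

/* Return a copy of reg with byte field bN replaced by value. */
static inline uint32_t reg_with_byte(uint32_t reg, unsigned n, uint8_t value)
{
    assert(n < 4u);
    uint32_t shift = 8u * n;
    return (reg & ~((uint32_t)0xFFu << shift)) | ((uint32_t)value << shift);
}
```

With this layout, `(obj >> 24) & 0x01` reads bit 0 of the b3 field, and `lock_id << 24` places the ID in b3 rather than in b0.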

    What about the TRM chapter on spinlock? 

    I do not understand the TRM documentation either. I will look into exactly what the hardware is doing and file a bug to update the TRM chapter.

    Outstanding questions so far:

    1) In the table, what is the meaning of Internal/External, Internal / External0 (broadside ID 0x91 - can PRU cores even call this?) / External1 (broadside ID 0x92, same question)

    2) How to use R1.b0? Write to it and otherwise don't touch it? XOUT R1.b0 to do something? If so, what does XOUT accomplish, since Spinlock already snoops it?

    3) What does "cause arbitration action" / fixed arbitration mean? How does this relate to "an arbitration event"?

    4) For releasing a spinlock (clearing), should you XOUT R1.b3, or R1.b0?

    5) To confirm: PRU cores should interact w/ broadside interface, things outside of the PRU subsystem should interact through the ICSSG_SPIN_LOCK0 / ICSSG_SPIN_LOCK1 registers? What about the other PRU subsystem, would it use an alternate broadside ID or read/write to the registers?

    Regards,

    Nick

  • Hello Nick,

    I've checked Pratheesh's asm code, and I can confirm that it works.

    I wrote a small test-program that compares the assembler function with the C-functions:

    The assembler part is Pratheesh's code:

        .global spinlock_acquire
    spinlock_acquire:
        MOV    R1.b0, R14.b0
    $1:
        XIN INT_SPIN_XID, &R1.b3, 1
        QBBC $1, R1.b3, 0
        JMP    R3.w2
    
        .global spinlock_release
    spinlock_release:
        MOV    R1.b0, R14.b0
        XOUT INT_SPIN_XID, &R1.b3, 1
        JMP    R3.w2

    and the C part:

    (sorry for putting a picture - code insertion didn't work)

    This is what I observe:

    The C version works if:
    * you do any shift operation (8, 16 or 24) at IMPORTANT LINE in the code
    * you do not have optimization on (e.g. -O2)

    If you compile with -O2, neither core can acquire the lock (only DEBUG_PIN_SHIFT_C of each core toggles), even if you call spinlock_release (followed by a delay) initially.

    If you do not do the shift operation at IMPORTANT LINE, only one core acquires (and releases) the lock (switches DEBUG_PIN_SHIFT_B on and off) and never has to wait for it (never toggles its DEBUG_PIN_SHIFT_C). The other core cannot acquire the lock and continuously waits for it (toggles its DEBUG_PIN_SHIFT_C).

    Regards,

    Stefan

  • Hello Stefan,

    Thanks for the debug so far. Hopefully between the two of us we can figure out the expected behavior.

    Followup on your tests

    Could I get you to check the generated assembly code for the different test codes in C? That might help us figure out why you are seeing different behaviors with different optimization settings.

    One potential issue I can think of is that R1.b0 value gets snooped by the spinlock, regardless of XIN/XOUT (as far as we can tell). But I do not think the C compiler typically keeps track of register numbers - off the top of my head, I cannot remember if you can define R1 like you defined R30 and expect C to then leave that register value alone. Perhaps the C compiler is writing a different value to that register bit field, messing up the spinlock command.

    More resources on combining C and assembly are in the PRU Getting Started Labs here:
    https://dev.ti.com/tirex/explore/node?isTheia=false&node=A__AfG7HXFogfKtAWtxs36Cag__AM64-ACADEMY__WI1KRXP__LATEST 

    For guidance on how to keep the generated assembly code, refer to the PRU Getting Started Labs > Lab 3: Compiling.

    If using CCS: Advanced Options > Assembler options
    https://dev.ti.com/tirex/explore/node?isTheia=false&node=A__AUR8lo3Cik3JV1avCUKaGw__AM64-ACADEMY__WI1KRXP__LATEST

    If using makefiles: C compiler settings
    https://dev.ti.com/tirex/explore/node?isTheia=false&node=A__ARYD7uMfUFtdM7yNuJ9Tlg__AM64-ACADEMY__WI1KRXP__LATEST

    Updates from looking at the source 

    Still digging through everything here. In the TRM table, you can ignore the "External0 / External1" entries.

    Each PRU_ICSSG has 1 spinlock instance. It looks like "internal" is when you are accessing the spinlock within your own PRU subsystem, and "External0 / External1" probably refer to direct connections that allow PRU cores in one PRU_ICSSG instance to use the broadside interface to quickly access the spinlock in another PRU_ICSSG instance. It seems likely that this part of the docs was copy/pasted from AM65x, which has 3 PRU_ICSSG instances (hence 2 "external" spinlock instances). AM64x only has 2 PRU_ICSSG instances, so I wouldn't expect both to apply... though I am still tracing the signals.

    Regards,

    Nick

  • Additional updates: the most important thing is to preserve the value of R1.b0 

    Ok, from looking at the source code, it does NOT actually seem like the spinlock uses any data other than the spinlock instance that you want to be interacting with, which is stored in R1.b0. So whatever you do, leave that byte of data alone.

    For now, I have only investigated the usecase of PRU cores accessing the spinlock in their own PRU subsystem. Still figuring out how signals connect across different PRU subsystems.

    Steps to request a spinlock 

    1) set value of spinlock instance in R1.b0

    2) do an XIN to the 0x90. You need to provide the command with some place to store the result of the request (success/fail). So you could XIN to R1.b3 as documented above, but I would expect this to work with any combination: R1.b1, R2.b3, etc.

    3) verify whether you actually got the spinlock by inspecting the return value

    Steps to release a spinlock 

    1) set value of spinlock instance in R1.b0 (or just ensure that the value has not been modified)

    2) Do an XOUT to 0x90. The spinlock does not actually use any data from the XOUT, so it does not matter if you use R1.b3, R1.b1, etc, and it does not matter what values are in that byte of data.
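
The request and release steps above can be summarized as a small host-side model. This is only a sketch of the behavior described in this thread, not PRU code and not the hardware implementation: all names (spinlock_model_t, model_acquire, model_release, MODEL_NUM_FLAGS) are hypothetical, and it assumes that a taken flag refuses every requester, since the thread does not say how ownership is tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_FLAGS 64u /* hypothetical name; the thread states IDs 0-63 */

/* Hypothetical host-side model of one spinlock instance. */
typedef struct {
    bool taken[MODEL_NUM_FLAGS];
} spinlock_model_t;

/* Models "set R1.b0, then XIN from 0x90": returns 1 if the flag was free
 * and is now taken, 0 if it was already taken. */
static uint8_t model_acquire(spinlock_model_t *m, uint8_t r1_b0)
{
    assert(m != NULL);
    uint8_t id = r1_b0 & 0x3Fu; /* only the lower 6 bits select the flag */
    if (m->taken[id]) {
        return 0u;
    }
    m->taken[id] = true;
    return 1u;
}

/* Models "set R1.b0, then XOUT to 0x90": clears the selected flag.
 * The XOUT data itself is ignored, as described above. */
static void model_release(spinlock_model_t *m, uint8_t r1_b0)
{
    assert(m != NULL);
    m->taken[r1_b0 & 0x3Fu] = false;
}
```

In this model, releasing a flag that is not taken is harmless, which would be consistent with firmware that releases all flags once at startup.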

    Outstanding questions 

    What does the connection look like between PRU subsystems?
    - Is there even an 0x91 or 0x92 connection on AM64x? If so, which one is connected?
    - which PRU cores are attached? all 6 cores at once? 4 cores at once, where you need to toggle a bit to select between PRU cores or TX_PRU cores?
    - if only 4 signals are output from ICSSGm 0x91/92, but there are 6 input signals coming into ICSSGn 0x91/92, are all of those signals actually live? 

    What happens if you do an XIN or XOUT with more than 1 byte width? e.g., is the XIN zero-padded for all widths, or would higher bits not be overwritten?

    Regards,

    Nick

  • Hey Nick,

    Here are the asm files (the output of --keep_asm) to diff: one with -O2, and the other without optimization enabled.

    Thanks for checking!

    Attachments: main_O2.asm, main_noO2.asm

  • Hello Stefan,

    Interpreting the generated ASM

    OK, several things to note here. First, there is no XOUT command in the generated file with optimization turned on. Either the C compiler erroneously decided that this code was not actually used and optimized it out, or something else happened with all the ifdef switches in that code. No XOUT means the spinlock is never released.

    OK, so what should an example spinlock call from PRU C code look like?

    There are some challenges with using C code to interact with the spinlock. All the information I will discuss here is from the PRU Optimizing C Compiler User's Guide. (as of this response, v2.3 / rev C is the latest version of the doc)

    First, if you have any kind of optimization on, then the C compiler does not allow you to set specific registers to specific values (outside of __R30 & __R31). So the only way to directly write a value to R1.b0 in combination with compiler optimizations like -O2 would be with direct __asm() writes. We need to be VERY careful when modifying registers with __asm, since the C compiler does not keep track of which registers it is using/modifying and which registers an __asm call is using/modifying.

    Second, we need to make sure that the function does not get optimized out. I am surprised that I did not see a XOUT command or that function in the main_O2.asm file that you attached. A summary of experiments a future reader could run to make sure neither function gets optimized out (generated w/ AI, verify before trusting):

      For a static inline void function_name() { __xout(...); } at -O2:
    
      ┌──────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────┐
      │                         Goal                         │                    Mechanism from guide                    │
      ├──────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
      │ Keep function alive (linker)                         │ #pragma RETAIN(function_name) or __attribute__((retain))   │
      ├──────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
      │ Keep function alive (compiler, external entry point) │ #pragma FUNC_EXT_CALLED(function_name)                     │
      ├──────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
      │ Keep function alive (used attribute)                 │ __attribute__((used))                                      │
      ├──────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
      │ Reduce optimization for this function only           │ #pragma FUNCTION_OPTIONS(function_name, "--opt_level=0")   │
      ├──────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
      │ Force inlining                                       │ #pragma FUNC_ALWAYS_INLINE(function_name)                  │
      ├──────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
      │ Prevent inlining (keep as real symbol)               │ #pragma FUNC_CANNOT_INLINE(function_name) (removes inline) │
      ├──────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
      │ Keep named register (R30/R31 only)                   │ volatile register unsigned int __R30;                      │
      ├──────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
      │ Other registers                                      │ __asm(...) — never removed, but can be rearranged          │
      ├──────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
      │ Prevent __xout object accesses from removal          │ Declare the object volatile                                │
      └──────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────┘
    

    Using __xin / __xout could be one option, but in addition to the issues listed above, it looks like you cannot specify a specific Byte to pass to the XIN/XOUT command other than Rx.b0. So passing something like R1.b3 would not work, and the returned value from XIN would just overwrite the important R1.b0 value.

    I would probably solve this problem with assembly functions, as documented in the PRU Getting Started Labs here: open-pru/academy/getting_started_labs/c_and_assembly/solution/firmware (in TexasInstruments/open-pru on GitHub)

    /*
     * SPDX-License-Identifier: BSD-3-Clause
     * Copyright (C) 2026 Texas Instruments Incorporated - http://www.ti.com/
     */
    
    #include <stdint.h>
    
    /* Declaration of the external assembly functions (defined in spinlock.asm) */
    /*
     * NOTE: These functions are written to use BS_ID = 0x90, which is the spinlock
     * in the local PRU subsystem. A separate function would be needed for
     * accessing spinlocks in a different PRU subsystem.
     */
    uint8_t spinlock_acquire(uint8_t flag_id);
    void spinlock_release(uint8_t flag_id);
    
    /* Spinlock values */
    #define SPINLOCK_FLAG 11 /* Value 0-63 */
    
    void main(void)
    {
            uint8_t result = 0;
            while(1) {
                    /* loop until we get the spinlock */
                    result = 0;
                    do {
                            result = spinlock_acquire(SPINLOCK_FLAG);
                    } while (result != 1);
    
                    /* now release the spinlock */
                    spinlock_release(SPINLOCK_FLAG);
    
                    /*
                     * NOTE: Firmware should have >= 8 clocks after releasing the spinlock
                     * with XOUT, before attempting to acquire the spinlock with XIN
                     *
                     * FOLLOWUP QUESTIONS:
                     *  - is this before ANY initiator attempts to acquire the spinlock with XIN?
                     *    Or just if the SAME initiator attempts to acquire the spinlock with XIN?
                     *  - Does this warning apply ONLY if the core attempts to acquire the same
                     *    flag? Or if the core attempts to acquire ANY of the other 63 flags as well?
                     */
            }
    
            /* This program will not reach __halt because of the while loop */
            __halt();
    }
    

    ; SPDX-License-Identifier: BSD-3-Clause
    ; Copyright (C) 2026 Texas Instruments Incorporated - http://www.ti.com/
    
    ;******************************************************************************
    ; Build Configuration
    ;******************************************************************************
    
    ; Required for building .out with assembly file
        .retain
        .retainrefs
    
    ;******************************************************************************
    ; uint8_t spinlock_acquire(uint8_t flag_id);
    ;******************************************************************************
    
    ; .sect ".text:spinlock_acquire" places all code below the .sect directive into
    ; the .text section, grouped into a subsection named "spinlock_acquire"
        .sect       ".text:spinlock_acquire"
        .clink
        .global     spinlock_acquire
    
    spinlock_acquire:
    
    ;------------------------------------------------------------------------------
    ;   Function input arguments are stored in R14-R29.
    ;   flag_id is a uint8_t (8-bit), so it is stored in R14.b0.
    ;   The return value (uint8_t) is stored in R14.b0.
    ;
    ;   For more details about how function arguments are stored in registers,
    ;   reference the document "PRU Optimizing C/C++ Compiler User's Guide",
    ;   section "Function Structure and Calling Conventions"
    ;------------------------------------------------------------------------------
    
        ; 1) Copy flag_id from R14.b0 to R1.b0 for the spinlock hardware
        MOV         R1.b0, R14.b0
    
        ; 2) XIN: Request spinlock flag. The spinlock hardware writes the result
        ;    (0 = not acquired, 1 = acquired) into R14.b0.
        ;    NOTE: Uses BS_ID = 0x90 (spinlock in the local PRU subsystem).
        ;    A separate function would be needed for a different PRU subsystem.
        XIN         0x90, &R14.b0, 1
    
        ; Return from spinlock_acquire. Return value (lock status) is in R14.b0.
        JMP         r3.w2
    
    ;******************************************************************************
    ; void spinlock_release(uint8_t flag_id);
    ;******************************************************************************
    
        .sect       ".text:spinlock_release"
        .clink
        .global     spinlock_release
    
    spinlock_release:
    
    ;------------------------------------------------------------------------------
    ;   Function input arguments are stored in R14-R29.
    ;   flag_id is a uint8_t (8-bit), so it is stored in R14.b0.
    ;   No return value.
    ;
    ;   For more details about how function arguments are stored in registers,
    ;   reference the document "PRU Optimizing C/C++ Compiler User's Guide",
    ;   section "Function Structure and Calling Conventions"
    ;------------------------------------------------------------------------------
    
        ; 1) Copy flag_id from R14.b0 to R1.b0 for the spinlock hardware
        MOV         R1.b0, R14.b0
    
        ; 2) XOUT: Release the spinlock flag. The spinlock hardware looks at
        ;    R1.b0 for the flag ID when it receives an XOUT command. The register
        ;    used as the XOUT source (R14.b0) does not affect the release operation.
        ;    NOTE: Uses BS_ID = 0x90 (spinlock in the local PRU subsystem).
        ;    A separate function would be needed for a different PRU subsystem.
        XOUT        0x90, &R14.b0, 1
    
        ; Return from spinlock_release
        JMP         r3.w2
    

    Does this code work for you? Let me know if you have any feedback, and I will add both a C and an assembly version of the spinlock example in the future.

    Regards,

    Nick

  • Hello Nick,


    In order to check the functionality, I've changed your main a little:

    volatile register uint32_t __R30;
    void main(void)
    {
            uint8_t result = 0;
            while(1) {
                    /* loop until we get the spinlock */
                    result = 0;
                    do {
                            __R30 ^= (1 << DEBUG_PIN_SHIFT_C);
                            result = spinlock_acquire(SPINLOCK_FLAG);
                    } while (result != 1);
    
                    __R30 |= (1 << DEBUG_PIN_SHIFT_B);
    
                    //different delay for each core
                    #if PRU0
                    __delay_cycles(1000);
                    #else
                    __delay_cycles(300);
                    #endif
                    __R30 &= ~(1 << DEBUG_PIN_SHIFT_B);
    
                    /* now release the spinlock */
                    spinlock_release(SPINLOCK_FLAG);
    

    to see it working on the logic analyzer.

    The code then compiles and loads, but it doesn't work: the pins DEBUG_PIN_SHIFT_C are always toggling on both cores (both are waiting for spinlock_acquire).

    If I exchange your asm line:
    XIN         0x90, &R14.b0, 1

    with those two:
    XIN         0x90, &R1.b3, 1
    MOV         R14.b0, R1.b3

    everything is working as expected.
                    
    For completeness, I've attached the original test code from the picture, so that the #ifdef paths are easier to see.

    #include "resource_table.h"         /*!< Resourcetable for SK_AM64 needed to tell the PRU which resources are used */
    #include "intc_map_0.h"
    
    // different cores use different outpins for debugging
    #ifdef PRU0
        #define DEBUG_PIN_SHIFT_B 4     // D2 
        #define DEBUG_PIN_SHIFT_C 11    // D4
    #else
        #define DEBUG_PIN_SHIFT_B 12    // D3
        #define DEBUG_PIN_SHIFT_C 9     // D5
    #endif
    
    #define SPINLOCK_DEV 0x90                       /*!< PRU XFR device for spinlocks */
    #define SPINLOCK_BASE_REGISTER 1                /*!< Base register to use */
    #define SPINLOCK_USE_REMAPPING 0                /*!< We do not use remapping */
    #define SPINLOCK_ID_COUNT 64                    /*!< How many spinlocks are possible */
    #define SPINLOCK_ID_MASK (SPINLOCK_ID_COUNT - 1) /*!< Mask, for sanity checks of spinlock id */
    
    #define SPINLOCK_ID 11  // any Id: just for testing
    
    volatile register uint32_t __R30;
    
    // define what version of spinlock_acquire we use (asm or C)
    // #define USE_ASM
    
    #ifdef USE_ASM
    extern void spinlock_acquire(uint8_t lock_id);
    extern void spinlock_release(uint8_t lock_id);
    #else
    static inline void spinlock_acquire(uint8_t lock_id)
    {
        uint32_t obj = lock_id & SPINLOCK_ID_MASK;
        do {
            __xin(SPINLOCK_DEV, SPINLOCK_BASE_REGISTER, SPINLOCK_USE_REMAPPING, obj);
            if (((obj >> 24) & 0x01) == 1)
            {
                /* acquired */
                return;
            }
            __R30 ^= (1 << DEBUG_PIN_SHIFT_C);
            /* not acquired */
        } while (1);
    }
    
    static inline void spinlock_release(uint8_t lock_id)
    {
        uint32_t obj = (lock_id & SPINLOCK_ID_MASK) << 24;   // <- IMPORTANT LINE
        __xout(SPINLOCK_DEV, SPINLOCK_BASE_REGISTER, SPINLOCK_USE_REMAPPING, obj);
    }
    #endif
    
    int main()
    {
        while(1) {
            spinlock_acquire(SPINLOCK_ID);
    
            __R30 |= (1 << DEBUG_PIN_SHIFT_B);
    
            //different delay for each core
            #if PRU0
            __delay_cycles(1000);
            #else
            __delay_cycles(300);
            #endif
            __R30 &= ~(1 << DEBUG_PIN_SHIFT_B);
    
            spinlock_release(SPINLOCK_ID);
            // __delay_cycles(300);
            
        }
    }
    
    

    I've also rechecked the result when compiling the C code path (not the asm code path), with and without -O2 set.
    I got the same result: there is no XOUT instruction in the optimized version. The only occurrence of xout is:
    $C$DW$8    .dwtag  DW_TAG_subprogram
        .dwattr $C$DW$8, DW_AT_name("__xout")
        .dwattr $C$DW$8, DW_AT_TI_symbol_name("__xout")


    Regarding "Firmware should have >= 8 clocks before XIN, after XOUT":
    -> The TRM does not state this for INT_SPIN_ID (our 0x90); it mentions it only for the EXT_SPIN_IDs (e.g. 0x91, 0x92).

  • Hello Stefan,

    Spinlock communication must use R1 and what that means

    Good debug, thanks for the feedback that using a register other than R1 to receive the spinlock XIN does not work.

    If the spinlock return status used the standard broadside interface path, we could have used any of the registers. But it looks like the return status signal is actually a separate side-channel. I do not have the expertise to trace exactly how it is consumed by the PRU cores in the source code, but your tests make it look like the signal is in some way tied to R1. In that case, I will just document the byte documented in the TRM, R1.b3.

    For my own reference, did you say earlier that it also worked if you did the other R1 bytes?
    XIN         0x90, &R1.b2, 1
    XIN         0x90, &R1.b1, 1

    If we have to use R1 and we do not want to overwrite the flag value in R1.b0, I think that rules out using __xin() in your C code. As far as I can tell, __xin() is not able to write to .b1, .b2, .b3 without also writing to .b0.
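
To make the byte-coverage argument concrete, here is a small host-side sketch. It assumes that a transfer covers a contiguous range of byte fields within one register, starting at the given byte field; the helper name xfr_touches_byte is hypothetical. It says nothing about what the spinlock hardware actually returns in each byte, which is still an open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* True if a transfer of `count` bytes starting at byte field b<first>
 * of a 32-bit register touches byte field b<n>. */
static bool xfr_touches_byte(unsigned first, unsigned count, unsigned n)
{
    assert(first < 4u && count >= 1u && first + count <= 4u && n < 4u);
    return n >= first && n < first + count;
}
```

For example, a 4-byte transfer starting at b0 (such as a uint32_t object placed in R1) touches b0, while `XIN 0x90, &R1.b3, 1` corresponds to first = 3, count = 1 and leaves b0 alone.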

    Anything else you need to get going? 

    Do you need anything else from our side to keep making progress?

    I am still looking into spinlock connections between PRU subsystems, and still need to update the TRM docs & publish example code, but that should not be a blocker on your side.

    Updated example code 

    Thanks for posting your example code, I appreciate it. I'll pull in elements of your example like this:

    /*
     * SPDX-License-Identifier: BSD-3-Clause
     * Copyright (C) 2026 Texas Instruments Incorporated - http://www.ti.com/
     */
    
    #include <stdint.h>
    
    /* Declaration of the external assembly functions (defined in spinlock.asm) */
    /*
     * NOTE: These functions are written to use BS_ID = 0x90, which is the spinlock
     * in the local PRU subsystem. A separate function would be needed for
     * accessing spinlocks in a different PRU subsystem.
     */
    uint8_t spinlock_acquire(uint8_t flag_id);
    void spinlock_release(uint8_t flag_id);
    
    /* Spinlock values */
    #define SPINLOCK_FLAG 11 /* Value 0-63 */
    
    /* Debug signals */
    /* R30 is used to write to PRU GPO signals */
    /* TODO: implement pinmuxing for signals as discussed in GPIO lab */
    volatile register uint32_t __R30;
    /* TODO: Verify PRU0 defined in makefile */
    /* TODO: update shift value based on specific board pinmuxing */
    #if PRU0
    #define DEBUG_PIN_SHIFT 4 /* PRU0 debug pin */
    #else
    #define DEBUG_PIN_SHIFT 5 /* PRU1 debug pin */
    #endif
    
    /* Number of clocks for each PRU to hold the spinlock */
    #if PRU0
    #define HOLD_SPINLOCK_TIME 1000 /* PRU0 holds for 1000 clocks */
    #else
    #define HOLD_SPINLOCK_TIME 2000 /* PRU1 holds for 2000 clocks */
    #endif
    
    void main(void)
    {
            uint8_t result = 0;
            /* zero out all PRU GPO signals */
            __R30 = 0x00000000;
    
            while(1) {
                    /* loop until we get the spinlock */
                    result = 0;
                    do {
                            result = spinlock_acquire(SPINLOCK_FLAG);
                    } while (result != 1);
    
                    /* toggle debug signal high while we hold the spinlock */
                    __R30 |= (1 << DEBUG_PIN_SHIFT);
    
                    __delay_cycles(HOLD_SPINLOCK_TIME);
    
                    /* now release the spinlock */
                    spinlock_release(SPINLOCK_FLAG);
    
                    /* toggle debug signal low after releasing the spinlock */
                    __R30 &= ~(1 << DEBUG_PIN_SHIFT);
            }
    
            /* This program will not reach __halt because of the while loop */
            __halt();
    }
    

    ; SPDX-License-Identifier: BSD-3-Clause
    ; Copyright (C) 2026 Texas Instruments Incorporated - http://www.ti.com/
    
    ;******************************************************************************
    ; Build Configuration
    ;******************************************************************************
    
    ; Required for building .out with assembly file
        .retain
        .retainrefs
    
    ;******************************************************************************
    ; uint8_t spinlock_acquire(uint8_t flag_id);
    ;******************************************************************************
    
    ; .sect ".text:spinlock_acquire" places all code below the .sect directive into
    ; the .text section, grouped into a subsection named "spinlock_acquire"
        .sect       ".text:spinlock_acquire"
        .clink
        .global     spinlock_acquire
    
    spinlock_acquire:
    
    ;------------------------------------------------------------------------------
    ;   Function input arguments are stored in R14-R29.
    ;   flag_id is a uint8_t (8-bit), so it is stored in R14.b0.
    ;   The return value (uint8_t) is stored in R14.b0.
    ;
    ;   For more details about how function arguments are stored in registers,
    ;   reference the document "PRU Optimizing C/C++ Compiler User's Guide",
    ;   section "Function Structure and Calling Conventions"
    ;------------------------------------------------------------------------------
    
        ; 1) Copy flag_id from R14.b0 to R1.b0 for the spinlock hardware
        MOV         R1.b0, R14.b0
    
        ; 2) XIN: Request spinlock flag. The spinlock hardware writes the result
        ;    (0 = not acquired, 1 = acquired) into R1.b3.
        ;    NOTE: Uses BS_ID = 0x90 (spinlock in the local PRU subsystem).
        ;    A separate function would be needed for a different PRU subsystem.
        XIN         0x90, &R1.b3, 1
        MOV         R14.b0, R1.b3
    
        ; Return from spinlock_acquire. Return value (lock status) is in R14.b0.
        JMP         r3.w2
    
    ;******************************************************************************
    ; void spinlock_release(uint8_t flag_id);
    ;******************************************************************************
    
        .sect       ".text:spinlock_release"
        .clink
        .global     spinlock_release
    
    spinlock_release:
    
    ;------------------------------------------------------------------------------
    ;   Function input arguments are stored in R14-R29.
    ;   flag_id is a uint8_t (8-bit), so it is stored in R14.b0.
    ;   No return value.
    ;
    ;   For more details about how function arguments are stored in registers,
    ;   reference the document "PRU Optimizing C/C++ Compiler User's Guide",
    ;   section "Function Structure and Calling Conventions"
    ;------------------------------------------------------------------------------
    
        ; 1) Copy flag_id from R14.b0 to R1.b0 for the spinlock hardware
        MOV         R1.b0, R14.b0
    
        ; 2) XOUT: Release the spinlock flag. The spinlock hardware looks at
        ;    R1.b0 for the flag ID when it receives an XOUT command.
        ;    NOTE: Uses BS_ID = 0x90 (spinlock in the local PRU subsystem).
        ;    A separate function would be needed for a different PRU subsystem.
        XOUT        0x90, &R1.b3, 1
    
        ; Return from spinlock_release
        JMP         r3.w2

    Regards,

    Nick

  • Hello Nick,

    thanks for asking. I'm OK: I can go with the current C implementation, because I don't use optimization (I'm still wondering why the compiler drops the __xout when optimization is enabled, but OK).

    It looks weird, but it's tested:

    uint32_t obj = lock_id & SPINLOCK_ID_MASK;
    __xin(SPINLOCK_DEV, SPINLOCK_BASE_REGISTER, SPINLOCK_USE_REMAPPING, obj);

    will change byte[3] within obj, which then reflects R1.b3.
    That is why

    if (((obj >> 24) & 0x01) == 1) { /* acquired */ }

    will work.
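A minimal host-side sketch of the byte-lane arithmetic behind that check, assuming the conventional little-endian view in which bits 7:0 of the `uint32_t` correspond to R1.b0 and bits 31:24 to R1.b3. The helper names `reg_byte`, `reg_with_byte` and `spinlock_status_acquired` are hypothetical and exist only for illustration. The code does not touch the spinlock hardware, and it does not explain the release behavior discussed in this thread; it only shows which byte a given shift addresses.

```c
#include <stdint.h>

/* ASSUMPTION: a uint32_t handed to __xin/__xout with base register 1 is an
 * image of R1, with bits 7:0 in R1.b0 and bits 31:24 in R1.b3. */

/* Return byte lane `lane` (0..3) of a 32-bit register image, i.e. R1.b<lane>. */
static inline uint8_t reg_byte(uint32_t reg, unsigned lane)
{
    return (uint8_t)(reg >> (8U * (lane & 3U)));
}

/* Return `reg` with byte lane `lane` (0..3) replaced by `value`. */
static inline uint32_t reg_with_byte(uint32_t reg, unsigned lane, uint8_t value)
{
    unsigned shift = 8U * (lane & 3U);
    return (reg & ~((uint32_t)0xFFU << shift)) | ((uint32_t)value << shift);
}

/* Acquire status as tested in the thread: bit 0 of byte 3 (R1.b3). */
static inline int spinlock_status_acquired(uint32_t obj)
{
    return (reg_byte(obj, 3) & 0x01U) == 1U;
}
```

Under this assumption, `(obj >> 24) & 0x01` is the same as testing bit 0 of `reg_byte(obj, 3)`, and `(lock_id & 0x3F) << 24` is the same as `reg_with_byte(0, 3, lock_id & 0x3F)`.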

    Although it looks like a call by value, somehow the value gets changed.
    But the compiler user guide makes it clear:
    void __xout ( unsigned int device_id, unsigned int base_register , unsigned int use_remapping , void& object );
    -> as far as void& is "clear" in C.

    To your question whether we can use something other than R1.b3: the answer is no, only R1.b3 works (tested).


    "[...] did you say earlier that it also worked if you did the other R1 bytes": What I was saying is that, in C code, I can use 3 (out of 4) bytes to release the spinlock:

    uint32_t obj = (lock_id & SPINLOCK_ID_MASK) << 24;   // works
    uint32_t obj = (lock_id & SPINLOCK_ID_MASK) << 16;   // works
    uint32_t obj = (lock_id & SPINLOCK_ID_MASK) << 8;    // works
    uint32_t obj = (lock_id & SPINLOCK_ID_MASK);         // does not work

    __xout(SPINLOCK_DEV, SPINLOCK_BASE_REGISTER, SPINLOCK_USE_REMAPPING, obj);

    which was the original question.

    For the ASM version, on the other hand, it actually DOES matter which byte of R1 we set to unlock the specific spinlock:

    LDI         R1, 0
    MOV         R1.b0, R14.b0
    XOUT        0x90, &R14.b0, 1

    It has to be R1.b0; the others don't work (tested).

    So I am still wondering: in the C code with __xout, who sets R1.b0 to the correct spinlock_id? One could think that if I set b0 of my uint32 to spinlock_id, it will map to R1.b0. That is not the case; it is the only situation where R1.b0 is not set correctly. All other cases do the job. But who is doing that mapping? __xout is a generic "function" (not only for spinlocks), so I thought it always uses the same mapping: uint32[b0] goes to R1.b0, and so on.

    Regards,

    Stefan