CCS/LAUNCHXL-F28377S: Accessing c-struct within CLA

Part Number: LAUNCHXL-F28377S

Tool/software: Code Composer Studio

Two-part question.

I previously created a simple ring-buffer class that I used to share a queue between the CPU and the CLA. The class header contains the C struct definition used to store it. However, the CLA tasks were never able to access anything inside the struct (variable->buffer). So I split the parts of the ring buffer into individual variables, and that works just fine. Still, I'm curious whether there is something else I could test, or whether structs are just too opaque for the CLA compiler.

Second part: the struct above (and now the individual variables) is stored in the CPU-to-CLA message RAM, since the CLA only needs to read the queue, not write to it. However, I just realized that the CLA data space is read/write accessible from the CPU, and I can make it as large as I like. Are there any downsides to using the CLA data space? I have other large variables elsewhere that are badly constrained by the size of the message RAMs.

  • Adam Jones said:
    However, the CLA tasks were never able to access anything inside the struct ( variable->buffer ).

    If your structure contains pointers, that may be the reason. Pointers are different sizes on the CLA (16 bits) and the CPU (32 bits), so unless you define them as a union of a pointer and a uint32, the members that follow a pointer end up at different offsets on the two cores and you will see access issues on the CLA side. See the section "Dealing with Pointers" in the firmware user's guide for your device.

    For the 2837xD you will find this under C:\ti\controlSUITE\device_support\F2837xD\v210\doc\F2837xD-FRM-EX-UG.pdf, section 4.7.1.

    Adam Jones said:
    I'm curious if there are any downsides to using the CLA data space, as I have other large variables elsewhere that are very constrained by the message rams.

    Arbitration. If the CPU is writing to the structure while the CLA is reading from it, you could end up with stale data, because arbitration between the cores uses a round-robin scheme; see section 2.11.1.6 of the TRM.
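
    A minimal sketch of one way to sidestep this, assuming a single writer and a single reader per queue: give every shared field exactly one owner. Here the CPU (producer) is the only side that writes write_index and the CLA (consumer) is the only side that writes read_index, so a stale read can at worst make the queue look fuller or emptier than it really is, never corrupt it. The names spsc_ring_t, spsc_push, spsc_pop and RING_SLOTS are hypothetical; the wrap uses a compare instead of % to keep the arithmetic trivial; and the sketch assumes each 16-bit store is atomic and that the payload store becomes visible before the index store, which you should confirm for your device.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* hypothetical size; holds RING_SLOTS - 1 items */

typedef struct {
    volatile uint16_t write_index;      /* written only by the producer (CPU) */
    volatile uint16_t read_index;       /* written only by the consumer (CLA) */
    volatile uint16_t data[RING_SLOTS]; /* payload slots */
} spsc_ring_t;

/* Advance an index by one slot, wrapping to 0 without using '%'. */
static uint16_t spsc_next(uint16_t i)
{
    return (uint16_t)((i + 1u == RING_SLOTS) ? 0u : i + 1u);
}

/* Producer: store a value, or return false if the ring is full. */
static bool spsc_push(spsc_ring_t *rb, uint16_t value)
{
    uint16_t w = rb->write_index;
    uint16_t next = spsc_next(w);

    if (next == rb->read_index) {
        return false;               /* full: one slot always stays empty */
    }
    rb->data[w] = value;            /* write the payload first... */
    rb->write_index = next;         /* ...then publish it by moving the index */
    return true;
}

/* Consumer: fetch the oldest value, or return false if the ring is empty. */
static bool spsc_pop(spsc_ring_t *rb, uint16_t *value)
{
    uint16_t r = rb->read_index;

    if (r == rb->write_index) {
        return false;               /* empty */
    }
    *value = rb->data[r];           /* read the payload first... */
    rb->read_index = spsc_next(r);  /* ...then release the slot */
    return true;
}
```

    Because the fill level is derived from the two indices, there is no shared counter (such as available or used) that both cores have to modify.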

  • Ah, I see. I had read that document too quickly before. I see now that "leaf" functions can't be more than one level deep. Given these pointer differences, I'm not sure I understand how to write a C struct that works on both the CPU and the CLA. All of the data types are 16 bits wide, and the struct is just:

    typedef struct {
        uint16_t *buffer;       // the buffer
        uint16_t size;          // buffer size
        uint16_t available;     // available
        uint16_t used;          // used
        uint16_t read_index;    // read index
        uint16_t write_index;   // write index
    } ring_buffer_t;
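
    Another option, sketched here under my own assumptions rather than taken from the firmware guide, is to keep pointers out of the shared struct entirely and embed the storage in it. If every member is a uint16_t (or an array of them), the CPU and CLA compilers have nothing to disagree about when laying out the fields. RB_LEN, shared_ring_t and shared_ring_layout_check are hypothetical names; the typedef is an old C89-compatible trick that makes compilation fail if data[] does not start immediately after the three header fields.

```c
#include <stddef.h>
#include <stdint.h>

#define RB_LEN 16u   /* hypothetical capacity */

typedef struct {
    uint16_t size;          /* number of slots in data[] */
    uint16_t read_index;    /* next slot to read */
    uint16_t write_index;   /* next slot to write */
    uint16_t data[RB_LEN];  /* storage embedded directly, so no pointer */
} shared_ring_t;

/* Compile-time layout check: the array size becomes -1 (a compile error)
 * if data[] does not follow the three 16-bit header fields directly.
 * offsetof and sizeof use the same unit on any target, so the check is
 * valid whether sizeof(uint16_t) is 1 (C28x words) or 2 (bytes). */
typedef char shared_ring_layout_check[
    (offsetof(shared_ring_t, data) == 3u * sizeof(uint16_t)) ? 1 : -1];
```

    Such a struct can be placed in cla_data with a DATA_SECTION pragma, like the other CLA variables in this project, and both cores then index data[] directly.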
  • I think I'll try this (although I can't verify it right now).

    typedef union {
        uint16_t *ptr;  // aligned to lower 16-bits
        uint32_t pad;   // 32-bits
    } cla_uint16_t_ptr;

    typedef struct {
        uint16_t size;              // buffer size
        uint16_t available;         // available
        uint16_t used;              // used
        uint16_t read_index;        // read index
        uint16_t write_index;       // write index
        cla_uint16_t_ptr buffer;    // the buffer
    } ring_buffer_t;
  • Adam Jones said:
    I had read that document too quickly before. I see now that the "leaf" functions can't be more than 1 level deep

    That restriction applies to older compilers (6.2.0 and earlier). With newer compilers you can have more than one level of nesting; the limit is determined by the amount of scratchpad space allocated.

  • Now, a week or so later, I was able to test this. Although it compiles just fine, I cannot access any of the struct members inside the CLA functions themselves.

    I have copied the relevant code below. I can get the used value without issue from the CPU, but I cannot read it from the CLA.

    in ring_buffer.h

    typedef union {
        uint16_t *ptr;  //aligned to lower 16-bits
        uint32_t pad;   //32-bits
    } aligned_uint16_t;
    
    typedef struct {
        uint16_t size;          // buffer size
        uint16_t available;     // available
        uint16_t used;          // used
        uint16_t read_index;    // read index
        uint16_t write_index;   // write index
        aligned_uint16_t buffer;    // the buffer
    } ring_buffer_t;

    in dsp_output.c

    #pragma DATA_SECTION(dsp_code_queue_low, "cla_data");
    ring_buffer_t *dsp_code_queue_low;
    

    in dsp_output_cla.h

    extern ring_buffer_t *dsp_code_queue_low;

    in dsp_output_cla.cla

    __interrupt void cla_task2_process_dsp_strobe_queues(void) {
    
        //increase the tic
        dsp_current_tic = dsp_code_queue_low->used;
        //real code just increases the tic until a rollover (debugger shows that normal increase works just fine)
    }
    
    

  • What is the address of dsp_code_queue_low? Could you also post the disassembly associated with this line of code:

    dsp_current_tic = dsp_code_queue_low->used;

    I see where you defined the pointer to ring_buffer_t (dsp_code_queue_low), but I don't see the ring_buffer_t object it is supposed to point to. The object it points to must also reside in the cla_data section.

  • I was just about to post. I think the first issue is that creating the ring buffer relied on calloc and malloc, and the CLA doesn't have a heap. So I rewrote a few things so that the compiler allocates the necessary memory statically.

    in dsp_output.c

    #pragma DATA_SECTION(dsp_code_queue_low_struct, "cla_data");
    ring_buffer_t dsp_code_queue_low_struct;
    #pragma DATA_SECTION(dsp_code_queue_low, "cla_data");
    ring_buffer_t *dsp_code_queue_low;
    #pragma DATA_SECTION(dsp_code_queue_low_buffer, "cla_data");
    uint16_t dsp_code_queue_low_buffer[code_ring_buffer_low_length];
    
    ...
    void init_dsp_output() {
        //manually init the ring_buffer
        dsp_code_queue_low = &dsp_code_queue_low_struct;
        dsp_code_queue_low->size = code_ring_buffer_low_length;
        dsp_code_queue_low->available = code_ring_buffer_low_length;
        dsp_code_queue_low->used = 0;
        dsp_code_queue_low->read_index = 0;
        dsp_code_queue_low->write_index = 0;
        dsp_code_queue_low->buffer.ptr = dsp_code_queue_low_buffer;
    }
    
    However, this also doesn't work. 

  • OK, which RAMLSx block(s) did you assign cla_data to in the linker command file? Have you configured those RAMLSx blocks as CLA data memory (through MemCfgRegs)?
  • I'm using the following:

        // Select RAMLS0 and RAMLS1 to be the programming space for the CLA
        // First configure LS0 and LS1 to be shared with CLA and then
        // set the spaces to be program blocks (blocks out CPU access)
        MemCfgRegs.LSxMSEL.bit.MSEL_LS0 = 1;        //CLA accessible
        MemCfgRegs.LSxCLAPGM.bit.CLAPGM_LS0 = 1;    //program block (no CPU access)
        MemCfgRegs.LSxMSEL.bit.MSEL_LS1 = 1;        //CLA accessible
        MemCfgRegs.LSxCLAPGM.bit.CLAPGM_LS1 = 1;    //program block (no CPU access)
    
        //Next configure RAMLS2 and RAMLS3 as data spaces for the CLA
        // First configure LS2 and LS3 to be shared with CLA and then
        // set the spaces to be data blocks (still CPU accessible)
        MemCfgRegs.LSxMSEL.bit.MSEL_LS2 = 1;        //CLA accessible
        MemCfgRegs.LSxCLAPGM.bit.CLAPGM_LS2 = 0;    //data block (CPU access)
        MemCfgRegs.LSxMSEL.bit.MSEL_LS3 = 1;        //CLA accessible
        MemCfgRegs.LSxCLAPGM.bit.CLAPGM_LS3 = 0;    //data block (CPU access)

    All of the other two dozen or so CLA variables in "cla_data" (LS2 and LS3), including primitives and arrays of primitives, work perfectly fine. This is the only struct and/or struct pointer that I've tried to use.

  • Hmm, I was hoping for the easy fix.

    In that case we need to look at the disassembly. Could you post the disassembly for these lines:

    dsp_code_queue_low = &dsp_code_queue_low_struct;
    dsp_code_queue_low->size = code_ring_buffer_low_length;
    dsp_code_queue_low->available = code_ring_buffer_low_length;
    dsp_code_queue_low->used = 0;
    dsp_code_queue_low->read_index = 0;
    dsp_code_queue_low->write_index = 0;
    dsp_code_queue_low->buffer.ptr = dsp_code_queue_low_buffer;

    as well as the addresses of dsp_code_queue_low and dsp_code_queue_low_struct?
  • Vishal,

    Thank you for the quick responses. Last night I got the code mostly working with a few changes:

    //defines (in c)
    #pragma DATA_SECTION(dsp_code_queue_low_struct, "cla_data");
    ring_buffer_t dsp_code_queue_low_struct;
    #pragma DATA_SECTION(dsp_code_queue_low_buffer, "cla_data");
    uint16_t dsp_code_queue_low_buffer[code_ring_buffer_low_length+1];  //needs 1 extra to fit ring_buffer
    
    //init (in c)
    dsp_code_queue_low_struct.size = code_ring_buffer_low_length;
    dsp_code_queue_low_struct.available = code_ring_buffer_low_length;
    dsp_code_queue_low_struct.used = 0;
    dsp_code_queue_low_struct.read_index = 0;
    dsp_code_queue_low_struct.write_index = 0;
    dsp_code_queue_low_struct.buffer.ptr = dsp_code_queue_low_buffer;
    
    //cla usage (random snippets)
    dsp_code_queue_low_struct.used > 0
    cla_ring_buffer_read(&dsp_code_queue_low_struct, &dsp_current_code); //a cla copy of the same ring buffer function used in c
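
    The bodies of ring_buffer_write and cla_ring_buffer_read aren't shown, so this is an assumption: if they treat read_index == write_index as "empty", a completely full ring must leave one slot unused, which would explain the +1 on the buffer length. A sketch of that index arithmetic, where ring_used and ring_free are hypothetical helpers and indices run from 0 to slots - 1:

```c
#include <stdint.h>

/* Number of stored elements for indices in [0, slots). */
static uint16_t ring_used(uint16_t read_index, uint16_t write_index, uint16_t slots)
{
    return (uint16_t)((write_index >= read_index)
                      ? (write_index - read_index)          /* no wrap */
                      : (slots - read_index + write_index)); /* wrapped */
}

/* Free space: one slot is always held back so that "full" != "empty". */
static uint16_t ring_free(uint16_t read_index, uint16_t write_index, uint16_t slots)
{
    return (uint16_t)(slots - 1u - ring_used(read_index, write_index, slots));
}
```

    With slots = code_ring_buffer_low_length + 1, ring_free() reports code_ring_buffer_low_length for an empty ring and 0 once that many elements have been written.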
    

    What I find very interesting is that, although the queues are working (there are two, although I only show one here), the code always locks up when I try to add new elements to a full queue. If instead I never add more than the queue length at one time, the queue empties quickly (the CLA task is timed to process 2 elements per millisecond).

    void send_dsp_code(uint16_t code, bool is_high_priority) {
        //assumes that it can wait for room
    
        if (is_high_priority) {
            while (dsp_code_queue_high_struct.available == 0);
    
            ring_buffer_write(&dsp_code_queue_high_struct, &code);
        } else {
            while (dsp_code_queue_low_struct.available == 0);
    
            ring_buffer_write(&dsp_code_queue_low_struct, &code);
        }
    }

    However, it no longer locks up if I make two changes to the code: 1) add a post-task ISR that inserts some wait time if either queue is almost full, and 2) change send_dsp_code above to do the same.

    __interrupt void cla_task2_process_dsp_strobe_queues_isr() {
        static uint16_t last_low_available;
        static uint16_t last_high_available;
    
        //check if either queue was consumed
        if (dsp_code_queue_high_struct.available !=last_high_available || dsp_code_queue_low_struct.available != last_low_available) {
    
            //if less than 2 spaces available, add a delay to prevent queues from locking up
            if (dsp_code_queue_high_struct.available < 2 || dsp_code_queue_low_struct.available < 2) {
                DELAY_US(1000);   //tiny delay to allow buffer to be consumed by CLA
            }
    
            last_low_available = dsp_code_queue_low_struct.available;
            last_high_available = dsp_code_queue_high_struct.available;
        }
    
        //acknowledge CLA end of task interrupt
        PieCtrlRegs.PIEACK.bit.ACK11 = 1;   //all post-CLA isr are in group 11
    }
    
    void send_dsp_code(uint16_t code, bool is_high_priority) {
        //assumes that it can wait for room
    
        if (is_high_priority) {
            if (dsp_code_queue_high_struct.available < 2) {DELAY_US(1000);}
    
            ring_buffer_write(&dsp_code_queue_high_struct, &code);
        } else {
            if (dsp_code_queue_low_struct.available < 2) {DELAY_US(1000);}
    
            ring_buffer_write(&dsp_code_queue_low_struct, &code);
        }
    }
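
    One hypothesis worth ruling out for the original lockup (the ring buffer functions aren't shown, so this assumes ring_buffer_write decrements available and the CLA read increments it): when two cores update the same counter with separate load and store steps, one update can be lost if those steps interleave. The snippet replays one such interleaving on a single thread; lost_update_demo is a made-up name. If available under-counts this way, the busy-wait on available == 0 in the first version of send_dsp_code can spin forever even though the CLA has drained the queue. Deriving the fill level from read_index and write_index, each written by only one core, avoids the shared counter.

```c
#include <stdint.h>

/* Replays one unlucky interleaving of two read-modify-write updates to a
 * counter that both the CPU (producer) and the CLA (consumer) modify. */
static uint16_t lost_update_demo(uint16_t available)
{
    uint16_t cpu_copy = available;          /* CPU loads it to claim a slot   */
    uint16_t cla_copy = available;          /* CLA loads it to release a slot */

    available = (uint16_t)(cla_copy + 1u);  /* CLA stores its result first    */
    available = (uint16_t)(cpu_copy - 1u);  /* CPU stores last, wiping it out */

    /* One slot claimed and one released should leave the value unchanged. */
    return available;
}
```

    With available = 3 the correct result is 3, but the demo returns 2: the CLA's release has been lost.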

    I'd also appreciate any advice on getting the disassembly. Although I've turned on generation of .lst files, they are only created for the .c files, not the .cla files, and I can't really work out how the C code maps to the assembly.

  • Adam,

    DELAY_US calls a function, F28x_usDelay, which is a C28x assembly function. It won't work on the CLA.

    Adam Jones said:
    I'll also have to ask for any advice you have on getting the disassembly, as although I've turned on creation of the .lst files, it only creates them for .c files, not .cla, and I can't really make sense of where the correlations between the c code and the assembly code are.

    While debugging, you can open the Disassembly window from the CCS menu bar (View > Disassembly) and copy the code from there. Alternatively, in the project properties (C2000 Compiler > Advanced Options > Assembler Options) you can turn on --keep_asm (to keep the generated assembly file) and --c_src_interlist (to interleave the C source with its corresponding assembly in the .asm file). This makes the assembly much easier to analyze.

  • DELAY_US is only called from the CPU. As I said, both functions run on the CPU; one of them is the post-task ISR. I'll give your directions for getting the assembly a shot.