MSP430FR5994: Matrix multiplication on the MSP430 with and without LEA

Siddhant Kishor Yeole

Part Number: MSP430FR5994

I am trying to implement matrix multiplication for multiple matrices on the MSP430FR5994. After referring to a couple of older questions on the forum, I used the answers there to write my implementation. The idea is to replicate a neural network layer, so the calculation is a matrix multiplication of an input matrix with a matrix containing the network weights, followed by the addition of a matrix containing the bias values. While implementing these operations, I realized that the values need to be quantized, and I did so before loading the inputs, weights, or biases into the matrices. The problem I currently face is that the results of the matrix calculations are shifted right by 15 bits before being stored in the result matrix. I understand that this behavior is in line with how '_q15' parameters are treated, and I have looked at the code where this shift is done. One possible way to remove the shift is described in the following question - https://e2e.ti.com/support/microcontrollers/msp-low-power-microcontrollers-group/msp430/f/msp-low-power-microcontroller-forum/716353/msp430fr5992-msp-dsplib-msp_matrix_mpy_q15 - however, that answer does not cover the case where the LEA is used. I tried changing the multiplication function so that it uses int16_t/uint16_t values instead of the _q15 parameters. The modified matrix multiplication function, incorporating the changes from the above question, looks as follows:

/* --COPYRIGHT--,BSD
 * Copyright (c) 2016, Texas Instruments Incorporated
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * *  Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * *  Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * *  Neither the name of Texas Instruments Incorporated nor the names of
 *    its contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
 * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
 * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
 * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 * --/COPYRIGHT--*/

#include "../../include/DSPLib.h"

#if defined(MSP_USE_LEA)

msp_status msp_matrix_mpy_q15(const msp_matrix_mpy_q15_params *params, const uint16_t *srcA, const uint16_t *srcB, uint16_t *dst)
{
    uint16_t srcARows;
    uint16_t srcACols;
    uint16_t srcBRows;
    uint16_t srcBCols;
    msp_status status;
    MSP_LEA_MPYMATRIXROW_PARAMS *leaParams;

    /* Initialize the row and column sizes. */
    srcARows = params->srcARows;
    srcACols = params->srcACols;
    srcBRows = params->srcBRows;
    srcBCols = params->srcBCols;

#ifndef MSP_DISABLE_DIAGNOSTICS
    /* Check that column of A equals rows of B */
    if (srcACols != srcBRows) {
        return MSP_SIZE_ERROR;
    }

    /* Check that the data arrays are aligned and in a valid memory segment. */
    if (!(MSP_LEA_VALID_ADDRESS(srcA, 4) &
          MSP_LEA_VALID_ADDRESS(srcB, 4) &
          MSP_LEA_VALID_ADDRESS(dst, 4))) {
        return MSP_LEA_INVALID_ADDRESS;
    }

    /* Acquire lock for LEA module. */
    if (!msp_lea_acquireLock()) {
        return MSP_LEA_BUSY;
    }
#endif //MSP_DISABLE_DIAGNOSTICS

    /* Initialize LEA if it is not enabled. */
    if (!(LEAPMCTL & LEACMDEN)) {
        msp_lea_init();
    }

    /* Allocate MSP_LEA_MPYMATRIXROW_PARAMS structure. */
    leaParams = (MSP_LEA_MPYMATRIXROW_PARAMS *)msp_lea_allocMemory(sizeof(MSP_LEA_MPYMATRIXROW_PARAMS)/sizeof(uint32_t));

    /* Set status flag. */
    status = MSP_SUCCESS;

    /* Iterate through each row of srcA */
    while (srcARows--) {
        /* Set MSP_LEA_MPYMATRIXROW_PARAMS structure. */
        leaParams->rowSize = srcBRows;
        leaParams->colSize = srcBCols;
        leaParams->colVector = MSP_LEA_CONVERT_ADDRESS(srcB);
        leaParams->output = MSP_LEA_CONVERT_ADDRESS(dst);

        /* Load source arguments to LEA. */
        LEAPMS0 = MSP_LEA_CONVERT_ADDRESS(srcA);
        LEAPMS1 = MSP_LEA_CONVERT_ADDRESS(leaParams);

        /* Invoke the LEACMD__MPYMATRIXROW command with interrupts enabled. */
        LEAPMCB = LEACMD__MPYMATRIXROW | LEAITFLG1;

        /* Clear DSPLib flags, restore interrupts and enter LPM0. */
        msp_lea_ifg = 0;
        msp_lea_enterLPM();

#ifndef MSP_DISABLE_DIAGNOSTICS
        /* Check LEA interrupt flags for any errors. */
        if (msp_lea_ifg & LEACOVLIFG) {
            status = MSP_LEA_COMMAND_OVERFLOW;
            break;
        }
        else if (msp_lea_ifg & LEAOORIFG) {
            status = MSP_LEA_OUT_OF_RANGE;
            break;
        }
        else if (msp_lea_ifg & LEASDIIFG) {
            status = MSP_LEA_SCALAR_INCONSISTENCY;
            break;
        }
#endif //MSP_DISABLE_DIAGNOSTICS

        /* Increment srcA and dst pointers. */
        srcA += srcACols;
        dst += srcBCols;
    }

    /* Free MSP_LEA_MPYMATRIXROW_PARAMS structure. */
    msp_lea_freeMemory(sizeof(MSP_LEA_MPYMATRIXROW_PARAMS)/sizeof(uint32_t));

    /* Free lock for LEA module and return status. */
    msp_lea_freeLock();
    return status;
}

#else //MSP_USE_LEA

msp_status msp_matrix_mpy_q15(const msp_matrix_mpy_q15_params *params, const uint16_t *srcA, const uint16_t *srcB, uint16_t *dst)
{
    uint16_t cntr;
    uint16_t srcARows;
    uint16_t srcACols;
    uint16_t srcBRows;
    uint16_t srcBCols;
    uint16_t dst_row;
    uint16_t dst_col;
    uint16_t row_offset;
    uint16_t col_offset;
    uint16_t dst_row_offset;

    /* Initialize the row and column sizes. */
    srcARows = params->srcARows;
    srcACols = params->srcACols;
    srcBRows = params->srcBRows;
    srcBCols = params->srcBCols;

#ifndef MSP_DISABLE_DIAGNOSTICS
    /* Check that column of A equals rows of B */
    if (srcACols != srcBRows) {
        return MSP_SIZE_ERROR;
    }
#endif //MSP_DISABLE_DIAGNOSTICS

    /* In initialize loop counters. */
    cntr = 0;
    dst_row = 0;
    dst_col = 0;
    row_offset = 0;
    col_offset = 0;
    dst_row_offset = 0;

#if defined(__MSP430_HAS_MPY32__)
    /* If MPY32 is available save control context, set to fractional mode, set saturation mode. */
    uint16_t ui16MPYState = MPY32CTL0;
    MPY32CTL0 = MPYFRAC | MPYDLYWRTEN | MPYSAT;

    /* Loop through all srcA rows. */
    while(srcARows--) {
        /* Loop through all srcB columns. */
        while (dst_col < srcBCols) {
            /* Reset result accumulator. */
            MPY32CTL0 &= ~MPYC;
            RESLO = 0; RESHI = 0;
            
            /* Loop through all elements in srcA column and srcB row. */
            while(cntr < srcACols) {
                MACS = srcA[row_offset + cntr];
                OP2 = srcB[col_offset + dst_col];
                col_offset += srcBCols;
                cntr++;
            }
            
            /* Store the result */
            dst[dst_row_offset + dst_col] = RESHI * 32768 + RESLO;

            /* Update pointers. */
            dst_col++;
            cntr = 0;
            col_offset = 0;
        }

        /* Update pointers. */
        dst_row++;
        dst_col = 0;
        row_offset += srcACols;
        dst_row_offset += srcBCols;
    }

    /* Restore MPY32 control context, previous saturation state. */
    MPY32CTL0 = ui16MPYState;

#else //__MSP430_HAS_MPY32__
    uint32_t result;

    /* Loop through all srcA rows. */
    while(srcARows--) {
        /* Loop through all srcB columns. */
        while (dst_col < srcBCols) {
            /* Initialize accumulator. */
            result = 0;
            
            /* Loop through all elements in srcA column and srcB row. */
            while(cntr < srcACols) {
                result += (int32_t)srcA[row_offset + cntr] * (int32_t)srcB[col_offset + dst_col];
                col_offset += srcBCols;
                cntr++;
            }

            /* Saturate and store the result */
            dst[dst_row_offset + dst_col] = (int32_t)__saturate(result, INT32_MIN, INT32_MAX);

            /* Update pointers. */
            dst_col++;
            cntr = 0;
            col_offset = 0;
        }

        /* Update pointers. */
        dst_row++;
        dst_col = 0;
        row_offset += srcACols;
        dst_row_offset += srcBCols;
    }
#endif //__MSP430_HAS_MPY32__

    return MSP_SUCCESS;
}

#endif //MSP_USE_LEA

Despite changing the input type of the matrices to 'uint16_t' and changing how the result is stored to eliminate the shift by 15, the code still does not calculate the integer matrix values correctly. My complete code for the matrix multiplication is as follows:

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <msp430.h>
#include "DSPLib.h"
#include "math.h"

#pragma DATA_SECTION(lea1, ".leaRAM")
#pragma DATA_SECTION(lea2, ".leaRAM")
#pragma DATA_SECTION(leadest, ".leaRAM")

DSPLIB_DATA(lea1, 4)
uint16_t lea1[2][2] = {{7, 2}, {1, 2}};
DSPLIB_DATA(lea2, 4)
uint16_t lea2[2][2] = {{4, 5}, {2,3}};
DSPLIB_DATA(leadest, 4)
uint16_t leadest[2][2];


volatile uint32_t cycleCount = 0;
int main()
{
        msp_status status;
        msp_matrix_mpy_q15_params mpyParams;

        WDTCTL = WDTPW + WDTHOLD;

        mpyParams.srcARows = 2;
        mpyParams.srcACols = 2;
        mpyParams.srcBRows = 2;
        mpyParams.srcBCols = 2;

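        /* Start the benchmark timer so that msp_benchmarkStop returns a valid count. */
        msp_benchmarkStart(MSP_BENCHMARK_BASE, 16);
        status = msp_matrix_mpy_q15(&mpyParams, *lea1, *lea2, *leadest);
        cycleCount = msp_benchmarkStop(MSP_BENCHMARK_BASE);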
        msp_checkStatus(status);
        return 0;

}

I am not sure how to deal with the right shifts - either by removing them or by changing the function so that the result of the matrix multiplication is the integer value obtained by ordinary arithmetic. If anyone could suggest possible solutions that I can try out while observing the behavior of the MSP430, it would be extremely helpful. Do let me know if any other information is needed to provide more clarity. Thanks.
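For reference, the integer result being asked for here can be written as a plain C routine that runs on a host PC. This is only a sketch and not DSPLib code: the function name `matmul_i16_i32` and the choice of a 32-bit destination are assumptions made for illustration, and it uses neither the LEA nor the MPY32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Row-major integer matrix multiply with no fixed-point shift:
 * dst (rows_a x cols_b) = a (rows_a x inner) * b (inner x cols_b).
 * Products are accumulated and stored as 32-bit values so that
 * no bits of the result are discarded. */
static void matmul_i16_i32(const int16_t *a, const int16_t *b, int32_t *dst,
                           size_t rows_a, size_t inner, size_t cols_b)
{
    assert(a != NULL && b != NULL && dst != NULL);
    for (size_t r = 0; r < rows_a; r++) {
        for (size_t c = 0; c < cols_b; c++) {
            int32_t acc = 0;
            for (size_t k = 0; k < inner; k++) {
                acc += (int32_t)a[r * inner + k] * (int32_t)b[k * cols_b + c];
            }
            dst[r * cols_b + c] = acc;
        }
    }
}
```

With the matrices from the code above, {{7, 2}, {1, 2}} times {{4, 5}, {2, 3}}, this gives {{32, 41}, {8, 11}}.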

over 2 years ago

0 David Schultz over 2 years ago

Guru 21925 points

There is nothing mysterious about fixed point multiplication. The result has as many fractional bits as the two inputs combined. So if you multiply two numbers with 15 fractional bits each, the result has 30, and you have to shift it right by 15 bits to get back to 15 fractional bits.
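A minimal host-side sketch of that rule, assuming the usual Q15 convention (a 16-bit signed value with 15 fractional bits). The helper names `q15_from_double` and `q15_mul` are made up for this example and are not DSPLib functions.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a real value in [-1, 1) to Q15 (15 fractional bits). */
static int16_t q15_from_double(double x)
{
    assert(x >= -1.0 && x < 1.0);
    return (int16_t)(x * 32768.0);
}

/* Multiply two Q15 values. The 32-bit product has 30 fractional bits,
 * so shifting right by 15 brings it back to 15 fractional bits.
 * This sketch does not saturate, and it relies on the compiler using an
 * arithmetic right shift for negative values (true of common compilers). */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t q30 = (int32_t)a * (int32_t)b;
    return (int16_t)(q30 >> 15);
}
```

Here 0.5 times 0.5 (16384 times 16384) comes out as 8192, which is 0.25 in Q15. Small plain integers such as 7 and 4 produce 28 before the shift and 0 after it, which is why integer inputs appear to vanish in a Q15 routine.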

After performing that shift you are going to want to check for overflow before converting from a 32 bit integer to 16. And think about what to do in that case, since it will almost certainly happen, probably somewhere in the middle of that multiply-accumulate where you will not notice it.
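One common way to handle the overflow case is to accumulate in a wider type and clamp once at the end. The sketch below is a software illustration only, with hypothetical names (`sat16`, `q15_dot_sat`); it is not how the MPY32 or the LEA handles saturation internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide value to the int16_t range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Q15 dot product. The running sum is kept in 64 bits so it cannot wrap
 * part-way through the loop; it is shifted back to Q15 and clamped only
 * once, when converting to 16 bits. Assumes an arithmetic right shift
 * for negative sums. */
static int16_t q15_dot_sat(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return sat16(acc >> 15);
}
```

Clamping is only one policy; scaling the inputs down beforehand so that the sum cannot exceed the range is another.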

Take for example this section of the code:

            /* Loop through all elements in srcA column and srcB row. */
            while(cntr < srcACols) {
                MACS = srcA[row_offset + cntr];
                OP2 = srcB[col_offset + dst_col];
                col_offset += srcBCols;
                cntr++;
            }
            
            /* Store the result */
            dst[dst_row_offset + dst_col] = RESHI * 32768 + RESLO;

This tries (badly) to stuff a 32 bit value into a 16 bit hole, without even moving the binary point, which means that almost all of the bits you are interested in are thrown away.
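To make the point concrete, here is a host-side sketch in which `hi` and `lo` stand in for the values read from RESHI and RESLO. The function names are hypothetical, and the sketch does not model the MPY32 fractional mode (MPYFRAC); it only shows how two 16-bit halves combine and what survives a 16-bit store.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the signed 32-bit value from its two 16-bit halves.
 * The conversion of large unsigned values to int32_t assumes the usual
 * two's-complement behavior of common compilers. */
static int32_t combine_words(uint16_t hi, uint16_t lo)
{
    return (int32_t)(((uint32_t)hi << 16) | lo);
}

/* What the quoted store keeps: hi * 32768 + lo, truncated to 16 bits
 * by the assignment to a 16-bit destination. */
static uint16_t quoted_store(uint16_t hi, uint16_t lo)
{
    return (uint16_t)((uint32_t)hi * 32768u + lo);
}

/* Narrow to 16 bits only after checking that the value actually fits. */
static int16_t narrow_checked(int32_t v)
{
    assert(v >= INT16_MIN && v <= INT16_MAX);
    return (int16_t)v;
}
```

For example, with hi = 2 and lo = 0 the full value is 131072, while the quoted expression stores 0. Keeping the result in a 32-bit destination, or checking the range before narrowing, avoids silently losing the upper bits.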

+1 Siddhant Kishor Yeole over 2 years ago in reply to David Schultz

Intellectual 285 points

Thanks for the clarification about fixed point multiplication. However, I am using integer values throughout my matrix multiplication, and I am not sure how to undo the shift in that case by modifying the 'msp_matrix_mpy_q15.c' file, which defines the function that performs the matrix multiplication. I am hesitant to shift the result back afterwards, because many different values give the same result once shifted right by 15 bits, so reversing the shift on the integer results might not recover the original values. That is why I wanted to modify the multiplication function itself, so that the shift never happens in the first place. However, the link I referred to only covers the case where the LEA is not used, and since my matrices are large, I would like to use the LEA for the calculation. In that case, how could the multiplication code be changed, given that I am only dealing with integer values and not values with fractional bits?

+1 David Schultz over 2 years ago in reply to Siddhant Kishor Yeole

Guru 21925 points

I have never used the LEA, but according to the command reference (slau850) it supports only fixed point types. The closest I can see to what you want is LEACMD__MAC, which accepts Q15 as input with Q31 as output.

The natural result of multiplying two Q15s together is a Q30 result. So you would have to shift that Q31 result right one bit.
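The arithmetic behind that suggestion can be tried on a host PC first. The sketch below is a software stand-in, not an LEA call: it assumes, as described above, that a Q15 multiply-accumulate with Q31 output returns the sum of products with one extra fractional bit, and the names `mac_q15_to_q31` and `integer_dot_from_q31` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-in for a Q15 x Q15 multiply-accumulate with a Q31 result.
 * Each product has 30 fractional bits; doubling it gives 31 fractional
 * bits. The sum is clamped to the 32-bit range. */
static int32_t mac_q15_to_q31(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        acc += 2 * ((int64_t)a[i] * (int64_t)b[i]);
    }
    if (acc > INT32_MAX) acc = INT32_MAX;
    if (acc < INT32_MIN) acc = INT32_MIN;
    return (int32_t)acc;
}

/* If the inputs were plain integers rather than fractions, halving the
 * Q31 result (a one-bit right shift) recovers the integer dot product.
 * The halving is exact for any unsaturated result, which is always even. */
static int32_t integer_dot_from_q31(int32_t q31)
{
    return q31 / 2;
}
```

With the first row {7, 2} and first column {4, 2} from the question, the Q31 value is 64 and halving it gives 32, the expected integer result. Whether the actual LEA command behaves this way for a given data layout would need to be checked against the command reference.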

0 Siddhant Kishor Yeole over 2 years ago in reply to David Schultz

Intellectual 285 points

Alright, I will have a look at the command reference and also check what I get by using this command. If I have any further questions I will create a new question for the same. I will mark this question as resolved. Thank you for your response.
