I am trying to implement matrix multiplication for multiple matrices on the MSP430FR5994. After referring to a couple of older questions on the forum, I used the answers given there to write my implementation. The idea is to replicate a neural network layer, so the calculation is a matrix multiplication of an input matrix with a matrix containing the weights of the network, followed by the addition of another matrix containing the bias values. While implementing these operations, I realized that the values need to be quantized, and I did so before loading the inputs, weights, and biases into the matrices.

The problem I currently encounter is that the result of the matrix calculation is shifted right by 15 bits before being stored in the result matrix. I understand that this behavior is in line with how '_q15' parameters are treated, and I have also looked at the code where this shifting is done. One possible solution to remove the shift is given in the following question - https://e2e.ti.com/support/microcontrollers/msp-low-power-microcontrollers-group/msp430/f/msp-low-power-microcontroller-forum/716353/msp430fr5992-msp-dsplib-msp_matrix_mpy_q15 - however, it does not cover the case where the MSP LEA is used. I tried changing the multiplication function so that it uses int16_t/uint16_t values instead of the _q15 parameters. The modified matrix multiplication function, incorporating the changes mentioned in the above question, looks as follows:
/* --COPYRIGHT--,BSD
 * Copyright (c) 2016, Texas Instruments Incorporated
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * *  Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * *  Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * *  Neither the name of Texas Instruments Incorporated nor the names of
 *    its contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
 * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
 * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
 * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 * --/COPYRIGHT--*/

#include "../../include/DSPLib.h"

#if defined(MSP_USE_LEA)

msp_status msp_matrix_mpy_q15(const msp_matrix_mpy_q15_params *params, const uint16_t *srcA, const uint16_t *srcB, uint16_t *dst)
{
    uint16_t srcARows;
    uint16_t srcACols;
    uint16_t srcBRows;
    uint16_t srcBCols;
    msp_status status;
    MSP_LEA_MPYMATRIXROW_PARAMS *leaParams;

    /* Initialize the row and column sizes. */
    srcARows = params->srcARows;
    srcACols = params->srcACols;
    srcBRows = params->srcBRows;
    srcBCols = params->srcBCols;

#ifndef MSP_DISABLE_DIAGNOSTICS
    /* Check that column of A equals rows of B */
    if (srcACols != srcBRows) {
        return MSP_SIZE_ERROR;
    }

    /* Check that the data arrays are aligned and in a valid memory segment. */
    if (!(MSP_LEA_VALID_ADDRESS(srcA, 4) &
          MSP_LEA_VALID_ADDRESS(srcB, 4) &
          MSP_LEA_VALID_ADDRESS(dst, 4))) {
        return MSP_LEA_INVALID_ADDRESS;
    }

    /* Acquire lock for LEA module. */
    if (!msp_lea_acquireLock()) {
        return MSP_LEA_BUSY;
    }
#endif //MSP_DISABLE_DIAGNOSTICS

    /* Initialize LEA if it is not enabled. */
    if (!(LEAPMCTL & LEACMDEN)) {
        msp_lea_init();
    }

    /* Allocate MSP_LEA_MPYMATRIXROW_PARAMS structure. */
    leaParams = (MSP_LEA_MPYMATRIXROW_PARAMS *)msp_lea_allocMemory(sizeof(MSP_LEA_MPYMATRIXROW_PARAMS)/sizeof(uint32_t));

    /* Set status flag. */
    status = MSP_SUCCESS;

    /* Iterate through each row of srcA */
    while (srcARows--) {
        /* Set MSP_LEA_MPYMATRIXROW_PARAMS structure. */
        leaParams->rowSize = srcBRows;
        leaParams->colSize = srcBCols;
        leaParams->colVector = MSP_LEA_CONVERT_ADDRESS(srcB);
        leaParams->output = MSP_LEA_CONVERT_ADDRESS(dst);

        /* Load source arguments to LEA. */
        LEAPMS0 = MSP_LEA_CONVERT_ADDRESS(srcA);
        LEAPMS1 = MSP_LEA_CONVERT_ADDRESS(leaParams);

        /* Invoke the LEACMD__MPYMATRIXROW command with interrupts enabled. */
        LEAPMCB = LEACMD__MPYMATRIXROW | LEAITFLG1;

        /* Clear DSPLib flags, restore interrupts and enter LPM0. */
        msp_lea_ifg = 0;
        msp_lea_enterLPM();

#ifndef MSP_DISABLE_DIAGNOSTICS
        /* Check LEA interrupt flags for any errors. */
        if (msp_lea_ifg & LEACOVLIFG) {
            status = MSP_LEA_COMMAND_OVERFLOW;
            break;
        }
        else if (msp_lea_ifg & LEAOORIFG) {
            status = MSP_LEA_OUT_OF_RANGE;
            break;
        }
        else if (msp_lea_ifg & LEASDIIFG) {
            status = MSP_LEA_SCALAR_INCONSISTENCY;
            break;
        }
#endif //MSP_DISABLE_DIAGNOSTICS

        /* Increment srcA and dst pointers. */
        srcA += srcACols;
        dst += srcBCols;
    }

    /* Free MSP_LEA_MPYMATRIXROW_PARAMS structure. */
    msp_lea_freeMemory(sizeof(MSP_LEA_MPYMATRIXROW_PARAMS)/sizeof(uint32_t));

    /* Free lock for LEA module and return status. */
    msp_lea_freeLock();
    return status;
}

#else //MSP_USE_LEA

msp_status msp_matrix_mpy_q15(const msp_matrix_mpy_q15_params *params, const uint16_t *srcA, const uint16_t *srcB, uint16_t *dst)
{
    uint16_t cntr;
    uint16_t srcARows;
    uint16_t srcACols;
    uint16_t srcBRows;
    uint16_t srcBCols;
    uint16_t dst_row;
    uint16_t dst_col;
    uint16_t row_offset;
    uint16_t col_offset;
    uint16_t dst_row_offset;

    /* Initialize the row and column sizes. */
    srcARows = params->srcARows;
    srcACols = params->srcACols;
    srcBRows = params->srcBRows;
    srcBCols = params->srcBCols;

#ifndef MSP_DISABLE_DIAGNOSTICS
    /* Check that column of A equals rows of B */
    if (srcACols != srcBRows) {
        return MSP_SIZE_ERROR;
    }
#endif //MSP_DISABLE_DIAGNOSTICS

    /* In initialize loop counters. */
    cntr = 0;
    dst_row = 0;
    dst_col = 0;
    row_offset = 0;
    col_offset = 0;
    dst_row_offset = 0;

#if defined(__MSP430_HAS_MPY32__)
    /* If MPY32 is available save control context, set to fractional mode, set saturation mode. */
    uint16_t ui16MPYState = MPY32CTL0;
    MPY32CTL0 = MPYFRAC | MPYDLYWRTEN | MPYSAT;

    /* Loop through all srcA rows. */
    while(srcARows--) {
        /* Loop through all srcB columns. */
        while (dst_col < srcBCols) {
            /* Reset result accumulator. */
            MPY32CTL0 &= ~MPYC;
            RESLO = 0;
            RESHI = 0;

            /* Loop through all elements in srcA column and srcB row. */
            while(cntr < srcACols) {
                MACS = srcA[row_offset + cntr];
                OP2 = srcB[col_offset + dst_col];
                col_offset += srcBCols;
                cntr++;
            }

            /* Store the result */
            dst[dst_row_offset + dst_col] = RESHI * 32768 + RESLO;

            /* Update pointers. */
            dst_col++;
            cntr = 0;
            col_offset = 0;
        }

        /* Update pointers. */
        dst_row++;
        dst_col = 0;
        row_offset += srcACols;
        dst_row_offset += srcBCols;
    }

    /* Restore MPY32 control context, previous saturation state. */
    MPY32CTL0 = ui16MPYState;

#else //__MSP430_HAS_MPY32__
    uint32_t result;

    /* Loop through all srcA rows. */
    while(srcARows--) {
        /* Loop through all srcB columns. */
        while (dst_col < srcBCols) {
            /* Initialize accumulator. */
            result = 0;

            /* Loop through all elements in srcA column and srcB row. */
            while(cntr < srcACols) {
                result += (int32_t)srcA[row_offset + cntr] * (int32_t)srcB[col_offset + dst_col];
                col_offset += srcBCols;
                cntr++;
            }

            /* Saturate and store the result */
            dst[dst_row_offset + dst_col] = (int32_t)__saturate(result, INT32_MIN, INT32_MAX);

            /* Update pointers. */
            dst_col++;
            cntr = 0;
            col_offset = 0;
        }

        /* Update pointers. */
        dst_row++;
        dst_col = 0;
        row_offset += srcACols;
        dst_row_offset += srcBCols;
    }
#endif //__MSP430_HAS_MPY32__

    return MSP_SUCCESS;
}

#endif //MSP_USE_LEA
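To make the shift I am describing concrete, here is a minimal sketch in plain C of the conventional Q15 multiply next to a plain integer multiply. It is not DSPLib or LEA code: `q15_mul` and `int_mul` are hypothetical helper names used only for illustration, saturation is left out, and the sketch assumes an arithmetic right shift for negative products.

```c
#include <stdint.h>

/* Hypothetical helper (not part of DSPLib): conventional Q15 multiply.
 * The 32-bit product of two Q15 values is shifted right by 15 to bring
 * it back to Q15. Saturation is omitted to keep the sketch short. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 15);
}

/* Hypothetical helper: plain integer multiply, no shift, widened to
 * 32 bits so the full product is kept. */
static int32_t int_mul(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}
```

With small integer inputs such as 7 and 4, `q15_mul` returns 0 because the product 28 is lost entirely in the shift, while `int_mul` returns 28. With Q15 inputs such as 16384 (0.5), `q15_mul` returns 8192 (0.25), which is the intended fractional behavior.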
Despite changing the input type of the matrices to 'uint16_t' and modifying the way the result is stored by eliminating the shift by 15, the code still does not calculate the matrix values correctly in integer format. My complete code for the matrix multiplication is as follows:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <msp430.h>
#include "DSPLib.h"
#include "math.h"

#pragma DATA_SECTION(lea1, ".leaRAM")
#pragma DATA_SECTION(lea2, ".leaRAM")
#pragma DATA_SECTION(leadest, ".leaRAM")

DSPLIB_DATA(lea1, 4)
uint16_t lea1[2][2] = {{7, 2}, {1, 2}};

DSPLIB_DATA(lea2, 4)
uint16_t lea2[2][2] = {{4, 5}, {2,3}};

DSPLIB_DATA(leadest, 4)
uint16_t leadest[2][2];

volatile uint32_t cycleCount = 0;

int main()
{
    msp_status status;
    msp_matrix_mpy_q15_params mpyParams;

    WDTCTL = WDTPW + WDTHOLD;

    mpyParams.srcARows = 2;
    mpyParams.srcACols = 2;
    mpyParams.srcBRows = 2;
    mpyParams.srcBCols = 2;

    status = msp_matrix_mpy_q15(&mpyParams, *lea1, *lea2, *leadest);
    cycleCount = msp_benchmarkStop(MSP_BENCHMARK_BASE);
    msp_checkStatus(status);

    return 0;
}
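For comparison, the result I expect from "standard mathematical calculations" can be written as a plain C reference routine. This is only a sketch for checking values on the CPU or on a host PC; it uses neither the LEA nor the MPY32, and `ref_matrix_mpy_i16` is a hypothetical name that is not part of DSPLib. It assumes row-major storage and accumulates in 32 bits without any Q15 shift.

```c
#include <stdint.h>

/* Hypothetical reference helper (not part of DSPLib): row-major integer
 * matrix multiply, dst = a * b, with 32-bit accumulation and no shift.
 * a is aRows x aCols, b is aCols x bCols, dst is aRows x bCols. */
static void ref_matrix_mpy_i16(const int16_t *a, const int16_t *b, int32_t *dst,
                               uint16_t aRows, uint16_t aCols, uint16_t bCols)
{
    uint16_t r, c, k;

    for (r = 0; r < aRows; r++) {
        for (c = 0; c < bCols; c++) {
            int32_t acc = 0;

            /* Dot product of row r of a with column c of b. */
            for (k = 0; k < aCols; k++) {
                acc += (int32_t)a[r * aCols + k] * (int32_t)b[k * bCols + c];
            }
            dst[r * bCols + c] = acc;
        }
    }
}
```

For the 2x2 inputs above, {{7, 2}, {1, 2}} and {{4, 5}, {2, 3}}, this routine gives {{32, 41}, {8, 11}}, which is the output I would like the modified function to produce.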
I am not sure how to deal with the right shifts, either by removing them or by changing the function so that the result of the matrix multiplication is the integer value obtained by standard mathematical calculation. If anyone could suggest possible solutions that I can try out while observing the behavior of the MSP430, it would be extremely helpful. Do let me know if any other information is needed to provide more clarity. Thanks.