Problem regarding the C64x+ processor

Hi all,

I have written a program in C that calls a function written in assembly language. I pass three arrays to the function mul. When the program returns to main, the addresses of the arrays have changed. My program follows.

Main program in C:

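// C code that generates two arrays and passes them to mul().

#include <stdio.h>

#define row 8
//#define col 8

// Assembly routine: C[i] = A[i] * B[i] for i = 0 .. max-1.
void mul(int *A, int *B, int *C, int max);

int main(void)
{
    int A[row];
    int B[row];
    int C[row];
    int i;
    int ele, *a;

    ele = row;
    a = &A[0];

    // Fill both inputs with the repeating pattern 64, 32.
    for (i = 0; i < row; i += 2)
    {
        A[i] = 64;
        B[i] = 64;
        A[i+1] = 32;
        B[i+1] = 32;
    }

    mul(A, B, C, ele);

    return 0;
}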

 

 

Function in assembly:

 

;========================================================
; Assembly routine to multiply two arrays element-wise.
; mul(A,B,C,max)
; C = A * B, max = length of array.
;=====================CYCLES=============================

; Number of cycles required:
;   cycles = 12 + array size * 0.75

;========================================================
;====================ASSUMPTION==========================
; The array size must be a multiple of 4.
; Both arrays have the same size.


;========= SYMBOLIC REGISTER ASSIGNMENTS ================

  .asg A4,  A_Input
  .asg B4,  B_Input

  .asg A6,  A_output
  .asg B16, B_output

  .asg B6,  B_length

  .asg A12, A_data1
  .asg B12, B_data1
  .asg A13, A_data2
  .asg B13, B_data2
  .asg A14, A_data3
  .asg B14, B_data3
  .asg A15, A_data4
  .asg B15, B_data4

  .asg A16, A_m1
  .asg A17, A_m2
  .asg B18, B_m3
  .asg B19, B_m4

;* ==========================================================
  .text
  .global _mul
_mul:
;* ==========================================================

  SHR     .S2     B_length,  2,  B_length   ; N/4
  MVC  .S2  B_length, ILC

  ZERO .S1  A_m1
||  ZERO .S2  B_m3

  ZERO .S1  A_m2
||  ZERO .S2  B_m4

  ADD  .L2  A_output,8,B_output
 

  SPLOOP 3

  LDDW  .D1  *A_Input++,A_data2:A_data1
||  LDDW  .D2  *B_Input++,B_data2:B_data1

  LDDW  .D1  *A_Input++,A_data4:A_data3
||  LDDW  .D2  *B_Input++,B_data4:B_data3
  
  NOP  4

  MPY32  .M1X A_data1,B_data1,A_m1
||  MPY32  .M2X A_data3,B_data3,B_m3

  MPY32  .M1X A_data2,B_data2,A_m2 
||  MPY32  .M2X A_data4,B_data4,B_m4
  
  NOP     1
  NOP     1
  NOP     1

  SPKERNEL 4,0
||  STDW  .D1  A_m2:A_m1,*A_output++[2]
||  STDW  .D2  B_m4:B_m3,*B_output++[2]

;============================================================
  .end
;============================================================

  • Two suggestions:

    1. Write your multiplication function in C and set the -k switch to keep the compiler's assembly output. Get this working, then modify the relevant portions of it to implement your optimizations.

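    2. Re-read the C Compiler User's Guide Section 7 on Run-Time Environment with special attention to Section 7.3 Register Conventions and Section 7.5 Interfacing C and C++ With Assembly Language. Your routine overwrites A12-A15 and B12-B15 without saving them. Under the C6000 convention these are save-on-entry registers, and B15 and B14 hold the stack pointer and data page pointer, which would explain why main sees different array addresses afterwards. The routine also never branches back to the return address in B3.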

    You will find it much easier to spend your time optimizing the C implementation than writing C64x+ assembly code by hand. The C Compiler User's Guide describes many compiler optimizations, and you can use intrinsics to call specific assembly instructions when needed.
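
    As a starting point for suggestion 1, here is a minimal sketch of the multiplication routine in plain C. It is not the original poster's code. It assumes the arrays do not overlap (expressed with restrict) and, as stated in the assembly header, that the length is a multiple of 4. The MUST_ITERATE pragma is only emitted when the TI compiler is detected, so the same file should also build with a host compiler for checking results.

```c
/* Hypothetical C replacement for the hand-written mul routine.
 * Computes c[i] = a[i] * b[i] for i in [0, n).
 * restrict tells the compiler that the three arrays do not overlap,
 * which it needs to know before it can software-pipeline the loop. */
void mul(const int *restrict a, const int *restrict b,
         int *restrict c, int n)
{
    int i;

#ifdef __TI_COMPILER_VERSION__
    /* Assumption carried over from the original post:
     * n is at least 4 and a multiple of 4. */
    #pragma MUST_ITERATE(4, , 4)
#endif
    for (i = 0; i < n; i++) {
        c[i] = a[i] * b[i];
    }
}
```

    Because the compiler generates the function entry and exit code, it takes care of the register conventions itself. Building with -k keeps the generated assembly, so you can inspect the loop it produces before deciding whether any hand tuning is needed.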