problem regarding 64x+ processor

brajesh kumar19415

Hi all,

I have written a program in C that calls a function written in assembly language. I pass three arrays to the function mul. When the function returns to the main program, the addresses of the arrays have changed. My program follows.

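Main program in C:

// C code that fills two arrays and calls the assembly routine.
#include <stdio.h>

#define row 8
//#define col 8

// Prototype for the assembly routine (exported as _mul).
void mul(int *A, int *B, int *C, int n);

int main(void)
{
    int A[row];
    int B[row];
    int C[row];
    int i;
    int ele, *a;

    ele = row;
    a = &A[0];

    // Fill A and B with alternating 64 and 32.
    for (i = 0; i < row; i += 2)
    {
        A[i] = 64;
        B[i] = 64;
        A[i+1] = 32;
        B[i+1] = 32;
    }

    mul(A, B, C, ele);

    return 0;
}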

Function in assembly:

;========================================================
; Assembly routine to multiply two arrays element by element.
; mul(A,B,C,max)
; C = A * B, max = length of the arrays.
;=====================CYCLES=============================
; Number of cycles required:
; cycles = 12 + array size * 0.75
;========================================================
;====================ASSUMPTIONS=========================
; The array size must be a multiple of 4.
; Both arrays have the same size.

;========= SYMBOLIC REGISTER ASSIGNMENTS ================

        .asg    A4,  A_Input
        .asg    B4,  B_Input

        .asg    A6,  A_output
        .asg    B16, B_output

        .asg    B6,  B_length

        .asg    A12, A_data1
        .asg    B12, B_data1
        .asg    A13, A_data2
        .asg    B13, B_data2
        .asg    A14, A_data3
        .asg    B14, B_data3
        .asg    A15, A_data4
        .asg    B15, B_data4

        .asg    A16, A_m1
        .asg    A17, A_m2
        .asg    B18, B_m3
        .asg    B19, B_m4

;* ==========================================================
        .text
        .global _mul
_mul:
;* ==========================================================

        SHR     .S2     B_length, 2, B_length   ; N/4
        MVC     .S2     B_length, ILC

        ZERO    .S1     A_m1
||      ZERO    .S2     B_m3

        ZERO    .S1     A_m2
||      ZERO    .S2     B_m4

        ADD     .L2     A_output, 8, B_output

        SPLOOP  3

        LDDW    .D1     *A_Input++, A_data2:A_data1
||      LDDW    .D2     *B_Input++, B_data2:B_data1

        LDDW    .D1     *A_Input++, A_data4:A_data3
||      LDDW    .D2     *B_Input++, B_data4:B_data3

        NOP     4

        MPY32   .M1X    A_data1, B_data1, A_m1
||      MPY32   .M2X    A_data3, B_data3, B_m3

        MPY32   .M1X    A_data2, B_data2, A_m2
||      MPY32   .M2X    A_data4, B_data4, B_m4

        NOP     1
        NOP     1
        NOP     1

        SPKERNEL 4, 0
||      STDW    .D1     A_m2:A_m1, *A_output++[2]
||      STDW    .D2     B_m4:B_m3, *B_output++[2]

;============================================================
        .end
;============================================================

over 15 years ago

RandyP over 15 years ago

TI__Guru* 84110 points

Two suggestions:

1. Write your multiplication function in C and set the -k switch to keep the compiler's assembly output. Get this working, then modify the relevant portions of it to implement your optimizations.

2. Re-read the C Compiler User's Guide Section 7 on Run-Time Environment with special attention to Section 7.3 Register Conventions and Section 7.5 Interfacing C and C++ With Assembly Language.
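As a starting point for the first suggestion, the routine could be written in plain C along these lines. This is only a sketch: the name mul_c and the restrict qualifiers are assumptions, not part of the original post. A C version also leaves the register conventions to the compiler. The posted assembly, for instance, uses A12-A15 and B12-B15 as scratch registers and has no branch back to the caller, yet the calling convention expects a called function to preserve those registers (B15 is the stack pointer).

```c
/* Element-wise product: c[i] = a[i] * b[i] for i in [0, n).
 * mul_c is a hypothetical name; restrict tells the compiler that
 * the three arrays do not overlap. */
void mul_c(const int *restrict a, const int *restrict b,
           int *restrict c, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        c[i] = a[i] * b[i];
    }
}
```

Called as mul_c(A, B, C, ele) from the main program in the question, with A and B alternating between 64 and 32, this would leave C alternating between 4096 and 1024.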

It will go much easier for you to spend your time trying to get the C implementation optimized rather than trying to write C64x+ assembly code by hand. There are a lot of compiler optimizations you can learn about in the C Compiler User's Guide, and you can use intrinsics to call specific assembly instructions, when needed.
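To illustrate the last point, a specific instruction can be requested from C through an intrinsic. The sketch below assumes the _mpy32 intrinsic and the _TMS320C6X predefined macro described in the compiler guide for C64x+ targets; mul32 and mul_intr are hypothetical names, and the fallback branch exists only so the code also builds with a host compiler.

```c
#ifdef _TMS320C6X
/* TI C6000 compiler: _mpy32 is assumed to map to the MPY32 instruction. */
#define mul32(x, y) _mpy32((x), (y))
#else
/* Any other compiler: plain C multiply. */
#define mul32(x, y) ((x) * (y))
#endif

/* Element-wise product using the mul32 wrapper defined above. */
void mul_intr(const int *a, const int *b, int *c, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        c[i] = mul32(a[i], b[i]);
    }
}
```

Whether the intrinsic gains anything over the plain multiply is worth checking in the assembly output kept by the -k switch.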
