C674x DSPF_sp_mat_mul benchmark

Elron A Yellin

Expert 1040 points

Can someone confirm the benchmarks published for the C674x DSPF_sp_mat_mul (DSPLib 3.1.0.0)?

DSPF_sp_mat_mul_674LE_LE_ELF_c674-LE-ELF-CGT:7.2.4 | C674xCPUCycleAccurateSim-LE | 4.2.3.00004 | Passed | 1/2*r1*c2*c1 + 12/2*r1*c2 + 9/2*r1 + 42 | 864 bytes

For r1 = 32, c1 = 64, c2 = 1, I'm measuring about 2658 cycles, compared to the 1402 cycles the documented formula gives. The code and data are in L1. I'm linking to dsplib.ae674 from the distribution, but I get slightly worse results when I compile it myself with optimizations on (cl6x 7.4.1). I'm using TSC to count cycles.
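For reference, the 1402 figure follows directly from the documented formula. A minimal sketch that evaluates it is below; the function name `mat_mul_doc_cycles` is made up for illustration and is not part of DSPLib.

```c
/* Evaluates the cycle formula quoted from the DSPLib test report:
 *   1/2*r1*c2*c1 + 12/2*r1*c2 + 9/2*r1 + 42
 * The name mat_mul_doc_cycles is hypothetical, not a DSPLib symbol.
 * The three scaled terms are summed at twice their value and halved once,
 * so an odd r1 does not lose a half cycle in each term separately. */
long mat_mul_doc_cycles(long r1, long c1, long c2)
{
    long twice = r1 * c2 * c1   /* 2 * (1/2 * r1*c2*c1) */
               + 12 * r1 * c2   /* 2 * (12/2 * r1*c2)   */
               + 9 * r1;        /* 2 * (9/2 * r1)       */
    return twice / 2 + 42;
}
```

With r1 = 32, c1 = 64, c2 = 1 the terms are 1024 + 192 + 144 + 42 = 1402, so the measured 2658 cycles is roughly 1.9 times the documented estimate.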

The docs don't mention anything about bank alignment of the buffers, and it's a bit hard to figure out the best alignment after pipelining (would be great if the compiler made suggestions in the cases where the code doesn't assert any bank alignments). I tried a few combinations of bank alignments to no avail.

I wrote specialized code for my particular case (double-word-aligned buffers, one loop removed, inner loop unrolled by 4, the matrix padded with zeros when necessary, interrupt threshold -1) and got the count down to 1700 cycles, but I'd still like to know if the docs are wrong.
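To make the description concrete, here is a rough portable-C sketch of that kind of specialization for the c2 = 1 case (matrix times vector). It is not my actual code: the name `mat_vec_mul_unroll4` and the parameter `c1p` (row length after zero padding to a multiple of 4) are just for illustration, and the TI-specific parts (alignment assertions, interrupt threshold option) are left out.

```c
#include <stddef.h>

/* y = A * x for the c2 == 1 case, so the loop over output columns is gone.
 * A is r1 x c1p, row-major; c1p is the row length after zero padding and
 * must be a multiple of 4. x must have c1p elements (padding entries are
 * zero, so they do not change the result).
 * Hypothetical helper for illustration, not a DSPLib function. */
void mat_vec_mul_unroll4(const float *a, const float *x, float *y,
                         int r1, int c1p)
{
    int i, k;

    for (i = 0; i < r1; i++) {
        const float *row = a + (size_t)i * (size_t)c1p;
        /* Four independent accumulators let the multiplies overlap. */
        float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;

        for (k = 0; k < c1p; k += 4) {
            s0 += row[k]     * x[k];
            s1 += row[k + 1] * x[k + 1];
            s2 += row[k + 2] * x[k + 2];
            s3 += row[k + 3] * x[k + 3];
        }
        y[i] = (s0 + s1) + (s2 + s3);
    }
}
```

Note that the four partial sums change the order of floating-point additions, so results can differ in the last bits from a straightforward single-accumulator loop.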

Thanks

over 13 years ago
