PRU signed multiply using hardware multiplier

Other Parts Discussed in Thread: AM3358

Using the AM3358, I need to do some small amount of filtering on samples being read by the PRU, and I would like to do that filtering within the PRU. I have been looking at the various choices for doing this and am trying to find the best/fastest way to get a signed result from a 32x32 signed multiply.

The fact that the hardware MAC is unsigned-only makes this a bit more difficult so I wrote it in assembly but am not very proud of the number of cycles it takes to complete. In my case, I am taking the whole 64-bit result for the accumulations (since it is there) and will truncate later.

If anyone has a better way to achieve signed multiplies on an unsigned multiplier, I would like to see it. I am hoping someone has a brilliant observation of signed/unsigned math that suggests a better way.

In my algorithm, I am testing the sign of the result by XORing the two multiplicands and saving the msb of that result, then I take the absolute value of each of those multiplicands. After multiplying the two now-non-negative numbers, I pull out the 64-bit result and negate it if the result is supposed to be negative (from that first XOR).

That takes a lot of cycles, but still less than doing a shift-and-add multiply algorithm for that many bits.

	; determine sign of si * y, then abs both args
	XOR		r18, r14, r15	; r18.t31 = sign of si * y
	QBBC	BCMAA1, r14, 31		; go around if si >= 0
	RSB		r14, r14, 0	; si = |si|
BCMAA1:
	MOV		r28, r14
	QBBC	BCMAA2, r15, 31		; go around if y >= 0
	RSB		r15, r15, 0	; y = |y|
BCMAA2:
	MOV		r29, r15
	MOV		r29, r15	; delay cycle for MPY to complete

;	XOR		r19, r16, r17	; r19.t31 = sign of co * x, early to save a cycle

	XIN		0, &r26, 2*4	; get 64-bit unsigned result
	QBBC	BCMAA3, r18, 31		; skip if result >= 0
	NOT		r20, r26
	NOT		r21, r27
	ADD		r20, r20, 1
	ADC		r21, r21, 0
	QBA		BCMAA4
BCMAA3:
	MOV		r20, r26
	MOV		r21, r27
BCMAA4:

This takes up to 14 cycles. Do you have a faster way, please?


Regards,
RandyP

  • Hi Randy,

    I will ask the PRU experts to look at this. They will respond directly here.
  • Please note that addition, subtraction, and multiplication (hence, when combined, polynomial evaluation) are exactly the same in signed and unsigned arithmetic, provided the number of bits in the result is no bigger than that of the inputs. In other words, although a 32x32->64 bit multiply is different in signed arithmetic, the lower 32 bits still match the unsigned multiply result.
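A quick way to check that claim on a host machine is a small C sketch like the one below. This is plain C, not PRU code, and the names `low32_matches` and `low32_self_check` are made up for illustration. It assumes the usual two's complement behaviour when a `uint32_t` bit pattern is reinterpreted as `int32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name made up for this sketch): returns 1 when the
   low 32 bits of the unsigned and the signed 32x32->64 products agree. */
int low32_matches(uint32_t x, uint32_t y)
{
    /* Product of the two bit patterns read as unsigned numbers. */
    uint64_t u = (uint64_t)x * y;
    /* Product of the same bit patterns read as signed numbers.
       Assumption: uint32_t -> int32_t conversion is two's complement. */
    int64_t s = (int64_t)(int32_t)x * (int32_t)y;
    /* Compare only the lower 32 bits of each product. */
    return (uint32_t)u == (uint32_t)s;
}

/* Spot checks around the sign boundary. */
void low32_self_check(void)
{
    assert(low32_matches(0xFFFFFFFDu, 5u));          /* -3 * 5 */
    assert(low32_matches(0xFFFFFFFFu, 0xFFFFFFFFu)); /* -1 * -1 */
    assert(low32_matches(0x80000000u, 0x80000000u)); /* INT32_MIN squared */
    assert(low32_matches(0x7FFFFFFFu, 0x80000000u)); /* INT32_MAX * INT32_MIN */
}
```

Only the upper word differs between the two interpretations, which is why the rest of this reply concerns itself with the upper 32 bits only.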

    To turn the unsigned interpretation of a 32-bit number into its signed interpretation, you need to subtract 2^32 if its sign bit is set. Expanding this in a multiplication yields this formula (linked on texpaste.com). Truncating that result to 64 bits yields

    (result_lsw,result_msw) = unsigned 64-bit multiply of x and y
    if( x.bit31 ) result_msw -= y;
    if( y.bit31 ) result_msw -= x;

    I'm no PRU expert so converting this optimally to PRU assembly is left as exercise to the reader ;-)

    Of course, if you later truncate the result to 32 bits then all of this is unnecessary: even if intermediate results don't fit in 32 bits, intermediate overflows are not a problem as long as the final result fits.
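For anyone who wants to test the correction before writing the PRU assembly, here is a minimal host-side C model of the pseudocode above. It is a sketch, not PRU code: the function names `smul_via_umul` and `smul_self_check` are invented here, the 64-bit unsigned product stands in for what the hardware multiplier would return, and the final cast to `int64_t` assumes two's complement conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 32x32->64 multiply built only from an unsigned 32x32->64 multiply.
   Hypothetical name; models the two conditional subtractions on the high word. */
int64_t smul_via_umul(int32_t x, int32_t y)
{
    uint32_t ux = (uint32_t)x;            /* bit pattern of x */
    uint32_t uy = (uint32_t)y;            /* bit pattern of y */
    uint64_t p = (uint64_t)ux * uy;       /* stand-in for the unsigned multiplier */
    uint32_t lsw = (uint32_t)p;           /* low word needs no correction */
    uint32_t msw = (uint32_t)(p >> 32);

    if (ux & 0x80000000u) msw -= uy;      /* x.bit31 set: result_msw -= y */
    if (uy & 0x80000000u) msw -= ux;      /* y.bit31 set: result_msw -= x */

    /* Assumption: uint64_t -> int64_t conversion is two's complement. */
    return (int64_t)(((uint64_t)msw << 32) | lsw);
}

/* Compare against the compiler's own signed multiply on a few edge cases. */
void smul_self_check(void)
{
    static const int32_t v[] = { 0, 1, -1, 5, -3, INT32_MAX, INT32_MIN };
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
        for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
            assert(smul_via_umul(v[i], v[j]) == (int64_t)v[i] * v[j]);
}
```

The subtractions wrap modulo 2^32 on purpose, so no borrow handling is needed. The model contains two tests and two subtractions and never takes an absolute value or negates the 64-bit result.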

  • Matthijs,

    This is very helpful, thanks. I will compare your comments to what I have been seeing in my testing.

    A conversion to PRU would be fewer cycles than what I did. If I do that, I will post it here.

    The texpaste.com site you linked to is blocked inside TI, and it may end up being removed from the post by the Admins. Could you post it as a link using the E2E insert file method, please? When you are editing a post, click 'Use rich formatting' in the lower right corner, then there will be an 'Insert file' icon on the tool ribbon.

    Thanks for your reply.

    Regards,
    RandyP
  • RandyP said:
    The texpaste.com site you linked to is blocked inside TI

    Ehh what, why? Perhaps complain to your system administrators?

    RandyP said:
    it may end up being removed from the post by the Admins.

    Ehh what, why? I would consider it rather offensive if admins were to edit my posts without very good reason.

    RandyP said:
    Could you post it as a link using the E2E insert file method

    No since it's not a file. It was just a bit of nicely formatted mathematical formulae, for which I used TeXpaste since the TI forum software offers no such functionality (other than manually editing the post as HTML to mark text up as superscript or subscript, which is extremely tedious).

    Here's the raw content (TeX code) of the snippet:

    $$
    \big(x - 2^{32} · s(x)\big) · \big(y - 2^{32} · s(y)\big) = \\
    x · y - 2^{32} · \big(s(x) · y + x · s(y)\big) + 2^{64} · s(x) · s(y)
    $$ where $s(x) \in \{0,1\}$ is bit 31 of $x$.
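
    To spell out the truncation step: taken modulo $2^{64}$ the last term disappears, and the remaining correction is a multiple of $2^{32}$, so it only touches the upper word. The notation $x_s$, $y_s$ for the signed interpretations is introduced here for illustration and is not part of the original snippet:

    $$
    x_s · y_s \equiv x · y - 2^{32} · \big(s(x) · y + s(y) · x\big) \pmod{2^{64}}
    $$ where $x_s = x - 2^{32} · s(x)$ and $y_s = y - 2^{32} · s(y)$.

    That is, subtract $y$ from the upper 32 bits of the unsigned product when $s(x) = 1$ and subtract $x$ when $s(y) = 1$, both modulo $2^{32}$.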

  • Matthijs,

    We do not allow file sharing sites inside the company firewall. Many companies have that restriction. In the overall scheme of network protection, I support that restriction. And that is why we supply the file-attach/insert feature that I mention. You can zip up files to attach that way, if needed.

    You are obviously well-versed in the use of the forum tools since you are using the Quote feature. Nice to see. Your advice and pointers are very much appreciated.

    Regards,
    RandyP
  • texpaste.com has no file sharing functionality whatsoever.