Question about alignment restrictions of dsp_fir_gen in C674x DSPLIB

We are using dsp_fir_gen from DSPLIB and found that there is a restriction that the output buffer needs to be word aligned, so we are using a pragma to achieve that. A second restriction is that the output length has to be a multiple of 4. Since that is not always guaranteed, we filter the first few samples with generic C code until the number of remaining samples is a multiple of 4, and then filter those with dsp_fir_gen. For example, if the output length is 23, we filter 3 samples with a C loop, then the final 20 samples with dsp_fir_gen. In this case, the value passed in to dsp_fir_gen for the output buffer is the buffer base (word aligned) plus 3 (not word aligned). So what I was thinking was to have the user pass in a buffer that is (in this example) 24 samples long, with the first slot left unused, so that y[3] is word aligned:

   Non-Aligned         Aligned
    +-------+         +-------+
0   |  y[0] |     0   |   -   |
    +-------+         +-------+
1   |  y[1] |     1   |  y[0] |
    +-------+         +-------+
2   |  y[2] |     2   |  y[1] |
    +-------+         +-------+
3   |  y[3] |     3   |  y[2] |
    +-------+         +-------+
4   |  y[4] |     4   |  y[3] |
    +-------+         +-------+
        .                 .   
        .                 .   
        .                 .   
    +-------+         +-------+
21  | y[21] |     21  | y[20] |
    +-------+         +-------+
22  | y[22] |     22  | y[21] |
    +-------+         +-------+
                  23  | y[22] |
                      +-------+
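
The index arithmetic behind the "Aligned" column can be sketched as follows. This is only an illustration: the `fir_layout` type and `fir_plan_layout` function are hypothetical names, not part of DSPLIB, and the sketch follows the diagram's convention of starting the library-filtered samples on a multiple-of-4 index (for 16-bit samples an even index would already be word aligned).

```c
#include <assert.h>

/* Hypothetical helper type: where each part of the output lives. */
typedef struct {
    int head;   /* samples filtered by the generic C loop (N % 4)     */
    int tail;   /* samples left for the library call, a multiple of 4 */
    int pad;    /* unused slots at the start of the buffer            */
    int total;  /* slots the caller must allocate (pad + N)           */
} fir_layout;

/* Split n outputs so that the library-filtered part starts on a
 * multiple-of-4 index of the (aligned) buffer base. */
static fir_layout fir_plan_layout(int n)
{
    fir_layout p;
    assert(n >= 0);
    p.head  = n % 4;
    p.tail  = n - p.head;
    p.pad   = (4 - p.head) % 4;
    p.total = p.pad + n;
    return p;
}
```

For N = 23 this gives head = 3, tail = 20, pad = 1 and total = 24, so y[3] lands at index pad + head = 4, as in the diagram.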

Before we update our user documentation to give instructions on how to manually align the key sample, I want to confirm with you that the restriction is valid. The reason I ask is that our unit test passes even with the non-aligned memory arrangement! Did we just get "lucky"?

  • Ruben,

    Just to verify: we're talking about the C64+ DSPLIB, correct?  You used the function name from the C64+ release, but your post is tagged C674x, so I thought I should check.  The rest of my post will assume that we're talking about the C64+ DSPLIB.

    I've taken a few minutes to look at the source for the DSP_fir_gen function, and I have a few comments:

    • There appears to be an undocumented requirement that the input buffer (x) must be double-word aligned
    • Your workaround to pass the FIR function a properly aligned buffer seems valid, but is likely to incur considerable extra work (i.e. copying data after the fact to get a contiguous data buffer)

    I have an alternative (hopefully easier) recommendation to handle the buffer length and alignment requirements.  Suppose you want to compute N filtered output values, but N is not divisible by 4.  Then:

    1. Allocate your y buffer with space for M = N + (N % 4) output values
    2. Call DSP_fir_gen to compute M outputs from your filter
    3. Only use the first N calculated output values

    Does this make sense to you?  I hope I'm understanding the situation correctly.

  • Joe,

    Yes...it is C64+ DSPLIB.  Can you confirm that restriction on the input buffer length?  I guess we could use a similar over-calculate-and-toss method to deal with this.

    If you look at our solution, there is actually no extra copy.  We are filling the first (N%4) samples with a generic C loop to fill the start of the buffer.  That said I do see how your solution addresses our problem, but wouldn't you want to have M = N + 4 - (N%4)?  That is the expression to ensure you are the next multiple of 4 greater than N.  Actually for (N%4) = 0 you get 4 extra samples but you get the idea...

    Darrell

  • Darrell,

    The restriction is actually on the output buffer length, but the input buffer length has a very specific relation to that:

    nr % 4 = 0
    nx = nr + nh - 1

    If you use an output buffer length that is not divisible by 4, then the function will write a few extra data values immediately after the output buffer.  This could cause problems if there's another buffer or variable in that space.  The relevant line in DSP_fir_gen.c is this:

    _mem8(&r[j]) = _itoll(sum_32, sum_10);

    This means that the function only writes to the output buffer in 8-byte chunks, each of which contains four Int16 values.  If the buffer length is not evenly divisible by 4, then the final write will run over the end of the output buffer.  You will also waste some cycles calculating these overrun output values.  My workaround is really just aimed at making sure the extra write doesn't cause problems; you lose those cycles either way.
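
    To put a number on the overrun, here is a portable model of that store pattern. It is only a sketch: it assumes the output loop steps j by 4 while j < nr, which is an assumption about the loop structure, and the function names are hypothetical rather than taken from DSPLIB.

```c
#include <assert.h>

/* Portable stand-in for the store pattern described above: results are
 * written four at a time, so the loop always completes a whole group.
 * Assumption: the real loop steps j by 4 while j < nr. */
static int fir_outputs_written(int nr)
{
    int written = 0;
    int j;
    assert(nr >= 0);
    for (j = 0; j < nr; j += 4) {
        written += 4;   /* one 8-byte store = four 16-bit results */
    }
    return written;
}

/* Number of 16-bit values written past the end of an nr-sample buffer. */
static int fir_overrun(int nr)
{
    return fir_outputs_written(nr) - nr;
}
```

    Under that assumption, nr = 23 produces 24 written values, so one Int16 lands just past the end of a 23-sample buffer; nr = 21 would overrun by three.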

    Also, you're correct about the value of M from my workaround.
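
    If the four extra samples in the N % 4 = 0 case matter, a round-up that leaves exact multiples alone is a common alternative. A minimal sketch, with `round_up_to_4` as a hypothetical helper name:

```c
#include <assert.h>

/* Round n up to the next multiple of 4; values that are already a
 * multiple of 4 are returned unchanged. */
static int round_up_to_4(int n)
{
    assert(n >= 0);
    return (n + 3) & ~3;
}
```

    For N = 23 both expressions give M = 24; for N = 20 this one gives 20, while N + 4 - (N % 4) gives 24.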

  • Joe,

    Thanks...I do understand that the input buffer length has to be a multiple of 4 since the output buffer is also a multiple of 4, but does the input buffer have to be double-word aligned?

    Darrell

  • Darrell,

    Actually, it looks like the input buffer does not need to be double word aligned; I was misreading the source code and got the wrong idea.  Sorry for the confusion.

    Also, nx does not need to be a multiple of 4; instead it's set by the values of nr and nh.  For example, nr = 32 and nh = 8 would require that nx = 32 + 8 - 1 = 39.  In practice, the first nh - 1 values of the input buffer are the last values from the "previous" input buffer.  They can also just be set to 0 if you don't care about continuity from one FIR call to the next.
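
    The history handling can be sketched like this. The helper names are hypothetical (they are not DSPLIB functions), and the sketch assumes 16-bit samples and separate, non-overlapping buffers for the history and the input.

```c
#include <assert.h>
#include <string.h>

/* Input length required for nr outputs from an nh-tap filter. */
static int fir_input_len(int nr, int nh)
{
    assert(nr > 0 && nh > 0);
    return nr + nh - 1;
}

/* Fill x (nr + nh - 1 samples) with nh - 1 history samples followed by
 * nr new samples.  Pass hist == NULL to start from zeros. */
static void fir_build_input(short *x, const short *hist,
                            const short *fresh, int nr, int nh)
{
    size_t nhist = (size_t)(nh - 1);
    if (hist != NULL) {
        memcpy(x, hist, nhist * sizeof(short));
    } else {
        memset(x, 0, nhist * sizeof(short));
    }
    memcpy(x + nhist, fresh, (size_t)nr * sizeof(short));
}

/* Keep the last nh - 1 samples of x as history for the next call. */
static void fir_save_history(short *hist, const short *x, int nr, int nh)
{
    memcpy(hist, x + nr, (size_t)(nh - 1) * sizeof(short));
}
```

    Each block would then be processed as: build x, run the filter over x to get nr outputs, save the history for the next block.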

  • Hi Joe,

    I had a question regarding this line from DSP_fir_gen.c:

    _mem8(&r[j]) = _itoll(sum_32, sum_10);

    Since the intrinsic used here is _mem8 and not _amem8, we were wondering if the output array really needs to be word aligned. As Darrell mentioned in his first post, our unit test works even when the output array being passed to DSP_fir_gen is not word aligned, so we just wanted to check if the word alignment requirement for the output array is valid.

    Thanks,

    Divya

  • Divya,

    You are probably correct, but I'm hesitant to recommend that you disregard a documented requirement.  I would want to do some testing of my own before I would go that far. :)
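
    In the meantime, one conservative option is to keep honoring the documented requirement and check it in debug builds before each call. A minimal sketch; `is_aligned` is a hypothetical helper, not a DSPLIB or compiler function, and it relies on the usual pointer-to-integer conversion:

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero if p is aligned to 'bytes', which must be a power of
 * two (4 for word alignment, 8 for double-word alignment). */
static int is_aligned(const void *p, uintptr_t bytes)
{
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0);
    return ((uintptr_t)p & (bytes - 1)) == 0;
}
```

    For example, assert(is_aligned(r, 4)) just before the FIR call would flag the base-plus-3 case from the first post.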