Help! Any way to make DSPLib functions interruptible (C6713/C6748)?

Other Parts Discussed in Thread: OMAPL138

Dear Gentle(wo)men,

In some of our older projects, based on the C6713 DSP, we successfully used the DSPLib functions DSPF_sp_cfftr2_dit and DSPF_sp_icfftr2_dif. Their main advantage was “in-place” computation.

Now we have to redevelop these projects on the C6748 DSP under somewhat different conditions. The main problem is making these functions interruptible, since we can’t avoid some additional background operations.

I would like to ask the DSPLib experts to estimate the steps needed to make the named functions interruptible.

Thanks in advance

GenPol

  • GenPol,

    The DSPLIB Reference Guide has the C source for the functions. You can compile this code with full optimization and set the -mi compiler switch to the maximum number of cycles that you want interrupts to be disabled. This will let you control the latency of your interrupts.

    If you need help with optimizing the C source, you can search the E2E forums and the TI Wiki Pages for "optimize" or "optimization" to find hints and links to training material. There is a C6000 Optimization Workshop archived on the TI Wiki Pages that is very helpful, with a student guide and labs to work through.

    Hope this helps.

    Regards,
    RandyP

  • RandyP,

    Unfortunately it doesn’t help because our problem is not as simple as you imagine: we used a DSPLib linear assembler implementation.

    Are there any known tricks for optimizing linear assembly functions (other than manual optimization)?

    Kindly

    GenPol

  • GenPol,

    Some of this depends greatly on the target processor. In the text, you said C6713 and in the title you also included the C6748. The C6748 has some significantly better capabilities that it inherited from the fixed point C64x+ family. This includes interruptible loops.

    Do you have a choice of processor, or do you require the same code to work on both?

    You can try building and simulating the code for the C674x architecture and see what differences you get. The new architecture might not do anything for you, but it has the potential for natural improvements.

    Regards,
    RandyP

  • RandyP,

    Maybe I was too deep in my task and omitted some details, sorry.

    So, the old projects were based on the C6713 DSP; the redeveloped ones should run on the C6748.

    In the current C674x DSPLib, the aforementioned functions were not modified to take advantage of new C6748 architecture features such as SPLOOP; they were simply carried over from the older C6713 DSPLib.

    The standard steps of manual optimization are described in spru198: locate the prolog, kernel and epilog of the pipelined loop, configure and activate the SPLOOP, and so on. I believed these steps could be performed automatically at the assembly optimizer level but, alas, found no solution and therefore asked the DSPLib experts for help.

    Kindly
    GenPol

  • GenPol,

    You were more specific in your first post than I realized. Thank you for the confirmation, though.

    Did you develop your own linear assembly version of these functions? I only see scheduled assembly (.s) and C versions in the src folder. My version is 3.1.0.0, so perhaps you have something different.

    Looking at the scheduled assembly, it is massively optimized with very high utilization of the 8 execution units. And it does disable interrupts during the entire duration of the function. If you were to change anything to make it interruptible, you would be very likely to lose performance. This would have to be examined in great detail to find a way to do that.

    The option that TI provides is to use the Compiler's switches on the C versions, and you can also do that with your linear assembly version. You will likely not get as good performance as the scheduled assembly, but you could improve your interrupt latency.

    The -mi switch can be used with linear assembly to limit the amount of time that interrupts are disabled. I tried a simple test using the dotp linear assembly code from the Compiler User Guide:

    With -o3 optimization, the inner loop was 2 cycles with both cycles fully loaded on all 8 functional units. This inner loop was not interruptible because of the loop size.
    With -mi 100, the inner loop was 6 cycles long so that it could be interrupted.

    The -mi 100 version will likely take at least 3x as long for the inner loop to execute. So that tradeoff must be considered.
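
For a C version, the same limit can, to my understanding, be set per function with the TI compiler's FUNC_INTERRUPT_THRESHOLD pragma instead of a file-wide -mi value. The sketch below is only an illustration: dotp_sketch is a hypothetical stand-in (not the User Guide code and not a DSPLIB function), and the threshold of 100 cycles simply mirrors the test above. The pragma is guarded so the file still builds with other compilers, where it has no effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function; declared first so the pragma can name it. */
float dotp_sketch(const float *a, const float *b, int n);

#ifdef __TI_COMPILER_VERSION__
/* Assumption: ask the TI C6000 compiler to keep interrupts disabled for
 * no more than about 100 cycles at a time inside dotp_sketch. */
#pragma FUNC_INTERRUPT_THRESHOLD(dotp_sketch, 100)
#endif

/* Plain single-precision dot product over n elements. */
float dotp_sketch(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    int i;

    assert(a != NULL && b != NULL && n >= 0);

    for (i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Comparing the software pipeline comments in the kept assembly file (-k) with and without the pragma should show whether the loop became longer but interruptible, as in the -mi 100 test.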

    My suggestions, in order of my preference:

    1. Try the switches I have mentioned above and let us know how the results work for you.

    2. We will ask a Moderator to move this thread to the Compiler Forum so the best people can comment on the compiler options available to you. They may need you to post your linear assembly version, and they may ask you about the compiler's pipeline comments in the assembly output (use the -k compiler switch to keep the assembly file). They may be able to address why SPLOOP is not being used.

    3. With a measurement of the execution time of these functions while interrupts are disabled, examine the impact of this on the latency of your interrupt(s). You may be able to find another system solution to your interrupt issue, such as using the EDMA3 to keep data moving when a peripheral is ready or using the PRU to off-load some simple ISR functions.

    4. FFT functions can be broken into smaller FFTs and then executed separately. This is a technique that I first learned about from our MultiCore team for running a very long FFT (up to 1,000,000 points) by breaking it up across several DSP cores. This technology might not be publicly available from TI (you can search TI.com for it), but it is probably available from some of the universities that share on the internet. You could use several smaller executions, and those would allow interrupts to occur in between those calls, reducing your latency.

    5. Consider moving to the OMAPL138 which would add the ARM9 processor for you to use for servicing these interrupts.

    Regards,
    RandyP

  • RandyP,

    Many thanks for the suggestions, especially for the idea of using the ARM for interrupt processing. Unfortunately, the C implementation (even optimized) is approximately 14x slower and therefore isn’t applicable; we already use the DMA and a split FFT. Anyway, you have confirmed the worst fears I had prior to this discussion: it will not be a one-click action, but a full-scale job including manual assembly optimization.

    In principle, I already have enough information to estimate and plan our work schedule and manpower, and I could close the thread as answered, but I will do that somewhat later, while awaiting further posts. If I am directly involved in the development process later, I will open a new optimization-related thread, as you proposed.

    On this occasion I would like to provide some feedback to the C6000 DSPLib team and would be grateful if you could help me.

    The newest C674x DSPLib release positions DSPF_sp_fftSPxSP and DSPF_sp_ifftSPxSP as the interruptible functions, and as the only ones that will be supported going forward. In fact, they do not allow “in-place” computation. Personally, I would prefer a full set of different, reworked, interrupt-configurable FFT-related functions, and the C6000 DSP team should put effort into this. I’m absolutely sure this point of view will be supported and appreciated by many DSP developers.

    Kindly

    GenPol