OpenMP problem with large arrays

Hello,

I am working with a C6678 EVM and want to parallelize a simple FIR filter using OpenMP. For this I have defined two data arrays that hold the input and output samples. With "large" arrays (10 000 or more entries), the following error message appears as soon as the parallel region is entered:

[C66xx_0] ti.sysbios.knl.Task: line 340: E_spOutOfBounds: Task 0x90003c80 stack error, SP = 0x8f956b28.
xdc.runtime.Error.raise: terminating execution

In case of smaller arrays everything works fine.

I guess this error is caused by the stack being too small, so I tried increasing the stack size in my linker.cmd file to 300 000 (-stack 0x493e0). I also tried increasing the stack size of the OpenMP module to 1 000 000 and placing the OpenMP stack in shared region 2, which is DDR3, by adding the following statements to my omp_config.cfg file:

OpenMP.stackSize = 1000000;
OpenMP.stackRegionId = 2;

But still the same error message appears.

The platform file I am using is ti.omp.examples.platforms.evm6678.

Thanks for your help!

Best regards,

Erik

  • Hi Erik,

    This is not a stack overflow error. It is an invalid stack pointer (SP). Here is the explanation from the SYS/BIOS API Reference. Take a look at the Task tabs in ROV. Does anything else look fishy?

    What versions of OpenMP and SYS/BIOS are you using?

    Todd

  • Hello Todd,

    thanks for your reply.

    I am using OpenMP runtime library 1.1.2.03_beta, compiler version 7.4.0.B2 and SYS/BIOS version 6.33.5.46.

    Task 0x90003c80, which raises the error, is ti_omp_utils_OpenMP_mainTask_I. ROV gives the following information about this task:

    - arg0: 0x00000000

    - arg1: 0x00000000

    - stackSize: 1000000

    - stackBase: 0x90003cc80

    Obviously the task address and the stackBase aren't the same; the stackBase contains one more "c". Could this cause the error?

    I am still wondering why this error doesn't occur when I use smaller arrays...

    Best regards,

    Erik

  • Hi Erik,

    It sounds like something is corrupting the Task object. Can you post a snapshot of the Task Detail tab in ROV?

    Totally guessing here... maybe with the larger array, memory is arranged differently and something is writing past the end of an array, or an uninitialized pointer is corrupting the Task object.

    Todd

  • Hi Todd,

    this is the snapshot of Task Detail tab in ROV. Unfortunately it doesn't provide much information, it only says "SP outside stack!" when moving the mouse over the stackPeak column...

    Is there another way to gain more information on this issue?

    Best regards,

    Erik

  • Hi Erik,

    Can you post a small code fragment that shows how you declare your input/output buffers and the OpenMP pragmas? I suspect that you're getting an extra copy of the buffers by declaring them private or firstprivate, and that this is overflowing the stack.

    Thanks, Eric

  • Hi Eric,

    I have attached the source file.

    0513.omp_hello.c

    As you can see, both buffers are declared outside the parallel region, so they should be shared.

    I have increased the stack size to 300000 in my .cfg file. When MAXSIZE is 130000 or even larger, the described error occurs. With MAXSIZE being 120000 or smaller, it works fine.

    But when I set the stack size to 200000 and MAXSIZE to 120000, it also works fine. Shouldn't this cause a stack overflow as two arrays of length 120000 are declared? When MAXSIZE is 130000, the error occurs again. So the behavior seems to be the same for two different stack sizes.

    This is somewhat confusing...

    Best regards,

    Erik

  • Erik,

    There are two sets of stacks in an OpenMP program. One set is the system stack used by BIOS and the OpenMP runtime. It is mapped to L2SRAM, there is one such stack per core, and its size is determined by Program.stack. The other set consists of the OpenMP stacks, one per OpenMP thread, including the main/master thread. These stacks are in shared memory, and their size and location are controlled by OpenMP.stackSize and OpenMP.stackRegionId.

    In your example, both buffers reside on the main thread's stack. This stack is allocated out of Shared Region 2, as specified by stackRegionId. The OpenMP stack size needs to be large enough to hold these buffers plus any stack requirements from function calls etc. An implementation detail: Shared Region 2 is also used by BIOS malloc() to allocate Task structures once shared memory has been initialized. So, when you have a stack overflow, you could be overwriting BIOS Task data structures, resulting in unpredictable behavior.

    Here's a probable explanation of the confusing behavior you're seeing with MAXSIZE and stackSize - In your example, you were not writing to all of data_out - so, in certain cases, the stack overflow did not overwrite Task structures, resulting in correct execution.

    I'd suggest leaving the -stack parameter in the linker command file as is, since it corresponds to the system stack. Also ensure that OpenMP.stackSize is large enough to hold the buffers and any stack requirements due to function calls from within the OpenMP thread.

    Hope this helps,

    Ajay

  • Hi Ajay,

    thanks for your reply. It helped me to better understand a few things.

    As you mentioned, both buffers are stored on the main thread's stack, which is allocated in SharedRegion 2. This region starts at address 0x90000000 and ends at 0x9FFFFFFF, so the base addresses of these two buffers should be within this memory region. Is this correct so far?

    Now I have tried out some combinations of different Program.stack and OpenMP.stack sizes and had a look at the base addresses of the data arrays. For example with Program.stack = 4096 (which is the initial value from the Hello World example) and OpenMP.stack located at SharedRegion2 the base addresses are the following:

    [C66xx_0] Base address of data_in: 8ffee6b0
    [C66xx_0] Base address of data_out1: 8fff82f0
    [C66xx_0] Base address of data_out2: 90001f30

    So obviously they are not stored entirely in SharedRegion 2. Then this error message occurs:

    [C66xx_0] ti.sysbios.knl.Task: line 340: E_spOutOfBounds: Task 0x90003c80 stack error, SP = 0x8fffeb80.
    xdc.runtime.Error.raise: terminating execution

    So the address of the Task overlaps with data_out2. This would explain the unpredictable behavior.

    Do you have an idea why the buffers are not correctly allocated to region 2?

    Best regards,
    Erik

  • Erik,

    You're correct - the base addresses of data_in, data_out1 and data_out2 should not be outside the shared region. In the process of replicating your issue, I set my OpenMP stack much smaller than the local arrays I had in the test case and I ended up with similar behavior. My shared region starts at 0x89000000 and the addresses were: 0x888000d8, 0x88c000d8, 0x890000d8. What is OpenMP.stackSize set to when you noticed the base address out of shared region behavior?

    Ajay

  • Hi Ajay,

    I noticed that the arrays lie outside shared region 2 after I reduced OpenMP.stackSize from 1,000,000 to 100,000. These were the results:

    1. Program.stack = 100,000, OpenMP.stackSize = 1,000,000
    [C66xx_0] Base address of data_in: 0x900da8f0
    [C66xx_0] Base address of data_out1: 0x900e4530
    [C66xx_0] Base address of data_out2: 0x900ee170

    2. Program.stack = 100,000, OpenMP.stackSize = 100,000
    [C66xx_0] Base address of data_in: 0x8fffed50
    [C66xx_0] Base address of data_out1: 0x90008990
    [C66xx_0] Base address of data_out2: 0x900125d0

    3. Program.stack = 100,000, OpenMP.stackSize = 10,000
    [C66xx_0] Base address of data_in: 0x8ffe8dc0
    [C66xx_0] Base address of data_out1: 0x8fff2a00
    [C66xx_0] Base address of data_out2: 0x8fffc640

    One can see that the difference between the base addresses of data_in in the first and the second case is exactly 900,000, which is also the difference in OpenMP.stackSize. So there is an error in the memory arrangement, I guess.

    I should also mention that data_out2 gets corrupted even in the first case, when the stack is large enough. So the problem is not solved by a large stack.

    Best regards,
    Erik

  • Hello,

    as I haven't heard from you for a while, I just wanted to ask whether there is any news on my topic.

    My workaround is to use a large stack and to set a breakpoint at the end of the main function. This prevents data_out from being corrupted by some SYS/BIOS functions.

    Do you have a better solution for this problem?

    Best regards,
    Erik

  • Erik,

    We've recently released a production version of the OpenMP runtime. You can get it at: http://software-dl.ti.com/sdoemb/sdoemb_public_sw/bios_mcsdk/02_01_00_03/index_FDS.html. This version requires an updated compiler toolchain version, 7.4.1, available from https://www-a.ti.com/downloads/sds_support/TICodegenerationTools/download.htm.

    Can you try your test case with the updated tools/runtime and let me know if you continue to have this issue? If you do, an option is for you to send me a test case that exposes the problem and I can take a look at it.

    Ajay