C674x internal resource conflict exception

Eric Chin

Prodigy 120 points

Other Parts Discussed in Thread: OMAP-L137

Hardware:

Custom OMAP-L137 board

CGT 7.2.5

DSP/BIOS 5.41.09

dsplink 1.63

Linking pre-built c67xmathlib 2.01 FastMath library

We're getting an internal resource conflict exception at runtime that we're trying to understand. It can take anywhere from a few minutes to an hour before an exception occurs, after which the DSP ends up in UTL_halt indefinitely. We've traced the exceptions to two specific lines, but don't see why these would generate the type of exception that we're seeing.

What are the possible sources for a resource conflict exception? Any ideas about why we're seeing this?

Note that we're linking in the FastMath library rather than using the standard RTS.

== DETAILS ==

For some reason, the ROV log that should have the exception trace information only has "null value" in the fields, but by setting a breakpoint at hwi1, we're able to get the relevant exception information:

EFR = 0x2 (internal exception)

IERR = 0x10 (RCX, Resource conflict exception)
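The two register readings above can be decoded with a few lines of C. This is a minimal sketch: only EFR bit 1 (internal exception) and IERR bit 4 (RCX) are confirmed by the values in this thread. The remaining bit names and positions are assumptions recalled from the C674x CPU reference and should be checked against it. The function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* EFR bit 1 matches the EFR = 0x2 reading in this thread. */
#define EFR_IXF (1u << 1)   /* internal exception */

/* IERR cause names indexed by bit position. Bit 4 (RCX) matches the
 * IERR = 0x10 reading in this thread; the other entries are assumptions
 * to be verified against the CPU reference guide. */
static const char *const ierr_names[] = {
    "IFX", "FPX", "EPX", "OPX", "RCX", "RAX", "PRX", "LBX", "MSX"
};

/* Returns nonzero when EFR reports an internal exception. */
int efr_is_internal(uint32_t efr)
{
    return (efr & EFR_IXF) != 0;
}

/* Returns the name of the lowest set cause bit in IERR, or "none". */
const char *ierr_lowest_cause(uint32_t ierr)
{
    size_t count = sizeof ierr_names / sizeof ierr_names[0];
    for (size_t bit = 0; bit < count; ++bit) {
        if (ierr & (1u << bit)) {
            return ierr_names[bit];
        }
    }
    return "none";
}
```

With the values from the breakpoint, efr_is_internal(0x2) is nonzero and ierr_lowest_cause(0x10) returns "RCX".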

From what I understand, B3 is the function return pointer, and NRP is the execution pointer at the time of the interrupt. By following these back, we trace the source to one of two lines from the same function in the C code. Out of 7 runs, the DSP crashed here 4 times:

NRP = 0xC3E13BFC

B3 = 0xC3E13BEC

4616 eml_x_0 = 0.125 * sin(2.0 * eml_x);

0xC3E13BBC: 033DA3E6 LDDW.D2T2 *+SP[13],B7:B6

0xC3E13BC0: 0280A35A MVK.L2 0,B5

0xC3E13BC4: 0297DE8A SET.S2 B5,30,30,B5

0xC3E13BC8: 0200A35A MVK.L2 0,B4

0xC3E13BCC: 01F2AC28 MVK.S1 0xffffe558,A3

0xC3E13BD0: 0210C702 MPYDP.M2 B7:B6,B5:B4,B5:B4

0xC3E13BD4: 01E1F1E8 MVKH.S1 0xc3e30000,A3

0xC3E13BD8: 00008000 NOP 5

0xC3E13BDC: 000C1362 B.S2X A3

0xC3E13BE0: 01834162 ADDKPC.S2 C$RL424 (PC+12 = 0xc3e13bec),B3,2

0xC3E13BE4: 02101FD8 OR.L1X 0,B4,A4

0xC3E13BE8: 02941FD8 OR.L1X 0,B5,A5

C$RL424:

0xC3E13BEC: 0380A358 MVK.L1 0,A7

0xC3E13BF0: 039EDD89 SET.S1 A7,22,29,A7

0xC3E13BF4: 0300A358 || MVK.L1 0,A6

0xC3E13BF8: 02188700 MPYDP.M1 A5:A4,A7:A6,A5:A4

0xC3E13BFC: 00010000 NOP 9

0xC3E13C00: 023E23C4 STDW.D2T1 A5:A4,*+SP[17]

And here 3 times:

NRP = 0xC3E13E10

B3 = 0xC3E13DF8

4638 eml_a[2] = -((cos(2.0 * eml_x) + 1.0) / 2.0 + 0.75);

0xC3E13DC8: 023DA3E6 LDDW.D2T2 *+SP[13],B5:B4

0xC3E13DCC: 0380A35A MVK.L2 0,B7

0xC3E13DD0: 039FDE8A SET.S2 B7,30,30,B7

0xC3E13DD4: 0300A35A MVK.L2 0,B6

0xC3E13DD8: 01F12028 MVK.S1 0xffffe240,A3

0xC3E13DDC: 02188702 MPYDP.M2 B5:B4,B7:B6,B5:B4

0xC3E13DE0: 01E1F1E8 MVKH.S1 0xc3e30000,A3

0xC3E13DE4: 00008000 NOP 5

0xC3E13DE8: 000C1362 B.S2X A3

0xC3E13DEC: 01864162 ADDKPC.S2 C$RL428 (PC+24 = 0xc3e13df8),B3,2

0xC3E13DF0: 02101FD8 OR.L1X 0,B4,A4

0xC3E13DF4: 02941FD8 OR.L1X 0,B5,A5

C$RL428:

0xC3E13DF8: 0380A358 MVK.L1 0,A7

0xC3E13DFC: 039E9D89 SET.S1 A7,20,29,A7

0xC3E13E00: 0300A359 || MVK.L1 0,A6

0xC3E13E04: 0302F02A || MVK.S2 0x05e0,B6

0xC3E13E08: 0210C319 ADDDP.L1 A7:A6,A5:A4,A5:A4

0xC3E13E0C: 0361F26A || MVKH.S2 0xc3e40000,B6

0xC3E13E10: 00180362 B.S2 B6

0xC3E13E14: 0280A35A MVK.L2 0,B5

0xC3E13E18: 0297DE8A SET.S2 B5,30,30,B5

0xC3E13E1C: 01892162 ADDKPC.S2 C$RL429 (PC+36 = 0xc3e13e24),B3,1

0xC3E13E20: 0200A35A MVK.L2 0,B4

eml_x, eml_x_0, and eml_a[2] are all doubles. We're using Embedded MATLAB to generate the C code, so the names and style are somewhat strange. In both cases, it looks like the exception is happening around the sin and cos calls.
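The B3 values reported above can be cross-checked against the ADDKPC annotations in the disassembly. The sketch below assumes that ADDKPC adds its byte offset to the base address of the 32-byte fetch packet containing the instruction. That assumption is consistent with all three "PC+n" annotations in the listings, but it is not stated anywhere in this thread. The function name is hypothetical.

```c
#include <stdint.h>

/* Assumed fetch packet size: eight 32-bit instructions. */
#define FETCH_PACKET_BYTES 32u

/* Computes the return address that ADDKPC places in B3, assuming the
 * PC-relative base is the start of the enclosing fetch packet. */
uint32_t addkpc_return_address(uint32_t insn_addr, uint32_t byte_offset)
{
    uint32_t fetch_packet_base = insn_addr & ~(FETCH_PACKET_BYTES - 1u);
    return fetch_packet_base + byte_offset;
}
```

For the ADDKPC at 0xC3E13BE0 with offset 12 this gives 0xC3E13BEC, and for the one at 0xC3E13DEC with offset 24 it gives 0xC3E13DF8. Both match the B3 values captured at the exception, so in each case B3 still holds the return address of the sin or cos call and NRP lies a few instructions past that return point.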

The asm sections look like this:

;----------------------------------------------------------------------

; 4616 | eml_x_0 = 0.125 * sin(2.0 * eml_x);

;----------------------------------------------------------------------

LDDW .D2T2 *+SP(104),B7:B6 ; |4616|

ZERO .L2 B5

SET .S2 B5,0x1e,0x1e,B5

ZERO .L2 B4 ; |4616|

MVKL .S1 _sin,A3

MPYDP .M2 B7:B6,B5:B4,B5:B4 ; |4616|

MVKH .S1 _sin,A3

NOP 5

$C$DW$2335 .dwtag DW_TAG_TI_branch

.dwattr $C$DW$2335, DW_AT_low_pc(0x00)

.dwattr $C$DW$2335, DW_AT_name("_sin")

.dwattr $C$DW$2335, DW_AT_TI_call

CALL .S2X A3 ; |4616|

ADDKPC .S2 $C$RL424,B3,2 ; |4616|

MV .L1X B4,A4 ; |4616|

MV .L1X B5,A5 ; |4616|

$C$RL424: ; CALL OCCURS {_sin} {0} ; |4616|

ZERO .L1 A7

SET .S1 A7,0x16,0x1d,A7

|| ZERO .L1 A6 ; |4616|

MPYDP .M1 A5:A4,A7:A6,A5:A4 ; |4616|

NOP 9

STDW .D2T1 A5:A4,*+SP(136) ; |4616|

.dwpsn file "sourcefile.c",line 4617,column 3,is_stmt

;----------------------------------------------------------------------

; 4638 | eml_a[2] = -((cos(2.0 * eml_x) + 1.0) / 2.0 + 0.75);

;----------------------------------------------------------------------

LDDW .D2T2 *+SP(104),B5:B4 ; |4638|

ZERO .L2 B7

SET .S2 B7,0x1e,0x1e,B7

ZERO .L2 B6 ; |4638|

MVKL .S1 _cos,A3

MPYDP .M2 B5:B4,B7:B6,B5:B4 ; |4638|

MVKH .S1 _cos,A3

NOP 5

$C$DW$2339 .dwtag DW_TAG_TI_branch

.dwattr $C$DW$2339, DW_AT_low_pc(0x00)

.dwattr $C$DW$2339, DW_AT_name("_cos")

.dwattr $C$DW$2339, DW_AT_TI_call

CALL .S2X A3 ; |4638|

ADDKPC .S2 $C$RL428,B3,2 ; |4638|

MV .L1X B4,A4 ; |4638|

MV .L1X B5,A5 ; |4638|

$C$RL428: ; CALL OCCURS {_cos} {0} ; |4638|

ZERO .L1 A7

SET .S1 A7,0x14,0x1d,A7

|| ZERO .L1 A6 ; |4638|

|| MVKL .S2 __divd,B6

ADDDP .L1 A7:A6,A5:A4,A5:A4 ; |4638|

|| MVKH .S2 __divd,B6

$C$DW$2340 .dwtag DW_TAG_TI_branch

.dwattr $C$DW$2340, DW_AT_low_pc(0x00)

.dwattr $C$DW$2340, DW_AT_name("__divd")

.dwattr $C$DW$2340, DW_AT_TI_call

CALL .S2 B6 ; |4638|

ZERO .L2 B5

SET .S2 B5,0x1e,0x1e,B5

ADDKPC .S2 $C$RL429,B3,1 ; |4638|

ZERO .L2 B4 ; |4638|

$C$RL429: ; CALL OCCURS {__divd} {0} ; |4638|

ZERO .L1 A7

MVKH .S1 0x3fe80000,A7

|| ZERO .L1 A6 ; |4638|

ADDDP .L1 A7:A6,A5:A4,A5:A4 ; |4638|

ZERO .L2 B4 ; |4638|

SET .S2 B4,31,31,B5 ; |4638|

NOP 5

XOR .L2X A5,B5,B5 ; |4638|

MV .L2X A4,B4 ; |4638|

STDW .D2T2 B5:B4,*+SP(168) ; |4638|

.dwpsn file "sourcefile.c",line 4639,column 8,is_stmt
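The ZERO/SET pairs in the listing build the double constants from the C source directly in a register pair. The following sketch reproduces that on a host machine. It assumes SET sets bits csta through cstb inclusive and that the host uses IEEE-754 doubles. The helper names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Models SET src,csta,cstb,dst: sets bits csta..cstb (inclusive) of src.
 * Assumes csta <= cstb <= 31. */
uint32_t set_bits(uint32_t src, unsigned csta, unsigned cstb)
{
    unsigned width = cstb - csta + 1u;
    uint32_t field = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return src | (field << csta);
}

/* Reassembles a double from the high (odd) and low (even) registers of
 * a pair, assuming an IEEE-754 binary64 host. */
double double_from_words(uint32_t high, uint32_t low)
{
    uint64_t bits = ((uint64_t)high << 32) | low;
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under those assumptions, SET with 30,30 yields 2.0, 22,29 (0x16,0x1d) yields 0.125, and 20,29 (0x14,0x1d) yields 1.0, while MVKH 0x3fe80000 yields 0.75. SET with 31,31 produces the sign-bit mask that the final XOR uses for the negation. Note that A7 is the first register written after each return label, which matters for the root cause found later in the thread.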

compiler options: cl6x -g -d"_DEBUG" --no_compress -ss -al -q -pdr -pden -ml3 -mv6740 --disable:sploop

linker options: cl6x -pdr -z -w -i/opt/TI/bios/packages/ti/bios/lib -i/opt/TI/c6000cgt/lib -i/opt/TI/bios/packages/ti/rtdx/lib/c6000 -i/opt/TI/bios/packages/ti/psl/lib -c -q -x -l/opt/TI/c67xmathlib_2_01_00_00/lib/c674xfastMath.lib

ROV shows that the c and task stacks are large enough.

Possibly related issues:

http://e2e.ti.com/support/development_tools/compiler/f/343/t/120757.aspx

http://e2e.ti.com/support/embedded/f/355/p/61925/314208.aspx#314208

over 14 years ago

0 Todd Hahn over 14 years ago

TI__Expert 3455 points

I would suspect program memory is being overwritten somehow, especially if you know this code is being executed correctly many times before the crash. However, if this is the first time this code is being executed, and it fails on the first time through, the only thing I can think of is sin and cos have instructions that write registers after the function has returned. This would be unlikely however, as the compiler does not allow instructions to write after the function in which those instructions are located has returned. i.e. the compiler would make sure all instructions write their results before cos returned to its caller.

0 Eric Chin over 14 years ago in reply to Todd Hahn

Prodigy 120 points

Hi Todd,

Thanks for your response. I did a quick diff between the pre-crash and post-crash disassembly near the exception point, but didn't see any differences. Is there an easy way to verify the state of the program memory after the exception against the .out file in CCS? Could there be an issue with the cache?

I've been trying to find all the potential causes for a resource conflict exception in the documentation, but haven't found a complete list. Do you know what these are? This might help us pinpoint the cause.

Also, the dsplink default makefile has sploop disabled. Since we didn't know why, we left it in. However, we're linking the pre-compiled fastmath library, which may or may not use sploop. Is there a possibility of issues here?

Eric

0 Todd Hahn over 14 years ago in reply to Eric Chin

TI__Expert 3455 points

Look also in the sin and cos functions for program memory changes.

There isn't a documented list, but it is basically things like two different latency instructions trying to use a functional unit's read or write port to the register in the same cycle. Or, in the case of program memory corruption, you could have two instructions trying to operate on the same functional unit.

Should be no issues w/ SPLOOP and fastmath rts.

0 Eric Chin over 14 years ago in reply to Todd Hahn

Prodigy 120 points

Hi Todd,

I did a diff between the entire contents of the .text section and the .bios sections in memory pre and post crash and didn't see any differences between them. Are there any other places I should look? Could there be any cache issues?

I also tried some experiments with SPLOOP and didn't find any effects, as you predicted.

We've also tried replacing all of the sindp and cosdp calls with single-precision operations and so far haven't seen any crashes. We're not convinced that this is a solution to the problem, but it may be a clue as to the underlying issue. Any thoughts about this?

Thanks for your help!

Eric

0 Eric Chin over 14 years ago in reply to Eric Chin

Prodigy 120 points

Here's the resolution:

It looks like the exception is being caused by a bug in the fastmath library, which for certain inputs returns while an mpydp.m1 instruction is still in progress. This can cause a resource conflict if the calling function tries to use the .m1 functional unit soon after the call. As far as we know, this affects sindp and cosdp in c67xmathlib_2_01_00. We also suspect that there may be a similar bug in atan2dp, based on earlier problems we've run into.

We've contacted TI about this issue, so hopefully, there'll be a revised version of this library soon.

Thanks

0 Rahul Prabhu over 14 years ago in reply to Eric Chin

TI__Guru** 116170 points

Eric,

Thanks for reporting the issue with the assembly version of the fastmath trigonometric functions. We are looking at resolving this issue and will try to fix it in the next release. In the meantime, if you have not been able to fix this issue yet, I would recommend using the intrinsic C versions of the sindp, cosdp and atandp functions included in the library. You can find the intrinsic versions in the source provided with the library. The C intrinsic version of an API adds the suffix _c to the original API name. These functions will give you almost the same performance as the assembly versions but will not have the issue that you see with the hand assembly. Here are the performance estimates:

8561.c67xfastRTS_Benchmarking.pdf

Unfortunately, the issue that you are reporting did not show up in our test bench, which is also provided in the library. Please let us know if this resolves the issue you are facing and whether the performance you get from the intrinsic C version of the API meets your application requirements.

Regards,

Rahul

0 Geoff Dolan over 14 years ago in reply to Rahul Prabhu

Prodigy 50 points

For the next person to find this issue, we (Eric and I) tracked this down to the small answer return path of the sindp and cosdp functions in the fastmath library (they are identical with a few extra lines for the cosdp function). The m1 unit isn't allowed enough delay before the return branch and is still writing to register A7 on the first operation after returning. If this next operation writes to A7, an exception is issued. Here's the code from sindp starting at the mpydp instruction:

LTpi: ; label if no arg. reduction

mpydp .m1 A5:A4, A5:A4, A7:A6 ; F = A*A

|| lddw .d2 *+B2[3], B5:B4 ; T = r8

|| extu .s1 A5,1,21,A2 ; reduced exp

|| mvkl .s2 1023-20,B1 ; MIN exp

cmplt .l1x A2,B1,A2 ; V = true for small exp

|| extu .s1 A1,31,0,A1 ; move sign to bit 31

lddw .d2 *+B2[4], A1:A0 ; S = r7

|| mv .l2x A1,B0 ; sign = bit 31 position

[A2] b .s2 b3 ; return small answer exit

|| [A2] or .l1x A5,B0,A5 ; set sign bit for small ans

nop 5 ; wait for F (or take b b3)

A single nop between lddw || mv and the branch call should be sufficient to fix the vulnerability. As for why the multiply in conjunction with the sindp call is at fault, it may have something to do with code that compiles into using the A7 register on the return. We use sindp everywhere in our code, but only had issues when doing double precision multiplies on the results.

Hope this helps,

Geoff
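The timing argument in the post above can be checked with a small cycle-count model. This is a sketch under assumed latencies recalled from the C674x instruction set reference, none of which are stated in this thread: MPYDP writes the high word of its result (A7 here) in pipeline stage E10, a branch has 5 delay slots, and a single-cycle instruction writes its result in E1. The mpydp execute packet is taken as cycle 0, which puts the conditional branch at cycle 3 in the unpatched code.

```c
/* Assumed timings; verify against the instruction set reference. */
enum {
    MPYDP_HIGH_WORD_STAGE = 10, /* stage in which A7 is written       */
    BRANCH_DELAY_SLOTS    = 5,  /* delay slots after a branch         */
    BRANCH_ISSUE_CYCLE    = 3   /* [A2] b b3, counted from the mpydp  */
};

/* Cycle in which the in-flight mpydp writes A7 (mpydp issues in cycle 0,
 * which is its stage E1). */
int mpydp_high_word_write_cycle(void)
{
    return MPYDP_HIGH_WORD_STAGE - 1;
}

/* Cycle in which the first instruction after the return executes, with
 * extra_nops single-cycle NOPs inserted before the branch. */
int first_caller_cycle(int extra_nops)
{
    return BRANCH_ISSUE_CYCLE + extra_nops + BRANCH_DELAY_SLOTS + 1;
}

/* Nonzero if a single-cycle write to A7 by the caller's first instruction
 * lands in the same cycle as the mpydp high-word write. */
int a7_write_collides(int extra_nops)
{
    return first_caller_cycle(extra_nops) == mpydp_high_word_write_cycle();
}
```

Under these assumptions the unpatched path collides in cycle 9, and one extra nop moves the caller's first write to cycle 10, after the mpydp has finished. This agrees with both crash sites, where the first instruction at the return label is a single-cycle write to A7.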

0 Eric Chin over 13 years ago in reply to Rahul Prabhu

Prodigy 120 points

Hi Rahul,

Thanks for the suggestion. We're linking the fastmath library as a replacement for the standard RTS rather than directly calling the fastmath API function names. Is there a simple way to rebuild the library so that the linked functions use the intrinsic versions rather than the asm versions? I didn't see any obvious switches in the make files.

Eric

0 Rahul Prabhu over 13 years ago in reply to Eric Chin

TI__Guru** 116170 points

Eric,

The fastmath library does not archive the intrinsic versions, so there is no switch for selecting them. The easiest way to use them is to add cosdp_c.c, sindp_c.c and atandp_c.c as source files to your project, then scan your code for the trigonometric functions (sin, cos, atan) and replace them with the intrinsic APIs (cosdp_c, sindp_c, atandp_c).

But if you still wish to use the library, then you might have to remove the asm files from the library build project (given in the build folder ) and rebuild the library but that would still require you to rename the APIs in your code with the _c version of the RTS functions.

Best Regards,

Rahul

0 Eric Chin over 13 years ago in reply to Rahul Prabhu

Prodigy 120 points

Hi Rahul,

Because our C code is generated from matlab code, we'd like to avoid using the C calls. I'd like to attempt to rebuild the fastmath library so that the intrinsic functions get called at the linker step. I've already modified the make file to build and archive the correct files, but I need to tell the linker to override the rts functions. What is the best way to do the equivalent of the following asm code for the C functions?

.if __TI_EABI__

.asg divdp, _divdp

.endif

.global _divdp ; entry labels

.if (OVERRIDE_RTS = 1)

.global __divd

.endif

Are there any other complexities that we should be aware of?

thanks,

Eric

0 Rahul Prabhu over 13 years ago in reply to Eric Chin

TI__Guru** 116170 points

Eric,

You don't need to worry about the __TI_EABI__ symbol, as it is only used when you are building ELF binaries. To override the RTS functions, I think there are two ways to do this:

Method 1: (Minimum modifications to source)

Easiest way is to create a header and use #define to replace RTS function calls to intrinsic function calls.

For Eg: #define cos(x) cosdp_c(x)

Include the header in your source, to override the RTS functions

Important: In order for this to work you will have to use the --gcc option with the C6000 compiler.

If you are working in the CCS environment this option is available under Properties->C/C++Build->Settings->Compiler->Language Options
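Method 1 can be sketched in plain C as follows. The real cosdp_c comes from the fastMath sources. The version here is a hypothetical stand-in with a placeholder body and a call counter, so that the sketch compiles off-target and the redirection is observable. The --gcc requirement mentioned above is as stated in this post and is not exercised here.

```c
#include <math.h>

/* Hypothetical stand-in for the fastMath cosdp_c function. The body is a
 * placeholder, not the library's algorithm. */
static int cosdp_c_calls = 0;

double cosdp_c(double x)
{
    ++cosdp_c_calls;
    return 1.0 - 0.5 * x * x; /* placeholder approximation */
}

/* Contents of the override header: redirect cos() calls to cosdp_c(). */
#undef cos
#define cos(x) cosdp_c(x)

/* Source line 4638 from this thread, unchanged. After preprocessing it
 * calls cosdp_c instead of the RTS cos. */
double eml_a2(double eml_x)
{
    return -((cos(2.0 * eml_x) + 1.0) / 2.0 + 0.75);
}

/* Reports how many times the redirected function was called. */
int cosdp_c_call_count(void)
{
    return cosdp_c_calls;
}
```

The macro only affects translation units that include the header after math.h, so generated sources that cannot be edited would need the header injected some other way.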

Method 2: (No modification to your source code but change in source of the intrinsic functions)

Rename the intrinsic functions to match the RTS function names/symbols in their source file before rebuilding library and continue to link the library the way you have been doing earlier (prelink the fastMath library before the RTS library)

I have not tried these solutions myself but I am reasonably confident this will work. Please let us know how this goes.

Regards,

Rahul

0 Eric Chin over 13 years ago in reply to Rahul Prabhu

Prodigy 120 points

Thanks Rahul,

I ended up doing Method 2 last night and that seemed to work smoothly. I'm in the process of testing our system with the intrinsic c library to see if there are any problems.

We're assuming that the C code will be more reliable than the hand asm code since there's less chance of subtle errors. However, we're a little worried that these functions haven't been vetted as much since most people probably just link the asm functions. How much testing has the intrinsic library gone through? Can we be confident that the library is safe? Given the problems that we've had so far, we're a little paranoid at the moment.

One more question: given the significant increase in performance for the fastmath library, why isn't this code used in the default RTS?

Also, there's an error in line 142 of the c67xmathlib/build/Makefile: DIVDP should be DIVSP.

Thanks,

Eric

0 Rahul Prabhu over 13 years ago in reply to Eric Chin

TI__Guru** 116170 points

Eric,

The testing for each of the intrinsic functions in fastMath was done by creating stand-alone, single-threaded applications in which these functions are checked for accuracy and performance over 100 runs. In terms of accuracy, the optimized C and assembly versions are bit exact, and I have provided the difference in performance estimates in an earlier post.

Since these functions can also be used in an ARM+DSP implementation, we have also checked calling these functions from the ARM side over dsplink in a cache-enabled implementation on the DSP where data comes from external memory. It is unfortunate that you had to work through this issue, but there are several other active developers working with this library and this is the first major issue that has been reported with it so far.

The fastMath functions remove some control code and checks from the RTS functions to improve efficiency, rewrite some parts of the code to allow it to be pipelined, and cut down on the terms in the expansion series of some functions as a tradeoff between accuracy and performance. Also, as you may have seen, not all the RTS functions are rewritten in this library, so it continues to make sense to provide the normal RTS library with the full functionality and to provide a subset of optimized functions that are generally used in math-intensive real-time applications.

Thanks for your patience in dealing with this issue. Looking forward to hearing from you after you complete your testing.

Best Regards,

Rahul

0 Todd Hahn over 13 years ago in reply to Eric Chin

TI__Expert 3455 points

Eric Chin said:

One more question: given the significant increase in performance for the fastmath library, why isn't this code used in the default RTS?

It is my understanding (correct me if I am wrong Rahul) that the fastMath library does not handle certain exceptional cases in some instances. e.g. Overflow, NaN, etc. It is also less precise in some/many cases.

0 Paul Newton over 13 years ago in reply to Todd Hahn

Intellectual 340 points

I have spent a long time trying to work out why a stand alone test program that a colleague wrote ran successfully, whilst when I ported it to run in a DSP/BIOS thread, I would get exceptions.

I held back from posting to the forum for some time as I did not want my first post to be a cry wolf!

http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/p/133991/482799.aspx#482799

In my case the exceptions occurred when sindp was called with an argument of zero, and the result was passed to another function.

For example:

x = sqrt(sin(y));

would cause the NMI HWI exception if y was zero.

I found that breaking the call down fixed the problem:

x = sin(y);
x = sqrt(x);

does not cause the exception when y is zero.

I am assuming that the compiler assigns registers differently in a DSP/BIOS environment, as the stand alone non-DSP/BIOS version did not have any apparent issues.

I have just rebuilt the c67xmathlib_2_01_00_00 library using the suggested changes courtesy of Geoff and Eric.

(I added a single nop to sindp.asm and cosdp.asm.)

I now have working code.

I am aware that I may encounter other bugs in the library if my code just happens to use the wrong combination of data at runtime.

For anyone following this post with the same issue wondering whether to edit the source code or edit the library, I would like to say that rebuilding the library was trivial.
It will take you approximately 5 minutes to rebuild the library and then update your own project.
In your own project it does not involve editing any header files or source files. E.g. no search and replace.

On the down side, there may be other bugs in the library that are missed - so use caution making your decision.

In CCS:

select Project->Import Existing,
browse to the build folder in the c67xmathlib_2_01_00_00 library on your harddisc
import the project,
edit the files (see Geoff's post above - 09-01-2011 5:07 PM)
build the project.

Note that the resulting library is placed in the folder C:\c67xmathlib_2_01_00_00\build\C674x and is called c674xfastMath_rebuild.lib

Hence you need to modify your own project(s) to link in the rebuilt library:

Open the Build Properties -> C/C++ Build->C6000 Linker -> File Search Path

locate the entry for the original library (by default this would be "C:\c67xmathlib_2_01_00_00\lib\c674xfastMath.lib")
replace it with the new path and name (by default "C:\c67xmathlib_2_01_00_00\build\C674x\c674xfastMath_rebuild.lib")

If you are using DSP/BIOS you will (no doubt) have added the library into the Build Properties -> CCS Build->Link Order tab.
(If you have not done this then you will not be using the library correctly as DSP/BIOS inserts the standard rts library into the linker command file prior to any user specified libraries.)
So you will need to check that that is also updated.

I would like to take this opportunity to thank Geoff and Eric for posting the assembler fix.

Much appreciated.

Paul

0 Eric Chin over 13 years ago in reply to Rahul Prabhu

Prodigy 120 points

For anybody following this post,

We tested the C intrinsic functions from the fastmath library as a replacement for the assembly version and found a number of serious problems, particularly around the edge cases. We've contacted TI regarding these issues, so hopefully there will be an updated library soon. In the mean time, I do not recommend using the C intrinsic or vectorized functions in the fastmath api.

Eric
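For readers who want to screen a replacement math function before trusting it, a small comparison harness along these lines may help. It is a generic sketch, not the test bench shipped with the library. All names are hypothetical, and the two sample functions are simple polynomials chosen so the example needs no math library at all.

```c
#include <stddef.h>

typedef double (*unary_fn)(double);

/* Counts inputs where candidate and reference differ by more than tol.
 * Any comparison involving NaN or infinity is counted as a mismatch so
 * that non-finite results are always flagged for manual review. */
size_t count_mismatches(unary_fn candidate, unary_fn reference,
                        const double *inputs, size_t n, double tol)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        double diff = candidate(inputs[i]) - reference(inputs[i]);
        if (diff < 0.0) {
            diff = -diff;
        }
        if (!(diff <= tol)) { /* also true when diff is NaN */
            ++bad;
        }
    }
    return bad;
}

/* Sample candidate: small-angle shortcut, sin(x) ~ x. */
double small_angle_sine(double x) { return x; }

/* Sample reference: two-term series, sin(x) ~ x - x^3/6. */
double cubic_sine(double x) { return x - (x * x * x) / 6.0; }

/* Compares the two samples at zero, at a tiny input and at 0.5. */
size_t demo_mismatches(void)
{
    const double inputs[] = { 0.0, 1e-9, 0.5 };
    return count_mismatches(small_angle_sine, cubic_sine,
                            inputs, sizeof inputs / sizeof inputs[0], 1e-12);
}
```

In practice the candidate would be the library function under test and the reference the standard RTS function, with the input table covering zero, very small and very large arguments, and the boundaries of any argument-reduction ranges.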
