Tool/software: Code Composer Studio
Hello Champs,
One of my customers reports that the MPY code in their ISR takes much longer with the CCS (TI) compiler than with the IAR compiler.
I tried to reproduce the issue but could not. I asked them to provide the source code, but they cannot share it for security reasons.
1. Instead, they sent me only the relevant C++ code together with its disassembly and the map file. I am attaching them.
This code runs in the DMA ISR, and the DMA is triggered when the 3-channel SD24 conversion completes.
Their source code is C++ (the file extension is .cpp).
The compiler is set to use the F5 hardware multiplier (MPY32). Specifically, they set the following options:
- MSP430 Compiler - Optimization - "--use_hw_mpy=F5"
- MSP430 Compiler - Advanced Options - Advanced Optimization - "--disable_interrupts_around_hw_mpy=On" (needed because several interrupts use the MPY)
- MSP430 Linker - Advanced Options - Symbol Management - "--use_hw_mpy=F5"
Their measurement shows over 1000 cycles to execute the three C/C++ statements below. I don't remember the exact numbers, but it was roughly 32, 32, and 1600-odd cycles, respectively.
*(int32_t *)(&MPYS32L) = adc_buf[BSP_METROLOGY_AFE_VOLTAGE_A_CHANNEL];
*(int32_t *)(&OP2L) = *(int32_t *)(&MPYS32L);
afe_data->V_RMS_Src += *(int64_t *)(&RES0);
With the same source code, the IAR build took around 365 cycles in total for the three statements.
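The three statements appear to implement a sum-of-squares accumulation for the RMS calculation. The same sample is written to MPYS32L/H (the first operand, signed mode) and to OP2L/H (the second operand, whose write starts the multiply), and the 64-bit product in RES0..RES3 is then added to V_RMS_Src. The sketch below models that sequence in portable C so it can run on a host. The mpy32_model type and the helper functions are hypothetical, and the sketch assumes adc_buf holds signed 32-bit samples.

```c
#include <stdint.h>

/* Hypothetical host-side model of the MPY32 registers the ISR touches.
   On the MSP430 these are memory-mapped peripheral registers, not a struct. */
typedef struct {
    int32_t mpys32; /* MPYS32L/MPYS32H: first operand, signed multiply      */
    int32_t op2;    /* OP2L/OP2H: second operand; writing it starts the multiply */
    int64_t res;    /* RES0..RES3: 64-bit signed product                      */
} mpy32_model;

/* Writing OP2 triggers a signed 32 x 32 -> 64-bit multiply. */
static void mpy32_write_op2(mpy32_model *m, int32_t value)
{
    m->op2 = value;
    m->res = (int64_t)m->mpys32 * (int64_t)value;
}

/* The three ISR statements, one per line, expressed against the model. */
static void accumulate_square(mpy32_model *m, int64_t *acc, int32_t sample)
{
    m->mpys32 = sample;            /* *(int32_t *)(&MPYS32L) = sample;              */
    mpy32_write_op2(m, m->mpys32); /* *(int32_t *)(&OP2L) = *(int32_t *)(&MPYS32L); */
    *acc += m->res;                /* V_RMS_Src += *(int64_t *)(&RES0);             */
}

/* Portable equivalent without the peripheral. */
static int64_t accumulate_square_portable(int64_t acc, int32_t sample)
{
    return acc + (int64_t)sample * sample;
}
```

accumulate_square_portable shows that the whole sequence computes acc += sample * sample. It may serve as a baseline when comparing the code each compiler generates for the accumulation step.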
2. I tried to reproduce it with CCSv8 by using a simple periodic timer and placing the same code inside its ISR.
It took only 17 cycles with a local variable, or 36 cycles with a global variable.
Q) How can they modify the source code to reduce the cycle count? Could you suggest some modifications?
I agree that accessing the data through a structure pointer takes some extra assembly instructions, but that does not explain such a large difference between CCS and IAR.
cf) They also have an issue with IAR: it cannot use more than 4KB of RAM on the F6736, which has 8KB of RAM. So they are having trouble with both CCS and IAR.
<C code with disassembly>
625 *(int32_t *)(&MPYS32L) = adc_buf[BSP_METROLOGY_AFE_VOLTAGE_A_CHANNEL];
01203e: 41A2 04D4 MOV.W @SP,&MPY_32__Multiplier__32_Bit_Mode_MPYS32L
012042: 4192 0002 04D6 MOV.W 0x0002(SP),&MPY_32__Multiplier__32_Bit_Mode_MPYS32H
626 *(int32_t *)(&OP2L) = *(int32_t *)(&MPYS32L);
012048: 4292 04D4 04E0 MOV.W &MPY_32__Multiplier__32_Bit_Mode_MPYS32L,&MPY_32__Multiplier__32_Bit_Mode_OP2L
01204e: 4292 04D6 04E2 MOV.W &MPY_32__Multiplier__32_Bit_Mode_MPYS32H,&MPY_32__Multiplier__32_Bit_Mode_OP2H
627 afe_data->V_RMS_Src += *(int64_t *)(&RES0);
012054: 013F 002C MOVA 0x002c(SP),R15
012058: 0F71 0044 MOVA R15,0x0044(SP)
01205c: 00AF 001F ADDA #0x0001f,R15
012060: 0F71 0048 MOVA R15,0x0048(SP)
012064: 4F58 0003 MOV.B 0x0003(R15),R8
012068: 4309 CLR.W R9
01206a: 430A CLR.W R10
01206c: 430B CLR.W R11
01206e: 403C 0018 MOV.W #0x0018,R12
012072: 13B1 DF8A CALLA #__mspabi_sllll
012076: 4C81 0034 MOV.W R12,0x0034(SP)
01207a: 4D81 0036 MOV.W R13,0x0036(SP)
01207e: 4E81 0038 MOV.W R14,0x0038(SP)
012082: 4F81 003A MOV.W R15,0x003a(SP)
012086: 013F 0048 MOVA 0x0048(SP),R15
01208a: 4F58 0001 MOV.B 0x0001(R15),R8
01208e: 4309 CLR.W R9
012090: 430A CLR.W R10
012092: 430B CLR.W R11
012094: 423C MOV.W #8,R12
012096: 13B1 DF8A CALLA #__mspabi_sllll
01209a: 013B 0048 MOVA 0x0048(SP),R11
01209e: 4B64 MOV.B @R11,R4
0120a0: 4305 CLR.W R5
0120a2: 4306 CLR.W R6
0120a4: 4307 CLR.W R7
0120a6: DC04 BIS.W R12,R4
0120a8: DD05 BIS.W R13,R5
0120aa: DE06 BIS.W R14,R6
0120ac: DF07 BIS.W R15,R7
0120ae: 0BCF MOVA R11,R15
0120b0: 4F5F 0002 MOV.B 0x0002(R15),R15
0120b4: DF05 BIS.W R15,R5
0120b6: D306 BIS.W #0,R6
0120b8: D307 BIS.W #0,R7
0120ba: D304 BIS.W #0,R4
0120bc: D114 0034 BIS.W 0x0034(SP),R4
0120c0: D115 0036 BIS.W 0x0036(SP),R5
0120c4: D116 0038 BIS.W 0x0038(SP),R6
0120c8: D117 003A BIS.W 0x003a(SP),R7
0120cc: 0BCF MOVA R11,R15
0120ce: 4F5F 0004 MOV.B 0x0004(R15),R15
0120d2: DF06 BIS.W R15,R6
0120d4: D305 BIS.W #0,R5
0120d6: D307 BIS.W #0,R7
0120d8: D304 BIS.W #0,R4
0120da: 0BCF MOVA R11,R15
0120dc: 4F58 0005 MOV.B 0x0005(R15),R8
0120e0: 4309 CLR.W R9
0120e2: 430A CLR.W R10
0120e4: 430B CLR.W R11
0120e6: 403C 0028 MOV.W #0x0028,R12
0120ea: 13B1 DF8A CALLA #__mspabi_sllll
0120ee: DC04 BIS.W R12,R4
0120f0: DD05 BIS.W R13,R5
0120f2: DE06 BIS.W R14,R6
0120f4: DF07 BIS.W R15,R7
0120f6: 013F 0048 MOVA 0x0048(SP),R15
0120fa: 4F5F 0006 MOV.B 0x0006(R15),R15
0120fe: DF07 BIS.W R15,R7
012100: D305 BIS.W #0,R5
012102: D306 BIS.W #0,R6
012104: D304 BIS.W #0,R4
012106: 013F 0048 MOVA 0x0048(SP),R15
01210a: 4F58 0007 MOV.B 0x0007(R15),R8
01210e: 4309 CLR.W R9
012110: 430A CLR.W R10
012112: 430B CLR.W R11
012114: 403C 0038 MOV.W #0x0038,R12
012118: 13B1 DF8A CALLA #__mspabi_sllll
01211c: DC04 BIS.W R12,R4
01211e: DD05 BIS.W R13,R5
012120: DE06 BIS.W R14,R6
012122: DF07 BIS.W R15,R7
012124: 5214 04E4 ADD.W &MPY_32__Multiplier__32_Bit_Mode_RES0,R4
012128: 6215 04E6 ADDC.W &MPY_32__Multiplier__32_Bit_Mode_RES1,R5
01212c: 6216 04E8 ADDC.W &MPY_32__Multiplier__32_Bit_Mode_RES2,R6
012130: 6217 04EA ADDC.W &MPY_32__Multiplier__32_Bit_Mode_RES3,R7
012134: 013F 0044 MOVA 0x0044(SP),R15
012138: 00AF 001F ADDA #0x0001f,R15
01213c: 44CF 0000 MOV.B R4,0x0000(R15)
012140: 1084 SWPB R4
012142: 44CF 0001 MOV.B R4,0x0001(R15)
012146: 45CF 0002 MOV.B R5,0x0002(R15)
01214a: 1085 SWPB R5
01214c: 45CF 0003 MOV.B R5,0x0003(R15)
012150: 46CF 0004 MOV.B R6,0x0004(R15)
012154: 1086 SWPB R6
012156: 46CF 0005 MOV.B R6,0x0005(R15)
01215a: 47CF 0006 MOV.B R7,0x0006(R15)
01215e: 1087 SWPB R7
012160: 47CF 0007 MOV.B R7,0x0007(R15)
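Reading the listing for line 627: the afe_data pointer appears to be loaded from 0x2C(SP), and ADDA #0x1F produces the address of V_RMS_Src. Each of that field's eight bytes is then fetched with MOV.B and merged using calls to __mspabi_sllll (the RTS 64-bit left shift) and BIS. RES0..RES3 are added with ADD/ADDC, and the sum is written back one byte at a time. MSP430 word accesses must use even addresses, so a compiler has to fall back to byte accesses for an int64_t that sits at an odd offset such as 0x1F. One hypothesis, which this thread does not confirm, is that the structure behind afe_data is packed. The portable C sketch below reproduces that byte-wise update next to its aligned equivalent. V_RMS_SRC_OFFSET and all function names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical offset of V_RMS_Src inside the record, taken from the
   "ADDA #0x0001f,R15" in the listing. */
#define V_RMS_SRC_OFFSET 0x1Fu

/* Rebuild a little-endian 64-bit value from 8 bytes. The listing does the
   same job with MOV.B loads, __mspabi_sllll shifts and BIS (OR) steps. */
static uint64_t load_le64_bytewise(const uint8_t *p)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < 8u; ++i)
        v |= (uint64_t)p[i] << (8u * i);
    return v;
}

/* Store the value back one byte at a time (the MOV.B/SWPB sequence). */
static void store_le64_bytewise(uint8_t *p, uint64_t v)
{
    for (unsigned i = 0; i < 8u; ++i)
        p[i] = (uint8_t)(v >> (8u * i));
}

/* Byte-wise "V_RMS_Src += product" on a raw record image, as a compiler
   must do when the field sits at an odd address. Unsigned arithmetic gives
   the same two's-complement wraparound as ADD/ADDC. */
static void accumulate_misaligned(uint8_t *record, int64_t product)
{
    uint8_t *field = record + V_RMS_SRC_OFFSET;
    store_le64_bytewise(field, load_le64_bytewise(field) + (uint64_t)product);
}

/* With a naturally aligned accumulator the same update is a plain add. */
static void accumulate_aligned(int64_t *acc, int64_t product)
{
    *acc += product;
}
```

If the hypothesis holds, placing V_RMS_Src at an even offset (for example, by reordering members or removing the packing) might let the compiler use word accesses with ADD/ADDC directly. This would need to be confirmed with an actual test case.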
<map file>
Best Regards,
Ernest Cho
Even though I have to guess at a few things, I think I have a reasonable idea of what is happening. To address this issue directly, we must have a test case. However, I am confident we can reduce this test case to the point where your customer is willing to submit it.
This statement ...
Ernest Cho said: afe_data->V_RMS_Src += *(int64_t *)(&RES0);
... generates a large number of instructions. Thus, it must be the case that the type of V_RMS_Src is a complicated user-defined class, and that this class includes a large function for operator +=. That being the case, there is no way we can reproduce this issue by guessing at the source code. The customer must submit a test case, or nothing more can be done.
The key question appears to be: why does this statement make out-of-line calls to the compiler RTS (the listing shows repeated calls to __mspabi_sllll, a 64-bit shift helper) instead of simply adding RES0 through RES3 to V_RMS_Src? Once we have a test case, I am confident we can answer that. It is a good guess that some change in the source code or build options would help. Or perhaps a change in the compiler is needed.
Ernest Cho said: I asked them to support source code but they can't due to the security issue.
I understand. But I think they will be pleasantly surprised at how little we need in the test case. For one thing, we do not need the entire project. For the one source file that contains the problem C++ statements, please submit a test case as described in the article "How to Submit a Compiler Test Case". Note that you can use the option --preproc_only so that comments are not in the file.
If even that is too much, consider this variation. Before preprocessing the source file, surround the unrelated parts of the code like this:
#if 0
/* unrelated code here */
#endif
Then perform the preprocessing step. The unrelated lines do not appear in the resulting preprocessed file. Make sure the problem assembly code is still generated.
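To illustrate the trimming step, a reduced source file might look like the sketch below before it is run through --preproc_only. Unrelated code sits inside #if 0 / #endif, and only the code that produces the problem assembly remains. The afe_record type, unrelated_function, and accumulate are hypothetical placeholders. A real test case should keep the customer's own declarations so the compiler generates the same code.

```c
#include <stdint.h>

/* Hypothetical record type; a real test case would keep the customer's
   actual declaration (including any packing) unchanged. */
struct afe_record {
    int64_t V_RMS_Src;
};

#if 0
/* Unrelated code is disabled here, so it does not appear in the file
   produced by --preproc_only. */
void unrelated_function(void)
{
}
#endif

/* Only the code that produces the problem assembly is kept. */
void accumulate(struct afe_record *afe_data, int64_t product)
{
    afe_data->V_RMS_Src += product;
}
```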
Another step that may help is to obfuscate the code. Perform an internet search on the term "c++ code obfuscator" and use one of those tools.
Note that the article says to attach the test case to a forum post, but you are welcome to send it to me by private message within the forum or by email. No matter how you send it, be sure to include the compiler version and all the build options.
Thanks and regards,
-George
Since it has been a while, I presume you have resolved your problem. I'd appreciate hearing how you resolved it.
Thanks and regards,
-George