CCS/AM5728: Shift-JIS code support in CCS

shigehiro tsuda

Part Number: AM5728

Tool/software: Code Composer Studio

Hi,

Japanese text (Shift-JIS encoded) in character strings and comments is not recognized correctly.

If a 2-byte character has 0x5C as one of its bytes, that byte is treated as the start of an escape sequence,
and an error occurs when compiling the project.
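
To illustrate the failure mode described above, here is a minimal sketch of how a byte-oriented, ASCII-only scanner might look for the end of a string literal. It is not the TI parser; `find_closing_quote` and the two sample arrays are hypothetical names used only for this illustration. Shift-JIS characters such as 0x81 0x5C, 0x83 0x5C (ソ) and 0x95 0x5C (表) have 0x5C, the ASCII backslash, as their second byte.

```c
#include <stddef.h>

/* Hypothetical helper: src[0] is the opening double quote of a string
 * literal. Scans byte by byte, the way an ASCII-only lexer would, and
 * returns the index of the closing quote, or -1 if none is found. */
long find_closing_quote(const unsigned char *src, size_t len)
{
    size_t i = 1;                 /* skip the opening quote */

    while (i < len) {
        if (src[i] == 0x5C) {     /* backslash: skip the escaped byte too */
            i += 2;
            continue;
        }
        if (src[i] == 0x22)       /* unescaped double quote ends the literal */
            return (long)i;
        i++;
    }
    return -1;
}

/* The ASCII literal "A": 22 41 22 */
const unsigned char ascii_literal[] = { 0x22, 0x41, 0x22 };

/* One Shift-JIS character 0x81 0x5C between quotes: 22 81 5C 22 */
const unsigned char sjis_literal[] = { 0x22, 0x81, 0x5C, 0x22 };
```

For `ascii_literal` the scanner finds the closing quote at index 2. For `sjis_literal` the trailing 0x5C is taken as a backslash that escapes the closing quote, so no terminator is found, which matches the kind of compile error reported here.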

Our customers previously used CCS version 4.2. It seems that there was no problem at that time.
It seems that they were building with the "--multibyte_chars" option added.

Please tell me whether this problem can be avoided with a build option in a CCS version 7.4 project.

customer environment
CCS version 7.4.0
TI v16.9.6 LTS

Best Regards,
Shigehiro Tsuda

over 6 years ago

0 Naoki Kawada over 6 years ago

Guru 19890 points

Tsuda-san, I think it would help to post some example code that reproduces the issue, so that TI can investigate the problem.

Best Regards,

Kawada

0 George Mock over 6 years ago

TI__Guru**** 232670 points

Compilers from Texas Instruments only support US ASCII characters in all input. Please see this forum thread for more detail.

shigehiro tsuda said:
Our customers previously used CCS version 4.2. It seems that there was no problem at that time.

I don't know how to explain that.

Thanks and regards,

-George

0 Naoki Kawada over 6 years ago in reply to George Mock

Guru 19890 points

Hi George,

Thanks for your reply. My customer definitely requires SJIS support in CCS. They are using SJIS strings in their program, for example:

memcpy (buf, "SJIS characters", size);

There is so much code related to SJIS characters in their program that it is very difficult for them to modify the existing code (replacing SJIS strings with hex arrays). As Tsuda mentioned, their legacy code, which worked on CCSv4.2, has to be reused for a new K2G or AM57x based design, and that requires a newer CCS to develop their application with ProcSDK. I understand your policy on SJIS in CCS, but could you please let us know the latest version of CCS that supports SJIS characters?

Best Regards,

0 George Mock over 6 years ago in reply to Naoki Kawada

TI__Guru**** 232670 points

Naoki Kawada said:
please let us know the latest version of CCS to support SJIS characters ?

No version of CCS has ever supported SJIS characters.

Naoki Kawada said:
their legacy code worked on CCSv4.2

I cannot explain that.

Thanks and regards,

-George

0 Archaeologist over 6 years ago in reply to Naoki Kawada

TI__Guru* 84225 points

Please try converting the files to UTF-8. The TI compiler never officially supported JIS, but we have updated the parser so that it should accept UTF-8 characters in comments.

0 Naoki Kawada over 6 years ago in reply to Archaeologist

Guru 19890 points

Hi,

Thanks for your answers. OK, so what encoding is officially supported on newer CCS (v7 or later) with this particular version of the tool chain? How about MS932? My understanding is that it is very similar to SJIS. I need to visit the customer to explain this issue next Tuesday (Japan time).

Best Regards,
NK

0 Naoki Kawada over 6 years ago in reply to Naoki Kawada

Guru 19890 points

Do you have any response on this issue? I'm waiting for your reply.

Best Regards,

0 George Mock over 6 years ago in reply to Naoki Kawada

TI__Guru**** 232670 points

Naoki Kawada said:
what kind of encoding is officially supported on newer CCS (v7 or later) with this particular version of tool chain ?

As to what is officially supported: In all TI compilers, including the latest versions, only US ASCII characters are supported.

This next part does not describe an officially supported feature, but an extension you can consider. A text file encoded in UTF-8 can contain non-ASCII characters, but only in the comments. It is a matter of some debate whether this is useful. Suppose you make a mistake and a non-ASCII character is used in a string constant, variable name, or somewhere else other than a comment. The TI compiler cannot diagnose this mistake and tell you about it. It may silently ignore the character, or issue an unrelated error message, or something similar.

Thanks and regards,

-George

0 Naoki Kawada over 6 years ago in reply to George Mock

Guru 19890 points

Hi George,

Thanks for your reply. OK, TI compilers support only US ASCII encoding. So how about CCS? The CCS preferences (for example, in v7.1) show the following property. Please take a look at the red rectangle.

It seems MS932, or other encoding type can be selected. How should we think about that ? Compiler has been validated to work with US ASCII, right ? So should we select US ASCII encoding for proper operation on CCS ?

Best Regards,
NK

0 Naoki Kawada over 6 years ago in reply to Naoki Kawada

Guru 19890 points

Some additional questions from the customer about the --multibyte_chars build option.
Legacy TI ARM compilers, for example v5.2, used to support the --multibyte_chars build option, but it has been removed or deprecated in newer LTS tool chains. Can you share the reason why this option is no longer supported?

Best Regards,
NK

EDIT
Sorry to have troubled you so much, but the customer has to make a decision very soon to move forward. Please let me know if you have any additional suggestion for this issue other than modifying their code or using a supported encoding.

0 Archaeologist over 6 years ago in reply to Naoki Kawada

TI__Guru* 84225 points

The underlying source code for the parser was changed. The newer parser supports UTF-8 in some contexts by default, with no option needed.

0 Naoki Kawada over 6 years ago in reply to Archaeologist

Guru 19890 points

So in summary, newer CCS (say, v7 or later) supports UTF-8 without any options. The TI compiler, however, supports only US ASCII encoding for parsing the existing code properly. The customer will have to change the file encoding to UTF-8 and modify their code like this:

from
memcpy (buf, "――――――――", size);

to
unsigned short double_dash_array[8] = {0x815C,0x815C,0x815C,0x815C,0x815C,0x815C,0x815C,0x815C};
memcpy (buf, double_dash_array, sizeof(double_dash_array));

...Correct ?

Best Regards,
NK
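
One caveat about the 16-bit array proposed above: on a little-endian target, memcpy of the value 0x815C stores the byte 0x5C first and 0x81 second, which is the reverse of Shift-JIS byte order. A minimal sketch of a byte-array variant follows, assuming the receiving code expects Shift-JIS bytes with the lead byte first. The names `double_dash_bytes` and `copy_double_dashes` are hypothetical, and the source file itself stays pure ASCII.

```c
#include <stddef.h>
#include <string.h>

/* Eight copies of the Shift-JIS character 0x81 0x5C, stored byte by byte
 * with the lead byte first, so the memory layout does not depend on the
 * endianness of the target. */
const unsigned char double_dash_bytes[16] = {
    0x81, 0x5C, 0x81, 0x5C, 0x81, 0x5C, 0x81, 0x5C,
    0x81, 0x5C, 0x81, 0x5C, 0x81, 0x5C, 0x81, 0x5C
};

/* Hypothetical helper: copies the bytes into buf when they fit.
 * Returns the number of bytes copied, or 0 if buf is too small. */
size_t copy_double_dashes(unsigned char *buf, size_t buf_size)
{
    if (buf_size < sizeof double_dash_bytes)
        return 0;
    memcpy(buf, double_dash_bytes, sizeof double_dash_bytes);
    return sizeof double_dash_bytes;
}
```

Because the data is written as numeric constants, the file encoding no longer matters for this string, at the cost of readability.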

0 Archaeologist over 6 years ago in reply to Naoki Kawada

TI__Guru* 84225 points

The compiler version is not related to the CCS version. The CCS version should have nothing to do with what the TI compiler version accepts.

Multi-byte characters in string constants is a different matter. At this time, the TI compiler only officially supports ASCII characters in string constants.

0 George Mock over 6 years ago in reply to Naoki Kawada

TI__Guru**** 232670 points

Naoki Kawada said:
It seems MS932, or other encoding type can be selected. How should we think about that ?

Eclipse is a generic development environment that is the basis for more than just Code Composer Studio. I'm not sure what is affected by the setting titled Text file encoding. Probably the built-in text editor, and maybe some of the dialog boxes where you can type free form text.

As for what encoding you should use ... I have to get back to you on that. For now, I can tell you that the encoding used in my copy of CCS is named Cp1252. So that works for me. I suspect some of the other encodings will work as well, but I need to check.

Thanks and regards,

-George

0 Naoki Kawada over 6 years ago in reply to George Mock

Guru 19890 points

Hi George,

Thanks for your reply. Ok, please let me know if you have any update about CCS Text File Encoding.

I talked with the customer and they still insist on multibyte character support in the TI compiler. You mentioned that TI does not support multibyte characters in string constants (only ASCII), but the newer TI compiler documentation says the following:

www.ti.com/.../spnu151r.pdf

J.3.2 Environment

The compiler does not support multibyte characters in identifiers, so there is no mapping from multibyte characters to the source character set. However, the compiler accepts multibyte characters in comments, string literals, and character constants in the physical source file. (5.1.1.2)

I believe this statement also appears in the documentation for much older versions of the tool chain. How should we interpret it?

My customer strongly wants to know:
- What file encoding is actually supported for multibyte chars (for string constants)?
- Where is that documented?

They are strict about documentation. Based on the answers, they would like to consider the next step to close the issue.

Best Regards,
NK

0 George Mock over 6 years ago in reply to Naoki Kawada

TI__Guru**** 232670 points

It is now clear it will take a few days to determine the best reply. I'm sorry I didn't realize it before now. I didn't even understand how little I knew about international character sets. Please be patient.

Thanks and regards,

-George

0 Naoki Kawada over 6 years ago in reply to George Mock

Guru 19890 points

Hello George,

OK, we are waiting for your reply here. We could not find the compiler documentation for this particular version of the tool chain on the website, but my customer found it in the CCS help. Here are snapshots, just for your information.

Best Regards,
NK

0 George Mock over 6 years ago in reply to Naoki Kawada

TI__Guru**** 232670 points

It is clear the TI documentation about support for international characters is incorrect and lacking. I will start correcting it in this thread. I don't expect to get it completely right the first time. But I am confident we will get there. Once it has settled, I will file a bug report to request that the documentation be changed to be close to, if not the same as, what is in this thread.

The TI compiler never officially supported the shift JIS character set and encoding
C and C++ comments may contain any Unicode character
In all other contexts (strings, identifiers, assembly files, link command files, etc.) only 7-bit ASCII characters are supported
If a C or C++ source file contains anything other than 7-bit ASCII characters, it must be encoded in UTF-8
If a text file contains only 7-bit ASCII characters, other widely recognized encodings can be used, so long as they are 8-bit byte oriented. Examples include Cp1252 and ISO-8859-1.
Wide character (wchar_t) types and operations are supported. This cannot be taken to mean wide character strings may contain characters beyond 7-bit ASCII. The encoding of wide characters is 7-bit ASCII, 0 extended to the width of wchar_t (16 or 32 bits).
Should an extended character appear outside of a C or C++ comment, the behavior of the tools (compiler, assembler, linker, etc.) is not well defined. A diagnostic may or may not be issued. When a diagnostic is issued, it is often not directly related to the problem character(s). Whatever behavior you see in the tools, it is subject to change in a future release.
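
Since the tools may not issue a diagnostic for extended characters, one option is to check source files before building. The following is a minimal sketch of such a check, meant to run on file contents already read into memory. `first_non_ascii` is a hypothetical helper, and it does not distinguish comments from code, so a hit inside a UTF-8 comment is not necessarily a problem.

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte outside the
 * 7-bit ASCII range (0x00 to 0x7F), or -1 if every byte is 7-bit ASCII. */
long first_non_ascii(const unsigned char *data, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (data[i] > 0x7F)
            return (long)i;
    }
    return -1;
}
```

A return value of -1 means the file falls under the "7-bit ASCII only" case above, where the choice of 8-bit encoding does not matter.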

Some further related comments ...

In CCS, when you use extended characters in your C or C++ comments, the selection Preferences | General | Workspace | Text File Encoding should be UTF-8.

Older compilers supported the option --multibyte_chars. I have tried, and failed, to discover an example of what effect this option had. The documentation states ...

Enables support for multibyte character sequences in comments, string literals and character constants

That is probably an error.

It is possible to write code which can handle characters beyond 7-bit ASCII, and uses other encodings like shift JIS. But there is no built-in support in the compiler for it. All the details of input, output, initialization, display, etc. are left to the user.
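
As a rough illustration of what "left to the user" can mean, here is a sketch of user-level code that counts characters in a Shift-JIS byte sequence. The function names are hypothetical. The lead-byte ranges used (0x81 to 0x9F and 0xE0 to 0xFC) are the usual Shift-JIS ranges; the input is assumed to arrive as raw bytes, for example from a file or a byte array, not from a string literal in the source.

```c
#include <stddef.h>

/* Nonzero if b is the first byte of a two-byte Shift-JIS character. */
int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Counts characters (not bytes) in a Shift-JIS byte sequence of the
 * given length. A lead byte with no following byte counts as one
 * character. */
size_t sjis_char_count(const unsigned char *s, size_t len)
{
    size_t i = 0;
    size_t count = 0;

    while (i < len) {
        /* Two-byte character: consume lead and trail byte together. */
        i += (sjis_is_lead_byte(s[i]) && i + 1 < len) ? 2 : 1;
        count++;
    }
    return count;
}
```

Note that the 0x5C trail byte is consumed together with its lead byte here, so it is never mistaken for a backslash.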

Thanks and regards,

-George

0 Naoki Kawada over 6 years ago in reply to George Mock

Guru 19890 points

Hello George,

Thanks for your clarification. I passed this on to the customer and they basically understood your points. One thing I would like to ask: can we move this conversation to offline messages?

Best Regards,
NK

0 George Mock over 6 years ago in reply to Naoki Kawada

TI__Guru**** 232670 points

Concluding this thread ...

Conversations continued offline. The customer is looking into whether other ARM compilers have better support for shift JIS text.

Thanks and regards,

-George
