XDC and trigraphs (more generally: XDC and '\')

Since the Stone Age, the C standard has included so-called "trigraphs" to stand in for characters ('#', '[', ']', '{', '}', '\', '|', '~', '^') that are missing from some very limited character sets. Such a character can be written as a trigraph instead, e.g. '#' as '??='.
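For reference, the nine trigraphs of the C standard fit in a small table. The sketch below is written in JavaScript only because that is the language of *.cfg scripts; it mimics what a C preprocessor does when trigraph support is enabled. The names TRIGRAPHS and expandTrigraphs are made up for this illustration and are not part of XDC or of any compiler.

```javascript
// The nine C trigraphs: the character after "??" mapped to its replacement.
const TRIGRAPHS = new Map([
  ["=", "#"], ["(", "["], ["/", "\\"], [")", "]"], ["'", "^"],
  ["<", "{"], ["!", "|"], [">", "}"], ["-", "~"],
]);

// Hypothetical helper: replace every trigraph, scanning left to right.
function expandTrigraphs(text) {
  return text.replace(/\?\?([=(\/)'<!>\-])/g, (match, third) => TRIGRAPHS.get(third));
}

console.log(expandTrigraphs("??=define X"));           // #define X
console.log(expandTrigraphs("Now it works... (???)")); // Now it works... (?]
```

The second call shows why the compiler complains about the test string: the last two question marks together with the closing parenthesis form the trigraph '??)'.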

I am not sure whether anybody (or any character set, keyboard, or compiler) still needs these trigraphs. I just stumbled over them when setting a test string in a module's *.cfg file:

Log.text = "Now it works... (???)";

Afterwards the compiler found the trigraph '??)' and warned:

warning: trigraph ??) ignored, use -trigraphs to enable

So I changed my text to this:

Log.text = "Now it works... (?\?\?)";

But the C file generated by XDC again contains the string "Now it works... (???)", and the compiler warns about it again.

Of course, I could have suppressed this warning without any serious consequences.

But it felt like this might be a small (design) error in XDC's string handling. Shouldn't XDC pay attention to this small, old element of the C standard, perhaps depending on the chosen compiler's behaviour? Or, at least, leave the '\?' unchanged?

I wrote '\\?' in the config, and this is translated to '\?' in the generated C file. Then came the big surprise for me: consistently, it translated '\n' to a plain line break (without a trailing '\' to continue the line) inside the string in the C file, which of course produces a compiler error.

Now, is it documented anywhere that sequences starting with '\' are translated by the XDC framework when it generates C files, and that this is why e.g. a '\n' must be coded there as '\\n'? And is that behaviour really intended?

Best regards,
Joern.

 

  • The .cfg script is JavaScript and obeys JavaScript's rules for strings and special characters. JavaScript interprets "\n" as a newline character. To insert a literal backslash in your string you need to escape it, as you found, with "\\". So to define the string C:\windows in JavaScript, you would write "C:\\windows". If you want a newline escape to appear in the generated C string, I think you would need to write "My line of text\\n" so that JavaScript doesn't interpret the \n as a newline.
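A minimal sketch of the difference, using plain JavaScript with no XDC objects involved (the variable names are only for illustration):

```javascript
// What the script author types versus what the string actually contains.
const path = "C:\\windows";            // one real backslash in the value
const newline = "My line of text\n";   // ends in a real line feed (code 10)
const literal = "My line of text\\n";  // ends in a backslash followed by 'n'

console.log(path);                                    // C:\windows
console.log(path.length);                             // 10
console.log(newline.charCodeAt(newline.length - 1));  // 10
console.log(literal.slice(-2));                       // \n
```

Only the third form hands the two characters '\' and 'n' on to whatever consumes the string afterwards.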

    Chris

  • Joern said:

    I am not sure whether anybody (or any character set, keyboard, or compiler) still needs these trigraphs. I just stumbled over them when setting a test string in a module's *.cfg file:

    Log.text = "Now it works... (???)";

    Afterwards the compiler found the trigraph '??)' and warned:

    warning: trigraph ??) ignored, use -trigraphs to enable

    So I changed my text to this:

    Log.text = "Now it works... (?\?\?)";

    But the C file generated by XDC again contains the string "Now it works... (???)", and the compiler warns about it again.

    Right.  *.cfg files are JavaScript files and follow the syntax specified by ECMA-262 (see http://www.ecma.ch/ecma1/STAND/ECMA-262.HTM).  In this case, "(?\?\?)" is interpreted as "(???)" and this is written to the generated C file without the embedded backslashes.  When you run gcc -Wall, for example, you will see the warning about the ignored trigraph.
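This can be checked in any JavaScript engine, independently of XDC. A short sketch (variable names chosen only for illustration):

```javascript
// In a JavaScript string literal, '\?' is just '?': the backslash is dropped.
const plain = "Now it works... (???)";
const escaped = "Now it works... (?\?\?)";
console.log(plain === escaped);        // true
console.log(escaped.includes("\\"));   // false

// Doubling the backslash keeps one backslash in the value.
const doubled = "Now it works... (?\\?\\?)";
console.log(doubled);                  // Now it works... (?\?\?)
```

The last value is what a C compiler needs to see inside the literal in order not to recognize a trigraph.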

    Joern said:

    Of course, I could have suppressed this warning without any serious consequences.

    But it felt like this might be a small (design) error in XDC's string handling. Shouldn't XDC pay attention to this small, old element of the C standard, perhaps depending on the chosen compiler's behaviour? Or, at least, leave the '\?' unchanged?

    Unfortunately not without violating ECMA-262.  We intentionally opted _not_ to invent a new language for configuration scripts and decided to stick to a standard scripting language and simply add configuration objects.  This allows one to pick up any JavaScript language reference to answer questions like the ones below.  It also allows us to leverage existing high-quality JavaScript implementations.

    Joern said:

    I wrote '\\?' in the config, and this is translated to '\?' in the generated C file. Then came the big surprise for me: consistently, it translated '\n' to a plain line break (without a trailing '\' to continue the line) inside the string in the C file, which of course produces a compiler error.

    Now, is it documented anywhere that sequences starting with '\' are translated by the XDC framework when it generates C files, and that this is why e.g. a '\n' must be coded there as '\\n'? And is that behaviour really intended?

    The translation is a consequence of JavaScript interpreting '\n' in strings the way C does: it replaces the two-character escape sequence with a single newline character.  When XDC creates the generated C file, it does not perform _any_ translation of the string supplied by JavaScript, so the raw newline ends up inside the C string literal.

    Since it _is_ possible for the generation portion of XDC to detect these characters _before_ generating the C file, it should replace them with the appropriate escape sequence.  In general, the generated C file should _always_ compile with an ANSI C compiler.  This is a bug, which I just filed here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=350109.
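To make the idea concrete, here is a rough sketch of such a replacement step. It is not XDC's implementation and says nothing about how the bug will actually be fixed; the names C_ESCAPES and toCStringLiteral are hypothetical, and characters outside ASCII are passed through untouched, which a real generator would have to decide about.

```javascript
// Hypothetical sketch: turn an already evaluated JavaScript string into the
// text of a C string literal that compiles and contains no trigraphs.
const C_ESCAPES = new Map([
  ["\\", "\\\\"], ["\"", "\\\""], ["\n", "\\n"], ["\r", "\\r"],
  ["\t", "\\t"],
  ["?", "\\?"],   // '\?' is a valid C escape and breaks up any '??x' sequence
]);

function toCStringLiteral(value) {
  let out = "";
  for (const ch of value) {
    const code = ch.codePointAt(0);
    if (C_ESCAPES.has(ch)) {
      out += C_ESCAPES.get(ch);
    } else if (code < 0x20 || code === 0x7f) {
      // Remaining control characters: three-digit octal escape.
      out += "\\" + code.toString(8).padStart(3, "0");
    } else {
      out += ch;  // assumption: everything else is emitted unchanged
    }
  }
  return "\"" + out + "\"";
}

console.log(toCStringLiteral("Now it works... (???)\n"));
// "Now it works... (\?\?\?)\n"
```

Escaping every '?' is the bluntest way to avoid trigraphs; escaping only the second '?' of each '??' pair would also work and produce shorter output.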

    As you've already noted, the workaround is to suppress the translation that JavaScript applies to string literals by adding an extra '\'.  How do you know what to escape?  Basically, any use of '\' that should appear in the C literal: the extra '\' prevents JavaScript from doing any translation before we generate the .c file, so XDC sees just one '\', which is exactly what you want in front of the special characters.

    I hope this helps.

    dave

     

  • Dave, thanks a lot for your detailed explanation. I think you have turned this thread into a genuinely helpful resource.

    Thanks for the link to ECMA-262 (obviously I had overlooked some information on the RTSC pages), and thanks in general for the clear statement of your decision not to use a language that is merely similar to JavaScript or JScript, but to stay compliant with ECMA-262.

    But I am happiest that the problem with '\n' (which leads to uncompilable code) has now been filed as a bug report.

    And I think the handling of '\' should be thought through anew while this bug report is being worked on:

    • The XDC framework is meant (as far as I understand, and as I would want) to make it easier to handle given modules and to switch target platforms. That is why it should warn about detectable problems in the config scripts, because the resulting errors are always harder to trace back to their actual root cause.
    • I agree with all the wants (and needs) of standard compliance, so '\\' should probably remain the right way to put a '\' into a string variable. But it should perhaps be noted more explicitly in the documentation, and XDC should certainly warn in situations that lead to errors.
    • A "translation" is done in other cases too, e.g. numbers are translated into their hex values in the C code. The problem with '\' is probably that the XDC mechanisms process the config files with standard ECMA-262 handlers and so have to analyze the situation from the resulting string, not from what the user typed. That makes things somewhat more difficult, not least because the user cannot be warned about a probably forgotten '\': every '\' before a non-escape character ("The CV of CharacterEscapeSequence :: NonEscapeCharacter is the CV of the NonEscapeCharacter.", ECMA-262 p. 34) is eaten. Escape sequences like '\t' should be considered as well and, more generally, the way to input (and correctly translate) Unicode characters.
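As a rough idea of what such a warning could look like when only the resulting string is available, here is a hypothetical check in JavaScript. The function name and the message texts are invented for this sketch and do not exist in XDC.

```javascript
// Hypothetical check on the evaluated string: report raw control characters
// and trigraph sequences that would cause trouble in a generated C literal.
function findStringWarnings(value) {
  const warnings = [];
  for (const m of value.matchAll(/[\x00-\x1f\x7f]/g)) {
    const hex = m[0].charCodeAt(0).toString(16).padStart(2, "0");
    warnings.push("control character 0x" + hex + " at index " + m.index);
  }
  for (const m of value.matchAll(/\?\?[=(\/)'<!>\-]/g)) {
    warnings.push("trigraph " + m[0] + " at index " + m.index);
  }
  return warnings;
}

console.log(findStringWarnings("Now it works"));  // []
console.log(findStringWarnings("a\nb"));          // [ 'control character 0x0a at index 1' ]
console.log(findStringWarnings("(?\?\?)"));       // [ 'trigraph ??) at index 2' ]
```

The last call illustrates the limitation described above: the check sees "(???)" and cannot tell that the author had already tried to escape the question marks.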

    However, until such an approach has been thought through and developed, strictly using '\\' instead of '\' will be the solution.

    Thanks, again,
    Joern.