IAR Embedded Workbench for Arm 9.70.x

Text encodings

Text files read or written by IAR tools can use a variety of text encodings:

  • Raw

    This is a backward-compatibility mode for C/C++ source files. Only 7-bit ASCII characters can be used in symbol names. Other characters can only be used in comments, literals, etc. This is the default source file encoding if there is no Byte Order Mark (BOM).

  • The system default locale

    The locale that you have configured your Windows OS to use.

  • UTF-8

    Unicode encoded as a sequence of 8-bit bytes, with or without a Byte Order Mark.

  • UTF-16

    Unicode encoded as a sequence of 16-bit words using a big-endian or little-endian representation. These files always start with a Byte Order Mark.

In any encoding other than Raw, you can use Unicode characters of the appropriate kind (alphabetic, numeric, etc.) in the names of symbols.

When an IAR tool reads a text file with a Byte Order Mark, it will use the appropriate Unicode encoding, regardless of any options set for input file encoding.
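
The Byte Order Mark check described above can be pictured with a short C sketch. This is only an illustration of how a BOM identifies the encodings listed in this section, not how the IAR tools are implemented. The names bom_kind and detect_bom are hypothetical, and the sketch only considers the UTF-8 and UTF-16 marks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only; not part of any IAR API. */
typedef enum {
    BOM_NONE,      /* no Byte Order Mark found */
    BOM_UTF8,      /* bytes EF BB BF */
    BOM_UTF16_BE,  /* bytes FE FF */
    BOM_UTF16_LE   /* bytes FF FE */
} bom_kind;

/* Report which Byte Order Mark, if any, the buffer starts with.
   Only the encodings named in this section are considered. */
bom_kind detect_bom(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);

    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return BOM_UTF16_BE;
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return BOM_UTF16_LE;
    return BOM_NONE;
}
```

A file whose first bytes yield BOM_NONE falls under the rules for files without a Byte Order Mark.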

For source files without a Byte Order Mark, the compiler will use the Raw encoding, unless you specify the compiler option --source_encoding. See --source_encoding.

For source files without a Byte Order Mark, the assembler will use the Raw encoding unless you specify the assembler option --source_encoding.

For other text input files without a Byte Order Mark, such as extended command line files (.xcl files), the IAR tools will use the system default locale unless you specify the compiler option --utf8_text_in, in which case UTF-8 will be used. See --utf8_text_in.
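
Taken together, the input rules above form a small decision procedure. The following C sketch models it under stated assumptions: every type and function name is hypothetical, the options are represented as plain struct fields rather than parsed command lines, and the encoding requested through --source_encoding is simply passed in as a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the rules in this section; not an IAR API. */
typedef enum { ENC_RAW, ENC_LOCALE, ENC_UTF8, ENC_UTF16 } text_encoding;
typedef enum { MARK_NONE, MARK_UTF8, MARK_UTF16 } mark_kind;
typedef enum { INPUT_SOURCE, INPUT_OTHER_TEXT } input_kind;

typedef struct {
    bool has_source_encoding;       /* --source_encoding was specified */
    text_encoding source_encoding;  /* the encoding it requested */
    bool utf8_text_in;              /* --utf8_text_in was specified */
} encoding_options;

text_encoding select_input_encoding(mark_kind mark, input_kind kind,
                                    const encoding_options *opts)
{
    assert(opts != NULL);

    /* A Byte Order Mark wins over any option. */
    if (mark == MARK_UTF8)
        return ENC_UTF8;
    if (mark == MARK_UTF16)
        return ENC_UTF16;

    /* Source files without a BOM: Raw unless --source_encoding is given. */
    if (kind == INPUT_SOURCE)
        return opts->has_source_encoding ? opts->source_encoding : ENC_RAW;

    /* Other text input without a BOM: locale unless --utf8_text_in is given. */
    return opts->utf8_text_in ? ENC_UTF8 : ENC_LOCALE;
}
```

For example, a source file with a UTF-16 Byte Order Mark is read as UTF-16 in this model even when the options request something else.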

For compiler list files and preprocessor output, the same encoding as the main source file will be used by default. Other tools that generate text output will use the UTF-8 encoding by default. You can change this by using the compiler options --text_out and --no_bom. See --text_out and --no_bom.

Characters and string literals

When you compile source code, characters (x) and string literals (xx) are handled as follows:

'x', "xx"

Characters in untyped character and string literals are copied verbatim, using the same encoding as in the source file.

u8"xx"

Characters in UTF-8 string literals are converted to UTF-8.

u'x', u"xx"

Characters in UTF-16 character and string literals are converted to UTF-16.

U'x', U"xx"

Characters in UTF-32 character and string literals are converted to UTF-32.

L'x', L"xx"

Characters in wide character and string literals are converted to UTF-32.
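
As a rough illustration of the conversions listed above, the following C sketch counts the code units that one character occupies in each prefixed literal. It assumes a C11 compiler. The character U+1F600 is written as a universal character name so that the result does not depend on the source file encoding. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of code units in a string literal, including the terminating zero. */
#define CODE_UNITS(lit) (sizeof(lit) / sizeof((lit)[0]))

/* u8"xx": U+1F600 becomes the four bytes F0 9F 98 80, plus the terminator. */
size_t utf8_units(void)  { return CODE_UNITS(u8"\U0001F600"); }

/* u"xx": U+1F600 becomes the surrogate pair D83D DE00, plus the terminator. */
size_t utf16_units(void) { return CODE_UNITS(u"\U0001F600"); }

/* U"xx": U+1F600 is a single 32-bit unit, plus the terminator. */
size_t utf32_units(void) { return CODE_UNITS(U"\U0001F600"); }

/* L"xx": the count depends on the width of wchar_t on the target. With the
   UTF-32 conversion described above it would match utf32_units(); on a
   toolchain with a 16-bit wchar_t it would differ, so it is not checked. */
size_t wide_units(void)  { return CODE_UNITS(L"\U0001F600"); }

void check_literal_units(void)
{
    assert(utf8_units() == 5);
    assert(utf16_units() == 3);
    assert(utf32_units() == 2);
}
```

An unprefixed literal such as "xx" is deliberately left out of the checks: its bytes are copied from the source file, so its length depends on the encoding the file was saved in.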