3.2.5 Multi-byte String types

For multi-byte string types, the basic character type has a size of at least 2 bytes. This means it can hold a UCS-2 character or a UTF-16 code unit; characters outside the Basic Multilingual Plane take two UTF-16 code units (a surrogate pair).
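The following minimal Free Pascal sketch checks the element size and shows a surrogate pair. It assumes the `objfpc` mode; the variable name `clef` is illustrative only. U+1D11E (musical symbol G clef) is written directly as its two UTF-16 code units, `#$D834#$DD1E`.

```pascal
program CharSizes;
{$mode objfpc}{$H+}
var
  clef: UnicodeString;
begin
  // WideChar and UnicodeChar are both 2-byte UTF-16 code units
  WriteLn('SizeOf(WideChar)    = ', SizeOf(WideChar));
  WriteLn('SizeOf(UnicodeChar) = ', SizeOf(UnicodeChar));
  // U+1D11E lies outside the BMP, so UTF-16 stores it as a surrogate
  // pair: one character, but two WideChar elements
  clef := #$D834#$DD1E;
  WriteLn('Length(clef)        = ', Length(clef));
end.
```

Note that `Length` counts code units, not characters, so it reports 2 for this single symbol.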

UnicodeStrings

UnicodeStrings (used to represent Unicode character strings) are implemented in much the same way as AnsiStrings: as reference-counted, null-terminated arrays, except that they are arrays of WideChars instead of regular Chars. A WideChar is a two-byte character (a UTF-16 code unit). Most of the rules for AnsiStrings also apply to UnicodeStrings. The compiler transparently converts UnicodeStrings to AnsiStrings and vice versa.
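A small sketch of these properties, assuming Free Pascal in `objfpc` mode: assigning one UnicodeString to another shares the buffer (reference counting), writing to one of them triggers a private copy (copy-on-write), and assignments to and from an AnsiString are converted implicitly. The string is built at run time so that it lives on the heap rather than in constant data.

```pascal
program UnicodeStrDemo;
{$mode objfpc}{$H+}
var
  u1, u2, u3: UnicodeString;
  a: AnsiString;
begin
  u1 := 'hello';
  u1 := u1 + ' world';   // built at run time, so it is a heap string
  u2 := u1;              // no copy: the reference count is incremented
  WriteLn('shared buffer: ', Pointer(u1) = Pointer(u2));
  u2[1] := 'H';          // copy-on-write: u2 now gets its own buffer
  WriteLn('u1 = ', u1, ', u2 = ', u2);
  a := u1;               // implicit UnicodeString -> AnsiString conversion
  u3 := a;               // implicit AnsiString -> UnicodeString conversion
  WriteLn('round trip ok: ', u3 = u1);
end.
```

The round trip is lossless here only because the text is plain ASCII; characters that the AnsiString code page cannot represent may be lost.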

Just as an AnsiString can be typecast to a PChar (a pointer to a null-terminated array of characters), a UnicodeString can be typecast to a PUnicodeChar, a pointer to a null-terminated array of UnicodeChars. Note that a PUnicodeChar array is terminated by 2 null bytes instead of 1, and each element is 2 bytes wide, so a typecast to PChar is not automatic.
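As a sketch of the PUnicodeChar typecast (assuming Free Pascal, `objfpc` mode), the program below walks the array until it reaches the terminating null WideChar. Incrementing a PUnicodeChar advances it by one 2-byte element, not one byte.

```pascal
program PUnicodeCharDemo;
{$mode objfpc}{$H+}
var
  u: UnicodeString;
  p: PUnicodeChar;
  n: SizeInt;
begin
  u := 'Pascal';
  p := PUnicodeChar(u);   // typecast: points at the first UTF-16 code unit
  n := 0;
  while p^ <> #0 do       // stop at the terminating null WideChar
  begin
    Inc(p);               // advances by SizeOf(UnicodeChar) = 2 bytes
    Inc(n);
  end;
  WriteLn('code units before terminator: ', n, ' (Length = ', Length(u), ')');
  WriteLn('terminator width in bytes: ', SizeOf(p^));
end.
```

The loop count matches `Length(u)`, and the terminator it stops on is a full 2-byte element.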

The compiler only inserts calls to the conversion routines; it does not itself implement any conversion from Unicode to AnsiStrings or vice versa. The actual conversion is performed through a unicodestring manager record in the system unit, which can be initialized with OS-specific Unicode handling routines. For more information, see the system unit reference.
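On Unix-like targets, a common way to install such OS-specific routines is to add the `cwstring` unit to the program's uses clause; it sets up a manager based on the C library. The sketch below assumes Free Pascal and a UTF-8 locale. The exact output depends on the platform and locale, so none is claimed here.

```pascal
program ManagerDemo;
{$mode objfpc}{$H+}
{$ifdef unix}
uses
  cwstring;   // installs a C-library-based unicodestring manager on Unix
{$endif}
var
  u: UnicodeString;
  a: AnsiString;
begin
  u := 'Omega: '#$03A9;   // U+03A9 GREEK CAPITAL LETTER OMEGA
  a := u;                 // the conversion is delegated to the installed manager
  WriteLn(a);
  WriteLn('bytes in AnsiString: ', Length(a));
end.
```

Without a suitable manager, characters outside ASCII may not be converted correctly; under a UTF-8 locale with `cwstring`, the Omega would be expected to occupy 2 bytes.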

A UnicodeString literal can be constructed in the same way as a WideChar constant, giving individual characters by their code points:

Const
  ws2: unicodestring = 'phi omega : '#$03A6' '#$03A9;
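A runnable sketch of such a literal, assuming Free Pascal in `objfpc` mode; the constant name `greek` is illustrative only. U+03A6 is GREEK CAPITAL LETTER PHI and U+03A9 is GREEK CAPITAL LETTER OMEGA. Each `#$xxxx` contributes exactly one WideChar to the string.

```pascal
program LiteralDemo;
{$mode objfpc}{$H+}
const
  // 'phi ' (4) + Phi (1) + ', omega ' (8) + Omega (1) = 14 code units
  greek: UnicodeString = 'phi '#$03A6', omega '#$03A9;
begin
  WriteLn('Length: ', Length(greek));          // 14
  WriteLn('Ord(greek[5]): ', Ord(greek[5]));   // 934 = $03A6
end.
```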

WideStrings

The WideString type (used to represent Unicode character strings in COM applications) is implemented much like UnicodeString on Windows; on other platforms, the two are simply the same type. If interaction with COM is not required, the UnicodeString type should be used.

On Windows, unlike UnicodeString, the WideString type is not reference counted: WideStrings are allocated with a special Windows function that allows them to be used for OLE automation, so assigning one WideString to another copies the data. Like UnicodeStrings, they are implemented as null-terminated arrays of WideChars instead of regular Chars. Otherwise, WideString obeys the same rules as UnicodeString; in particular, the compiler transparently converts WideStrings to AnsiStrings and vice versa.
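The difference in reference counting can be observed with a short sketch, assuming Free Pascal in `objfpc` mode. On Windows, the assignment `w2 := w1` is expected to allocate a new buffer, so the pointers differ; on other platforms, where WideString is UnicodeString, the buffer is shared. Conversions in both directions are implicit.

```pascal
program WideStrDemo;
{$mode objfpc}{$H+}
var
  w1, w2: WideString;
  u: UnicodeString;
begin
  w1 := 'COM';
  w1 := w1 + ' string';   // built at run time, so it is heap-allocated
  w2 := w1;               // Windows: copied; elsewhere: shared and ref counted
  WriteLn('shares buffer: ', Pointer(w1) = Pointer(w2));
  u := w1;                // WideString -> UnicodeString
  w2 := u;                // UnicodeString -> WideString
  WriteLn('equal after conversion: ', w2 = w1);
end.
```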

For typecasting and conversion, the same rules apply as for the UnicodeString type.

Note that on Windows, because a WideString is allocated using a special Windows function, its memory layout differs from that of UnicodeString. For instance, the length is stored in bytes rather than characters.
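The byte-based length can be inspected directly on Windows. This sketch assumes that the WideString data is a COM BSTR, which stores a 4-byte length prefix (in bytes, excluding the terminator) immediately before the first character; it relies on that layout and is meant for illustration, not production use. The prefix check is compiled only for Windows targets, while `Length` reports characters on every platform.

```pascal
program BStrLength;
{$mode objfpc}{$H+}
var
  w: WideString;
{$ifdef windows}
  p: PLongWord;
{$endif}
begin
  w := 'abc';
  WriteLn('Length (characters): ', Length(w));
{$ifdef windows}
  // Assumption: BSTR layout, with a 4-byte byte count just before the data
  p := PLongWord(Pointer(w));
  Dec(p);               // step back one LongWord to the length prefix
  WriteLn('BSTR prefix (bytes): ', p^);
{$endif}
end.
```

For the 3-character string above, the prefix should report 6 bytes on Windows, while `Length` still returns 3.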