Null-terminated multibyte strings
A null-terminated multibyte string (NTMBS), or "multibyte string", is, like a null-terminated byte string (NTBS), a sequence of nonzero bytes followed by a byte with value zero (the terminating null character), but each character stored in the string may occupy more than one byte. For example, the char array {'\xe4','\xbd','\xa0','\xe5','\xa5','\xbd','\0'} is an NTMBS holding the string "你好" in UTF-8 multibyte encoding: the first three bytes encode the character 你, the next three bytes encode the character 好.
An NTMBS is only valid if it begins and ends in the initial shift state: if the string above began with a UTF-8 continuation byte, which cannot appear in the initial shift state of UTF-8 (that is, it cannot be the first byte of a multibyte character), the sequence would not be an NTMBS.
A multibyte string is layout-compatible with a null-terminated byte string: it can be stored, copied, and examined using the same facilities, except that its length in characters cannot be read off from its length in bytes. Multibyte strings can be converted to and from wide strings using the conversion functions listed below.
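The difference between byte length and character length can be illustrated with the array from the example above. The following is a minimal sketch: `utf8_char_count` is a hypothetical helper, not a standard function, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <cstring>  // std::strlen, used for the byte-length comparison

// The NTMBS from the text: "你好" in UTF-8, followed by the terminating null.
const char hello[] = {'\xe4', '\xbd', '\xa0', '\xe5', '\xa5', '\xbd', '\0'};

// Hypothetical helper: counts characters in a UTF-8 NTMBS by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t utf8_char_count(const char* s)
{
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count;  // this byte starts a new character
    }
    return count;
}
```

Here `std::strlen(hello)` reports 6 bytes, while `utf8_char_count(hello)` reports 2 characters. For encodings other than UTF-8, the locale-aware functions listed below are needed instead.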
Multibyte/wide character conversions
Defined in header <cstdlib>
mblen | returns the number of bytes in the next multibyte character (function)
mbtowc | converts the next multibyte character to wide character (function)
wctomb | converts a wide character to its multibyte representation (function)
mbstowcs | converts a narrow multibyte character string to wide string (function)
wcstombs | converts a wide string to narrow multibyte character string (function)
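As a rough sketch of how the `<cstdlib>` conversions might be used, the hypothetical helper `widen` below converts a whole NTMBS with `std::mbstowcs`. It relies on the current C locale for the encoding and sizes the buffer from the byte count, which is an upper bound because a multibyte string never decodes to more wide characters than it has bytes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: converts an NTMBS to a wide string using the encoding
// of the current C locale. Returns an empty string if the input is invalid.
std::wstring widen(const char* mbs)
{
    // Upper bound: at most one wide character per input byte.
    std::wstring out(std::strlen(mbs), L'\0');

    const std::size_t n = std::mbstowcs(&out[0], mbs, out.size());
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();  // invalid multibyte sequence

    out.resize(n);              // keep only the characters actually written
    return out;
}
```

In the default "C" locale only single-byte characters are guaranteed to convert, so a program that handles text such as "你好" would normally call `std::setlocale` with a suitable locale name first; which names are available is platform-dependent.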
Defined in header <cwchar>
mbsinit | checks if the mbstate_t object represents initial shift state (function)
btowc | widens a single-byte narrow character to wide character, if possible (function)
wctob | narrows a wide character to a single-byte narrow character, if possible (function)
mbrlen | returns the number of bytes in the next multibyte character, given state (function)
mbrtowc | converts the next multibyte character to wide character, given state (function)
wcrtomb | converts a wide character to its multibyte representation, given state (function)
mbsrtowcs | converts a narrow multibyte character string to wide string, given state (function)
wcsrtombs | converts a wide string to narrow multibyte character string, given state (function)
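The state-taking functions from `<cwchar>` can be combined to validate an NTMBS and count its characters in the current C locale. The sketch below uses a hypothetical helper, `mb_char_count`, which steps through the string with `std::mbrlen` and then uses `std::mbsinit` to confirm that the string ends in the initial shift state.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: counts the characters of an NTMBS in the current
// C locale. Returns -1 if the sequence is not a valid NTMBS.
long mb_char_count(const char* s)
{
    std::mbstate_t state = std::mbstate_t();  // zero-initialized: initial shift state
    const char* end = s + std::strlen(s);
    long count = 0;

    while (s < end) {
        const std::size_t n =
            std::mbrlen(s, static_cast<std::size_t>(end - s), &state);
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2))
            return -1;  // invalid or incomplete multibyte character
        if (n == 0)
            break;      // defensive: a null character was decoded
        s += n;
        ++count;
    }

    // A valid NTMBS must also end in the initial shift state.
    return std::mbsinit(&state) ? count : -1;
}
```

Keeping the conversion state in a local `std::mbstate_t` object, rather than in the hidden internal state used by `mblen`, is what makes this style of loop safe to use from several threads at once.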
Defined in header <cuchar>
mbrtoc16 (C++11) | generates the next 16-bit wide character from a narrow multibyte string (function)
c16rtomb (C++11) | converts a 16-bit wide character to narrow multibyte string (function)
mbrtoc32 (C++11) | generates the next 32-bit wide character from a narrow multibyte string (function)
c32rtomb (C++11) | converts a 32-bit wide character to narrow multibyte string (function)
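A possible use of the `<cuchar>` functions is sketched below: the hypothetical helper `to_u32` decodes an NTMBS into `char32_t` values with `std::mbrtoc32`. It assumes the implementation provides `<cuchar>`, which not every standard library does, and the input is interpreted in the encoding of the current C locale.

```cpp
#include <cstddef>
#include <cstring>
#include <cuchar>
#include <cwchar>  // std::mbstate_t
#include <string>

// Hypothetical helper: decodes an NTMBS (current C locale encoding) into
// 32-bit characters. Returns an empty string if the input is invalid.
std::u32string to_u32(const char* s)
{
    std::mbstate_t state = std::mbstate_t();  // initial shift state
    const char* end = s + std::strlen(s);
    std::u32string out;

    while (s < end) {
        char32_t c = 0;
        const std::size_t n =
            std::mbrtoc32(&c, s, static_cast<std::size_t>(end - s), &state);
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2))
            return std::u32string();  // invalid or incomplete character
        if (n == static_cast<std::size_t>(-3)) {
            out.push_back(c);         // extra output, no input consumed
            continue;
        }
        if (n == 0)
            break;                    // defensive: a null character was decoded
        out.push_back(c);
        s += n;
    }
    return out;
}
```

Whether the resulting values are UTF-32 code points depends on the implementation; the `__STDC_UTF_32__` macro listed below indicates that they are.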
Macros
Defined in header <climits>
MB_LEN_MAX | maximum number of bytes in a multibyte character (macro constant)
Defined in header <cstdlib>
MB_CUR_MAX | maximum number of bytes in a multibyte character in the current C locale (macro constant)
Defined in header <cuchar>
__STDC_UTF_16__ | indicates that UTF-16 encoding is used by mbrtoc16 and c16rtomb (macro constant)
__STDC_UTF_32__ | indicates that UTF-32 encoding is used by mbrtoc32 and c32rtomb (macro constant)
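The size macros are typically used to dimension buffers for single-character conversions. As a minimal sketch, the hypothetical helper `narrow_char` below uses `MB_LEN_MAX` as a compile-time buffer size for `std::wcrtomb`.

```cpp
#include <climits>  // MB_LEN_MAX
#include <cstddef>
#include <cstdlib>  // MB_CUR_MAX
#include <cwchar>
#include <string>

// Hypothetical helper: converts one wide character to its multibyte form in
// the current C locale. Returns an empty string if it cannot be represented.
std::string narrow_char(wchar_t wc)
{
    char buf[MB_LEN_MAX];  // large enough for one character in any supported locale
    std::mbstate_t state = std::mbstate_t();  // initial shift state

    const std::size_t n = std::wcrtomb(buf, wc, &state);
    if (n == static_cast<std::size_t>(-1))
        return std::string();  // not representable in this locale

    return std::string(buf, n);
}
```

`MB_LEN_MAX` is used for the array bound because it is a constant expression. `MB_CUR_MAX` never exceeds it, but its value depends on the current locale and is only known at run time.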