Null-terminated multibyte strings

From cppreference.com
Revision as of 21:36, 7 December 2011 by Cubbi


A null-terminated multibyte string (NTMBS), or "multibyte string", is a sequence of nonzero bytes followed by a byte with value zero (the terminating null character), just like a null-terminated byte string, except that each character stored in the string may occupy more than one byte. For example, the char array {'\xe4','\xbd','\xa0','\xe5','\xa5','\xbd','\0'} is an NTMBS holding the string "你好" in the UTF-8 multibyte encoding: the first three bytes encode the character 你, and the next three bytes encode the character 好.

An NTMBS is only valid if it begins and ends in the initial shift state. For example, if a UTF-8 string began with a continuation byte such as '\xbd' (a byte that cannot appear in the initial shift state of UTF-8, because it cannot be the first byte of a multibyte character), the sequence would not be an NTMBS. A multibyte string is layout-compatible with a null-terminated byte string: it can be stored, copied, and examined using the same facilities, except for the length calculation, since functions such as strlen count bytes rather than characters. Multibyte strings can be converted to and from wide strings using the conversion functions listed below.

Multibyte/wide character conversions

Defined in header <cstdlib>
mblen
returns the number of bytes in the next multibyte character
(function)
mbtowc
converts the next multibyte character to a wide character
(function)
wctomb
converts a wide character to its multibyte representation
(function)
mbstowcs
converts a narrow multibyte character string to a wide string
(function)
wcstombs
converts a wide string to a narrow multibyte character string
(function)
Defined in header <cwchar>
mbsinit
checks if the mbstate_t object represents the initial shift state
(function)
btowc
widens a single-byte narrow character to a wide character, if possible
(function)
wctob
narrows a wide character to a single-byte narrow character, if possible
(function)
mbrlen
returns the number of bytes in the next multibyte character, given state
(function)
mbrtowc
converts the next multibyte character to a wide character, given state
(function)
wcrtomb
converts a wide character to its multibyte representation, given state
(function)
mbsrtowcs
converts a narrow multibyte character string to a wide string, given state
(function)
wcsrtombs
converts a wide string to a narrow multibyte character string, given state
(function)
Defined in header <cuchar>
mbrtoc16 (C++11)
generates the next 16-bit wide character from a narrow multibyte string
(function)
c16rtomb (C++11)
converts a 16-bit wide character to a narrow multibyte string
(function)
mbrtoc32 (C++11)
generates the next 32-bit wide character from a narrow multibyte string
(function)
c32rtomb (C++11)
converts a 32-bit wide character to a narrow multibyte string
(function)

Macros

Defined in header <climits>
MB_LEN_MAX
maximum number of bytes in a multibyte character
(macro constant)
Defined in header <cstdlib>
MB_CUR_MAX
maximum number of bytes in a multibyte character in the current C locale
(macro constant)
Defined in header <cuchar>
__STDC_UTF_16__
indicates that UTF-16 encoding is used by mbrtoc16 and c16rtomb
(macro constant)
__STDC_UTF_32__
indicates that UTF-32 encoding is used by mbrtoc32 and c32rtomb
(macro constant)