Difference between revisions of "cpp/string/multibyte/mblen"

Revision as of 18:53, 6 December 2020

Defined in header `<cstdlib>`
int mblen( const char* s, std::size_t n );

Determines the size, in bytes, of the multibyte character whose first byte is pointed to by s.

If s is a null pointer, resets the global conversion state and determines whether shift sequences are used.

This function is equivalent to the call std::mbtowc((wchar_t*)0, s, n), except that conversion state of std::mbtowc is unaffected.

Each call to mblen updates the internal global conversion state (a static object of type std::mbstate_t, only known to this function). If the multibyte encoding uses shift states, care must be taken to avoid backtracking or multiple scans. In any case, multiple threads should not call mblen without synchronization: std::mbrlen may be used instead.

Parameters

s	-	pointer to the multibyte character
n	-	limit on the number of bytes in s that can be examined

Return value

If s is not a null pointer, returns the number of bytes that are contained in the multibyte character or -1 if the first bytes pointed to by s do not form a valid multibyte character or 0 if s is pointing at the null charcter '\0'.

If s is a null pointer, resets its internal conversion state to represent the initial shift state and returns 0 if the current multibyte encoding is not state-dependent (does not use shift sequences) or a non-zero value if the current multibyte encoding is state-dependent (uses shift sequences).

Example

Run this code

#include <clocale>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <string_view>
 
// the number of characters in a multibyte string is the sum of mblen()'s
// note: the simpler approach is std::mbstowcs(NULL, s.c_str(), s.size())
std::size_t strlen_mb(const std::string_view s)
{
    std::size_t result = 0;
    const char* ptr = s.data();
    const char* end = ptr + s.size();
    std::mblen(NULL, 0); // reset the conversion state
    while (ptr < end) {
        int next = std::mblen(ptr, end-ptr);
        if (next == -1) {
            throw std::runtime_error("strlen_mb(): conversion error");
        }
        ptr += next;
        ++result;
    }
    return result;
}
 
void dump_bytes(const std::string_view str)
{
    std::cout << std::hex << std::uppercase << std::setfill('0');
    for (unsigned char c : str)
        std::cout << std::setw(2) << static_cast<int>(c) << ' ';
    std::cout << std::dec << '\n';
}
 
int main()
{
    // allow mblen() to work with UTF-8 multibyte encoding
    std::setlocale(LC_ALL, "en_US.utf8");
    // UTF-8 narrow multibyte encoding
    const std::string_view str = "z\u00df\u6c34\U0001f34c"; // or u8"zß水🍌"
    std::cout << std::quoted(str) << " is " << strlen_mb(str)
              << " characters, but as much as " << str.size() << " bytes: ";
    dump_bytes(str);
}

Output:

"zß水🍌" is 4 characters, but as much as 10 bytes: 7A C3 9F E6 B0 B4 F0 9F 8D 8C

Compiler support
Freestanding and hosted
Language
Standard library
Standard library headers
Named requirements
Feature test macros (C++20)
Language support library
Concepts library (C++20)
Metaprogramming library (C++11)
Diagnostics library
General utilities library
Strings library
Containers library
Iterators library
Ranges library (C++20)
Algorithms library
Numerics library
Localizations library
Input/output library
Filesystem library (C++17)
Regular expressions library (C++11)
Concurrency support library (C++11)
Execution support library (C++26)
Technical specifications
Symbols index
External libraries

Null-terminated strings
Byte strings
Multibyte strings
Wide strings
Classes
basic_string
basic_string_view (C++17)
char_traits

mbtowc	converts the next multibyte character to wide character (function) [edit]
mbrlen	returns the number of bytes in the next multibyte character, given state (function) [edit]
C documentation for mblen

cppreference.com

Namespaces

Variants

Views

Actions

Difference between revisions of "cpp/string/multibyte/mblen"

Revision as of 18:53, 6 December 2020

Contents

Notes

Parameters

Return value

Example

See also

Navigation

Toolbox