Difference between revisions of "cpp/string/byte/strtok"

From cppreference.com
m (char * -> char*)
 
(21 intermediate revisions by 11 users not shown)
Line 1: Line 1:
{{cpp/title| strtok}}
+
{{cpp/title|strtok}}
{{cpp/string/byte/sidebar}}
+
{{cpp/string/byte/navbar}}
{{ddcl list begin}}
+
{{dcl begin}}
{{ddcl list header | cstring}}
+
{{dcl header|cstring}}
{{ddcl list item | 1=
+
{{dcl|1=
 
char* strtok( char* str, const char* delim );
 
char* strtok( char* str, const char* delim );
 
}}
 
}}
{{ddcl list end}}
+
{{dcl end}}
  
Finds the next token in a null-terminated byte string pointed to by {{tt|str}}. The separator characters are identified by null-terminated byte string pointed to by {{tt|delim}}.
+
Finds the next token in a null-terminated byte string pointed to by {{c|str}}. The separator characters are identified by null-terminated byte string pointed to by {{c|delim}}.
  
If {{c|str !{{=}} NULL}}, the function searches for the first character which is ''not'' separator. This character is the ''beginning of the token''. Then the function searches for the first separator character. This character is the ''end of the token''. Function terminates and returns {{c|NULL}} if end of {{tt|str}} is encountered before ''end of the token'' is found. Otherwise, a pointer to ''end of the token'' is saved in a static location for subsequent invocations. This character is then replaced by a NULL-character and the function returns a pointer to the ''beginning of the token''.
+
This function is designed to be called multiple times to obtain successive tokens from the same string.
  
If {{c|str {{==}} NULL}}, the function continues from where it left in previous invocation. The behavior is the same as if the previously stored pointer is passed as {{c|str}}.
+
* If {{c|str}} is not a null pointer, the call is treated as the first call to {{tt|strtok}} for this particular string. The function searches for the first character which is ''not'' contained in {{c|delim}}.
 +
:* If no such character was found, there are no tokens in {{c|str}} at all, and the function returns a null pointer.
 +
:* If such character was found, it is the ''beginning of the token''. The function then searches from that point on for the first character that ''is'' contained in {{c|delim}}.
 +
::* If no such character was found, {{c|str}} has only one token, and the future calls to {{tt|strtok}} will return a null pointer.
 +
::* If such character was found, it is ''replaced'' by the null character {{c|'\0'}} and the pointer to the following character is stored in a static location for subsequent invocations.
 +
:* The function then returns the pointer to the beginning of the token.
 +
* If {{c|str}} is a null pointer, the call is treated as a subsequent call to {{tt|strtok}}: the function continues from where it left in previous invocation. The behavior is the same as if the previously stored pointer is passed as {{c|str}}.
  
 
===Parameters===
 
===Parameters===
{{param list begin}}
+
{{par begin}}
{{param list item | str | pointer to the null-terminated byte string to tokenize}}
+
{{par|str|pointer to the null-terminated byte string to tokenize}}
{{param list item | delim | pointer to the null-terminated byte string identifying delimiters}}
+
{{par|delim|pointer to the null-terminated byte string identifying delimiters}}
{{param list end}}
+
{{par end}}
  
 
===Return value===
 
===Return value===
Pointer to the beginning of a token if the end of string has not been encountered. Otherwise returns {{c|NULL}}>
+
Pointer to the beginning of the next token or a {{c|nullptr}} if there are no more tokens.
  
===Note===
+
===Notes===
The function is not thread safe.
+
This function is destructive: it writes the {{c|'\0'}} characters in the elements of the string {{c|str}}. In particular, a [[cpp/language/string_literal|string literal]] cannot be used as the first argument of {{tt|std::strtok}}.
 +
 
 +
Each call to this function modifies a static variable: is not thread safe.
 +
 
 +
Unlike most other tokenizers, the delimiters in {{tt|std::strtok}} can be different for each subsequent token, and can even depend on the contents of the previous tokens.
 +
 
 +
===Possible implementation===
 +
{{eq fun|1=
 +
char* strtok(char* str, const char* delim)
 +
{
 +
    static char* buffer;
 +
 
 +
    if (str != nullptr)
 +
        buffer = str;
 +
 
 +
    buffer += std::strspn(buffer, delim);
 +
 
 +
    if (*buffer == '\0')
 +
        return nullptr;
 +
 
 +
    char* const tokenBegin = buffer;
 +
 
 +
    buffer += std::strcspn(buffer, delim);
 +
 
 +
    if (*buffer != '\0')
 +
        *buffer++ = '\0';
 +
 
 +
    return tokenBegin;
 +
}
 +
}}
 +
 
 +
Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in [https://github.com/bminor/musl/blob/master/src/string/strtok.c MUSL libc]), or in terms of its reentrant version (as in [https://github.com/bminor/glibc/blob/master/string/strtok.c GNU libc]).
  
 
===Example===
 
===Example===
 
{{example
 
{{example
|
+
|code=
| code=
+
#include <cstring>
| output=
+
#include <iomanip>
 +
#include <iostream>
 +
 
 +
int main()
 +
{
 +
    char input[] = "one + two * (three - four)!";
 +
    const char* delimiters = "! +- (*)";
 +
    char* token = std::strtok(input, delimiters);
 +
    while (token)
 +
    {
 +
        std::cout << std::quoted(token) << ' ';
 +
        token = std::strtok(nullptr, delimiters);
 +
    }
 +
 
 +
    std::cout << "\nContents of the input string now:\n\"";
 +
    for (std::size_t n = 0; n < sizeof input; ++n)
 +
    {
 +
        if (const char c = input[n]; c != '\0')
 +
            std::cout << c;
 +
        else
 +
            std::cout << "\\0";
 +
    }
 +
    std::cout << "\"\n";
 +
}
 +
|output=
 +
"one" "two" "three" "four"
 +
Contents of the input string now:
 +
"one\0+ two\0* (three\0- four\0!\0"
 
}}
 
}}
  
[[fr:cpp/string/byte/strtok]]
+
===See also===
[[ja:cpp/string/byte/strtok]]
+
{{dsc begin}}
[[zh:cpp/string/byte/strtok]]
+
{{dsc inc|cpp/string/byte/dsc strpbrk}}
 +
{{dsc inc|cpp/string/byte/dsc strcspn}}
 +
{{dsc inc|cpp/string/byte/dsc strspn}}
 +
{{dsc inc|cpp/ranges/dsc split_view}}
 +
{{dsc see c|c/string/byte/strtok}}
 +
{{dsc end}}
 +
 
 +
{{langlinks|de|es|fr|it|ja|pt|ru|zh}}

Latest revision as of 23:00, 21 October 2023

Defined in header <cstring>
char* strtok( char* str, const char* delim );

Finds the next token in the null-terminated byte string pointed to by str. The separator characters are identified by the null-terminated byte string pointed to by delim.

This function is designed to be called multiple times to obtain successive tokens from the same string.

  • If str is not a null pointer, the call is treated as the first call to strtok for this particular string. The function searches for the first character which is not contained in delim.
      • If no such character was found, there are no tokens in str at all, and the function returns a null pointer.
      • If such a character was found, it is the beginning of the token. The function then searches from that point on for the first character that is contained in delim.
          • If no such character was found, str has only one token, and future calls to strtok will return a null pointer.
          • If such a character was found, it is replaced by the null character '\0' and the pointer to the following character is stored in a static location for subsequent invocations.
      • The function then returns the pointer to the beginning of the token.
  • If str is a null pointer, the call is treated as a subsequent call to strtok: the function continues from where it left off in the previous invocation. The behavior is the same as if the previously stored pointer were passed as str.
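
The call sequence described above can be sketched as a small helper. The name collect_tokens is hypothetical (it is not a library function), and the sketch assumes C++17 for the non-const std::string::data().

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of the standard library): collects every
// token that std::strtok produces for `text` with the delimiter set `delim`.
// `text` is taken by value because std::strtok overwrites delimiters.
std::vector<std::string> collect_tokens(std::string text, const char* delim)
{
    std::vector<std::string> tokens;

    // First call: pass the string to tokenize.
    char* token = std::strtok(text.data(), delim);

    while (token != nullptr)
    {
        tokens.emplace_back(token);

        // Subsequent calls: pass a null pointer to continue in the same string.
        token = std::strtok(nullptr, delim);
    }

    return tokens;
}
```

Under these assumptions, collect_tokens(",,a,,b,", ",") yields the two tokens a and b, because leading and repeated delimiters are skipped, and a string consisting only of delimiters yields no tokens.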


Parameters

str - pointer to the null-terminated byte string to tokenize
delim - pointer to the null-terminated byte string identifying delimiters

Return value

Pointer to the beginning of the next token, or a null pointer if there are no more tokens.

Notes

This function is destructive: it writes '\0' characters into elements of the string str. In particular, a string literal cannot be used as the first argument of std::strtok, because modifying a string literal is undefined behavior.
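
A minimal sketch of the usual workaround, assuming C++17: copy the text into a modifiable buffer and tokenize the copy. The helper name count_overwritten is hypothetical; it reports how many characters of the copy std::strtok replaced with '\0'.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: tokenizes a private copy of `text` and returns the
// number of characters that std::strtok overwrote with '\0'.
// The caller's data (which may be a string literal) is never modified.
std::size_t count_overwritten(const char* text, const char* delim)
{
    // Modifiable copy owned by this function.
    std::string copy(text);

    // Run the tokenizer to completion; the tokens themselves are ignored.
    for (char* token = std::strtok(copy.data(), delim);
         token != nullptr;
         token = std::strtok(nullptr, delim))
    {
    }

    // Every '\0' inside [begin, end) was written by std::strtok;
    // the string's own terminator lies outside that range.
    return static_cast<std::size_t>(
        std::count(copy.begin(), copy.end(), '\0'));
}
```

For the input and delimiters used in the example on this page, the sketch reports four overwritten characters, one per token.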

Each call to this function modifies a static variable, so the function is not thread-safe.
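
The same static variable also prevents interleaving two tokenizations, for example in nested loops, because the inner calls overwrite the position saved for the outer string. One way around this is to keep the saved position in caller-provided storage. The sketch below is an assumption-laden illustration, not a library API: next_token and count_fields are hypothetical names, and next_token follows the same steps as the possible implementation on this page with the static pointer replaced by a state parameter.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical reentrant tokenizer: like strtok, but the resume position
// lives in *state, which the caller owns, instead of in a static variable.
char* next_token(char* str, const char* delim, char** state)
{
    if (str != nullptr)
        *state = str;

    if (*state == nullptr)
        return nullptr;

    // Skip leading delimiters.
    *state += std::strspn(*state, delim);

    if (**state == '\0')
        return nullptr;

    char* const token_begin = *state;

    // Advance to the delimiter that ends the token, or to the end.
    *state += std::strcspn(*state, delim);

    // Terminate the token in place and resume after the delimiter.
    if (**state != '\0')
        *(*state)++ = '\0';

    return token_begin;
}

// Hypothetical usage: counts comma-separated fields across all
// semicolon-separated records, using two independent tokenizer states.
std::size_t count_fields(std::string text)
{
    std::size_t fields = 0;
    char* outer_state = nullptr;

    for (char* record = next_token(text.data(), ";", &outer_state);
         record != nullptr;
         record = next_token(nullptr, ";", &outer_state))
    {
        char* inner_state = nullptr;

        for (char* field = next_token(record, ",", &inner_state);
             field != nullptr;
             field = next_token(nullptr, ",", &inner_state))
        {
            ++fields;
        }
    }

    return fields;
}
```

Because each loop owns its state, the inner tokenization of a record does not disturb the outer one. The same separation lets different threads tokenize different strings concurrently.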

Unlike most other tokenizers, the delimiters in std::strtok can be different for each subsequent token, and can even depend on the contents of the previous tokens.
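
A short sketch of changing the delimiters between calls, assuming C++17 and an input of the form "key: v1, v2,v3". The helper name parse_entry and the input format are hypothetical, chosen only for illustration.

```cpp
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits "key: v1, v2,v3" into the key and its values.
// `line` is taken by value because std::strtok modifies the buffer.
std::pair<std::string, std::vector<std::string>> parse_entry(std::string line)
{
    std::pair<std::string, std::vector<std::string>> entry;

    // The first token is delimited by ':' only.
    char* token = std::strtok(line.data(), ":");
    if (token == nullptr)
        return entry;

    entry.first = token;

    // The remaining tokens use a different delimiter set: ',' and ' '.
    while ((token = std::strtok(nullptr, ", ")) != nullptr)
        entry.second.emplace_back(token);

    return entry;
}
```

With this sketch, parse_entry("colors: red, green,blue") produces the key colors and the values red, green, and blue.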

Possible implementation

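char* strtok(char* str, const char* delim)
{
    // Position at which a call with a null str resumes.
    static char* buffer;
 
    // A non-null str starts tokenizing a new string.
    if (str != nullptr)
        buffer = str;
 
    // Skip leading delimiters.
    buffer += std::strspn(buffer, delim);
 
    // Nothing but delimiters was left: there are no more tokens.
    if (*buffer == '\0')
        return nullptr;
 
    char* const tokenBegin = buffer;
 
    // Advance to the first delimiter after the token, or to the end.
    buffer += std::strcspn(buffer, delim);
 
    // Terminate the token in place and resume after the delimiter.
    if (*buffer != '\0')
        *buffer++ = '\0';
 
    return tokenBegin;
}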

Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in MUSL libc), or in terms of its reentrant version (as in GNU libc).

Example

#include <cstring>
#include <iomanip>
#include <iostream>
 
int main() 
{
    char input[] = "one + two * (three - four)!";
    const char* delimiters = "! +- (*)";
    char* token = std::strtok(input, delimiters);
    while (token)
    {
        std::cout << std::quoted(token) << ' ';
        token = std::strtok(nullptr, delimiters);
    }
 
    std::cout << "\nContents of the input string now:\n\"";
    for (std::size_t n = 0; n < sizeof input; ++n)
    {
        if (const char c = input[n]; c != '\0')
            std::cout << c;
        else
            std::cout << "\\0";
    }
    std::cout << "\"\n";
}

Output:

"one" "two" "three" "four" 
Contents of the input string now:
"one\0+ two\0* (three\0- four\0!\0"

See also

strpbrk - finds the first location of any character from a set of separators (function)
strcspn - returns the length of the maximum initial segment that consists of only the characters not found in another byte string (function)
strspn - returns the length of the maximum initial segment that consists of only the characters found in another byte string (function)
ranges::split_view, views::split - a view over the subranges obtained from splitting another view using a delimiter (class template) (range adaptor object)
C documentation for strtok