Difference between revisions of "cpp/string/byte/strtok"

Latest revision as of 23:00, 21 October 2023

Defined in header `<cstring>`
char* strtok( char* str, const char* delim );

Finds the next token in the null-terminated byte string pointed to by str. The separator characters are identified by the null-terminated byte string pointed to by delim.

This function is designed to be called multiple times to obtain successive tokens from the same string.

If str is not a null pointer, the call is treated as the first call to strtok for this particular string. The function searches for the first character that is not contained in delim.

If no such character is found, there are no tokens in str at all, and the function returns a null pointer.
If such a character is found, it is the beginning of the token. The function then searches from that point on for the first character that is contained in delim.

If no such character is found, str has only one token, and future calls to strtok will return a null pointer.
If such a character is found, it is replaced by the null character '\0', and a pointer to the following character is stored in a static location for subsequent invocations.

The function then returns a pointer to the beginning of the token.

If str is a null pointer, the call is treated as a subsequent call to strtok: the function continues from where it left off in the previous invocation. The behavior is the same as if the previously stored pointer were passed as str.

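Possible implementation

char* strtok(char* str, const char* delim)
{
    // Position at which the next call resumes; shared by all callers.
    static char* buffer;

    // A non-null str starts tokenizing a new string.
    if (str != nullptr)
        buffer = str;

    // Skip leading delimiters.
    buffer += std::strspn(buffer, delim);

    // Only delimiters (or nothing) remained: there are no more tokens.
    if (*buffer == '\0')
        return nullptr;

    char* const tokenBegin = buffer;

    // Advance to the first delimiter after the token, or to the end of the string.
    buffer += std::strcspn(buffer, delim);

    // Terminate the token in place and resume after the delimiter next time.
    if (*buffer != '\0')
        *buffer++ = '\0';

    return tokenBegin;
}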

Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in MUSL libc), or in terms of its reentrant version (as in GNU libc).

Example

#include <cstring>
#include <iomanip>
#include <iostream>
 
int main() 
{
    char input[] = "one + two * (three - four)!";
    const char* delimiters = "! +- (*)";
    char* token = std::strtok(input, delimiters);
    while (token)
    {
        std::cout << std::quoted(token) << ' ';
        token = std::strtok(nullptr, delimiters);
    }
 
    std::cout << "\nContents of the input string now:\n\"";
    for (std::size_t n = 0; n < sizeof input; ++n)
    {
        if (const char c = input[n]; c != '\0')
            std::cout << c;
        else
            std::cout << "\\0";
    }
    std::cout << "\"\n";
}

Output:

"one" "two" "three" "four" 
Contents of the input string now:
"one\0+ two\0* (three\0- four\0!\0"

See also

strpbrk	finds the first location of any character from a set of separators (function)
strcspn	returns the length of the maximum initial segment that consists of only the characters not found in another byte string (function)
strspn	returns the length of the maximum initial segment that consists of only the characters found in another byte string (function)
ranges::split_view, views::split (C++20)	a `view` over the subranges obtained from splitting another `view` using a delimiter (class template) (range adaptor object)
C documentation for strtok

@@ Line 1: / Line 1: @@
-{{cpp/title| strtok}}
+{{cpp/title|strtok}}
-{{cpp/string/byte/sidebar}}
+{{cpp/string/byte/navbar}}
-{{ddcl list begin}}
+{{dcl begin}}
-{{ddcl list header | cstring}}
+{{dcl header|cstring}}
-{{ddcl list item | 1=
+{{dcl|1=
 char* strtok( char* str, const char* delim );
 }}
-{{ddcl list end}}
+{{dcl end}}
-Finds the next token in a null-terminated byte string pointed to by {{tt|str}}. The separator characters are identified by null-terminated byte string pointed to by {{tt|delim}}.
+Finds the next token in a null-terminated byte string pointed to by {{c|str}}. The separator characters are identified by null-terminated byte string pointed to by {{c|delim}}.
-If {{c|str !{{=}} NULL}}, the function searches for the first character which is ''not'' separator. This character is the ''beginning of the token''. Then the function searches for the first separator character. This character is the ''end of the token''. Function terminates and returns {{c|NULL}} if end of {{tt|str}} is encountered before ''end of the token'' is found. Otherwise, a pointer to ''end of the token'' is saved in a static location for subsequent invocations. This character is then replaced by a NULL-character and the function returns a pointer to the ''beginning of the token''.
+This function is designed to be called multiple times to obtain successive tokens from the same string.
-If {{c|str {{==}} NULL}}, the function continues from where it left in previous invocation. The behavior is the same as if the previously stored pointer is passed as {{c|str}}.
+* If {{c|str}} is not a null pointer, the call is treated as the first call to {{tt|strtok}} for this particular string. The function searches for the first character which is ''not'' contained in {{c|delim}}.
+:* If no such character was found, there are no tokens in {{c|str}} at all, and the function returns a null pointer.
+:* If such character was found, it is the ''beginning of the token''. The function then searches from that point on for the first character that ''is'' contained in {{c|delim}}.
+::* If no such character was found, {{c|str}} has only one token, and the future calls to {{tt|strtok}} will return a null pointer.
+::* If such character was found, it is ''replaced'' by the null character {{c|'\0'}} and the pointer to the following character is stored in a static location for subsequent invocations.
+:* The function then returns the pointer to the beginning of the token.
+* If {{c|str}} is a null pointer, the call is treated as a subsequent call to {{tt|strtok}}: the function continues from where it left in previous invocation. The behavior is the same as if the previously stored pointer is passed as {{c|str}}.
 ===Parameters===
-{{param list begin}}
+{{par begin}}
-{{param list item | str | pointer to the null-terminated byte string to tokenize}}
+{{par|str|pointer to the null-terminated byte string to tokenize}}
-{{param list item | delim | pointer to the null-terminated byte string identifying delimiters}}
+{{par|delim|pointer to the null-terminated byte string identifying delimiters}}
-{{param list end}}
+{{par end}}
 ===Return value===
-Pointer to the beginning of a token if the end of string has not been encountered. Otherwise returns {{c|NULL}}>
+Pointer to the beginning of the next token or a {{c|nullptr}} if there are no more tokens.
-===Note===
+===Notes===
-The function is not thread safe.
+This function is destructive: it writes the {{c|'\0'}} characters in the elements of the string {{c|str}}. In particular, a [[cpp/language/string_literal|string literal]] cannot be used as the first argument of {{tt|std::strtok}}.
+Each call to this function modifies a static variable: it is not thread safe.
+Unlike most other tokenizers, the delimiters in {{tt|std::strtok}} can be different for each subsequent token, and can even depend on the contents of the previous tokens.
+===Possible implementation===
+{{eq fun|1=
+char* strtok(char* str, const char* delim)
+{
+    static char* buffer;
+    if (str != nullptr)
+        buffer = str;
+    buffer += std::strspn(buffer, delim);
+    if (*buffer == '\0')
+        return nullptr;
+    char* const tokenBegin = buffer;
+    buffer += std::strcspn(buffer, delim);
+    if (*buffer != '\0')
+        *buffer++ = '\0';
+    return tokenBegin;
+}
+}}
+Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in [https://github.com/bminor/musl/blob/master/src/string/strtok.c MUSL libc]), or in terms of its reentrant version (as in [https://github.com/bminor/glibc/blob/master/string/strtok.c GNU libc]).
 ===Example===
 {{example
- |
+|code=
- | code=
+#include <cstring>
- | output=
+#include <iomanip>
+#include <iostream>
+int main()
+{
+    char input[] = "one + two * (three - four)!";
+    const char* delimiters = "! +- (*)";
+    char* token = std::strtok(input, delimiters);
+    while (token)
+    {
+        std::cout << std::quoted(token) << ' ';
+        token = std::strtok(nullptr, delimiters);
+    }
+    std::cout << "\nContents of the input string now:\n\"";
+    for (std::size_t n = 0; n < sizeof input; ++n)
+    {
+        if (const char c = input[n]; c != '\0')
+            std::cout << c;
+        else
+            std::cout << "\\0";
+    }
+    std::cout << "\"\n";
+}
+|output=
+"one" "two" "three" "four"
+Contents of the input string now:
+"one\0+ two\0* (three\0- four\0!\0"
 }}
-[[fr:cpp/string/byte/strtok]]
+===See also===
-[[ja:cpp/string/byte/strtok]]
+{{dsc begin}}
-[[zh:cpp/string/byte/strtok]]
+{{dsc inc|cpp/string/byte/dsc strpbrk}}
+{{dsc inc|cpp/string/byte/dsc strcspn}}
+{{dsc inc|cpp/string/byte/dsc strspn}}
+{{dsc inc|cpp/ranges/dsc split_view}}
+{{dsc see c|c/string/byte/strtok}}
+{{dsc end}}
+{{langlinks|de|es|fr|it|ja|pt|ru|zh}}


Parameters

str	-	pointer to the null-terminated byte string to tokenize
delim	-	pointer to the null-terminated byte string identifying delimiters