Difference between revisions of "cpp/string/byte/strtok"


Revision as of 08:53, 6 June 2023

Defined in header <cstring>
char* strtok( char* str, const char* delim );

Finds the next token in a null-terminated byte string pointed to by str. The separator characters are identified by the null-terminated byte string pointed to by delim.

This function is designed to be called multiple times to obtain successive tokens from the same string.

  • If str is not a null pointer, the call is treated as the first call to strtok for this particular string. The function searches for the first character which is not contained in delim.
      • If no such character is found, there are no tokens in str at all, and the function returns a null pointer.
      • If such a character is found, it is the beginning of the token. The function then searches from that point on for the first character that is contained in delim.
          • If no such character is found, str contains only one token, and subsequent calls to strtok will return a null pointer.
          • If such a character is found, it is replaced by the null character '\0', and a pointer to the following character is stored in a static location for subsequent invocations.
      • The function then returns a pointer to the beginning of the token.
  • If str is a null pointer, the call is treated as a subsequent call to strtok: the function continues from where it left off in the previous invocation. The behavior is the same as if the previously stored pointer were passed as str.


Parameters

str - pointer to the null-terminated byte string to tokenize
delim - pointer to the null-terminated byte string identifying delimiters

Return value

Pointer to the beginning of the next token, or a null pointer if there are no more tokens.

Notes

This function is destructive: it writes '\0' characters into the elements of the string str. In particular, a string literal cannot be used as the first argument of std::strtok.

Each call to this function modifies a static variable, so it is not thread-safe.

Unlike most other tokenizers, the delimiters in std::strtok can be different for each subsequent token, and can even depend on the contents of the previous tokens.

Possible implementation

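#include <cstring> // std::strspn, std::strcspn

// For exposition only: a program may not define its own strtok,
// because that name is reserved to the implementation.
char* strtok(char* str, const char* delim)
{
    // Position just past the previous token, shared by all calls.
    static char* buffer;

    // A non-null str starts a new tokenization. (A null str before any
    // non-null call leaves buffer null, which is undefined behavior.)
    if (str != nullptr)
        buffer = str;

    // Skip leading delimiters to reach the beginning of the next token.
    buffer += std::strspn(buffer, delim);

    // Only delimiters (or nothing) remain: there is no token.
    if (*buffer == '\0')
        return nullptr;

    char* const tokenBegin = buffer;

    // Advance to the first delimiter after the token, or to the terminating null.
    buffer += std::strcspn(buffer, delim);

    // Terminate the token in place; the next call resumes after it.
    if (*buffer != '\0')
        *buffer++ = '\0';

    return tokenBegin;
}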

Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in MUSL libc), or in terms of its reentrant version (as in GNU libc).

Example

#include <cstring>
#include <iomanip>
#include <iostream>
 
int main() 
{
    char input[] = "one + two * (three - four)!";
    const char* delimiters = "! +- (*)";
    char *token = std::strtok(input, delimiters);
    while (token)
    {
        std::cout << std::quoted(token) << ' ';
        token = std::strtok(nullptr, delimiters);
    }
 
    std::cout << "\nContents of the input string now:\n\"";
    for (std::size_t n = 0; n < sizeof input; ++n)
    {
        if (const char c = input[n]; c != '\0')
            std::cout << c;
        else
            std::cout << "\\0";
    }
    std::cout << "\"\n";
}

Output:

"one" "two" "three" "four" 
Contents of the input string now:
"one\0+ two\0* (three\0- four\0!\0"

See also

strpbrk - finds the first location of any character from a set of separators (function)
strcspn - returns the length of the maximum initial segment that consists of only the characters not found in another byte string (function)
strspn - returns the length of the maximum initial segment that consists of only the characters found in another byte string (function)
ranges::split_view, views::split (C++20) - a view over the subranges obtained from splitting another view using a delimiter (class template) (range adaptor object)
C documentation for strtok