
Difference between revisions of "cpp/string/byte/strtok"

From cppreference.com
m (Shorten template names. Use {{lc}} where appropriate.)
m (char * -> char*)
 
(14 intermediate revisions by 8 users not shown)
Line 1: Line 1:
{{cpp/title| strtok}}
+
{{cpp/title|strtok}}
 
{{cpp/string/byte/navbar}}
 
{{cpp/string/byte/navbar}}
 
{{dcl begin}}
 
{{dcl begin}}
{{dcl header | cstring}}
+
{{dcl header|cstring}}
{{dcl | 1=
+
{{dcl|1=
 
char* strtok( char* str, const char* delim );
 
char* strtok( char* str, const char* delim );
 
}}
 
}}
 
{{dcl end}}
 
{{dcl end}}
  
Finds the next token in a null-terminated byte string pointed to by {{tt|str}}. The separator characters are identified by null-terminated byte string pointed to by {{tt|delim}}.
+
Finds the next token in a null-terminated byte string pointed to by {{c|str}}. The separator characters are identified by null-terminated byte string pointed to by {{c|delim}}.
  
This function is designed to be called multiples times to obtain successive tokens from the same string.
+
This function is designed to be called multiple times to obtain successive tokens from the same string.
  
* If {{c|str !{{=}} NULL}}, the call is treated as the first call to {{tt|strtok}} for this particular string. The function searches for the first character which is ''not'' contained in {{tt|delim}}.
+
* If {{c|str}} is not a null pointer, the call is treated as the first call to {{tt|strtok}} for this particular string. The function searches for the first character which is ''not'' contained in {{c|delim}}.
:* If no such character was found, there are no tokens in {{tt|str}} at all, and the function returns a null pointer.  
+
:* If no such character was found, there are no tokens in {{c|str}} at all, and the function returns a null pointer.  
:* If such character was found, is it the ''beginning of the token''. The function then searches from that point on for the first character that ''is'' contained in {{tt|delim}}.  
+
:* If such character was found, it is the ''beginning of the token''. The function then searches from that point on for the first character that ''is'' contained in {{c|delim}}.  
::* If no such character was found, {{tt|str}} has only one token, and the future calls to {{tt|strtok}} will return a null pointer
+
::* If no such character was found, {{c|str}} has only one token, and the future calls to {{tt|strtok}} will return a null pointer.
 
::* If such character was found, it is ''replaced'' by the null character {{c|'\0'}} and the pointer to the following character is stored in a static location for subsequent invocations.
 
::* If such character was found, it is ''replaced'' by the null character {{c|'\0'}} and the pointer to the following character is stored in a static location for subsequent invocations.
:* The function then returns the pointer to the beginning of the token
+
:* The function then returns the pointer to the beginning of the token.
* If {{c|str {{==}} NULL}}, the call is treated as a subsequent calls to {{tt|strtok}}: the function continues from where it left in previous invocation. The behavior is the same as if the previously stored pointer is passed as {{c|str}}.
+
* If {{c|str}} is a null pointer, the call is treated as a subsequent call to {{tt|strtok}}: the function continues from where it left in previous invocation. The behavior is the same as if the previously stored pointer is passed as {{c|str}}.
  
 
===Parameters===
 
===Parameters===
 
{{par begin}}
 
{{par begin}}
{{par | str | pointer to the null-terminated byte string to tokenize}}
+
{{par|str|pointer to the null-terminated byte string to tokenize}}
{{par | delim | pointer to the null-terminated byte string identifying delimiters}}
+
{{par|delim|pointer to the null-terminated byte string identifying delimiters}}
 
{{par end}}
 
{{par end}}
  
 
===Return value===
 
===Return value===
Pointer to the beginning of the next token or {{lc|NULL}} if there are no more tokens.
+
Pointer to the beginning of the next token or a {{c|nullptr}} if there are no more tokens.
  
 
===Notes===
 
===Notes===
This function is destructive: it writes the {{c|'\0'}} characters in the elements of the string {{tt|str}}. In particular, a [[cpp/language/string_literal|string literal]] cannot be used as the first argument of {{tt|strtok}}.
+
This function is destructive: it writes the {{c|'\0'}} characters in the elements of the string {{c|str}}. In particular, a [[cpp/language/string_literal|string literal]] cannot be used as the first argument of {{tt|std::strtok}}.
  
 
Each call to this function modifies a static variable: is not thread safe.
 
Each call to this function modifies a static variable: is not thread safe.
  
Unlike most other tokenizers, the delimiters in {{tt|strtok}} can be different for each subsequent token, and can even depend on the contents of the previous tokens.
+
Unlike most other tokenizers, the delimiters in {{tt|std::strtok}} can be different for each subsequent token, and can even depend on the contents of the previous tokens.
 +
 
 +
===Possible implementation===
 +
{{eq fun|1=
 +
char* strtok(char* str, const char* delim)
 +
{
 +
    static char* buffer;
 +
 
 +
    if (str != nullptr)
 +
        buffer = str;
 +
 
 +
    buffer += std::strspn(buffer, delim);
 +
 
 +
    if (*buffer == '\0')
 +
        return nullptr;
 +
 
 +
    char* const tokenBegin = buffer;
 +
 
 +
    buffer += std::strcspn(buffer, delim);
 +
 
 +
    if (*buffer != '\0')
 +
        *buffer++ = '\0';
 +
 
 +
    return tokenBegin;
 +
}
 +
}}
 +
 
 +
Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in [https://github.com/bminor/musl/blob/master/src/string/strtok.c MUSL libc]), or in terms of its reentrant version (as in [https://github.com/bminor/glibc/blob/master/string/strtok.c GNU libc]).
  
 
===Example===
 
===Example===
 
{{example
 
{{example
|
+
|code=
| code=
+
 
#include <cstring>
 
#include <cstring>
 +
#include <iomanip>
 
#include <iostream>
 
#include <iostream>
  
 
int main()  
 
int main()  
 
{
 
{
     char input[100] = "A bird came down the walk";
+
     char input[] = "one + two * (three - four)!";
     char *token = std::strtok(input, " ");
+
    const char* delimiters = "! +- (*)";
     while (token != NULL) {
+
     char* token = std::strtok(input, delimiters);
         std::cout << token << '\n';
+
     while (token)
         token = std::strtok(NULL, " ");
+
    {
 +
         std::cout << std::quoted(token) << ' ';
 +
         token = std::strtok(nullptr, delimiters);
 +
    }
 +
 
 +
    std::cout << "\nContents of the input string now:\n\"";
 +
    for (std::size_t n = 0; n < sizeof input; ++n)
 +
    {
 +
        if (const char c = input[n]; c != '\0')
 +
            std::cout << c;
 +
        else
 +
            std::cout << "\\0";
 
     }
 
     }
 +
    std::cout << "\"\n";
 
}
 
}
| output=
+
|output=
A
+
"one" "two" "three" "four"
bird
+
Contents of the input string now:
came
+
"one\0+ two\0* (three\0- four\0!\0"
down
+
the
+
walk
+
 
}}
 
}}
  
 
===See also===
 
===See also===
 
{{dsc begin}}
 
{{dsc begin}}
{{dsc see c | c/string/byte/strtok}}
+
{{dsc inc|cpp/string/byte/dsc strpbrk}}
 +
{{dsc inc|cpp/string/byte/dsc strcspn}}
 +
{{dsc inc|cpp/string/byte/dsc strspn}}
 +
{{dsc inc|cpp/ranges/dsc split_view}}
 +
{{dsc see c|c/string/byte/strtok}}
 
{{dsc end}}
 
{{dsc end}}
  
[[de:cpp/string/byte/strtok]]
+
{{langlinks|de|es|fr|it|ja|pt|ru|zh}}
[[es:cpp/string/byte/strtok]]
+
[[fr:cpp/string/byte/strtok]]
+
[[it:cpp/string/byte/strtok]]
+
[[ja:cpp/string/byte/strtok]]
+
[[pt:cpp/string/byte/strtok]]
+
[[ru:cpp/string/byte/strtok]]
+
[[zh:cpp/string/byte/strtok]]
+

Latest revision as of 23:00, 21 October 2023

Defined in header <cstring>
char* strtok( char* str, const char* delim );

Finds the next token in a null-terminated byte string pointed to by str. The separator characters are identified by the null-terminated byte string pointed to by delim.

This function is designed to be called multiple times to obtain successive tokens from the same string.

  • If str is not a null pointer, the call is treated as the first call to strtok for this particular string. The function searches for the first character which is not contained in delim.
      • If no such character was found, there are no tokens in str at all, and the function returns a null pointer.
      • If such a character was found, it is the beginning of the token. The function then searches from that point on for the first character that is contained in delim.
          • If no such character was found, the token extends to the end of str, and future calls to strtok will return a null pointer.
          • If such a character was found, it is replaced by the null character '\0', and the pointer to the following character is stored in a static location for subsequent invocations.
      • The function then returns the pointer to the beginning of the token.
  • If str is a null pointer, the call is treated as a subsequent call to strtok: the function continues from where it left off in the previous invocation. The behavior is the same as if the previously stored pointer were passed as str.


Parameters

str - pointer to the null-terminated byte string to tokenize
delim - pointer to the null-terminated byte string identifying delimiters

Return value

Pointer to the beginning of the next token, or a null pointer if there are no more tokens.

Notes

This function is destructive: it writes '\0' characters into the elements of the string str. In particular, a string literal cannot be used as the first argument of std::strtok.

Each call to this function modifies a static variable, so the function is not thread-safe.

Unlike most other tokenizers, the delimiters in std::strtok can be different for each subsequent token, and can even depend on the contents of the previous tokens.

Possible implementation

char* strtok(char* str, const char* delim)
{
    static char* buffer;
 
    if (str != nullptr)
        buffer = str;
 
    buffer += std::strspn(buffer, delim);
 
    if (*buffer == '\0')
        return nullptr;
 
    char* const tokenBegin = buffer;
 
    buffer += std::strcspn(buffer, delim);
 
    if (*buffer != '\0')
        *buffer++ = '\0';
 
    return tokenBegin;
}

Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in MUSL libc), or in terms of its reentrant version (as in GNU libc).

Example

#include <cstring>
#include <iomanip>
#include <iostream>
 
int main() 
{
    char input[] = "one + two * (three - four)!";
    const char* delimiters = "! +- (*)";
    char* token = std::strtok(input, delimiters);
    while (token)
    {
        std::cout << std::quoted(token) << ' ';
        token = std::strtok(nullptr, delimiters);
    }
 
    std::cout << "\nContents of the input string now:\n\"";
    for (std::size_t n = 0; n < sizeof input; ++n)
    {
        if (const char c = input[n]; c != '\0')
            std::cout << c;
        else
            std::cout << "\\0";
    }
    std::cout << "\"\n";
}

Output:

"one" "two" "three" "four" 
Contents of the input string now:
"one\0+ two\0* (three\0- four\0!\0"

See also

std::strpbrk - finds the first location of any character from a set of separators
(function)
std::strcspn - returns the length of the maximum initial segment that consists
of only the characters not found in another byte string
(function)
std::strspn - returns the length of the maximum initial segment that consists
of only the characters found in another byte string
(function)
std::ranges::split_view, std::views::split (C++20) - a view over the subranges obtained from splitting another view using a delimiter
(class template) (range adaptor object)
C documentation for strtok