Difference between revisions of "cpp/string/byte/strtok"
(link to real impl (templted to just paste MUSL's ))
Andreas Krug m ({{c}}, ., headers sorted, fmt)
Revision as of 08:53, 6 June 2023
Defined in header <cstring>
char* strtok( char* str, const char* delim );
Finds the next token in a null-terminated byte string pointed to by str. The separator characters are identified by the null-terminated byte string pointed to by delim.
This function is designed to be called multiple times to obtain successive tokens from the same string.
- If str is not a null pointer, the call is treated as the first call to strtok for this particular string. The function searches for the first character which is not contained in delim.
  - If no such character is found, there are no tokens in str at all, and the function returns a null pointer.
  - If such a character is found, it is the beginning of the token. The function then searches from that point on for the first character that is contained in delim.
    - If no such character is found, str has only one token, and future calls to strtok will return a null pointer.
    - If such a character is found, it is replaced by the null character '\0' and the pointer to the following character is stored in a static location for subsequent invocations.
  - The function then returns the pointer to the beginning of the token.
- If str is a null pointer, the call is treated as a subsequent call to strtok: the function continues from where it left off in the previous invocation. The behavior is the same as if the previously stored pointer were passed as str.
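A minimal sketch of the calling protocol described above: the first call passes the string, every later call passes a null pointer, and the loop stops at the first null return. The helper name collect_tokens is hypothetical. Because delim is treated as a set of single characters, runs of adjacent delimiters never produce empty tokens.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns every token of `str` (which must be a writable,
// null-terminated buffer) using the std::strtok calling protocol.
std::vector<std::string> collect_tokens(char* str, const char* delim)
{
    std::vector<std::string> tokens;
    // First call: pass the string itself, which also resets the saved position.
    char* tok = std::strtok(str, delim);
    while (tok != nullptr)
    {
        tokens.emplace_back(tok);
        // Subsequent calls: pass nullptr to continue after the previous token.
        tok = std::strtok(nullptr, delim);
    }
    return tokens;
}
```

For instance, with delim set to ", " the buffer "a,,b, c" yields the tokens a, b and c, while a buffer containing only delimiters (or an empty one) yields no tokens at all.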
Parameters
str - pointer to the null-terminated byte string to tokenize
delim - pointer to the null-terminated byte string identifying delimiters
Return value
Pointer to the beginning of the next token, or a null pointer if there are no more tokens.
Notes
This function is destructive: it writes '\0' characters into the elements of the string str. In particular, a string literal cannot be used as the first argument of std::strtok.
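Because std::strtok needs writable storage, one way to tokenize read-only text such as a string literal is to tokenize a private copy. The sketch below (the name tokenize_copy is hypothetical) takes a std::string by value and hands its buffer to std::strtok; it relies on the non-const std::string::data() overload available since C++17.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: `text` is taken by value, so std::strtok writes its
// '\0' characters into this private copy and never into the caller's string.
std::vector<std::string> tokenize_copy(std::string text, const char* delim)
{
    std::vector<std::string> tokens;
    // std::string::data() returns a writable, null-terminated char* since C++17.
    for (char* tok = std::strtok(text.data(), delim); tok != nullptr;
         tok = std::strtok(nullptr, delim))
        tokens.emplace_back(tok);
    return tokens;
}
```

Calling tokenize_copy("red green  blue", " ") is therefore safe: the literal is only read while constructing the std::string argument.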
Each call to this function modifies a static variable, so the function is not thread-safe.
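The single static position also breaks single-threaded code that interleaves two tokenizations. In the deliberately incorrect sketch below (the name count_records_nested is hypothetical), the inner loop restarts std::strtok on each record, which overwrites the position saved for the outer loop. Once the inner loop reaches the end of the first record, the outer loop's next call returns a null pointer, so only one record is ever visited.

```cpp
#include <cstring>

// Deliberately broken: counts "key=value" records separated by ';',
// splitting each record on '=' with a nested std::strtok loop.
int count_records_nested(char* str)
{
    int records = 0;
    for (char* rec = std::strtok(str, ";"); rec != nullptr;
         rec = std::strtok(nullptr, ";"))  // resumes from the INNER loop's position
    {
        ++records;
        // Passing `rec` restarts strtok on the record, replacing the saved
        // position of the outer ';' loop.
        for (char* part = std::strtok(rec, "="); part != nullptr;
             part = std::strtok(nullptr, "="))
        {
            // ... use `part` ...
        }
    }
    return records;  // 1 for "a=1;b=2;c=3", not 3
}
```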
Unlike most other tokenizers, the delimiters in std::strtok can be different for each subsequent token, and can even depend on the contents of the previous tokens.
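For example, a key=value,key=value list can be parsed by alternating the delimiter set between calls: "=" ends a key and "," ends a value. This is a sketch only, and parse_pairs is a hypothetical helper. Since every call skips leading delimiters, it does not handle empty values: in "x=,y=2" the value of x comes out as "y=2".

```cpp
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: parses "k1=v1,k2=v2,..." by switching delimiters
// between calls to std::strtok.
std::vector<std::pair<std::string, std::string>> parse_pairs(char* str)
{
    std::vector<std::pair<std::string, std::string>> pairs;
    char* key = std::strtok(str, "=");  // a key runs up to the next '='
    while (key != nullptr)
    {
        char* value = std::strtok(nullptr, ",");  // a value runs up to the next ','
        pairs.emplace_back(key, value != nullptr ? value : "");
        key = std::strtok(nullptr, "=");
    }
    return pairs;
}
```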
Possible implementation
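char* strtok(char* str, const char* delim)
{
    static char* buffer;  // position saved between calls
 
    if (str != nullptr)
        buffer = str;  // first call: start on a new string
 
    buffer += std::strspn(buffer, delim);  // skip leading delimiters
 
    if (*buffer == '\0')
        return nullptr;  // no more tokens
 
    char* const tokenBegin = buffer;
 
    buffer += std::strcspn(buffer, delim);  // find the end of the token
 
    if (*buffer != '\0')
        *buffer++ = '\0';  // terminate the token and step past the delimiter
 
    return tokenBegin;
}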
Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in MUSL libc), or in terms of its reentrant version (as in GNU libc).
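To illustrate the second approach, here is a sketch of a reentrant tokenizer that keeps its position in a caller-supplied pointer, modeled on the signature of POSIX strtok_r, together with a strtok-style wrapper built on top of it. The names my_strtok_r and my_strtok are hypothetical, and this is not the GNU libc source.

```cpp
#include <cstring>

// Hypothetical reentrant tokenizer: all state lives in *saveptr, which the
// caller owns. As with strtok, the first call must pass a non-null str.
char* my_strtok_r(char* str, const char* delim, char** saveptr)
{
    if (str == nullptr)
        str = *saveptr;                // continue where the last call stopped
    str += std::strspn(str, delim);    // skip leading delimiters
    if (*str == '\0')
    {
        *saveptr = str;                // stay at the end: later calls return nullptr
        return nullptr;
    }
    char* const token = str;
    str += std::strcspn(str, delim);   // find the end of the token
    if (*str != '\0')
        *str++ = '\0';                 // terminate the token, step past the delimiter
    *saveptr = str;
    return token;
}

// Non-reentrant wrapper: its only static state is the saved position.
char* my_strtok(char* str, const char* delim)
{
    static char* saved = nullptr;
    return my_strtok_r(str, delim, &saved);
}
```

Because each tokenization owns its own save pointer, nested loops (for example splitting on ';' and then on '=') can each use my_strtok_r with a separate pointer; only the wrapper shares state between callers.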
Example
#include <cstring>
#include <iomanip>
#include <iostream>
 
int main()
{
    char input[] = "one + two * (three - four)!";
    const char* delimiters = "! +- (*)";
    char* token = std::strtok(input, delimiters);
    while (token)
    {
        std::cout << std::quoted(token) << ' ';
        token = std::strtok(nullptr, delimiters);
    }
 
    std::cout << "\nContents of the input string now:\n\"";
    for (std::size_t n = 0; n < sizeof input; ++n)
    {
        if (const char c = input[n]; c != '\0')
            std::cout << c;
        else
            std::cout << "\\0";
    }
    std::cout << "\"\n";
}
Output:
"one" "two" "three" "four" 
Contents of the input string now:
"one\0+ two\0* (three\0- four\0!\0"
See also
strpbrk - finds the first location of any character from a set of separators (function)
strcspn - returns the length of the maximum initial segment that consists of only the characters not found in another byte string (function)
strspn - returns the length of the maximum initial segment that consists of only the characters found in another byte string (function)
ranges::split_view, views::split (C++20) - a view over the subranges obtained from splitting another view using a delimiter (class template) (range adaptor object)
C documentation for strtok