Difference between revisions of "cpp/string/byte/strtok"
m (r2.7.3) (Robot: Adding fr:cpp/string/byte/strtok, ja:cpp/string/byte/strtok, zh:cpp/string/byte/strtok) |
Andreas Krug (Talk | contribs) m (char * -> char*) |
||
(21 intermediate revisions by 11 users not shown) | |||
Line 1: | Line 1: | ||
− | {{cpp/title| strtok}} | + | {{cpp/title|strtok}} |
− | {{cpp/string/byte/ | + | {{cpp/string/byte/navbar}} |
− | {{ | + | {{dcl begin}} |
− | {{ | + | {{dcl header|cstring}} |
− | {{ | + | {{dcl|1= |
char* strtok( char* str, const char* delim ); | char* strtok( char* str, const char* delim ); | ||
}} | }} | ||
− | {{ | + | {{dcl end}} |
− | Finds the next token in a null-terminated byte string pointed to by {{ | + | Finds the next token in a null-terminated byte string pointed to by {{c|str}}. The separator characters are identified by null-terminated byte string pointed to by {{c|delim}}. |
− | + | This function is designed to be called multiple times to obtain successive tokens from the same string. | |
− | If {{c|str {{ | + | * If {{c|str}} is not a null pointer, the call is treated as the first call to {{tt|strtok}} for this particular string. The function searches for the first character which is ''not'' contained in {{c|delim}}. |
+ | :* If no such character was found, there are no tokens in {{c|str}} at all, and the function returns a null pointer. | ||
+ | :* If such character was found, it is the ''beginning of the token''. The function then searches from that point on for the first character that ''is'' contained in {{c|delim}}. | ||
+ | ::* If no such character was found, {{c|str}} has only one token, and the future calls to {{tt|strtok}} will return a null pointer. | ||
+ | ::* If such character was found, it is ''replaced'' by the null character {{c|'\0'}} and the pointer to the following character is stored in a static location for subsequent invocations. | ||
+ | :* The function then returns the pointer to the beginning of the token. | ||
+ | * If {{c|str}} is a null pointer, the call is treated as a subsequent call to {{tt|strtok}}: the function continues from where it left in previous invocation. The behavior is the same as if the previously stored pointer is passed as {{c|str}}. | ||
===Parameters=== | ===Parameters=== | ||
− | {{ | + | {{par begin}} |
− | {{ | + | {{par|str|pointer to the null-terminated byte string to tokenize}} |
− | {{ | + | {{par|delim|pointer to the null-terminated byte string identifying delimiters}} |
− | {{ | + | {{par end}} |
===Return value=== | ===Return value=== | ||
− | Pointer to the beginning of | + | Pointer to the beginning of the next token or a {{c|nullptr}} if there are no more tokens. |
− | === | + | ===Notes=== |
− | + | This function is destructive: it writes the {{c|'\0'}} characters in the elements of the string {{c|str}}. In particular, a [[cpp/language/string_literal|string literal]] cannot be used as the first argument of {{tt|std::strtok}}. | |
+ | |||
+ | Each call to this function modifies a static variable: is not thread safe. | ||
+ | |||
+ | Unlike most other tokenizers, the delimiters in {{tt|std::strtok}} can be different for each subsequent token, and can even depend on the contents of the previous tokens. | ||
+ | |||
+ | ===Possible implementation=== | ||
+ | {{eq fun|1= | ||
+ | char* strtok(char* str, const char* delim) | ||
+ | { | ||
+ | static char* buffer; | ||
+ | |||
+ | if (str != nullptr) | ||
+ | buffer = str; | ||
+ | |||
+ | buffer += std::strspn(buffer, delim); | ||
+ | |||
+ | if (*buffer == '\0') | ||
+ | return nullptr; | ||
+ | |||
+ | char* const tokenBegin = buffer; | ||
+ | |||
+ | buffer += std::strcspn(buffer, delim); | ||
+ | |||
+ | if (*buffer != '\0') | ||
+ | *buffer++ = '\0'; | ||
+ | |||
+ | return tokenBegin; | ||
+ | } | ||
+ | }} | ||
+ | |||
+ | Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in [https://github.com/bminor/musl/blob/master/src/string/strtok.c MUSL libc]), or in terms of its reentrant version (as in [https://github.com/bminor/glibc/blob/master/string/strtok.c GNU libc]). | ||
===Example=== | ===Example=== | ||
{{example | {{example | ||
− | + | |code= | |
− | + | #include <cstring> | |
− | + | #include <iomanip> | |
+ | #include <iostream> | ||
+ | |||
+ | int main() | ||
+ | { | ||
+ | char input[] = "one + two * (three - four)!"; | ||
+ | const char* delimiters = "! +- (*)"; | ||
+ | char* token = std::strtok(input, delimiters); | ||
+ | while (token) | ||
+ | { | ||
+ | std::cout << std::quoted(token) << ' '; | ||
+ | token = std::strtok(nullptr, delimiters); | ||
+ | } | ||
+ | |||
+ | std::cout << "\nContents of the input string now:\n\""; | ||
+ | for (std::size_t n = 0; n < sizeof input; ++n) | ||
+ | { | ||
+ | if (const char c = input[n]; c != '\0') | ||
+ | std::cout << c; | ||
+ | else | ||
+ | std::cout << "\\0"; | ||
+ | } | ||
+ | std::cout << "\"\n"; | ||
+ | } | ||
+ | |output= | ||
+ | "one" "two" "three" "four" | ||
+ | Contents of the input string now: | ||
+ | "one\0+ two\0* (three\0- four\0!\0" | ||
}} | }} | ||
− | + | ===See also=== | |
− | + | {{dsc begin}} | |
− | + | {{dsc inc|cpp/string/byte/dsc strpbrk}} | |
+ | {{dsc inc|cpp/string/byte/dsc strcspn}} | ||
+ | {{dsc inc|cpp/string/byte/dsc strspn}} | ||
+ | {{dsc inc|cpp/ranges/dsc split_view}} | ||
+ | {{dsc see c|c/string/byte/strtok}} | ||
+ | {{dsc end}} | ||
+ | |||
+ | {{langlinks|de|es|fr|it|ja|pt|ru|zh}} |
Latest revision as of 23:00, 21 October 2023
Defined in header <cstring>
char* strtok( char* str, const char* delim );
Finds the next token in a null-terminated byte string pointed to by str. The separator characters are identified by the null-terminated byte string pointed to by delim.
This function is designed to be called multiple times to obtain successive tokens from the same string.
- If str is not a null pointer, the call is treated as the first call to strtok for this particular string. The function searches for the first character which is not contained in delim.
  - If no such character is found, there are no tokens in str at all, and the function returns a null pointer.
  - If such a character is found, it is the beginning of the token. The function then searches from that point on for the first character that is contained in delim.
    - If no such character is found, str has only one token, and future calls to strtok will return a null pointer.
    - If such a character is found, it is replaced by the null character '\0' and the pointer to the following character is stored in a static location for subsequent invocations.
  - The function then returns the pointer to the beginning of the token.
- If str is a null pointer, the call is treated as a subsequent call to strtok: the function continues from where it left off in the previous invocation. The behavior is the same as if the previously stored pointer were passed as str.
Parameters
str - pointer to the null-terminated byte string to tokenize
delim - pointer to the null-terminated byte string identifying delimiters
Return value
Pointer to the beginning of the next token, or a null pointer if there are no more tokens.
Notes
This function is destructive: it writes '\0' characters into the elements of the string str. In particular, a string literal cannot be used as the first argument of std::strtok.
Each call to this function modifies a static variable, so the function is not thread-safe.
Unlike most other tokenizers, the delimiters in std::strtok can be different for each subsequent token, and can even depend on the contents of the previous tokens.
Possible implementation
```cpp
char* strtok(char* str, const char* delim)
{
    static char* buffer;

    if (str != nullptr)
        buffer = str;

    buffer += std::strspn(buffer, delim);

    if (*buffer == '\0')
        return nullptr;

    char* const tokenBegin = buffer;

    buffer += std::strcspn(buffer, delim);

    if (*buffer != '\0')
        *buffer++ = '\0';

    return tokenBegin;
}
```
Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in MUSL libc), or in terms of its reentrant version (as in GNU libc).
Example
```cpp
#include <cstring>
#include <iomanip>
#include <iostream>

int main()
{
    char input[] = "one + two * (three - four)!";
    const char* delimiters = "! +- (*)";
    char* token = std::strtok(input, delimiters);
    while (token)
    {
        std::cout << std::quoted(token) << ' ';
        token = std::strtok(nullptr, delimiters);
    }

    std::cout << "\nContents of the input string now:\n\"";
    for (std::size_t n = 0; n < sizeof input; ++n)
    {
        if (const char c = input[n]; c != '\0')
            std::cout << c;
        else
            std::cout << "\\0";
    }
    std::cout << "\"\n";
}
```
Output:
```
"one" "two" "three" "four"
Contents of the input string now:
"one\0+ two\0* (three\0- four\0!\0"
```
See also
- strpbrk: finds the first location of any character from a set of separators (function)
- strcspn: returns the length of the maximum initial segment that consists of only the characters not found in another byte string (function)
- strspn: returns the length of the maximum initial segment that consists of only the characters found in another byte string (function)
- split_view: a view over the subranges obtained from splitting another view using a delimiter (class template) (range adaptor object)
- C documentation for strtok