Difference between revisions of "cpp/string/byte/strtok"
Latest revision as of 23:00, 21 October 2023
Defined in header <cstring>
char* strtok( char* str, const char* delim );
Finds the next token in the null-terminated byte string pointed to by str. The separator characters are identified by the null-terminated byte string pointed to by delim.
This function is designed to be called multiple times to obtain successive tokens from the same string.
- If str is not a null pointer, the call is treated as the first call to strtok for this particular string. The function searches for the first character which is not contained in delim.
  - If no such character is found, there are no tokens in str at all, and the function returns a null pointer.
  - If such a character is found, it is the beginning of the token. The function then searches from that point on for the first character that is contained in delim.
    - If no such character is found, str has only one token, and future calls to strtok will return a null pointer.
    - If such a character is found, it is replaced by the null character '\0' and the pointer to the following character is stored in a static location for subsequent invocations.
  - The function then returns the pointer to the beginning of the token.
- If str is a null pointer, the call is treated as a subsequent call to strtok: the function continues from where it left off in the previous invocation. The behavior is the same as if the previously stored pointer were passed as str.
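One consequence of the search rules above is that leading, trailing, and consecutive delimiters never produce empty tokens, and a string made only of delimiters yields no tokens at all. The following is a minimal sketch of the "first call with the buffer, later calls with a null pointer" pattern; collect_tokens is a hypothetical helper introduced here for illustration, not part of the standard library.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration): gathers every token
// that std::strtok yields for `text`. The string is taken by value because
// std::strtok overwrites delimiters in the buffer it is given.
std::vector<std::string> collect_tokens(std::string text, const char* delims)
{
    assert(delims != nullptr);
    std::vector<std::string> tokens;

    // First call: pass the buffer itself. Later calls: pass a null pointer
    // so that std::strtok continues from its stored position.
    for (char* token = std::strtok(&text[0], delims); token != nullptr;
         token = std::strtok(nullptr, delims))
    {
        tokens.emplace_back(token);
    }
    return tokens;
}
```

With this helper, collect_tokens(",,a,,b,", ",") would give the two tokens "a" and "b", and collect_tokens(",,,", ",") would give an empty vector.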
Parameters
str - pointer to the null-terminated byte string to tokenize
delim - pointer to the null-terminated byte string identifying delimiters
Return value
Pointer to the beginning of the next token, or a null pointer if there are no more tokens.
Notes
This function is destructive: it writes '\0' characters into the elements of the string str. In particular, a string literal cannot be used as the first argument of std::strtok.
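When the input must stay intact (a string literal, for example), one option is to copy it into a modifiable buffer before tokenizing. Another is to avoid std::strtok altogether. The sketch below takes the second route and is a swapped-in technique, not how std::strtok works: it uses std::string_view searches and returns views into the original text without writing to it. The name split_nondestructive is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (an assumption for illustration): splits `text` on any
// character in `delims` without modifying it. The returned views refer to
// `text`, so they are only valid while the underlying characters are alive.
std::vector<std::string_view> split_nondestructive(std::string_view text,
                                                   std::string_view delims)
{
    std::vector<std::string_view> tokens;

    // Skip leading delimiters, as std::strtok does.
    std::size_t pos = text.find_first_not_of(delims);
    while (pos != std::string_view::npos)
    {
        // The token ends at the next delimiter or at the end of the text.
        std::size_t end = text.find_first_of(delims, pos);
        if (end == std::string_view::npos)
            end = text.size();
        assert(end > pos); // a token is never empty

        tokens.push_back(text.substr(pos, end - pos));
        pos = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Because nothing is written to the input, a call such as split_nondestructive("one + two", " +") is well-defined even though the argument is a string literal.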
Each call to this function modifies a static variable, so the function is not thread-safe.
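The same hidden state also means that two tokenizations cannot be interleaved, even on a single thread. A common remedy is to make the saved position an explicit parameter, which is the approach taken by POSIX strtok_r. The sketch below is a hand-written illustration of that idea: tokenize_r and interleave_tokens are hypothetical names, and the code is not the implementation used by any particular library.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical reentrant tokenizer (an assumption for illustration): behaves
// like std::strtok, but keeps the continuation pointer in *state instead of
// in a static variable.
char* tokenize_r(char* str, const char* delim, char** state)
{
    assert(delim != nullptr && state != nullptr);
    if (str == nullptr)
        str = *state;          // continue a previous tokenization
    if (str == nullptr)
        return nullptr;        // nothing has been started yet

    str += std::strspn(str, delim);   // skip leading delimiters
    if (*str == '\0')
    {
        *state = str;
        return nullptr;        // no more tokens
    }

    char* token = str;
    str += std::strcspn(str, delim);  // find the end of the token
    if (*str != '\0')
        *str++ = '\0';         // terminate the token in place
    *state = str;
    return token;
}

// Alternates tokens from two independent buffers. This would not work with
// std::strtok, which has only one stored position.
std::vector<std::string> interleave_tokens(std::string first, std::string second,
                                           const char* delim)
{
    std::vector<std::string> out;
    char* state1 = nullptr;
    char* state2 = nullptr;
    char* t1 = tokenize_r(&first[0], delim, &state1);
    char* t2 = tokenize_r(&second[0], delim, &state2);
    while (t1 != nullptr || t2 != nullptr)
    {
        if (t1 != nullptr)
        {
            out.emplace_back(t1);
            t1 = tokenize_r(nullptr, delim, &state1);
        }
        if (t2 != nullptr)
        {
            out.emplace_back(t2);
            t2 = tokenize_r(nullptr, delim, &state2);
        }
    }
    return out;
}
```

Each caller owns its state pointer, so separate threads (or nested loops) can tokenize different strings without disturbing one another, provided they do not share a buffer.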
Unlike most other tokenizers, the delimiters in std::strtok can be different for each subsequent token, and can even depend on the contents of the previous tokens.
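As an illustration of changing the delimiter set between calls, the sketch below reads a "key=value;key=value" string by alternating between "=" and ";". The input format and the name parse_pairs are assumptions made for this example only.

```cpp
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (an assumption for illustration): parses text of the
// form "key=value;key=value" by switching delimiters on every call.
// Limitation: because std::strtok skips leading delimiters, an empty value
// such as "a=;b=1" is not reported as empty and the pairs are misread.
std::vector<std::pair<std::string, std::string>> parse_pairs(std::string text)
{
    std::vector<std::pair<std::string, std::string>> pairs;

    // A key ends at '='.
    char* key = std::strtok(&text[0], "=");
    while (key != nullptr)
    {
        // The matching value ends at ';' or at the end of the string.
        char* value = std::strtok(nullptr, ";");
        if (value == nullptr)
            break; // key without a value: stop parsing

        pairs.emplace_back(key, value);
        key = std::strtok(nullptr, "=");
    }
    return pairs;
}
```

For the input "name=Ada;age=36" this would produce the pairs ("name", "Ada") and ("age", "36").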
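Possible implementation

    #include <cstring>

    char* strtok(char* str, const char* delim)
    {
        // Position at which the next call continues
        static char* buffer;

        if (str != nullptr)
            buffer = str;

        // Skip leading delimiters
        buffer += std::strspn(buffer, delim);

        if (*buffer == '\0')
            return nullptr;

        char* const tokenBegin = buffer;

        // Advance to the first delimiter after the token
        buffer += std::strcspn(buffer, delim);

        // Terminate the token in place and step past the delimiter
        if (*buffer != '\0')
            *buffer++ = '\0';

        return tokenBegin;
    }
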
Actual C++ library implementations of this function delegate to the C library, where it may be implemented directly (as in MUSL libc), or in terms of its reentrant version (as in GNU libc).
Example

    #include <cstring>
    #include <iomanip>
    #include <iostream>

    int main()
    {
        char input[] = "one + two * (three - four)!";
        const char* delimiters = "! +- (*)";
        char* token = std::strtok(input, delimiters);
        while (token)
        {
            std::cout << std::quoted(token) << ' ';
            token = std::strtok(nullptr, delimiters);
        }

        std::cout << "\nContents of the input string now:\n\"";
        for (std::size_t n = 0; n < sizeof input; ++n)
        {
            if (const char c = input[n]; c != '\0')
                std::cout << c;
            else
                std::cout << "\\0";
        }
        std::cout << "\"\n";
    }

Output:

    "one" "two" "three" "four"
    Contents of the input string now:
    "one\0+ two\0* (three\0- four\0!\0"

See also
strpbrk - finds the first location of any character from a set of separators (function)
strcspn - returns the length of the maximum initial segment that consists of only the characters not found in another byte string (function)
strspn - returns the length of the maximum initial segment that consists of only the characters found in another byte string (function)
split_view - a view over the subranges obtained from splitting another view using a delimiter (class template) (range adaptor object)
C documentation for strtok