Difference between revisions of "cpp/regex"

From cppreference.com

Latest revision as of 09:54, 10 June 2024

The regular expressions library provides a class that represents regular expressions, which are a kind of mini-language used to perform pattern matching within strings. Almost all operations with regexes can be characterized by operating on several of the following objects:

  • Target sequence. The character sequence that is searched for a pattern. This may be a range specified by two iterators, a null-terminated character string or a std::string.
  • Pattern. This is the regular expression itself. It determines what constitutes a match. It is an object of type std::basic_regex, constructed from a string with special grammar.
  • Matched array. The information about matches may be retrieved as an object of type std::match_results.
  • Replacement string. This is a string that determines how to replace the matches.

Regular expression grammars

Patterns and replacement strings support the following regular expression grammars:

  • Modified ECMAScript regular expression grammar. This is the default grammar.
  • Basic POSIX regular expression grammar.
  • Extended POSIX regular expression grammar.
  • The regular expression grammar used by the awk utility in POSIX.
  • The regular expression grammar used by the grep utility in POSIX. This is effectively the same as the basic POSIX regular expression grammar, with the addition of newline '\n' as an alternation separator.
  • The regular expression grammar used by the grep utility, with the -E option, in POSIX. This is effectively the same as the extended POSIX regular expression grammar, with the addition of newline '\n' as an alternation separator in addition to '|'.

Some grammar variations (such as case-insensitive matching) are also available; see the std::basic_regex constants page for details.

Main classes

These classes encapsulate a regular expression and the results of matching a regular expression within a target sequence of characters.

  • basic_regex (C++11): regular expression object (class template)
  • sub_match (C++11): identifies the sequence of characters matched by a sub-expression (class template)
  • match_results (C++11): identifies one regular expression match, including all sub-expression matches (class template)

Algorithms

These functions are used to apply the regular expression encapsulated in a regex to a target sequence of characters.

  • regex_match (C++11): attempts to match a regular expression to an entire character sequence (function template)
  • regex_search (C++11): attempts to match a regular expression to any part of a character sequence (function template)
  • regex_replace (C++11): replaces occurrences of a regular expression with formatted replacement text (function template)

Iterators

The regex iterators are used to traverse the entire set of regular expression matches found within a sequence.

  • regex_iterator (C++11): iterates through all regex matches within a character sequence (class template)
  • regex_token_iterator (C++11): iterates through the specified sub-expressions within all regex matches in a given string or through unmatched substrings (class template)

Exceptions

This class defines the type of objects thrown as exceptions to report errors from the regular expressions library.

  • regex_error (C++11): reports errors generated by the regular expressions library (class)

Traits

The regex traits class is used to encapsulate the localizable aspects of a regex.

  • regex_traits (C++11): provides metainformation about a character type, required by the regex library (class template)

Constants

Defined in namespace std::regex_constants
  • syntax_option_type (C++11): general options controlling regex behavior (typedef)
  • match_flag_type (C++11): options specific to matching (typedef)
  • error_type (C++11): describes different types of matching errors (typedef)

Example

#include <iostream>
#include <iterator>
#include <regex>
#include <string>
 
int main()
{
    std::string s = "Some people, when confronted with a problem, think "
        "\"I know, I'll use regular expressions.\" "
        "Now they have two problems.";
 
    std::regex self_regex("REGULAR EXPRESSIONS",
        std::regex_constants::ECMAScript | std::regex_constants::icase);
    if (std::regex_search(s, self_regex))
        std::cout << "Text contains the phrase 'regular expressions'\n";
 
    std::regex word_regex("(\\w+)");
    auto words_begin = 
        std::sregex_iterator(s.begin(), s.end(), word_regex);
    auto words_end = std::sregex_iterator();
 
    std::cout << "Found "
              << std::distance(words_begin, words_end)
              << " words\n";
 
    const int N = 6;
    std::cout << "Words longer than " << N << " characters:\n";
    for (std::sregex_iterator i = words_begin; i != words_end; ++i)
    {
        std::smatch match = *i;
        std::string match_str = match.str();
        if (match_str.size() > N)
            std::cout << "  " << match_str << '\n';
    }
 
    std::regex long_word_regex("(\\w{7,})");
    std::string new_s = std::regex_replace(s, long_word_regex, "[$&]");
    std::cout << new_s << '\n';
}

Output:

Text contains the phrase 'regular expressions'
Found 20 words
Words longer than 6 characters:
  confronted
  problem
  regular
  expressions
  problems
Some people, when [confronted] with a [problem], think 
"I know, I'll use [regular] [expressions]." Now they have two [problems].