Repeat Certain Number
Hexadecimal number
\b[a-f0-9]{1,8}\b
- Regex options: Case insensitive
- Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Floating-point number
\d*\.\d+(e\d+)?
- Regex options: Case insensitive
- Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Lazy quantifiers
1 | *? |
- any times, one after another, the less the better
- more than one time, one after another, the less the better
- have or not have, not have first
- seven to fourtytwo times, the less the better
In default repeat like *
+
?
{}
after match unit are all greedy[greedy is not selfish]
Possessive quantifiers
Possessive quantifiers is greedy and selfish
It’s greedy and selfish, when it matches, it wouldn’t care other expressions’ express.
In other words, it wouldn’t Backtracking when matches.
1 | *+ |
- any times, block reduce to match, the more the better, no backtracking
- more than one time, the more the better, no backtracking
- have or not have, the more the better, no backtracking
- seven to fourtytwo times, the more the better, no backtracking
\b\d++\b
- Regex options: None
- Regex flavors: Java, PCRE, Perl 5.10, Ruby 1.9
\b(?>\d+)\b
- Regex options: None
- Regex flavors: .NET, Java, PCRE, Perl, Ruby
ps. (?>\d+)
is an atomic group that essentially is a noncapturing group,
with the extra job of refusing to backtrack.
Example
\w++\d++
can’t mathes “123abc”
Because \w++
selfishly occupy “123abc”, don’t care \d++ failed to match.
When use (\w+)(\d+)
to match “123abc”, i found gourp \1 matches “12”
and group \2 matches “3”, so greedy works.
It’s not mean selfish is bad, but only unusual work in simply task.
Prevent Runaway Repetition
Why prevent? because in some complex condition, expression will complex the same.
In these condition Possessive quantifiers
is necessary.
Problem
Match a html’s origin code
Solution1
2<html>(?>.*?<head>)(?>.*?<title>)(?>.*?</title>)↵
(?>.*?</head>)(?>.*?<body[^>]*>)(?>.*?</body>).*?</html>
- Regex options: Case insensitive, dot matches line breaks
- Regex flavors: .NET, Java, PCRE, Perl, Ruby
ps.If some tag properties need be matched, please modify expression.
This regular expression has a worst-case complexity$^3$ of $O(n^7)$
If the file is twice the size, the regex can need up to 128 times
as many steps to figure out it doesn’t match.
Example
(x+x+)+y
to matches “xxxxxxxxxx”
It should not matches, watching the operation progress.
This regex is $O(2^n)$, if one more ‘x’ is added, the regex can need up to 2times