Problem
My cat is brown
category
octocat
staccato
- Find the word ‘cat’
- Find words that begin with ‘cat’
- Find words that end with ‘cat’
- Find words that contain ‘cat’
- Find words that do not begin with ‘cat’
- Find words that do not end with ‘cat’
- Find words that do not contain ‘cat’
Solution
Word boundaries
\bcat\b
- Regex options: None
- Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Nonboundaries
\Bcat\B
- Regex options: None
- Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
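As a quick check of the two patterns above, here is a minimal Python sketch that runs them against the sample subjects from the Problem section. The `subjects` list and the variable names are illustrative choices, not part of the original recipe.

```python
import re

# Sample subjects taken from the Problem section.
subjects = ["My cat is brown", "category", "octocat", "staccato"]

whole_word = re.compile(r"\bcat\b")   # 'cat' as a complete word
inside_word = re.compile(r"\Bcat\B")  # 'cat' with word characters on both sides

whole_hits = [s for s in subjects if whole_word.search(s)]
inside_hits = [s for s in subjects if inside_word.search(s)]

print(whole_hits)   # ['My cat is brown']
print(inside_hits)  # ['staccato']
```

Only ‘staccato’ satisfies the nonboundary pattern, because it is the only subject in which ‘cat’ has a word character on each side.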
\bcat -> (?<!\w)(?=\w)cat
cat\b -> cat(?<=\w)(?!\w)
\Bcat -> (?<=\w)cat
cat\B -> cat(?=\w)
\b(?!\w*?cat\w*?)\w+?\b
\b -> (?<=\w)(?!\w)|(?<!\w)(?=\w)
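The patterns listed here can be tried out with a short Python sketch. It makes two assumptions that are not in the notes: the sample subjects are joined into one `text` string, and `\w*` is added to the boundary patterns so that the whole word is returned rather than just ‘cat’. The helper `match_positions` is a hypothetical name used to compare ‹\b› with its lookaround form.

```python
import re

# Sample subjects from the Problem section, joined into one string (an assumption).
text = "My cat is brown category octocat staccato"

# \w* is added so that findall returns the full word, not only 'cat'.
begins_with = re.findall(r"\bcat\w*", text)
ends_with = re.findall(r"\w*cat\b", text)
without_cat = re.findall(r"\b(?!\w*?cat\w*?)\w+?\b", text)

print(begins_with)  # ['cat', 'category']
print(ends_with)    # ['cat', 'octocat']
print(without_cat)  # ['My', 'is', 'brown']


def match_positions(pattern, subject):
    """Return the start index of every (possibly empty) match."""
    return [m.start() for m in re.finditer(pattern, subject)]


# \b written out with lookaround, as in the last line of the list.
lookaround_b = r"(?<=\w)(?!\w)|(?<!\w)(?=\w)"
same_positions = match_positions(r"\b", text) == match_positions(lookaround_b, text)
print(same_positions)  # True
```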
Discussion
‹\b› matches in these three positions:
- Before the first character in the subject, if the first character is a word character
- After the last character in the subject, if the last character is a word character
- Between two characters in the subject, where one is a word character and the other
is not a word character
‹\B› matches in these five positions:
- Before the first character in the subject, if the first character is not a word character
- After the last character in the subject, if the last character is not a word character
- Between two word characters
- Between two nonword characters
- In an empty subject string
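To make the position lists concrete, the following Python sketch reports every index at which ‹\b› and ‹\B› match in two small subjects. The helper name `positions` and the subjects are illustrative assumptions. The empty-subject case is left out of the sketch.

```python
import re


def positions(pattern, subject):
    """Return every index at which the zero-width pattern matches."""
    return [m.start() for m in re.finditer(pattern, subject)]


# A subject that starts and ends with a word character.
word_b = positions(r"\b", "cat")
word_nb = positions(r"\B", "cat")
print(word_b)   # [0, 3]
print(word_nb)  # [1, 2]

# A subject that starts and ends with a nonword character.
mixed_b = positions(r"\b", "-- cat!")
mixed_nb = positions(r"\B", "-- cat!")
print(mixed_b)   # [3, 6]
print(mixed_nb)  # [0, 1, 2, 4, 5, 7]
```

In both subjects every index is claimed by exactly one of the two tokens, which reflects that ‹\B› matches wherever ‹\b› does not.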
Word Characters
Java
:- In Java 4 to 6, ‹\w› matches only ASCII characters.
- In Java 7, ‹\w› matches Unicode characters only if you set the UNICODE_CHARACTER_CLASS flag.
- In all versions of Java, ‹\b› is Unicode-enabled and supports any script.
.NET, JavaScript, PCRE, Perl, Python, and Ruby
:- ‹\b› matches between two characters where one is matched by ‹\w› and the other by ‹\W›.
- ‹\B› always matches between two characters that are both matched by ‹\w› or both matched by ‹\W›.
JavaScript, PCRE, and Ruby
: ‹\w› is identical to ‹[a-zA-Z0-9_]›, so a “whole words only” search works only for languages that use just the plain Latin letters A to Z, such as English.
.NET
: Treats letters and digits from all scripts as word characters, so you can do a “whole words only” search on words in any language.
Python 2.x
: Non-ASCII characters are included only if you pass the UNICODE or U flag when creating the regex.
Python 3.x
: Non-ASCII characters are included by default, but you can exclude them with the ASCII or A flag. This flag affects both ‹\b› and ‹\w› equally.
Perl
: Whether ‹\w› is pure ASCII or includes all Unicode letters, digits, and underscores depends on your version of Perl and on the /adlu flags.
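The Python 3.x behavior described above can be observed with a small sketch. The sample text is an assumption chosen to contain non-ASCII letters, written with escapes so the code does not depend on file encoding.

```python
import re

# "naïve café", spelled with escapes for the two non-ASCII letters.
text = "na\u00efve caf\u00e9"

# Python 3 default: \w and \b are Unicode-aware.
unicode_words = re.findall(r"\b\w+\b", text)

# With re.ASCII (or re.A), \w and \b only treat [a-zA-Z0-9_] as word characters.
ascii_words = re.findall(r"\b\w+\b", text, re.ASCII)

print(unicode_words)  # ['naïve', 'café']
print(ascii_words)    # ['na', 've', 'caf']
```

With the ASCII flag, the accented letters count as nonword characters, so ‹\b› matches on both sides of them and the words are split.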