Match whole words

Problem

My cat is brown
category
octocat
staccato

Word boundaries

\bcat\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
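
A minimal Python sketch of the word-boundary regex run against the four sample subjects. The names subjects and matching_subjects are illustrative only and not part of the recipe:

```python
import re

# The four sample subjects from the Problem section.
subjects = ["My cat is brown", "category", "octocat", "staccato"]

whole_word = re.compile(r"\bcat\b")

def matching_subjects(pattern, texts):
    """Return the texts in which the compiled pattern finds a match."""
    return [text for text in texts if pattern.search(text)]

whole_word_hits = matching_subjects(whole_word, subjects)
print(whole_word_hits)  # ['My cat is brown']
```

Only the first subject contains cat as a whole word; in the other three, cat touches another word character on at least one side.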

Nonboundaries

\Bcat\B

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
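
A short Python sketch of the nonboundary variants on the same four subjects, assuming Python's default matching rules. The helper matching_subjects is a made-up name for illustration:

```python
import re

subjects = ["My cat is brown", "category", "octocat", "staccato"]

def matching_subjects(regex, texts):
    """Return the texts in which the regex finds a match."""
    return [text for text in texts if re.search(regex, text)]

# cat with a word character on both sides
inside_word_hits = matching_subjects(r"\Bcat\B", subjects)
# cat that is not at the start of a word
not_at_start_hits = matching_subjects(r"\Bcat", subjects)
# cat that is not at the end of a word
not_at_end_hits = matching_subjects(r"cat\B", subjects)

print(inside_word_hits)   # ['staccato']
print(not_at_start_hits)  # ['octocat', 'staccato']
print(not_at_end_hits)    # ['category', 'staccato']
```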

\bcat       (?<!\w)(?=\w)cat
cat\b       cat(?<=\w)(?!\w)
\Bcat       (?<=\w)(?=\w)cat
cat\B       cat(?<=\w)(?=\w)
\b(?!\w*?cat\w*?)\w+?\b
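
The last regex above is not labeled in these notes. It appears to match whole words that do not contain cat: the negative lookahead rejects any word in which cat occurs. A minimal Python sketch under that reading, with a sample sentence assembled from the Problem subjects:

```python
import re

# Assumed purpose: match every word that does not contain "cat".
no_cat_word = re.compile(r"\b(?!\w*?cat\w*?)\w+?\b")

sample = "My cat is brown category octocat staccato"
words_without_cat = no_cat_word.findall(sample)

print(words_without_cat)  # ['My', 'is', 'brown']
```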

\b -> (?<=\w)(?!\w)|(?<!\w)(?=\w)
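
One way to sanity-check this rewrite is to compare where ‹\b› and the lookaround version match. A minimal Python sketch; the sample strings and the helper match_positions are assumptions for illustration:

```python
import re

word_boundary = re.compile(r"\b")
# Lookaround spelling: end of a word, or start of a word.
emulated_boundary = re.compile(r"(?<=\w)(?!\w)|(?<!\w)(?=\w)")

def match_positions(pattern, text):
    """Return the index of every zero-width match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]

samples = ["My cat is brown", "category", "octocat", "staccato", "cat, cat!"]

boundaries_agree = all(
    match_positions(word_boundary, text) == match_positions(emulated_boundary, text)
    for text in samples
)
print(boundaries_agree)  # True
```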

‹\b› matches in these three positions:

Before the first character in the subject, if the first character is a word character
After the last character in the subject, if the last character is a word character
Between two characters in the subject, where one is a word character and the other
is not a word character

‹\B› matches in these five positions:

Before the first character in the subject, if the first character is not a word character
After the last character in the subject, if the last character is not a word character
Between two word characters
Between two nonword characters
The empty string
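
The two lists above can be checked by collecting the positions where each token matches. A minimal Python sketch on a made-up subject, "my cat!". The empty-subject case is left out because some Python versions handle ‹\B› on an empty string differently:

```python
import re

def zero_width_positions(regex, text):
    """Return the index of every zero-width match of regex in text."""
    return [m.start() for m in re.finditer(regex, text)]

text = "my cat!"  # positions run from 0 (before 'm') to 7 (after '!')

boundary_positions = zero_width_positions(r"\b", text)
nonboundary_positions = zero_width_positions(r"\B", text)

print(boundary_positions)     # [0, 2, 3, 6]
print(nonboundary_positions)  # [1, 4, 5, 7]

# Every position belongs to exactly one of the two lists.
all_positions = sorted(boundary_positions + nonboundary_positions)
print(all_positions == list(range(len(text) + 1)))  # True
```

Position 0 is a boundary because the subject starts with a word character. Position 7 is a nonboundary because the subject ends with a nonword character.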

Java:
- Java 4 to 6: ‹\w› matches only ASCII characters.
- Java 7: ‹\w› also matches Unicode characters if you set the UNICODE_CHARACTER_CLASS flag.
- In all of these Java versions, ‹\b› is Unicode-enabled and supports any script.

In .NET, JavaScript, PCRE, Perl, Python, and Ruby:
- ‹\b› matches between two characters where one is matched by ‹\w› and the other by ‹\W›.
- ‹\B› matches between two characters where both are matched by ‹\w› or both are matched by ‹\W›.

JavaScript, PCRE, and Ruby: ‹\w› is identical to ‹[a-zA-Z0-9_]›, so a “whole words only” search works only for languages written with the unaccented Latin letters A to Z, such as English.
.NET: letters and digits from all scripts are treated as word characters, so you can do a “whole words only” search on words in any language.
Python 2.x: non-ASCII characters are included only if you pass the UNICODE or U flag when creating the regex.
Python 3.x: non-ASCII characters are included by default, but you can exclude them with the ASCII or A flag. This flag affects both ‹\b› and ‹\w› equally.
Perl: whether ‹\w› is pure ASCII or includes all Unicode letters, digits, and underscores depends on your version of Perl and on the /adlu flags.
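
A minimal sketch of the Python 3.x behavior described above, assuming Python 3 and a made-up sample string. The non-ASCII letters are written as escapes so the source stays pure ASCII:

```python
import re

text = "na\u00efve caf\u00e9"  # "naïve café"

# Default in Python 3: \w and \b are Unicode-aware for str patterns.
unicode_words = re.findall(r"\w+", text)
default_match = re.search(r"\bve\b", text)

# With re.ASCII (short form re.A), only [a-zA-Z0-9_] counts as a word character.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
ascii_match = re.search(r"\bve\b", text, flags=re.ASCII)

print(unicode_words)              # ['naïve', 'café']
print(ascii_words)                # ['na', 've', 'caf']
print(default_match is None)      # True: ï is a word character, so no \b before "ve"
print(ascii_match is not None)    # True: ï is a nonword character under re.ASCII
```

The same flag changes both results, which illustrates that ‹\b› follows whatever ‹\w› currently means.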