Enumerated character classes and ranges
Sometimes the pre-existing wildcards and character classes are not enough. Fortunately, defining your own is fairly simple. Within <[ ]>, you can put any number of single characters and ranges of characters (expressed with two dots between the end points), with or without whitespace.
"abacabadabacaba" ~~ / <[ a .. c 1 2 3 ]>* /;
# Unicode hex codepoint range
"ÀÁÂÃÄÅÆ" ~~ / <[ \x[00C0] .. \x[00C6] ]>* /;
# Unicode named codepoint range
"αβγ" ~~ /<[\c[GREEK SMALL LETTER ALPHA]..\c[GREEK SMALL LETTER GAMMA]]>*/;
# Non-alphanumeric
'$@%!' ~~ /<[ ! @ $ % ]>+/  # OUTPUT: «「$@%!」»
As the last line above illustrates, within <[ ]> you do not need to quote or escape most non-alphanumeric characters the way you do in regex text outside of <[ ]>. You do, however, need to escape the much smaller set of characters that have special meaning within <[ ]>, such as \, [, and ].
To escape characters that would have some meaning inside the <[ ]>, precede the character with a \.
say "[ hey ]" ~~ /<-[ \] \[ \s ]>+/; # OUTPUT: «「hey」»
You do not have the option of quoting special characters inside a <[ ]>: a ' just matches a literal '.
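As a minimal sketch of the point above, the following collects every quote character in a string. The variable names `$text` and `@quotes` are illustrative only; the class lists one `'` and one `"`, and both are treated as ordinary members rather than as quoting delimiters.

```raku
# Quote characters listed inside <[ ]> are plain class members, not delimiters.
my $text   = "it's \"quoted\"";

# .comb with a regex returns every non-overlapping match as a string.
my @quotes = $text.comb(/ <[ ' " ]> /);

say @quotes.elems;   # 3 (one apostrophe and two double quotes)
```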
Within the < > you can use + and - to add or remove multiple range definitions and even mix in some of the Unicode categories above. You can also write the backslashed forms for character classes between the [ ].
/ <[\d] - [13579]> /;
# starts with \d and removes odd ASCII digits, but not quite the same as
/ <[02468]> /;
# because the first one also contains "weird" unicodey digits
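The difference between the two classes can be checked with a non-ASCII digit. This is a small sketch: the names `$non-odd`, `$ascii-even`, and `$arabic-three` are made up for illustration, and U+0663 (ARABIC-INDIC DIGIT THREE) is just one convenient example of a digit that `\d` matches.

```raku
# \d minus the odd ASCII digits, anchored so the whole string must match.
my $non-odd    = / ^ <[\d] - [13579]> $ /;

# Only the five even ASCII digits.
my $ascii-even = / ^ <[02468]> $ /;

# U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit outside ASCII.
my $arabic-three = "\x[0663]";

say so '4' ~~ $non-odd;               # True
say so '3' ~~ $non-odd;               # False
say so $arabic-three ~~ $non-odd;     # True
say so $arabic-three ~~ $ascii-even;  # False
```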
You can include Unicode properties in the list as well:
/<:Zs + [\x9] - [\xA0] - [\x202F]>/
# Any character with "Zs" property, or a tab, but not a "no-break space" or "narrow no-break space"
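To see how such a combined class behaves, the sketch below tests a few single characters against it. The variable name `$breakable-space` is hypothetical, and the sample characters (a plain space, a tab, U+2003 EM SPACE, and U+00A0 NO-BREAK SPACE) were chosen only to exercise each part of the class.

```raku
# Zs characters plus tab, minus the two non-breaking spaces; anchored to one char.
my $breakable-space = / ^ <:Zs + [\x9] - [\xA0] - [\x202F]> $ /;

say so " "        ~~ $breakable-space;  # True:  U+0020 has the Zs property
say so "\t"       ~~ $breakable-space;  # True:  tab was added with + [\x9]
say so "\x[2003]" ~~ $breakable-space;  # True:  EM SPACE has the Zs property
say so "\x[A0]"   ~~ $breakable-space;  # False: NO-BREAK SPACE was removed
```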
To negate a character class, put a - after the opening angle bracket:
say 'no quotes' ~~ / <-[ " ]> + /; # <-["]> matches any character except "
A common pattern for parsing quote-delimited strings involves negated character classes:
say '"in quotes"' ~~ / '"' <-[ " ]> * '"'/;
This regex first matches a quote, then any characters that aren't quotes, and then a quote again. The meanings of * and + in the examples above are explained in the next section on quantifiers.
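In practice you usually want the text between the quotes rather than the whole match. One possible way to get it, sketched here with the illustrative names `$input` and `$content`, is to wrap the negated class in a capture group and read it back from `$0`.

```raku
my $input = 'say "hello world" now';
my $content;

# Opening quote, a captured run of non-quote characters, closing quote.
if $input ~~ / '"' ( <-[ " ]>* ) '"' / {
    $content = ~$0;   # stringify the first positional capture
}

say $content;   # hello world
```

Because the class excludes the quote itself, the capture cannot run past the closing delimiter, so no non-greedy quantifier is needed.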
Just as you can use the - for both set difference and negation of a single value, you can also explicitly put a + in front:
/ <+[123]> / # same as <[123]>
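A quick way to confirm the equivalence is to run both forms over the same string and compare what they find. The names `@plain` and `@explicit` and the sample string are assumptions made for this sketch.

```raku
my $sample = "a1b2c3d4";

# Collect every character matched by each form of the class.
my @plain    = $sample.comb(/ <[123]>  /);
my @explicit = $sample.comb(/ <+[123]> /);

say @plain.join;                     # 123
say @plain.join eq @explicit.join;   # True
```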