In Regexes §

See primary documentation in context for Unicode properties.

The character classes mentioned so far are mostly for convenience; another approach is to use Unicode character properties. These are written <:property>, where property is either the short or the long name of a Unicode General Category. The notation follows pair syntax.

To check a Unicode property, you can either query it with the uniprop method or smartmatch against a regex that names the property:

say "a".uniprop('Script');                 # OUTPUT: «Latin␤»
say "a" ~~ / <:Script<Latin>> /;           # OUTPUT: «｢a｣␤»
say "a".uniprop('Block');                  # OUTPUT: «Basic Latin␤»
say "a" ~~ / <:Block('Basic Latin')> /;    # OUTPUT: «｢a｣␤»
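
As a small sketch of how the two approaches relate, the snippet below queries the General Category of a few sample characters with uniprop (called without an argument) and then filters the same characters with a regex. The sample characters and variable names are made up for illustration only.

```raku
# Hypothetical sample characters, chosen only for illustration.
my @chars = 'a', 'A', '9';

# Without an argument, uniprop returns the short General Category name.
my @categories = @chars.map(*.uniprop);

# Keep only the characters that match the Letter category.
my @letters = @chars.grep(/ <:L> /);

say @categories;   # OUTPUT: «[Ll Lu Nd]␤»
say @letters;      # OUTPUT: «[a A]␤»
```

Querying with uniprop is handy for finding out which category name to put in the regex in the first place.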

These are the Unicode general categories used for matching:

Short	Long
L	Letter
LC	Cased_Letter
Lu	Uppercase_Letter
Ll	Lowercase_Letter
Lt	Titlecase_Letter
Lm	Modifier_Letter
Lo	Other_Letter
M	Mark
Mn	Nonspacing_Mark
Mc	Spacing_Mark
Me	Enclosing_Mark
N	Number
Nd	Decimal_Number or digit
Nl	Letter_Number
No	Other_Number
P	Punctuation or punct
Pc	Connector_Punctuation
Pd	Dash_Punctuation
Ps	Open_Punctuation
Pe	Close_Punctuation
Pi	Initial_Punctuation
Pf	Final_Punctuation
Po	Other_Punctuation
S	Symbol
Sm	Math_Symbol
Sc	Currency_Symbol
Sk	Modifier_Symbol
So	Other_Symbol
Z	Separator
Zs	Space_Separator
Zl	Line_Separator
Zp	Paragraph_Separator
C	Other
Cc	Control or cntrl
Cf	Format
Cs	Surrogate
Co	Private_Use
Cn	Unassigned

For example, <:Lu> matches a single uppercase letter.

To negate a property, write <:!property>. So, <:!Lu> matches a single character that is not an uppercase letter.
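
A minimal sketch of a category and its negation, using the short name, the long name, and the negated form on the same string. The sample string and variable names are hypothetical.

```raku
# Hypothetical sample string, chosen only for illustration.
my $text = 'Raku';

# Short and long category names select the same characters.
my $upper-short = ~($text ~~ / <:Lu> /);
my $upper-long  = ~($text ~~ / <:Uppercase_Letter> /);

# The negated form matches the first character that is not uppercase.
my $not-upper = ~($text ~~ / <:!Lu> /);

say $upper-short;   # OUTPUT: «R␤»
say $upper-long;    # OUTPUT: «R␤»
say $not-upper;     # OUTPUT: «a␤»
```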

Categories can be used together, with an infix operator:

Operator	Meaning
+	set union
-	set difference

To match either a lowercase letter or a number, write <:Ll+:N> or <:Ll+:Number> or <+ :Lowercase_Letter + :Number>.
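
The two operators can be sketched side by side with comb, which collects every match in a string. The sample string is made up for illustration, and the difference example (letters that are not uppercase) is an assumed use case, not one taken from the text above.

```raku
# Hypothetical sample string, chosen only for illustration.
my $sample = 'Ab1 c-D2';

# Union: lowercase letters or numbers.
my @lower-or-number = $sample.comb(/ <:Ll + :N> /);

# Difference: any letter except uppercase ones.
my @non-upper-letters = $sample.comb(/ <:L - :Lu> /);

say @lower-or-number;     # OUTPUT: «[b 1 c 2]␤»
say @non-upper-letters;   # OUTPUT: «[b c]␤»
```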

It's also possible to group categories and sets of categories with parentheses; for example:

say $0 if 'raku9' ~~ /\w+(<:Ll+:N>)/ # OUTPUT: «｢9｣␤»
