In Regexes
See primary documentation in context for Unicode properties.
The character classes mentioned so far are mostly for convenience; another approach is to use Unicode character properties. These come in the form <:property>, where property can be a short or long Unicode General Category name. These use pair syntax.
To match against a Unicode property you can use either smartmatch or uniprop:
    say "a".uniprop('Script');                 # OUTPUT: «Latin»
    say "a" ~~ / <:Script<Latin>> /;           # OUTPUT: «「a」»
    say "a".uniprop('Block');                  # OUTPUT: «Basic Latin»
    say "a" ~~ / <:Block('Basic Latin')> /;    # OUTPUT: «「a」»
These are the Unicode general categories used for matching:
| Short | Long |
|---|---|
| L | Letter |
| LC | Cased_Letter |
| Lu | Uppercase_Letter |
| Ll | Lowercase_Letter |
| Lt | Titlecase_Letter |
| Lm | Modifier_Letter |
| Lo | Other_Letter |
| M | Mark |
| Mn | Nonspacing_Mark |
| Mc | Spacing_Mark |
| Me | Enclosing_Mark |
| N | Number |
| Nd | Decimal_Number or digit |
| Nl | Letter_Number |
| No | Other_Number |
| P | Punctuation or punct |
| Pc | Connector_Punctuation |
| Pd | Dash_Punctuation |
| Ps | Open_Punctuation |
| Pe | Close_Punctuation |
| Pi | Initial_Punctuation |
| Pf | Final_Punctuation |
| Po | Other_Punctuation |
| S | Symbol |
| Sm | Math_Symbol |
| Sc | Currency_Symbol |
| Sk | Modifier_Symbol |
| So | Other_Symbol |
| Z | Separator |
| Zs | Space_Separator |
| Zl | Line_Separator |
| Zp | Paragraph_Separator |
| C | Other |
| Cc | Control or cntrl |
| Cf | Format |
| Cs | Surrogate |
| Co | Private_Use |
| Cn | Unassigned |
For example, <:Lu> matches a single uppercase letter.
To negate a property, write <:!property>. So, <:!Lu> matches a single character that is not an uppercase letter.
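As a minimal sketch of a category match and its negation, the snippet below runs both against the same string. The sample string and the variable names ($text, $upper, $not-upper) are assumptions chosen for illustration only. The last line uses uniprop without arguments, which reports a character's General Category by its short name.

```raku
# Assumed sample string; any mixed-case text would do.
my $text = 'Raku';

my $upper     = $text ~~ / <:Lu> /;    # first uppercase letter
my $not-upper = $text ~~ / <:!Lu> /;   # first character that is not an uppercase letter

say $upper;         # OUTPUT: «「R」»
say $not-upper;     # OUTPUT: «「a」»
say 'R'.uniprop;    # OUTPUT: «Lu»
```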
Categories can be used together, with an infix operator:
| Operator | Meaning |
|---|---|
| + | set union |
| - | set difference |
To match either a lowercase letter or a number, write <:Ll+:N> or <:Ll+:Number> or <+ :Lowercase_Letter + :Number>.
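The two operators can be compared side by side with a short sketch. The input string and the names $text, @union and @diff are assumptions made for this example; comb is used here only to collect every matching character.

```raku
# Assumed sample input containing upper- and lowercase letters, a digit and punctuation.
my $text = 'Raku 6c!';

# Union: a lowercase letter or any number.
my @union = $text.comb(/ <:Ll + :N> /);

# Difference: any letter except the lowercase ones.
my @diff = $text.comb(/ <:L - :Ll> /);

say @union;    # OUTPUT: «[a k u 6 c]»
say @diff;     # OUTPUT: «[R]»
```

Whitespace around the operators is optional inside the angle brackets, so <:Ll+:N> and <:Ll + :N> describe the same set.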
It's also possible to group categories and sets of categories with parentheses; for example:
say $0 if 'raku9' ~~ /\w+(<:Ll+:N>)/ # OUTPUT: «「9」»