In Regexes
See primary documentation in context for Sigspace.
The :sigspace or :s adverb changes the behavior of unquoted whitespace in a regex.
Without :sigspace, unquoted whitespace in a regex is generally ignored, to make regexes more readable by programmers. When :sigspace is present, unquoted whitespace may be converted into <.ws> subrule calls depending on where it occurs in the regex.
    say so "I used Photoshop®"   ~~ m:i/   photo shop /; # OUTPUT: «True»
    say so "I used a photo shop" ~~ m:i:s/ photo shop /; # OUTPUT: «True»
    say so "I used Photoshop®"   ~~ m:i:s/ photo shop /; # OUTPUT: «False»
m:s/ photo shop / acts the same as m/ photo <.ws> shop <.ws> /. By default, <.ws> makes sure that words are separated, so a b and ^& will match <.ws> in the middle, but ab won't:
    say so "ab"  ~~ m:s/a <.ws> b/;     # OUTPUT: «False»
    say so "a b" ~~ m:s/a <.ws> b/;     # OUTPUT: «True»
    say so "^&"  ~~ m:s/'^' <.ws> '&'/; # OUTPUT: «True»
The third line matches because ^& is not a word, so no separating whitespace is required. For more detail on how the <.ws> rule works, refer to the WS rule description.
Where whitespace in a regex turns into <.ws> depends on what comes before the whitespace. In the above example, whitespace at the beginning of a regex doesn't turn into <.ws>, but whitespace after characters does. In general, the rule is that if a term might match something, whitespace after it will turn into <.ws>.
In addition, if whitespace comes after a term but before a quantifier (+, *, or ?), <.ws> will be matched after every match of the term, so foo + becomes [ foo <.ws> ]+. Whitespace after a quantifier, on the other hand, acts as normal significant whitespace; e.g., foo+ followed by a space becomes foo+ <.ws>. However, whitespace between a quantifier and the % or %% quantifier modifier is not significant. Thus foo+ % , does not become foo+ <.ws>% , (which would be invalid anyway); instead, neither of the spaces is significant.
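As a small sketch of the quantifier rule, the following compares whitespace placed before and after a quantifier. The sample strings and the choice of \d as the term are illustrative assumptions, not part of the original text, and the default <.ws> is assumed (no override in scope).

```raku
# Whitespace BEFORE the quantifier: <.ws> is matched after every digit,
# roughly  ^ <.ws> [ \d <.ws> ]+ <.ws> $
say so "1 2 3" ~~ m:s/^ \d + $/;   # OUTPUT: «True»
say so "123"   ~~ m:s/^ \d + $/;   # OUTPUT: «False»

# Whitespace only AFTER the quantifier: <.ws> is matched once, after the run,
# roughly  ^ <.ws> \d+ <.ws> $
say so "123"   ~~ m:s/^ \d+ $/;    # OUTPUT: «True»
say so "1 2 3" ~~ m:s/^ \d+ $/;    # OUTPUT: «False»
```

In the second case "123" fails because the default <.ws> refuses to match between two word characters, so no single digit can be followed by <.ws> inside the string.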
In all, this code:
    rx :s {
        ^^
        {
            say "No sigspace after this";
        }
        <.assertion_and_then_ws>
        characters_with_ws_after+
        ws_separated_characters *
        [
        | some "stuff" .. .
        | $$
        ]
        :my $foo = "no ws after this";
        $foo
    }
Becomes:
    rx {
        ^^ <.ws>
        {
            say "No sigspace after this";
        }
        <.assertion_and_then_ws> <.ws>
        characters_with_ws_after+ <.ws>
        [ws_separated_characters <.ws>]* <.ws>
        [
        | some <.ws> "stuff" <.ws> .. <.ws> . <.ws>
        | $$ <.ws>
        ] <.ws>
        :my $foo = "no ws after this";
        $foo <.ws>
    }
If a regex is declared with the rule keyword, both the :sigspace and :ratchet adverbs are implied.
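To illustrate the difference, here is a minimal sketch that declares the same pattern once with token and once with rule. The names compact and spaced are hypothetical, chosen only for this example, and the default <.ws> is assumed.

```raku
my token compact { a b }   # token: whitespace in the pattern is ignored
my rule  spaced  { a b }   # rule: :sigspace is implied, so a <.ws> b <.ws>

say so "ab"  ~~ /^ <compact> $/;   # OUTPUT: «True»
say so "a b" ~~ /^ <compact> $/;   # OUTPUT: «False»
say so "ab"  ~~ /^ <spaced> $/;    # OUTPUT: «False»
say so "a b" ~~ /^ <spaced> $/;    # OUTPUT: «True»
```

Only the sigspace half of the implied adverbs is visible here; the :ratchet behavior (no backtracking) is shared by both token and rule and is not exercised by these inputs.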
Grammars provide an easy way to override what <.ws> matches:
    grammar Demo {
        token ws {
            <!ww>   # only match when not within a word
            \h*     # only match horizontal whitespace
        }
        rule TOP {  # called by Demo.parse;
            a b '.'
        }
    }

    # doesn't parse, whitespace required between a and b
    say so Demo.parse("ab.");       # OUTPUT: «False»
    say so Demo.parse("a b.");      # OUTPUT: «True»
    say so Demo.parse("a\tb .");    # OUTPUT: «True»

    # \n is vertical whitespace, so no match
    say so Demo.parse("a\tb\n.");   # OUTPUT: «False»
When parsing file formats where some whitespace (for example, vertical whitespace) is significant, it's advisable to override ws.