In Regexes §

See primary documentation in context for Sigspace.

The :sigspace or :s adverb changes the behavior of unquoted whitespace in a regex.

Without :sigspace, unquoted whitespace in a regex is generally ignored, to make regexes more readable by programmers. When :sigspace is present, unquoted whitespace may be converted into <.ws> subrule calls depending on where it occurs in the regex.

Raku highlighting

say so "I used Photoshop®"   ~~ m:i/   photo shop /;  # OUTPUT: «True␤»
say so "I used a photo shop" ~~ m:i:s/ photo shop /;  # OUTPUT: «True␤»
say so "I used Photoshop®"   ~~ m:i:s/ photo shop /;  # OUTPUT: «False␤»

m:s/ photo shop / acts the same as m/ photo <.ws> shop <.ws> /. By default, <.ws> makes sure that words are separated, so <.ws> matches in the middle of "a b" and "^&", but not in the middle of "ab":

Raku highlighting

say so "ab" ~~ m:s/a <.ws> b/;     # OUTPUT: «False␤»
say so "a b" ~~ m:s/a <.ws> b/;    # OUTPUT: «True␤»
say so "^&" ~~ m:s/'^' <.ws> '&'/; # OUTPUT: «True␤»

The third line matches because neither ^ nor & is a word character, so <.ws> can match the empty string between them. For more details on how the <.ws> rule works, refer to the WS rule description.
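The sketch below assumes the default <ws> behaves like token ws { <!ww> \s* }. Because \s includes vertical whitespace, a newline counts as a separator too. Capturing <ws> (without the dot) makes it possible to see exactly what it consumed.

```raku
# Assumption: the default <ws> behaves like  token ws { <!ww> \s* }
say so "a\n\tb" ~~ m:s/a b/;   # OUTPUT: «True␤»   \s* also matches "\n" and "\t"
say so "ab"     ~~ m:s/a b/;   # OUTPUT: «False␤»  <!ww> fails inside a word

# <ws> without the dot is captured under the name "ws"
say ("x \n\t y" ~~ / x <ws> y /)<ws>.chars;   # OUTPUT: «4␤»
```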

Where whitespace in a regex turns into <.ws> depends on what comes before the whitespace. In the example above, whitespace at the beginning of the regex doesn't turn into <.ws>, but whitespace after characters does. In general, the rule is that if a term might match something, whitespace after it turns into <.ws>.
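The following sketch illustrates both halves of that rule. If the leading space in m:s/ photo/ became <.ws>, the regex could not match inside "xphoto", because <!ww> fails between "x" and "p". The space after the ^ anchor, on the other hand, follows a term, so (as in the ^^ <.ws> expansion shown further below) it is expected to become <.ws> and skip leading whitespace.

```raku
# Leading whitespace is not significant: this behaves like m/photo/
say so "xphoto" ~~ m:s/ photo/;    # OUTPUT: «True␤»

# Whitespace after the ^ anchor becomes <.ws>, so leading blanks are skipped
say so "  photo" ~~ m:s/^ photo/;  # OUTPUT: «True␤»
say so "  photo" ~~ m/^ photo/;    # OUTPUT: «False␤»  no :sigspace, no <.ws>
```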

In addition, if whitespace comes after a term but before a quantifier (+, *, or ?), <.ws> will be matched after every match of the term. So, "foo +" becomes [ foo <.ws> ]+. By contrast, whitespace after a quantifier acts as normal significant whitespace; e.g., "foo+ " becomes foo+ <.ws>. However, whitespace between a quantifier and the % or %% quantifier modifier is not significant. Thus "foo+ % ," does not become "foo+ <.ws>% ," (which would be invalid anyway); instead, neither of the spaces is significant.
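A small sketch contrasting the two placements, using a single-letter atom a and anchoring both ends so the whole string must match. With "a +", each repetition needs its own <.ws>, which fails between two adjacent "a" characters; with "a+ ", only one <.ws> follows the whole run.

```raku
# "a +" means [ a <.ws> ]+ : a <.ws> follows every repetition
say so "a a a" ~~ m:s/^a + $/;   # OUTPUT: «True␤»
say so "aaa"   ~~ m:s/^a + $/;   # OUTPUT: «False␤»

# "a+ " means a+ <.ws> : one <.ws> after the whole quantified atom
say so "aaa"   ~~ m:s/^a+ $/;    # OUTPUT: «True␤»
say so "a a a" ~~ m:s/^a+ $/;    # OUTPUT: «False␤»
```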

In all, this code:

Raku highlighting

rx :s {
    ^^
    {
        say "No sigspace after this";
    }
    <.assertion_and_then_ws>
    characters_with_ws_after+
    ws_separated_characters *
    [
    | some "stuff" .. .
    | $$
    ]
    :my $foo = "no ws after this";
    $foo
}

Becomes:

Raku highlighting

rx {
    ^^ <.ws>
    {
        say "No sigspace after this";
    }
    <.assertion_and_then_ws> <.ws>
    characters_with_ws_after+ <.ws>
    [ws_separated_characters <.ws>]* <.ws>
    [
    | some <.ws> "stuff" <.ws> .. <.ws> . <.ws>
    | $$ <.ws>
    ] <.ws>
    :my $foo = "no ws after this";
    $foo <.ws>
}

If a regex is declared with the rule keyword, both the :sigspace and :ratchet adverbs are implied.
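As a rough illustration, the sketch below declares the same pattern once as a regex and once as a rule. The names plain, spaced, backtracking and ratcheting are hypothetical and declared lexically with my. The first pair shows the implied :sigspace; the second pair shows the implied :ratchet, under which a* keeps every character it has matched and never backtracks.

```raku
# Hypothetical lexical regexes/rules for this sketch
my regex plain  { \w+ '=' \w+ }   # regex: no :sigspace, no :ratchet
my rule  spaced { \w+ '=' \w+ }   # rule: :sigspace and :ratchet implied

say so "key = value" ~~ /^ <plain> $/;    # OUTPUT: «False␤»
say so "key = value" ~~ /^ <spaced> $/;   # OUTPUT: «True␤»

# With :ratchet, a* keeps all three "a"s, so the final a can never match
my regex backtracking { a* a }
my rule  ratcheting   { a* a }
say so "aaa" ~~ /^ <backtracking> $/;     # OUTPUT: «True␤»
say so "aaa" ~~ /^ <ratcheting> $/;       # OUTPUT: «False␤»
```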

Grammars provide an easy way to override what <.ws> matches:

Raku highlighting

grammar Demo {
    token ws {
        <!ww>       # only match when not within a word
        \h*         # only match horizontal whitespace
    }
    rule TOP {      # called by Demo.parse
        a b '.'
    }
}

# doesn't parse, whitespace required between a and b
say so Demo.parse("ab.");                 # OUTPUT: «False␤»
say so Demo.parse("a b.");                # OUTPUT: «True␤»
say so Demo.parse("a\tb .");              # OUTPUT: «True␤»

# \n is vertical whitespace, so no match
say so Demo.parse("a\tb\n.");             # OUTPUT: «False␤»

When parsing file formats where some whitespace (for example, vertical whitespace) is significant, it's advisable to override ws.
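For example, in a hypothetical line-oriented key = value format, the sketch below overrides ws so that it only skips horizontal whitespace and the newline has to be matched explicitly. The grammar name KeyValueLines and its entry rule are illustrative, not part of any library.

```raku
grammar KeyValueLines {              # hypothetical line-oriented format
    token ws    { <!ww> \h* }        # skip horizontal whitespace only
    rule  TOP   { <entry>+ }
    rule  entry { \w+ '=' \w+ \n }   # each entry must end with a newline
}

say so KeyValueLines.parse("a = 1\nb = 2\n");   # OUTPUT: «True␤»
# <.ws> cannot consume the newline after '=', so \w+ fails
say so KeyValueLines.parse("a =\n1\n");         # OUTPUT: «False␤»
```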
