Thursday, November 10, 2016

Bedrock & Regular Expressions

Regular expressions are a powerful tool that every programmer should be familiar with.  Perl programmers are typically well versed in them, but many other tools and languages support regular expressions too.  Even Bedrock!


There are several places within Bedrock that support regular expressions.  In this post we'll discuss how to use them.

<if>

The most obvious place you might want to use a regular expression is in an <if/else> block.  The --re operator allows you to provide a regular expression on the right hand side of the expression.  The regular expression can be a simple string or a more complex quoted regular expression (qr).

<if $input.foo --re "^bar">buz</if>

The expression above will match all strings that start with "bar".  Note that you do not enclose the regular expression in any delimiters in this form.  You can also use other regular expression constructs, such as character classes.

<if $input.foo --re "^bar[kf]er">buz</if>

...would match strings starting with "barker" or "barfer".  We could also write that as:

<if $input.foo --re "^(barker|barfer)">buz</if>
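As a quick check outside of Bedrock, we can compare the two patterns on a few sample strings.  This is only a sketch in Python, whose re module treats these simple patterns the same way Perl does.  The sample strings and the helper name matches() are made up for illustration and are not part of Bedrock.

```python
import re

# Hypothetical sample inputs, chosen only to exercise the two patterns.
samples = ["barker", "barfer", "barber", "bar", "embarker"]

char_class = re.compile(r"^bar[kf]er")        # character class form
alternation = re.compile(r"^(barker|barfer)")  # alternation form


def matches(pattern, text):
    """Return True if the pattern matches somewhere in text (like Perl's =~)."""
    return pattern.search(text) is not None


# Map each sample to (character class result, alternation result).
results = {s: (matches(char_class, s), matches(alternation, s)) for s in samples}

for text, (by_class, by_alternation) in results.items():
    print(f"{text!r}: class={by_class} alternation={by_alternation}")
```

Both forms agree on every sample here; "embarker" is rejected because of the ^ anchor.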

If you'd like to add options to the regular expression you are matching, use the complex regular expression form using Perl's quoted regular expression syntax.

qr/STRING/msixpo

This operator quotes (and possibly compiles) its STRING as a regular expression. STRING is interpolated the same way as PATTERN in "m/PATTERN/".  If "'" is used as the delimiter, no interpolation is done.  Returns a Perl value which may be used instead of the corresponding "/STRING/msixpo" expression. The returned value is a normalized version of the original pattern. It magically differs from a string containing the same characters: "ref(qr/x/)" returns "Regexp", even though dereferencing the result returns undef.

<if $input.foo --re "qr/^(barker|barfer)/i">buz</if>

...would match any string that starts with "barker" or "barfer" regardless of case.

Capture Groups

Bedrock supports named capture groups that allow you to store the matched string in Bedrock scalars.
In our example, let's capture the matched string in a variable named "match".

<if $input.foo --re "qr/^(?<match\>barker|barfer)/i">buz</if>

Note that you need to escape the '>' character in the variable name specifier so Bedrock's parser does not get tripped up.  Alternatively, you can use single quotes around the capture name.

<if $input.foo --re "qr/^(?'match'barker|barfer)/i">buz</if>
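To see what ends up in the capture, here is a small sketch of the same pattern in Python.  Note that Python spells a named group as (?P<name>...) rather than Perl's (?<name>...) or (?'name'...), and passes the /i option as re.IGNORECASE.  The function name capture() and the sample inputs are assumptions for illustration only.

```python
import re

# Same pattern as above, written with Python's named-group syntax.
pattern = re.compile(r"^(?P<match>barker|barfer)", re.IGNORECASE)


def capture(text):
    """Return the text captured by the 'match' group, or None if no match."""
    found = pattern.search(text)
    return found.group("match") if found else None


for sample in ["BarKer Street", "barfer", "barber"]:
    print(f"{sample!r} -> {capture(sample)!r}")
```

The captured value keeps the case of the input, so "BarKer Street" captures "BarKer".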

Let's try to validate a telephone number. In the US a telephone number sorta/kinda looks like this:

(xxx) xxx-xxxx

...where x is a digit between 0 and 9.  Let's say we'll also accept a number with or without the area code and with or without the decorations, regardless of whitespace.  That means we'll accept:

xxx-xxxx
xxx xxxx
xxxxxxx
xxxxxxxxxx
xxx xxx-xxxx
(xxx) xxx-xxxx

Here's our regular expression:

<null:regexp 'qr/\\s*(\\(?\\d{3}\\)?)?\\s*\\d{3}\\s*-?\\s*\\d{4}\\s*/'>

Note that we need to escape the backslash ('\') character.

Our phone number requirements are getting a little more complex.  It turns out you can't start an area code with a '0' or a '1', so let's modify our regexp slightly.

<null:regexp 'qr/\\s*(\\(?[2-9]\\d{2}\\)?)?\\s*\\d{3}\\s*-?\\s*\\d{4}\\s*/'>

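Nice!  Well, almost.  Nothing is ever easy, eh?  The last two digits of an area code cannot both be '1' to avoid confusion with numbers like 911.  We can use a lookahead assertion in the regexp to make sure we don't have the pattern N11 in the area code: if the second digit is a '1', the third must not be.  We'll also anchor the pattern with ^ and $ so that the whole input has to match.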

<null:regexp 'qr/^\\s*(\\(?[2-9]((?=1)\\d[02-9]|[02-9]\\d)\\)?)?\\s*\\d{3}\\s*-?\\s*\\d{4}\\s*$/'>

That's sort of ugly, so let's use the <sink> tag to define our regular expression and avoid having to escape backslashes in Bedrock strings.

<sink:regexp>qr/^\s*(\(?[2-9]((?=1)\d[02-9]|[02-9]\d)\)?)?\s*\d{3}\s*-?\s*\d{4}\s*$/</sink>

<if $input.phone --re $regexp>Good!<else>Bad!</if>
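Before trusting a pattern like this, it helps to run it against the formats we said we'd accept.  Here is a minimal sketch in Python, which supports the same lookahead syntax.  The names PHONE_RE and is_valid_phone() and all of the sample numbers are made up for illustration.

```python
import re

# The final pattern from above, unchanged apart from Python's raw-string quoting.
PHONE_RE = re.compile(
    r"^\s*(\(?[2-9]((?=1)\d[02-9]|[02-9]\d)\)?)?\s*\d{3}\s*-?\s*\d{4}\s*$"
)


def is_valid_phone(text):
    """Return True if text looks like a US phone number per the rules above."""
    return PHONE_RE.search(text) is not None


# Hypothetical inputs covering each accepted format.
accepted = [
    "555-1234",
    "555 1234",
    "5551234",
    "2125551234",
    "212 555-1234",
    "(212) 555-1234",
]

# Area codes starting with 0 or 1, or of the form N11, should fail.
rejected = [
    "(911) 555-1234",
    "9115551234",
    "012 555-1234",
    "112 555 1234",
]

for number in accepted + rejected:
    print(f"{number!r}: {'Good!' if is_valid_phone(number) else 'Bad!'}")
```

One thing this kind of test turns up: each parenthesis is optional on its own, so an unbalanced input such as "(212 555-1234" is accepted as well.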

<catch>

When implementing a try/catch block in Bedrock the argument to the <catch> tag is a regular expression.  This allows you to match the exception to some regular expression in order to provide some kind of exception handling.  

<catch "bad parameter">

Again, you can use simple strings or complex regular expressions, including capture groups.  If you use a simple string in a <catch> tag, it is matched ignoring case.  To specify the match more precisely, use a quoted regexp.

<try>
  <raise "bad mojo">
<catch "qr/^bad\\s*(?'what'.*?)$/">
  You have a bad something: bad <var $what>!
</try>
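The same idea, matching an exception message against a regular expression and pulling a piece out of it, can be sketched in Python.  This is an analogy rather than a description of Bedrock's internals: it assumes the exception message is exactly the string that was raised, and the names handler and run() are hypothetical.

```python
import re

# Same pattern as the <catch> above, using Python's named-group syntax.
handler = re.compile(r"^bad\s*(?P<what>.*?)$")


def run():
    """Raise an error, then handle it only if its message matches the pattern."""
    try:
        raise RuntimeError("bad mojo")
    except RuntimeError as err:
        found = handler.search(str(err))
        if found is None:
            raise  # not an error we know how to handle; let it propagate
        return f"You have a bad something: bad {found.group('what')}!"


print(run())
```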

grep()

The grep() method of arrays and hashes allows you to use regular expressions to match on the values of arrays and hashes.

<hash:foo foo bar baz buz fiz fuz>
<foreach $foo.grep('qr/.*z$/i')><var $_>
</foreach>
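For comparison, here is a rough Python equivalent of that filter.  It assumes, as described above, that the pattern is tested against the values of the hash; the variable names are made up, and the order in which Bedrock returns the matches may differ.

```python
import re

# The same key/value pairs as the Bedrock hash above.
foo = {"foo": "bar", "baz": "buz", "fiz": "fuz"}

# Values ending in "z", ignoring case.
pattern = re.compile(r".*z$", re.IGNORECASE)

# Keep only the values that match the pattern.
matched = [value for value in foo.values() if pattern.search(value)]

for value in matched:
    print(value)
```

Only "buz" and "fuz" pass the filter, since "bar" does not end in "z".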

Regular expressions are a powerful addition to your toolbox.  For more information about regular expressions, visit the Perl regular expression documentation page (perlre).

