[SPL] regex support

Clifford Wolf clifford at clifford.at
Mon May 2 11:49:49 CEST 2005


On Sun, May 01, 2005 at 07:14:17AM +0200, Clifford Wolf wrote:
> I'll fix the remaining issues in the next days.

done now (rev 375). I've also added a section in the language documentation
about the regex support:


Regular Expressions

SPL uses the PCRE library for regular expression matching, so it is largely
compatible with Perl regular expressions. If the PCRE library cannot be found
by the SPL makefile, SPL is compiled without regex support and a runtime error
is produced whenever a regex instruction is executed.

The syntax for using regexes is also similar to the Perl syntax:

        x =~ /foobar/;
        x =~ s/foo/bar/g;

Perl-like modifiers supported by SPL:

        i .. ignore case in pattern matching
        s .. dot metacharacter matches all characters, including newlines
        x .. ignore unescaped whitespace and allow comments using '#'
        m .. multiline matching, ^ and $ also match at newline characters
        g .. match (and substitute) globally, not only the first match

Modifiers new in SPL:

        N .. include captured strings as child nodes in result, using numbers
        P .. include named captured strings (?P<foo>...) in result, using names
        A .. add an array with one element per match (for N/P combined with g)

The return value of '=~' is the number of matches found. If the 'g' modifier
isn't used, the return value may only be 0 or 1. With the modifiers 'N',
'P' and 'A', the result will also have child nodes with additional data
about the matches.
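
For readers who know Python better than Perl, the return value rule can be
sketched with Python's "re" module. This is only an analogue, not SPL code:
count_matches() is a hypothetical helper, and its global_match argument stands
in for the 'g' modifier.

```python
import re

def count_matches(pattern, text, global_match=False, flags=0):
    """Hypothetical helper mimicking the documented return value of '=~'."""
    if global_match:
        # With 'g': count every non-overlapping match in the string.
        return sum(1 for _ in re.finditer(pattern, text, flags))
    # Without 'g': only the first match is considered, so the result is 0 or 1.
    return 1 if re.search(pattern, text, flags) else 0

print(count_matches(r"foo", "foo FOO foo"))                       # 1
print(count_matches(r"foo", "foo FOO foo", global_match=True))    # 2
print(count_matches(r"foo", "foo FOO foo", True, re.IGNORECASE))  # 3
print(count_matches(r"bar", "foo FOO foo", global_match=True))    # 0
```

Here re.IGNORECASE plays the role of the 'i' modifier.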

It is possible to declare names for capturing parentheses using the Python
syntax (?P<name>...). This is a great help when dealing with complex
regular expressions with many capturing parentheses.
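
Since the (?P<name>...) syntax comes from Python, a small Python sketch may
show why names help once a pattern has several groups. The date pattern and
the group names year, month and day are made up for this illustration, and
re.VERBOSE is Python's counterpart of the 'x' modifier.

```python
import re

# Named groups document themselves; re.VERBOSE allows whitespace and comments.
pattern = re.compile(r"""
    (?P<year>\d{4}) -   # four-digit year
    (?P<month>\d{2}) -  # two-digit month
    (?P<day>\d{2})      # two-digit day
""", re.VERBOSE)

m = pattern.search("released on 2005-05-02")

# Groups can be fetched by name instead of by counting parentheses.
print(m.group("year"), m.group("month"), m.group("day"))

# All named captures at once, keyed by name.
print(m.groupdict())
```

Adding another pair of parentheses to the pattern later does not change the
names, whereas numeric references would shift.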

The strings matched by a regex can be referred to using $N, $<Number> and
$<Name> (in addition to including them in the result value using the 'N',
'P' and 'A' modifiers). The special variable $0 represents the text matched
by the whole regular expression and is also available if no capturing
parentheses are present in the regex. These special variables are declared
locally: they do not invalidate regex results in any higher context. So it is
safe to, for example, do a regex match, then call a function which also uses
regexes, and after that refer to the matches of the first regex using these
variables.

Here is a slightly more complex example:

        var x = "foolish bigfoot";

        var r = x =~ /(?P<word>(?P<firstchar>\S)\S*)\s*/APg;

        foreach i (r) {
                r[i].word =~ s/foo(.*)/bar$1/;
                debug "Match #$i: [${r[i].firstchar}] ${r[i].word} ($0)";
        }

        debug "Ever seen a ${r[0].word} $1?";

This script creates the following output:

        SPL Debug: Match #0: [f] barlish (foolish)
        SPL Debug: Match #1: [b] bigbart (foot)
        SPL Debug: Ever seen a barlish bigfoot?
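
As a cross-check, the same steps can be traced in Python. This is a sketch
using Python's "re" module, not SPL: the list of dicts only approximates what
the 'A', 'P' and 'g' modifiers are described as returning, and the pattern is
written with a trailing \s* on the assumption that the last word, which has no
whitespace after it, must match too.

```python
import re

x = "foolish bigfoot"

# Assumption: trailing \s* so the final word (no whitespace after it) matches.
pattern = re.compile(r"(?P<word>(?P<firstchar>\S)\S*)\s*")

# One entry per match, keyed by group name (roughly 'A' + 'P' + 'g').
matches = list(pattern.finditer(x))
r = [m.groupdict() for m in matches]

lines = []
for i, entry in enumerate(r):
    # The inner match object stands in for the loop-local $0 and $1.
    inner = re.search(r"foo(.*)", entry["word"])
    entry["word"] = re.sub(r"foo(.*)", r"bar\1", entry["word"], count=1)
    lines.append(f"Match #{i}: [{entry['firstchar']}] {entry['word']} ({inner.group(0)})")

# Outside the loop, group 1 of the last outer match is still untouched.
lines.append(f"Ever seen a {r[0]['word']} {matches[-1].group(1)}?")

print("\n".join(lines))
```

The outer match objects are not affected by the regex calls inside the loop,
which mirrors the locally declared special variables described above.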

A full description of the regular expression syntax supported by PCRE (and SPL)
can be found in the "pcrepattern" manpage.


have fun,
 - clifford

#!/usr/bin/perl -F(?#_leetmaker_v1.1_-_by_clifford_wolf_)
$_="Leetmaker v1.1\n";do{y/A-Z/a-z/;s/the/dA/g;s/s\b/z/g;
$)f)/$1ph/g;s/\s*!(!+)/ !1$1/g;s/0u/oo/g;print}while(<>);
When your hammer is C++, everything begins to look like a thumb.