Regexp2 Tutorial

The regexp2 module is Allegro CL's fast, Perl-compatible regular expression module. It is documented in regexp.htm.

You can take most Perl regular expressions and use them unaltered in Allegro CL. The only difference between Perl and Lisp has to do with the Lisp reader: you must double a backslash in a string to get a backslash itself in the string. That is, "\\a" is a string of two characters, a backslash (\) followed by the letter `a'. The following is a Perl regular expression converted to ACL:

$_ =~ /^(\/\S+)\/(.*)/;         # Perl

(match-re "^(/\\S+)/(.*)" ...)  ; Lisp

First, the / does not have to be escaped in Lisp, since it is not a special character as it is in Perl (in the context above). Second, the \S+ in Perl must be written \\S+ in Lisp: the regular expression is in a Lisp string, where \\ puts a single backslash into the string, and \S+ then means the same in ACL as it does in Perl. (Regexp2 operators are typically named operation-re or re-operation. match-re is documented in regexp.htm.)
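As a quick check of the escaping rule, the following sketch prints the pattern string to show what the regexp module actually receives, and then runs the converted expression. The input "/usr/local/bin" is a made-up example, not taken from the text above.

```lisp
(require :regexp2)

;; The reader turns each \\ into one backslash, so the string handed
;; to the regexp module is exactly the Perl pattern.
(length "\\a")                  ; => 2
(write-line "^(/\\S+)/(.*)")    ; prints ^(/\S+)/(.*)

;; Hypothetical input. The greedy \S+ takes everything up to the last
;; slash, leaving the final path component for the second group.
(match-re "^(/\\S+)/(.*)" "/usr/local/bin")
;; expected values: t, "/usr/local/bin", "/usr/local", "bin"
```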

Quick Start

While the API functions for this module are autoloaded (i.e. loaded when a relevant function is called), it is best to load the module when you know you will need it.

cl-user(2): (require :regexp2)
; Fast loading /acl/acl70/src/code/regexp2.fasl
;;; Installing regexp2 patch, version 1
;   Fast loading /acl/acl70/src/code/yacc.fasl
t
cl-user(3): 

In your source files, you would add the following:

(eval-when (compile eval load) (require :regexp2))

Now, you are ready to use it:

cl-user(3): (match-re "\\s+(\\S+)\\s+" "  foo the bar  ")
t
"  foo "
"foo"
cl-user(4): 

match-re returns either the single value nil, meaning no match, or two or more values, the first being t, indicating a match. In this case, three values are returned. They are:

  1. t indicating the regular expression matched the string.
  2. The portion of the string that matched.
  3. The group match. The third and any subsequent return values are for groups: the portions matched by items surrounded by ( and ) in the regular expression. In this case, the group is (\\S+) (one or more non-whitespace characters), and it matched "foo".
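In a program, as opposed to the listener, the values listed above are usually captured with the standard multiple-value-bind. A minimal sketch, in which the variable names matched, whole and word are arbitrary:

```lisp
(require :regexp2)

;; Bind the three return values from the example above.
;; On a failed match only nil is returned, so whole and word are nil too.
(multiple-value-bind (matched whole word)
    (match-re "\\s+(\\S+)\\s+" "  foo the bar  ")
  (when matched
    (list whole word)))
;; => ("  foo " "foo")
```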

Another example is:

cl-user(11): (match-re "\\s+(?i:bar)(\\S+)\\s+" "  foo the BARbam  ")
t
" BARbam  "
"bam"
cl-user(12): 

The addition in this example is (?i:bar). This construct is the same in Allegro CL as it is in Perl: it makes the match of bar case insensitive. By default, matching is case sensitive in both Perl and Allegro CL regular expressions.
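The inline form limits case insensitivity to one part of the pattern. The sketch below contrasts it with a keyword argument that applies to the whole pattern. It assumes that match-re accepts a :case-fold keyword for this purpose, so check regexp.htm for the exact keyword list before relying on it.

```lisp
(require :regexp2)

;; Inline mode: only the bar portion ignores case; bam must match exactly.
(match-re "(?i:bar)bam" "BARbam")

;; Keyword mode (assumed to be :case-fold): the whole pattern ignores case.
(match-re "barbam" "BARBAM" :case-fold t)
```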

Sometimes you need '.' to match a newline in the middle of a string. This is done with the :single-line keyword, or with the s matching mode:

cl-user(18): (match-re "foo.*bar" "foo
bar")
nil
cl-user(19): (match-re "foo(?s:.*)bar" "foo
bar")
t
"foo
bar"
cl-user(20): (match-re "foo.*bar" "foo
bar"
                       :single-line t)
t
"foo
bar"
cl-user(21): 

The first form failed because '.' (i.e. a period) does not by default match newlines. The second and third forms are equivalent, and the choice of which to use is a stylistic one.

Replacing portions of a string is also a common use of the regexp2 API:

cl-user(22): (replace-re "Foo the bar, baz the baaaar."
			 "ba+r"
			 "BAR")
"Foo the BAR, baz the BAR."
cl-user(23): 

This replaces b, followed by one or more a's, followed by an r, with the string "BAR".

Strings can also be split according to a regular expression:

cl-user(27): (split-re "[;,]\\s+"
		       "foo, bar, baz,  bam; wham, foop; fop")
("foo" "bar" "baz" "bam" "wham" "foop" "fop")
cl-user(28): 

The macros re-lambda, re-let and re-case have no Perl analog:

cl-user(4): (setq f
	      (re-lambda "([^ ]+) ([^ ]+) ([^ ]+)"
		  ((foo 1) (bar 2) (baz 3))
		(list foo bar baz)))
#<Interpreted Function (unnamed) @ #x71ed7892>
cl-user(5): (funcall f "foo the bar")
("foo" "the" "bar")
cl-user(6): (re-let "([^ ]+) ([^ ]+) ([^ ]+)"
		"foo the bar"
		((foo 1) (bar 2) (baz 3))
	      (list foo bar baz))
("foo" "the" "bar")
cl-user(7): 


cl-user(9): (re-case "foo the barmy"
	      ("foo a (.*)" ((it 1)) (list it))
	      ("foo the (.*)" ((it 1)) (list it))
	      (t :no-match))
("barmy")
cl-user(10): (re-case "foo a barmy"
	      ("foo a (.*)" ((it 1)) (list it))
	      ("foo the (.*)" ((it 1)) (list it))
	      (t :no-match))
("barmy")
cl-user(11): (re-case "foo xx barmy"
	      ("foo a (.*)" ((it 1)) (list it))
	      ("foo the (.*)" ((it 1)) (list it))
	      (t :no-match))
:no-match
cl-user(12): 
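One possible use of re-case is dispatching on the shape of an input line. The following is only a sketch: parse-command and its small get/set command syntax are hypothetical, made up for illustration, and the clauses follow the same form as in the transcript above.

```lisp
(require :regexp2)

;; Hypothetical dispatcher: the first clause whose regexp matches wins,
;; and its group bindings are visible in that clause's body.
(defun parse-command (line)
  (re-case line
    ("^get (\\S+)$" ((key 1)) (list :get key))
    ("^set (\\S+) (.*)$" ((key 1) (value 2)) (list :set key value))
    (t :unknown)))

(parse-command "get color")            ; => (:get "color")
(parse-command "set color dark blue")  ; => (:set "color" "dark blue")
(parse-command "drop")                 ; => :unknown
```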

Efficiency

Consider the following use of a non-constant regular expression, from match-re's point of view:

(let ((re "..."))
  (match-re re ...))

Because re is a variable, the regular expression cannot really be compiled to something more efficient at compile time. It is better to write the above code like this:

(let ((re (load-time-value (compile-re "..."))))
  (match-re re ...))

This makes sure that the regular expression will be compiled once.
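Put into a complete function, the idiom might look like the sketch below. The function name leading-word and its pattern are hypothetical, chosen only to have something concrete to run.

```lisp
(eval-when (compile eval load) (require :regexp2))

;; Hypothetical helper: return the first whitespace-delimited word of
;; STRING, or nil if there is none. The pattern is compiled once by
;; load-time-value rather than on every call.
(defun leading-word (string)
  (let ((re (load-time-value (compile-re "^\\s*(\\S+)"))))
    (multiple-value-bind (matched whole word) (match-re re string)
      (declare (ignore whole))
      (and matched word))))

(leading-word "  foo the bar")  ; => "foo"
(leading-word "   ")            ; => nil
```

The module has to be loaded before the load-time-value form is evaluated, which is why the require is wrapped in eval-when as shown earlier.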

Again, the documentation is in regexp.htm. Both the old and the new Regexp implementations are supported in Allegro CL (it is the new one we have been describing), and both are documented there. The newer Regexp2 module is described in the section "The new regexp2 module" of that document.
