A new Regular Expression API (added 6/10/04)

A new fast, Perl-compatible regular expression matcher is now available as a patch to Allegro CL 6.2 (it is included in the Allegro CL 7.0 distribution). The following bullets describe some of its features.

  • The new matcher uses Perl syntax and is nearly feature-compatible with Perl; only a few obscure features have not been implemented.
  • Speed: the new matcher was designed to be fast. On the CL-PPCRE test suite (with 1600+ tests) it is on average 30% faster than Perl.
  • Regular expressions can use the Unicode character set (UCS-2).
  • Named capture for submatches. For example, (match-re "(?<foo>ab)\\k<foo>\\k<foo>" "aabababa") returns the values t, "ababab", and "ab".

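The named-capture example can be cross-checked with Python's re module, which supports the same feature but spells it differently: a named group is (?P<name>...) and a back-reference to it is (?P=name), where Perl and the new matcher write (?<name>...) and \k<name>. This is only a stand-in for checking the expected values, not the Allegro API.

```python
import re

# Python spelling of the pattern (?<foo>ab)\k<foo>\k<foo>:
# capture "ab" under the name foo, then require the same text twice more.
pattern = re.compile(r"(?P<foo>ab)(?P=foo)(?P=foo)")

m = pattern.search("aabababa")  # unanchored search, like the match-re examples
whole = m.group(0)              # text of the whole match
foo = m.group("foo")            # text captured by the named group
print(whole, foo)               # ababab ab
```

The match starts at index 1, because at index 0 the pattern's "ab" fails against "aa".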
The new regular expression module, :regexp2, is modeled on the original regular expression API and uses similar symbol names (for example, old: compile-regexp; new: compile-re). The symbols naming functions in the new API are in the excl package, as are those in the original API. Both modules can be loaded into the same running Lisp and used independently.

Even though the functions take the same keyword arguments, the regular expression syntax is very different. Here are some examples:

Old and new (these patterns behave the same in both):

  (match-regexp "foo" "frob foo bar") => t, "foo"
  (match-regexp "foo[0-9]+" "foo1234xxx") => t, "foo1234"

  (match-re "foo" "frob foo bar") => t, "foo"
  (match-re "foo[0-9]+" "foo1234xxx") => t, "foo1234"
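In these examples a successful match returns t, then the text of the whole match, then the text of each parenthesized submatch; a failed match returns nil. The sketch below mimics that convention in Python so the expected values can be checked. match_re is a hypothetical helper name, and Python's re.search stands in for the Allegro matcher; the two are only assumed to agree for simple patterns like these.

```python
import re
from typing import Optional, Tuple


def match_re(pattern: str, text: str) -> Optional[Tuple]:
    """Hypothetical analogue of the values shown above.

    Returns (True, whole_match, group1, group2, ...) on success,
    or None (standing in for Lisp's nil) when nothing matches.
    """
    m = re.search(pattern, text)  # unanchored: the match may start anywhere
    if m is None:
        return None
    return (True, m.group(0), *m.groups())


print(match_re(r"foo", "frob foo bar"))       # (True, 'foo')
print(match_re(r"foo[0-9]+", "foo1234xxx"))   # (True, 'foo1234')
print(match_re(r"(a|b)c", "ac"))              # (True, 'ac', 'a')
```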

New:

  (match-re "(a|b)c" "ac") => t, "ac", "a"

Old:

  (match-regexp "\\(a\\|b\\)c" "ac") => t, "ac", "a"
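One consequence of the syntax change: an old-style pattern is not necessarily rejected under Perl syntax; it can silently mean something else. The sketch checks this with Python's re, which follows Perl here, on the assumption that the new matcher likewise treats \(, \| and \) as literal characters. Note that the doubled backslashes in the Lisp string "\\(a\\|b\\)c" are string escapes; the regex engine sees single backslashes, which is what the Python raw string below contains.

```python
import re

# The old example's pattern as the regex engine sees it.
old_style = r"\(a\|b\)c"
# The same intent written in Perl syntax.
new_style = r"(a|b)c"

# Under Perl-style rules a backslash before ( | ) makes each a literal
# character, so the old pattern neither groups nor alternates.
old_on_ac = re.search(old_style, "ac")
old_on_literal = re.search(old_style, "(a|b)c")
new_on_ac = re.search(new_style, "ac")

print(old_on_ac)                               # None
print(old_on_literal.group(0))                 # (a|b)c
print(new_on_ac.group(0), new_on_ac.group(1))  # ac a
```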

New:

  (match-re "\\bfoo" "the foo") => t, "foo"
  (match-re "\\bfoo" "thefoo") => nil

Old:

  [Not supported]
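\b is a zero-width assertion: it matches the position between a word character and a non-word character (or the start or end of the string) and consumes no text. The sketch illustrates this with Python's re, whose \b follows the same Perl convention; it is assumed, not verified here, that the new matcher behaves identically.

```python
import re

# \b matches a position, not a character, so the space before "foo"
# is not part of the match.
m = re.search(r"\bfoo", "the foo")
print(m.group(0), m.start())  # foo 4

# In "thefoo" the f follows the word character e, so there is no boundary.
miss = re.search(r"\bfoo", "thefoo")
print(miss)                   # None
```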

New:

  (match-re "(a{1,2})ab" "aab") => t, "aab", "a"
  (match-re "(a{1,2})ab" "aaab") => t, "aaab", "aa"

Old:

  [Not supported]
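{1,2} is a bounded, greedy repetition: the group first takes two a's and backs off to one only if the rest of the pattern then fails. The sketch traces this with Python's re, assuming Perl-style backtracking gives the new matcher the same results for this pattern. The third input, "aaaab", is not from the examples above; it shows the search also moving its starting position forward when no match begins at index 0.

```python
import re

pattern = re.compile(r"(a{1,2})ab")

results = {}
for text in ["aab", "aaab", "aaaab"]:
    m = pattern.search(text)
    # (whole match, captured a's, index where the match starts)
    results[text] = (m.group(0), m.group(1), m.start())
    print(text, *results[text])
# aab aab a 0      greedy "aa" leaves no "ab", so the group backs off to "a"
# aaab aaab aa 0   greedy "aa" succeeds immediately
# aaaab aaab aa 1  no match starts at 0; the search retries from index 1
```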

See the section "The new regexp2 module" in regexp.htm for more details on the new regexp API.

Copyright © 2023 Franz Inc., All Rights Reserved