I stumbled over Google’s new RE engine. Unfortunately it does not handle backreferences, so it is not a drop-in replacement for the regular expression code in FreeBSD. It has a POSIX mode, but this seems to cover only the egrep syntax. People who need backreferences are referred to irregexp, Google Chrome’s RE engine, which in turn references a 2007 paper titled Regular Expression Matching Can Be Simple And Fast.
The techniques in the paper cannot be applied to the irregexp engine, but they might help to speed up awk, egrep and similar programs.
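To make this a bit more concrete: as far as I understand the paper, the core idea is to read the input once while tracking the set of all NFA states the matcher could be in, instead of trying one alternative at a time and backtracking. The following is only a minimal sketch of that idea in Python, not code from the paper, from Google's engine or from FreeBSD. The names `parse`, `closure` and `nfa_match` are made up for illustration, and the supported syntax is deliberately tiny: single characters, `.`, and the postfix operators `?`, `*` and `+` applied to one character (no groups, no alternation, no escaping).

```python
def parse(pattern):
    """Split a pattern into (char, op) tokens, where op is '', '?', '*' or '+'."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        op = ""
        if i + 1 < len(pattern) and pattern[i + 1] in "?*+":
            op = pattern[i + 1]
            i += 1
        tokens.append((ch, op))
        i += 1
    return tokens


def closure(tokens, states):
    """Add every state reachable without consuming input.

    State s means "token s is the next one to match"; a token marked
    '?' or '*' may be skipped, which leads to state s + 1.
    """
    result = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        if s < len(tokens) and tokens[s][1] in ("?", "*"):
            stack.append(s + 1)
    return result


def nfa_match(pattern, text):
    """Return True if the whole text matches the pattern.

    All possible states are advanced together for each input character,
    so the work is bounded by len(text) * len(tokens) and there is no
    backtracking.
    """
    tokens = parse(pattern)
    accept = len(tokens)
    current = closure(tokens, {0})
    for c in text:
        following = set()
        for s in current:
            if s == accept:
                continue  # pattern already used up, but input remains
            ch, op = tokens[s]
            if ch == "." or ch == c:
                following.add(s + 1)      # move on to the next token
                if op in ("*", "+"):
                    following.add(s)      # or match the same token again
        current = closure(tokens, following)
    return accept in current


# The kind of pattern that is expensive for a backtracking matcher.
n = 30
print(nfa_match("a?" * n + "a" * n, "a" * n))  # True
print(nfa_match("ab+c", "ac"))                 # False
```

A backreference cannot be expressed in such a state set, because the matcher would also have to remember what an earlier group matched. That seems to be the reason why engines which support backreferences cannot use this approach directly.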
I think it would be interesting to compare those recent developments to what we have in FreeBSD, and if they are faster, to see if it is possible to improve the FreeBSD implementation based upon them (either by writing new code, or by importing existing code, depending on the corresponding license and the language the code is written in).
Maybe a candidate for the GSoC?