I stumbled upon Google’s new RE engine. Unfortunately it does not handle backreferences, so it is not a drop-in replacement for the regular expression code in FreeBSD. It has a POSIX mode, but this only seems to be enough for the egrep syntax. For people who need backreferences, its authors point to Google Chrome’s RE engine, irregexp, which in turn references a 2007 paper titled “Regular Expression Matching Can Be Simple And Fast”.
The techniques in the paper cannot be applied to the irregexp engine, but they might help to speed up awk, egrep and similar programs.
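To give a feeling for the kind of technique the paper is about, here is a toy sketch in Python. It is not taken from the paper or from any of the engines mentioned above, and the names (`parse`, `closure`, `nfa_match`) are made up for illustration. It assumes a tiny pattern language with only literal characters plus the `?` and `*` suffixes, and it tracks the set of all possible automaton states in lockstep instead of backtracking. Nothing in this scheme remembers what an earlier part of the pattern matched, which is roughly why backreferences do not fit into it.

```python
def parse(pattern):
    """Split a pattern into (char, quantifier) tokens.

    The quantifier is '', '?' or '*'. Only literal characters are
    supported; this is an assumption made to keep the sketch short.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        quant = ''
        if i + 1 < len(pattern) and pattern[i + 1] in '?*':
            quant = pattern[i + 1]
            i += 1
        tokens.append((ch, quant))
        i += 1
    return tokens


def closure(tokens, states):
    """Add every state reachable by skipping optional tokens."""
    result = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        # A '?' or '*' token may match nothing, so the next state
        # is reachable without consuming any input.
        if s < len(tokens) and tokens[s][1] in ('?', '*'):
            stack.append(s + 1)
    return result


def nfa_match(pattern, text):
    """Return True if the whole text matches the pattern.

    State s means "token s is the next one to match". The set of
    live states never grows beyond len(tokens) + 1, so the work per
    input character is bounded by the pattern length.
    """
    tokens = parse(pattern)
    states = closure(tokens, {0})
    for ch in text:
        nxt = set()
        for s in states:
            if s < len(tokens) and tokens[s][0] == ch:
                if tokens[s][1] == '*':
                    nxt.add(s)      # a starred token may repeat
                nxt.add(s + 1)      # or we move on to the next token
        states = closure(tokens, nxt)
        if not states:
            return False            # no live state left, give up early
    return len(tokens) in states
```

A pattern like `"a?" * 30 + "a" * 30` matched against `"a" * 30` is the sort of input that can take a backtracking matcher a very long time; with the state-set approach, `nfa_match("a?" * 30 + "a" * 30, "a" * 30)` only has to walk the 30 input characters once.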
I think it would be interesting to compare those recent developments to what we have in FreeBSD, and if they are faster, to see if it is possible to improve the FreeBSD implementation based upon them (either by writing new code, or by importing existing code, depending on the corresponding license and the language the code is written in).
Maybe a candidate for the GSoC?
I still have the goal of importing PCRE in the FreeBSD libc…
Sounds like a nice project for GSoC.
I think for libc TRE is the way to go. Still, PCRE is a good candidate for the base system, so that tools like grep can have Perl-syntax support as well. This work could also serve to review the TRE algorithms and maybe improve them. TRE is also BSD-licensed and very well documented in the author’s MSc thesis.
That would be an interesting GSoC project. You should add this to the FreeBSD ideas page.
Or I do not add it, and instead see whether anyone is reading the FreeBSD blogs. A kind of experiment to see if people interested in the GSoC look further into FreeBSD than just following the link from Google to the FreeBSD ideas page.