Google’s new RE engine

I stum­bled over Google’s new RE engine. Unfor­tu­nate­ly it is not han­dling back­ref­er­ences, so it is not a drop-in replace­ment for the reg­u­lar expres­sions code in FreeB­SD. It has a POSIX mode, but this only seems to be enough for the egrep syn­tax. For peo­ple which need back­ref­er­ences, they refer to the Google Chrome’s RE engine irreg­exp which in turn ref­er­ences a paper from 2007 which is titled Reg­u­lar Expres­sion Match­ing Can Be Sim­ple And Fast.

The tech­niques in the paper can not be applied to the irreg­exp engine, but maybe could help to speed up awk, egrep and sim­i­lar programs.

I think it would be inter­est­ing to com­pare those recent devel­op­ments to what we have in FreeB­SD, and if they are faster, to see if it is pos­si­ble to improve the FreeB­SD imple­men­ta­tion based upon them (either by writ­ing new code, or by import­ing exist­ing code, depend­ing on the cor­re­spond­ing license and the lan­guage the code is writ­ten in).

Maybe a can­di­date for the GSoC?