Algorithm to detect repo-copies in CVS

FreeBSD is in the process of moving the version control system of the Ports Collection from CVS to SVN. The decision was made to keep the complete history, so the complete CVS repository has to be converted to SVN.

As CVS has no way to record a copy or move of files inside the repository, we copied the CVS files inside the repository whenever we wanted to copy or move a file (the so-called “repocopy”). While this allows you to see the full history of a file, the drawback is that you do not really know when a file was copied/moved unless you are strict about recording this info after doing a copy. Guess what, we were not.

Now, with the move to SVN, which has a built-in way to record copies/moves, it would be nice if we could record this info. In an internal discussion someone said it is not possible to detect a repocopy reliably.

Well, I thought otherwise, and an hour later my mail describing how to detect one went out. Most of the time was needed to write down how to do it, not to come up with a solution. I do not know if someone picked up this algorithm and implemented something for the cvs2svn converter, but I decided to publish the algorithm here in case someone needs similar functionality somewhere else. Note that the following is tailored to the structure of the Ports Collection. This allows speeding up some things (there is no need to do all steps on all files). If you want to use this in a generic repository where the structure is not as regular as in our Ports Collection, you have to run this algorithm on all files.

It also detects commits where multiple files were committed at once in one commit (sweeping commits).


  • check only category/name/Makefile
  • generate a hash of each commit log + committer
  • if you are memory-limited, use ha/sh/ed/dirs/cvs-rev (the hash split into directory levels, one file per cvs-rev) as storage, and store the pathname (pathname = “category-name”) in the list cvs-rev
  • also store the hash in pathname/cvs-rev

If you have only one item in ha/sh/ed/dirs/cvs-rev in the end, there was no repocopy and no sweeping commit, and you can delete this ha/sh/ed/dirs/cvs-rev.

If you have more than … let’s say … 10 (subject to tuning) pathnames in ha/sh/ed/dirs/cvs-rev, you found a sweeping commit and you can delete the ha/sh/ed/dirs/cvs-rev.
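The bookkeeping and pruning steps above can be sketched in Python. This is a minimal in-memory sketch, assuming the CVS history of each category/name/Makefile has already been extracted into (pathname, cvs-rev, commit log, committer) tuples. The function names, the use of SHA-1, and the dictionaries standing in for the ha/sh/ed/dirs/cvs-rev and pathname/cvs-rev files are illustrative assumptions, not part of cvs2svn or any other existing tool.

```python
import hashlib
from collections import defaultdict


def commit_hash(log: str, committer: str) -> str:
    """Hash of commit log + committer (SHA-1 is an arbitrary choice here)."""
    return hashlib.sha1(f"{committer}\0{log}".encode("utf-8")).hexdigest()


def build_index(commits):
    """Index (pathname, cvs_rev, log, committer) tuples.

    Returns (buckets, per_path):
      buckets  maps (hash, cvs_rev) -> [pathname, ...], the in-memory
               counterpart of the ha/sh/ed/dirs/cvs-rev lists;
      per_path maps pathname -> {cvs_rev: hash}, the counterpart of
               pathname/cvs-rev.
    """
    buckets = defaultdict(list)
    per_path = defaultdict(dict)
    for pathname, rev, log, committer in commits:
        h = commit_hash(log, committer)
        buckets[(h, rev)].append(pathname)
        per_path[pathname][rev] = h
    return buckets, per_path


def prune(buckets, sweep_threshold=10):
    """Keep only the buckets which may belong to a repocopy."""
    kept = {}
    for key, paths in buckets.items():
        if len(paths) == 1:
            continue  # no repocopy and no sweeping commit
        if len(paths) > sweep_threshold:
            continue  # sweeping commit
        kept[key] = paths
    return kept
```

Keeping the threshold as a parameter mirrors the “subject to tuning” remark; on disk, the two dictionaries would become the directory trees described in the list.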

The meat

The remaining ha/sh/ed/dirs/cvs-rev entries are probably repocopies. Take one ha/sh/ed/dirs/cvs-rev and, for each pathname in it (there may be more than 2 pathnames), have a look at pathname/. Take the first cvs-rev of each and check if they have the same hash. Continue with the next revision number for each until you find a cvs-rev which does not contain the same hash. If the number of cvs-revs since the beginning is >= … let’s say … 3 (subject to tuning), you have a candidate for a repocopy. If it is >= … 10 (subject to tuning), you have a very good indicator for a repocopy. You have to proceed until you have only one pathname left.
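A possible Python sketch of this comparison, assuming the per-pathname {cvs_rev: hash} maps from the indexing step. As a simplification, it compares the pathnames of a bucket pairwise instead of shrinking the set step by step, which also surfaces chains such as A->B->C. The thresholds 3 and 10 are the tunable values from the text; the function names are hypothetical.

```python
from itertools import combinations


def rev_key(rev: str):
    """Sort CVS revision numbers numerically, so 1.10 comes after 1.9."""
    return tuple(int(part) for part in rev.split("."))


def shared_prefix(revs_a, revs_b):
    """Count leading revisions (in revision order) which carry the same hash."""
    seq_a = [h for _, h in sorted(revs_a.items(), key=lambda kv: rev_key(kv[0]))]
    seq_b = [h for _, h in sorted(revs_b.items(), key=lambda kv: rev_key(kv[0]))]
    n = 0
    for ha, hb in zip(seq_a, seq_b):
        if ha != hb:
            break
        n += 1
    return n


def classify(pathnames, per_path, candidate=3, strong=10):
    """Yield (a, b, shared_revs, verdict) for each likely repocopy pair."""
    for a, b in combinations(sorted(set(pathnames)), 2):
        n = shared_prefix(per_path[a], per_path[b])
        if n >= strong:
            yield a, b, n, "strong"
        elif n >= candidate:
            yield a, b, n, "candidate"
```

Pairs below the candidate threshold are simply not reported, which corresponds to dropping them from the candidate list.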

You may detect multiple repocopies like A->B->C->D or A->B + A->D + A->C here.

Write out the repocopy candidate to a list and delete the ha/sh/ed/dirs/cvs-rev for each cvs-rev in a detected sequence.

This finds repocopy candidates for category/name/Makefile. To detect the correct repocopy date (there may be cases where another file was changed after the Makefile but before the repocopy), you now have to look at all the files of a given repocopy pair and check if there is a matching commit after the Makefile commit date. If you want to be 100% sure, compare the complete commit history of all files of a given repocopy pair.
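One way to sketch this refinement in Python, under the assumption that the history of every file in both port directories is available as (cvs_rev, date, hash) tuples. Since a repocopy duplicates the ,v files, the revisions before the copy are identical in both places; the latest date of such a shared revision, over all common files, is the earliest point at which the copy can have happened. The data layout and names are illustrative assumptions.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# filename -> [(cvs_rev, commit date, hash of log + committer), ...]
History = Dict[str, List[Tuple[str, datetime, str]]]


def rev_key(rev: str) -> Tuple[int, ...]:
    """Sort CVS revision numbers numerically."""
    return tuple(int(part) for part in rev.split("."))


def last_shared_commit(files_a: History, files_b: History) -> Optional[datetime]:
    """Latest date of a revision shared by both ports, over all common files."""
    latest = None
    for name in files_a.keys() & files_b.keys():
        hist_a = sorted(files_a[name], key=lambda c: rev_key(c[0]))
        hist_b = sorted(files_b[name], key=lambda c: rev_key(c[0]))
        for (_, date, hash_a), (_, _, hash_b) in zip(hist_a, hist_b):
            if hash_a != hash_b:
                break  # histories diverge after the copy
            if latest is None or date > latest:
                latest = date
    return latest
```

If a file other than the Makefile has a later shared revision, the result moves past the Makefile commit date, which is exactly the case the paragraph warns about.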


Seems I forgot to announce that linux_base-c6 is in the Ports Collection now. Well, it is not a replacement for the current default linux base: the linuxulator infrastructure ports are missing, and we need to check if the kernel supports enough of Linux 2.6.18 so that nothing breaks.


To my knowledge, nobody is working on any of this. Anyone is welcome to have a look and provide patches.

New CentOS linux_base for testing soonish

It seems my HOWTO on creating a new linux_base port was not too bad. There is now a PR for a CentOS 6 based linux_base port. I had a quick look at it, and it seems nearly ready for inclusion in the Ports Collection (the SRPMs need to be added, but that can be done within a few minutes).

When FreeBSD 8.3 is released and the Ports Collection is open for sweeping commits again, I will ask portmgr to do a repo-copy for the new port and commit it. This is just the linux_base port, not the complete infrastructure which is needed to completely replace the current default linuxulator userland. This is just a start. Switching to a more recent linux_base port is a long process, and in this case it depends upon enough support in the supported FreeBSD releases.

Attention: Anyone installing the port from the PR should be aware that using it is highly experimental. You need to make the linuxulator impersonate a Linux 2.6.18 kernel (described in the pkg-message of the port), and the code in FreeBSD is far from supporting this. Anyone who wants to try it is welcome, but you have to run FreeBSD-current as of at least last weekend, and watch out for kernel messages about unsupported syscalls. Please send reports to emulation@FreeBSD.org, not here on the webpage.

HOWTO add linux-infrastructure ports for a new linux_base port

In my last blog post I described how to create a new linux_base port. This blog post is about the other Linux ports which make up the Linux infrastructure in the FreeBSD Ports Collection for a given Linux release.

What are linux-infrastructure ports?

A linux_base port contains as much as possible and at the same time as little as possible to make up a useful Linux-compatibility experience in FreeBSD. I know, this is not a descriptive explanation, but I am not being vague on purpose: there are no fixed rules about what has to be inside and what not. It “matured” into the current shape. A practical example is that there is no GUI stuff in linux_base. While you need GUI parts like GTK or QT for software like Skype and acroread, you do not need them for headless game servers. While you may need various libraries for game servers, you may not need those for Skype or acroread. As such, some standard parts are in separate ports which are named linux-LINUX_DIST_SUFFIX-NAME. For GTK and the Fedora 10 release this results in linux-f10-gtk2. Such generic ports, which depend upon a specific Linux release, make up the Linux infrastructure in the FreeBSD Ports Collection. Those ports are referenced in port Makefiles via the USE_LINUX_APPS variable, e.g. USE_LINUX_APPS=gtk2.
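The naming convention can be illustrated with a tiny Python model. This is only a sketch of the convention described above, not the logic in the ports framework; it assumes the suffix is given without separators (e.g. “f10”), whereas the real Mk/ code may store it differently.

```python
def infra_port_name(dist_suffix: str, app: str) -> str:
    """Build linux-LINUX_DIST_SUFFIX-NAME, e.g. ("f10", "gtk2") -> "linux-f10-gtk2"."""
    return f"linux-{dist_suffix}-{app}"


def ports_for_use_linux_apps(value: str, dist_suffix: str) -> list:
    """Expand a whitespace-separated USE_LINUX_APPS value into port names."""
    return [infra_port_name(dist_suffix, app) for app in value.split()]
```

For example, USE_LINUX_APPS=gtk2 with the Fedora 10 suffix maps to linux-f10-gtk2; with a new release only the suffix changes.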

If you created a new linux_base port, you need most standard infrastructure ports in a version for the Linux release used in the linux_base port to get the Linux application ports in the FreeBSD Ports Collection working (if you are unlucky, some ports do not play well with the Linux release you have chosen, but this is out of the scope of this HOWTO).

Updating Mk/

First we need to set the LINUX_DIST_SUFFIX variable to a value suitable for the new Linux release. This is done in the conditional which checks the OVERRIDE_LINUX_NONBASE_PORTS variable for valid values. Add an appropriate conditional, and do not forget to add the new valid value to the IGNORE line in the last else branch of the conditional.
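The shape of that conditional can be modelled in Python to make the two places that need touching visible. The table below is a hypothetical stand-in for the make conditional: “f10” represents the existing release and “c6” a newly added one. The values are assumptions, not copied from the ports tree.

```python
# Hypothetical model of the OVERRIDE_LINUX_NONBASE_PORTS conditional:
# valid value -> LINUX_DIST_SUFFIX. Adding a release means adding a row.
NONBASE_SUFFIXES = {
    "f10": "f10",
    "c6": "c6",
}


def linux_dist_suffix(override: str) -> str:
    """Return LINUX_DIST_SUFFIX for a value, or refuse like the IGNORE branch."""
    try:
        return NONBASE_SUFFIXES[override]
    except KeyError:
        valid = ", ".join(sorted(NONBASE_SUFFIXES))
        raise ValueError(
            f"valid values for OVERRIDE_LINUX_NONBASE_PORTS are: {valid}"
        ) from None
```

Here the error message is derived from the table, so it cannot get out of sync; in the make file the IGNORE line is maintained by hand, which is why it is easy to forget.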

The next step is to check the _LINUX_APPS_ALL and _LINUX_26_APPS variables. If some infrastructure ports are not available for the new Linux release, the conditional which checks the availability of a given infrastructure port for a given Linux release needs to be modified. If at a later step you notice that some additional infrastructure ports are necessary for the new Linux release, _LINUX_APPS_ALL and the check logic need to be modified too (e.g. add a new variable for your Linux release, add the content of the variable to _LINUX_APPS_ALL, and change the check to do the right thing).
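The availability check can be sketched in Python as set operations. The app lists are hypothetical: “example_app” stands for an infrastructure port which exists for one release only, and the real _LINUX_APPS_ALL and _LINUX_26_APPS contents differ. The idea is that _LINUX_APPS_ALL is the union of a common list and the per-release lists, and a request is valid only if the app exists for the selected release.

```python
# Hypothetical app lists; the real variables in Mk/ contain many more entries.
LINUX_APPS_COMMON = {"expat", "fontconfig", "gtk2", "png"}
LINUX_APPS_BY_RELEASE = {
    "f10": {"example_app"},  # illustrative: only packaged for this release
    "c6": set(),
}
LINUX_APPS_ALL = LINUX_APPS_COMMON.union(*LINUX_APPS_BY_RELEASE.values())


def unavailable_apps(requested, suffix):
    """Return requested USE_LINUX_APPS entries with no port for this release."""
    unknown = set(requested) - LINUX_APPS_ALL
    if unknown:
        raise ValueError(f"unknown USE_LINUX_APPS entries: {sorted(unknown)}")
    available = LINUX_APPS_COMMON | LINUX_APPS_BY_RELEASE.get(suffix, set())
    return sorted(set(requested) - available)
```

Adding a release-specific port then means adding it to that release's set, which automatically extends LINUX_APPS_ALL, mirroring the three-step change described above.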

After that, two tedious parts need to be done.

For each infrastructure port there is a set of variables. The name_PORT variable contains the location of the port in the Ports Collection. Typically you do not have to change it, because we use a naming convention here which includes the LINUX_DIST_SUFFIX (if you really want to change it, do not do it; fix the naming of the infrastructure port instead). The name_DETECT variable is an internal variable; do not change it (if you create a new infrastructure port, copy it from somewhere else and make sure the name in the value of the variable matches the port name in the name of the variable). Then there are several name_suffix_FILE variables. Leave the existing ones alone, and add a new one with the correct suffix for your new Linux release. The value of the variable needs to be an important file which is installed by the infrastructure port in question. FYI: the content of the name_suffix_FILE variables is used to set the name_DETECT variables, and depending on the Linux release, the name_DETECT variables are used to check if the port is already installed. Ideally the name_suffix_FILE variable points to a library in the port. The name_DEPENDS variable lists the dependencies of this infrastructure port. If the dependencies changed in your Linux release, you need to add a conditional to change the dependency if LINUX_DIST_SUFFIX is set to your Linux release.
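A Python sketch of how a name_suffix_FILE entry can serve as the name_DETECT check. The variable naming (“gtk2_f10_FILE”) follows the pattern described above but is an assumption, the library path in the usage is illustrative, and the sketch assumes the FILE values are relative to LINUXBASE (default /compat/linux); the real Mk/ code may embed LINUXBASE in the value instead.

```python
import os
from typing import Dict, Optional


def detect_file(app: str, suffix: str, mk_vars: Dict[str, str]) -> Optional[str]:
    """Value for <app>_DETECT: the <app>_<suffix>_FILE entry, if the release has one."""
    return mk_vars.get(f"{app}_{suffix}_FILE")


def is_installed(app: str, suffix: str, mk_vars: Dict[str, str],
                 linuxbase: str = "/compat/linux") -> bool:
    """Treat the infrastructure port as installed if its marker file exists."""
    rel = detect_file(app, suffix, mk_vars)
    if rel is None:
        return False  # no port of this app for the selected Linux release
    return os.path.exists(os.path.join(linuxbase, rel.lstrip("/")))
```

A missing name_suffix_FILE entry naturally means “not available for this release”, which is also how the next section handles infrastructure ports a release does not ship.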

Normally this is all that needs to be done in PORTSDIR/Mk/; the rest of the file is code to check dependencies and some correctness checks.

The second tedious part is to actually create all those infrastructure ports. Normally you can copy an existing infrastructure port, rename it, adjust the PORTNAME, PORTVERSION, PORTREVISION, MASTER_SITES, PKGNAMEPREFIX, DISTFILES, CONFLICTS (also in all other Linux-release versions of this infrastructure port), LINUX_DIST_VER, RPMVERSION (if set/necessary) and SRC_DISTFILE variables, generate the distfile checksums (make makesum), and fix the plist. I suggest scripting parts of this work (as of this writing Freshports counts 68 ports whose port name starts with linux-f10-).
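As a starting point for such a script, here is a minimal Python sketch which rewrites plain `VAR=` assignments in a copied Makefile. It is a hypothetical helper, not part of the ports tools: it only touches simple `=` assignments at the start of a line (`?=` and `+=` are left alone), drops backslash-continued lines of a replaced value, and raises an error if a requested variable is not found, so that typos do not pass silently. Running `make makesum` and fixing the plist still has to be done afterwards.

```python
import re

# A plain "VAR=" assignment at the beginning of a line, plus its whitespace.
ASSIGN = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=([ \t]*)")


def set_make_vars(makefile: str, new_values: dict) -> str:
    """Return the Makefile text with the given variables set to new values."""
    lines = makefile.splitlines(keepends=True)
    out, seen, i = [], set(), 0
    while i < len(lines):
        line = lines[i]
        m = ASSIGN.match(line)
        if m and m.group(1) in new_values:
            name, ws = m.group(1), m.group(2) or "\t"
            out.append(f"{name}={ws}{new_values[name]}\n")
            seen.add(name)
            # Skip the backslash-continued lines of the old value.
            while lines[i].rstrip("\n").endswith("\\") and i + 1 < len(lines):
                i += 1
        else:
            out.append(line)
        i += 1
    missing = set(new_values) - seen
    if missing:
        raise KeyError(f"variables not assigned with '=': {sorted(missing)}")
    return "".join(out)
```

Looping this over all copied linux-f10-* directories with a per-port table of new values would cover the mechanical part of the work.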

Adding new infrastructure ports, or removing infrastructure ports for a given Linux release

If your Linux release does not come with a package for an existing infrastructure port, just do not create a corresponding name_suffix_FILE line. You still need to do the right thing regarding the dependencies of ports which depend upon this non-existing infrastructure port (if your Linux release comes with packages for them).

To add a new infrastructure port, copy an existing block, rename the variables, set them correctly, add a new variable for your Linux release in the first _LINUX_APPS_ALL section, add the content of this variable to _LINUX_APPS_ALL, and change the check logic as described above.

Final words

If you have something which installs and deinstalls correctly, feel free to provide it on freebsd-emulation@FreeBSD.org for review/testing. If you have questions during the porting, also feel free to send a mail there.

HOWTO create a new linux_base port

FreeBSD is in need of a new linux_base port. It has been on my TODO list for a long time, but I do not find the time to create one. I still do not have the time to work on a new one, but as you can read here, I managed to find the time to write a HOWTO which describes what needs to be done to create a new linux_base port.

I will not describe how to create a new linux_base port from scratch; I will just describe how you can copy the last one and update it to something newer, based upon the existing infrastructure for RPM packages.

Specific questions which come up while porting a new Linux release should be asked on freebsd-emulation@FreeBSD.org; more people who can answer questions read that list than this blog. I will add useful information to this HOWTO if necessary.

In the easy case, most of the work is searching for the right RPMs and their dependencies, and creating the plist.

Why do we need a new linux_base port?

The current linux_base port is based upon Fedora 10, which has been end-of-life since December 2009. Even Fedora 13 is already end-of-life. Fedora 16 is supposed to be released this year. From a support point of view, Fedora 15 or maybe even Fedora 16 would be a good target for the next linux_base port. Another alternative would be to use an extended-lifetime release of another RPM-based distribution, for example CentOS 6 (which seems to be based upon Fedora 12 with backports from Fedora 13 and 14). Using a Linux release which is said to be supported for at least 10 years sounds nice from a FreeBSD point of view (only minor changes to the linux ports in such a case, instead of creating a completely new linux_base every N+2 releases like with Fedora), but it also means additional work if you want to create the first linux_base port for it.

The mysteries you have to conquer if you want to create a new linux_base port

What we do not know is whether Fedora 15/16, CentOS 6, or any other Linux release will work on a supported FreeBSD release. There are two ways to find this out.

The first one is to take an existing Linux system, chroot into it (either via NFS or after making a copy into a directory of a FreeBSD system), and run a lot of programs (acroread, skype, shells, scripts, …). The LTP testsuite is not very useful here, as it mostly tests kernel features, and we do not know which kernel features are mandatory for the userland of a given Linux release.

The second way of testing if a given Linux release works on FreeBSD is to actually create a new linux_base port for it and test it without chrooting.

The first way is faster if you are only interested in testing whether something works. The second way provides an easy-to-set-up testbed for FreeBSD kernel developers to fix the Linuxulator so that it works with the new linux_base port. Both ways have their merits; it is up to the person doing the work to decide which way to go.

The meat: HOWTO create a new linux_base port

First off, you need a system (or a jail) without any linux_base port installed. After that you can create a new linux_base port (= lbN) by just making a copy of the latest one (= lbO). In lbN you need to add lbO to CONFLICTS, and in all other existing linux_base ports you need to add lbN as a conflict.
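Since the conflict has to be added to every existing linux_base port, a small Python helper can do the editing. This is a hypothetical sketch: the package-name patterns in the usage are illustrative, not the real ones, and a CONFLICTS value spread over continuation lines is not handled. It appends the pattern to an existing single-line CONFLICTS assignment, or adds one before the first `.include` line, and leaves the file unchanged if the pattern is already listed.

```python
import re

CONFLICTS_LINE = re.compile(r"^CONFLICTS=([ \t]*)(.*?)\n?$")


def add_conflict(makefile: str, pattern: str) -> str:
    """Add a package pattern to CONFLICTS in a ports Makefile (idempotent)."""
    lines = makefile.splitlines(keepends=True)
    for i, line in enumerate(lines):
        m = CONFLICTS_LINE.match(line)
        if m:
            entries = m.group(2).split()
            if pattern in entries:
                return makefile  # already listed
            ws = m.group(1) or "\t"
            lines[i] = f"CONFLICTS={ws}{' '.join(entries + [pattern])}\n"
            return "".join(lines)
    # No CONFLICTS yet: add it before the first .include line (or at the end).
    new_line = f"CONFLICTS=\t{pattern}\n"
    for i, line in enumerate(lines):
        if line.startswith(".include"):
            lines.insert(i, new_line)
            return "".join(lines)
    lines.append(new_line)
    return "".join(lines)
```

Applied to lbN with lbO's pattern, and to every older linux_base Makefile with lbN's pattern, this covers both directions of the conflict.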

Change the PORTNAME and PORTVERSION, reset the PORTREVISION in lbN, and set LINUX_DIST_VER to the new Linux release version in the lbN Makefile (this is used in PORTSDIR/Mk/ and PORTSDIR/Mk/

If you do not stay with Fedora, there is some more work to do before you can have a look at choosing RPMs for installation. You need to have a look at PORTSDIR/Mk/ and add some cases for the new LINUX_DIST you want to use. Do not forget to set LINUX_DIST in the lbN Makefile to the name of the distribution you use. You also need to augment the LINUX_DIST_VER check in PORTSDIR/Mk/ with some LINUX_DIST conditionals. If you are lucky, the directory structure for downloads is similar to the Fedora structure, and there is not a lot to do here.

When this is done, you can have a look at the BIN_DISTFILES variable in the lbN Makefile. Try to find similar RPMs for the new Linux release you want to port. Some may not be available, and it may also be the case that different ones are needed instead. I suggest first working with the ones which are available (make makesum, test install and create the plist).

After that you need to find out what the replacement RPMs for the non-existing ones are. You are on your own here. Search around the net, and/or have a look at the dependencies in the RPMs of lbO to determine if something was added as a dependency of something else or not (if not, forget about it ATM). When you have managed to find replacement RPMs, you can have a look at the dependencies of the RPMs in lbN. Do not blindly add all dependencies; not all are needed in FreeBSD (the linux_base ports are not supposed to create an environment which you can chroot into, they are supposed to augment the FreeBSD system so that it can run Linux programs in ports as if they were native FreeBSD programs).

What you need in the linux_base ports are libraries, config and data files which do not exist in FreeBSD or have a different syntax than in FreeBSD (config or data files which are just in a different place can be symlinked), and basic shell commands (which commands are needed or not… well… good question; in the past we decided what to include based upon problem reports from users).

Now for the things which are not available and were not added as a dependency. Those are things which are either used during install, or were useful to have in the past. Find out what they were replaced by and have a look if this replacement can easily be used instead. If it can be used, add it. If not, well… bad luck, we (the FreeBSD community) will see how to handle this somehow.
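The first pass, finding which packages from lbO also exist in the new release, can be scripted. A minimal Python sketch, assuming the BIN_DISTFILES of lbO have been expanded into plain file names and that you have a list of RPM file names available for the new release. It relies on the usual name-version-release.arch.rpm file naming, so package names containing dashes still parse correctly. The sample file names in the usage are illustrative.

```python
def rpm_name(filename: str) -> str:
    """Package name from a name-version-release.arch.rpm file name."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an RPM file name: {filename}")
    stem = filename[: -len(".rpm")]
    nvr, _arch = stem.rsplit(".", 1)              # drop the architecture
    name, _version, _release = nvr.rsplit("-", 2)  # name may contain dashes
    return name


def compare_packages(old_bin_distfiles, new_available):
    """Return (found, missing) package names of lbO relative to the new release."""
    old = {rpm_name(f) for f in old_bin_distfiles}
    new = {rpm_name(f) for f in new_available}
    return sorted(old & new), sorted(old - new)
```

For example, `compare_packages(["bash-3.2-29.fc10.i386.rpm"], ["bash-4.1.2-8.el6.i686.rpm"])` reports bash as found. The “missing” list is the set you then have to research by hand, as described above.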

If you think that you have all you need in BIN_DISTFILES, please update SRC_DISTFILES accordingly and regenerate the distinfo file via make -DPACKAGE_BUILDING makesum, so that it also contains the checksums of the sources (for legal reasons we need them on our mirrors).

The next step is to have a look at REMOVE_DIRS, REMOVE_FILES and ADD_DIRS to see if something needs to be modified. Most of the entries are there to fall back to the corresponding FreeBSD directories/files, or because they are not needed at all (REMOVE_*). Do not remove directories from ADD_DIRS; they are created there to fix some edge conditions (I do not remember exactly why we had to add them, and I do not take the time ATM to search the CVS history).

If you are lucky, this is all (make sure the plist is correct). If you are not lucky and you need to make some modifications to files, have a look at the do-build target in the Makefile; this is the place where some changes are made to create a nice user experience.

If you arrive here while creating a new linux_base port, lean back and feel a bit proud. You managed to create a new linux_base port. It is not very well tested at this moment, and it is far from everything which needs to be done to have the complete Linux infrastructure for a given Linux release, but the most important part is done. Please notify freebsd-emulation@FreeBSD.org and call for testers.

What is missing?

The full Linuxulator infrastructure for the FreeBSD Ports Collection has some more ports around a linux_base port. Most of the infrastructure for this is handled in Mk/

UPDATE: I found some time to write about how to update the Linux-infrastructure ports.