Algorithm to detect repo-copies in CVS

FreeBSD is in the process of moving the version control system of the Ports Collection from CVS to SVN. The decision was made to keep the complete history, so the complete CVS repository has to be converted to SVN.

As CVS has no way to record a copy or move of files inside the repository, we copied the CVS files inside the repository whenever we wanted to copy or move a file (the so-called “repocopy”). While this allows you to see the full history of a file, the drawback is that you do not really know when a file was copied or moved unless you are strict about recording this information after doing a copy. Guess what, we were not.

Now with the move to SVN, which has a built-in way to record copies and moves, it would be nice if we could record this information. In an internal discussion someone said that it is not possible to detect a repocopy reliably.

Well, I thought otherwise, and an hour later my mail went out describing how to detect one. Most of that time was needed to write down how to do it, not to come up with a solution. I do not know if someone picked up this algorithm and implemented something for the cvs2svn converter, but I decided to publish the algorithm here in case someone needs similar functionality somewhere else. Note that the following is tailored to the structure of the Ports Collection. This allows some things to be sped up (no need to do all steps on all files). If you want to use this in a generic repository where the structure is not as regular as in our Ports Collection, you have to run this algorithm on all files.

It also detects commits where multiple files were committed at once (sweeping commits).

Preparation

  • check only category/name/Makefile
  • generate a hash of each commitlog+committer
  • if you are memory-limited, use the directory ha/sh/ed/dirs/cvs-rev as storage and store the pathname in the list cvs-rev (pathname = “category-name”)
  • also store the hash in pathname/cvs-rev

If you end up with only one item in ha/sh/ed/dirs/cvs-rev, there was neither a repocopy nor a sweeping commit, and you can delete this ha/sh/ed/dirs/cvs-rev.

If you have more than … let’s say … 10 (subject to tuning) pathnames in ha/sh/ed/dirs/cvs-rev, you found a sweeping commit and you can delete the ha/sh/ed/dirs/cvs-rev.
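The preparation step can be sketched in Python as follows. This is only a minimal in-memory sketch, not what was sent in the original mail: the function names, the tuple layout of the input and the use of SHA-1 are assumptions made for illustration, and a dictionary keyed by (hash, cvs-rev) stands in for the ha/sh/ed/dirs/cvs-rev directory layout.

```python
import hashlib
from collections import defaultdict


def commit_key(log_message, committer):
    """Hash of commit log + committer (SHA-1 is an arbitrary choice here)."""
    data = (committer + "\0" + log_message).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def build_buckets(revisions, sweep_threshold=10):
    """Group Makefile revisions by (hash, cvs-rev).

    revisions: iterable of (pathname, cvs_rev, committer, log_message),
               one entry per revision of category/name/Makefile.
    Returns (buckets, per_path):
      buckets  maps (hash, cvs_rev) -> set of pathnames
      per_path maps pathname -> {cvs_rev: hash}
    """
    buckets = defaultdict(set)
    per_path = defaultdict(dict)
    for pathname, cvs_rev, committer, log_message in revisions:
        key = commit_key(log_message, committer)
        buckets[(key, cvs_rev)].add(pathname)
        per_path[pathname][cvs_rev] = key

    # Drop buckets with a single pathname (no repocopy, no sweeping commit)
    # and buckets with more than sweep_threshold pathnames (sweeping commit).
    kept = {k: v for k, v in buckets.items() if 1 < len(v) <= sweep_threshold}
    return kept, dict(per_path)
```

The buckets which survive the filtering are the ones the next section works on; the threshold of 10 is the tunable value mentioned above.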

The meat

The remaining ha/sh/ed/dirs/cvs-rev are probably repocopies. Take one ha/sh/ed/dirs/cvs-rev and, for each pathname in there (there may be more than 2 pathnames), have a look at pathname/. Take the first cvs-rev of each and check if they have the same hash. Continue with the next rev-number for each until you find a cvs-rev which does not contain the same hash. If the number of cvs-revs since the beginning is >= … let’s say … 3 (subject to tuning), you have a candidate for a repocopy. If it is >= … 10 (subject to tuning), you have a very good indicator for a repocopy. You have to proceed until you have only one pathname left.

You may detect multiple repocopies like A->B->C->D or A->B + A->D + A->C here.

Write out the repocopy candidate to a list, and delete the ha/sh/ed/dirs/cvs-rev for each cvs-rev in a detected sequence.
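The comparison of two pathnames can be sketched like this. It is a minimal sketch under assumptions: each pathname's history is given as a dictionary mapping cvs-rev to hash, the function names are made up for this example, and the thresholds 3 and 10 are the tunable values from the text.

```python
def rev_sort_key(cvs_rev):
    """Sort CVS revisions numerically, so that 1.10 comes after 1.9."""
    return tuple(int(part) for part in cvs_rev.split("."))


def shared_prefix_length(history_a, history_b):
    """Count the leading revisions which have the same rev and the same hash.

    history_a, history_b: dict mapping cvs_rev -> hash for one pathname each.
    """
    revs_a = sorted(history_a, key=rev_sort_key)
    revs_b = sorted(history_b, key=rev_sort_key)
    count = 0
    for rev_a, rev_b in zip(revs_a, revs_b):
        if rev_a != rev_b or history_a[rev_a] != history_b[rev_b]:
            break
        count += 1
    return count


def classify(history_a, history_b, candidate_min=3, strong_min=10):
    """Return ("strong" | "candidate" | "none", number of shared revisions)."""
    shared = shared_prefix_length(history_a, history_b)
    if shared >= strong_min:
        return "strong", shared
    if shared >= candidate_min:
        return "candidate", shared
    return "none", shared
```

For a bucket with more than two pathnames, the same comparison would be run pairwise; chains like A->B->C->D then show up as pairs with shared prefixes of different lengths.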

This finds repocopy candidates for category/name/Makefile. To detect the correct repocopy date (there may be cases where another file was changed after the Makefile but before the repocopy), you now have to look at all the files of a given repocopy pair and check if there is a matching commit after the Makefile commit date. If you want to be 100% sure, compare the complete commit history of all files of a given repocopy pair.
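One possible way to narrow down the repocopy date is sketched below. It assumes that for each side of a repocopy pair you have a dictionary mapping file names to sets of (timestamp, hash) tuples; this data layout and the function name are assumptions for illustration only. The latest commit which both sides share is the last one made before the copy, so the repocopy happened some time after it.

```python
def last_shared_commit(files_a, files_b):
    """Find the latest commit which is present in both copies of a port.

    files_a, files_b: dict mapping filename -> set of (timestamp, hash).
    Returns (timestamp, filename) of the latest shared commit, or None
    if the two sides have no commit in common.
    """
    latest = None
    # Only files which exist on both sides can carry shared history.
    for filename in files_a.keys() & files_b.keys():
        for timestamp, _commit_hash in files_a[filename] & files_b[filename]:
            if latest is None or timestamp > latest[0]:
                latest = (timestamp, filename)
    return latest
```

If the result points to a file other than the Makefile, this is exactly the case described above: another file was changed after the last shared Makefile commit but before the repocopy.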

linux_base-c6

It seems I forgot to announce that linux_base-c6 is in the Ports Collection now. It is not yet a replacement for the current default Linux base: the linuxulator infrastructure ports are missing, and we need to check if the kernel supports enough of Linux 2.6.18 that nothing breaks.

TODO:

  • check for updated RPMs for linux_base-c6
  • create linuxulator infrastructure ports
  • improve the kernel to support more of Linux 2.6.18

To my knowledge, nobody is working on any of this. Anyone is welcome to have a look and provide patches.

New CentOS linux_base for testing soonish

It seems my HOWTO for creating a new linux_base port was not too bad. There is now a PR for a CentOS 6 based linux_base port. I had a quick look at it, and it seems to be nearly ready for inclusion in the Ports Collection (the SRPMs need to be added, but that can be done within a few minutes).

When FreeBSD 8.3 is released and the Ports Collection is open for sweeping commits again, I will ask portmgr to do a repo-copy for the new port and commit it. This is just the linux_base port, not the complete infrastructure which is needed to completely replace the current default linuxulator userland. This is just a start. Switching to a more recent linux_base port is a long process, and in this case it depends upon enough support in the supported FreeBSD releases.

Attention: Anyone installing the port from the PR should be aware that using it is highly experimental. You need to change the linuxulator to present itself as a Linux 2.6.18 kernel (described in the pkg-message of the port), and the code in FreeBSD is far from supporting this. Anyone who wants to try it is welcome, but you have to run FreeBSD-current as of at least last weekend, and watch out for kernel messages about unsupported syscalls. Please send reports to emulation@FreeBSD.org, not here on the webpage.

HOWTO add linux-infrastructure ports for a new linux_base port

In my last blog-post I described how to create a new linux_base port. This blog-post is about the other Linux-ports which make up the Linux-infrastructure in the FreeBSD Ports Collection for a given Linux-release.

What are linux-infrastructure ports?

A linux_base port contains as much as possible and at the same time as little as possible to make up a useful Linux-compatibility-experience in FreeBSD. I know, this is not a descriptive explanation. And it is not on purpose. There are no fixed rules about what has to be inside and what does not. It “matured” into the current shape. A practical example is that there is no GUI-stuff in the linux_base. While you need the GUI parts like GTK or Qt for software like Skype and acroread, you do not need them for headless game servers. While you may need various libraries for game servers, you may not need those for Skype or acroread. As such, some standard parts are in separate ports which are named linux-LINUX_DIST_SUFFIX-NAME. For GTK and the Fedora 10 release this results in linux-f10-gtk2. Such generic ports which depend upon a specific Linux-release make up the Linux-infrastructure in the FreeBSD Ports Collection. Those ports are referenced in port-Makefiles via the USE_LINUX_APPS variable, e.g. USE_LINUX_APPS=gtk2.

If you created a new linux_base port, you need most standard infrastructure ports in a version for the Linux-release used in the linux_base port in order to have the Linux-application ports in the FreeBSD Ports Collection working (if you are unlucky, some ports do not play well with the Linux-release you have chosen, but this is out of the scope of this HOWTO).

Updating Mk/bsd.linux-apps.mk

First we need to set the LINUX_DIST_SUFFIX variable to a value suitable for the new Linux-release. This is done in the conditional which checks the OVERRIDE_LINUX_NONBASE_PORTS variable for valid values. Add an appropriate conditional, and do not forget to add the new valid value to the IGNORE line in the last else branch of the conditional.

The next step is to check the _LINUX_APPS_ALL and _LINUX_26_APPS variables. If there are some infrastructure ports which are not available for the new Linux-release, the conditional which checks the availability of a given infrastructure port for a given Linux-release needs to be modified. If at a later step you notice that some additional infrastructure ports are necessary for the new Linux-release, _LINUX_APPS_ALL and the check-logic need to be modified too (e.g. add a new variable for your Linux-release, add the content of the variable to _LINUX_APPS_ALL, and change the check to do the right thing).

After that, two tedious parts need to be done.

For each infrastructure port there is a set of variables. The name_PORT variable contains the location of the port in the Ports Collection. Typically you do not have to change it (if you really want to change it, do not do it, fix the naming of the infrastructure port instead), because we use a naming convention here which includes the LINUX_DIST_SUFFIX. The name_DETECT variable is an internal variable, do not change it (if you create a new infrastructure port, copy it from somewhere else and make sure the name in the value of the variable matches the port name in the name of the variable). Then there are several name_suffix_FILE variables. Leave the existing ones alone, and add a new one with the correct suffix for your new Linux-release. The value of the variable needs to be an important file which is installed by the infrastructure port in question. Ideally the name_suffix_FILE variable points to a library in the port. FYI: The content of the name_suffix_FILE variables is used to set the name_DETECT variables depending on the Linux-release; the name_DETECT variables are used to check if the port is already installed. The name_DEPENDS variable lists dependencies of this infrastructure port. If the dependencies changed in your Linux-release, you need to add a conditional to change the dependency if LINUX_DIST_SUFFIX is set to your Linux-release.

Normally this is all that needs to be done in PORTSDIR/Mk/bsd.linux-apps.mk; the rest of the file is code to check dependencies and some correctness checks.

The second tedious part is to actually create all those infrastructure ports. Normally you can copy an existing infrastructure port, rename it, adjust the PORTNAME, PORTVERSION, PORTREVISION, MASTER_SITES, PKGNAMEPREFIX, DISTFILES, CONFLICTS (also in all other Linux-release versions of this infrastructure port), LINUX_DIST_VER, RPMVERSION (if set/necessary) and SRC_DISTFILE variables, generate the distfile checksums (make makesum), and fix the plist. I suggest scripting parts of this work (as of this writing FreshPorts counts 68 ports where the portname starts with linux-f10-).
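As an illustration of what such a script could look like, here is a small Python sketch. It is not an existing tool: the function names are made up, the suffix c6 and the version numbers in the usage example are placeholders, and only simple single-line VAR=value assignments are handled (continuation lines and += are left untouched, so those still need manual review).

```python
import re

# Matches simple assignments like "PORTNAME=<tab>gtk2" or "FOO?= bar".
ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)(\s*\??=\s*)(.*)$")


def new_port_name(old_name, old_suffix, new_suffix):
    """Map linux-OLDSUFFIX-NAME to linux-NEWSUFFIX-NAME."""
    prefix = "linux-%s-" % old_suffix
    if not old_name.startswith(prefix):
        raise ValueError("%s does not start with %s" % (old_name, prefix))
    return "linux-%s-%s" % (new_suffix, old_name[len(prefix):])


def set_make_variables(makefile_text, new_values):
    """Replace the values of single-line variable assignments.

    new_values: dict mapping variable name -> new value.
    Lines ending in a backslash (multi-line values) are not modified.
    """
    out = []
    for line in makefile_text.splitlines():
        match = ASSIGNMENT.match(line)
        if (match and match.group(1) in new_values
                and not line.rstrip().endswith("\\")):
            name, operator = match.group(1), match.group(2)
            line = name + operator + new_values[name]
        out.append(line)
    return "\n".join(out) + "\n"
```

A usage example with placeholder values:

```python
print(new_port_name("linux-f10-gtk2", "f10", "c6"))
print(set_make_variables("PORTVERSION=\t2.14.7\n", {"PORTVERSION": "2.24.23"}))
```

The checksums (make makesum) and the plist still have to be regenerated for each copied port afterwards.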

Adding new infrastructure ports, or removing infrastructure ports for a given Linux-release

If your Linux-release does not come with a package for an existing infrastructure port, just do not create a corresponding name_suffix_FILE line. You still need to do the right thing regarding dependencies of ports which depend upon this non-existing infrastructure port (if your Linux-release comes with packages for them).

To add a new infrastructure port, copy an existing block, rename the variables, set them correctly, add a new variable for your Linux-release in the first _LINUX_APPS_ALL section, add the content of this variable to _LINUX_APPS_ALL, and change the check-logic as described above.

Final words

If you have something which installs and deinstalls correctly, feel free to provide it on freebsd-emulation@FreeBSD.org for review/testing. If you have questions during the porting, feel free to send a mail there as well.

HOWTO create a new linux_base port

FreeBSD is in need of a new linux_base port. It has been on my TODO list for a long time, but I have not found the time to create one. I still do not have the time to work on a new one, but as you are reading this, I did manage to find the time to write a HOWTO which describes what needs to be done to create a new linux_base port.

I will not describe how to create a new linux_base port from scratch; I will just describe how you can copy the last one and update it to something newer, based upon the existing infrastructure for RPM packages.

Specific questions which come up while porting a new Linux release should be asked on freebsd-emulation@FreeBSD.org; there are more people who can answer questions there than here in my blog. I will add useful information to this HOWTO if necessary.

In the easy case, most of the work is searching for the right RPMs and their dependencies, and creating the plist.

Why do we need a new linux_base port?

The current linux_base port is based upon Fedora 10, which has been end of life since December 2009. Even Fedora 13 is already end of life. Fedora 16 is supposed to be released this year. From a support point of view, Fedora 15 or maybe even Fedora 16 would be a good target for the next linux_base port. Another alternative would be to use an extended lifetime release of another RPM based distribution, for example CentOS 6 (which seems to be based upon Fedora 12 with backports from Fedora 13 and 14). Using a Linux release which is said to be supported for at least 10 years sounds nice from a FreeBSD point of view (only minor changes to the Linux ports in such a case, instead of creating a completely new linux_base every N+2 releases like with Fedora), but it also means additional work if you want to create the first linux_base port for it.

The mysteries you have to conquer if you want to create a new linux_base port

What we do not know is whether Fedora 15/16, CentOS 6, or any other Linux release will work in a supported FreeBSD release. There are two ways to find this out.

The first one is to take an existing Linux system, chroot into it (either via NFS or after making a copy into a directory of a FreeBSD system), and run a lot of programs (acroread, skype, shells, scripts, …). The LTP test suite is not that useful here, as it mostly tests kernel features, and we do not know which kernel features are mandatory for a given userland of a Linux release.

The second way of testing if a given Linux release works on FreeBSD is to actually create a new linux_base port for it and test it without chrooting.

The first way is faster if you are only interested in testing whether something works. The second way provides an easy-to-set-up testbed for FreeBSD kernel developers to fix the Linuxulator so that it works with the new linux_base port. Both ways have their merits, but it is up to the person doing the work to decide which way to go.

The meat: HOWTO create a new linux_base port

First off, you need a system (or a jail) without any linux_base port installed. After that you can create a new linux_base port (= lbN) by just making a copy of the latest one (= lbO). In lbN you need to add lbO as a CONFLICT, and in all other existing linux_base ports you need to add lbN as a conflict.

Change the PORTNAME, PORTVERSION, reset the PORTREVISION in lbN, and set LINUX_DIST_VER to the new Linux-release version in the lbN Makefile (this is used in PORTSDIR/Mk/bsd.linux-rpm.mk and PORTSDIR/Mk/bsd.linux-apps.mk).

If you do not stay with Fedora, there is some more work to do before you can have a look at choosing RPMs for installation. You need to have a look at PORTSDIR/Mk/bsd.linux-rpm.mk and add some cases for the new LINUX_DIST you want to use. Do not forget to set LINUX_DIST in the lbN Makefile to the name of the distribution you use. You also need to augment the LINUX_DIST_VER check in PORTSDIR/Mk/bsd.linux-rpm.mk with some LINUX_DIST conditionals. If you are lucky, the directory structure for downloads is similar to the Fedora structure, and there is not a lot to do here.

When this is done, you can have a look at the BIN_DISTFILES variable in the lbN Makefile. Try to find similar RPMs for the new Linux release you want to port. Some may not be available, and it may also be the case that different ones are needed instead. I suggest first working with the ones which are available (make makesum, test install and create plist). After that you need to find out what the replacement RPMs for the non-existing ones are. You are on your own here. Search around the net, and/or have a look at the dependencies in the RPMs of lbO to determine if something was added as a dependency of something else or not (if not, forget about it ATM).

When you have managed to find replacement RPMs, you can have a look at the dependencies of the RPMs in lbN. Do not blindly add all dependencies, not all are needed in FreeBSD (the linux_base ports are not supposed to create an environment which you can chroot into, they are supposed to augment the FreeBSD system to be able to run Linux programs in ports like they were FreeBSD native programs). What you need in the linux_base ports are libraries, config and data files which do not exist in FreeBSD or have a different syntax than in FreeBSD (those config or data files which are just in a different place can be symlinked), and basic shell commands (which commands are needed or not… well… good question, in the past we made decisions about what to include based upon problem reports from users).

Now for the things which are not available and were not added as a dependency. Those are things which are either used during install, or were useful to have in the past. Find out what they were replaced by and have a look if this replacement can easily be used instead. If it can be used, add it. If not, well… bad luck, we (the FreeBSD community) will see how to handle this somehow.

If you think that you have all you need in BIN_DISTFILES, please update SRC_DISTFILES accordingly and generate the distfile checksums via make -DPACKAGE_BUILDING makesum, so that the checksums of the sources are included as well (for legal reasons we need the sources on our mirrors).

The next step is to have a look at REMOVE_DIRS, REMOVE_FILES and ADD_DIRS to see if something needs to be modified. Most of the REMOVE_* entries are there to fall back to the corresponding FreeBSD directories/files, or because they are not needed at all. Do not remove directories from ADD_DIRS, they are created here to fix some edge conditions (I do not remember exactly why we had to add them, and I do not take the time ATM to search in the CVS history).

If you are lucky, this is all (make sure the plist is correct). If you are not lucky and you need to make some modifications to files, have a look at the do-build target in the Makefile; this is the place where some changes are made to create a nice user experience.

If you arrive here while creating a new linux_base port, lean back and feel a bit proud. You managed to create a new linux_base port. It is not very well tested at this moment, and it is far from everything which needs to be done to have the complete Linux infrastructure for a given Linux release, but the most important part is done. Please notify freebsd-emulation@FreeBSD.org and call for testers.

What is missing?

The full Linuxulator infrastructure for the FreeBSD Ports Collection has some more ports around a linux_base port. Most of the infrastructure for this is handled in Mk/bsd.linux-apps.mk.

UPDATE: I got some time to write how to update the Linux-infrastructure ports.