Algorithm to detect repo-copies in CVS

FreeBSD is in the process of moving the version control system of the Ports Collection from CVS to SVN. The decision was made to keep the complete history, so the complete CVS repository has to be converted to SVN.

As CVS has no way to record a copy or move of files inside the repository, we copied the CVS files inside the repository whenever we wanted to copy or move a file (the so-called “repocopy”). While this lets you see the full history of a file, the drawback is that you do not really know when a file was copied or moved unless you are strict about recording this information after making the copy. Guess what, we were not.

Now with the move to SVN, which has a built-in way to record copies and moves, it would be nice if we could record this information. In an internal discussion someone said it is not possible to detect a repocopy reliably.

Well, I thought otherwise, and an hour later my mail describing how to detect one went out. Most of that time went into writing down how to do it, not into coming up with the solution. I do not know if someone picked up this algorithm and implemented something for the cvs2svn converter, but I decided to publish the algorithm here in case someone needs similar functionality somewhere else. Note that the following is tailored to the structure of the Ports Collection. This allows some things to be sped up (no need to do all steps on all files). If you want to use this in a generic repository where the structure is not as regular as in our Ports Collection, you have to run this algorithm on all files.

It also detects commits where multiple files were committed at once (sweeping commits).

Preparation

  • check only category/name/Makefile
  • generate a hash of each commit log + committer
  • if you are memory-limited, use the file system as storage: create ha/sh/ed/dirs/cvs-rev (the hash split into directory levels, with one list per cvs-rev) and store the pathname in the list cvs-rev (pathname = “category-name”)
  • also store the hash in pathname/cvs-rev
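
The preparation steps could be sketched in memory as follows. This is only an illustration under assumptions: the function names, the input tuple layout and the choice of SHA-1 are hypothetical and not part of the original mail, and two dictionaries stand in for the ha/sh/ed/dirs/cvs-rev and pathname/cvs-rev storage.

```python
import hashlib
from collections import defaultdict


def commit_hash(log_message, committer):
    """Hash of commit log + committer (SHA-1 is an arbitrary choice here)."""
    data = committer.encode("utf-8") + b"\0" + log_message.encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def build_indexes(revisions):
    """Build both lookup structures from the Makefile revisions.

    `revisions` is assumed to be an iterable of tuples
    (pathname, cvs_rev, committer, log_message), where pathname is
    "category-name" and cvs_rev is a string such as "1.3".
    """
    # Stand-in for ha/sh/ed/dirs/cvs-rev: (hash, cvs_rev) -> list of pathnames
    by_commit = defaultdict(list)
    # Stand-in for pathname/cvs-rev: pathname -> {cvs_rev: hash}
    by_path = defaultdict(dict)
    for pathname, cvs_rev, committer, log_message in revisions:
        h = commit_hash(log_message, committer)
        by_commit[(h, cvs_rev)].append(pathname)
        by_path[pathname][cvs_rev] = h
    return by_commit, by_path
```

A repocopied file keeps the revision numbers and log messages of its origin, so both pathnames end up in the same (hash, cvs_rev) bucket for every revision made before the copy.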

If a ha/sh/ed/dirs/cvs-rev contains only one item in the end, there was neither a repocopy nor a sweeping commit, and you can delete this ha/sh/ed/dirs/cvs-rev.

If you have more than … let’s say … 10 (subject to tuning) pathnames in a ha/sh/ed/dirs/cvs-rev, you have found a sweeping commit and you can delete the ha/sh/ed/dirs/cvs-rev.
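
These two filter rules could look like the following minimal sketch. It assumes the buckets are held in a dictionary that maps a (hash, cvs-rev) key to a list of pathnames; the function name and the parameter `sweep_threshold` are hypothetical, and the default of 10 is the tunable value mentioned above.

```python
def classify_buckets(by_commit, sweep_threshold=10):
    """Split buckets into repocopy candidates and sweeping commits.

    `by_commit` maps (hash, cvs_rev) -> list of pathnames.
    Buckets with a single pathname are dropped: no repocopy, no sweep.
    """
    candidates = {}
    sweeping = {}
    for key, pathnames in by_commit.items():
        unique = sorted(set(pathnames))
        if len(unique) <= 1:
            continue  # only one item: nothing to detect
        if len(unique) > sweep_threshold:
            sweeping[key] = unique  # too many pathnames: sweeping commit
        else:
            candidates[key] = unique  # probably a repocopy, check further
    return candidates, sweeping
```

Keeping the sweeping commits in a separate dictionary instead of deleting them is a choice of this sketch, made so that they can be reported as well.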

The meat

The remaining ha/sh/ed/dirs/cvs-rev are probably repocopies. Take one ha/sh/ed/dirs/cvs-rev and, for each pathname in there (there may be more than 2 pathnames), have a look at pathname/. Take the first cvs-rev of each and check if they have the same hash. Continue with the next revision number for each until you find a cvs-rev which does not contain the same hash. If the number of cvs-revs since the beginning is >= … let’s say … 3 (subject to tuning), you have a candidate for a repocopy. If it is >= … 10 (subject to tuning), you have a very good indicator for a repocopy. You have to proceed until you have only one pathname left.
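
For two pathnames, the comparison of leading revisions could be sketched as below. The function names and the result labels are hypothetical; the thresholds 3 and 10 are the tunable values from the text, and revision numbers are assumed to be simple dotted strings such as "1.10".

```python
def rev_key(cvs_rev):
    """Turn "1.10" into (1, 10) so that revisions sort numerically."""
    return tuple(int(part) for part in cvs_rev.split("."))


def ordered_hashes(rev_to_hash):
    """Return the hashes of one pathname, ordered from the first revision on."""
    return [rev_to_hash[rev] for rev in sorted(rev_to_hash, key=rev_key)]


def shared_prefix_length(hashes_a, hashes_b):
    """Count the leading revisions that carry the same hash in both lists."""
    count = 0
    for a, b in zip(hashes_a, hashes_b):
        if a != b:
            break
        count += 1
    return count


def rate_pair(hashes_a, hashes_b, candidate_min=3, strong_min=10):
    """Rate a pair of pathnames by the length of their shared history."""
    n = shared_prefix_length(hashes_a, hashes_b)
    if n >= strong_min:
        return ("strong", n)
    if n >= candidate_min:
        return ("candidate", n)
    return ("none", n)
```

With more than two pathnames in one bucket, the same rating can be applied to each pair of pathnames in turn.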

You may detect multiple repocopies like A->B->C->D or A->B + A->D + A->C here.

Write out the repocopy candidate to a list and delete the ha/sh/ed/dirs/cvs-rev for each cvs-rev in a detected sequence.

This finds repocopy candidates for category/name/Makefile. To determine the correct repocopy date (there may be cases where another file was changed after the Makefile but before the repocopy), you now have to look at all the files of a given repocopy pair and check whether there is a matching commit after the Makefile commit date. If you want to be 100% sure, compare the complete commit history of all files of a given repocopy pair.
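
One way to narrow down the repocopy date is to search all files of both ports for the latest commit they have in common, since the copy must have happened after it. The sketch below assumes each port is given as a dictionary that maps a file name to a list of (date, hash) tuples, with dates in a sortable form such as ISO strings; the function name and this data layout are hypothetical.

```python
def latest_shared_commit(files_a, files_b):
    """Return the newest (date, hash) present in both ports, or None.

    `files_a` and `files_b` map file name -> list of (date, hash) tuples.
    Only files that exist in both ports are compared.
    """
    latest = None
    for filename in files_a.keys() & files_b.keys():
        # A copied CVS file keeps its old dates, so shared commits
        # show up as identical (date, hash) tuples on both sides.
        shared = set(files_a[filename]) & set(files_b[filename])
        for date, h in shared:
            if latest is None or date > latest[0]:
                latest = (date, h)
    return latest
```

The repocopy then lies between the returned commit and the first commit that appears in only one of the two ports.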

Free DLNA server which works well with my Sony BRAVIA TV

In several previous posts I wrote about my quest for the right source format to stream video to my Sony BRAVIA TV (built in 2009). Last weekend I finally found something which satisfies me.

What I found was serviio, a free UPnP-AV (DLNA) server. It is written in Java and runs on Windows, Linux and FreeBSD (FreeBSD is not listed on the website, but we have a not-so-up-to-date version in the ports tree). If necessary, it transcodes the input to an appropriate format for the DLNA renderer (in my case the TV).

I tested it on my slow netbook, which let me see for which input formats it just remuxes the input container into an MPEG transport stream, and which input formats are actually re-encoded into a format the TV understands.

The bottom line of the tests is that I just need to use a supported container (like MKV, MP4 or AVI) with H.264-encoded video (e.g. encoded by x264) and AC3 audio.

The TV is able to choose between several audio streams, but I have not tested whether serviio is able to serve files with multiple audio streams (my wife and I have different native languages, so multiple audio streams for a movie are interesting for us), and I do not know if DLNA supports something like this.

Now I just have to replace minidlna (which only works well with my TV for MP3s and pictures) with serviio on my FreeBSD file server, and we can forget about the disk-juggling.

What you should know about SSH

Michael W. Lucas has published his new book “SSH Mastery” (no link to an online store here, get it from your preferred online or offline one in your part of the world).

Do you think you know a lot about SSH? I thought I did when Michael was looking for technical proofreaders for this book. I offered to have a look at his work in progress and he kindly accepted (while I do not get money for this, I am one of the people he thanks for the technical review at the beginning of the book, so I am involved somehow and you should take the following with a grain of salt).

I already had user restrictions in place before the review, but now I have narrowed down some restrictions based upon some conditionals. I already used SSH tunnels for various things before (where legally applicable), but I learned some additional VPN techniques with SSH. I already used multiple SSH keys for various things, but Michael provides some interesting ways of handling a large volume of SSH keys over multiple machines. … I really hope that my review was as valuable for Michael as doing the review was for me.

He ends the book with “You now know more about SSH, OpenSSH and Putty than the vast majority of IT professionals! Congratulations”, and this is true. All of it is written in his usual style, where you can come with a problem, read about it, and leave with a solution (normally with a little bit of entertainment in between).

I know a lot of people who work daily with SSH, and they know only a small part of what is presented in this book. In my opinion this book is a must-have for every System/Database/Application/Whatever Administrator in charge of something on a UNIX-like system, and even “normal users” of SSH (no matter if they use PuTTY or an ssh command line program on a UNIX-like system, which will most probably be OpenSSH or a clone of it) will get some helpful information from this book.

I can only recommend it.

Tuning guide in the wiki

In the light of the recent benchmark discussion, a volunteer imported the tuning man-page into the wiki. Some comments about possible improvements have already been made in some places. Please go over there, have a look, and participate (testing/verification/discussion/improvements/…).

As always, feel free to register as FirstnameLastname and ask a FreeBSD committer to add you to the contributors group for write access (you also gain the ability to register for email notifications for specific pages).