FreeBSD is on its way to move from CVS to SVN for the version control system for the Ports Collection. The decision was made to keep the complete history, so the complete CVS repository has to be converted to SVN.
As CVS has no way to record a copy or move of files inside the repository, we copied the CVS files inside the repository in case we wanted to copy or move a file (the so called “repocopy”). While this allows to see the full history of a file, the drawback is that you do not really know when a file was copied/moved if you are not strict at recording this info after doing a copy. Guess what, we where not.
Now with the move to SVN which has a build-in way for copies/moves, it would be nice if we could record this info. In an internal discussion someone told its not possible to detect a repocopy reliably.
Well, I thought otherwise and an hour later my mail went out how to detect one. The longest time was needed to write how to do it, not to come up with a solution. I do not know if someone picked up this algorithm and implemented something for the cvs2svn converter, but I decided to publish the algorithm here if someone needs a similar functionality somewhere else. Note, the following is tailored to the structure of the Ports Collection. This allows to speed up some things (no need to do all steps on all files). If you want to use this in a generic repository where the structure is not as regular as in our Ports Collection, you have to run this algorithm on all files.
It also detects commits where multiple files where committed at once in one commit (sweeping commits).
Preparation
- check only category/name/Makefile
- generate a hash of each commitlog+committer
- if you are memory-limited use ha/sh/ed/dirs/cvs-rev and store pathname in the list cvs-rev (pathname = “category-name”) as storage
- store the hash also in pathname/cvs-rev
If you have only one item in ha/sh/ed/dirs/cvs-rev in the end, there was no repocopy and no sweeping commit, you can delete this ha/sh/ed/dirs/cvs-rev.
If you have more than … let’s say … 10 (subject to tuning) pathnames in ha/sh/ed/dirs/cvs-rev you found a sweeping commit and you can delete the ha/sh/ed/dirs/cvs-rev.
The meat
The remaining ha/sh/ed/dirs/cvs-rev are probably repocopies. Take one ha/sh/ed/dirs/cvs-rev and for each pathname (there may be more than 2 pathnames) in there have a look at pathname/. Take the first cvs-rev of each and check if they have the same hash. Continue with the next rev-number for each until you found a cvs-rev which does not contain the same hash. If the number of cvs-revs since the beginning is >= … let’s say … 3 (subject to tuning), you have a candidate for a repocopy. If it is >= … 10 (subject to tuning), you have a very good indicator for a repocopy. You have to proceed until you have only one pathname left.
You may detect multiple repocopies like A->B->C->D or A->B + A->D + A->C here.
Write out the repocopy candidate to a list and delete the ha/sh/ed/dirs/cvs-rev for each cvs-rev in a detected sequence.
This finds repocopy candidates for category/name/Makefile. To detect the correct repocopy-date (there are maybe cases where another file was changed after the Makefile but before the repocopy), you now have to look at all the files for a given repocopy-pair and check if there is a matching commit after the Makefile-commit-date. If you want to be 100% sure, you compare the complete commit-history of all files for a given repocopy-pair.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: algorithm,
category name,
cvs repository,
dirs,
drawback,
generic repository,
hash,
longest time,
mail,
multiple files —
In several previous posts I wrote about my quest for the right source format to stream video to my Sony BRAVIA TV (build in 2009). The last week-end I finally found something which satisfies me.
What I found was serviio, a free UPnP-AV (DLNA) server. It is written in java and runs on Windows, Linux and FreeBSD (it is not listed on the website, but we have an not-so-up-to-date version in the ports tree). If necessary it transcodes the input to an appropriate format for the DLNA renderer (in my case the TV).
I tested it with my slow Netbook, so that I was able to see with which input format it will just remux the input container to a MPEG transport stream, and which input format would be really re-encoded to a format the TV understands.
The bottom line of the tests is, that I just need to use a supported container (like MKV or MP4 or AVI) with H.264-encoded video (e.g. encoded by x264) and AC3 audio.
The TV is able to chose between several audio streams, but I have not tested if serviio is able to serve files with multiple audio streams (my wife has a different mother language than me, so it is interesting for us to have multiple audio streams for a movie), and I do not know if DLNA supports something like this.
Now I just have to replace minidlna (which only works good with my TV for MP3s and Pictures) with serviio on my FreeBSD file server and we can forget about the disk-juggling.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: audio streams,
dlna server,
input container,
input format,
mother language,
netbook,
sony bravia tv,
source format,
transport stream,
x264 —
Michael W. Lucas published his new book “SSH Mastery” (no link to an online store, get it from your preferred online or offline one in your part of the world).
Do you think you know a lot about SSH? I thought I did when Michael searched technical proof-readers for this book. I offered to have a look at his work in progress and he gently accepted (while I do not get money for this, I am one of the persons he thanks for the technical review in the beginning, so I am involved somehow and as such you should take the following with a grain of salt).
I already had user restrictions in place before the review, but now I narrowed down some restrictions based upon some conditionals. I already used SSH tunnels for various things before (where legally applicable), but I learned some additional VPN techniques with SSH. I already used multiple ssh-keys for various things, but Michael provides some interesting ways of handling a large-volume of ssh-keys over multiple machines. … I really hope that my review was as valuable for Michael, as it was for me to do the review.
He ends the book with “You now know more about SSH, OpenSSH and Putty than the vast majority of IT professionals! Congratulations”, and this is true, and all that in his writing style where you can come with a problem, read about it, and leave with a solution (normally with a little bit of entertainment in between).
I know a lot of people which work daily with SSH, and they know only a small part of what is presented in this book. In my opinion this book is a must-have for every System/Database/Application/Whatever Administrator in charge of something on an UNIX-like system, and even “normal users” of SSH (no matter if they use PuTTY, or a ssh command line program on an UNIX-like system (most probably it will be OpenSSH or a clone of it)) will get some helpful information from this book.
I can only recommend it.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: conditionals,
database application,
grain of salt,
michael w lucas,
openssh,
proof readers,
putty ssh,
ssh command,
ssh putty,
system database —
In the light of the recent benchmark discussion, a volunteer imported the tuning man-page into the wiki. Some comments at some places for possible improvements are already made. Please go over there, have a look, and participate please (testing/verification/discussion/improvements/…).
As always, feel free to register with FirstnameLastname and tell a FreeBSD committer to add you to the contributors group for write access (you also get the benefit to be able to register for an email notification for specific pages).
GD Star Rating
loading…
GD Star Rating
loading…
Tags: benchmark,
benefit,
email notification,
improvements,
man page,
tuning guide,
volunteer,
wiki —
I have the habit to chmod with the relative notation (e.g. g+w or a+r or go-w or similar) instead of the absolute one (e.g. 0640 or u=rw,g=r,o=). Recently I had to chmod a lot of files. As usual I was using the relative notation. With a lot of files, this took a lot of time. Time was not really an issue, so I did not stop it to restart with a better performing command (e.g. find /path –type f –print0 | xargs –0 chmod 0644; find /path –type d –print0 | xargs –0 chmod 0755), but I thought a little tips&tricks posting may be in order, as not everyone knows the difference.
The relative notation
When you specify g+w, it means to remove the write access for the group, but keep everything else like it is. Naturally this means that chmod first has to lookup the current access rights. So for each async write request, there has to be a read-request first.
The absolute notation
The absolute notation is what most people are used to (at least the numeric one). It does not need to read the access rights before changing them, so there is less I/O to be done to get what you want. The drawback is that it is not so nice for recursive changes. You do not want to have the x-bit for data files, but you need it for directories. If you only have a tree with data files where you want to have an uniform access, the example above via find is probably faster (for sure if the directory meta-data is still in RAM).
If you have a mix of binaries and data, it is a little bit more tricky to come up with a way which is faster. If the data has a name-pattern, you could use it in the find.
And if you have a non-uniform access for the group bits and want to make sure the owner has write access to everything, it may be faster to use the relative notation than to find a replacement command-sequence with the absolute notation.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: binaries,
chmod,
command sequence,
drawback,
habit,
little bit,
meta data,
path type,
speed traps,
time time —