Tarsnap usage statistics

The longer I use tarsnap, the more impressive it is.

Below is a list of all my privately used systems (2 machines which only host jails, here named Prison1 and Prison2, and several jails, here named according to their function) together with some tarsnap statistics.

For each backup tarsnap prints some statistics: the uncompressed storage space of all archives of this machine, the compressed storage space of all archives, the unique uncompressed storage space of all archives, the unique compressed storage space of all archives, and the same set of numbers for the current archive. The unique storage space is what remains after deduplication. The most interesting number is the unique and compressed one. For a specific archive it shows the amount of data which differs from all other archives, and for the total it tells how much storage space is used on the tarsnap server.

I do not back up all data with tarsnap. I do a full backup to external storage (zfs snapshot + zfs send | zfs receive) once in a while, and tarsnap is only for the stuff which could change daily or is very small (my mails belong to the first group, the configuration of applications or the system to the second group). At the end of the post there is also an overview of the money I have spent so far on tarsnap for the backups.
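To collect these numbers for graphs like the ones in this post, the statistics table can be parsed with a few lines of Python. This is only a sketch: it assumes the output has one header line followed by rows of a label and two byte counts (total and compressed), and the sample values below are purely illustrative, not real measurements. The names `SAMPLE`, `ROW` and `parse_stats` are made up for this example.

```python
import re

# Assumed layout of the statistics table (sizes in bytes).
# The numbers are illustrative only, not real measurements.
SAMPLE = """\
                                       Total size  Compressed size
All archives                          10000000000       2400000000
  (unique data)                         853000000        243000000
This archive                           1100000000        260000000
New data                                 12000000          3500000
"""

# A row is a label followed by two integer columns; the header line
# does not end in two integers, so it is skipped automatically.
ROW = re.compile(r"^\s*(?P<label>.+?)\s+(?P<total>\d+)\s+(?P<compressed>\d+)\s*$")


def parse_stats(text):
    """Return {label: (total_bytes, compressed_bytes)} for every data row."""
    stats = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            stats[match["label"]] = (int(match["total"]), int(match["compressed"]))
    return stats


stats = parse_stats(SAMPLE)
# The unique + compressed value of all archives is what is stored on the server.
unique_compressed = stats["(unique data)"][1]
print(f"stored on the server: {unique_compressed / 1e6:.0f} MB")
```

The second column of the "(unique data)" row is the value that matters for the bill, so that is the one worth tracking per day.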

Attention: the following graphs display small values in KB, while the text talks about sizes in MB or even GB!


The backup of one day covers 1.1 GB of uncompressed data. The subtrees I back up are /etc, /usr/local/etc, /home, /root, /var/db/pkg, /var/db/mergemaster.mtree, /space/jails/flavours and a subversion checkout of /usr/src (excluding the kernel compile directory; I back this up as I have local modifications to FreeBSD). If I wanted to have all days uncompressed on my hard disk, I would have to provide 10 GB of storage space. Compressed this comes down to 2.4 GB, unique uncompressed this is 853 MB, and unique compressed this is 243 MB. The following graph splits this up into all the backups I have as of this writing. I only show the unique values, as including the total values would make the unique values disappear in the graph (values too small).
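To put the four numbers above into relation, here is a small worked calculation. It is only a sketch using the figures quoted in the text, and it assumes 1 GB = 1000 MB for the conversion (the variable names are made up for this example).

```python
# Figures quoted in the text for this system, in MB
# (assumption: 1 GB = 1000 MB).
total_mb = 10_000            # all archives, uncompressed
compressed_mb = 2_400        # all archives, compressed
unique_mb = 853              # after deduplication, uncompressed
unique_compressed_mb = 243   # after deduplication and compression


def percent(part, whole):
    """Share of `whole` that `part` represents, in percent."""
    return 100.0 * part / whole


compression_only = percent(compressed_mb, total_mb)
dedup_only = percent(unique_mb, total_mb)
both = percent(unique_compressed_mb, total_mb)

print(f"compression only:    {compression_only:.1f}% of the raw size")
print(f"deduplication only:  {dedup_only:.1f}% of the raw size")
print(f"both combined:       {both:.1f}% of the raw size")
```

With these figures, deduplication alone saves more than compression alone, and both together leave only a few percent of the raw data to store.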


In this graph we see that I have a constant rate of new data. I think this is mostly references to already stored data (/usr/src being the most likely cause of this, nothing changed in those directories).


One day covers 7 MB of uncompressed data, all archives take 56 MB uncompressed, unique and compressed this comes down to 1.3 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/named, and /var/db/mergemaster.mtree.


This graph is strange. I have no idea why there is so much data for the second and the last day. Nothing changed.


One day covers 8 MB of uncompressed data, all archives take 62 MB uncompressed, unique and compressed this comes down to 1.5 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/spool/postfix, and /var/db/mergemaster.mtree.


This does not look bad. I was sending a lot of mails on the 25th, and on the days in the middle I was not sending much.


One day covers about 900 MB of uncompressed data, all archives take 7.2 GB uncompressed, unique and compressed this comes down to 526 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /home (mail folders) and /usr/local/share/courier-imap.


Obviously there is a not so small amount of change in my mailbox. As my spam filter is working nicely, this is directly correlated to mails from various mailing lists (mostly FreeBSD).

MySQL (for the Horde webmail interface)

One day covers 100 MB of uncompressed data, all archives take 801 MB uncompressed, unique and compressed this comes down to 19 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mysql and /var/db/mergemaster.mtree.


This is correlated with the use of my webmail interface, and as such is also correlated with the amount of mails I get and send. Obviously I did not use my webmail interface at the weekend (as the backup covers the change of the previous day).


One day covers 121 MB of uncompressed data, all archives take 973 MB uncompressed, unique and compressed this comes down to 33 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /usr/local/www/horde and /home.


This one is strange again. Nothing in the data changed.


One day covers 10 MB of uncompressed data, all archives take 72 MB uncompressed, unique and compressed this comes down to 1.9 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree and /var/db/samba.


Here we see the changes to /var/db/samba, this should be mostly my Wii accessing multimedia files there.


One day covers 31 MB of uncompressed data, all archives take 223 MB uncompressed, unique and compressed this comes down to 6.6 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg and /var/db/mergemaster.mtree.


This is also a strange graph. Again, nothing changed there (the cache directory is not in the backup).


One day covers 44 MB of uncompressed data, all archives take 310 MB uncompressed, unique and compressed this comes down to 11 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /home and /usr/local/www/phpMyAdmin.


And again a strange graph. No changes in the FS.


One day covers 120 MB of uncompressed data, all archives take 845 MB uncompressed, unique and compressed this comes down to 25 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /usr/local/www/gallery2 and /home/gallery (excluding some parts of /home/gallery).


This one is OK. Friends and Family accessing the pictures.


One day covers 7 MB of uncompressed data, all archives take 28 MB uncompressed, unique and compressed this comes down to 1.3 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /space/jails/flavours and /home.


This one looks strange to me again. Same reasons as with the previous graphs.


One day covers 56 MB of uncompressed data, all archives take 225 MB uncompressed, unique and compressed this comes down to 5.4 MB. This covers /etc, /usr/local/etc, /usr/local/www/postfixadmin, /root, /var/db/pkg, /var/db/mysql, /var/spool/postfix and /var/db/mergemaster.mtree.


This graph looks OK to me.


One day covers 59 MB of uncompressed data, all archives take 478 MB uncompressed, unique and compressed this comes down to 14 MB. This covers /etc, /usr/local/etc, /root, /home, /var/db/pkg, /var/db/mergemaster.mtree, /var/db/mysql and /var/spool/ejabberd (yes, no backup of the web data, I have it in another jail, no need to back it up again).


With the MySQL and XMPP databases in the backup, I do not think this graph is wrong.


The total amount of stored data per system is:



In the 8 days since I started using tarsnap, I have spent 38 cents. Most of this is bandwidth cost for the transfer of the initial backup (29.21 cents). According to the graphs, I am currently at about 8-14 cents per week (or about half a dollar per month) for my backups (I still have a machine to add, and this may increase the amount in a way similar to the Prison1 system with 2-3 jails). The amount of money spent in US cents (rounded!) per day is:
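The extrapolation from the weekly estimate to "about half a dollar per month" can be checked with a few lines. This is just a sketch of the arithmetic with the figures from the text; the average of 52/12 weeks per month is my assumption for the conversion.

```python
# Figures from the text, in US cents (the 38 cents total is a rounded value).
total_cents = 38.0
initial_bandwidth_cents = 29.21
weekly_low_cents, weekly_high_cents = 8, 14

# Assumption: an average month has 52 / 12 weeks.
WEEKS_PER_MONTH = 52 / 12

# What is left once the one-time initial upload is subtracted.
recurring_so_far = total_cents - initial_bandwidth_cents

monthly_low = weekly_low_cents * WEEKS_PER_MONTH
monthly_high = weekly_high_cents * WEEKS_PER_MONTH

print(f"spent apart from the initial upload: {recurring_so_far:.2f} cents")
print(f"estimated monthly cost: {monthly_low:.0f} to {monthly_high:.0f} cents")
```

The middle of that monthly range is close to 50 cents, which matches the rough estimate above.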


ZFS & power-failure: stable

At the weekend there was a power failure at our disaster-recovery site. As everything should be connected to the UPS, this should not have had an impact… unfortunately the guys responsible for the cabling seem to have not provided enough power connections from the UPS. Result: one of our storage systems (all volumes in several RAID5 virtual disks) for the test systems lost power, and 10 hard disks switched into failed state when the power was stable again (I was told there were several small power failures that day). After telling the software to have a look at the drives again, all physical disks were accepted.

All volumes on one of the virtual disks were damaged beyond repair (actually, the virtual disk itself was damaged) and we had to recover from backup.

None of the ZFS-based mountpoints on the good virtual disks showed bad behavior (zpool clear + zpool scrub for those which showed checksum errors, to make us feel better). For the UFS-based ones… some caused a panic after reboot and we had to run fsck on them before trying a second boot.

We spent a lot more time getting UFS back online than getting ZFS back online. After this experience it looks like our future Solaris 10u8 installs will be with root on ZFS (our workstations are already like this, but our servers are still at Solaris 10u6).

EMC^2/Legato Networker status

We updated Networker, as the Networker support thought it would fix at least one of our problems ("ghost" volumes in the DB). Unfortunately the update does not fix any bug we see in our environment.

Especially for the "post-command runs 1 minute after pre-command even if the backup is not finished" bug this is not satisfying: there is no consistent DB backup in cases where the application has to be stopped together with the DB to get a consistent snapshot (FS+DB in sync).

SUN OpenStorage presentation

At work (client site) SUN gave a presentation about their OpenStorage products (Sun Storage 7000 Unified Storage Systems) today.

From a technology point of view, the software side is nothing new to me. Using SSDs for ZFS as a read/write cache is something we can (partly) already do since at least Solaris 10u6 (that is the lowest Solaris 10 version we have installed here, so I cannot quickly check whether the ZIL can be on a separate disk in previous versions of Solaris, but I think we have to wait until we have updated to Solaris 10u8 before we can have the L2ARC on a separate disk) or in FreeBSD. All the other nice ZFS features available in the OpenStorage web interface are also not surprising.

But the demonstration with the Storage Simulator impressed me. The interaction with Windows via CIFS makes the older versions of files in snapshots available in Windows (I assume this is the Volume Shadow Copy feature of Windows), and the statistics available via DTrace in the web interface are also impressive. All this technology seems to be well integrated into an easy-to-use package for heterogeneous environments. If you would like to set up something like this by hand, you would need to have a lot of knowledge about a lot of stuff (and in the FreeBSD case, you would probably need to augment the kernel with additional DTrace probes to be able to get a similar granularity of the statistics), which is nothing a small company is willing to pay for.

I know that I can get a lot of information with DTrace (from time to time I have some free cycles to extend the FreeBSD DTrace implementation with additional DTrace probes for the linuxulator), but what they did with DTrace in the OpenStorage software is great. If you try to do this at home yourself, you need some time to implement something like this (I do not think you can take the DTrace scripts and run them on FreeBSD as they are; it will probably take some weeks until it works).

It is also the first time I have seen this new CIFS implementation from SUN in ZFS live in action. It looks well done. Integration with AD looks easier than doing it by hand in Samba (at least from looking at the OpenStorage web interface). If we could get this in FreeBSD… it would rock!

The entire OpenStorage web interface looks usable. I think SUN has a product there which allows them to enter new markets, a product which they can sell to companies which did not buy anything from SUN before (even Windows-only companies). I think even those Windows admins who never touch a command line interface (read: the low-level ones; not comparable at all with the really high-profile Windows admins of our client) should be able to get this up and running.

As it seems at the moment, our client will get a Sun Storage F5100 Flash Array for technology evaluation at the beginning of next year. Unfortunately the technology looks too easy to handle, so I assume I will have to take care of more complex things when this machine arrives… 🙁

Fighting with the SUN LDAP server

At work we decided to update our LDAP infrastructure from SUN Directory Server 5.2 to 6.3(.1). The person doing this is: me.

We have some requirements for the applications we install: we want them in specific locations so that we are able to move them between servers more easily (no need to search for all the stuff in the entire system, just the generic location and some stuff in /etc needs to be taken care of… in the best case). SUN offers DSEE 6.3.1 as a package or as a ZIP distribution. I decided to download the ZIP distribution, as this implies less stuff in non-conforming places.

The installation went OK. After the initial hurdle of searching for the SMF manifest referenced in the docs (a command is supposed to install it) but not finding it, because the ZIP distribution does not contain this functionality (I see no technical reason; I installed the manifest by hand), I had the new server up, the data imported, and a workstation configured to use this new server.

The next step was to set up a second server for multi-master replication. The docs for DSEE say to use the web interface to configure the replication (this is preferred over the command line way). I am more of a command line guy, but OK, if it is recommended that much, I decided to give it a try… and the web interface had to be installed anyway, so that the people in our team who are less fond of the command line can have a look in case it is needed.

The bad news: it was hard to get the web interface up and running. In the package distribution all this is supposed to be very easy, but in the ZIP distribution I stumbled over a lot of hurdles. The GUI had to be installed in the Java application server by hand instead of the more automatic way when installed as a package. When following the installation procedure, the application server wants a password to start the web interface. The package version allows registering it in the Solaris management interface, the ZIP distribution does not (direct access to it works, of course). Adding a server to the directory server web interface does not work via the web interface, I had to register it on the command line. Once it is registered, not everything of the LDAP server is accessible, e.g. the error messages and similar. This may or may not be related to the fact that it is not very clear which programs/daemons/services have to run. For example, do I need to use the cacaoadm of the system, or the one which comes with DSEE? In my tests it looks like they are different beasts, independent of each other, but I did not try all possible combinations to see if this affects the behavior of the web interface or not.

All the problems may be documented in one or two of the DSEE documents, but at least the installation document does not answer all my questions. It seems I have to read a lot more documentation to get the web interface running… which is a shame, as the management interface which is supposed to make the administration easier needs more documentation than the product it is supposed to manage.

Oh, yes, once I had both LDAP servers registered in the web interface, setting up the replication was very easy.