Tarsnap usage statistics

The more time passes with tarsnap, the more impressive it is.

Following is a list of all my privately used systems (2 machines which only host jails -- here named Prison1 and Prison2 -- and several jails -- here named according to their functionality) together with some tarsnap statistics. For each backup tarsnap prints some statistics: the uncompressed storage space of all archives of the machine, the compressed storage space of all archives, the unique uncompressed storage space of all archives, the unique compressed storage space of all archives, and the same set of numbers for the current archive. The unique storage space is what remains after deduplication. The most interesting numbers are the unique compressed ones: for a specific archive they show the amount of data which differs from all other archives, and for the total they tell how much storage space is actually used on the tarsnap server.

I do not back up all my data with tarsnap. I do a full backup to external storage (zfs snapshot + zfs send | zfs receive) once in a while; tarsnap is only for the stuff which can change daily or is very small (my mails belong to the first group, the config of applications or of the system to the second). At the end of the post there is also an overview of the money I have spent so far on tarsnap for the backups.
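To make those four numbers concrete, here is a small sketch (in Python) of pulling them out of a stats report. The sample text and its numbers are made up for illustration, and the exact column layout of tarsnap's --print-stats output may differ between versions; this only shows the idea.

```python
import re

# Made-up sample in the style of a tarsnap --print-stats report.
SAMPLE = """\
                                       Total size  Compressed size
All archives                       1100000000       325000000
  (unique data)                     853000000       243000000
This archive                       1100000000       325000000
New data                               220000          216000
"""

def parse_stats(text):
    """Return {row_label: (total_bytes, compressed_bytes)}."""
    stats = {}
    for line in text.splitlines()[1:]:  # skip the header line
        m = re.match(r"\s*\(?([A-Za-z ]+?)\)?\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            label, total, comp = m.groups()
            stats[label.strip()] = (int(total), int(comp))
    return stats

stats = parse_stats(SAMPLE)
# The interesting number: unique compressed data = actual usage on the server.
unique_total, unique_comp = stats["unique data"]
print(f"server-side usage: {unique_comp / 1e6:.0f} MB")
```

The "unique data" row of "Compressed size" is the one that determines the storage bill.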

Attention: the following graphs display small values in KB, while the text gives sizes in MB or even GB!


The backup of one day covers 1.1 GB of uncompressed data; the subtrees I back up are /etc, /usr/local/etc, /home, /root, /var/db/pkg, /var/db/mergemaster.mtree, /space/jails/flavours and a subversion checkout of /usr/src (excluding the kernel compile directory; I back this up because I have local modifications to FreeBSD). If I wanted to keep all days uncompressed on my hard disk, I would have to provide 10 GB of storage space. Compressed this comes down to 2.4 GB, unique uncompressed to 853 MB, and unique compressed to 243 MB. The following graph splits this up into all the backups I have as of this writing. I only show the unique values, as including the total values would make the unique values disappear in the graph (the values would be too small).


In this graph we see that I have a constant rate of new data. I think this is mostly references to already stored data (/usr/src being the most likely cause, as nothing changed in those directories).


One day covers 7 MB of uncompressed data, all archives take 56 MB uncompressed, unique and compressed this comes down to 1.3 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/named, and /var/db/mergemaster.mtree.


This graph is strange. I have no idea why there is so much data for the second and the last day. Nothing changed.


One day covers 8 MB of uncompressed data, all archives take 62 MB uncompressed, unique and compressed this comes down to 1.5 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/spool/postfix, and /var/db/mergemaster.mtree.


This does not look bad. I was sending a lot of mails on the 25th, and on the days in the middle I was not sending much.


One day covers about 900 MB of uncompressed data, all archives take 7.2 GB uncompressed, unique and compressed this comes down to 526 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /home (mail folders) and /usr/local/share/courier-imap.


Obviously there is a not so small amount of change in my mailbox. As my spam filter is working nicely, this is directly correlated with mails from various mailing lists (mostly FreeBSD).

MySQL (for the Horde webmail interface)

One day covers 100 MB of uncompressed data, all archives take 801 MB uncompressed, unique and compressed this comes down to 19 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mysql and /var/db/mergemaster.mtree.


This is correlated with the use of my webmail interface, and as such is also correlated with the amount of mails I get and send. Obviously I did not use my webmail interface at the weekend (as the backup covers the change of the previous day).


One day covers 121 MB of uncompressed data, all archives take 973 MB uncompressed, unique and compressed this comes down to 33 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /usr/local/www/horde and /home.


This one is strange again. Nothing in the data changed.


One day covers 10 MB of uncompressed data, all archives take 72 MB uncompressed, unique and compressed this comes down to 1.9 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree and /var/db/samba.


Here we see the changes to /var/db/samba; this should mostly be my Wii accessing multimedia files there.


One day covers 31 MB of uncompressed data, all archives take 223 MB uncompressed, unique and compressed this comes down to 6.6 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg and /var/db/mergemaster.mtree.


This is also a strange graph. Again, nothing changed there (the cache directory is not in the backup).


One day covers 44 MB of uncompressed data, all archives take 310 MB uncompressed, unique and compressed this comes down to 11 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /home and /usr/local/www/phpMyAdmin.


And again a strange graph. No changes in the FS.


One day covers 120 MB of uncompressed data, all archives take 845 MB uncompressed, unique and compressed this comes down to 25 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /usr/local/www/gallery2 and /home/gallery (excluding some parts of /home/gallery).


This one is OK: friends and family accessing the pictures.


One day covers 7 MB of uncompressed data, all archives take 28 MB uncompressed, unique and compressed this comes down to 1.3 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /space/jails/flavours and /home.


This one looks strange to me again. Same reasons as with the previous graphs.


One day covers 56 MB of uncompressed data, all archives take 225 MB uncompressed, unique and compressed this comes down to 5.4 MB. This covers /etc, /usr/local/etc, /usr/local/www/postfixadmin, /root, /var/db/pkg, /var/db/mysql, /var/spool/postfix and /var/db/mergemaster.mtree.


This graph looks OK to me.


One day covers 59 MB of uncompressed data, all archives take 478 MB uncompressed, unique and compressed this comes down to 14 MB. This covers /etc, /usr/local/etc, /root, /home, /var/db/pkg, /var/db/mergemaster.mtree, /var/db/mysql and /var/spool/ejabberd (yes, no backup of the web data; I have it in another jail, no need to back it up again).


With the MySQL and XMPP databases in the backup, I do not think this graph is wrong.


The total amount of stored data per system is:



Since I started using tarsnap (8 days ago), I have spent 38 cents; most of this is bandwidth cost for the transfer of the initial backup (29.21 cents). According to the graphs, I am currently at about 8-14 cents per week (or about half a dollar per month) for my backups (I still have a machine to add, and this may increase the amount in a similar way to the Prison1 system with its 2-3 jails). The amount of money spent in US cents (rounded!) per day is:
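As a sanity check on these numbers (plain arithmetic, nothing tarsnap-specific assumed): subtracting the initial-transfer bandwidth from the total spent gives the ongoing cost so far, which is in the same ballpark as the per-week estimate above.

```python
total_spent_cents = 38.0        # total after 8 days
initial_transfer_cents = 29.21  # bandwidth for the initial backups
days = 8

# Ongoing cost with the one-time initial transfer factored out.
running_cents = total_spent_cents - initial_transfer_cents
per_day = running_cents / days
per_week = per_day * 7
per_month = per_day * 30

print(f"running cost: {per_day:.1f} cents/day, "
      f"{per_week:.1f} cents/week, {per_month:.0f} cents/month")
```

This averages to roughly 8 cents per week; the current rate is higher than the average because the amount of stored data is still growing.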


ZFS & power failure: stable

At the weekend there was a power failure at our disaster recovery site. As everything should be connected to the UPS, this should not have had an impact… unfortunately the guys responsible for the cabling seem to not have provided enough power connections from the UPS. Result: one of our storage systems (all volumes in several RAID5 virtual disks) for the test systems lost power, and 10 hard disks switched into failed state when the power was stable again (I was told there were several small power failures that day). After telling the software to have a look at the drives again, all physical disks were accepted.

All volumes on one of the virtual disks were damaged (actually, the virtual disk itself was damaged) beyond repair and we had to recover from backup.

All ZFS based mountpoints on the good virtual disks did not show bad behavior (zfs clear + zfs scrub for those which showed checksum errors, to make us feel better). As for the UFS based ones… some caused a panic after reboot and we had to run fsck on them before trying a second boot.

We spent a lot more time getting UFS back online than getting ZFS back online. After this experience it looks like our future Solaris 10u8 installs will have root on ZFS (our workstations are already set up like this, but our servers are still at Solaris 10u6).

EMC^2/Legato Networker status

We updated Networker to a newer version, as Networker support thought it would fix at least one of our problems ("ghost" volumes in the DB). Unfortunately the update does not fix any of the bugs we see in our environment.

Especially for the "post-command runs 1 minute after pre-command even if the backup is not finished" bug this is not satisfying: no consistent DB backup where the application has to be stopped together with the DB to get a consistent snapshot (FS+DB in sync).

SUN OpenStorage presentation

At work (at a client site) SUN gave a presentation about their OpenStorage products (Sun Storage 7000 Unified Storage Systems) today.

From a technology point of view, the software side is nothing new to me. Using SSDs as a read/write cache for ZFS is something we have been able to do (partly) since at least Solaris 10u6 (that is the lowest Solaris 10 version we have installed here, so I cannot quickly check whether the ZIL can be on a separate disk in previous versions of Solaris; I think we have to wait until we have updated to Solaris 10u8 before we can have the L2ARC on a separate disk) or in FreeBSD. All the other nice ZFS features available in the OpenStorage web interface are also not surprising.

But the demonstration with the Storage Simulator impressed me. The interaction with Windows via CIFS makes the older versions of files in snapshots available in Windows (I assume via the Volume Shadow Copy feature of Windows), and the statistics available via DTrace in the web interface are also impressive. All this technology seems to be well integrated into an easy to use package for heterogeneous environments. If you wanted to set up something like this by hand, you would need a lot of knowledge about a lot of stuff (and in the FreeBSD case, you would probably need to augment the kernel with additional DTrace probes to get a similar granularity of statistics) -- nothing a small company is willing to pay for.

I know that I can get a lot of information with DTrace (from time to time I have some free cycles to extend the FreeBSD DTrace implementation with additional probes for the linuxulator), but what they did with DTrace in the OpenStorage software is great. If you try to do this at home yourself, you need some time to implement something similar (I do not think you can just take their DTrace scripts and run them on FreeBSD; it would probably take some weeks until it works).

It is also the first time I have seen SUN's new CIFS implementation in ZFS live in action. It looks well done. Integration with AD looks easier than doing it by hand in Samba (at least judging from the OpenStorage web interface). If we could get this into FreeBSD… it would rock!

The entire OpenStorage web interface looks usable. I think SUN has a product there which allows them to enter new markets -- a product which they can sell to companies which never bought anything from SUN before (even Windows-only companies). I think even those Windows admins which never touch a command line interface (read: the low-level ones; not comparable at all with the really high-profile Windows admins of our client) could be able to get this up and running.

As it seems at the moment, our client will get a Sun Storage F5100 Flash Array for technology evaluation at the beginning of next year. Unfortunately the technology looks too easy to handle, so I assume I will have to take care of more complex things when this machine arrives… 🙁

Fighting with the SUN LDAP server

At work we decided to update our LDAP infrastructure from SUN Directory Server 5.2 to 6.3(.1). The person doing this is: me.

We have some requirements for the applications we install: we want them in specific locations so that we are able to move them between servers more easily (no need to hunt for all the pieces in the entire system; just the generic location and some stuff in /etc needs to be taken care of… in the best case). SUN offers DSEE 6.3.1 as a package or as a ZIP-distribution. I decided to download the ZIP-distribution, as this implies less stuff in non-conforming places.

The installation went OK. After the initial hurdle of searching for the SMF manifest referenced in the docs (a command is supposed to install it) but not finding it, because the ZIP-distribution does not contain this functionality (I see no technical reason for that; I installed the manifest by hand), I had the new server up, the data imported, and a workstation configured to use this new server.

The next step was to set up a second server for multi-master replication. The docs for DSEE tell you to use the web interface to configure the replication (this is preferred over the command line way). I am more of a command line guy, but OK, if it is recommended that strongly, I decided to give it a try… and the web interface had to be installed anyway, so that the less command line affine people in our team can have a look in case it is needed.

The bad news: it was hard to get the web interface up and running. In the package distribution all this is supposed to be very easy, but with the ZIP-distribution I stumbled over a lot of hurdles. The GUI had to be installed into the Java application server by hand instead of the more automatic way used when installed as a package. When following the installation procedure, the application server wants a password to start the web interface. The package version allows you to register it in the Solaris management interface, the ZIP-distribution does not (direct access to it works, of course). Adding a server to the directory server web interface does not work via the web interface itself; I had to register it on the command line. Once it is registered, not everything of the LDAP server is accessible, e.g. the error messages and similar. This may or may not be related to the fact that it is not very clear which programs/daemons/services have to run -- for example, do I need to use the cacaoadm of the system, or the one which comes with DSEE? In my tests it looks like they are different beasts, independent of each other, but I did not try all possible combinations to see if this affects the behavior of the web interface.

All these problems may be documented in one or two of the DSEE documents, but at least the installation document does not contain enough documentation to answer all my questions. It seems I have to read a lot more documentation to get the web interface running… which is a shame, as the management interface which is supposed to make the administration easier needs more documentation than the product it is supposed to manage.

Oh yes, once I had both LDAP servers registered in the web interface, setting up the replication was very easy.

Testing tarsnap

I am impressed. Yes, really. It seems tarsnap does the right thing (DTRT).

I made a test with tarsnap: a backup of some data from one of my systems (a full backup of everything is kept on a ZFS volume on an external disk which is only attached to make a full backup once in a while). This data is 1.1 GB in size (most of it is /usr/src checked out via subversion and extended with some patches -- no music, pictures or other such data). It compresses down to 325 MB. Of these 325 MB only 242 MB are stored encrypted on the tarsnap server (automatic deduplication on the backup client). The second backup of the same data in the following night (again 1.1 GB in total, 325 MB compressed) caused only 216 kB of new data to be stored on the tarsnap server (again, deduplication on the client). What I have now are two full off-site backups of this data (two archives with 1.1 GB of data each after decompression), with the benefit that the additional storage space required is only that of an incremental backup.
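The client-side deduplication can be illustrated with a toy sketch in Python. Tarsnap actually uses variable-sized, content-defined chunks and encrypts everything before upload; this simplified fixed-size version only shows the principle: a chunk whose hash has been seen before does not need to be stored again, so a second backup of mostly unchanged data adds almost nothing.

```python
import hashlib

CHUNK = 64 * 1024  # fixed 64 KiB chunks (tarsnap's real chunking is content-defined)

def new_chunks(data, store):
    """Split data into chunks; return how many were not already in the store."""
    new = 0
    for i in range(0, len(data), CHUNK):
        digest = hashlib.sha256(data[i:i + CHUNK]).hexdigest()
        if digest not in store:
            store.add(digest)
            new += 1
    return new

store = set()

# First backup: ten distinct chunks, all new.
backup1 = b"".join(i.to_bytes(4, "big") * (CHUNK // 4) for i in range(10))
first = new_chunks(backup1, store)   # -> 10

# Second backup: same data with one chunk modified, only that chunk is new.
backup2 = bytearray(backup1)
backup2[0:4] = b"\xff\xff\xff\xff"
second = new_chunks(bytes(backup2), store)  # -> 1

print(first, second)
```

This is the mechanism behind the numbers above: the second 1.1 GB archive only added 216 kB on the server because almost every chunk was already stored.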

The cost (in dollars) of this so far is 0.074603634 for the initial transfer of the data, 0.00242067783 for the data storage on the first day, plus 0.0019572486 for the transfer of the second backup. From the initial 29.93 I still have 29.85 (rounded) left. If I factor out the initial transfer and assume that the rate of change for this system stays constant, this comes down to 0.01 (rounded up) per day for this system (or about 8 years of backups if I do not add more systems and do not add more than the initial 29.93 (= EUR 20) -- and the price of the service does not increase, of course). Is this data worth 1 cent per day to me? Yes, for sure! Even more, but hey, you did not read this here. 🙂
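From these figures one can back out the implied unit price and the runway. This assumes the initial transfer billed was the 242 MB of unique compressed data; the actual tarsnap price list is authoritative, this is just arithmetic on the numbers above.

```python
# Numbers from the post
initial_transfer_usd = 0.074603634  # bandwidth cost of the initial upload
unique_stored_mb = 242              # unique compressed data uploaded
balance_usd = 29.85                 # remaining balance
per_day_usd = 0.01                  # rounded-up ongoing cost per day

# Implied transfer price per GB
per_gb = initial_transfer_usd / (unique_stored_mb / 1000)
print(f"implied transfer price: {per_gb:.2f} USD/GB")

# Runway at the current rate of change
years = balance_usd / per_day_usd / 365
print(f"runway: about {years:.1f} years")
```

The roughly 8-year runway matches the estimate in the text.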

That is what you get when a person designs a service which he is willing to use himself, at a price he is willing to pay himself (while still not losing money on the service).

Colin, again, I am impressed. Big thumbs up for you!

Now I just have to add some more systems (/etc and similar of various jails… and of course the data of this blog) and polish my tarsnap script for "periodic daily".
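A minimal sketch of such a daily script (in Python here for illustration; a real "periodic daily" hook on FreeBSD would be a shell script, and the key file and path list below are hypothetical placeholders, not my actual configuration):

```python
import datetime

# Hypothetical paths -- adjust to the local setup.
KEYFILE = "/root/tarsnap.key"
PATHS = ["/etc", "/usr/local/etc", "/root", "/var/db/pkg"]

def tarsnap_command(hostname, day=None):
    """Build the tarsnap invocation for today's archive of this host."""
    day = day or datetime.date.today().isoformat()
    archive = f"{hostname}-{day}"
    return ["tarsnap", "--keyfile", KEYFILE, "--print-stats",
            "-c", "-f", archive] + PATHS

cmd = tarsnap_command("prison1", day="2008-12-01")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Naming the archive after the host and the date keeps the archives per system distinguishable, and --print-stats produces exactly the numbers shown in the graphs above.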

P.S.: Yes, there are also places to improve; I already found some things (the config file parser is a little bit strict about what it accepts, and some things should be documented better), but Colin is responsive and open to improvement suggestions.