Alexander Leidinger

Just another weblog

Jun
04

Under­stand­ing latency

Brendan Gregg of Sun/Oracle fame wrote a good explanation of how to visualize latency to get a better understanding of what is going on (and thus of how to resolve bottlenecks). I had already seen all of this in various posts on his blog and in the Analytics package in an OpenStorage presentation, but the ACM article summarizes it very well.

Unfortunately Analytics is AFAIK not available in OpenSolaris, so we cannot go out and adapt it for FreeBSD (which would probably require porting/implementing some additional dtrace stuff/probes). I am sure something like this would be very interesting to all those companies which use FreeBSD in an appliance (regardless of whether it is a storage appliance like NetApp, a network appliance like a Cisco/Juniper router, or anything else which has to perform well).

Dec
10

Progress with Net­worker bugs

Our bug with savepnpc, which causes the post-command to start one minute after the pre-command even if the backup is not done yet, is now hopefully near resolution. We opened a problem report for this in July, and this week we were told that a patch for it is available. The bad part is that it has been available for 3 weeks and nobody told us. The good part is that we now have it installed on a machine to see if it helps (all zones there seem to be OK, but we have zones where it sometimes works and sometimes fails, so we are not 100% sure, but we hope for the best). We were told that it will be included in Networker 7.5.1.8.

Our other issues are at least no longer stuck in a helpdesk loop; they seem to have reached the developers now.

Nov
25

Tarsnap usage statistics

The more time passes with tarsnap, the more impres­sive it is.

The following is a list of all my privately used systems (2 machines which only host jails, here named Prison1 and Prison2, and several jails, here named according to their functionality) together with some tarsnap statistics. For each backup tarsnap prints out some statistics: the uncompressed storage space of all archives of this machine, the compressed storage space of all archives, the unique uncompressed storage space of all archives, the unique compressed storage space of all archives, and the same set of values for the current archive. The unique storage space is what remains after deduplication. The most interesting value is the unique and compressed one. For a specific archive it shows the amount of data which differs from all other archives, and for the total it tells how much storage space is used on the tarsnap server.

I do not back up all data with tarsnap. I do a full backup to external storage (zfs snapshot + zfs send | zfs receive) once in a while, and tarsnap is only for the stuff which could change daily or is very small (my mails belong to the first group, the config of applications or the system to the second group). At the end of the post there is also an overview of the money I have spent so far on tarsnap for the backups.
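To make the four values more concrete, here is a minimal sketch of how total, compressed, unique and unique compressed sizes relate to each other. It is not how tarsnap works internally: the fixed 4096-byte chunks, the use of zlib and SHA-256, and the names `archive_stats`, `day1` and `day2` are all assumptions made for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks, chosen only for this sketch


def archive_stats(archives, chunk_size=CHUNK_SIZE):
    """Return total, compressed, unique and unique-compressed byte counts."""
    seen = set()  # digests of chunks that are already "stored"
    stats = {"total": 0, "compressed": 0, "unique": 0, "unique_compressed": 0}
    for data in archives:
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            packed = len(zlib.compress(chunk))
            # Every chunk counts towards the total values.
            stats["total"] += len(chunk)
            stats["compressed"] += packed
            # Only chunks not seen before count towards the unique values.
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stats["unique"] += len(chunk)
                stats["unique_compressed"] += packed
    return stats


# Two hypothetical daily archives: the second repeats the first and adds one chunk.
day1 = b"a" * 4096 + b"b" * 4096
day2 = day1 + b"c" * 4096
stats = archive_stats([day1, day2])
print(stats)
```

In this toy example the second day adds only one new chunk, so the unique values grow by a single chunk while the total values grow by the size of the whole archive.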

Attention: the following graphs display small values in KB, while the text talks about sizes in MB or even GB!

Prison1

The backup of one day covers 1.1 GB of uncompressed data. The subtrees I back up are /etc, /usr/local/etc, /home, /root, /var/db/pkg, /var/db/mergemaster.mtree, /space/jails/flavours and a subversion checkout of /usr/src (excluding the kernel compile directory; I back this up as I have local modifications to FreeBSD). If I wanted to keep all days uncompressed on my hard disk, I would have to provide 10 GB of storage space. Compressed this comes down to 2.4 GB, unique uncompressed it is 853 MB, and unique compressed it is 243 MB. The following graph splits this up into all the backups I have as of this writing. I only show the unique values, as including the total values would make the unique values disappear in the graph (values too small).

chart


In this graph we see that I have a con­stant rate of new data. I think this is mostly ref­er­ences to already stored data (/usr/src being the most likely cause of this, noth­ing changed in those directories).
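As a rough cross-check of the Prison1 figures quoted above, the following sketch computes how much of the logical data actually ends up stored. The sizes are the rounded ones from the text, and treating 1 GB as 1024 MB is an assumption, so the results are only approximate.

```python
# Prison1 figures from the text, in MB (assumption: 1 GB = 1024 MB).
total_mb = 10 * 1024           # all archives, uncompressed
compressed_mb = 2.4 * 1024     # all archives, compressed
unique_mb = 853                # after deduplication, uncompressed
unique_compressed_mb = 243     # after deduplication and compression


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


stored_share = percent(unique_compressed_mb, total_mb)
dedup_factor = total_mb / unique_mb
print(f"stored: {stored_share:.1f}% of the uncompressed total")
print(f"deduplication alone shrinks the data by a factor of {dedup_factor:.0f}")
```

With these rounded inputs, only about 2.4% of the uncompressed total is stored on the tarsnap server.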

Internal-DNS

One day cov­ers 7 MB of uncom­pressed data, all archives take 56 MB uncom­pressed, unique and com­pressed this comes down to 1.3 MB. This cov­ers /etc, /usr/local/etc, /root, /var/db/pkg, /var/named, and /var/db/mergemaster.mtree.

chart


This graph is strange. I have no idea why there is so much data for the sec­ond and the last day. Noth­ing changed.

Outgoing-Postfix

One day cov­ers 8 MB of uncom­pressed data, all archives take 62 MB uncom­pressed, unique and com­pressed this comes down to 1.5 MB. This cov­ers /etc, /usr/local/etc, /root, /var/db/pkg, /var/spool/postfix, and /var/db/mergemaster.mtree.

chart


This does not look bad. I sent a lot of mails on the 25th, and not many on the days in the middle.

IMAP

One day cov­ers about 900 MB of uncom­pressed data, all archives take 7.2 GB uncom­pressed, unique and com­pressed this comes down to 526 MB. This cov­ers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /home (mail fold­ers) and /usr/local/share/courier-imap.

chart


Obviously I have a not so small amount of change in my mailbox. As my spam filter is working nicely, this is directly correlated to mails from various mailing lists (mostly FreeBSD).

MySQL (for the Horde web­mail interface)

One day cov­ers 100 MB of uncom­pressed data, all archives take 801 MB uncom­pressed, unique and com­pressed this comes down to 19 MB. This cov­ers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mysql and /var/db/mergemaster.mtree.

chart


This is cor­re­lated with the use of my web­mail inter­face, and as such is also cor­re­lated with the amount of mails I get and send. Obvi­ously I did not use my web­mail inter­face at the week­end (as the backup cov­ers the change of the pre­vi­ous day).

Web­mail

One day cov­ers 121 MB of uncom­pressed data, all archives take 973 MB uncom­pressed, unique and com­pressed this comes down to 33 MB. This cov­ers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /usr/local/www/horde and /home.

chart


This one is strange again. Noth­ing in the data changed.

Samba

One day cov­ers 10 MB of uncom­pressed data, all archives take 72 MB uncom­pressed, unique and com­pressed this comes down to 1.9 MB. This cov­ers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree and /var/db/samba.

chart


Here we see the changes to /var/db/samba; this should mostly be my Wii accessing multimedia files there.

Proxy

One day cov­ers 31 MB of uncom­pressed data, all archives take 223 MB uncom­pressed, unique and com­pressed this comes down to 6.6 MB. This cov­ers /etc, /usr/local/etc, /root, /var/db/pkg and /var/db/mergemaster.mtree.

chart


This is also a strange graph. Again, noth­ing changed there (the cache direc­tory is not in the backup).

php­MyAd­min

One day covers 44 MB of uncompressed data, all archives take 310 MB uncompressed, unique and compressed this comes down to 11 MB. This covers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /home and /usr/local/www/phpMyAdmin.

chart


And again a strange graph. No changes in the FS.

Gallery

One day cov­ers 120 MB of uncom­pressed data, all archives take 845 MB uncom­pressed, unique and com­pressed this comes down to 25 MB. This cov­ers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /usr/local/www/gallery2 and /home/gallery (exclud­ing some parts of /home/gallery).

chart


This one is OK. Friends and Fam­ily access­ing the pictures.

Prison2

One day cov­ers 7 MB of uncom­pressed data, all archives take 28 MB uncom­pressed, unique and com­pressed this comes down to 1.3 MB. This cov­ers /etc, /usr/local/etc, /root, /var/db/pkg, /var/db/mergemaster.mtree, /space/jails/flavours and /home.

chart


This one looks strange to me again. Same rea­sons as with the pre­vi­ous graphs.

Incoming-Postfix

One day covers 56 MB of uncompressed data, all archives take 225 MB uncompressed, unique and compressed this comes down to 5.4 MB. This covers /etc, /usr/local/etc, /usr/local/www/postfixadmin, /root/, /var/db/pkg, /var/db/mysql, /var/spool/postfix and /var/db/mergemaster.mtree.

chart


This graph looks OK to me.

Blog-and-XMPP

One day covers 59 MB of uncompressed data, all archives take 478 MB uncompressed, unique and compressed this comes down to 14 MB. This covers /etc, /usr/local/etc, /root, /home, /var/db/pkg, /var/db/mergemaster.mtree, /var/db/mysql and /var/spool/ejabberd (yes, no backup of the web-data; I have it in another jail, so there is no need to back it up again).

chart


With the MySQL and XMPP data­bases in the backup, I do not think this graph is wrong.

Totals

The total amount of stored data per sys­tem is:

chart
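For readers without the chart, the unique compressed figures quoted in the sections above can simply be added up. This is only a sketch based on the rounded numbers in the text, so the real totals on the tarsnap server may differ slightly.

```python
# Unique compressed size per system in MB, as quoted in the sections above.
unique_compressed_mb = {
    "Prison1": 243,
    "Internal-DNS": 1.3,
    "Outgoing-Postfix": 1.5,
    "IMAP": 526,
    "MySQL": 19,
    "Webmail": 33,
    "Samba": 1.9,
    "Proxy": 6.6,
    "phpMyAdmin": 11,
    "Gallery": 25,
    "Prison2": 1.3,
    "Incoming-Postfix": 5.4,
    "Blog-and-XMPP": 14,
}

total_mb = sum(unique_compressed_mb.values())
largest = max(unique_compressed_mb, key=unique_compressed_mb.get)
print(f"{total_mb:.1f} MB in total, largest: {largest}")
```

Going by these numbers, all systems together use about 889 MB, and the IMAP jail and Prison1 account for most of it.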


Costs

In the 8 days I have been using tarsnap, I have spent 38 cents. Most of this is bandwidth cost for the transfer of the initial backup (29.21 cents). According to the graphs, I am currently at about 8 to 14 cents per week (or about half a dollar per month) for my backups (I still have a machine to add, and this may increase the amount similarly to the Prison1 system with 2 to 3 jails). The amount of money spent in US-cents (rounded!) per day is:

chart
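The "half a dollar per month" estimate can be checked with a little arithmetic. This sketch only uses the figures quoted above; using 52/12 as the average number of weeks per month is an assumption.

```python
WEEKS_PER_MONTH = 52 / 12  # assumption: average number of weeks in a month


def weekly_to_monthly(cents_per_week):
    """Convert a weekly cost in cents to an average monthly cost in cents."""
    return cents_per_week * WEEKS_PER_MONTH


low = weekly_to_monthly(8)
high = weekly_to_monthly(14)
initial_share = 29.21 / 38  # part of the 38 cents spent on the initial transfer

print(f"{low:.0f} to {high:.0f} cents per month")
print(f"{initial_share:.0%} of the money spent so far went into the initial transfer")
```

This gives roughly 35 to 61 cents per month, which brackets the half-dollar estimate, and shows that about three quarters of the money spent so far was the one-time initial transfer.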


Nov
24

ZFS & power-failure: stable

At the weekend there was a power failure at our disaster-recovery site. As everything should be connected to the UPS, this should not have had an impact… unfortunately the guys responsible for the cabling seem not to have provided enough power connections from the UPS. Result: one of our storage systems (all volumes in several RAID5 virtual disks) for the test systems lost power, and 10 hard disks switched into failed state when the power was stable again (I was told there were several small power failures that day). After telling the software to have a look at the drives again, all physical disks were accepted.

All volumes on one of the virtual disks were damaged beyond repair (actually, the virtual disk itself was damaged) and we had to recover from backup.

None of the ZFS-based mountpoints on the good virtual disks showed bad behavior (we ran zpool clear + zpool scrub on those which showed checksum errors, to make us feel better). As for the UFS-based ones: some caused a panic after reboot, and we had to run fsck on them before trying a second boot.

We spent a lot more time getting UFS back online than getting ZFS back online. After this experience it looks like our future Solaris 10u8 installs will be with root on ZFS (our workstations are already like this, but our servers are still at Solaris 10u6).

Nov
24

EMC^2/Legato Net­worker 7.5.1.6 status

We updated Networker 7.5.1.4 to 7.5.1.6, as the Networker support thought it would fix at least one of our problems (“ghost” volumes in the DB). Unfortunately the update does not fix any of the bugs we see in our environment.

Especially for the “post-command runs 1 minute after pre-command even if the backup is not finished” bug this is not satisfying: we cannot get a consistent DB backup in cases where the application has to be stopped together with the DB to get a consistent snapshot (FS+DB in sync).
