ARC (adaptive replacement cache) explained

At work we have the situation of a slow application. The vendor of the custom application insists that ZFS (Solaris 10u8) and the Oracle DB are badly tuned for the application. Part of their tuning is to limit the ARC to 1 GB (our max size is 24 GB on this machine). One problem we see is that there are many write operations (rounded values: 1k ops for up to 100 MB), and the DB complains that the logwriter is not able to write out the data fast enough. At the same time our database admins see a lot of commits and/or rollbacks, so the archive log grows very fast to 1.5 GB. The funny thing is… the performance tests are supposed to cover only SELECTs and small UPDATEs.

I proposed to reduce zfs_txg_timeout from the default value of 30 to a few seconds (and as no reboot is needed, unlike for the max ARC size, this can be done quickly instead of waiting some minutes for the boot checks of the M5000). The first try was to reduce it to 5 seconds, and it improved the situation. The DB still complained about not being able to write out the logs fast enough, but not as often as before. To make the vendor happy we also reduced the max ARC size and tested again. At first we did not see any complaints from the DB anymore, which looked strange to me, because my understanding of the ARC (and the description in the ZFS Evil Tuning Guide regarding the max size setting) suggests that we should not see the behavior we have seen, but the machine was also rebooted for this, so there could be another explanation.
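
For reference, and heavily hedged because you should check the ZFS Evil Tuning Guide yourself before copying anything: on Solaris 10 the runtime change is typically done with mdb, and the persistent settings go into /etc/system, roughly like this (0x40000000 being the 1 GB ARC cap the vendor asked for):

```
# Runtime change of the txg timeout to 5 seconds, no reboot needed:
echo "zfs_txg_timeout/W0t5" | mdb -kw

# Persistent settings in /etc/system (the ARC cap only takes effect
# after a reboot, which is why that test needed the M5000 boot checks):
set zfs:zfs_txg_timeout = 5
set zfs:zfs_arc_max = 0x40000000
```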

Luckily we found out that our testing infrastructure had a problem, so only a fraction of the performance test was actually performed. This morning the people responsible for it made some changes, and now the DB is complaining again.

This is what I expected. To make sure I fully understand the ARC, I had a look at the theory behind it at the IBM research center (update: PDF link). There are some papers which explain how to extend a cache that uses the LRU replacement policy into an ARC with a few lines of code. It looks like it would be worthwhile to have a look at the places in FreeBSD where an LRU policy is used, to test whether an ARC would improve the cache hit rate. From reading the paper it looks like there are a lot of places where this should be the case. The authors also provide two adaptive extensions to the CLOCK algorithm (used in various OSes in the VM subsystem), which indicates that such an approach could be beneficial for a VM system. I already contacted Alan (the FreeBSD one) and asked if he knows about it and whether it could be beneficial for FreeBSD.
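
To make the idea more concrete, here is a minimal sketch of the ARC replacement policy as I understand it from the Megiddo/Modha paper. This is illustrative Python, not the Solaris/ZFS implementation; the load callback is just a placeholder for whatever fetches the data on a miss.

```python
# Minimal sketch of the ARC replacement policy from the Megiddo/Modha paper
# (illustrative only, NOT the Solaris/ZFS implementation).
from collections import OrderedDict

class ARC:
    def __init__(self, c):
        self.c = c                      # cache size in entries
        self.p = 0                      # adaptive target size of T1
        self.t1 = OrderedDict()         # recently used, seen once (cached)
        self.t2 = OrderedDict()         # frequently used, seen twice or more (cached)
        self.b1 = OrderedDict()         # ghost list: keys recently evicted from T1
        self.b2 = OrderedDict()         # ghost list: keys recently evicted from T2

    def _replace(self, key):
        # Evict from T1 or T2, depending on the adaptive target p.
        if self.t1 and (len(self.t1) > self.p or
                        (key in self.b2 and len(self.t1) == self.p)):
            old, _ = self.t1.popitem(last=False)    # LRU entry of T1
            self.b1[old] = None                     # remember it in B1
        else:
            old, _ = self.t2.popitem(last=False)    # LRU entry of T2
            self.b2[old] = None                     # remember it in B2

    def get(self, key, load):
        # Case I: hit in T1 or T2 -> promote to MRU position of T2.
        if key in self.t1:
            self.t2[key] = self.t1.pop(key)
            return self.t2[key]
        if key in self.t2:
            self.t2.move_to_end(key)
            return self.t2[key]

        # Case II: ghost hit in B1 -> recency is winning, grow T1's target.
        if key in self.b1:
            self.p = min(self.c, self.p + max(len(self.b2) // len(self.b1), 1))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = load(key)
            return self.t2[key]

        # Case III: ghost hit in B2 -> frequency is winning, shrink T1's target.
        if key in self.b2:
            self.p = max(0, self.p - max(len(self.b1) // len(self.b2), 1))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = load(key)
            return self.t2[key]

        # Case IV: complete miss.
        if len(self.t1) + len(self.b1) == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)         # drop LRU ghost of B1
                self._replace(key)
            else:
                self.t1.popitem(last=False)         # B1 is empty: evict LRU of T1
        else:                                       # |T1| + |B1| < c
            total = len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2)
            if total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)     # keep the directory at 2c keys
                self._replace(key)

        self.t1[key] = load(key)                    # new entries start in T1
        return self.t1[key]
```

The whole adaptivity sits in the parameter p: a hit in the ghost list B1 means a pure LRU would have kept that page, so the recency side T1 gets a bigger share of the cache; a hit in B2 does the opposite. That is the self-tuning behavior a plain LRU (or CLOCK) cache cannot provide.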

Showing off some numbers…

At work we have some performance problems.

One application (not off-the-shelf software) is not performing well. The problem is that the design of the application is far from good (auto-commit is used, and because of this the Oracle DB does far more writes than the application should need). While helping our DBAs with their performance analysis (the vendor of the application claims our hardware is not fast enough, and I had to provide some numbers to show that this is not the case and that they need to improve the software, as it does not comply with the performance requirements they got before developing the application), I noticed that the filesystem where the DB and the application are located (a ZFS, if someone is interested) sometimes does 1,200 (write) IO operations per second (to write about 100 MB). Yeah, that is a lot of IOPS our SAN is able to do! Unfortunately too expensive to buy for use at home. 🙁

Another application (nagios 3.0) was generating a lot of major faults (caused by a lot of fork()s for the checks). It is a SunFire V890, and the highest number of MF per second I have seen on this machine was about 27,000. It never went below 10,000; on average maybe somewhere between 15,000 and 20,000. My Solaris desktop (an Ultra 20) generates maybe several hundred MF if a lot is going on (most of the time it does not generate much). Nobody can say the V890 is not used… 🙂 Oh, yes, I suggested enabling the nagios config setting for large sites, and now the major faults are around 0–10,000 and the machine is not as stressed anymore. The next step is probably to have a look at the ancient probes (migrated from the Big Brother setup which was there several years before) and reduce the number of forks they do.
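
In case someone wonders which setting I mean: quoted from memory (so please check the Nagios 3 documentation before relying on it), it is the large-installation tweaks option in nagios.cfg:

```
# nagios.cfg -- the "large sites" setting mentioned above (name from memory)
use_large_installation_tweaks=1
```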

Firefox 3.6, finally delivering sane proxy handling

At work we have to use a proxy which requires authorization. With previous versions (Firefox 3.0.x and 3.5.y, for each valid x and y) I had the problem that each tab asked for the master password when starting Firefox, in order to fill in the proxy-auth data (shortcut: fill in only the first request, and for all others just hit return/OK). So for each tab I had to do something for the master password, and after that, for each tab, I also had to confirm the proxy-auth dialog.

Very annoying! Oh, I should maybe mention that as of this writing I have 31 tabs open. Sometimes there are more, sometimes there are less.

Now with Firefox 3.6 this is not the case anymore. Yeah! Great! Finally the master password prompt only once, then the proxy-auth prompt only once, and then all tabs proceed.

It took a long time since my first report about this, but now it is finally there. This is the best improvement in 3.6 for me.

Progress with Networker bugs

Our bug with savepnpc, which causes the post-command to start one minute after the pre-command even if the backup is not done yet, is now hopefully near resolution. We opened a problem report for this in July; this week we were told that a patch is available for it. The bad part is that it has been available for 3 weeks and nobody told us. The good part is that we have it installed on a machine now to see if it helps (all zones there seem to be OK, but we have zones where it sometimes works and sometimes fails, so we are not 100% sure, but we hope for the best). We were told that it will be included in Networker 7.5.1.8.

Our other issues are at least not stuck in a helpdesk loop anymore; they seem to have reached the developers now.

ZFS & power failure: stable

At the weekend there was a power failure at our disaster-recovery site. As everything should be connected to the UPS, this should not have had an impact… unfortunately the guys responsible for the cabling seem not to have provided enough power connections from the UPS. Result: one of our storage systems (all volumes in several RAID5 virtual disks) for the test systems lost power, and 10 harddisks switched into failed state when the power was stable again (I was told there were several small power failures that day). After telling the software to have a look at the drives again, all physical disks were accepted.

All volumes on one of the virtual disks were damaged beyond repair (actually, the virtual disk itself was damaged), and we had to recover from backup.

All ZFS-based mountpoints on the good virtual disks did not show any bad behavior (zpool clear + zpool scrub for those which showed checksum errors, to make us feel better). For the UFS-based ones… some caused a panic after reboot and we had to run fsck on them before trying a second boot.
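
For completeness, the ZFS side of the cleanup boiled down to the following (the pool name is made up for this example; the real pools are named after the test systems):

```
zpool status -xv        # show only unhealthy pools, with error details
zpool clear testpool    # reset the error counters on the affected pool
zpool scrub testpool    # re-read and verify everything on the pool
zpool status testpool   # check the scrub result afterwards
```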

We spent a lot more time getting UFS back online than getting ZFS back online. After this experience it looks like our future Solaris 10u8 installs will have root on ZFS (our workstations are already set up like this, but our servers are still on Solaris 10u6).
