Alexander Leidinger

Just another weblog

Apr
14

ARC (adaptive replacement cache) explained

At work we are dealing with a slow application. The vendor of the custom application insists that ZFS (Solaris 10u8) and the Oracle DB are badly tuned for the application. Part of their tuning is to limit the ARC to 1 GB (our max size is 24 GB on this machine). One problem we see is that there are many write operations (rounded values: 1k ops for up to 100 MB), and the DB is complaining that the log writer is not able to write out the data fast enough. At the same time our database admins see a lot of commits and/or rollbacks, so that the archive log grows very fast to 1.5 GB. The funny thing is… the performance tests are supposed to cover only SELECTs and small UPDATEs.

I proposed to reduce zfs_txg_timeout from the default value of 30 seconds to a few seconds (and as no reboot is needed, unlike for the max ARC size, this can be done quickly instead of waiting some minutes for the boot-checks of the M5000). The first try was to reduce it to 5 seconds, and it improved the situation. The DB still complained about not being able to write out the logs fast enough, but not as often as before. To make the vendor happy we reduced the max ARC size and tested again. At first we did not see any complaints from the DB anymore, which looked strange to me: my understanding of the ARC (and the description of the max size setting in the ZFS Evil Tuning Guide) suggests that limiting it should not produce the behavior we saw. But the machine was also rebooted for this change, so there could be another explanation.
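As a rough illustration of the two tunables discussed above, here is a small sketch that only prints the settings instead of applying them. The /etc/system line format and the mdb invocation are assumptions based on how the ZFS Evil Tuning Guide describes such changes; they are not taken from our actual setup, and the helper function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: print the tunables, do not apply them.
# Assumption: /etc/system syntax and mdb usage as described in the
# ZFS Evil Tuning Guide; the helper names are hypothetical.

# Print the /etc/system line (takes effect after a reboot) that limits
# the ARC to the given size in GB, expressed as a hex byte count.
arc_max_line() {
    printf 'set zfs:zfs_arc_max = 0x%x\n' $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the command that would change zfs_txg_timeout on the running
# kernel (no reboot); 0t marks the value as decimal for mdb.
txg_timeout_cmd() {
    printf 'echo zfs_txg_timeout/W0t%d | mdb -kw\n' "$1"
}

arc_max_line 1      # the 1 GB limit the vendor asked for
txg_timeout_cmd 5   # the 5 second timeout from the first try
```

Printing the commands first makes it easy to review them before anything touches a production kernel.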

Luckily we found out that our testing infrastructure had a problem, so that only a fraction of the performance test was performed. This morning the people responsible for it made some changes, and now the DB is complaining again.

This is what I expected. To make sure I fully understand the ARC, I had a look at the theory behind it from the IBM research center. There are some papers which explain how to extend a cache which uses the LRU replacement policy into an ARC with a few lines of code. It looks like it would be worthwhile to check where in FreeBSD an LRU policy is used, and to test whether an ARC would improve the cache hit rate there. From reading the paper it looks like there are a lot of places where this should be the case. The authors also provide two adaptive extensions to the CLOCK algorithm (used in various OSes in the VM subsystem), which indicates that such an approach could be beneficial for a VM system. I already contacted Alan (the FreeBSD one) and asked if he knows about it and if it could be beneficial for FreeBSD.


Apr
02

WITH_CTF is really usable now

I just committed a patch which makes WITH_CTF usable now.

Yes, you could use it before, but you had to remember to specify it at each build. Now you can add it to your kernel config (via makeoptions), and then you can forget about it.
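A minimal sketch of what "add it once and forget about it" could look like, assuming the usual makeoptions syntax of a kernel config file. The function name and the MYKERNEL path are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Sketch only: add WITH_CTF to a kernel config file exactly once.
# enable_ctf and the MYKERNEL path below are hypothetical.

enable_ctf() {
    conf=$1
    # Skip the append when a WITH_CTF makeoptions line already exists.
    if ! grep -q '^makeoptions[[:space:]][[:space:]]*WITH_CTF=' "$conf"; then
        printf 'makeoptions\tWITH_CTF=1\n' >> "$conf"
    fi
}

# Example usage (the path is an assumption, adjust it to your setup):
# enable_ctf /usr/src/sys/amd64/conf/MYKERNEL
```

Calling the function a second time leaves the file unchanged, so it is safe to keep in a setup script.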

Thanks to jhb and imp for review and suggestions.


Jan
13

Stability problems solved (hardware problem)

After putting the disks of the 7-stable system which exhibited stability problems into a completely different system (it is a rented root-server, not our own hardware), the system has now survived more than a day (and still no trace of problems) with the UFS setup. Previously it would crash after some minutes.

The ZFS setup with the changed hardware had a problem during the night before (like always after all my ZFS related changes on this machine), but on this machine I had changed all locks in ZFS from shared locks to exclusive locks (this extended the uptime from 4–6 hours to “until I rebooted the morning after because of hanging processes”), so the problem may be caused by that change. I do not know yet if we will test the ZFS setup with the pure 7-stable source we use now or not (the goal was to get back a stable system, instead of playing around with unrelated stuff).

It looks like some kind of hardware problem was uncovered by updating from 7.1 to 7.2 (and 7-stable subsequently). This new machine has a completely different chipset, a new CPU and RAM and PSU and … so I do not really know what caused this (but the fact that the previous system did not recognize the CPU after replacing it with a bigger one, and the observation that only shared locks with a specific usage pattern were affected, make me suspect missing microcode updates…).


Jan
07

I merged a lot of ZFS patches to 7-stable

During the last weeks I identified 64 patches for ZFS which are in 8-stable but not in 7-stable. I had a deeper look at 56 of them, and most of those are now committed to 7-stable. The ones of those 56 which I did not commit are not applicable to 7-stable (infrastructure differences between 8 and 7).

Unfortunately this did not solve the stability problems I have on a 7-stable system.

I also committed a diff reduction patch (between 8-stable and 7-stable) which also fixed some not so harmless mismerges (a memory leak and initializing the same mutex twice at different places). No idea yet if it helps in my case.

I also want to merge the new arc reclaim logic from head to 8-stable and 7-stable. Maybe I can do this tomorrow.

Currently I am running a test with a kernel where the shared locks for ZFS are switched to exclusive locks.


Dec
30

Some fixes for ZFS on 7-stable (more testers wanted)

Due to the problems with a 7-stable machine, I had a look at some unmerged fixes for ZFS (58 changes not merged).

I backported some of those changes from 8-stable to 7-stable, and I have this running on one 7-stable machine. I would like to get some more feedback on it (even an “it works for me” would be great). The main part of this change is that the FreeBSD taskqueue is now used instead of the OpenSolaris one (plus some other changes which may improve the ZFS experience).

It would also be nice if someone could have a look at the FIRST_THREAD_IN_PROC part. Can there be more than one thread at this place (I do not think so), and should I use FOREACH_THREAD_IN_PROC instead?

How to apply:

  • cd /usr/src/
  • fetch http://www.Leidinger.net/FreeBSD/test/releng7_zfs_merge3.diff
  • fetch http://www.Leidinger.net/FreeBSD/test/opensolaris_taskq.c
  • fetch http://www.Leidinger.net/FreeBSD/test/taskq.h
  • mv taskq.h sys/cddl/contrib/opensolaris/uts/common/sys/taskq.h
  • mv opensolaris_taskq.c sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c
  • patch -p 0 --quiet <releng7_zfs_merge3.diff
  • ignore the 2 .rej files
  • rm -f sys/cddl/compat/opensolaris/sys/taskq_impl.h*
  • rm -f sys/cddl/compat/opensolaris/sys/taskq.h*
  • rm -f sys/cddl/contrib/opensolaris/uts/common/os/taskq.c*
  • rebuild kernel

I do not list all of the 16 (of 58) outstanding patches which are covered here; a detailed list can be found on the stable and fs mailing lists.
