ARC (ad­apt­ive re­place­ment cache) ex­plained

At work we have the situ­ation of a slow ap­plic­a­tion. The vendor of the cus­tom ap­plic­a­tion in­sists that the ZFS (Sol­ar­is 10u8) and the Or­acle DB are badly tuned for the ap­plic­a­tion. Part of their tun­ing is to lim­it the ARC to 1 GB (our max size is 24 GB on this ma­chine). One prob­lem we see is that there are many write op­er­a­tions (roun­ded val­ues: 1k ops for up to 100 MB) and the DB is com­plain­ing that the log­writer is not able to write out the data fast enough. At the same time our data­base ad­mins see a lot of com­mits and/​or roll­backs so that the archive log grows very fast to 1.5 GB. The funny thing is… the per­form­ance tests are sup­posed to only cov­er SE­LECTs and small UP­DATEs.

I pro­posed to re­duce the zfs_​txg_​timeout from the de­fault value of 30 to some seconds (and as no re­boot is needed like for the max arc size, this can be done fast in­stead of wait­ing some minutes for the boot-​checks of the M5000). The first try was to re­duce it to 5 seconds and it im­proved the situ­ation. The DB still com­plained about not be­ing able to write out the logs fast enough, but it did not do it as of­ten as be­fore. To make the vendor happy we re­duced the max arc size and tested again. First we have not seen any com­plains from the DB any­more, which looked strange to me be­cause my un­der­stand­ing of the ARC (and the de­scrip­tion of the ZFS Evil Tun­ing Guide re­gard­ing the max size set­ting) sug­gest that this should not show this be­ha­vi­or we have seen, but the ma­chine was also re­booted for this, so there could also be an­oth­er ex­plan­a­tion.

Luck­ily we found out that our test­ing in­fra­struc­ture had a prob­lem so that only a frac­tion of the per­form­ance test was per­formed. This morn­ing the people re­spons­ible for that made some changes and now the DB is com­plain­ing again.

This is what I ex­pec­ted. To make sure I fully un­der­stand the ARC, I had a look at the the­ory be­hind it at the IBM re­search cen­ter (up­date: PDF link). There are some pa­pers which ex­plain how to ex­tend a cache which uses the LRU re­place­ment policy with some lines of code to an ARC. It looks like it would be an im­prove­ment to have a look at which places in FreeBSD a LRU policy is used to test if an ARC would im­prove the cache hit rate. From read­ing the pa­per it looks like there are a lot of places where this should be the case. The au­thors also provide two ad­apt­ive ex­ten­sions to the CLOCK al­gorithm (used in vari­ous OS in the VM sub­sys­tem) which in­dic­ate that such an ap­proach could be be­ne­fi­cial for a VM sys­tem. I already con­tac­ted Alan (the FreeBSD one) and asked if he knows about it and if it could be be­ne­fi­cial for FreeBSD.

4 thoughts on “ARC (ad­apt­ive re­place­ment cache) ex­plained”

  1. Pingback: ARC comme Adaptative Replacement Cache at FreeBSD-fr: Les nouvelles du géant en français
  2. ARC is pat­en­ted by IBM res­ult­ing in the open-​source com­munity be­ing re­luct­ant to ad­opt it. Post­gr­eSQL has a thread on their con­cerns. The LIRS policy was ad­op­ted by MySql and seems to be a strong con­tender without leg­al is­sues. I am in the midst of im­ple­ment­ing it in my Con­cur­rent­Linked­HashMap pro­ject (Java) and find it com­plic­ated but quite ef­fi­cient. Both can be im­ple­men­ted cheaply with great res­ults, but are a more com­plex than a simple LRU policy.

    1. Does someone know if the ARC is part of the pat­ent port­fo­lio for which IBM al­lows the use in Open­Source code? If yes, this would al­low to of­fer it as a compile-​time al­tern­at­ive.

      I have read about the con­cerns of the Post­gr­eSQL people (doc­u­ment from 2005), but have not seen a ref­er­ence to an ex­plan­a­tion of their al­tern­at­ive. Googling for “LIRS policy” gives a link to a pa­per from 2002 in the ACM portal. Just by read­ing the ab­stract, it seems to be an al­tern­atie to LRU(-K), but not to the CLOCK al­gorithm. Do you know about some per­form­ance bench­marks with LIRS and ARC? I would ex­pect that ARC can still out­per­form LIRS for fre­quently ac­cessed pages.

  3. LIRS takes both re­cency and fre­quency in­to ac­count. The con­cur­rent vari­ant of LIRS is ClockPro. The buf­fer­ing ap­proach used in my CLHM provides al­lows non-​concurrent policies like LRU and LIRS to not suf­fer from lock con­ten­tion, so the most ef­fi­cient policy can be used. LIRS is an O(1) al­gorithm. The full pa­per is at [1].

    An­oth­er pa­per [2] bench­marks mul­tiple page re­place­ment policies. LIRS per­formed ex­tremely well.

    I have only seen ARC used where there is an ob­vi­ous cross-​licensing agree­ment with IBM. How­ever, I doubt that any­one has put much ef­fort in­to de­term­in­ing if its us­able and in­stead ad­op­ted al­tern­at­ive ap­proaches.

    [1] http://​cite​seerx​.ist​.psu​.edu/​v​i​e​w​d​o​c​/​s​u​m​m​a​r​y​?​d​o​i​=​1​0​.​1​.​1​.​2​.​437

Comments are closed.