Alexander Leidinger

Just another weblog

Apr
14

ARC (adap­tive replace­ment cache) explained

At work we have the sit­u­a­tion of a slow appli­ca­tion. The ven­dor of the cus­tom appli­ca­tion insists that the ZFS (Solaris 10u8) and the Ora­cle DB are badly tuned for the appli­ca­tion. Part of their tun­ing is to limit the ARC to 1 GB (our max size is 24 GB on this machine). One prob­lem we see is that there are many write oper­a­tions (rounded val­ues: 1k ops for up to 100 MB) and the DB is com­plain­ing that the log­writer is not able to write out the data fast enough. At the same time our data­base admins see a lot of com­mits and/or roll­backs so that the archive log grows very fast to 1.5 GB. The funny thing is… the per­for­mance tests are sup­posed to only cover SELECTs and small UPDATEs.

I pro­posed to reduce the zfs_txg_timeout from the default value of 30 to some sec­onds (and as no reboot is needed like for the max arc size, this can be done fast instead of wait­ing some min­utes for the boot-checks of the M5000). The first try was to reduce it to 5 sec­onds and it improved the sit­u­a­tion. The DB still com­plained about not being able to write out the logs fast enough, but it did not do it as often as before. To make the ven­dor happy we reduced the max arc size and tested again. First we have not seen any com­plains from the DB any­more, which looked strange to me because my under­stand­ing of the ARC (and the descrip­tion of the ZFS Evil Tun­ing Guide regard­ing the max size set­ting) sug­gest that this should not show this behav­ior we have seen, but the machine was also rebooted for this, so there could also be another explanation.

Luck­ily we found out that our test­ing infra­struc­ture had a prob­lem so that only a frac­tion of the per­for­mance test was per­formed. This morn­ing the peo­ple respon­si­ble for that made some changes and now the DB is com­plain­ing again.

This is what I expected. To make sure I fully under­stand the ARC, I had a look at the the­ory behind it at the IBM research cen­ter (update: PDF link). There are some papers which explain how to extend a cache which uses the LRU replace­ment pol­icy with some lines of code to an ARC. It looks like it would be an improve­ment to have a look at which places in FreeBSDLRU pol­icy is used to test if an ARC would improve the cache hit rate. From read­ing the paper it looks like there are a lot of places where this should be the case. The authors also pro­vide two adap­tive exten­sions to the CLOCK algo­rithm (used in var­i­ous OS in the VM sub­sys­tem) which indi­cate that such an approach could be ben­e­fi­cial for a VM sys­tem. I already con­tacted Alan (the FreeBSD one) and asked if he knows about it and if it could be ben­e­fi­cial for FreeBSD.

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…

Tags: , , , , , , , , , ,

4 Responses to “ARC (adap­tive replace­ment cache) explained”

  1. ARC comme Adaptative Replacement Cache at FreeBSD-fr: Les nouvelles du géant en français Says:

    […] post rapide pour vous sig­naler sur le blog d’Alexander, un post trai­tant de la con­fig­u­ra­tion de l’AdaptativeReplacement Cache (ARC) pour ZFS et […]

  2. Ben Manes Says:

    ARC is patented by IBM result­ing in the open-source com­mu­nity being reluc­tant to adopt it. Post­greSQL has a thread on their con­cerns. The LIRS pol­icy was adopted by MySql and seems to be a strong con­tender with­out legal issues. I am in the midst of imple­ment­ing it in my Con­cur­rentLinked­HashMap project (Java) and find it com­pli­cated but quite effi­cient. Both can be imple­mented cheaply with great results, but are a more com­plex than a sim­ple LRU policy.

    GD Star Rating
    loading...
    GD Star Rating
    loading...
  3. netchild Says:

    Does some­one know if the ARC is part of the patent port­fo­lio for which IBM allows the use in Open­Source code? If yes, this would allow to offer it as a compile-time alternative.

    I have read about the con­cerns of the Post­greSQL peo­ple (doc­u­ment from 2005), but have not seen a ref­er­ence to an expla­na­tion of their alter­na­tive. Googling for “LIRS pol­icy” gives a link to a paper from 2002 in the ACM por­tal. Just by read­ing the abstract, it seems to be an alter­natie to LRU(-K), but not to the CLOCK algo­rithm. Do you know about some per­for­mance bench­marks with LIRS and ARC? I would expect that ARC can still out­per­form LIRS for fre­quently accessed pages.

    GD Star Rating
    loading...
    GD Star Rating
    loading...
  4. Ben Manes Says:

    LIRS takes both recency and fre­quency into account. The con­cur­rent vari­ant of LIRS is Clock­Pro. The buffer­ing approach used in my CLHM pro­vides allows non-concurrent poli­cies like LRU and LIRS to not suf­fer from lock con­tention, so the most effi­cient pol­icy can be used. LIRS is an O(1) algo­rithm. The full paper is at [1].

    Another paper [2] bench­marks mul­ti­ple page replace­ment poli­cies. LIRS per­formed extremely well.

    I have only seen ARC used where there is an obvi­ous cross-licensing agree­ment with IBM. How­ever, I doubt that any­one has put much effort into deter­min­ing if its usable and instead adopted alter­na­tive approaches.

    [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.437
    [2] http://people.cs.vt.edu/~butta/docs/sigmetrics05_kernelPrefetch.pdf

    GD Star Rating
    loading...
    GD Star Rating
    loading...