I counted 18 projects awarded to FreeBSD in this year's GSoC. I have some comments on three of them.
Very interesting to me is the project named Collective limits on set of processes (a.k.a. jobs). This looks a bit like the Solaris contract/project IDs. If this project results in something which allows userland to query which PID belongs to which set, then this allows some nice improvements for start scripts. For example, at work on Solaris each application is a mix of several projects (apache = “name:web” project, tomcat = “name:app” project, Oracle DB = “name:ora” project). Our management framework (written by a co-worker) makes it easy to act on those projects: a “show” displays the prstat (similar to top) info just for the processes which belong to the project, a “kill” sends a kill signal to all processes of the project, and so on. We could do something similar with our start scripts by declaring a namespace (FreeBSD:base:XXX / FreeBSD:ports:XXX?) and maybe a number space (depending on the implementation) as reserved, and use it to see if processes which belong to a particular script are still running, or to kill them, or whatever.
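To illustrate what such a query interface enables, this is roughly what the corresponding commands look like on Solaris (just a sketch, not our actual framework; the project name “web” is only an example):

    # Per-project resource usage, like top/prstat but aggregated per project:
    prstat -J

    # "show": list all processes which belong to the project "web":
    ps -e -o project,pid,args | awk '$1 == "web"'

    # "kill": send SIGTERM to every process of the project; pkill -J wants
    # the numeric project ID, which getent resolves from the project name:
    pkill -J "$(getent project web | cut -d: -f2)"

With a comparable interface on FreeBSD, an rc.d script could ask “are any processes of my job still alive?” instead of trusting PID files.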
The other two projects I want to comment on here are Complete libpkg and create new pkg tools and Complete Package support in the pkg_install tools and cleanup. Both projects reference libpkg in their description. I hope the mentors of both projects pay attention to what is going on in the other project, so that there are no dependencies or clashes between the students' work.
That I do not mention the other projects does not mean they are not interesting; it is just that I do not have anything valuable to say about them…
At work we are dealing with a slow application. The vendor of this custom application insists that ZFS (Solaris 10u8) and the Oracle DB are badly tuned for the application. Part of their proposed tuning is to limit the ARC to 1 GB (our max size is 24 GB on this machine). One problem we see is that there are many write operations (rounded values: 1,000 ops for up to 100 MB) and the DB is complaining that the log writer is not able to write out the data fast enough. At the same time our database admins see a lot of commits and/or rollbacks, so the archive log grows very fast to 1.5 GB. The funny thing is… the performance tests are supposed to only cover SELECTs and small UPDATEs.
I proposed to reduce zfs_txg_timeout from its default value of 30 seconds to a few seconds (and since, unlike the max ARC size, this needs no reboot, it can be done quickly instead of waiting several minutes for the boot checks of the M5000). The first try was to reduce it to 5 seconds, and it improved the situation: the DB still complained about not being able to write out the logs fast enough, but not as often as before. To make the vendor happy we also reduced the max ARC size and tested again. At first we did not see any complaints from the DB anymore, which looked strange to me, because my understanding of the ARC (and the description in the ZFS Evil Tuning Guide regarding the max size setting) suggests that limiting it should not produce the behavior we saw; but the machine was also rebooted for this, so there could be another explanation.
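For reference, this is roughly how the two knobs are changed on Solaris 10; the values are just the ones from our tests (setting the kernel variable with mdb -kw is the usual “no reboot” way, /etc/system makes it persistent):

    # Set zfs_txg_timeout to 5 seconds on the running system (no reboot):
    echo 'zfs_txg_timeout/W0t5' | mdb -kw

    # Verify the current value (printed in decimal):
    echo 'zfs_txg_timeout/D' | mdb -k

    # Persistent settings in /etc/system; the ARC limit (here 1 GB, as the
    # vendor wants) only takes effect after a reboot:
    #   set zfs:zfs_txg_timeout = 5
    #   set zfs:zfs_arc_max = 0x40000000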
Luckily we found out that our testing infrastructure had a problem, so only a fraction of the performance test was actually executed. This morning the people responsible for it made some changes, and now the DB is complaining again.
This is what I expected. To make sure I fully understand the ARC, I had a look at the theory behind it from the IBM research center (update: PDF link). There are papers which explain how to extend a cache that uses the LRU replacement policy into an ARC with just a few lines of code. It looks worthwhile to check at which places in FreeBSD an LRU policy is used, to test whether an ARC would improve the cache hit rate there. From reading the paper it looks like there are a lot of places where this should be the case. The authors also provide two adaptive extensions to the CLOCK algorithm (used in the VM subsystem of various operating systems), which indicates that such an approach could also be beneficial for a VM system. I already contacted Alan (the FreeBSD one) and asked if he knows about it and if it could be beneficial for FreeBSD.
At work we have some performance problems.
One application (not off-the-shelf software) is not performing well. The problem is that the design of the application is far from good (auto-commit is used, and because of this the Oracle DB does far more writes than the application's job would require). While helping our DBAs with their performance analysis (the vendor of the application claims our hardware is not fast enough, and I had to provide some numbers to show that this is not the case and that they need to improve the software, as it does not comply with the performance requirements they got before developing the application), I noticed that the filesystem where the DB and the application are located (a ZFS, if someone is interested) sometimes does 1,200 (write) I/O operations per second (to write about 100 MB). Yeah, our SAN can handle quite a lot of I/O operations! Unfortunately it is too expensive to buy for use at home. 🙁
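For those who want to watch similar numbers on their own pool, this is roughly how I look at it on Solaris (the pool name “tank” is just a placeholder):

    # Per-pool read/write operations and bandwidth, refreshed every second:
    zpool iostat -v tank 1

    # Per-device view with service times; %b shows how busy each LUN is:
    iostat -xzn 1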
Another application (Nagios 3.0) was generating a lot of major faults (caused by the many fork()s for the checks). It is a SunFire V890, and the highest number of major faults per second I have seen on this machine was about 27,000. It never went below 10,000; on average it was maybe somewhere between 15,000 and 20,000. My Solaris desktop (an Ultra 20) generates maybe several hundred major faults when a lot is going on (most of the time it does not generate many). Nobody can say the V890 is not used… 🙂 Oh, yes, I suggested enabling the Nagios config setting for large installations; now the major faults are around 0 to 10,000 and the machine is not as stressed anymore. The next step is probably to have a look at the ancient probes (migrated from the Big Brother setup which was there several years before) and reduce the number of forks they do.
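For completeness: the Nagios option I mean is, if I remember the name correctly, use_large_installation_tweaks in nagios.cfg, and a crude way to estimate the system-wide major-fault rate with plain Solaris tools (no DTrace) is to sample the cumulative counter twice:

    # nagios.cfg: reduce the fork()/environment overhead per check,
    # then restart the nagios daemon:
    #   use_large_installation_tweaks=1

    # Rough major-fault rate: two samples of the cumulative counter,
    # 10 seconds apart; divide the difference by 10.
    vmstat -s | grep 'major faults'
    sleep 10
    vmstat -s | grep 'major faults'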