Sta­bil­i­ty prob­lems with 7‑stable

On the machine where I host this blog, I have/had some sta­bil­i­ty problems.

Last week I updated the machine from FreeBSD 7.1-pX to 7.2-p5 (GENERIC kernel in both cases). 5 to 10 minutes after the reboot into the new version the machine deadlocked. After some roadblocks (ordering a KVM switch from the hoster, the KVM switch not working through a proxy (during lunchtime at work), a broken video capture on the KVM switch and a replacement on Monday morning to avoid paying the weekend fees), I spent a big part of the night trying to get it stable. I tried disabling SMP, enabling INVARIANTS and WITNESS, changing the scheduler, cutting the software mirror (to rule out a mismatch between the content of the disks after all the hard reboots) and updating to 7-stable.

Unfor­tu­nate­ly noth­ing helped. 🙁

Googling around a little bit (it is an AMD dual-core system with an NVidia MCP61 chipset) led me to a post on the mailing lists from 2008 which talks about an issue with the buffer cache. I do not know if this is still an issue (I have sent an email to kib@ to ask about it), and my scenario is not the same as the one described in the mail, but because of this I decided to switch one of the two UFS mirrors to ZFS.

The first boot into ZFS again caused a reboot after some minutes (I do not know if it was because of a memory-exhausted panic or because of a deadlock), but as I had not tuned the kernel for ZFS I am tempted to believe that I should not count that. Now, after tuning the kernel (increasing the kmem_size to 700M, no prefetching, limiting the ARC to 40M) it has been up for nearly 2h (as of this writing… crossing fingers). Before, it was not able to survive more than some minutes with just the jail for the mails up. Now I not only have the mail jail up, but also the jail for the blog (one jail is still disabled, but I will take care of that after this post).
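For reference, the tuning described above would look roughly like this in /boot/loader.conf. Only the values come from the text; the tunable names are an assumption (they are the usual loader tunables for these settings on 7.x), so treat this as a sketch and not as a copy of the actual file.

```
# /boot/loader.conf (sketch; tunable names assumed, values as described above)
vm.kmem_size="700M"            # increase the kernel memory size
vfs.zfs.prefetch_disable="1"   # no prefetching
vfs.zfs.arc_max="40M"          # limit the ARC
```

Loader tunables are only read at boot, so a reboot is needed after changing them.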

I do not know if only increas­ing the kmem_size would have helped with the prob­lem, but as I was test­ing a GENERIC ker­nel + gmir­ror mod­ule in the begin­ning, I expect­ed that the auto-tuning of this val­ue should have been enough for such a sim­ple set­up (2GB RAM, 2 disks with 3 par­ti­tions each, one par­ti­tion pair for root, one for swap, one for the jails).

I hope that I have stabilized the system now. It may be the case that I will test some patches in case someone comes up with something, so do not be surprised if the blog and email to me are a little bit flaky.

FreeNAS & Sen­sors for FreeBSD

This weekend I was told that FreeNAS seems to want to move from FreeBSD to Linux (since then it seems there could be both a Linux and a FreeBSD version). One of the reasons seems to be a missing sensors framework.

As I committed a port of the OpenBSD sensors framework (produced as part of the Google Summer of Code 2007) to FreeBSD and had to remove it afterwards because one committer complained very loudly, I was asked what the status of this is.

The short status is: Nobody is doing anything about it.

Before I explain the long status, I will give a short overview of what this sensors framework is:

  • a kernel API which allows adding sensors
  • an inter­face for the user­land to query the sen­sor data
  • some basic user­land code to show and log the sen­sor info

The API and the query interface are more or less independent. The userland code was more a logging infrastructure than a real monitoring solution. The reason was that real monitoring solutions already exist (Nagios, snmpd, …) and can be adapted to query the sensors. Ideally a query in userland should be handled by a library instead of directly accessing the sysctl interface; this way the kernel<->userland interface would be abstracted away (and could be replaced as needs arise). This was not done; it was something to be done later (Rome was not built in a day).
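To make the library idea a bit more concrete, here is a minimal sketch in Python (not the committed code, which was C in the kernel plus some small userland tools). All names in it (Reading, SensorBackend, InMemoryBackend, SensorLibrary, the sensor names) are hypothetical. The point is only that applications talk to the library, and the backend behind it (sysctl today, something else tomorrow) can be swapped without touching them.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Reading:
    """One sensor value; the names and units used here are made up."""
    name: str
    value: float
    unit: str


class SensorBackend(Protocol):
    """Anything that can list sensors and read one of them."""

    def names(self) -> List[str]: ...

    def read(self, name: str) -> Reading: ...


class InMemoryBackend:
    """Stand-in backend for this demo; a real one would wrap sysctl."""

    def __init__(self, readings: Dict[str, Reading]) -> None:
        self._readings = dict(readings)

    def names(self) -> List[str]:
        return sorted(self._readings)

    def read(self, name: str) -> Reading:
        return self._readings[name]


class SensorLibrary:
    """What a monitoring tool would call instead of sysctl directly."""

    def __init__(self, backend: SensorBackend) -> None:
        self._backend = backend

    def snapshot(self) -> List[Reading]:
        # Poll every sensor once, in name order.
        return [self._backend.read(n) for n in self._backend.names()]

    def over_limit(self, limits: Dict[str, float]) -> List[Reading]:
        # Return the readings which exceed the limit given for their name.
        return [r for r in self.snapshot()
                if r.name in limits and r.value > limits[r.name]]


backend = InMemoryBackend({
    "cpu0.temp": Reading("cpu0.temp", 71.5, "degC"),
    "fan0.speed": Reading("fan0.speed", 1200.0, "RPM"),
})
lib = SensorLibrary(backend)
for reading in lib.snapshot():
    print(f"{reading.name}: {reading.value} {reading.unit}")
```

Replacing InMemoryBackend with a class that reads the values from sysctl (or from a /dev entry) would not change anything for the callers of SensorLibrary, which is the kind of abstraction meant above.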

The userland interface also only cared about dumb sensors (those which you need to query manually to get the information). Smart sensors (those which are able to send events themselves) were not taken care of in the sense of really sending sensor-triggered events, but the kernel API allowed adding such sensors. The sysctl interface has no way of sending events, but FreeBSD already has an event interface (devd takes care of it). It would not have been a problem to send events via this channel and let a userland library take care of the delivery together with other sensor data in userland.

And now the long sta­tus is:

PHK complained loudly about it. First he said he had not looked at it, but he complained that it is not good regardless. After a lot of nagging from me he had a look at it and was not happy about the time stuff in it (short: the FreeBSD timecounter code is better). This was not a problem in my opinion; we could have disabled this part without problems. After such an offer from me, he complained that the sensors framework uses the sysctl interface instead of an entry in /dev.

At this point in time several userland utilities already used the sysctl framework to query for status data in the kernel. So there was already precedent for such a use of it. Later some more such uses were added too (e.g. the procstat stuff by core team member Robert Watson).

I saved some of the corresponding mails (to public mailing lists) in an mbox file; read the mess yourself if you want.

The bottom line is: Several committers (even some which we could call high-profile committers) told me that they do not see a problem in the use of the sysctl interface. They do not seem to want to say so in public (none of them voiced their opinion in the thread, so do not ask me who those people are). I am not interested in investing more of my spare time into fighting windmills (it looks like this to me).

So, if someone is interested in the code, r172631 has it. In the Perforce repository you can maybe find some sensors. I think most of it can still be used without many changes.

If someone tries it with a more recent FreeBSD, please drop me a note if it just applies fine, or a patch (or a URL to it) if it needs some modifications. Who knows, maybe it will be useful for me in a future project.

If there is enough inter­est by sev­er­al peo­ple, I can even put up a wiki page where those peo­ple can coor­di­nate, but that is most prob­a­bly all I am will­ing to invest fur­ther into this (at least in my unpaid time).

Video4Linux sup­port in FreeBSD

Yesterday I committed the v4l support into the linuxulator (in 9-current). Part of this was the import of the v4l header from Linux. We have the permission to use it; it is not licensed under the GPL. This means we can use it in FreeBSD native drivers, and they are even allowed to be compiled into GENERIC (but I doubt we have a driver which could provide the v4l interface in GENERIC).

The code I committed is “just” the glue code which allows Linux programs to use FreeBSD native devices which provide a v4l interface (e.g. multimedia/pwcbsd).

If some­one is will­ing to write the glue-code for the v4l2 inter­face please con­tact me. We have the per­mis­sion to use the v4l2 head­er too, we just need some­one doing the coding.

In a similar way, if someone is willing to add v4l2 interface support to FreeBSD native drivers (I do not know of any FreeBSD driver which provides a v4l2 interface), just tell me and I will import the v4l2 header into FreeBSD.

And if some­one wants to add v4l sup­port to FreeB­SD native dri­vers but does not know where to start, feel free to con­tact me too.

Regarding the code which is in FreeBSD at the moment: it is not completely finished yet (some clipping-related stuff is being worked on), but the unfinished part cannot even be tested, as we do not know of a FreeBSD device which provides this functionality.

There is no MFC planned yet, but the more success stories and test scenarios are reported on the emulation or multimedia mailing lists, the more likely it is that I will do an MFC sooner rather than later.

Video for Linux (v4l) emulation coming to the linuxulator

I am in the process of preparing the import of code which makes v4l devices usable in the linuxulator. Basically this means you can use your webcam in Skype (tested by the submitter of the patch on amd64).

This is not an “apply patch and commit” thing, because the original videodev.h (with some modifications) is used. I sought the OK from core@ for this. As there is no license in the header, and the original author (Alan Cox, the Linux one, not our FreeBSD one) gave permission to use it, core@ is OK with the import.

I intend to do a vendor import of the Linux header (prepared today, together with a readme which explains where it comes from and some stuff to show that we are on the safe side regarding legal stuff), and then I want to copy this over to the linuxulator as linux_videodev.h and commit the patch (probably modified a little bit in a few places). My plan is to commit it this week. People who already want to play around with it now can have a look at the emulation mailing list; a link to the patch is posted there.

With the header being in a vendor branch, interested people could then start to submit new BSD-licensed drivers or modify existing drivers to make use of the v4l interface, but I leave the import of the header into the FreeBSD include directory up to the person who wants to commit the first native FreeBSD v4l support.

When such native FreeBSD-v4l sup­port is com­mit­ted, the lin­ux­u­la­tor code needs to be revised.

SUN Open­Stor­age presentation

At work (client site) SUN made a pre­sen­ta­tion about their Open­Stor­age prod­ucts (Sun Stor­age 7000 Uni­fied Stor­age Sys­tems) today.

From a technology point of view, the software side is nothing new to me. Using SSDs as a read/write cache for ZFS is something we can already (partly) do since at least Solaris 10u6, or in FreeBSD. (10u6 is the lowest Solaris 10 version we have installed here, so I cannot check quickly if the ZIL can be on a separate disk in previous versions of Solaris, but I think we have to wait until we have updated to Solaris 10u8 before we can have the L2ARC on a separate disk.) All other nice ZFS features available in the OpenStorage web interface are also not surprising.

But the demonstration with the Storage Simulator impressed me. The interaction with Windows via CIFS makes the older versions of files in snapshots available in Windows (I assume this is the Volume Shadow Copy feature of Windows), and the statistics available via DTrace in the web interface are also impressive. All this technology seems to be well integrated into an easy-to-use package for heterogeneous environments. If you would like to set up something like this by hand, you would need to have a lot of knowledge about a lot of stuff (and in the FreeBSD case, you would probably need to augment the kernel with additional DTrace probes to be able to get a similar granularity of the statistics); this is not something a small company is willing to pay for.

I know that I can get a lot of infor­ma­tion with DTrace (from time to time I have some free cycles to extend the FreeB­SD DTrace imple­men­ta­tion with addi­tion­al DTrace probes for the lin­ux­u­la­tor), but what they did with DTrace in the Open­Stor­age soft­ware is great. If you try to do this at home your­self, you need some time to imple­ment some­thing like this (I do not think you can take the DTrace scripts and run them on FreeB­SD, this will prob­a­bly take some weeks until it works).

It is also the first time I have seen this new CIFS implementation from SUN in ZFS live in action. It looks well done. Integration with AD looks easier than doing it by hand in Samba (at least from looking at the OpenStorage web interface). If we could get this in FreeBSD… it would rock!

The entire Open­Stor­age web inter­face looks usable. I think SUN has a prod­uct there which allows them to enter new mar­kets. A prod­uct which they can sell to com­pa­nies which did not buy some­thing from SUN before (even Windows-only com­pa­nies). I think even those Win­dows admins which nev­er touch a com­mand line inter­face (read: the low-level ones; not com­pa­ra­ble at all with the real­ly high-profile Win­dows admins of our client) could be able to get this up and running.

As it seems at the moment, our client will get a Sun Storage F5100 Flash Array for technology evaluation at the beginning of next year. Unfortunately the technology looks too easy to handle, so I assume I will have to take care of more complex things when this machine arrives… 🙁