Kernel features patchset (from GSoC 2010)

I am playing around with the patchset “my” student generated during this year’s GSoC (the code for all projects is available from Google). In short, it gives you the possibility to query from userland which optional kernel features are available. I mostly let him do those features which are not so easy to detect from userland, or where the detection could trigger an autoload of a kernel module.

I let the output speak for itself. First, the output before his patchset:

kern.features.compat_freebsd7: 1
kern.features.compat_freebsd6: 1
kern.features.posix_shm: 1

And now with his patchset:

kern.features.compat_freebsd6: 1
kern.features.compat_freebsd7: 1
kern.features.ffs_snapshot: 1
kern.features.geom_label: 1
kern.features.geom_mirror: 1
kern.features.geom_part_bsd: 1
kern.features.geom_part_ebr: 1
kern.features.geom_part_ebr_compat: 1
kern.features.geom_part_mbr: 1
kern.features.geom_vol: 1
kern.features.invariant_support: 1
kern.features.kdtrace_hooks: 1
kern.features.kposix_priority_scheduling: 1
kern.features.ktrace: 1
kern.features.nfsclient: 1
kern.features.nfsserver: 1
kern.features.posix_shm: 1
kern.features.pps_sync: 1
kern.features.quota: 1
kern.features.scbus: 1
kern.features.softupdates: 1
kern.features.stack: 1
kern.features.sysv_msg: 1
kern.features.sysv_sem: 1
kern.features.sysv_shm: 1
kern.features.ufs_acl: 1
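
Both listings are plain “name: value” text, so a script can compare them mechanically. Here is a minimal Python sketch. It assumes the listings were captured from something like `sysctl kern.features`, and the function name `parse_features` is made up for this example. It shows which features only became queryable with the patchset (the sample strings are excerpts of the listings above):

```python
def parse_features(text: str) -> dict:
    """Parse 'kern.features.<name>: <value>' lines into {name: value}."""
    prefix = "kern.features."
    features = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # skip blank lines and anything that is not a feature
        key, _, value = line.partition(":")
        features[key[len(prefix):]] = int(value)  # int() ignores surrounding spaces
    return features


# Excerpts of the two listings above.
before = parse_features("""
kern.features.compat_freebsd7: 1
kern.features.compat_freebsd6: 1
kern.features.posix_shm: 1
""")
after = parse_features("""
kern.features.compat_freebsd6: 1
kern.features.compat_freebsd7: 1
kern.features.ffs_snapshot: 1
kern.features.geom_label: 1
kern.features.posix_shm: 1
""")

# Features which only became queryable with the patchset.
print(sorted(set(after) - set(before)))  # ['ffs_snapshot', 'geom_label']
```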

With his patches we have a total of 84 kernel features which can be queried (obviously I do not have all optional options enabled in the kernel which produces this output). All of the features also have a description, and it is easy to add more features. As an example, here is what is necessary to produce the kern.features.stack output:

./kern/subr_stack.c:FEATURE(stack, "Support for capturing kernel stack");

There is also a little userland application (and a library interface) which allows querying several features from scripts/applications. It offers the possibility to pretend that a feature is not there; this was a requirement for the ports. Pretending that a feature is there when it is not was ruled out, because such run-time detection is only necessary for things which have to run soon, and pretending some feature is there while it is not would cause big problems. Unfortunately the man page for the application is not ready yet, but I’m sure you can figure out how to use it.
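
The one-way masking rule described above can be sketched in a few lines of Python. This is not the interface of the actual patchset. `feature_available`, `missing_features` and the `pretend_absent` parameter are hypothetical names, and the feature table is assumed to come from parsed sysctl output:

```python
def feature_available(name: str, features: dict, pretend_absent=frozenset()) -> bool:
    """True if the kernel reports the feature and it is not masked.

    Masking only works in one direction: a feature can be hidden, but there
    is deliberately no way to report a missing feature as present.
    """
    if name in pretend_absent:
        return False
    return features.get(name, 0) == 1


def missing_features(names, features, pretend_absent=frozenset()) -> list:
    """Query several features at once; return the ones that are unavailable."""
    return [n for n in names if not feature_available(n, features, pretend_absent)]


kernel = {"ktrace": 1, "sysv_shm": 1, "quota": 1}  # e.g. parsed from sysctl output
print(missing_features(["ktrace", "sysv_shm", "ufs_acl"], kernel))
# ['ufs_acl']
print(missing_features(["ktrace", "sysv_shm"], kernel, pretend_absent={"sysv_shm"}))
# ['sysv_shm']
```

A port's build script could use the second form to test its fallback path on a machine which actually has the feature.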

The names of the features and their descriptions follow an easy scheme: what is written down in NOTES is used as the name and the description of the feature. An exception is geom_part_X, where we decided to use a common theme (“GEOM partitioning class for XXX”) which is distinct from the corresponding geom_X class. If you have complaints about what is used for a specific feature, do not complain to him: change it in NOTES and the feature will follow.

If you have questions, suggestions, or some other reason to contact him, his FreeBSD address is kibab@. Feel free to encourage him to go ahead with the next steps (finishing the man page, splitting up the patches into sensible pieces and presenting them on the appropriate mailing lists for review). 🙂

The FreeBSD-linuxulator explained (for users)

After another mail where I explained a little bit of the linuxulator behavior, it is time to write an easy text which I can reference in future answers. If someone wants to add parts of this explanation to the FreeBSD handbook, go ahead.

Linux emulation? No, “native” execution (sort of)!

First, the linuxulator is not an emulation. It is “just” a binary interface which is a little bit different from the FreeBSD-“native” one. Binary files in FreeBSD and Linux are both files which comply with the ELF specification.

When the FreeBSD kernel loads an ELF file, it checks whether it is a FreeBSD ELF file or a Linux ELF file (or some other flavor it knows about). Based upon this, it looks up the appropriate actions for this binary in a table. It can also differentiate between 64-bit and 32-bit, and probably other things too.
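
Part of what the kernel looks at can be seen in the first 16 bytes of every ELF file, the e_ident array. Byte 4 holds the 32/64-bit class and byte 7 the OS/ABI “brand”. brandelf(1) sets this byte, and it is 3 for Linux and 9 for FreeBSD. The Python sketch below only reads those two bytes. As far as I know, the real image activator can also use other hints (for example ELF note sections) when a binary is not branded, so treat this as a simplification. The function name `classify_elf` is made up:

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4   # index of the 32/64-bit class byte in e_ident
EI_OSABI = 7   # index of the OS/ABI byte in e_ident
ELF_CLASSES = {1: 32, 2: 64}                         # ELFCLASS32, ELFCLASS64
OS_ABIS = {0: "System V", 3: "Linux", 9: "FreeBSD"}  # ELFOSABI_* values


def classify_elf(header: bytes) -> tuple:
    """Return (bits, os_abi) from the 16-byte e_ident array of an ELF file."""
    if len(header) < 16 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    bits = ELF_CLASSES.get(header[EI_CLASS])
    if bits is None:
        raise ValueError("unknown ELF class")
    abi = OS_ABIS.get(header[EI_OSABI], f"unknown ({header[EI_OSABI]})")
    return bits, abi


# A fake 64-bit, little-endian header branded as Linux (OS/ABI byte = 3).
linux_header = ELF_MAGIC + bytes([2, 1, 1, 3]) + bytes(8)
print(classify_elf(linux_header))  # (64, 'Linux')
```

For a real binary, pass the first 16 bytes of the file, e.g. `open(path, "rb").read(16)`.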

The FreeBSD table is always compiled in. For the bigger picture: at least on an AMD/Intel 64-bit platform, you can additionally include a 32-bit version of this table, to be able to execute 32-bit programs on 64-bit systems. Other tables, like the Linux one, can be loaded into the kernel additionally (or built statically into the kernel, if desired).

Those tables contain some parameters and pointers which allow the kernel to execute the binary. If a program makes a system call, the kernel looks up the correct function inside this table. It does this for FreeBSD binaries and for Linux binaries alike. This means that there is no emulation/simulation (overhead) going on… at least ideally. Some behavior is a little bit different between Linux and FreeBSD, so a little bit of translation/housekeeping has to happen for some Linux system calls before they reach the underlying FreeBSD kernel functions.
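
The following is a toy Python model of this “one table per binary type” idea, not the kernel’s code. Real tables are indexed by system call number (and Linux and FreeBSD number their calls differently), and the handlers are C functions. All function and table names here are hypothetical. The only real data are the errno numbers. Error codes are one concrete example of the “little bit of translation”: EAGAIN is 35 on FreeBSD but 11 on Linux, so a Linux binary must see 11 for the same underlying failure.

```python
# Real errno values (FreeBSD <sys/errno.h> vs. Linux on x86).
FREEBSD_TO_LINUX_ERRNO = {
    2: 2,     # ENOENT: same number on both systems
    13: 13,   # EACCES: same number on both systems
    35: 11,   # EAGAIN: 35 on FreeBSD, 11 on Linux
    78: 38,   # ENOSYS: 78 on FreeBSD, 38 on Linux
}


def native_open(path: str) -> int:
    """Stand-in for a native kernel function: 0 on success, else a FreeBSD errno."""
    return 0 if path == "/etc/group" else 2  # ENOENT


def native_read_nonblocking(fd: int) -> int:
    """Stand-in that always reports 'no data yet' (FreeBSD EAGAIN)."""
    return 35


def linux_translated(func):
    """Wrap a native function: same work, but the error is mapped for Linux."""
    def wrapper(*args):
        err = func(*args)
        return FREEBSD_TO_LINUX_ERRNO.get(err, err)
    return wrapper


# One table per binary type; both point at the same native implementation.
SYSCALL_TABLES = {
    "FreeBSD": {"open": native_open, "read": native_read_nonblocking},
    "Linux": {
        "open": linux_translated(native_open),
        "read": linux_translated(native_read_nonblocking),
    },
}


def syscall(abi: str, name: str, *args) -> int:
    """Dispatch a system call through the table that belongs to the binary."""
    return SYSCALL_TABLES[abi][name](*args)


print(syscall("FreeBSD", "read", 3))  # 35
print(syscall("Linux", "read", 3))    # 11
```

The actual work is done once, in the native function. The Linux entry only adds a cheap lookup afterwards, which is why most Linux system calls run at native speed.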

This means that a lot of Linux stuff in FreeBSD is handled at the same speed as if the Linux program were a FreeBSD program.

Linux file/directory tricks

When the kernel detects a Linux program, it also plays some tricks with files and directories. This is also a property of the above-mentioned table in the kernel, so theoretically the kernel could play tricks for FreeBSD programs too.

If a Linux program looks up a file or directory /A, the kernel first looks for /compat/linux/A, and only if it does not find it does it look for /A. This is important! For example, if you have an empty /compat/linux/home, any application which wants to display the contents of /home will show /compat/linux/home. As it is empty, you see nothing. If this application does not allow you to enter a directory manually via the keyboard, you have lost (ok, you can remove /compat/linux/home or fill it with what you want to have). If you can enter a directory via the keyboard, you could enter /home/yourlogin. The kernel then first looks for /compat/linux/home/yourlogin. As it cannot find it, it looks for /home/yourlogin (which we assume exists), and so the application displays the contents of your home directory.
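
The lookup order described above can be modelled in a few lines of Python. The set `fs` is a hypothetical stand-in for the file system and `linux_lookup` is a made-up name. The kernel does the real work inside its path lookup, and only for absolute paths of Linux binaries:

```python
from typing import Optional

LINUX_PREFIX = "/compat/linux"


def linux_lookup(path: str, existing: set) -> Optional[str]:
    """Resolve an absolute path for a Linux binary.

    The path below /compat/linux wins if it exists; otherwise the plain path
    is used. `existing` stands in for the file system.
    """
    shadow = LINUX_PREFIX + path
    if shadow in existing:
        return shadow
    if path in existing:
        return path
    return None


fs = {"/compat/linux/home", "/home", "/home/yourlogin"}  # /compat/linux/home is empty
print(linux_lookup("/home", fs))            # /compat/linux/home  (hides the real /home)
print(linux_lookup("/home/yourlogin", fs))  # /home/yourlogin     (fallback)
```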

This implies several things:

  • you can hide FreeBSD directory contents from Linux programs while still being able to access the content
  • “badly” programmed Linux applications (more correctly: Linux programs which make assumptions which do not hold in FreeBSD) can prevent you from accessing FreeBSD files, or files which are the same in Linux and FreeBSD (like /etc/group, which is not available in /compat/linux in the linux_base ports, so that the FreeBSD one is read)
  • you can have different files for Linux than for FreeBSD

The Linux userland

The linux_base port in FreeBSD comes from a plain installation of Linux packages. The difference is that some files are deleted. This is either because we cannot use them in the linuxulator, or because they already exist in the FreeBSD tree at the same place and we want the Linux programs to use the FreeBSD file (/etc/group and /etc/passwd come to mind). The installation also marks binary programs as Linux programs, so that the kernel knows which kernel table to consult for system calls and such. This is not really necessary for all binary programs, but it is harder to script the correct detection logic than to just “brand” all binary programs.

Additionally, some configuration is done to (hopefully) make it do the right thing out of the box. The complete setup of the linux_base ports is meant to let Linux programs integrate into FreeBSD. If you start acroread or skype, you do not want to have to configure some things in /compat/linux/etc/ first to have your fonts look the same and your user IDs resolved to names. (This does not work if you use LDAP, Kerberos or other directory services for the user/group ID management; you need to configure this yourself.) All this should just work, and the application windows should just pop up on your screen so that you can do what you want to do. Some linux_base ports also do not work on all FreeBSD releases. This can be because some kernel features which a linux_base port depends upon are not available (yet) in FreeBSD. Because of this you should not choose a linux_base port yourself. Just go and install the program from the Ports Collection and let it install the correct linux_base port automatically (a different FreeBSD release may have a different default linux_base port).

A note of caution: there are instructions out there which tell you how to install more recent linux_base ports into FreeBSD releases which do not have them as the default. You do this at your own risk; it may or may not work. It depends upon which programs you use and which versions of those programs you have (or, more technically, which kernel features they depend upon). If it does not work for you, you have just two possibilities. You can revert and forget about it, or you can update your FreeBSD version to a more recent one (but it could be the case that even the most recent development version of FreeBSD does not have support for what you need).

Linux libraries and “ELF file OS ABI invalid” error messages

Due to the file/directory tricks by the kernel explained above, you have to be careful with (additional) Linux libraries. When a Linux program needs some libraries, several directories (specified in /compat/linux/etc/) are searched. Let us assume that the configuration in /compat/linux/etc/ specifies to search in /A, /B and /C. This means the FreeBSD kernel first gets a request to open /A/libXYZ. Because of this it first tries /compat/linux/A/libXYZ, and if that does not exist it tries /A/libXYZ. When this fails too, the Linux runtime linker tries the next directory in the config. The kernel now looks for /compat/linux/B/libXYZ and, if that does not exist, for /B/libXYZ.

Now assume that libXYZ is in /compat/linux/C/ as a Linux library, and in /B as a FreeBSD library. This means that the kernel will first find the FreeBSD library /B/libXYZ. The Linux binary which needs it cannot do anything with this FreeBSD library, which depends upon the FreeBSD syscall table and FreeBSD symbols from e.g. libc. The Linux runtime linker will bail out because of this (it sees that the lib is not of the required type by reading its ELF header). Unfortunately the Linux runtime linker will not continue to search for another library with the same name in another directory. At least this was the case the last time I checked and modified the order in which the Linux runtime linker searches for libraries… this has been a while, so it may be smarter now. You will see the above error message if you started the Linux program in a terminal.
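
The following Python sketch replays the /A, /B, /C example to show why the Linux library in /compat/linux/C is never reached. The directory names come from the example above. `installed` is a hypothetical map from path to library ABI, and the “stop at the first hit” behavior is the linker behavior described above, which newer linkers may no longer share:

```python
LINUX_PREFIX = "/compat/linux"


def find_library(name: str, search_dirs: list, installed: dict):
    """Return (path, abi) of the first file the kernel hands to the linker.

    `installed` maps path -> ABI of the library ("Linux" or "FreeBSD").
    For each directory, the /compat/linux variant is tried first.
    """
    for directory in search_dirs:
        for candidate in (f"{LINUX_PREFIX}{directory}/{name}", f"{directory}/{name}"):
            if candidate in installed:
                return candidate, installed[candidate]
    return None


def load_library(name: str, search_dirs: list, installed: dict) -> str:
    hit = find_library(name, search_dirs, installed)
    if hit is None:
        raise FileNotFoundError(name)
    path, abi = hit
    if abi != "Linux":
        # The runtime linker gives up here; it does not go on to /C.
        raise RuntimeError(f"{path}: ELF file OS ABI invalid")
    return path


dirs = ["/A", "/B", "/C"]
broken = {"/compat/linux/C/libXYZ": "Linux", "/B/libXYZ": "FreeBSD"}
try:
    load_library("libXYZ", dirs, broken)
except RuntimeError as err:
    print(err)  # /B/libXYZ: ELF file OS ABI invalid

# Fix: put the Linux library where it is found before the FreeBSD one.
fixed = {**broken, "/compat/linux/B/libXYZ": "Linux"}
print(load_library("libXYZ", dirs, fixed))  # /compat/linux/B/libXYZ
```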

The bottom line of all this is: the error message about ELF file OS ABI invalid just means that the Linux program was not able to find the correct Linux library and got a FreeBSD library instead. Go, install the corresponding Linux library, and make sure the Linux program finds it instead of the FreeBSD library. If you make changes by hand instead of using a port, do not forget to run “/compat/linux/sbin/ldconfig -r /compat/linux”, else your changes may not be taken into account.

Constraints regarding chroot into /compat/linux

The linux_base ports are designed to give a nice install-and-start experience. The drawback is that there is not a full Linux system in /compat/linux, so doing a chroot into /compat/linux will cause trouble (depending on what you want to do). If you want to chroot into a Linux system on your FreeBSD machine, you had better install a linux_dist port. A linux_dist port can be installed in parallel to a linux_base port. Both are independent, so you need to redo/copy any configuration changes you want to have in both environments.

All internal services migrated to IPv6

In the last few days I migrated all my internal services to IPv6.

All my jails have an IPv4 and an IPv6 address now. All Apaches (I have one for my picture gallery, one for webmail, and one for internal management) now listen on the internal IPv6 address too. Squid is updated from 2.x to 3.1 (the most recent version in the Ports Collection), and I added some IPv6 ACLs. The internal Postfix is configured to handle IPv6 too. It delivers everything via an authenticated and encrypted channel to a machine with a static IPv4 address for final delivery. My MySQL does not need an IPv6 address, as it only listens for requests via IPC (the socket is hardlinked between jails). All ssh daemons are configured to listen on IPv6 too. The IMAP and CUPS servers picked up the new IPv6 addresses automatically. I also updated Samba to handle IPv6, but I cannot really test this. I lack a Windows machine which prefers IPv6 over IPv4 for CIFS access (at least I think my Windows XP netbook only tries IPv4 connections).

Only my Wii is a little bit behind, and I have not checked if my Sony TV will DTRT. For this I first have to find some time to check whether I need to update the DD-WRT firmware on the little WLAN router which is “extending the cable” from the TV to the internal network, and to look up how to configure IPv6 with DD-WRT.

ZFS and NFS / on-disk-cache

In the FreeBSD mailing lists I stumbled over a post which refers to a blog post describing why ZFS seems to be slow (on Solaris).

In short: ZFS guarantees that the NFS client does not experience silent corruption of data, i.e. loss of data after an NFS server crash although the server already told the client that the data is on disk. One recommendation is to enable the disk cache for disks which are completely used by ZFS, as ZFS (unlike UFS) is aware of disk caches. This increases the performance to what UFS delivers in the NFS case.

There is no in-depth description of what it means that ZFS is aware of disk caches. I think this refers to the fact that ZFS sends a flush command to the disk at the right moments. Leaving aside that some disks out there lie to you about this (they report the flush command as finished when it is not), this would mean that this is supported in FreeBSD too.

So everyone who is currently disabling the ZIL to get better NFS performance (and accepting silent data corruption on the client side): move your zpool to dedicated disks (no other real FS than ZFS; swap and dump devices are OK) which are honest about flushes, and enable the disk caches instead of disabling the ZIL.

I also recommend that people who already have ZFS on dedicated (and honest) disks check whether the disk caches are enabled.

IPv6 in my LAN

After enabling IPv6 in my WLAN router, I also enabled IPv6 on my FreeBSD systems. I have to say that the IPv6 chapter in the FreeBSD handbook does not contain as much information about this as I would like.

Configuring the interfaces of my two 9-current systems to also carry a specific IPv6 address (an easy one from the ULA I use) was easy after reading the man page for rc.conf. After a little bit of experimenting, it came down to:

ifconfig_rl0_ipv6="inet6 ::2:1 prefixlen 64 accept_rtadv"
ipv6_defaultrouter="<router address>"

Apart from this address (I chose it because the IPv4 address ends in “.2”; this way I can add some easy-to-remember addresses for this machine if needed), I also have two automatically configured addresses. One is from the same ULA, with a not-so-easy-to-remember end constructed from the MAC address. The other is from the official prefix which the router constructed out of the official IPv4 address from the ISP, with the same end as the first one.
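
The three kinds of addresses can be reproduced with Python’s standard `ipaddress` module. This is a sketch under explicit assumptions. The ULA prefix `fd00:1234:5678:1::/64`, the MAC `00:11:22:33:44:55` and the IPv4 address `192.0.2.1` are hypothetical placeholders. The automatic end is assumed to be the classic modified EUI-64 identifier (RFC 4291), which newer systems may replace with privacy or stable-random identifiers. The router-built prefix is assumed to be 6to4 (RFC 3056), because that is the scheme which derives an IPv6 prefix from an IPv4 address.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Modified EUI-64 interface identifier (RFC 4291, appendix A)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02                             # flip the universal/local bit
    eui = octets[:3] + b"\xff\xfe" + octets[3:]   # insert ff:fe in the middle
    return int.from_bytes(eui, "big")


def address_in(prefix: str, interface_id: int) -> ipaddress.IPv6Address:
    """Combine a network prefix with an interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if interface_id >= 2 ** (128 - net.prefixlen):
        raise ValueError("interface identifier does not fit into the prefix")
    return net.network_address + interface_id


def sixto4_prefix(ipv4: str) -> ipaddress.IPv6Network:
    """6to4 (RFC 3056): 2002::/16 followed by the 32-bit IPv4 address, a /48."""
    packed = b"\x20\x02" + ipaddress.IPv4Address(ipv4).packed + bytes(10)
    return ipaddress.IPv6Network((ipaddress.IPv6Address(packed), 48))


ula = "fd00:1234:5678:1::/64"  # hypothetical ULA prefix
print(address_in(ula, int(ipaddress.IPv6Address("::2:1"))))      # fd00:1234:5678:1::2:1
print(address_in(ula, eui64_interface_id("00:11:22:33:44:55")))  # fd00:1234:5678:1:211:22ff:fe33:4455
print(sixto4_prefix("192.0.2.1"))                               # 2002:c000:201::/48
```

The router would then pick a /64 out of the 6to4 /48 and the host would append the same EUI-64 end, which matches the “same end as the first one” observation.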

Additionally, all my jails on this machine have an IPv6 address now (yes, they are like “…:2:100”, with the :100 because the IPv4 address ends in “.100”). Still TODO is the conversion of all the services in the jails to also listen on the IPv6 address.
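
As I read the numbering scheme, “:2” comes from the machine’s “.2” and the last group is “1” for the machine itself or the jail’s last IPv4 octet. The digits are reused as text, so “:100” is really hexadecimal 0x100 (256). That is fine for a memory aid, because every decimal octet from 0 to 255 is also a valid IPv6 group. Here is a small Python sketch with a hypothetical prefix; `mnemonic_address` is a made-up helper:

```python
import ipaddress


def mnemonic_address(prefix: str, machine: str, suffix: str) -> ipaddress.IPv6Address:
    """Build <prefix>::<machine>:<suffix>, reusing decimal IPv4 digits as text.

    The digits are read as hexadecimal, so ".100" becomes ":100", which is
    0x100 (256), not 100: a memory aid, not a numeric conversion.
    """
    if not (machine.isdigit() and suffix.isdigit()):
        raise ValueError("expected decimal IPv4 octets, e.g. '2' or '100'")
    net = ipaddress.IPv6Network(prefix)
    interface_id = (int(machine, 16) << 16) | int(suffix, 16)
    return net.network_address + interface_id


ula = "fd00:1234:5678:1::/64"  # hypothetical ULA prefix
print(mnemonic_address(ula, "2", "1"))    # fd00:1234:5678:1::2:1   (the machine itself)
print(mnemonic_address(ula, "2", "100"))  # fd00:1234:5678:1::2:100 (jail with IPv4 x.x.x.100)
```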

I already changed the config of my internal DNS to contain the IPv6 addresses of all systems and to listen on the IPv6 address. However, when I add an IPv6 network to allow-query/allow-query-cache/allow-recursion, bind does not want to start. And while I was there, I also enabled DNSSEC verification. I get a lot of error messages in the logs, though: “unable to convert errno to isc_result: 42: Protocol not available”. One search result which talks about exactly this error says it is a “cosmetic error”…

I noticed that an IPv6 ping between two physical machines takes a little bit more time than an IPv4 ping (no IPsec enabled). It surprised me that the difference is so noticeable (not within the std-dev at all):

--- m87.Leidinger.net ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.168/0.193/0.220/0.017 ms

--- m87.Leidinger.net ping6 statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.207/0.325/0.370/0.047 ms
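
The “not within the std-dev” impression can be put into numbers. The Python sketch below parses the two summary lines and compares the mean round-trip times with Welch’s t statistic, which divides the difference of the means by its standard error. Welch’s t is my choice of test, not something from the measurements themselves. The sketch also treats ping’s standard deviation as a sample standard deviation of the 10 packets, so read the result as a rough estimate:

```python
import math


def parse_rtt(line: str) -> tuple:
    """Parse 'round-trip min/avg/max/stddev = a/b/c/d ms' (ping or ping6)."""
    values = line.split("=", 1)[1].replace("ms", "").strip()
    rtt_min, rtt_avg, rtt_max, rtt_sd = (float(v) for v in values.split("/"))
    return rtt_min, rtt_avg, rtt_max, rtt_sd


n = 10  # packets per run, from the statistics above
_, avg4, _, sd4 = parse_rtt("round-trip min/avg/max/stddev = 0.168/0.193/0.220/0.017 ms")
_, avg6, _, sd6 = parse_rtt("round-trip min/avg/max/std-dev = 0.207/0.325/0.370/0.047 ms")

diff = avg6 - avg4
# Welch's t statistic: difference of the means over its standard error.
t = diff / math.sqrt(sd4**2 / n + sd6**2 / n)
print(f"IPv6 is {diff:.3f} ms slower on average ({avg6 / avg4:.2f}x), t = {t:.1f}")
# IPv6 is 0.132 ms slower on average (1.68x), t = 8.4
```

A t of about 8 with 10 samples per run means the gap is very unlikely to be measurement noise.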

The information I miss in the IPv6 chapter of the FreeBSD handbook is what those other IPv6-related services are and when/how to configure them. I have an idea now what this radvd is, but I am not sure how it interacts with the accept_rtadv setting for ifconfig (and I do not think I need it, as my WLAN router seems to do it already). I know that ndp(8) displays the IPv6-friendly network neighborhood. I have not looked at enabling IPv6 multicast support in FreeBSD, and I do not know what those other IPv6 options for rc.conf do.