Ker­nel fea­tures patch­set (from GSoC 2010)

I am playing around with the patchset “my” student produced during this year's GSoC (the code for all projects is available from Google). In short, it gives you the possibility to query from userland which optional kernel features are available. I mostly had him cover those features which are not so easy to detect from userland, or where the detection could trigger an autoload of a kernel module.

I let the output speak for itself. First, the output before his patchset:

kern.features.compat_freebsd7: 1
kern.features.compat_freebsd6: 1
kern.features.posix_shm: 1

And now with his patchset:

kern.features.compat_freebsd6: 1
kern.features.compat_freebsd7: 1
kern.features.ffs_snapshot: 1
kern.features.geom_label: 1
kern.features.geom_mirror: 1
kern.features.geom_part_bsd: 1
kern.features.geom_part_ebr: 1
kern.features.geom_part_ebr_compat: 1
kern.features.geom_part_mbr: 1
kern.features.geom_vol: 1
kern.features.invariant_support: 1
kern.features.kdtrace_hooks: 1
kern.features.kposix_priority_scheduling: 1
kern.features.ktrace: 1
kern.features.nfsclient: 1
kern.features.nfsserver: 1
kern.features.posix_shm: 1
kern.features.pps_sync: 1
kern.features.quota: 1
kern.features.scbus: 1
kern.features.softupdates: 1
kern.features.stack: 1
kern.features.sysv_msg: 1
kern.features.sysv_sem: 1
kern.features.sysv_shm: 1
kern.features.ufs_acl: 1

With his patches we have a total of 84 kernel features which can be queried (obviously I do not have all options enabled in the kernel which produced this output). All of the features also have a description, and it is easy to add more features. As an example, this is all that is necessary to produce the kern.features.stack output:

./kern/subr_stack.c:FEATURE(stack, "Support for capturing kernel stack");

There is also a little userland application (and a library interface) which allows scripts and applications to query several features, with the possibility to pretend a feature is not there (this was a requirement for ports). Pretending a feature is there when it is not was ruled out: such run-time detection is only necessary for things which have to run soon, and pretending some feature is there while it is not will cause big problems. Unfortunately the man page for the application is not yet ready, but I’m sure you can figure out how to use it.
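The userland tool itself is not shown here, so the following Python is only a sketch of the idea and not its real interface. It parses output in the style of the kern.features listing above and answers queries, while letting the caller pretend that a feature is absent. The function names and the `hidden` parameter are assumptions made for illustration.

```python
def parse_features(sysctl_output):
    """Map feature name -> bool from 'kern.features.<name>: <value>' lines."""
    prefix = "kern.features."
    features = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name.startswith(prefix):
            continue  # not a kern.features line
        features[name[len(prefix):]] = value.strip() != "0"
    return features


def has_feature(features, name, hidden=()):
    """True if the feature is present and the caller does not hide it.

    'hidden' can only mask features; it can never make a missing one appear.
    """
    return name not in hidden and features.get(name, False)


sample = """kern.features.compat_freebsd7: 1
kern.features.posix_shm: 1
kern.features.ufs_acl: 1"""

features = parse_features(sample)
print(has_feature(features, "ufs_acl"))                      # present
print(has_feature(features, "ufs_acl", hidden={"ufs_acl"}))  # masked
print(has_feature(features, "quota"))                        # not in sample
```

Masking is deliberately one-way here, which mirrors the design decision described above: a caller may hide a feature, but cannot claim one the kernel does not report.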

The names and descriptions of the features follow an easy scheme: what is written down in NOTES is used as the name and the description of the feature. An exception is geom_part_X, where we decided to use a common theme (“GEOM partitioning class for XXX”) which is distinct from the corresponding geom_X class. If you have complaints about what is used for a specific feature, do not complain to him: change it in NOTES and the feature will follow.

If you have questions, suggestions, or some other reason to contact him, his FreeBSD address is kibab@. Feel free to encourage him to go ahead with the next steps (finishing the man page, splitting up the patches into sensible pieces and presenting them on the appropriate mailing lists for review). 🙂

The FreeBSD-linuxulator explained (for users)

After another mail where I explained a little bit of the linuxulator behavior, it is time to try to write an easy text which I can reference in future answers. If someone wants to add parts of this explanation to the FreeBSD handbook, go ahead.

Lin­ux emu­la­tion? No, “native” exe­cu­tion (sort of)!

First, the linuxulator is not an emulation. It is “just” a binary interface which is a little bit different from the FreeBSD-“native” one. The binary files in FreeBSD and Linux are both files which comply with the ELF specification.

When the FreeBSD kernel loads an ELF file, it checks whether it is a FreeBSD ELF file or a Linux ELF file (or some other flavor it knows about). Based upon this it looks up the appropriate actions for this binary in a table (it can also differentiate between 64-bit and 32-bit, and probably other things too).

The FreeBSD table is always compiled in (for a better big picture: at least on an AMD/Intel 64-bit platform it is also possible to additionally include a 32-bit version of this table, to be able to execute 32-bit programs on 64-bit systems). Other tables, like the Linux one, can be loaded into the kernel additionally (or built statically into the kernel, if desired).

Those tables contain some parameters and pointers which allow the kernel to execute the binary. If a program makes a system call, the kernel looks up the correct function inside this table. It does this for FreeBSD binaries and for Linux binaries alike. This means that there is no emulation/simulation (overhead) going on… at least ideally. Some behavior differs a little between Linux and FreeBSD, so a little bit of translation/house-keeping has to go on for some Linux system calls before they reach the underlying FreeBSD kernel functions.

This means that a lot of Linux stuff in FreeBSD is handled at the same speed as if the Linux program were a FreeBSD program.
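As a rough mental model of the table lookup described above, here is a toy Python sketch. It is not kernel code: the table layout, the handler names and the syscall numbers are placeholders chosen for illustration. The point is only that both flavors of binary go through the same kind of lookup, and that the Linux entry is a thin wrapper around the same underlying function.

```python
def freebsd_getpid(proc):
    return proc["pid"]


def linux_getpid(proc):
    # Same kernel function underneath; a real wrapper would only add
    # translation where Linux and FreeBSD semantics differ.
    return freebsd_getpid(proc)


# One table per binary flavor. The numbers are illustrative placeholders.
SYSCALL_TABLES = {
    "freebsd": {20: freebsd_getpid},
    "linux": {39: linux_getpid},
}


def syscall(proc, number, *args):
    """Dispatch through the table selected when the binary was loaded."""
    table = SYSCALL_TABLES[proc["abi"]]
    handler = table.get(number)
    if handler is None:
        raise NotImplementedError(f"{proc['abi']} syscall {number}")
    return handler(proc, *args)


native = {"abi": "freebsd", "pid": 100}
linux = {"abi": "linux", "pid": 101}
print(syscall(native, 20), syscall(linux, 39))
```

A syscall number that is missing from the selected table fails in this sketch, which loosely corresponds to a Linux program depending on a kernel feature the linuxulator does not provide.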

Lin­ux file/directory tricks

When the kernel detects a Linux program, it also plays some tricks with files and directories (this is also a property of the above-mentioned table in the kernel, so theoretically the kernel could play tricks for FreeBSD programs too).

If a Linux program looks up a file or directory /A, the kernel will first look for /compat/linux/A, and only if it does not find it will it look for /A. This is important! For example, if you have an empty /compat/linux/home, any application which wants to display the contents of /home will show /compat/linux/home. As it is empty, you see nothing. If this application does not allow you to enter a directory manually via the keyboard, you have lost (OK, you can remove /compat/linux/home or fill it with what you want to have). If you can enter a directory via the keyboard, you could enter /home/yourlogin. The kernel would first look for /compat/linux/home/yourlogin and, as it cannot find it, then look for /home/yourlogin (which we assume is there), and so the application would display the contents of your home directory.

This implies sev­er­al things:

  • you can hide FreeB­SD direc­to­ry con­tents from Lin­ux pro­grams while still being able to access the content
  • “badly” programmed Linux applications (more correctly: Linux programs which make assumptions which do not hold in FreeBSD) can prevent you from accessing FreeBSD files, or files which are the same in Linux and FreeBSD (like /etc/group, which is not available in /compat/linux in the linux_base ports, so that the FreeBSD one is read)
  • you can have dif­fer­ent files for Lin­ux than for FreeBSD
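The lookup order can be sketched in a few lines of Python. This is a simplified model, not the kernel implementation: it only handles absolute paths, and the `exists` callable and the pretend file system are assumptions made so the example runs anywhere.

```python
import posixpath

LINUX_ROOT = "/compat/linux"


def linux_lookup(path, exists):
    """Return the path a Linux program ends up with, or None.

    'exists' is any callable that reports whether a path is present,
    for example os.path.exists on a real system.
    """
    shadow = posixpath.join(LINUX_ROOT, path.lstrip("/"))
    if exists(shadow):
        return shadow
    if exists(path):
        return path
    return None


# Pretend file system: an empty /compat/linux/home hides /home itself,
# but not the entries below it.
fake_fs = {"/compat/linux/home", "/home", "/home/yourlogin", "/etc/group"}
exists = fake_fs.__contains__

print(linux_lookup("/home", exists))
print(linux_lookup("/home/yourlogin", exists))
print(linux_lookup("/etc/group", exists))
```

The first lookup lands in the empty shadow directory, while the other two fall through to the FreeBSD files, which is exactly the /home and /etc/group behavior described above.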

The Lin­ux userland

The linux_base port in FreeBSD comes from a plain installation of Linux packages. The difference is that some files are deleted, either because we cannot use them in the linuxulator, or because they already exist in the FreeBSD tree at the same place and we want the Linux programs to use the FreeBSD file (/etc/group and /etc/passwd come to mind). The installation also marks binary programs as Linux programs, so that the kernel knows which kernel table to consult for system calls and such (this is not really necessary for all binary programs, but it is harder to script the correct detection logic than to just “brand” all binary programs).

Additionally some configuration is done to (hopefully) make it do the right thing out of the box. The complete setup of the linux_base ports is done to let Linux programs integrate into FreeBSD. This means that if you start acroread or skype, you do not have to configure some things in /compat/linux/etc/ first to have your fonts look the same and your user IDs resolved to names (this does not work if you use LDAP, Kerberos or other directory services for the user/group ID management; you need to configure this yourself). All this should just work, and the application windows shall just pop up on your screen so that you can do what you want to do.

Some linux_base ports also do not work on all FreeBSD releases. This can be because some kernel feature which such a linux_base port depends upon is not available (yet) in FreeBSD. Because of this you should not choose a linux_base port yourself. Just install the program from the Ports Collection and let it install the correct linux_base port automatically (a different FreeBSD release may have a different default linux_base port).

A note of caution: there are instructions out there which tell how to install more recent linux_base ports into FreeBSD releases which do not have them as default. You do this at your own risk; it may or may not work. It depends upon which programs you use and at which version those programs are (or more technically, which kernel features they depend upon). If it does not work for you, you have just two possibilities: revert and forget about it, or update your FreeBSD version to a more recent one (but it could be the case that even the most recent development version of FreeBSD does not have support for what you need).

Lin­ux libraries and “ELF file OS ABI invalid”-error messages

Due to the file/directory tricks of the kernel explained above, you have to be careful with (additional) Linux libraries. When a Linux program needs some libraries, several directories (specified in /compat/linux/etc/ld.so.conf) are searched. Let us assume that /compat/linux/etc/ld.so.conf specifies to search in /A, /B and /C. This means the FreeBSD kernel first gets a request to open /A/libXYZ. Because of this it first tries /compat/linux/A/libXYZ, and if that does not exist it tries /A/libXYZ. When this fails too, the Linux runtime linker tries the next directory in the config, so that the kernel now looks for /compat/linux/B/libXYZ and, if that does not exist, for /B/libXYZ.

Now assume that libXYZ is in /compat/linux/C/ as a Linux library, and in /B as a FreeBSD library. This means that the kernel will first find the FreeBSD library /B/libXYZ. The Linux binary which needs it cannot do anything with this FreeBSD library (which depends upon the FreeBSD syscall table and FreeBSD symbols from e.g. libc), and the Linux runtime linker will bail out because of this (actually it sees that the lib is not of the required type by reading its ELF header). Unfortunately the Linux runtime linker will not continue to search for another library with the same name in another directory (at least this was the case the last time I checked and modified the order in which the Linux runtime linker searches for libraries… this was a while ago, so it may be smarter now), and you will see the above error message (if you started the Linux program in a terminal).
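The interplay between the linker's directory search and the kernel's path trick is easier to see when simulated. The following Python sketch models both sides with a dictionary as a pretend file system; the function names and the layout are assumptions made for illustration, and the real runtime linker is of course more involved.

```python
LINUX_ROOT = "/compat/linux"


def kernel_open(path, files):
    """Kernel side: try the /compat/linux copy first, then the plain path."""
    for candidate in (LINUX_ROOT + path, path):
        if candidate in files:
            return candidate
    return None


def find_library(name, search_dirs, files):
    """Linker side: take the first directory where the open succeeds.

    Returns (resolved_path, flavor). The search stops at the first hit,
    even if that hit is a FreeBSD library the Linux program cannot use.
    """
    for directory in search_dirs:
        found = kernel_open(f"{directory}/{name}", files)
        if found is not None:
            return found, files[found]
    return None, None


# Hypothetical layout from the text: Linux libXYZ in /compat/linux/C,
# FreeBSD libXYZ in /B.
files = {"/compat/linux/C/libXYZ": "linux", "/B/libXYZ": "freebsd"}
path, flavor = find_library("libXYZ", ["/A", "/B", "/C"], files)
print(path, flavor)
```

With the search order /A, /B, /C the FreeBSD library in /B wins; searching /C before /B in the same sketch finds the Linux library instead, which is why the order of directories matters.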

The bottom line of all this is: the error message about “ELF file OS ABI invalid” just means that the Linux program was not able to find the correct Linux library and got a FreeBSD library instead. Go, install the corresponding Linux library, and make sure the Linux program can find it instead of the FreeBSD library (do not forget to run “/compat/linux/sbin/ldconfig -r /compat/linux” if you make changes by hand instead of using a port, else your changes may not be taken into account).
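To check which flavor a suspicious library is, one option is to look at the OS ABI byte in its ELF header. The Python sketch below reads that byte (offset 7 of the identification block, with 9 meaning FreeBSD and 3 meaning Linux/GNU in the ELF specification). It is a diagnostic aid only, not what the kernel or the runtime linker does in full, and Linux files may carry 0 here instead of 3, so treat the result as a hint. The helper names are made up for this example.

```python
ELF_MAGIC = b"\x7fELF"
EI_OSABI = 7  # offset of the OS ABI byte inside e_ident

OSABI_NAMES = {0: "SYSV/none", 3: "Linux (GNU)", 9: "FreeBSD"}


def elf_osabi(header):
    """Return a readable OS ABI name for the first bytes of an ELF file."""
    if len(header) <= EI_OSABI or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    value = header[EI_OSABI]
    return OSABI_NAMES.get(value, f"unknown ({value})")


def library_osabi(path):
    """Read just enough of a file on disk to classify it."""
    with open(path, "rb") as handle:
        return elf_osabi(handle.read(16))


# Hand-built 16-byte e_ident blocks: 64-bit, little-endian, version 1.
freebsd_ident = ELF_MAGIC + bytes([2, 1, 1, 9]) + bytes(8)
linux_ident = ELF_MAGIC + bytes([2, 1, 1, 3]) + bytes(8)
print(elf_osabi(freebsd_ident), "/", elf_osabi(linux_ident))
```

On a real system, `library_osabi()` could be pointed at the library named in the error message to see whether a FreeBSD file was picked up.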

Con­straints regard­ing chroot into /compat/linux

The linux_base ports are designed to give a nice install-and-start experience. The drawback of this is that there is not a full Linux system in /compat/linux, so doing a chroot into /compat/linux will cause trouble (depending on what you want to do). If you want to chroot into a Linux system on your FreeBSD machine, you had better install a linux_dist port. A linux_dist port can be installed in parallel to a linux_base port. Both of them are independent, and as such you need to redo/copy configuration changes you want to have in both environments.

All inter­nal ser­vices migrat­ed to IPv6

Over the last few days I migrated all my internal services to IPv6.

All my jails have an IPv4 and an IPv6 address now. All Apaches (I have one for my picture gallery, one for webmail, and one for internal management) now listen on the internal IPv6 address too. Squid is updated from 2.x to 3.1 (the most recent version in the Ports Collection) and I added some IPv6 ACLs. The internal Postfix is configured to handle IPv6 too (it delivers everything via an authenticated and encrypted channel to a machine with a static IPv4 address for final delivery). My MySQL does not need an IPv6 address, as it only listens to requests via IPC (the socket is hardlinked between jails). All ssh daemons are configured to listen on IPv6 too. The IMAP and CUPS servers picked up the new IPv6 addresses automatically. I also updated Samba to handle IPv6, but for lack of a Windows machine which prefers IPv6 over IPv4 for CIFS access (at least I think my Windows XP netbook only tries IPv4 connections) I cannot really test this.

Only my Wii is a little bit behind, and I have not checked if my Sony TV will DTRT (for this I first have to find some time to check whether I have to update the DD-WRT firmware on the little WLAN router which is “extending the cable” from the TV to the internal network, and I have to look up how to configure IPv6 with DD-WRT).

ZFS and NFS / on-disk-cache

On the FreeBSD mailing lists I stumbled over a post which refers to a blog post which describes why ZFS seems to be slow (on Solaris).

In short: ZFS guar­an­tees that the NFS client does not expe­ri­ence silent cor­rup­tion of data (NFS serv­er crash and loss of data which is sup­posed to be already on disk for the client). A rec­om­men­da­tion is to enable the disk-cache for disks which are com­plete­ly used by ZFS, as ZFS (unlike UFS) is aware of disk-caches. This increas­es the per­for­mance to what UFS is deliv­er­ing in the NFS case.

There is no in-depth description of what it means that ZFS is aware of disk-caches, but I think this is a reference to the fact that ZFS sends a flush command to the disk at the right moments. Leaving aside the fact that there are disks out there which lie to you about this (they report the flush command as finished when it is not), this would mean that this is supported in FreeBSD too.

So everyone who is currently disabling the ZIL to get better NFS performance (and accepting silent data corruption on the client side): move your zpool to dedicated disks (no real FS other than ZFS; swap and dump devices are OK), honest ones, and enable the disk-caches instead of disabling the ZIL.

I also recommend that people who already have ZFS on dedicated (and honest) disks check whether the disk-caches are enabled.

IPv6 in my LAN

After enabling IPv6 in my WLAN router, I also enabled IPv6 in my FreeBSD systems. I have to say that the IPv6 chapter in the FreeBSD handbook does not contain as much information about this as I would like.

Con­fig­ur­ing the inter­faces of my two 9‑current sys­tems to also car­ry a spe­cif­ic IPv6 address (an easy one from the ULA I use) was easy after read­ing the man-page for rc.conf. After a lit­tle bit of exper­i­ment­ing it came down to:

ifconfig_rl0_ipv6="inet6 ::2:1 prefixlen 64 accept_rtadv"
ipv6_defaultrouter="<router address>"

Apart from this address (I chose it because the IPv4 address ends in “.2”; this way I can add some easy-to-remember addresses for this machine if needed), I also have two automatically configured addresses. One is in the same ULA with a not-so-easy-to-remember suffix (constructed from the MAC address), and one is in the official prefix which the router constructed out of the official IPv4 address from the ISP (plus the same MAC-derived suffix as the other one).
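For the curious, the “constructed from the MAC address” part usually follows the modified EUI-64 scheme: flip the universal/local bit of the first MAC byte and insert ff:fe in the middle. The Python sketch below assumes that this scheme is what is in use here; the prefix and the MAC address are made up, and the function name is only for this example.

```python
import ipaddress


def eui64_address(prefix, mac):
    """Combine a /64 prefix with a modified EUI-64 interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02                      # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    interface_id = int.from_bytes(bytes(eui64), "big")
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# Made-up ULA prefix and MAC address, purely for illustration.
print(eui64_address("fd00:1234:5678:1::/64", "00:11:22:33:44:55"))
```

Since the suffix depends only on the MAC address, the same 64 bits show up under both the ULA prefix and the official prefix, which matches the two automatically configured addresses described above.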

Addi­tion­al­ly I also have all my jails on this machine with an IPv6 address now (yes, they are like “…:2:100” with the :100 because the IPv4 address ends in “.100”). Still TODO is the con­ver­sion of all the ser­vices in the jails to also lis­ten on the IPv6 address.

I already changed the config of my internal DNS to have the IPv6 addresses for all systems and to listen on the IPv6 address (but when I add an IPv6 network to allow-query/allow-query-cache/allow-recursion, bind does not want to start). And as I was there, I also enabled the DNSSEC verification (but I get a lot of error messages in the logs: “unable to convert errno to isc_result: 42: Protocol not available”; one search result which talks about exactly this error says it is a “cosmetic error”…).

I noticed that an IPv6 ping between two phys­i­cal machines takes a lit­tle bit more time than an IPv4 ping (no IPsec enabled). It sur­prised me that this is such a notice­able dif­fer­ence (not with­in the std-dev at all):

--- m87.Leidinger.net ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.168/0.193/0.220/0.017 ms

--- m87.Leidinger.net ping6 statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.207/0.325/0.370/0.047 ms
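To put a number on “not within the std-dev at all”, one rough option is Welch's t statistic computed from the summary values in the output above. This is only a back-of-the-envelope sketch: ten packets per run is a small sample, and the ping tools' stddev may not be the sample standard deviation the formula strictly expects.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for two samples given only summary values."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


# avg and stddev in ms, 10 packets each, taken from the output above.
ping4 = (0.193, 0.017, 10)
ping6 = (0.325, 0.047, 10)

t = welch_t(*ping4, *ping6)
print(f"difference: {ping6[0] - ping4[0]:.3f} ms, t = {t:.1f}")
```

The statistic comes out a bit above 8, so under these assumptions the gap between the two averages is far larger than the measured jitter would explain.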

The information I miss in the IPv6 chapter of the FreeBSD handbook is what those other IPv6-related services are and when/how to configure them. I have an idea now what radvd is, but I am not sure how it interacts with the accept_rtadv setting for ifconfig (and I do not think I need it, as my WLAN router seems to do this already). I know that I can display the IPv6 network neighborhood with ndp(8). I have not looked at enabling IPv6 multicast support in FreeBSD, and I do not know what the other IPv6 options for rc.conf do.