Stat­ic DTrace probes for the linuxu­lat­or up­dated

I found a little bit of time to update my three-year-old work of adding static DTrace probes to the linuxulator.

The changes are not in HEAD, but in my linuxulator-​dtrace branch. The re­vi­sion to have a look at is r230910. In­cluded are some DTrace scripts:

  • script to check in­tern­al locks
  • script to trace fu­texes
  • script to gen­er­ate stats for DTra­ci­fied linuxu­lat­or parts
  • script to check for er­rors:
    • emu­la­tion er­rors (un­sup­por­ted stuff, un­known stuff, …)
    • ker­nel er­rors (re­source short­age, …)
    • pro­gram­ming er­rors (er­rors which can hap­pen if someone made a mis­take, but should not hap­pen)

The programming-error checks give hints about userland programming errors, or about the reason for an error return value caused by a resource shortage or perhaps a wrong combination of parameters. An example error message for this case is “Application %s issued a sysctl which failed the length restrictions.\nThe length passed is %d, the min length supported is 1 and the max length supported is %d.\n”.

The stats-​script (tailored spe­cially to the linuxu­lat­or, but this can eas­ily be ex­ten­ded to the rest of the ker­nel) can re­port about:

  • number of calls to a kernel function per executable binary (not per PID!): this shows where an optimization would be beneficial for a given application
  • graph of CPU time spent in kernel functions per executable binary: together with the number of calls to this function, this helps to determine if a kernel optimization would be beneficial / is possible for a given application
  • graph of longest run­ning (CPU-​time!) ker­nel func­tion in total
  • tim­ing stat­ist­ics for the emul_​lock
  • graph of longest held (CPU-​time!) locks

Un­for­tu­nately this can not be com­mit­ted to HEAD as-​is. The DTrace SDT pro­vider can not handle probes which are ad­ded to the ker­nel after the SDT pro­vider is already loaded. This means that you either have to com­pile the linuxu­lat­or stat­ic­ally in­to the ker­nel, or you have to load the SDT ker­nel mod­ule after the linuxu­lat­or mod­ule is loaded. If you do not re­spect this, you get a ker­nel pan­ic on first ac­cess of one of the pro­viders in the linuxu­lat­or (AFAIR this in­cludes list­ing the probes avail­able in the ker­nel).

The FreeBSD-​linuxulator ex­plained (for de­velopers): ba­sics

The last post about the Linuxulator, where I explained the Linuxulator from a user's point of view, got a good amount of attention. Triggered by a recent explanation of the Linuxulator errno stuff to a fellow FreeBSD developer, I decided to see if more developers are interested in some more info too…

The sy­scall vec­tor

In sys/linux/linux_sysvec.c is all the basic setup to handle Linux “system stuff” in FreeBSD. The “system stuff” is about translating FreeBSD errnos to Linux errnos, about translating FreeBSD signals to Linux signals, about handling Linux traps, and about setting up the FreeBSD system vector (the kernel structure which contains all the data needed to identify when a Linux program is called and to look up the right kernel functions for e.g. syscalls and ioctls).

There is not just one syscall vector: there is one for a.out binaries (struct sysentvec linux_sysvec) and one for ELF binaries (struct sysentvec elf_linux_sysvec), at least on i386. For other architectures it may not make sense to have the a.out stuff, as they may never have seen any a.out Linux binary.

The ELF AUX args

When an ELF image is executed, the Linuxulator adds some runtime information (like pagesize, uid, gid, …) so that the userland can easily query this information, which is not static at build-time. This is handled in the elf_linux_fixup() function. If you see some error messages about missing ELF notes from e.g. glibc, this is the place to add this information. It would not be bad to have a look from time to time at what Linux is providing and to add missing pieces there. FreeBSD does not have an automated way of doing this, and I am not aware of anyone who regularly checks this. There is a little bit more info about ELF notes available in a message to one of the FreeBSD mailing lists; it also has an example of how to read out this data.
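To illustrate what "querying this information from userland" can look like, here is a minimal sketch of a Linux program reading a few auxiliary vector entries. It assumes a glibc userland (getauxval() is available since glibc 2.16); the helper names aux_page_size() and print_aux_info() are made up for this example and are not taken from the mailing list message mentioned above.

```c
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval() and the AT_* keys; glibc 2.16 or later */

/* Return the page size the kernel reported in the auxiliary vector. */
unsigned long aux_page_size(void)
{
    return getauxval(AT_PAGESZ);
}

/* Print a few of the runtime values mentioned in the text. */
void print_aux_info(void)
{
    printf("page size: %lu\n", aux_page_size());
    printf("uid:       %lu\n", getauxval(AT_UID));
    printf("gid:       %lu\n", getauxval(AT_GID));
}
```

If such a program runs under the Linuxulator, the values it sees are the ones the Linuxulator placed there when the image was executed.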


Linux and FreeBSD do not share the same point of view how a trap shall be handled (SIGBUS or SIGSEGV), the cor­res­pond­ing de­cision mak­ing is handled in translate_​traps() and a trans­la­tion table is avail­able as _​bsd_​to_​linux_​trapcode.


The val­ues for the sig­nal names are not the same in FreeBSD and Linux. The trans­la­tion tables are called linux_​to_​bsd_​signal and bsd_​to_​linux_​signal. The trans­la­tion is a fea­ture of the sy­scall vec­tor (= auto­mat­ic).


The val­ues for the er­rno names are not the same in FreeBSD and Linux. The trans­la­tion table is called bsd_​to_​linux_​errno. Re­turn­ing an er­rno in one of the Linux sy­scalls will trig­ger an auto­mat­ic trans­la­tion from the FreeBSD er­rno value to the Linux er­rno value. This means that FreeBSD er­rnos have to be re­turned (e.g. FreeBSD ENOSYS=78) and the Linux pro­gram will re­ceive the Linux value (e.g. Linux ENOSYS=38, and as the Linux ker­nel re­turns neg­at­ive er­rnos, the linux pro­gram will get -38).

If you see some­where an “-ESOMETHING” in the Linuxu­lat­or code, this is either a bug, or some clever/​tricky/​dangerous use of the sign-​bit to en­code some info (e.g. in the fu­tex code there is a func­tion which re­turns -ENOSYS, but the sign-​bit is used as an er­ror in­dic­at­or and the call­ing code is re­spons­ible to trans­late neg­at­ive er­rnos in­to pos­it­ive ones).
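The errno convention described above can be sketched in a few lines of C. This is only an illustration of the idea, not the kernel code: the table name, its two entries and the function linux_syscall_return() are made up for this example. The ENOSYS values are the ones quoted above; the EAGAIN pair (35 on FreeBSD, 11 on Linux) is added from memory as a second example, and every value not listed is simply assumed to be identical on both systems.

```c
#include <stddef.h>

/* FreeBSD errno values used in this sketch. */
#define BSD_EAGAIN   35
#define BSD_ENOSYS   78

/* Sparse sketch of a FreeBSD -> Linux errno table; 0 means "not listed". */
static const int bsd_to_linux_errno_sketch[] = {
    [BSD_EAGAIN] = 11,   /* Linux EAGAIN */
    [BSD_ENOSYS] = 38,   /* Linux ENOSYS */
};

/*
 * A Linuxulator syscall returns a positive FreeBSD errno (or 0).
 * The Linux program receives the negated Linux errno.
 */
int linux_syscall_return(int bsd_errno)
{
    size_t n = sizeof(bsd_to_linux_errno_sketch) /
        sizeof(bsd_to_linux_errno_sketch[0]);
    int linux_errno = bsd_errno;    /* assumption: same value if not listed */

    if (bsd_errno == 0)
        return 0;
    if (bsd_errno > 0 && (size_t)bsd_errno < n &&
        bsd_to_linux_errno_sketch[bsd_errno] != 0)
        linux_errno = bsd_to_linux_errno_sketch[bsd_errno];
    return -linux_errno;
}
```

The point of the sketch is the direction of the data flow: the syscall implementation only ever deals with positive FreeBSD values, and the negation plus translation happens in one place on the way out.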


The Linux syscalls are defined similarly to the FreeBSD ones. There is a mapping table (sys/linux/syscalls.master) between syscall numbers and the corresponding functions. This table is used to generate code (“make sysent” in sys//linux/) which does what is necessary.

The FreeBSD-​linuxulator ex­plained (for users)

After an­oth­er mail where I ex­plained a little bit of the linuxu­lat­or be­ha­vi­or, it is time to try to make an easy text which I can ref­er­ence in fu­ture an­swers. If someone wants to add parts of this ex­plan­a­tion to the FreeBSD hand­book, go ahead.

Linux emu­la­tion? No, “nat­ive” ex­e­cu­tion (sort of)!

First, the linuxulator is not an emulation. It is “just” a binary interface which is a little bit different from the FreeBSD-“native” one. The binary files in FreeBSD and Linux are both files which comply with the ELF specification.

When the FreeBSD ker­nel loads an ELF file, it looks if it is a FreeBSD ELF file or a Linux ELF file (or some oth­er fla­vor it knows about). Based upon this it looks up ap­pro­pri­ate ac­tions in a table for this bin­ary (it can also dif­fer­en­ti­ate between 64-​bit and 32-​bit, and prob­ably oth­er things too).

The FreeBSD table is always compiled in (for a better big picture: at least on an AMD/Intel 64-bit platform there is also the possibility to additionally include a 32-bit version of this table, to be able to execute 32-bit programs on 64-bit systems), and other ones like the Linux one can be loaded additionally into the kernel (or built statically into the kernel, if desired).

Those tables contain some parameters and pointers which make it possible to execute the binary. If a program is making a system call, the kernel will look up the correct function inside this table. It will do this for FreeBSD binaries, and for Linux binaries. This means that there is no emulation/simulation (overhead) going on… at least ideally. Some behavior is a little bit different between Linux and FreeBSD, so a little bit of translation/house-keeping has to go on for some Linux system calls before the underlying FreeBSD kernel functions are called.

This means that a lot of Linux stuff in FreeBSD is handled at the same speed as if this Linux pro­gram would be a FreeBSD pro­gram.
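The table lookup described above can be sketched as follows. Everything in this block is hypothetical and heavily simplified: the struct, the two "kernel operations" and the syscall numbers are invented for the illustration and do not correspond to the real struct sysentvec or to real syscall numbers. It only shows the principle: one table per ABI, the same dispatch code for both, and a thin wrapper where the Linux semantics differ.

```c
#include <stddef.h>

typedef int (*syscall_fn)(int arg);

/* Very reduced stand-in for a per-ABI system vector. */
struct abi_sysvec_sketch {
    const char *name;
    size_t nsyscalls;
    const syscall_fn *table;
};

/* Two made-up "kernel" operations shared by both ABIs. */
static int kern_op_a(int arg) { return arg + 1; }
static int kern_op_b(int arg) { return arg * 2; }

/* Hypothetical Linux wrapper: translate the argument, then reuse native code. */
static int linux_op_a(int arg) { return kern_op_a(arg - 100); }

/* The same operations can sit at different syscall numbers per ABI. */
static const syscall_fn freebsd_calls[] = { kern_op_a, kern_op_b };
static const syscall_fn linux_calls[]   = { kern_op_b, linux_op_a };

const struct abi_sysvec_sketch freebsd_vec = { "FreeBSD ELF", 2, freebsd_calls };
const struct abi_sysvec_sketch linux_vec   = { "Linux ELF", 2, linux_calls };

/* Identical dispatch path for every ABI: index into the table and call. */
int dispatch_syscall(const struct abi_sysvec_sketch *vec, size_t num, int arg)
{
    if (num >= vec->nsyscalls)
        return -1;              /* unknown syscall number in this sketch */
    return vec->table[num](arg);
}
```

In this sketch, slot 0 of the Linux table points directly at the native function (no overhead), while slot 1 goes through a wrapper that does a little bit of translation first.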

Linux file/​directory tricks

When the ker­nel de­tects a Linux pro­gram, it is also play­ing some tricks with files and dir­ect­or­ies (also a prop­erty of the above men­tioned table in the ker­nel, so the­or­et­ic­ally the ker­nel could play tricks for FreeBSD pro­grams too).

If you look up a file or directory /A, the kernel will first look for /compat/linux/A, and if it does not find it, it will look for /A. This is important! For example, if you have an empty /compat/linux/home, any application which wants to display the contents of /home will show /compat/linux/home. As it is empty, you see nothing. If this application does not allow you to enter a directory manually via the keyboard, you have lost (ok, you can remove /compat/linux/home or fill it with what you want to have). If you can enter a directory via the keyboard, you could enter /home/yourlogin. The kernel would first look for /compat/linux/home/yourlogin, and as it cannot find it, it would then look for /home/yourlogin (which we assume is there) and display the contents of your home directory.
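The lookup order above can be written down as a small userland sketch. This is not the kernel implementation; the function names are made up, and the existence check is passed in as a callback so the /home example from the text can be modeled without touching a real filesystem.

```c
#include <stdio.h>
#include <string.h>

typedef int (*exists_fn)(const char *path);

/* Models the example from the text: an empty /compat/linux/home exists. */
int demo_exists(const char *path)
{
    return strcmp(path, "/compat/linux/home") == 0 ||
           strcmp(path, "/home") == 0 ||
           strcmp(path, "/home/yourlogin") == 0;
}

/*
 * Resolve a path the way it is described above: try it below
 * /compat/linux first, fall back to the unprefixed path otherwise.
 * Returns 0 on success, -1 if the result does not fit into out.
 */
int linux_resolve_path(const char *path, exists_fn exists,
                       char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "/compat/linux%s", path);

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    if (exists(out))
        return 0;               /* the Linux-specific file shadows the FreeBSD one */

    n = snprintf(out, outlen, "%s", path);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;                   /* fall back to the FreeBSD path */
}
```

With demo_exists(), resolving /home yields /compat/linux/home (the empty directory), while resolving /home/yourlogin falls through to /home/yourlogin, exactly the situation described in the paragraph above.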

This im­plies sev­er­al things:

  • you can hide FreeBSD dir­ect­ory con­tents from Linux pro­grams while still be­ing able to ac­cess the con­tent
  • “badly” pro­grammed Linux ap­plic­a­tions (more cor­rectly: Linux pro­grams which make as­sump­tions which do not hold in FreeBSD) can pre­vent you from ac­cess­ing FreeBSD files, or files which are the same in Linux and FreeBSD (like /​etc/​group which is not avail­able in /​compat/​linux in the linux_​base ports, so that the FreeBSD one is read)
  • you can have dif­fer­ent files for Linux than for FreeBSD

The Linux user­land

The linux_base port in FreeBSD comes from a plain installation of Linux packages. The difference is that some files are deleted, either because we cannot use them in the linuxulator, or because they already exist in the FreeBSD tree at the same place and we want the Linux programs to use the FreeBSD file (/etc/group and /etc/passwd come to mind). The installation also marks binary programs as Linux programs, so that the kernel knows which kernel table to consult for system calls and such (this is not really necessary for all binary programs, but it is harder to script the correct detection logic than to just “brand” all binary programs).

Additionally, some configuration is done to (hopefully) make it do the right thing out of the box. The complete setup of the linux_base ports is done to let Linux programs integrate into FreeBSD. This means that if you start acroread or skype, you do not want to have to configure some things in /compat/linux/etc/ first to have your fonts look the same and your user IDs resolved to names (this does not work if you use LDAP or Kerberos or other directory services for the user/group ID management; you need to configure this yourself). All this should just work, and the application windows shall just pop up on your screen so that you can do what you want to do. Some linux_base ports also do not work on all FreeBSD releases. This can be because some kernel features which a linux_base port depends upon are not available (yet) in FreeBSD. Because of this you should not choose a linux_base port yourself. Just go and install the program from the Ports Collection and let it install the correct linux_base port automatically (a different FreeBSD release may have a different default linux_base port).

A note of caution: there are instructions out there which tell how to install more recent linux_base ports into FreeBSD releases which do not have them as default. You do this at your own risk; it may or may not work. It depends upon which programs you use and at which version those programs are (or more technically, which kernel features they depend upon). If it does not work for you, you have just two possibilities: revert and forget about it, or update your FreeBSD version to a more recent one (but it could be the case that even the most recent development version of FreeBSD does not have support for what you need).

Linux lib­rar­ies and “ELF file OS ABI invalid”-error mes­sages

Due to the file/directory tricks by the kernel explained above, you have to be careful with (additional) Linux libraries. When a Linux program needs some libraries, several directories (specified in /compat/linux/etc/) are searched. Let us assume that the /compat/linux/etc/ specifies to search in /A, /B and /C. This means the FreeBSD kernel first gets a request to open /A/libXYZ. Because of this it first tries /compat/linux/A/libXYZ, and if it does not exist it tries /A/libXYZ. When this fails too, the Linux runtime linker tries the next directory in the config, so that the kernel now looks for /compat/linux/B/libXYZ and, if it does not exist, for /B/libXYZ.

Now assume that libXYZ is in /compat/linux/C/ as a Linux library, and in /B as a FreeBSD library. This means that the kernel will first find the FreeBSD library /B/libXYZ. The Linux binary which needs it cannot do anything with this FreeBSD library (which depends upon the FreeBSD syscall table and FreeBSD symbols from e.g. libc), and the Linux runtime linker will bail out because of this (actually it sees that the library is not of the required type by reading its ELF header). Unfortunately the Linux runtime linker will not continue to search for another library with the same name in another directory (at least this was the case last time I checked and modified the order in which the Linux runtime linker searches for libraries… this has been a while, so it may be smarter now) and you will see the above error message (if you started the Linux program in a terminal).

The bot­tom line of all this is: the er­ror mes­sage about ELF file OS ABI in­val­id just means that the Linux pro­gram was not able to find the cor­rect Linux lib­rary and got a FreeBSD lib­rary in­stead. Go, in­stall the cor­res­pond­ing Linux lib­rary, and make sure the Linux pro­gram can find it in­stead of the FreeBSD lib­rary (do not for­get to run “/​compat/​linux/​sbin/​ldconfig -r /​compat/​linux” if you make changes by hand in­stead of us­ing a port, else your changes may not be taken in­to ac­count).
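The check behind this error message can be approximated with a few lines of C that look at the OS ABI byte in the ELF header. This is a sketch under assumptions, not the code of the Linux runtime linker: the function names are invented, the constants mirror the usual ELF definitions (EI_OSABI is byte 7 of e_ident, 0 is the generic System V ABI, 3 is Linux/GNU, 9 is FreeBSD), and the rule "generic or Linux is acceptable" is my understanding of what glibc accepts.

```c
#include <stddef.h>
#include <string.h>

/* Local copies of the usual ELF constants, prefixed to avoid clashes. */
enum {
    SK_EI_OSABI      = 7,   /* index of the OS ABI byte in e_ident */
    SK_EI_NIDENT     = 16,  /* size of e_ident */
    SK_OSABI_NONE    = 0,   /* generic System V ABI */
    SK_OSABI_LINUX   = 3,
    SK_OSABI_FREEBSD = 9
};

/* Return the OS ABI byte of an ELF identification block, or -1 if not ELF. */
int elf_osabi(const unsigned char *ident, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < (size_t)SK_EI_NIDENT || memcmp(ident, magic, sizeof(magic)) != 0)
        return -1;
    return ident[SK_EI_OSABI];
}

/* Assumption: a Linux runtime linker accepts the generic and the Linux ABI. */
int acceptable_for_linux_linker(const unsigned char *ident, size_t len)
{
    int abi = elf_osabi(ident, len);

    return abi == SK_OSABI_NONE || abi == SK_OSABI_LINUX;
}
```

Feeding the first 16 bytes of a suspicious library into elf_osabi() tells you quickly whether the Linux program picked up a FreeBSD library (value 9) instead of the Linux one.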

Con­straints re­gard­ing ch­root in­to /​compat/​linux

The linux_base ports are designed to give a nice install-and-start experience. The drawback of this is that there is not a full Linux system in /compat/linux, so doing a chroot into /compat/linux will cause trouble (depending on what you want to do). If you want to chroot into the Linux system on your FreeBSD machine, you had better install a linux_dist port. A linux_dist port can be installed in parallel to a linux_base port. Both of them are independent, and as such you need to redo/copy configuration changes you want to have in both environments.