Sta­t­ic DTrace probes for the lin­ux­u­la­tor updat­ed

I found a little time to update my three-year-old work on adding static DTrace probes to the linuxulator.

The changes are not in HEAD, but in my linuxulator-dtrace branch. The revi­sion to have a look at is r230910. Includ­ed are some DTrace scripts:

  • script to check inter­nal locks
  • script to trace futex­es
  • script to gen­er­ate stats for DTracified lin­ux­u­la­tor parts
  • script to check for errors:
    • emu­la­tion errors (unsup­port­ed stuff, unknown stuff, …)
    • ker­nel errors (resource short­age, …)
    • pro­gram­ming errors (errors which can hap­pen if some­one made a mis­take, but should not hap­pen)

The programming-error checks give hints about userland programming errors, or about the reason for an error return value caused by a resource shortage or possibly a wrong combination of parameters. An example error message for this case is “Application %s issued a sysctl which failed the length restrictions.\nThe length passed is %d, the min length supported is 1 and the max length supported is %d.\n”.

The stats script (tailored specifically to the linuxulator, but easy to extend to the rest of the kernel) can report:

  • number of calls to a kernel function per executable binary (not per PID!): this shows where an optimization would be beneficial for a given application
  • graph of CPU time spent in kernel functions per executable binary: together with the number of calls to a function, this helps to determine whether a kernel optimization would be beneficial / is possible for a given application
  • graph of the longest running (CPU-time!) kernel functions in total
  • tim­ing sta­tis­tics for the emul_lock
  • graph of longest held (CPU-time!) locks
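To make the "per executable binary, not per PID" idea concrete, here is a minimal userland sketch in Python of the kind of aggregation the stats script performs. It is not the DTrace script itself; the sample events and the function names in them (linux_open, linux_futex) are made up for illustration.

```python
from collections import defaultdict


def summarize(events):
    """Aggregate (execname, function, elapsed_ns) samples.

    The key is the executable name, not the PID, so several
    processes running the same binary end up in the same bucket.
    """
    calls = defaultdict(int)
    cpu_ns = defaultdict(int)
    for execname, function, elapsed_ns in events:
        key = (execname, function)
        calls[key] += 1
        cpu_ns[key] += elapsed_ns
    return dict(calls), dict(cpu_ns)


# Hypothetical samples, as if collected from entry/return probes.
events = [
    ("acroread", "linux_open", 1200),
    ("acroread", "linux_open", 800),
    ("skype", "linux_futex", 5000),
]
calls, cpu_ns = summarize(events)
```

A function with many calls and a large total CPU time is the interesting candidate for optimization; a function with few calls but a large total points at a single slow code path.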

Unfortunately this cannot be committed to HEAD as-is. The DTrace SDT provider cannot handle probes which are added to the kernel after the SDT provider is already loaded. This means that you either have to compile the linuxulator statically into the kernel, or you have to load the SDT kernel module after the linuxulator module is loaded. If you do not respect this order, you get a kernel panic on the first access to one of the providers in the linuxulator (AFAIR this includes listing the probes available in the kernel).


The FreeBSD-linuxulator explained (for devel­op­ers): basics

The last post about the Linuxulator, where I explained it from a user's point of view, got a good amount of attention. Triggered by a recent explanation of the Linuxulator errno stuff to a fellow FreeBSD developer, I decided to see if more developers are interested in some more info too…

The syscall vec­tor

All the basic setup to handle Linux “system stuff” in FreeBSD is in sys/linux/linux_sysvec.c. The “system stuff” is about translating FreeBSD errnos to Linux errnos, about translating FreeBSD signals to Linux signals, about handling Linux traps, and about setting up the FreeBSD system vector (the kernel structure which contains all the data needed to identify when a Linux program is called and to look up the right kernel functions for e.g. syscalls and ioctls).

There is not just one syscall vector: there is one for a.out binaries (struct sysentvec linux_sysvec) and one for ELF binaries (struct sysentvec elf_linux_sysvec). This is the case at least on i386; for other architectures it may not make sense to have the a.out stuff, as they may never have seen any a.out Linux binary.

The ELF AUX args

When an ELF image is executed, the Linuxulator adds some runtime information (like pagesize, uid, gid, …) so that the userland can easily query information which is not static at build time. This is handled in the elf_linux_fixup() function. If you see error messages about missing ELF notes from e.g. glibc, this is the place to add the information. It would not be bad to check from time to time what Linux provides and to add the missing pieces here. FreeBSD does not have an automated way of doing this, and I am not aware of anyone who checks this regularly. There is a little more info about ELF notes in a message to one of the FreeBSD mailing lists; it also has an example of how to read out this data.
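To illustrate what the userland sees, here is a small Python sketch which decodes an auxiliary vector as a sequence of (type, value) pairs terminated by AT_NULL. The AT_* numbers are the standard ELF ones; the sample buffer, the 32-bit little-endian word format and the uid/gid values are assumptions for this example, not data read from a real process.

```python
import struct

# Standard ELF auxiliary vector entry types.
AT_NULL, AT_PAGESZ, AT_UID, AT_GID = 0, 6, 11, 13


def parse_auxv(raw, word_format="<I"):
    """Decode (type, value) pairs until the AT_NULL terminator.

    word_format="<I" assumes a 32-bit little-endian process;
    use "<Q" for a 64-bit little-endian one.
    """
    size = struct.calcsize(word_format)
    entries = {}
    for offset in range(0, len(raw) - 2 * size + 1, 2 * size):
        (a_type,) = struct.unpack_from(word_format, raw, offset)
        (a_val,) = struct.unpack_from(word_format, raw, offset + size)
        if a_type == AT_NULL:
            break
        entries[a_type] = a_val
    return entries


# Hypothetical vector: page size 4096, uid and gid 1001.
sample = struct.pack(
    "<8I", AT_PAGESZ, 4096, AT_UID, 1001, AT_GID, 1001, AT_NULL, 0
)
aux = parse_auxv(sample)
```

If the runtime linker or libc asks for an entry type which is not in the vector, it has to fall back to some default or complain, which is the kind of message mentioned above.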

Traps

Linux and FreeBSD do not agree on how a trap shall be reported (SIGBUS or SIGSEGV). The corresponding decision making is handled in translate_traps(), and a translation table is available as _bsd_to_linux_trapcode.

Sig­nals

The numeric values behind the signal names are not the same in FreeBSD and Linux. The translation tables are called linux_to_bsd_signal and bsd_to_linux_signal. The translation is a feature of the syscall vector, so it happens automatically.
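As an illustration of why the tables are needed, here is a Python sketch with a few signals and their numbers on FreeBSD and on Linux/x86. It is only an excerpt chosen by me, not a copy of the kernel tables, and the dictionary names merely mirror the kernel table names.

```python
# FreeBSD signal number -> Linux (x86) signal number, excerpt only.
BSD_TO_LINUX_SIGNAL = {
    2: 2,    # SIGINT, same number on both systems
    9: 9,    # SIGKILL, same number on both systems
    10: 7,   # SIGBUS
    11: 11,  # SIGSEGV, same number on both systems
    20: 17,  # SIGCHLD
    30: 10,  # SIGUSR1
    31: 12,  # SIGUSR2
}

# The reverse direction is simply the inverted table.
LINUX_TO_BSD_SIGNAL = {
    linux: bsd for bsd, linux in BSD_TO_LINUX_SIGNAL.items()
}
```

Note that the number 10 means SIGBUS on FreeBSD but SIGUSR1 on Linux, so passing signal numbers through untranslated would deliver the wrong signal.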

Errnos

The numeric values behind the errno names are not the same in FreeBSD and Linux. The translation table is called bsd_to_linux_errno. Returning an errno from one of the Linux syscalls triggers an automatic translation from the FreeBSD errno value to the Linux errno value. This means that FreeBSD errnos have to be returned (e.g. FreeBSD ENOSYS=78) and the Linux program will receive the Linux value (e.g. Linux ENOSYS=38, and as the Linux kernel returns negative errnos, the Linux program will get -38).

If you see an “-ESOMETHING” somewhere in the Linuxulator code, this is either a bug, or some clever/tricky/dangerous use of the sign bit to encode some info (e.g. in the futex code there is a function which returns -ENOSYS, but the sign bit is used as an error indicator and the calling code is responsible for translating negative errnos into positive ones).
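The errno handling can be modelled in a few lines of Python. This is only a sketch: the table is a small excerpt picked by me, and the helper function names are hypothetical, they do not exist in the kernel.

```python
# FreeBSD errno -> Linux errno, excerpt only.
BSD_TO_LINUX_ERRNO = {
    2: 2,    # ENOENT, same number on both systems
    11: 35,  # EDEADLK
    35: 11,  # EAGAIN
    63: 36,  # ENAMETOOLONG
    78: 38,  # ENOSYS
}


def linux_return_value(bsd_errno):
    """What the Linux program sees: the negated Linux errno."""
    return -BSD_TO_LINUX_ERRNO[bsd_errno]


def internal_helper():
    """Hypothetical helper using the sign bit as error marker.

    It returns a negated FreeBSD errno (here ENOSYS = 78).
    """
    return -78


def caller():
    """The caller must flip the sign before returning the errno."""
    ret = internal_helper()
    if ret < 0:
        return -ret  # back to a positive FreeBSD errno
    return 0
```

The code of a syscall always works with positive FreeBSD values; the negation and the renumbering happen in exactly one place, on the way back to the Linux program.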

Syscalls

The Linux syscalls are defined similarly to the FreeBSD ones. There is a mapping table (sys/linux/syscalls.master) between syscall numbers and the corresponding functions. This table is used to generate the necessary code (“make sysent” in sys//linux/).
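Conceptually the generated code boils down to a table lookup by syscall number. Here is a toy Python model of it, assuming Linux/i386 numbering (4 = write, 20 = getpid); the handler bodies are placeholders and the (value, error) return convention is my simplification, not what the kernel does.

```python
BSD_ENOSYS = 78  # FreeBSD errno for "function not implemented"


def linux_getpid(args):
    return 1234  # placeholder PID for the example


def linux_write(args):
    fd, data = args
    return len(data)  # pretend everything was written


# Syscall number -> handler, like the table generated from
# syscalls.master (only two entries here).
LINUX_SYSENT = {4: linux_write, 20: linux_getpid}


def dispatch(number, args=()):
    """Return (value, error); error is a positive FreeBSD errno."""
    handler = LINUX_SYSENT.get(number)
    if handler is None:
        return None, BSD_ENOSYS
    return handler(args), 0
```

A syscall number without an entry ends up as ENOSYS, which is what a Linux program sees when it uses something the Linuxulator does not implement.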


The FreeBSD-linuxulator explained (for users)

After another mail in which I explained a little bit of the linuxulator behavior, it is time to try to write a simple text which I can reference in future answers. If someone wants to add parts of this explanation to the FreeBSD handbook, go ahead.

Lin­ux emu­la­tion? No, “native” exe­cu­tion (sort of)!

First, the linuxulator is not an emulation. It is “just” a binary interface which is a little bit different from the FreeBSD “native” one. The binary files in FreeBSD and Linux are both files which comply with the ELF specification.

When the FreeBSD kernel loads an ELF file, it checks whether it is a FreeBSD ELF file or a Linux ELF file (or some other flavor it knows about). Based upon this it looks up the appropriate actions for this binary in a table (it can also differentiate between 64-bit and 32-bit, and probably other things too).
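A part of this information sits in the first 16 bytes of every ELF file. The following Python sketch reads the class and the OS/ABI byte from such a header. It is a simplification: the detection in the kernel is more involved than this, and many real binaries leave the OS/ABI byte at 0. The sample header is constructed by hand for the example.

```python
# Offsets inside e_ident, as defined by the ELF specification.
EI_CLASS, EI_OSABI = 4, 7

ELFCLASS = {1: "32-bit", 2: "64-bit"}
ELFOSABI = {0: "none/SysV", 3: "Linux", 9: "FreeBSD"}


def describe_elf(header):
    """Return (os_abi, word_size) from the first 16 ELF bytes."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    os_abi = ELFOSABI.get(header[EI_OSABI], "unknown")
    word_size = ELFCLASS.get(header[EI_CLASS], "unknown")
    return os_abi, word_size


# Hand-made e_ident: magic, 32-bit, little endian, version 1,
# OS/ABI = Linux, then 8 bytes of padding.
sample = b"\x7fELF" + bytes([1, 1, 1, 3]) + bytes(8)
flavor = describe_elf(sample)
```

To try it on a real file, pass the first 16 bytes read in binary mode, e.g. the result of open(path, "rb").read(16).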

The FreeBSD table is always compiled in (for a better big picture: at least on an AMD/Intel 64-bit platform it is also possible to additionally include a 32-bit version of this table, to be able to execute 32-bit programs on 64-bit systems), and other ones like the Linux one can be loaded into the kernel additionally (or built statically into the kernel, if desired).

Those tables contain some parameters and pointers which are needed to execute the binary. If a program makes a system call, the kernel looks up the correct function inside this table. It does this for FreeBSD binaries and for Linux binaries alike. This means that there is no emulation/simulation (overhead) going on… at least ideally. Some behavior is a little bit different between Linux and FreeBSD, so a little bit of translation/house-keeping has to go on for some Linux system calls before the underlying FreeBSD kernel functions can be used.

This means that a lot of Linux stuff in FreeBSD is handled at the same speed as if the Linux program were a FreeBSD program.

Lin­ux file/directory tricks

When the kernel detects a Linux program, it also plays some tricks with files and directories (this is also a property of the above-mentioned table in the kernel, so theoretically the kernel could play tricks for FreeBSD programs too).

If a Linux program looks up a file or directory /A, the kernel will first look for /compat/linux/A, and if it does not find it, it will look for /A. This is important! For example, if you have an empty /compat/linux/home, any application which wants to display the contents of /home will show /compat/linux/home. As it is empty, you see nothing. If this application does not allow you to enter a directory manually via the keyboard, you are out of luck (ok, you can remove /compat/linux/home or fill it with what you want to have). If you can enter a directory via the keyboard, you could enter /home/yourlogin. This would first let the kernel look for /compat/linux/home/yourlogin, and as it cannot find it, it would then look for /home/yourlogin (which we assume is there), and as such would display the contents of your home directory.
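The lookup order is easy to try out in userland. Here is a Python sketch which rebuilds the /home example inside a temporary directory. The function resolve() is my own model of the behavior described above, not kernel code, and the temporary directory stands in for the real filesystem root.

```python
import os
import tempfile

COMPAT_PREFIX = "compat/linux"


def resolve(path, root):
    """Model of the lookup: overlay first, then the real path."""
    relative = path.lstrip("/")
    overlay = os.path.join(root, COMPAT_PREFIX, relative)
    if os.path.exists(overlay):
        return overlay
    return os.path.join(root, relative)


# Fake root with a real /home/yourlogin and an empty
# /compat/linux/home, as in the example above.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "home", "yourlogin"))
os.makedirs(os.path.join(root, "compat", "linux", "home"))

listing_dir = resolve("/home", root)           # the empty overlay
typed_dir = resolve("/home/yourlogin", root)   # falls through
```

Listing listing_dir shows nothing, because the empty overlay directory wins, while typed_dir points to the real home directory, because there is no such entry in the overlay.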

This implies sev­er­al things:

  • you can hide FreeB­SD direc­to­ry con­tents from Lin­ux pro­grams while still being able to access the con­tent
  • “badly” programmed Linux applications (more correctly: Linux programs which make assumptions which do not hold in FreeBSD) can prevent you from accessing FreeBSD files, or files which are the same in Linux and FreeBSD (like /etc/group which is not available in /compat/linux in the linux_base ports, so that the FreeBSD one is read)
  • you can have dif­fer­ent files for Lin­ux than for FreeB­SD

The Lin­ux user­land

The linux_base port in FreeBSD comes from a plain installation of Linux packages. The difference is that some files are deleted, either because we cannot use them in the linuxulator, or because they already exist in the FreeBSD tree at the same place and we want the Linux programs to use the FreeBSD file (/etc/group and /etc/passwd come to mind). The installation also marks binary programs as Linux programs, so that the kernel knows which kernel table to consult for system calls and such (this is not really necessary for all binary programs, but it is harder to script the correct detection logic than to just “brand” all binary programs).

Additionally some configuration is done to (hopefully) make it do the right thing out of the box. The complete setup of the linux_base ports is done to let Linux programs integrate into FreeBSD. This means that if you start acroread or skype, you do not have to configure some things in /compat/linux/etc/ first to have your fonts look the same and your user IDs resolved to names (this does not work if you use LDAP or Kerberos or other directory services for the user/group ID management; you need to configure this yourself). All this should just work, and the application windows shall just pop up on your screen so that you can do what you want to do. Some linux_base ports also do not work on all FreeBSD releases. This can be because some kernel feature which a linux_base port depends upon is not available (yet) in FreeBSD. Because of this you should not choose a linux_base port yourself. Just go and install the program from the Ports Collection and let it install the correct linux_base port automatically (a different FreeBSD release may have a different default linux_base port).

A note of caution: there are instructions out there which tell how to install more recent linux_base ports into FreeBSD releases which do not have them as default. You do this at your own risk; it may or may not work. It depends upon which programs you use and at which version those programs are (or more technically, which kernel features they depend upon). If it does not work for you, you have just two possibilities: revert and forget about it, or update your FreeBSD version to a more recent one (but it could be the case that even the most recent development version of FreeBSD does not have support for what you need).

Lin­ux libraries and “ELF file OS ABI invalid”-error mes­sages

Due to the file/directory tricks of the kernel explained above, you have to be careful with (additional) Linux libraries. When a Linux program needs some libraries, several directories (specified in /compat/linux/etc/ld.so.conf) are searched. Let us assume that /compat/linux/etc/ld.so.conf specifies to search in /A, /B and /C. This means the FreeBSD kernel first gets a request to open /A/libXYZ. Because of this it first tries /compat/linux/A/libXYZ, and if it does not exist it tries /A/libXYZ. When this fails too, the Linux runtime linker tries the next directory in the config, so that the kernel now looks for /compat/linux/B/libXYZ and, if it does not exist, for /B/libXYZ.

Now assume that libXYZ is in /compat/linux/C/ as a Linux library, and in /B as a FreeBSD library. This means that the kernel will first find the FreeBSD library /B/libXYZ. The Linux binary which needs it cannot do anything with this FreeBSD library (which depends upon the FreeBSD syscall table and FreeBSD symbols from e.g. libc), and the Linux runtime linker will bail out because of this (actually it sees that the library is not of the required type by reading its ELF header). Unfortunately the Linux runtime linker will not continue to search for another library with the same name in another directory (at least this was the case last time I checked and modified the order in which the Linux runtime linker searches for libraries… this has been a while, so it may be smarter now) and you will see the above error message (if you started the Linux program in a terminal).
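The /A, /B, /C scenario can be replayed with a small Python model. The dictionary stands in for the filesystem, and both functions are simplified models written for this text (the real runtime linker also consults its cache and other paths).

```python
def kernel_open(path, files):
    """Kernel side: try the /compat/linux overlay, then the path."""
    overlay = "/compat/linux" + path
    if overlay in files:
        return overlay
    if path in files:
        return path
    return None


def find_library(name, search_dirs, files):
    """Linker side: take the first directory where open succeeds.

    The search stops at the first hit, even if the file found
    there turns out to be a library for the wrong OS.
    """
    for directory in search_dirs:
        found = kernel_open(directory + "/" + name, files)
        if found is not None:
            return found
    return None


# The situation from the text: a FreeBSD libXYZ in /B and the
# Linux libXYZ in /compat/linux/C.
files = {
    "/B/libXYZ": "FreeBSD",
    "/compat/linux/C/libXYZ": "Linux",
}
hit = find_library("libXYZ", ["/A", "/B", "/C"], files)
```

Here hit is the FreeBSD library in /B, although a usable Linux library exists further down the search path. Putting the Linux library into /compat/linux/B (or earlier) would let the overlay win.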

The bottom line of all this is: the error message about “ELF file OS ABI invalid” just means that the Linux program was not able to find the correct Linux library and got a FreeBSD library instead. Go, install the corresponding Linux library, and make sure the Linux program can find it instead of the FreeBSD library (do not forget to run “/compat/linux/sbin/ldconfig -r /compat/linux” if you make changes by hand instead of using a port, else your changes may not be taken into account).

Con­straints regard­ing chroot into /compat/linux

The linux_base ports are designed to give a nice install-and-start experience. The drawback of this is that there is not a full Linux system in /compat/linux, so doing a chroot into /compat/linux will cause trouble (depending on what you want to do). If you want to chroot into the Linux system on your FreeBSD machine, you had better install a linux_dist port. A linux_dist port can be installed in parallel to a linux_base port. Both of them are independent, and as such you need to redo/copy configuration changes you want to have in both environments.
