The FreeBSD-linuxulator explained (for developers): basics

The last post about the Linuxulator, where I explained it from a user's point of view, got a good amount of attention. Triggered by a recent explanation of the Linuxulator errno stuff to a fellow FreeBSD developer, I decided to see if more developers are interested in some more info too…

The syscall vector

All the basic setup to handle Linux “system stuff” in FreeBSD is in sys/linux/linux_sysvec.c. The “system stuff” is about translating FreeBSD errnos to Linux errnos, about translating FreeBSD signals to Linux signals, about handling Linux traps, and about setting up the FreeBSD system vector (the kernel structure which contains all the data needed to identify when a Linux program is called and to look up the right kernel functions for e.g. syscalls and ioctls).

There is not just one syscall vector: there is one for a.out binaries (struct sysentvec linux_sysvec) and one for ELF binaries (struct sysentvec elf_linux_sysvec). This is at least the case on i386; for other architectures it may not make sense to have the a.out stuff, as they may never have seen any a.out Linux binary.

The ELF AUX args

When an ELF image is executed, the Linuxulator adds some runtime information (like pagesize, uid, gid, …) so that the userland can easily query information which is not static at build time. This is handled in the elf_linux_fixup() function. If you see error messages about missing ELF notes from e.g. glibc, this is the place to add the missing information. It would not be bad to check from time to time what Linux provides and to add the missing pieces there. FreeBSD does not have an automated way of doing this, and I am not aware of anyone who checks this regularly. There is a little bit more info about ELF notes in a message to one of the FreeBSD mailing lists; it also has an example of how to read out this data.
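To make the idea concrete, here is a minimal userland sketch of an auxiliary vector: a list of (type, value) pairs terminated by an AT_NULL entry, filled in once at exec time and walked by whoever wants to query it. This is not the code from elf_linux_fixup(). The sk_ names and the fill/lookup helpers are made up for illustration; only the numeric AT_* type values are the ones Linux uses.

```c
#include <stddef.h>

/* Auxiliary vector types; the numbers match the Linux AT_* values. */
enum {
	SK_AT_NULL   = 0,	/* end of vector */
	SK_AT_PAGESZ = 6,	/* system page size */
	SK_AT_UID    = 11,	/* real user ID */
	SK_AT_GID    = 13	/* real group ID */
};

/* One entry: a type tag and its value (hypothetical struct name). */
struct sk_auxv {
	unsigned long type;
	unsigned long value;
};

/*
 * Fill a small vector the way an exec-time fixup routine might.
 * Returns the number of entries written (terminator included),
 * or 0 if the buffer is too small.
 */
static size_t
sk_fill_auxv(struct sk_auxv *vec, size_t cap, unsigned long pagesz,
    unsigned long uid, unsigned long gid)
{
	if (cap < 4)
		return (0);
	vec[0] = (struct sk_auxv){ SK_AT_PAGESZ, pagesz };
	vec[1] = (struct sk_auxv){ SK_AT_UID, uid };
	vec[2] = (struct sk_auxv){ SK_AT_GID, gid };
	vec[3] = (struct sk_auxv){ SK_AT_NULL, 0 };
	return (4);
}

/*
 * Walk the vector until AT_NULL. Returns 1 and stores the value in
 * *out if the type is present, 0 if it is missing.
 */
static int
sk_getauxval(const struct sk_auxv *vec, unsigned long type,
    unsigned long *out)
{
	for (; vec->type != SK_AT_NULL; vec++) {
		if (vec->type == type) {
			*out = vec->value;
			return (1);
		}
	}
	return (0);
}
```

A lookup for a type which was never filled in returns 0, which is roughly the situation a Linux userland is in when it asks for an entry the Linuxulator does not provide yet.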

Traps

Linux and FreeBSD do not share the same point of view on how a trap shall be handled (SIGBUS or SIGSEGV). The corresponding decision making is handled in translate_traps(), and a translation table is available as _bsd_to_linux_trapcode.
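The kind of decision described above could look roughly like the following sketch. It is not a copy of translate_traps(): the trap code names are hypothetical, and which trap ends up as which signal is an assumption made only for illustration. The two signal numbers are the FreeBSD ones.

```c
/* FreeBSD signal numbers. */
enum { SK_BSD_SIGBUS = 10, SK_BSD_SIGSEGV = 11 };

/* Hypothetical trap codes, named only for this sketch. */
enum sk_trap {
	SK_TRAP_PROTECTION,	/* protection fault */
	SK_TRAP_PAGE,		/* page fault */
	SK_TRAP_ALIGNMENT	/* alignment fault */
};

/*
 * Decide which signal a Linux program should see for a trap.
 * Assumption: FreeBSD would deliver SIGBUS, but for some trap
 * codes Linux programs expect SIGSEGV instead. Everything else
 * passes through unchanged.
 */
static int
sk_translate_trap(int sig, enum sk_trap code)
{
	if (sig != SK_BSD_SIGBUS)
		return (sig);
	switch (code) {
	case SK_TRAP_PROTECTION:
	case SK_TRAP_PAGE:
		return (SK_BSD_SIGSEGV);
	default:
		return (sig);
	}
}
```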

Signals

The values for the signal names are not the same in FreeBSD and Linux. The translation tables are called linux_to_bsd_signal and bsd_to_linux_signal. The translation is a feature of the syscall vector, so it happens automatically.
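As a small illustration of such a pair of tables, the sketch below maps a handful of signals in both directions. The table and function names are made up and only a few signals are covered; the real tables cover all of them. The numbers are the FreeBSD and Linux (x86) values of SIGINT, SIGBUS, SIGUSR1, SIGUSR2 and SIGCHLD.

```c
/* FreeBSD signal numbers. */
enum {
	SK_BSD_SIGINT = 2, SK_BSD_SIGBUS = 10, SK_BSD_SIGCHLD = 20,
	SK_BSD_SIGUSR1 = 30, SK_BSD_SIGUSR2 = 31
};

/* Linux (x86) signal numbers. */
enum {
	SK_LINUX_SIGINT = 2, SK_LINUX_SIGBUS = 7, SK_LINUX_SIGUSR1 = 10,
	SK_LINUX_SIGUSR2 = 12, SK_LINUX_SIGCHLD = 17
};

#define SK_NSIG 32	/* sketch only covers the classic signals 1..31 */

/* Indexed by FreeBSD signal number; 0 means "not in this sketch". */
static const int sk_bsd_to_linux_signal[SK_NSIG] = {
	[SK_BSD_SIGINT]  = SK_LINUX_SIGINT,
	[SK_BSD_SIGBUS]  = SK_LINUX_SIGBUS,
	[SK_BSD_SIGCHLD] = SK_LINUX_SIGCHLD,
	[SK_BSD_SIGUSR1] = SK_LINUX_SIGUSR1,
	[SK_BSD_SIGUSR2] = SK_LINUX_SIGUSR2,
};

/* Indexed by Linux signal number; 0 means "not in this sketch". */
static const int sk_linux_to_bsd_signal[SK_NSIG] = {
	[SK_LINUX_SIGINT]  = SK_BSD_SIGINT,
	[SK_LINUX_SIGBUS]  = SK_BSD_SIGBUS,
	[SK_LINUX_SIGCHLD] = SK_BSD_SIGCHLD,
	[SK_LINUX_SIGUSR1] = SK_BSD_SIGUSR1,
	[SK_LINUX_SIGUSR2] = SK_BSD_SIGUSR2,
};

/* Bounds-checked lookups; both return 0 for unknown signals. */
static int
sk_to_linux_signal(int bsd_sig)
{
	if (bsd_sig <= 0 || bsd_sig >= SK_NSIG)
		return (0);
	return (sk_bsd_to_linux_signal[bsd_sig]);
}

static int
sk_to_bsd_signal(int linux_sig)
{
	if (linux_sig <= 0 || linux_sig >= SK_NSIG)
		return (0);
	return (sk_linux_to_bsd_signal[linux_sig]);
}
```

Note that the number 10 means SIGBUS on FreeBSD but SIGUSR1 on Linux, which is why passing the value through untranslated is not an option.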

Errnos

The values for the errno names are not the same in FreeBSD and Linux. The translation table is called bsd_to_linux_errno. Returning an errno in one of the Linux syscalls triggers an automatic translation from the FreeBSD errno value to the Linux errno value. This means that FreeBSD errnos have to be returned (e.g. FreeBSD ENOSYS=78), and the Linux program will receive the Linux value (e.g. Linux ENOSYS=38; as the Linux kernel returns negative errnos, the Linux program will get -38).

If you see an “-ESOMETHING” somewhere in the Linuxulator code, this is either a bug or some clever/tricky/dangerous use of the sign bit to encode some info (e.g. in the futex code there is a function which returns -ENOSYS, but there the sign bit is used as an error indicator and the calling code is responsible for translating negative errnos into positive ones).
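The errno flow described above can be sketched like this: the syscall implementation returns a positive FreeBSD errno, and a single place on the return path translates it and flips the sign. The names are made up and the table holds only a few entries. The fallback to EINVAL for values the sketch does not know is an arbitrary choice of mine, not what the kernel does.

```c
/* FreeBSD errno values. */
enum {
	SK_BSD_ENOENT = 2, SK_BSD_EDEADLK = 11, SK_BSD_EINVAL = 22,
	SK_BSD_EAGAIN = 35, SK_BSD_ENOSYS = 78
};

/* Linux errno values. */
enum {
	SK_LINUX_ENOENT = 2, SK_LINUX_EAGAIN = 11, SK_LINUX_EINVAL = 22,
	SK_LINUX_EDEADLK = 35, SK_LINUX_ENOSYS = 38
};

#define SK_BSD_ELAST 78	/* highest FreeBSD errno this sketch handles */

/* Indexed by FreeBSD errno; 0 means "not in this sketch". */
static const int sk_bsd_to_linux_errno[SK_BSD_ELAST + 1] = {
	[SK_BSD_ENOENT]  = SK_LINUX_ENOENT,
	[SK_BSD_EDEADLK] = SK_LINUX_EDEADLK,
	[SK_BSD_EINVAL]  = SK_LINUX_EINVAL,
	[SK_BSD_EAGAIN]  = SK_LINUX_EAGAIN,
	[SK_BSD_ENOSYS]  = SK_LINUX_ENOSYS,
};

/*
 * Turn the (positive) FreeBSD errno returned by a syscall
 * implementation into what the Linux program sees: 0 on success,
 * otherwise the negated Linux errno.
 */
static int
sk_linux_syscall_return(int bsd_error)
{
	int lerr;

	if (bsd_error == 0)
		return (0);
	if (bsd_error < 0 || bsd_error > SK_BSD_ELAST)
		return (-SK_LINUX_EINVAL);	/* arbitrary sketch fallback */
	lerr = sk_bsd_to_linux_errno[bsd_error];
	if (lerr == 0)
		return (-SK_LINUX_EINVAL);	/* arbitrary sketch fallback */
	return (-lerr);
}
```

With this, a syscall returning FreeBSD ENOSYS (78) ends up as -38 in the Linux program. EAGAIN and EDEADLK show why returning a Linux value by accident is dangerous: 11 and 35 are both valid on either side, just with swapped meanings.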

Syscalls

The Linux syscalls are defined similarly to the FreeBSD ones. There is a mapping table (sys/linux/syscalls.master) between syscall numbers and the corresponding functions. This table is used to generate the necessary code (“make sysent” in sys//linux/).
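What such a generated table boils down to can be sketched as an array indexed by syscall number, where each slot holds the argument count and the function to call. The struct, the handlers and the dispatcher below are made up for illustration and are not the generated kernel code. The two numbers are the Linux/i386 numbers of getpid and getuid, and the handlers just return fixed values as stand-ins.

```c
#include <stddef.h>

enum { SK_BSD_ENOSYS = 78 };	/* FreeBSD errno for "no such syscall" */

/* A syscall implementation: returns 0 or a positive FreeBSD errno. */
typedef int (*sk_call_t)(const long *args, long *retval);

/* One table slot (hypothetical struct name). */
struct sk_sysent {
	int narg;		/* number of arguments */
	sk_call_t call;		/* implementing function, NULL if none */
};

/* Stand-in handlers returning fixed values. */
static int
sk_linux_getpid(const long *args, long *retval)
{
	(void)args;
	*retval = 1234;
	return (0);
}

static int
sk_linux_getuid(const long *args, long *retval)
{
	(void)args;
	*retval = 1001;
	return (0);
}

/* Indexed by Linux/i386 syscall number; unlisted slots stay NULL. */
static const struct sk_sysent sk_linux_sysent[] = {
	[20] = { 0, sk_linux_getpid },
	[24] = { 0, sk_linux_getuid },
};

#define SK_NSYSCALLS \
	(sizeof(sk_linux_sysent) / sizeof(sk_linux_sysent[0]))

/* Look up the number and call the handler, or fail with ENOSYS. */
static int
sk_dispatch(long num, const long *args, long *retval)
{
	if (num < 0 || (size_t)num >= SK_NSYSCALLS ||
	    sk_linux_sysent[num].call == NULL)
		return (SK_BSD_ENOSYS);
	return (sk_linux_sysent[num].call(args, retval));
}
```

The dispatcher returns the FreeBSD ENOSYS for unknown numbers, in line with the rule from the errno section that FreeBSD values are returned and translated later.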

Fix for the showstopper bug in the linuxulator

I got time yesterday to test acroread with the patch from Intron/Kostik and … Yeah! I was not able to crash acroread in the 2.6.16 emulation! Great! Request for widespread testing soon?!?

Now we just have to determine if we have to care about the locking (I don’t know if kib@ already asked jhb@ about it) and the race condition. In case the userland exhibits very, very bad programming and uses the FD before open() returns successfully, the program could read data. This can only happen if the program has the right permissions for this data (the open is supposed to fail in this case not because of access restrictions, but because of e.g. the wrong file type).
