I found a little time to update my three-year-old work on adding static DTrace probes to the linuxulator.
The changes are not in HEAD, but in my linuxulator-dtrace branch. The revision to have a look at is r230910. Included are some DTrace scripts:
- script to check internal locks
- script to trace futexes
- script to generate stats for DTracified linuxulator parts
- script to check for errors:
  - emulation errors (unsupported stuff, unknown stuff, …)
  - kernel errors (resource shortage, …)
  - programming errors (errors which can happen if someone made a mistake, but should not happen)
The programming-error checks give hints about userland programming errors, or about why an error value was returned (resource shortage, or maybe a wrong combination of parameters). An example error message for this case is “Application %s issued a sysctl which failed the length restrictions.\nThe length passed is %d, the min length supported is 1 and the max length supported is %d.\n”.
The stats script (tailored specifically to the linuxulator, but easily extended to the rest of the kernel) can report:
- number of calls to a kernel function per executable binary (not per PID!): this shows where an optimization would be beneficial for a given application
- graph of CPU time spent in kernel functions per executable binary: together with the number of calls to a function, this helps to determine whether a kernel optimization would be beneficial or is possible for a given application
- graph of longest running (CPU-time!) kernel function in total
- timing statistics for the emul_lock
- graph of longest held (CPU-time!) locks
Unfortunately this cannot be committed to HEAD as-is. The DTrace SDT provider cannot handle probes which are added to the kernel after the SDT provider is already loaded. This means that you either have to compile the linuxulator statically into the kernel, or you have to load the SDT kernel module after the linuxulator module is loaded. If you do not respect this order, you get a kernel panic on the first access to one of the providers in the linuxulator (AFAIR this includes listing the probes available in the kernel).
The last post about the Linuxulator, where I explained it from a user's point of view, got a good amount of attention. Triggered by a recent explanation of the Linuxulator errno stuff to a fellow FreeBSD developer, I decided to see if more developers are interested in some more info too…
The syscall vector
sys/linux/linux_sysvec.c contains all the basic setup needed to handle Linux “system stuff” in FreeBSD. This “system stuff” covers translating FreeBSD errnos to Linux errnos, translating FreeBSD signals to Linux signals, handling Linux traps, and setting up the FreeBSD system vector (the kernel structure which contains all the data needed to identify when a Linux program is called and to look up the right kernel functions for e.g. syscalls and ioctls).
There is not just one syscall vector: there is one for a.out binaries (struct sysentvec linux_sysvec) and one for ELF binaries (struct sysentvec elf_linux_sysvec), at least on i386. For other architectures it may not make sense to have the a.out stuff, as they may never have seen any a.out Linux binary.
The ELF AUX args
When an ELF image is executed, the Linuxulator adds some runtime information (like pagesize, uid, gid, …) so that the userland can easily query information which is not static at build time. This is handled in the elf_linux_fixup() function. If you see error messages about missing ELF notes from e.g. glibc, this is the place to add the missing information. It would not be bad to have a look from time to time at what Linux provides and to add the missing pieces here. FreeBSD does not have an automated way of doing this, and I am not aware of anyone who checks this regularly. There is a little more info about ELF notes in a message to one of the FreeBSD mailing lists; it also has an example of how to read out this data.
Linux and FreeBSD do not share the same point of view on how a trap should be handled (SIGBUS or SIGSEGV). The corresponding decision is made in translate_traps(), and a translation table is available as _bsd_to_linux_trapcode.
The values for the signal names are not the same in FreeBSD and Linux. The translation tables are called linux_to_bsd_signal and bsd_to_linux_signal. The translation is a feature of the syscall vector (= automatic).
The values for the errno names are not the same in FreeBSD and Linux. The translation table is called bsd_to_linux_errno. Returning an errno from one of the Linux syscalls triggers an automatic translation from the FreeBSD errno value to the Linux errno value. This means that FreeBSD errnos have to be returned (e.g. FreeBSD ENOSYS=78) and the Linux program will receive the Linux value (e.g. Linux ENOSYS=38, and as the Linux kernel returns negative errnos, the Linux program will get -38).
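The translation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the kernel code: the dictionary layout and the function name linux_syscall_return are assumptions made for this example, and only a handful of errnos are listed.

```python
# Errno values for a few names on each system (illustrative subset).
FREEBSD_ERRNO = {"EPERM": 1, "ENOENT": 2, "EAGAIN": 35, "ENOSYS": 78}
LINUX_ERRNO = {"EPERM": 1, "ENOENT": 2, "EAGAIN": 11, "ENOSYS": 38}

# Lookup table keyed by the FreeBSD value, similar in spirit to
# bsd_to_linux_errno (the real table layout is not reproduced here).
BSD_TO_LINUX_ERRNO = {
    FREEBSD_ERRNO[name]: LINUX_ERRNO[name] for name in FREEBSD_ERRNO
}


def linux_syscall_return(bsd_error):
    """Hypothetical helper: map a FreeBSD errno returned by a syscall
    implementation to the value a Linux program expects to see."""
    if bsd_error == 0:
        return 0  # success is passed through unchanged
    # Linux reports errors as negative errno values.
    return -BSD_TO_LINUX_ERRNO[bsd_error]


if __name__ == "__main__":
    print(linux_syscall_return(FREEBSD_ERRNO["ENOSYS"]))
```

The point of the sketch is that the syscall implementation only ever deals with FreeBSD values; the sign flip and the renumbering happen in one central place.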
If you see an “-ESOMETHING” somewhere in the Linuxulator code, this is either a bug or some clever/tricky/dangerous use of the sign bit to encode some info (e.g. in the futex code there is a function which returns -ENOSYS, but the sign bit is used as an error indicator and the calling code is responsible for translating negative errnos into positive ones).
The Linux syscalls are defined similarly to the FreeBSD ones. There is a mapping table (sys/linux/syscalls.master) between syscall numbers and the corresponding functions. This table is used to generate code (“make sysent” in sys//linux/) which does what is necessary.
After another mail in which I explained a little bit of the linuxulator behavior, it is time to try to write an easy text which I can reference in future answers. If someone wants to add parts of this explanation to the FreeBSD handbook, go ahead.
Linux emulation? No, “native” execution (sort of)!
First, the linuxulator is not an emulation. It is “just” a binary interface which is a little bit different from the FreeBSD-“native” one. This means that the binary files in FreeBSD and Linux are both files which comply with the ELF specification.
When the FreeBSD kernel loads an ELF file, it checks whether it is a FreeBSD ELF file or a Linux ELF file (or some other flavor it knows about). Based upon this it looks up the appropriate actions for this binary in a table (it can also differentiate between 64-bit and 32-bit, and probably other things too).
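As a rough illustration of this kind of check, the sketch below looks at two bytes of the ELF identification header: the OS ABI byte and the class byte (32-bit or 64-bit). It is a simplification under stated assumptions: the function name describe_elf_ident is made up for this example, and the real kernel logic considers more than these two bytes.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4   # index of the 32-bit/64-bit marker in e_ident
EI_OSABI = 7   # index of the OS ABI marker in e_ident

OSABI_NAMES = {0: "SYSV/none", 3: "Linux", 9: "FreeBSD"}
CLASS_NAMES = {1: "32-bit", 2: "64-bit"}


def describe_elf_ident(ident):
    """Hypothetical helper: return (os_abi, word_size) for the first
    16 bytes of an ELF file, or raise ValueError if it is not ELF."""
    if len(ident) < 16 or ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    os_abi = OSABI_NAMES.get(ident[EI_OSABI], "unknown")
    word_size = CLASS_NAMES.get(ident[EI_CLASS], "unknown")
    return os_abi, word_size


def describe_elf_file(path):
    """Read the identification bytes from a file on disk."""
    with open(path, "rb") as handle:
        return describe_elf_ident(handle.read(16))
```

Calling describe_elf_file() on a binary shows which flavor its header claims to be; a binary whose OS ABI byte is left at 0 cannot be told apart this way.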
The FreeBSD table is always compiled in (for a better big picture: at least on an AMD/Intel 64-bit platform it is also possible to include a 32-bit version of this table, to be able to execute 32-bit programs on 64-bit systems), and other ones like the Linux one can be loaded additionally into the kernel (or built statically into the kernel, if desired).
Those tables contain some parameters and pointers which allow the kernel to execute the binary. If a program makes a system call, the kernel looks up the correct function in this table. It does this for FreeBSD binaries and for Linux binaries alike. This means that there is no emulation/simulation (overhead) going on… at least ideally. Some behavior is a little bit different between Linux and FreeBSD, so a little bit of translation/house-keeping has to go on in some Linux system calls before they reach the underlying FreeBSD kernel functions.
This means that a lot of Linux stuff in FreeBSD is handled at the same speed as if the Linux program were a FreeBSD program.
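The table lookup can be pictured with a small sketch. Everything here is a toy model: the names SYSVECS, native_getpid, linux_getpid and syscall are made up for this example, and the syscall numbers are only illustrative.

```python
def native_getpid(proc):
    """Stand-in for a FreeBSD kernel function."""
    return proc["pid"]


def linux_getpid(proc):
    """Linux entry point: a thin wrapper around the same kernel function.
    Any translation or house-keeping would happen here."""
    return native_getpid(proc)


# One table per binary flavor, mapping syscall numbers to functions.
SYSVECS = {
    "FreeBSD": {20: native_getpid},
    "Linux": {39: linux_getpid},
}


def syscall(proc, number):
    """Dispatch a syscall through the table that belongs to the process."""
    table = SYSVECS[proc["abi"]]
    handler = table.get(number)
    if handler is None:
        raise NotImplementedError("syscall %d not in %s table" % (number, proc["abi"]))
    return handler(proc)
```

Both flavors go through the same dispatch step; the only extra cost for the Linux binary is whatever its wrapper function has to translate.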
Linux file/directory tricks
When the kernel detects a Linux program, it also plays some tricks with files and directories (this is also a property of the above-mentioned table in the kernel, so theoretically the kernel could play tricks for FreeBSD programs too).
If you look up a file or directory /A, the kernel will first look for /compat/linux/A, and if it does not find it, it will look for /A. This is important! For example, if you have an empty /compat/linux/home, any application which wants to display the contents of /home will show /compat/linux/home. As it is empty, you see nothing. If this application does not allow you to enter a directory manually via the keyboard, you are out of luck (ok, you can remove /compat/linux/home or fill it with what you want to have). If you can enter a directory via the keyboard, you could enter /home/yourlogin. The kernel would first look for /compat/linux/home/yourlogin, and as it cannot find it, it would then look for /home/yourlogin (which we assume is there) and display the contents of your home directory.
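The lookup order can be sketched as follows. This is a minimal illustration, not the kernel code: the function name resolve_for_linux and the set of paths standing in for the filesystem are assumptions made for this example.

```python
LINUX_PREFIX = "/compat/linux"


def resolve_for_linux(path, existing_paths):
    """Return the path a Linux program actually ends up with when it
    asks for the absolute path `path`."""
    alternate = LINUX_PREFIX + path
    if alternate in existing_paths:
        return alternate  # the /compat/linux copy wins if it exists
    return path           # otherwise fall back to the FreeBSD path


if __name__ == "__main__":
    # An empty /compat/linux/home hides /home, but not what is below it.
    filesystem = {"/compat/linux/home", "/home", "/home/yourlogin"}
    for wanted in ("/home", "/home/yourlogin"):
        print(wanted, "->", resolve_for_linux(wanted, filesystem))
```

The example filesystem mirrors the /home case from the text: the directory itself is shadowed, while a path below it falls through to the FreeBSD side.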
This implies several things:
- you can hide FreeBSD directory contents from Linux programs while still being able to access the content
- “badly” programmed Linux applications (more correctly: Linux programs which make assumptions which do not hold in FreeBSD) can prevent you from accessing FreeBSD files, or files which are the same in Linux and FreeBSD (like /etc/group which is not available in /compat/linux in the linux_base ports, so that the FreeBSD one is read)
- you can have different files for Linux than for FreeBSD
The Linux userland
The linux_base port in FreeBSD comes from a plain installation of Linux packages. The difference is that some files are deleted, either because we cannot use them in the linuxulator, or because they already exist in the FreeBSD tree at the same place and we want the Linux programs to use the FreeBSD file (/etc/group and /etc/passwd come to mind). The installation also marks binary programs as Linux programs, so that the kernel knows which kernel table to consult for system calls and such (this is not really necessary for all binary programs, but it is harder to script the correct detection logic than to just “brand” all binary programs).
Additionally, some configuration is done to (hopefully) make it do the right thing out of the box. The complete setup of the linux_base ports is designed to let Linux programs integrate into FreeBSD. This means that if you start acroread or skype, you do not have to configure things in /compat/linux/etc/ first to have your fonts look the same and your user IDs resolved to names (this does not work if you use LDAP, Kerberos or other directory services for user/group ID management; you need to configure that yourself). All this should just work, and the application windows should just pop up on your screen so that you can do what you want to do. Some linux_base ports also do not work on all FreeBSD releases. This can be because some kernel features which a linux_base port depends upon are not available (yet) in FreeBSD. Because of this you should not choose a linux_base port yourself. Just install the program from the Ports Collection and let it install the correct linux_base port automatically (a different FreeBSD release may have a different default linux_base port).
A note of caution: there are instructions out there which tell you how to install more recent linux_base ports into FreeBSD releases which do not have them as default. You do this at your own risk; it may or may not work. It depends upon which programs you use and which version those programs are at (or more technically, which kernel features they depend upon). If it does not work for you, you have just two possibilities: revert and forget about it, or update your FreeBSD version to a more recent one (but it could be that even the most recent development version of FreeBSD does not support what you need).
Linux libraries and “ELF file OS ABI invalid”-error messages
Because of the file/directory tricks the kernel plays, as explained above, you have to be careful with (additional) Linux libraries. When a Linux program needs some libraries, several directories (specified in /compat/linux/etc/ld.so.conf) are searched. Let us assume that /compat/linux/etc/ld.so.conf specifies to search in /A, /B and /C. This means the FreeBSD kernel first gets a request to open /A/libXYZ. Because of this it first tries /compat/linux/A/libXYZ, and if that does not exist it tries /A/libXYZ. When this fails too, the Linux runtime linker tries the next directory in the config, so that the kernel now looks for /compat/linux/B/libXYZ and, if that does not exist, for /B/libXYZ.
Now assume that libXYZ is in /compat/linux/C/ as a Linux library, and in /B as a FreeBSD library. This means that the kernel will first find the FreeBSD library /B/libXYZ. The Linux binary which needs it cannot do anything with this FreeBSD library (which depends upon the FreeBSD syscall table and FreeBSD symbols from e.g. libc), and the Linux runtime linker will bail out because of this (actually it sees that the library is not of the required type by reading its ELF header). Unfortunately the Linux runtime linker will not continue to search for another library with the same name in another directory (at least this was the case the last time I checked and modified the order in which the Linux runtime linker searches for libraries… this has been a while, so it may be smarter now), and you will see the above error message (if you started the Linux program in a terminal).
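The scenario above can be replayed with a small model. It is only a sketch of the described behavior: the function names kernel_open and load_library are made up for this example, and the dictionary mapping each path to the flavor of the library stored there stands in for the real filesystem and ELF headers.

```python
LINUX_PREFIX = "/compat/linux"


def kernel_open(path, files):
    """Kernel side: try the /compat/linux copy first, then the plain path."""
    for candidate in (LINUX_PREFIX + path, path):
        if candidate in files:
            return candidate
    return None


def load_library(name, search_dirs, files):
    """Runtime linker side: walk the configured directories and stop at
    the first file the kernel hands back, even if it is the wrong flavor."""
    for directory in search_dirs:
        found = kernel_open(directory + "/" + name, files)
        if found is None:
            continue  # nothing here, try the next directory
        if files[found] != "Linux":
            # The linker does not keep searching after a wrong-flavor hit.
            raise OSError(found + ": ELF file OS ABI invalid")
        return found
    raise FileNotFoundError(name)


if __name__ == "__main__":
    files = {"/B/libXYZ": "FreeBSD", "/compat/linux/C/libXYZ": "Linux"}
    try:
        load_library("libXYZ", ["/A", "/B", "/C"], files)
    except OSError as error:
        print(error)
```

In this model the Linux copy in /compat/linux/C is never reached, because the FreeBSD copy in /B is found one step earlier in the search order.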
The bottom line of all this is: the error message about ELF file OS ABI invalid just means that the Linux program was not able to find the correct Linux library and got a FreeBSD library instead. Install the corresponding Linux library, and make sure the Linux program can find it instead of the FreeBSD library (do not forget to run “/compat/linux/sbin/ldconfig -r /compat/linux” if you make changes by hand instead of using a port, else your changes may not be taken into account).
Constraints regarding chroot into /compat/linux
The linux_base ports are designed to give a nice install-and-start experience. The drawback is that there is no full Linux system in /compat/linux, so doing a chroot into /compat/linux will cause trouble (depending on what you want to do). If you want to chroot into a Linux system on your FreeBSD machine, you had better install a linux_dist port. A linux_dist port can be installed in parallel to a linux_base port. The two are independent, so configuration changes you want in both environments have to be made (or copied) in each of them.