Source | Alexander Leidinger

FreeBSD Service Jails – another layer in the security onion

In May I committed a new feature to FreeBSD-current (it will be in the FreeBSD 15 release; I have no plans to merge it to 14). This feature is called “Service Jails”. When you enable it, it takes a service (something which is started by an rc-script at boot or by hand via service(8)) and starts it in a jail(8). It can do this with any service, and with no more than 2 lines of configuration.

For those who don’t know, a jail is a kind of container technology. We have had this technology since 1999 (so it predates Docker by 14 years). It served as an inspiration for Solaris zones.

Too good to be true?

Containerizing some software with only 2 lines of configuration (only one line if a service is “Service Jail ready”) sounds amazing, and is in no way comparable to writing a Dockerfile or a normal jail config. It sounds a bit too good to be true, and it is. The Service Jails framework sits somewhere between a fully isolated container and no containerization at all.

The biggest difference is that a Service Jail has full access to the entire filesystem of the host (or parent jail); it is only prevented from changing file flags (see chflags(1)). This means that if your service runs as root in the Service Jail and is compromised, the attacker is able to read your password database and modify the content of nearly any file (and as such, on the next boot anything can happen). The only exception is a service which already has provisions to run in a chroot, with this enabled. In that case an attacker can only modify files inside the chroot.

What are the benefits?

Compared to running a service directly on the host, you get the benefit of limiting what the software (or an intruder) is able to do. Compared to a jail you have created yourself and tailored to only the software you need, you do not get the benefit of a minimal software install.

When you enable a Service Jail for a particular service which is not Service Jail ready and you do not provide a Service Jail config, the service is started inside a jail without any network access and with full access to the filesystem. A Service Jail ready service which normally needs network access can be limited by a custom config to not have any network access at all (or only IPv6 and not IPv4, or vice versa). This means you can prevent this software from accessing the network without having to put it inside a VM.

A Service Jail also does not allow a service to:

mount filesystems (and on purpose there is no provision so far to optionally allow this),
open raw sockets (can be enabled),
open sockets of protocol stacks that have not had jail functionality added to them (IPv4, IPv6, local UNIX sockets and routing stuff are jail-aware) (can be enabled),
lock/unlock physical pages in memory (can be enabled),
use System V IPC facilities (can be enabled),
use debugging facilities for unprivileged processes,
see processes from the host or other jails,

and all the other stuff which is prohibited in jails by default.

When you enable network access (IPv4 and/or IPv6) for a particular service, the Service Jail inherits all the IPs of the host (or parent jail). This means you can not use this to run two services which want to listen on the same port. But as you can limit a service to IPv4 only or IPv6 only, you could run two different services on the same port, one for IPv4 and one for IPv6, or you can test scenarios where only one IP stack is available while the host itself is configured for dual-stack network access.

How to get the most out of Service Jails?

With the possibility to allow unprivileged users to open privileged ports (sysctl net.inet.ip.portrange.reservedhigh=0) and having the service started as non-root (sysrc servicename_user=MyServiceUser), a Service Jail provides a very good benefit for a simple one-line config change (sysrc servicename_svcj=YES).

In this case filesystem access is restricted to what this particular user is able to read / write, only processes started by the service are visible to the service, and all the other jail-restrictions apply. An intruder may as such do bad things to this particular service, but not to other services on the system.

For a read-only webserver this may mean that an attacker is able to modify some log files, but can not see other processes running on the system and deduce from them the most valuable next step in the attack.

For a read-only php-fpm service it may mean that the attacker can run some in-memory code to spawn a botnet, but not compromise other parts of the host or access System V memory locations of a database (if the php-fpm service is not configured to allow access to System V resources).
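The three settings from this section can be collected in a small helper. This is only a sketch and not part of FreeBSD: the function name svcj_cmds and the service and user names passed to it are hypothetical placeholders, and it only prints the commands so they can be reviewed before piping them to sh as root. The servicename_user knob only has an effect for services whose rc-script honors it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands which start a service
# as a non-root user inside a Service Jail.
# Usage: svcj_cmds <service> <user>      review the output, then pipe it to sh
svcj_cmds() {
	svc="$1"
	user="$2"
	# Allow unprivileged users to bind to ports below 1024.
	# Not persistent: add it to /etc/sysctl.conf to keep it after a reboot.
	echo "sysctl net.inet.ip.portrange.reservedhigh=0"
	# Run the service as a non-root user (only if its rc-script honors <service>_user).
	echo "sysrc ${svc}_user=${user}"
	# Start the service inside a Service Jail.
	echo "sysrc ${svc}_svcj=YES"
	# Restart the service so that the new settings are used.
	echo "service ${svc} restart"
}
```

For example, svcj_cmds myservice myuser prints four lines, starting with the sysctl and ending with the restart of myservice.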

What’s next?

The base system services are either made Service Jails aware, or configured to not run inside a service jail (e.g. a fsck doesn’t make sense to run in a jail). Not all of the services are tested with Service Jails. Give them a try and send a bug report in case something doesn’t work.

The FreeBSD ports collection has about 1500 services. I have already committed some patches, or sent patches to the maintainers, for some of the high-profile ports (like webservers, databases, DNS servers, …) to make them Service Jails ready, but there are too many services to do all of that myself. Feel free to submit patches for them.


Plotting the FreeBSD memory fragmentation – part 2

If you haven’t read part 1 already, please do so. Otherwise you will not understand what this is about (I don’t repeat the basics here).

The following graphs show the FMFI with D45043, D45045 and D45046 applied.

When you look at the graphs, keep in mind that I updated FreeBSD on 2024-05-27-120546 and 2024-06-04-105830. None of those updates introduced changes in the memory allocation area, so the results should be somewhat comparable.

I used the same workloads as in part 1 (not a deterministic benchmark, real world use case with 30 jails and various package build runs).

First, the second-to-last graph from part 1, to have something to compare against:

Now with the 3 changes listed above:

Just by looking at the graphs, and given that I don’t run a fixed benchmark but plot real-world use, I don’t think we can draw a conclusion from the FMFI plotted here (other than that it does no harm for my workload).

The comment in the D45046 review about the reduced number of reservations with at least one NOFREE page (= a page which will never be freed) looks good. About 20 times fewer reservations with NOFREE pages means that the NOFREE pages are scattered over about 20 times fewer places in memory. Those NOFREE pages can get in the way of larger allocations. Theoretically more memory areas can be combined (if needed). In practice this is not the case yet. There is a slight hint in the measurement in the review comment that there are some more PDE (“Page Directory Entry”) promotions, but they are in the 1 – 2% range. I do not expect this to result in a noticeable effect on performance.

Nevertheless, this looks very promising. It paves the way for further work, as there are fewer NOFREE pages scattered around. This may make memory defragmentation / compaction techniques more useful. Once those are mature enough to be tested on real-world workloads, I will generate some plots.


Plotting the FreeBSD memory fragmentation


What this is about

I stumbled upon the work of Bojan Novković regarding physical memory anti-fragmentation mechanisms (attention, this is a link to the FreeBSD wiki, content may change without further notice). As a picture sometimes tells more than 1000 words, I wanted to see a graphical representation of the fragmentation. Not in terms of which memory regions are fragmented, but in terms of how fragmented the ~~UMA buckets~~ page allocator freelists are.

Bojan has a fragmentation metric (FMFI) for UMA available as a patch which gives a numeric representation of the fragmentation, but no graphs.

After a bit of tinkering around with gnuplot, I came up with some way of graphing it.

How to create some graphs

First you need some data to plot a graph. Collecting the FMFI stats is easy. A little cron-job which runs this periodically is enough:

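#!/bin/sh
 
# Boot time in seconds since the epoch (5th field of the kern.boottime output,
# without the trailing comma); it is used to name one log file per boot.
boottime=$(sysctl kern.boottime 2>&1 | awk '{print $5}' | sed -e 's:,::')
time=$(date -r ${boottime} +%Y%m%d_%H%M)
 
logfile=/var/tmp/vm_frag_${time}.log
 
# Append a timestamp and the current FMFI table, followed by an empty line.
touch ${logfile}
date "+%Y-%m-%d_%H:%M:%S" >> ${logfile}
sysctl vm.phys_frag_idx >> ${logfile}
echo >> ${logfile}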

This creates log files in /var/tmp with the formatted boot time in the filename, so that there is an easy indication of a reset of the fragmentation.

After a while you should have some logs to parse. Gnuplot can not work with the simple log generated by the cron-job, so a CSV needs to be generated. The following awk script (parse_vm_frag.awk) generates the CSV. In my case there is only one NUMA domain, so my awk script to parse the data doesn’t care about NUMA domains.

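# remember the timestamp line written by the cron-job (%Y-%m-%d_%H:%M:%S)
/....-..-.._..:..:../ { date = $0 }
# skip the sysctl name, the NUMA domain line, the table header and the separator
/vm.phys_frag_idx: / { next }
/DOMAIN/ { next }
 
/  ORDER \(SIZE\) \|  FMFI/ { next }
/--/ { next }
# table row: print "order date FMFI" (the FMFI is the 5th field)
/  .. \( .....K\) / { printf "%d %s %d\n", $1, date, $5; next }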

Next step is a template (template.gnuplot) for the plots:

set terminal svg dynamic mouse standalone name "%%NAME%%"
# set terminal png size 1920,1280
set output "%%NAME%%.svg"
 
set title '%%NAME%%' noenhanced
set xdata time
set timefmt "%Y-%m-%d_%H:%M:%S"
set xlabel "Date Time"
set zlabel "Memory Fragmentation Index" rotate by 90
set ylabel "freelist size"
set zrange [-1000:1000]
set yrange [0:12]
set ytics 1
# the following rotate doesn't work, at least chrome doesn't rotate the dates on the x-axis at all
set xtics rotate by 90 format "%F %T" timedate 
set xyplane 0
set grid vertical
set border 895
 
splot "%%NAME%%.csv" using 2:1:3 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 1 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 2 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 3 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 4 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 5 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 6 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 7 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 8 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 9 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 10 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 11 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 12 every 13 t '' with filledcurve

Unfortunately I didn’t get the above to work with gnuplot-variables within 5 minutes, so I created a little script to generate a plot-script for each CSV file.

#!/bin/sh
 
for log in vm_frag_*.log; do
        base=$(basename ${log} .log)
        awk -f parse_vm_frag.awk <${log} >${base}.csv
 
        cp template.gnuplot ${base}.gnuplot
        sed -i -e "s:%%NAME%%:${base}:g" ${base}.gnuplot
done

Now it’s simply “gnuplot *.gnuplot” (assuming the CSV files and the gnuplot files are in the same directory), and you will get SVG graphs.

Some background info

And here are the results of running this for some days on a 2 socket Intel Xeon system (6 cores each, plus hyperthreading) with 72 GB RAM. This system has about 30 different jails with a diverse mix of nginx, mysql, postgresql, redis, imap, smtp, various java stuff, …, poudriere (3 workers) and buildworld runs (about 30 jails not counting the poudriere runs). So the following graphs are not done in a reproducible way, but are simply the result of real-world applications running all day long. Each new graph means there was a reboot. All reboots were done to update to a more recent FreeBSD-current.

All in all, not only the application workload was always different, but also the running kernel was different.

The graphs

Beware! You can not really compare one graph with another. They do not represent the same workload. As such any conclusion you (or I) want to draw from this is more an indication than a proven fact. Big differences will be visible, small changes may go unnoticed.

This is the graph of FreeBSD-current from around 2024-04-08. There are various modifications compared to a stock FreeBSD system, but the only change in the memory area is the FMFI patch mentioned above.

Explanation of what you see

A memory fragmentation index of 1000 is bad. It means the memory is very fragmented. A value of 0 means there is no fragmentation, and a negative value means it is very easy to satisfy an allocation request.

So bars which go up are bad, bars which go down are good.

The page allocator ~~UMA bucket~~ freelists axis (different colors for each size-rank) denotes the allocation size. ~~UMA bucket~~ Freelist size 0 is about 4k allocations, and each size-increase doubles the allocation size up to 16M at size-rank 12.
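The mapping from freelist size (the ORDER column of vm.phys_frag_idx) to allocation size is a power of two. A small sketch, assuming 4k base pages as stated above (the function name freelist_sizes is made up for this example), which prints the allocation size for each of the 13 freelists in the graphs:

```shell
#!/bin/sh
# Print "order size" for freelist orders 0..12, assuming a 4k base page.
# Each order doubles the size: size = 4k * 2^order.
freelist_sizes() {
	order=0
	while [ "$order" -le 12 ]; do
		size_k=$((4 << order))	# 4k shifted left by the order
		if [ "$size_k" -ge 1024 ]; then
			echo "$order $((size_k / 1024))M"
		else
			echo "$order ${size_k}K"
		fi
		order=$((order + 1))
	done
}
```

Calling freelist_sizes prints one line per order, from “0 4K” up to “12 16M”.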

In the above graph, for ~~UMA bucket~~ freelist size 0 all values are negative. This means that all allocations of up to 4k were always easy and no fragmentation was noticed. This is not a surprise, given that this is the smallest allocation size.

The fact that ~~UMA bucket~~ freelist size 1 (8k allocations) already had that much fragmentation was a surprise to me at that point. But see the next part for a fix for this.
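With several days of data the graphs get crowded. As a complement to the plots (not part of the setup described here), a sketch which reads the CSV files written by parse_vm_frag.awk (columns: order, date, FMFI) and prints, per order, the number of samples, how many of them were above 0 (the “bars which go up”) and the highest value seen. The function name fmfi_summary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize "order date fmfi" lines from the CSV files.
# Output per order: order samples positive_samples max_fmfi
# Usage: fmfi_summary vm_frag_*.csv   (or read from stdin)
fmfi_summary() {
	awk '
	{
		o = $1 + 0; v = $3 + 0
		n[o]++
		if (v > 0) pos[o]++	# fragmented sample: "bar goes up"
		if (!(o in max) || v > max[o]) max[o] = v
	}
	END {
		for (o = 0; o <= 12; o++)
			if (o in n)
				printf "%d %d %d %d\n", o, n[o], pos[o] + 0, max[o]
	}' "$@"
}
```

Orders which never show a positive sample stayed in the “good” area for the whole run.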

An immediate fix which prevents some of the fragmentation

The next graph is with a world from around 2024-04-14. It contains Bojan’s commit which prevents a bit of memory fragmentation around kernel stack guard pages.

Here it seems that Bojan’s fix had an immediate effect on ~~bucket~~ freelist size 1 (8k allocation size). It stays in “good shape” for a longer period of time. In this graph we see an improvement at the beginning up to ~~bucket~~ size 6 (256k allocation size). The graphs below even show an improvement over several days in May up to ~~UMA bucket~~ size 3 (32k allocation size).

One of the next things I want to try (and plot) is review D16620 which segregates *_nofree allocations per allocation (a small patch) and I’m also interested to see what effect review D40772 has.

Some more graphs

Some more graphs, each one from an updated FreeBSD-current system (dates in the graph represent the reboot into the corresponding new world). Chrome was rebuilt by poudriere (which consumes a lot of RAM relative to other packages) several times during those graphs.


Self-signed certificates and LDAPS (OpenLDAP) in PHP (or python)

This is not about how to generate a self-signed certificate; this is about how to configure an LDAP client to connect securely to an LDAP server which has a self-signed certificate.

Recently I searched a lot for how to make this kind of setup work, but it seems nobody uses the keywords of the headline in their HOWTOs, or nobody really sets up a secure connection with self-signed certificates. So here is my attempt to document this for those who are interested in a secure setup.

How OpenLDAP is checking the certificates normally

OpenLDAP uses the certificate store which is configured for OpenSSL. So any certificate which is signed by one of the CAs in the OpenSSL cert-store is trusted.

Secure setup

Most of the time you do not expose an LDAP server to the outside where a certificate from one of the trusted-by-default CAs is needed. A certificate from your internal CA is enough, and in some cases a self-signed certificate is sufficient too.

An easy solution could be to add either the root certificate of your CA or the self-signed certificate to the trust-store of OpenSSL (not every OS / distribution has this in the same location, you have to check where this is for your OS; for FreeBSD 13+ this is /usr/local/etc/ssl/certs/, see also certctl(8) there). But this would mean you trust the certificate which you put there in addition to the default certificates (modulo any blacklisting you did yourself). Theoretically this means anyone who is able to get hold of a certificate from a public CA for your LDAP server could perform a man-in-the-middle attack (you need to consider yourself how feasible this is in your infrastructure setup and how likely it is to happen).

More secure operation

Let’s say you run a service which needs to be able to make TLS sessions to systems which use certificates from public CAs and you want to make sure a connection to the LDAP backend can not use certificates from public CAs.

To tighten the setup in this case, you need to specify that the client which uses the OpenLDAP client libraries uses a different trust-store for the certificate validation.

For the openldap client utilities there is a global config file for this (on FreeBSD this is /usr/local/etc/openldap/ldap.conf). For other tools, like PHP, this needs to be done in the per-user config file ~/.ldaprc. Both files have the same syntax.

With php-ldap you normally run the service either in php-fpm or in an apache-php-module. In both cases the process which runs is configured to run as a non-root user which may or may not have a home directory (in FreeBSD the www user which is typically used for that has no home directory).

HOWTO

create a home directory
create a separate trust-store for LDAP
configure php-ldap / py-ldap to make use of the separate trust-store

Step 1 – create a home directory

Choose a place which is suitable, and create a directory there. It doesn’t need to be in /home; it can be anywhere. The important part is that it is readable by the user which runs the application using php-ldap. It does not need to be writable by this user. In there you need to create the .ldaprc file (again, it only needs to be readable by the user) with the content from step 3.
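Steps 1 and 3 can be combined in a short helper. A sketch, assuming the example paths of this article (/home/php-fpm for the home directory, /usr/local/etc/openldap/ssl for the trust-store); the function name make_ldap_home is hypothetical. Run as root, the directory and the file are owned by root and only made world-readable, so the PHP user can read but not modify them.

```shell
#!/bin/sh
# Hypothetical helper: create a home directory with a read-only .ldaprc
# which points libldap to a separate trust-store.
# Usage: make_ldap_home <homedir> <cacert-file> <cacert-dir>
make_ldap_home() {
	home="$1"
	cacert="$2"
	cacertdir="$3"

	mkdir -p "$home" || return 1
	# Readable and searchable by everyone, writable only by the owner.
	chmod 0755 "$home" || return 1

	# One option per line, same syntax as ldap.conf.
	printf 'TLS_CACERT %s\nTLS_CACERTDIR %s\n' "$cacert" "$cacertdir" \
	    >"$home/.ldaprc" || return 1
	chmod 0644 "$home/.ldaprc"
}
```

For the setup in this article: make_ldap_home /home/php-fpm /usr/local/etc/openldap/ssl/ldap_server_cert.pem /usr/local/etc/openldap/ssl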

Step 2 – create a separate trust-store for LDAP

In FreeBSD the global ldap config is in /usr/local/etc/openldap/ldap.conf. Theoretically you can put the trust-store for LDAP in any place you want. In my setup I consider it to belong in /usr/local/etc/openldap/ssl/. So create a directory, like /usr/local/etc/openldap/ssl, for the trust-store, and copy the certificate of the LDAP server there.

Attention! Only the public certificate, not the private key! If you only have one file on the server for this, it is the combined key+certificate (if you do not know how to get rid of the key, and can not deduce it by looking into the file, there is a lot of info on the web which explains it). The directory and the certificate need to be accessible (read for the file, execute for the directory) by any user which shall make use of this. It does not hurt to have them accessible by everyone (you made sure the private key of the server is not in there, right?).
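If the only file you have is a combined key+certificate PEM file, one way to copy just the certificate into the separate trust-store is to keep only the lines between the BEGIN CERTIFICATE and END CERTIFICATE markers. A sketch under that assumption (the function name install_ldap_cert is hypothetical); a file with a certificate chain keeps all of its certificates, so check the result before using it.

```shell
#!/bin/sh
# Hypothetical helper: copy only the certificate block(s) of a PEM file into
# a trust-store directory, never the private key.
# Usage: install_ldap_cert <combined.pem> <trust-store-dir> <name.pem>
install_ldap_cert() {
	src="$1"
	dir="$2"
	dst="$dir/$3"

	mkdir -p "$dir" || return 1
	# Search (execute) permission on the directory for every user.
	chmod 0755 "$dir" || return 1

	# Keep only the certificate block(s); everything else, e.g. a
	# "BEGIN PRIVATE KEY" block, is dropped.
	sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' \
	    "$src" >"$dst" || return 1

	# Refuse to keep a file without a certificate or with a private key.
	if ! grep -q 'BEGIN CERTIFICATE' "$dst" || grep -q 'PRIVATE KEY' "$dst"; then
		rm -f "$dst"
		return 1
	fi
	# Read permission on the certificate for every user.
	chmod 0644 "$dst"
}
```

For the setup in this article: install_ldap_cert /path/to/combined.pem /usr/local/etc/openldap/ssl ldap_server_cert.pem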

Step 3 – configure php-ldap / py-ldap to make use of the separate trust-store

If you use php-fpm, you need to configure a home directory in the FPM pool configuration section. As already said above, it does not need to be inside /home; it depends on your needs. In this example let me use /home. The FPM config line to add is then something like:
env[HOME] = /home/php-fpm
You could achieve the same by changing the home directory in the password database, but this would have an effect on all processes run as this user, whereas here it is just for the php-fpm processes (and their children).

If you use apache instead of php-fpm, you need to configure something similar for the corresponding virtual host:
SetEnv HOME /home/php-fpm

With this you can now configure /home/php-fpm/.ldaprc to point to the LDAP trust-store:
TLS_CACERT /usr/local/etc/openldap/ssl/ldap_server_cert.pem
TLS_CACERTDIR /usr/local/etc/openldap/ssl

If you use some python based application, you have to do something similar… if all else fails, it needs to be via a real home directory in the password database.

If you want to use the ldap client tools with any user, you need to add those lines to the /usr/local/etc/openldap/ldap.conf file too (there you can also set the default BASE, e.g. “BASE dc=example,dc=com”, and URI, e.g. “URI ldaps://ldap.example.com:636”).

After restarting php-fpm or apache, you should now be able to make really secure connections to the ldap server.
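If the connection does not work after the restart, a quick sanity check can rule out a wrong path in .ldaprc or a private key which slipped into the trust-store. A sketch (check_ldaprc is a made-up name) which reads the TLS_CACERT line from the .ldaprc in a given home directory. It checks readability with the permissions of the user running it, so run as root it only proves that the files exist.

```shell
#!/bin/sh
# Hypothetical helper: verify that <homedir>/.ldaprc names a readable
# TLS_CACERT file which contains a certificate and no private key.
check_ldaprc() {
	rc="$1/.ldaprc"
	[ -r "$rc" ] || { echo "cannot read $rc"; return 1; }

	# Value of the first TLS_CACERT line (TLS_CACERTDIR does not match).
	cacert=$(awk '$1 == "TLS_CACERT" { print $2; exit }' "$rc")
	[ -n "$cacert" ] || { echo "no TLS_CACERT in $rc"; return 1; }
	[ -r "$cacert" ] || { echo "cannot read $cacert"; return 1; }

	if grep -q 'PRIVATE KEY' "$cacert"; then
		echo "$cacert contains a private key"
		return 1
	fi
	grep -q 'BEGIN CERTIFICATE' "$cacert" || { echo "no certificate in $cacert"; return 1; }
	echo "ok: $cacert"
}
```

For the setup in this article: check_ldaprc /home/php-fpm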

Some important things

Every time you change the certificate of the LDAP server, you need to update the certificate on the clients.
There are two TLS modes for the LDAP server: “ldaps” and “ldap+starttls”. If your LDAP server runs in ldaps-mode (typically on port 636), you do not need to enable TLS in your php-ldap using application (that setting does a StartTLS after connecting, typically on port 389); instead, enter “ldaps://servername:636” (assuming it runs on port 636) instead of just “servername” where your application asks for the server name. For py-ldap I have checked just one application (netdata): there TLS needs to be enabled, and the server name has to be given without “ldaps://”, as netdata prefixes “ldaps://” itself if TLS is enabled.
Some places on the internet recommend adding “TLS_REQCERT never” to ldap.conf / .ldaprc. Do not do this in this setup: with “never” the client does not request or check the server certificate at all, so the separate trust-store is not used and a man-in-the-middle attack becomes possible again. With the default (“demand”), the session is terminated if the server provides no certificate or one which can not be validated against TLS_CACERT / TLS_CACERTDIR.
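The difference between the two TLS modes also shows when testing the connection with the OpenLDAP client tools. A sketch which only prints an anonymous ldapwhoami command for either mode (the function name ldap_tls_test_cmd and the host name are placeholders; the ports are the usual defaults). In the StartTLS case -ZZ makes a failed StartTLS an error instead of silently continuing without encryption.

```shell
#!/bin/sh
# Hypothetical helper: print an ldapwhoami command which tests an anonymous
# TLS connection in one of the two modes.
#   ldaps:    TLS from the start, ldaps:// URI (usually port 636)
#   starttls: plain ldap:// (usually port 389), then StartTLS (-ZZ: must succeed)
ldap_tls_test_cmd() {
	mode="$1"
	host="$2"
	case "$mode" in
	ldaps)    echo "ldapwhoami -x -H ldaps://${host}:636" ;;
	starttls) echo "ldapwhoami -x -ZZ -H ldap://${host}:389" ;;
	*)        echo "unknown mode: $mode" >&2; return 1 ;;
	esac
}
```

To test with the per-user config from step 3, the printed command can be run with the corresponding home directory, e.g. ldap_tls_test_cmd ldaps ldap.example.com | HOME=/home/php-fpm sh (libldap reads $HOME/.ldaprc).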


Fighting the Coronavirus with FreeBSD (Folding@Home)

Photo by Fusion Medical Animation on Unsplash

Here is a quick HOWTO for those who want to provide some FreeBSD based compute resources to help find vaccines. ~~I have not made a port out of this and do not know yet if I get the time to make one. If someone wants to make a port, go ahead, do not wait for me.~~

UPDATE 2020-03-22: 0mp@ made a port out of this, it is in “biology/linux-foldingathome”.

Download the linux RPM of the Folding@Home client (this covers fahclient only).
Enable the linuxulator (the kernel modules and linux_base (first part of chapter 10.2) are enough).
Make sure linprocfs/linsysfs are mounted in /compat/linux/{proc|sys}.
cd /compat/linux
tar -xf /path/to/fahclient....rpm
add the “fahclient” user (give it a real home directory)
make sure there is no /compat/linux/dev or alternatively mount devfs there
mkdir /compat/linux/etc/fahclient
cp /compat/linux/usr/share/doc/fahclient/sample-config.xml /compat/linux/etc/fahclient/config.xml
chown -R fahclient /compat/linux/etc/fahclient
edit /compat/linux/etc/fahclient/config.xml: modify user (mandatory) / team (optional: FreeBSD team is 11743) / passkey (optional) as appropriate (if you want to control the client remotely, you need to modify some more parts, but somehow the client “loses” a file descriptor and stops working as it should if you do that on FreeBSD)
If you have the home directories of the users as no-exec (e.g. separate ZFS datasets with exec=off): make sure the home directory of the fahclient user has exec permissions enabled
cd ~fahclient (important! it tries to write to the current work directory when you start it)
Start it: /usr/sbin/daemon /compat/linux/usr/bin/FAHClient /compat/linux/etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid >/dev/null 2>&1
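The mount check from the list above can be scripted. A sketch, assuming FreeBSD’s mount -p (which prints the mounted filesystems in fstab(5) format: device, mount point, type, …) and the default /compat/linux prefix; check_linux_mounts is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read "mount -p" output on stdin and check that
# linprocfs and linsysfs are mounted below the Linux compat root.
# Usage: mount -p | check_linux_mounts [/compat/linux]
check_linux_mounts() {
	awk -v root="${1:-/compat/linux}" '
	$2 == (root "/proc") && $3 == "linprocfs" { proc = 1 }
	$2 == (root "/sys") && $3 == "linsysfs" { sys = 1 }
	END {
		if (!proc) print "missing: linprocfs on " root "/proc"
		if (!sys) print "missing: linsysfs on " root "/sys"
		exit !(proc && sys)	# non-zero exit status if anything is missing
	}'
}
```

It exits with 0 and prints nothing when both filesystems are mounted.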

By default it will now pick up some SARS-CoV-2 (COVID-19) related folding tasks. There are some more config options (e.g. how much of the system resources are used). Please refer to the official Folding@Home site for more information about that. Also be aware that there is a big rise in compute resources donated to Folding@Home, so the pool of available work units may be empty from time to time, but they are working on adding more work units. Be patient.

My Folding@Home user and team statistics.
