Plotting FreeBSD memory fragmentation


What this is about

I stumbled upon the work of Bojan Novković regarding physical memory anti-fragmentation mechanisms (attention, this is a link to the FreeBSD wiki, whose content may change without further notice). As a picture sometimes says more than a thousand words, I wanted to see a graphical representation of the fragmentation: not in terms of which memory regions are fragmented, but in terms of how fragmented the UMA buckets are.

Bojan has a fragmentation metric (FMFI) for UMA available as a patch, which gives a numeric representation of the fragmentation, but no graphs.

After a bit of tinkering with gnuplot, I came up with a way of graphing it.

How to create some graphs

First you need some data to plot a graph. Collecting the FMFI stats is easy; a little cron job which runs the following periodically is enough:

#!/bin/sh

# Derive the boot time (epoch seconds) from kern.boottime; it becomes part
# of the per-boot log file name.
boottime=$(sysctl kern.boottime 2>&1 | awk '{print $5}' | sed -e 's:,::')
time=$(date -r ${boottime} +%Y%m%d_%H%M)

logfile=/var/tmp/vm_frag_${time}.log

# Append a timestamped FMFI snapshot to the log.
touch ${logfile}
date "+%Y-%m-%d_%H:%M:%S" >> ${logfile}
sysctl vm.phys_frag_idx >> ${logfile}
echo >> ${logfile}

This creates log files in /var/tmp with the formatted boot time in the filename, so there is an easy indication of a reset of the fragmentation (i.e., a reboot).
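To run it periodically, an entry in /etc/crontab could look like the following (the script path /usr/local/sbin/vm_frag_log.sh is just my placeholder for wherever you saved the script above; adjust the interval to taste):

```shell
# Collect an FMFI snapshot every 10 minutes.
# The script path is an assumption; adjust to wherever you saved it.
*/10	*	*	*	*	root	/usr/local/sbin/vm_frag_log.sh
```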

After a while you should have some logs to parse. Gnuplot cannot work with the simple log generated by the cron job, so a CSV needs to be generated. The following awk script (parse_vm_frag.awk) generates the CSV. In my case there is only one NUMA domain, so my awk script to parse the data doesn't care about NUMA domains.

# Remember the timestamp line; skip the sysctl name, NUMA domain and
# header/separator lines; emit one row (bucket, timestamp, FMFI) per
# bucket line of the sysctl output.
/....-..-.._..:..:../ { date = $0 }
/vm.phys_frag_idx: / { next }
/DOMAIN/ { next }

/  ORDER (SIZE) |  FMFI/ { next }
/--/ { next }
/  .. \( .....K\) / { printf "%d %s %d\n", $1, date, $5; next }

The next step is a template (template.gnuplot) for the plots:

set terminal svg dynamic mouse standalone name "%%NAME%%"
# set terminal png size 1920,1280
set output "%%NAME%%.svg"
 
set title '%%NAME%%' noenhanced
set xdata time
set timefmt "%Y-%m-%d_%H:%M:%S"
set xlabel "Date Time"
set zlabel "Memory Fragmentation Index" rotate by 90
set ylabel "UMA Bucket"
set zrange [-1000:1000]
set yrange [0:12]
set ytics 1
# the following rotate doesn't work, at least chrome doesn't rotate the dates on the x-axis at all
set xtics rotate by 90 format "%F %T" timedate 
set xyplane 0
set grid vertical
set border 895
 
splot "%%NAME%%.csv" using 2:1:3 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 1 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 2 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 3 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 4 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 5 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 6 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 7 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 8 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 9 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 10 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 11 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 12 every 13 t '' with filledcurve

Unfortunately I didn't get the above to work with gnuplot variables within five minutes, so I created a little script to generate a plot script for each CSV file.

#!/bin/sh

for log in vm_frag_*.log; do
        base=$(basename ${log} .log)
        awk -f parse_vm_frag.awk <${log} >${base}.csv

        # Instantiate the gnuplot template for this log file.
        # Note: FreeBSD sed needs an (empty) backup suffix after -i.
        cp template.gnuplot ${base}.gnuplot
        sed -i '' -e "s:%%NAME%%:${base}:g" ${base}.gnuplot
done

Now it's simply "gnuplot *.gnuplot" (assuming the CSV files and the gnuplot files are in the same directory), and you will get SVG graphs.

Some background info

And here are the results of running this for some days on a two-socket Intel Xeon system (6 cores per socket, plus hyperthreading) with 72 GB RAM. This system runs about 30 different jails with a diverse mix of nginx, mysql, postgresql, redis, imap, smtp, various java stuff, ..., poudriere (3 workers) and buildworld runs (about 30 jails not counting the poudriere runs). So the following graphs are not produced in a reproducible way, but are simply the result of real-world applications running all day long. Each new graph means there was a reboot. All reboots were done to update to a more recent FreeBSD-current.

All in all, not only was the application workload always different, but the running kernel was too.

The graphs

Beware! You cannot really compare one graph with another; they do not represent the same workload. As such, any conclusion you (or I) want to draw from this is more an indication than a proven fact. Big differences will be visible, small changes may go unnoticed.

This is the graph of FreeBSD-current from around 2024-04-08. There are various modifications compared to a stock FreeBSD system, but the only change in the memory area is the FMFI patch mentioned above.

Explanation of what you see

A memory fragmentation index of 1000 is bad: it means the memory is very fragmented. A value of 0 means there is no fragmentation, and a negative value means it is very easy to satisfy an allocation request.

So bars which go up are bad, bars which go down are good.

The UMA bucket axis (different colors for each bucket) denotes the allocation size. UMA bucket 0 is for 4k allocations, and each bucket increase doubles the allocation size, up to 16M at bucket 12.
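To make the doubling concrete, here is a small shell sketch (not part of the original setup, just an illustration) that prints the allocation size for a given bucket order:

```shell
#!/bin/sh
# Allocation size of UMA bucket order N in KB: 4K shifted left N times (4K * 2^N).
bucket_size_kb() {
        echo $((4 << $1))
}

for order in 0 1 2 3 6 12; do
        printf "bucket %2d: %6d KB\n" "${order}" "$(bucket_size_kb ${order})"
done
```

Bucket 12 comes out at 16384 KB, i.e. the 16M mentioned above.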

In the above graph all values for UMA bucket 0 are negative. This means that all allocations of up to 4k were always easy and no fragmentation was noticed. This is not a surprise, given that this is the smallest allocation size.

The fact that bucket size 1 (8k allocations) already had that much fragmentation was a surprise to me at that point. But see the next part for a fix for this.

An immediate fix which prevents some of the fragmentation

The next graph is with a world from around 2024-04-14. It contains Bojan's commit which prevents a bit of memory fragmentation around kernel stack guard pages.

Here it seems that Bojan's fix had an immediate effect on UMA bucket 1 (8k allocation size): it stays in "good shape" for a longer period of time. In this graph we see an improvement at the beginning for up to bucket 6 (256k allocation size). The graphs below even show an improvement over several days for maybe up to UMA bucket 3 (32k allocation size).

One of the next things I want to try (and plot) is review D16620, which segregates *_nofree allocations per bucket (a small patch), and I am also interested to see what effect review D40772 has.

Some more graphs

Some more graphs, each one from an updated FreeBSD-current system (dates in the graph represent the reboot into the corresponding new world). Chrome was rebuilt by poudriere (it consumes a lot of RAM relative to other packages) several times during those graphs.

Solaris: script to check whether various settings of a system comply with some pre-defined settings

Problem

If you set up a system, you want to make sure that it complies with a pre-defined config. You can do that with a configuration management system, but there are cases where it is useful to do that outside of this context.

Solution

I started to write the shell script below in 2008. Over time (until 2016) it grew into something which is able to output a report of over 1000 items. You can configure it via ${HOME}/.check_host.cfg and /etc/check_host.cfg (it reads both in this order; the first config wins and the other config is not read). You can use option "-h" to see the usage text. Option "-n" suppresses messages which help to fix issues, and "-a" prints out simple HTML instead of text.
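A typical invocation could look like this (the file name check_host.sh is just my placeholder for the script below; the options are the ones described above):

```shell
# Plain-text report, without the messages which help to fix issues:
sh check_host.sh -n > /var/tmp/check_host_report.txt

# HTML report, e.g. for publishing on an internal web server:
sh check_host.sh -a > /var/tmp/check_host_report.html
```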

Solaris: script to create commands to set up LDOMs based upon output from "ldm ls"

Problem

You have an LDOM which you want to clone to somewhere else, and all you have to perform that is the ldm command on the target system.

Solution

Download the AWK script below. Use the output of "ldm ls -l -p <ldom>" as the input of this AWK script. The output will be a list of commands to re-create the config for VDS, VDISK, VSW and NETWORK.

I wrote this in 2013, so changes to the output of "ldm ls" since then are not accounted for.
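Putting it together, the clone workflow could be sketched like this (the LDOM name and the file names are placeholders; ldom_clone.awk stands for the script below):

```shell
# On the source machine: dump the LDOM config in parseable form.
ldm ls -l -p myldom > myldom.ldm

# Turn the dump into a list of ldm commands.
awk -f ldom_clone.awk < myldom.ldm > recreate_myldom.sh

# Review the generated commands before running them on the target system.
less recreate_myldom.sh
```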

Self-signed certificates and LDAPS (OpenLDAP) in PHP (or python)

This is not about how to generate a self-signed certificate; this is about how to configure an LDAP client to connect securely to an LDAP server which has a self-signed certificate.

Recently I was searching a lot for how to make this kind of setup work, but it seems nobody is using the keywords of the headline in their HOWTOs, or nobody is really setting up a truly secure connection with self-signed certificates. So here is my attempt to document this for those who are interested in a secure setup.

How OpenLDAP normally checks certificates

OpenLDAP uses the certificate store which is configured for OpenSSL. So any certificate which is signed by one of the CAs in the OpenSSL cert store is trusted.

Secure setup

Most of the time you do not expose an LDAP server to the outside, where a certificate from one of the trusted-by-default CAs would be needed. A certificate from your internal CA is enough, and in some cases a self-signed certificate is sufficient too.

An easy solution could be to add either the root certificate of your CA or the self-signed certificate into the trust store of OpenSSL (not every OS / distribution has this in the same location; you have to check where this is for your OS, for FreeBSD 13+ this is /usr/local/etc/ssl/certs/, see also certctl(8) there). But this would mean you trust the certificate which you put there in addition to the default certificates (modulo any blacklisting you made yourself). Theoretically this means anyone who is able to get hold of a certificate from a public CA for your LDAP server could perform a man-in-the-middle attack (you need to consider yourself how feasible this is in your infrastructure setup and how likely this is to happen).
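On FreeBSD 13+ this easy (but less strict) variant could look like the following sketch (the certificate file name is a placeholder):

```shell
# Copy the CA root certificate (or the self-signed server certificate)
# into the local trust anchor directory ...
cp my-internal-ca.pem /usr/local/etc/ssl/certs/

# ... and let certctl(8) rebuild the hash links so OpenSSL picks it up.
certctl rehash
```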

More secure operation

Let's say you run a service which needs to be able to make TLS sessions to systems which use certificates from public CAs, and you want to make sure a connection to the LDAP backend cannot use certificates from public CAs.

To tighten the setup in this case, you need to specify that the client which uses the OpenLDAP client libraries uses a different trust store for the certificate validation.

For the OpenLDAP client utilities there is a global config file for this (on FreeBSD this is /usr/local/etc/openldap/ldap.conf). For other tools, like PHP, this needs to be done in the per-user config file ~/.ldaprc. Both files have the same syntax.

With php-ldap you normally run the service either in php-fpm or in an Apache PHP module. In both cases the process is configured to run as a non-root user, which may or may not have a home directory (in FreeBSD the www user which is typically used for that has no home directory).

HOWTO

  1. create a home directory
  2. create a separate trust-store for LDAP
  3. configure php-ldap / py-ldap to make use of the separate trust-store

Step 1 – create a home directory

Choose a place which is suitable, and create a directory there. It doesn't need to be in /home; it can be anywhere. The important part is that it is readable by the user which runs the application using php-ldap. It does not need to be writable by this user. In there you need to create the .ldaprc file (again, it only needs to be readable by the user) with the content from step 3.
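As a sketch (the path /home/php-fpm matches the php-fpm example used later, and the www user is the FreeBSD default; both are assumptions you should adapt):

```shell
# Create the home directory; readable, but not writable, by www.
mkdir -p /home/php-fpm
chown root:www /home/php-fpm
chmod 750 /home/php-fpm

# The .ldaprc (content from step 3) only needs to be readable by www.
touch /home/php-fpm/.ldaprc
chown root:www /home/php-fpm/.ldaprc
chmod 640 /home/php-fpm/.ldaprc
```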

Step 2 – create a separate trust-store for LDAP

In FreeBSD the global LDAP config is in /usr/local/etc/openldap/ldap.conf. Theoretically you can put the trust store for LDAP in any place you want. In my setup I consider it to belong in /usr/local/etc/openldap/ssl/. So make a directory (like /usr/local/etc/openldap/ssl) for the trust store, and copy the certificate of the LDAP server there.

Attention! Only the public certificate, not the private key! If you only have one file on the server for this, it is the combined key+certificate (if you don't know, or are not able to deduce by looking into the file, how to get rid of the key: there is a lot of info out there on the web which explains it). The directory and the certificate need to be accessible (read for the file, execute for the directory) by any user which shall make use of this. It does not hurt to have it accessible by everyone (you made sure there is not the private key from the server in there, right?).
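A sketch of this step; if you start from a combined key+certificate PEM file, `openssl x509` extracts just the certificate and leaves the key behind (the input file name is a placeholder):

```shell
mkdir -p /usr/local/etc/openldap/ssl

# Extract only the public certificate from a combined key+cert PEM file.
openssl x509 -in combined_key_and_cert.pem \
    -out /usr/local/etc/openldap/ssl/ldap_server_cert.pem

# World-readable file, world-executable directory: fine, it is public data.
chmod 755 /usr/local/etc/openldap/ssl
chmod 644 /usr/local/etc/openldap/ssl/ldap_server_cert.pem
```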

Step 3 – configure php-ldap / py-ldap to make use of the separate trust-store

If you use php-fpm, you need to configure a home directory in the FPM pool configuration section. As already said above, it does not need to be inside /home; it depends upon your needs. In this example let me use /home. The FPM config line to add is then something like:
env[HOME] = /home/php-fpm
You could achieve the same via changing the home directory in the password database, but this would have an effect on all processes run with this user, whereas here it is just for the php-fpm processes (and children).

If you use Apache instead of php-fpm, you need to configure something similar for the corresponding virtual host:
SetEnv HOME /home/php-fpm

With this you can now configure /home/php-fpm/.ldaprc to point to the LDAP trust store:
TLS_CACERT /usr/local/etc/openldap/ssl/ldap_server_cert.pem
TLS_CACERTDIR /usr/local/etc/openldap/ssl

If you use some Python-based application, you have to do something similar... if all else fails, it needs to be via a real home directory in the password database.

If you want to use the LDAP client tools with any user, you need to add those lines to the /usr/local/etc/openldap/ldap.conf file too (there you can also set the default BASE, e.g. "BASE dc=example,dc=com", and URI, e.g. "URI ldaps://ldap.example.com:636").

After restarting php-fpm or Apache, you should now be able to make really secure connections to the LDAP server.
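To verify from the command line (server name, port and base DN are examples; -x requests a simple bind):

```shell
# Should succeed, because the server certificate matches the trust store.
ldapsearch -H ldaps://ldap.example.com:636 -x -b "dc=example,dc=com" -s base

# Cross-check: with an empty trust store the connection must fail.
# LDAPTLS_CACERT overrides the TLS_CACERT setting from ldap.conf/.ldaprc.
LDAPTLS_CACERT=/dev/null \
    ldapsearch -H ldaps://ldap.example.com:636 -x -b "dc=example,dc=com" -s base
```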

Some important things

  • Every time you change the certificate of the LDAP server, you need to update the certificate in the clients.
  • There are two TLS modes for the LDAP server: one is "ldaps", and one is "ldap+starttls". If you have your LDAP server running in ldaps mode (typically on port 636), you do not need to tell your php-ldap-using application to enable TLS (which does a starttls after connecting, typically on port 389), but you need to specify "ldaps://servername:636" (assuming it runs on port 636) instead of just "servername" at the place in your application where you are told to enter the server name. For py-ldap I have checked just one application (netdata); there TLS needs to be enabled, and the server name has to be without "ldaps://", as netdata prefixes the "ldaps://" itself if TLS is enabled.
  • Some places on the internet tell you to add "TLS_REQCERT never" to ldap.conf / .ldaprc. Technically this is not needed. Depending on your point of view this can be either good or bad: specifying it saves some CPU cycles on the server and the client, and some transfer time over the network; not specifying it allows the certificate received to be validated against the certificate available locally, but I do not know if OpenLDAP does this, nor did I spend time evaluating whether this improves security (if the important parts of the certificate are out of sync, the connection will fail).

Fighting the Coronavirus with FreeBSD (Folding@Home)

Pho­to by Fusion Med­ical Ani­ma­tion on Unsplash

Here is a quick HOWTO for those who want to provide some FreeBSD-based compute resources to help find vaccines. I have not made a port out of this and do not know yet if I will get the time to make one. If someone wants to make a port, go ahead, do not wait for me.

UPDATE 2020-03-22: 0mp@ made a port out of this; it is in "biology/linux-foldingathome".

  • Download the linux RPM of the Folding@Home client (this covers fahclient only).
  • Enable the linuxulator (kernel modules and linux_base (first part of chapter 10.2) is enough).
  • Make sure linprocfs/linsysfs are mounted in /compat/linux/{proc|sys}.
  • cd /compat/linux
  • tar -xf /path/to/fahclient....rpm
  • add the "fahclient" user (give it a real home directory)
  • make sure there is no /compat/linux/dev, or alternatively mount devfs there
  • mkdir /compat/linux/etc/fahclient
  • cp /compat/linux/usr/share/doc/fahclient/sample-config.xml /compat/linux/etc/fahclient/config.xml
  • chown -R fahclient /compat/linux/etc/fahclient
  • edit /compat/linux/etc/fahclient/config.xml: modify user (mandatory) / team (optional: the FreeBSD team is 11743) / passkey (optional) as appropriate (if you want to control the client remotely, you need to modify some more parts, but somehow the client "loses" a file descriptor and stops working as it should if you do that on FreeBSD)
  • If you have the home directories of the users as no-exec (e.g. separate ZFS datasets with exec=off): make sure the home directory of the fahclient user has exec permissions enabled
  • cd ~fahclient (important! it tries to write to the current work directory when you start it)
  • Start it: /usr/sbin/daemon /compat/linux/usr/bin/FAHClient /compat/linux/etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid >/dev/null 2>&1
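The manual steps above could be sketched as one script (an untested sketch: the RPM path and the fahclient home directory are placeholders, the linuxulator setup and the config.xml edit are still manual):

```shell
#!/bin/sh
# Sketch of the Folding@Home-on-linuxulator setup described above.
# Assumes the linuxulator and linux_base are already set up and
# linprocfs/linsysfs are mounted under /compat/linux.

rpm=/path/to/fahclient.rpm	# placeholder: the downloaded linux RPM

# FreeBSD tar (bsdtar) can extract RPM archives directly.
cd /compat/linux || exit 1
tar -xf "${rpm}"

# A real home directory for the fahclient user (must allow exec).
pw useradd fahclient -m -d /home/fahclient -s /bin/sh

mkdir -p /compat/linux/etc/fahclient
cp /compat/linux/usr/share/doc/fahclient/sample-config.xml \
   /compat/linux/etc/fahclient/config.xml
chown -R fahclient /compat/linux/etc/fahclient

# Now edit /compat/linux/etc/fahclient/config.xml (user/team/passkey),
# then start the client from the fahclient home directory:
cd /home/fahclient || exit 1
/usr/sbin/daemon /compat/linux/usr/bin/FAHClient \
    /compat/linux/etc/fahclient/config.xml \
    --run-as fahclient --pid-file=/var/run/fahclient.pid >/dev/null 2>&1
```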

By default it will now pick up some SARS-CoV-2 (COVID-19) related folding tasks. There are some more config options (e.g. how much of the system resources is used). Please refer to the official Folding@Home site for more information about that. Be also aware that there is a big rise in compute resources donated to Folding@Home, so the pool of available work units may be empty from time to time, but they are working on adding more work units. Be patient.
