Plotting the FreeBSD memory fragmentation

What this is about

I stumbled upon the work of Bojan Novković regarding physical memory anti-fragmentation mechanisms (attention, this is a link to the FreeBSD wiki, content may change without further notice). As a picture sometimes tells more than 1000 words, I wanted to see a graphical representation of the fragmentation: not which memory regions are fragmented, but how fragmented the UMA buckets page allocator freelists are.

Bojan has a fragmentation metric (FMFI) for UMA available as a patch, which gives a numeric representation of the fragmentation, but no graphs.

After a bit of tinkering around with gnuplot, I came up with a way of graphing it.

How to create some graphs

First you need some data to plot a graph. Collecting the FMFI stats is easy. A little cron job which runs this periodically is enough:

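#!/bin/sh

# Boot time in epoch seconds: field 5 of "kern.boottime: { sec = N, usec = M } ..."
boottime=$(sysctl kern.boottime 2>&1 | awk '{print $5}' | sed -e 's:,::')
# Format it for the log file name, so that each boot gets its own file
time=$(date -r "${boottime}" +%Y%m%d_%H%M)

logfile="/var/tmp/vm_frag_${time}.log"

# One record per run: timestamp, FMFI table, empty separator line
date "+%Y-%m-%d_%H:%M:%S" >> "${logfile}"
sysctl vm.phys_frag_idx >> "${logfile}"
echo >> "${logfile}"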

This creates log files in /var/tmp with the formatted boot time in the filename, so that a new filename is an easy indication that a reboot has reset the fragmentation.

After a while you should have some logs to parse. Gnuplot cannot work with the simple log generated by the cron job, so a CSV needs to be generated. The following awk script (parse_vm_frag.awk) generates the CSV (strictly speaking the columns are separated by spaces: order, timestamp, FMFI value). In my case there is only one NUMA domain, so my awk script to parse the data doesn’t care about NUMA domains.

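# Remember the timestamp line which starts each record
/....-..-.._..:..:../ { date = $0 }
# Skip the sysctl name, the NUMA domain line and the table header
/vm.phys_frag_idx: / { next }
/DOMAIN/ { next }

/ORDER|FMFI/ { next }
/--/ { next }
# Data row: print order, timestamp and FMFI value
/  .. \( .....K\) / { printf "%d %s %d\n", $1, date, $5; next }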
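
To make the conversion concrete, here is a small sketch of what the parser does with one record. Attention: the sample log is made up so that it matches the patterns of the awk script; it is not copied from real sysctl output, and it contains only two orders instead of 13. The file names inside the temporary directory are arbitrary.

```shell
#!/bin/sh
# Illustration only. The sample log below is invented to match the regular
# expressions of the awk script; it is NOT real vm.phys_frag_idx output.

tmpdir=$(mktemp -d)

# The two rules of parse_vm_frag.awk which produce output
cat > "${tmpdir}/parse.awk" <<'EOF'
/....-..-.._..:..:../ { date = $0 }
/  .. \( .....K\) / { printf "%d %s %d\n", $1, date, $5; next }
EOF

# Hypothetical record with only two orders
cat > "${tmpdir}/sample.log" <<'EOF'
2024-04-08_12:00:00
vm.phys_frag_idx:
DOMAIN 0:
   1 (     8K)  |  250
   0 (     4K)  |  -37

EOF

csv=$(awk -f "${tmpdir}/parse.awk" "${tmpdir}/sample.log")
echo "${csv}"
# prints:
# 1 2024-04-08_12:00:00 250
# 0 2024-04-08_12:00:00 -37

rm -rf "${tmpdir}"
```

Each output row carries the order, the timestamp of the record and the FMFI value, which is the column layout the "using 2:1:3" in the gnuplot template expects.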

The next step is a template (template.gnuplot) for the plots:

set terminal svg dynamic mouse standalone name "%%NAME%%"
# set terminal png size 1920,1280
set output "%%NAME%%.svg"
 
set title '%%NAME%%' noenhanced
set xdata time
set timefmt "%Y-%m-%d_%H:%M:%S"
set xlabel "Date Time"
set zlabel "Memory Fragmentation Index" rotate by 90
set ylabel "freelist size"
set zrange [-1000:1000]
set yrange [0:12]
set ytics 1
# the following rotate doesn't work, at least chrome doesn't rotate the dates on the x-axis at all
set xtics rotate by 90 format "%F %T" timedate 
set xyplane 0
set grid vertical
set border 895
 
splot "%%NAME%%.csv" using 2:1:3 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 1 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 2 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 3 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 4 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 5 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 6 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 7 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 8 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 9 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 10 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 11 every 13 t '' with filledcurve, \
      "" using 2:1:3 skip 12 every 13 t '' with filledcurve

Unfortunately I didn’t get the above to work with gnuplot variables within 5 minutes, so I created a little script to generate a plot script for each CSV file:

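#!/bin/sh

for log in vm_frag_*.log; do
        base=$(basename "${log}" .log)
        # Convert the raw log into the data file for gnuplot
        awk -f parse_vm_frag.awk <"${log}" >"${base}.csv"

        # Fill in the template; writing to a new file avoids "sed -i",
        # whose argument handling differs between BSD and GNU sed
        sed -e "s:%%NAME%%:${base}:g" template.gnuplot >"${base}.gnuplot"
done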

Now it’s simply “gnuplot *.gnuplot” (assuming the CSV files and the gnuplot files are in the same directory), and you will get SVG graphs.
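
The "every 13" and "skip N" parts of the template assume that each timestamp contributes exactly 13 rows (orders 0 to 12). If a record is incomplete, for example because a cron run was interrupted, the later rows would presumably end up in the wrong series. A quick sanity check before plotting could look like the following sketch; check_rows is a hypothetical helper and not part of my scripts above.

```shell
#!/bin/sh
# Sketch only: check_rows is a hypothetical helper, not part of the original scripts.
# It succeeds if the file has a non-zero number of rows which is a multiple of 13.
check_rows() {
        awk 'END { exit !(NR > 0 && NR % 13 == 0) }' "$1"
}

for csv in vm_frag_*.csv; do
        # Skip the unexpanded pattern if there are no CSV files
        [ -e "${csv}" ] || continue
        check_rows "${csv}" || echo "warning: ${csv}: row count is not a multiple of 13" >&2
done
```

This only checks the row count; it does not verify that the orders inside each record are complete or in the expected sequence.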

Some background info

And here are the results of running this for some days on a two-socket Intel Xeon system (6 cores per socket plus hyperthreading) with 72 GB RAM. This system has about 30 different jails with a diverse mix of nginx, mysql, postgresql, redis, imap, smtp, various java stuff, …, plus poudriere (3 workers) and buildworld runs; the jails of the poudriere runs are not counted in those 30. So the following graphs are not done in a reproducible way, but are simply the result of real-world applications running all day long. Each new graph means there was a reboot. All reboots were done to update to a more recent FreeBSD-current.

All in all, not only the application workload was always different, but also the running kernel was different.

The graphs

Beware! You cannot really compare one graph with another. They do not represent the same workload. As such, any conclusion you (or I) want to draw from this is more an indication than a proven fact. Big differences will be visible, small changes may go unnoticed.

This is the graph of FreeBSD-current from around 2024-04-08. There are various modifications compared to a stock FreeBSD system, but the only change in the memory area is the FMFI patch mentioned above.

Explanation of what you see

A memory fragmentation index of 1000 is bad. It means the memory is very fragmented. A value of 0 means there is no fragmentation, and a negative value means it is very easy to satisfy an allocation request.

So bars which go up are bad, bars which go down are good.

The page allocator UMA bucket freelists axis (different colors for each size-rank) denotes the allocation size. UMA bucket freelist size 0 is about 4k allocations, and each size increase doubles the allocation size, up to 16M at size-rank 12.
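
The doubling can be written down as a small calculation. This is a minimal sketch which assumes a 4k base size as described above; order_to_kib is a made-up helper name and has nothing to do with the FMFI patch.

```shell
#!/bin/sh
# Sketch only: order_to_kib is a hypothetical helper.
# Assumes freelist size 0 corresponds to 4k allocations; each step doubles it.
order_to_kib() {
        echo "$((4 << $1))"
}

order=0
while [ "${order}" -le 12 ]; do
        printf 'size %2d: %6d k\n' "${order}" "$(order_to_kib "${order}")"
        order=$((order + 1))
done
```

For size-rank 12 this gives 16384k, which is the 16M mentioned above.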

In the above graph for UMA bucket freelist size 0 all values are negative. This means that allocations of up to 4k were always easy and no fragmentation was noticed. This is not a surprise, given that this is the smallest allocation size.

The fact that UMA bucket freelist size 1 (8k allocations) already had that much fragmentation was a surprise to me at that point. But see the next part for a fix for this.

An immediate fix which prevents some of the fragmentation

The next graph is with a world from around 2024-04-14. It contains Bojan’s commit which prevents a bit of memory fragmentation around kernel stack guard pages.

Here it seems that Bojan’s fix had an immediate effect on bucket freelist size 1 (8k allocation size). It stays in “good shape” for a longer period of time. In this graph we see an improvement at the beginning up to bucket size 6 (256k allocation size). The graphs below even show an improvement over several days of May up to UMA bucket size 3 (32k allocation size).

One of the next things I want to try (and plot) is review D16620, which segregates *_nofree allocations per allocation (a small patch), and I’m also interested to see what effect review D40772 has.

Some more graphs

Some more graphs, each one from an updated FreeBSD-current system (dates in the graph represent the reboot into the corresponding new world). Chrome was rebuilt by poudriere (it consumes a lot of RAM relative to other packages) several times during those graphs.