In my last blog-post I described how to create a new linux_base port. This blog-post is about the other Linux–ports which make up the Linux–infrastructure in the FreeBSD Ports Collection for a given Linux-release.
What are linux-infrastructure ports?
A linux_base port contains as much as possible and at the same time as little as possible to make up a useful Linux-compatibility-experience in FreeBSD. I know, this is not a descriptive explanation. And it is not on purpose. There are no fixed rules what has to be inside or what not. It “matured” into the current shape. A practical example is, that there is no GUI-stuff in the linux_base. While you need the GUI parts like GTK or QT for software like Skype and acroread, you do not need them for headless game servers. While you may need various libraries for game servers, you may not need those for Skype or acroread. As such some standard parts are in separate ports which are named linux–LINUX_DIST_SUFFIX-NAME. For GTK and the Fedora 10 release this results in linux-f10-gtk2. Such generic ports which depend upon a specific Linux-release make up the Linux-infrastructure in the FreeBSD Ports Collection. Those ports are referenced in port-Makefiles via the USE_LINUX_APPS variable, e.g. USE_LINUX_APPS=gtk2.
If you created a new linux_base port, you need most standard infrastructure ports in a version for the Linux-release used in the linux_base port, to have the Linux-application ports in the FreeBSD Ports Collection working (if you are unlucky, some ports do not play well with the Linux-release you have chosen, but this is out of the scope of this HOWTO).
Updating Mk/bsd.linux-apps.mk
First we need to set the LINUX_DIST_SUFFIX variable to a value suitable to the new Linux-release. This is done in the conditional which checks the OVERRIDE_LINUX_NONBASE_PORTS variable for valid values. Add an appropriate conditional, and do not forget to add the new valid value to the IGNORE line in the last else branch of the conditional.
The next step is to check the _LINUX_APPS_ALL and _LINUX_26_APPS variables. If there are some infrastructure ports which are not available for the new Linux-release, the conditional which checks the availability of a given infrastructure port for a given Linux-release needs to be modified. If at a later step you notice that there are some additional infrastructure ports necessary for the new Linux-release, _LINUX_APPS_ALL and the check-logic needs to be modified too (e.g. add a new variable for your Linux-release, add the content of the variable to _LINUX_APPS_ALL, and change the check to do the right thing).
After that two tedious parts need to be done.
For each infrastructure port there is a set of variables. The name_PORT variable contains the location of the port in the Ports Collection. Typically you do not have to change it (if you really want to change it, do not do it, fix the naming of the infrastructure port instead), because we use a naming convention here which includes the LINUX_DIST_SUFFIX. The name_DETECT variable is an internal variable, do not change it (if you create a new infrastructure port, copy it from somewhere else and make sure the name in value of the variable matches the port name in the name of the variable). Then there are several name_suffix_FILE variables. Leave the existing ones alone, and add a new one with the correct suffix for your new Linux-release. The value of the variable needs to be an important file which is installed by the infrastructure port in question. FYI: The content of the name_suffix_FILE variables are used to set the name_DETECT variables, depending on the Linux-relase the name_DETECT variables are used to check if the port is already installed. Ideally the name_suffix_FILE variable points to a library in the port. The name_DEPENDS variable lists dependencies of this infrastructure port. If the dependencies changed in your Linux-release, you need to add a conditional to change the dependency if LINUX_DIST_SUFFIX is set to your Linux-release.
Normally this is all what needs to be done in PORTSDIR/Mk/bsd.linux-apps.mk, the rest of the file is code to check dependencies and some correctness checks.
The second tedious part is to actually create all those infrastructure ports. Normally you can copy an existing infrastructure port, rename it, adjust the PORTNAME, PORTVERSION, PORTREVISION, MASTER_SITES, PKGNAMEPREFIX, DISTFILES, CONFLICTS (also in all other Linux-release versions of this infrastructure port), LINUX_DIST_VER, RPMVERSION (if set/neccesary) and SRC_DISTFILE variables, generate the distfile checksums (make makesum), and fix the plist. I suggest to script parts of this work (as of this writing Freshports counts 68 ports where the portname starts with linux-f10-).
Adding new infrastructure ports, or removing infrastructure ports for a given Linux-release
If your Linux-release does not come with a package for an existing infrastructure port, just do not create a corresponding name_suffix_FILE line. You still need to do the right thing regarding dependencies of ports which depend upon this non-existing infrastructure port (if your Linux-release comes with packages for them).
To add a new infrastructure port, copy an existing block, rename the variables, set them correctly, add a new variable for your Linux-release in the first _LINUX_APPS_ALL section, add the content of this variable to _LINUX_APPS_ALL, and change the check-logic as described above.
Final words
If you have something which installs and deinstalls correctly, feel free to provide it on freebsd-emulation@FreeBSD.org for review/testing. If you have questions during the porting, feel also free to send a mail there.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: descriptive explanation,
freebsd ports collection,
game servers,
headless game,
linux application,
linux base,
linux compatibility,
linux dist,
linux ports,
linux release —
FreeBSD is in need of a new linux_base port. It is on my TODO list since a long time, but I do not get the time to create one. I still do not have the time to work on a new one, but when you read this, I managed to get the time to create a HOWTO which describes what needs to be done to create a new linux_base port.
I will not describe how to create a new linux_base port from scratch, I will just describe how you can copy the last one and update it to something newer based upon the existing infrastructure for RPM packages.
Specific questions which come up during porting a new Linux release should be asked on freebsd-emulation@FreeBSD.org, there are more people which can answer questions than here in my blog. I will add useful information to this HOWTO if necessary.
In the easy case most of the work is searching the right RPMs and their dependencies to use, and to create the plist.
Why do we need a new linux_base port?
The current linux_base port is based upon Fedora 10, which is end of life since December 2009. Even Fedora 13 is already end of life. Fedora 16 is supposed to be released this year. From a support point of view, Fedora 15 or maybe even Fedora 16 would be a good target for the next linux_base port. Other alternatives would be to use an extended lifetime release of another RPM based distribution, like for example CentOS 6 (which seems to be based upon Fedora 12 with backports from Fedora 13 and 14). Using a Linux release which is told to be supported for at least 10 years, sounds nice from a FreeBSD point of view (only minor changes to the linux ports in such a case, instead of creating a complete new linux_base each N+2 releases like with Fedora), but it also means additional work if you want to create the first linux_base port for it.
The mysteries you have to conquer if you want to create a new linux_base port
What we do not know is, if Fedora 15/16, CentOS 6, or any other Linux release will work in a supported FreeBSD release. There are two ways to find this out.
The first one is to take an existing Linux system, chroot into it (either via NFS or after making a copy into a directory of a FreeBSD system), and to run a lot of programs (acroread, skype, shells, scripts, …). The LTP testsuite is not that much useful here, as it will test mostly kernel features, but we do not know which kernel features are mandatory for a given userland of a Linux release.
The second way of testing if a given Linux release works on FreeBSD is to actually create a new linux_base port for it and test it without chrooting.
The first way is faster, if you are only interested in testing if something works. The second way provides an easy to setup testbed for FreeBSD kernel developers to fix the Linuxulator so that it works with the new linux_base port. Both ways have their merits, but it is up to the person doing the work to decide which way to go.
The meat: HOWTO create a new linux_base port
First off, you need a system (or a jail) without any linux_base port installed. After that you can create a new linux_base port (= lbN), by just making a copy of the latest one (= lbO). In lbN you need to add lbO as a CONFLICT, and in all other existing linux_base ports, you need to add lbN as a conflict.
Change the PORTNAME, PORTVERSION, reset the PORTREVISION in lbN, and set LINUX_DIST_VER to the new Linux-release version in the lbN Makefile (this is used in PORTSDIR/Mk/bsd.linux-rpm.mk and PORTSDIR/Mk/bsd.linux-apps.mk).
If you do not stay with Fedora, there is some more work to do before you can have a look at chosing RPMs for installation. You need to have a look at PORTSDIR/Mk/bsd.linux-rpm.mk and add some cases for the new LINUX_DIST you want to use. Do not forget to set LINUX_DIST in the lbN Makefile to the name of the distribution you use. You also need to augment the LINUX_DIST_VER check in PORTSDIR/Mk/bsd.linux-rpm.mk with some LINUX_DIST conditionals. If you are lucky, the directory structure for downloads is similar to the Fedora structure, and there is not a lot to do here.
When this is done, you can have a look at the BIN_DISTFILES variable in the lbN Makefile. Try to find similar RPMs for the new Linux release you want to port. Some may not be available, and it may also be the case that different ones are needed instead. I suggest to first work with the ones which are available (make makesum, test install and create plist). After that you need to find out what the replacement RPMs for non-existing ones are. You are on your own here. Search around the net, and/or have a look at the dependencies in the RPMs of lbO to determine if something was added as a dependency of something else or not (if not, forget about it ATM). When you managed to find replacement RPMs, you can now have a look at the dependencies of the RPMs in lbN. Do not add blindly all dependencies, not all are needed in FreeBSD (the linux_base ports are not supposed to create an environment which you can chroot into, they are supposed to augment the FreeBSD system to be able to run Linux programs in ports like they where FreeBSD native programs). What you need in the linux_base ports are libraries, config and data files which do not exist in FreeBSD or have a different syntax than in FreeBSD (those config or data files which are just in a different place, can be symlinked), and basic shell commands (which commands are needed or not… well… good question, in the past we made decisions what to include based upon problem reports from users). Now for the things which are not available and where not added as a dependency. Those are things which are either used during install, or where useful to have in the past. Find out by what it was replaced and have a look if this replacement can easily be used instead. If it can be used, add it. If not, well… bad luck, we (the FreeBSD community) will see how to handle this somehow.
If you think that you have all you need in BIN_DISTFILES, please update SRC_DISTFILES accordingly and generate the distfile via make –DPACKAGE_BUILDING makesum to have the checksums of the sources (for legal reasons we need them on our mirrors).
The next step is to have a look at REMOVE_DIRS, REMOVE_FILES and ADD_DIRS if something needs to be modified. Most of them are there to fall back to the corresponding FreeBSD directories/files, or because they are not needed at all (REMOVE_*). Do not remove directories from ADD_DIRS, they are created here to fix some edge conditions (I do not remember exactly why we had to add them, and I do not take the time ATM to search in the CVS history).
If you are lucky, this is all (make sure the plist is correct). If you are not lucky and you need to make some modifications to files, have a look at the do-build target in the Makefile, this is the place where some changes are done to create a nice user experience.
If you arrive here while creating a new linux_base port, lean back and feel a bit proud. You managed to create a new linux_base port. It is not very well tested at this moment, and it is far from everything which needs to be done to have the complete Linux infrastructure for a given Linux release, but the most important part is done. Please notify freebsd-emulation@FreeBSD.org and call for testers.
What is missing?
The full Linuxulator infrastructure for the FreeBSD Ports Collection has some more ports around a linux_base port. Most of the infrastructure for this is handled in Mk/bsd.linux-apps.mk.
UPDATE: I got some time to write how to update the Linux-infrastructure ports.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: easy case,
fedora,
linux base,
linux ports,
linux release,
minor changes,
plist,
rpms,
suppor,
target —
Everyone has his own way of setting up a machine to serve as a host of multiple jails. Here is my way, YMMV.
Initial FreeBSD install
I use several harddisks in a Software–RAID setup. It does not matter much if you set them up with one big partition or with several partitions, feel free to follow your preferences here. My way of partitioning the harddisks is described in a previous post. That post only shows the commands to split the harddisks into two partitions and use ZFS for the rootfs. The commands to initialize the ZFS data partition are not described, but you should be able to figure it out yourself (and you can decide on your own what kind of RAID level you want to use). For this FS I set atime, exec and setuid to off in the ZFS options.
On the ZFS data partition I create a new dataset for the system. For this dataset I set atime, exec and setuid to off in the ZFS options. Inside this dataset I create datasets for /home, /usr/compat, /usr/local, /usr/obj, /usr/ports/, /usr/src, /usr/sup and /var/ports. There are two ways of doing this. One way is to set the ZFS mountpoint. The way I prefer is to set relative symlinks to it, e.g. “cd /usr; ln –s ../data/system/usr_obj obj”. I do this because this way I can temporary import the pool on another machine (e.g. my desktop, if the need arises) without fear to interfere with the system. The ZFS options are set as follows:
ZFS options for data/system/*
|
Dataset
|
Option
|
Value |
| data/system/home |
exec |
on |
| data/system/usr_compat |
exec |
on |
| data/system/usr_compat |
setuid |
on |
| data/system/usr_local |
exec |
on |
| data/system/usr_local |
setuid |
on |
| data/system/usr_obj |
exec |
on |
| data/system/usr_ports |
exec |
on |
| data/system/usr_ports |
setuid |
on |
| data/system/usr_src |
exec |
on |
| data/system/usr_sup |
secondarycache |
none |
| data/system/var_ports |
exec |
on |
The exec option for home is not necessary if you keep separate datasets for each user. Normally I keep separate datasets for home directories, but Jail-Hosts should not have users (except the admins, but they should not keep data in their homes), so I just create a single home dataset. The setuid option for the usr_ports should not be necessary if you redirect the build directory of the ports to a different place (WRKDIRPREFIX in /etc/make.conf).
Installing ports
The ports I install by default are net/rsync, ports-mgmt/portaudit, ports-mgmt/portmaster, shells/zsh, sysutils/bsdstats, sysutils/ezjail, sysutils/smartmontools and sysutils/tmux.
Basic setup
In the crontab of root I setup a job to do a portsnap update once a day (I pick a random number between 0 and 59 for the minute, but keep a fixed hour). I also have http_proxy specified in /etc/profile, so that all machines in this network do not download everything from far away again and again, but can get the data from the local caching proxy. As a little watchdog I have a little @reboot rule in the crontab, which notifies me when a machine reboots:
@reboot grep "kernel boot file is" /var/log/messages | mail -s "`hostname` rebooted" root >/dev/null 2>&1
This does not replace a real monitoring solution, but in cases where real monitoring is overkill it provides a nice HEADS-UP (and shows you directly which kernel is loaded in case a non-default one is used).
Some default aliases I use everywhere are:
alias portmlist="portmaster -L | egrep -B1 '(ew|ort) version|Aborting|installed|dependencies|IGNORE|marked|Reason:|MOVED|deleted|exist|update' | grep -v '^--'"
alias portmclean="portmaster -t --clean-distfiles --clean-packages"
alias portmcheck="portmaster -y --check-depends"
Additional devfs rules for Jails
I have the need to give access to some specific devices in some jails. For this I need to setup a custom /etc/devfs.rules file. The files contains some ID numbers which need to be unique in the system. On a 9–current system the numbers one to four are already used (see /etc/defaults/devfs.rules). The next available number is obviously five then. First I present my devfs.rules entries, then I explain them:
[devfsrules_unhide_audio=5]
add path 'audio*' unhide
add path 'dsp*' unhide
add path midistat unhide
add path 'mixer*' unhide
add path 'music*' unhide
add path 'sequencer*' unhide
add path sndstat unhide
add path speaker unhide
[devfsrules_unhide_printers=6]
add path 'lpt*' unhide
add path 'ulpt*' unhide user 193 group 193
add path 'unlpt*' unhide user 193 group 193
[devfsrules_unhide_zfs=7]
add path zfs unhide
[devfsrules_jail_printserver=8]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add include $devfsrules_unhide_printers
add include $devfsrules_unhide_zfs
[devfsrules_jail_withzfs=9]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add include $devfsrules_unhide_zfs
The devfs_rules_unhide_XXX ones give access to specific devices, e.g. all the sound related devices or to local printers. The devfsrules_jail_XXX ones combine all the unhide rules for specific jail setups. Unfortunately the include directive is not recursive, so that we can not include the default devfsrules_jail profile and need to replicate its contents. The first three includes of each devfsrules_jail_XXX accomplish this. The unhide_zfs rule gives access to /dev/zfs, which is needed if you attach one or more ZFS datasets to a jail. I will explain how to use those profiles with ezjail in a follow-up post.
Jails setup
I use ezjail to manage jails, it is more comfortable than doing it by hand while at the same time allows me to do something by hand. My jails normally reside inside ZFS datasets, for this reason I have setup a special area (ZFS dataset data/jails) which is handled by ezjail.The corresponding ezjail.conf settings are:
ezjail_jaildir=/data/jails
ezjail_use_zfs="YES"
ezjail_jailzfs="data/jails"
I also disabled procfs and fdescfs in jails (but they can be enabled later for specific jails if necessary).
Unfortunately ezjail (as of v3.1) sets the mountpoint of a newly created dataset even if it is not necessary. For this reason I always issue a “zfs inherit mountpoint ” after creating a jail. This simplifies the case where you want to move/rename a dataset and want to have the mountpoint automcatically follow the change.
The access flags of /data/jails directory are 700, this prevents local users (there should be none, but better safe than sorry) to get access to files from users in jails with the same UID.
After the first create/update of the ezjail basejail the ZFS options of basejail (data/jails/basejail) and newjail (data/jails/newjail) need to be changed. For both exec and setuid should be changed to “on” The same needs to be done after creating a new jail for the new jail (before starting it).
The default ezjail flavour
In my default ezjail flavour I create some default user(s) with a basesystem-shell (via /data/jails/flavours/mydef/ezjail.flavour) before the package install, and change the shell to my preferred zsh afterwards (this is only valid if the jails are used only by in-house people, if you want to offer lightweight virtual machines to (unknown) customers, the default user(s) and shell(s) are obviously up to discussion). At the end I also run a “/usr/local/sbin/portmaster –y –check-depends” to make sure everything is in a sane state.
For the packages (/data/jails/flavours/mydef/pkg/) I add symlinks to the unversioned packages I want to install. I have the packages in a common (think about setting PACKAGES in make.conf and using PACKAGES/Latest/XYZ.tbz) directory (if they can be shared over various flavours), and they are unversioned so that I do not have to update the version number each time there is an update. The packages I install by default are bsdstats, portaudit, portmaster, zsh, tmux and all their dependencies.
In case you use jails to virtualize services and consolidate servers (e.g. DNS, HTTP, MySQL each in a separate jail) instead of providing lightweight virtual machines to (unknown) customers, there is also a benefit of sharing the distfiles and packages between jails on the same machine. To do this I create /data/jails/flavours/mydef/shared/ports/{distfiles,packages} which are then mounted via nullfs or NFS into all the jails from a common directory. This requires the following variables in /data/jails/flavours/mydef/etc/make.conf (I also keep the packages for different CPU types and compilers in the same subtree, if you do not care, just remove the “/${CC}/${CPUTYPE}” from the PACAKGES line):
DISTDIR= /shared/ports/distfiles
PACKAGES= /shared/ports/packages/${CC}/${CPUTYPE}
New jails
A future post will cover how I setup new jails in such a setup and how I customize the start order of jails or use some non–default settings for the jail-startup.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: dataset,
harddisks,
jails,
option value,
raid level,
setuid,
software raid,
symlinks,
temporary import,
zfs —
After 9 years with my current home-server (one jail for each service like MySQL, Squid, IMAP, Webmail, …) I decided that it is time to get something more recent (specially as I want to install some more jails but can not add more memory to this i386 system).
With my old system I had an UFS2-root on a 3-way-gmirror, swap on a 2-way-gmirror and my data in a 3-partition raidz (all in different slices of the same 3 harddisks, the 3rd slice which would correspond to the swap was used as a crashdump area).
For the new system I wanted to go all-ZFS, but I like to have my boot area separated from my data area (two pools instead of one big pool). As the machine has 12 GB RAM I also do not configure swap areas (at least by default, if I really need some swap I can add some later, see below). The system has five 1 TB harddisks and a 60 GB SSD. The harddisks do not have 4k-sectors, but I expect that there will be more and more 4k-sector drives in the future. As I prefer to plan ahead I installed the ZFS pools in a way that they are “4k-ready”. For those which have 4k-sector drives which do not tell the truth but announce they have 512 byte sectors (I will call them pseudo-4k-sector drives here) I include a description how to properly align the (GPT-)partitions.
A major requirement to boot 4k-sector-size ZFS pools is ZFS v28 (to be correct here, just the boot-code needs to support this, so if you take the pmbr and gptzfsboot from a ZFS v28 system, this should work… but I have not tested this). As I am running 9-current, this is not an issue for me.
A quick description of the task is to align the partition/slices properly for pseudo-4k-sector drives, and then use gnop temporary during pool creation time to have ZFS use 4k-sectors during the lifetime of the pool. The long description follows.
The layout of the drives
The five equal drives are partitioned with a GUID partition table (GPT). Each drive is divided into three partitions, one for the boot code, one for the root pool, and one for the data pool. The root pool is a 3-way mirror and the data pool is a raidz2 pool over all 5 disks. The remaining space on the two harddisks which do not take part in the mirroring of the root pool get swap partitions of the same size as the root partitions. One of them is used as a dumpdevice (this is –current, after all), and the other one stays unused as a cold-standby. The 60 GB SSD will be used as a ZFS cache device, but as of this writing I have not decided yet if I will use it for both pools or only for the data pool.
Calculating the offsets
The first sector after the GPT (created with standard settings) which can be used as the first sector for a partition is sector 34 on a 512 bytes-per-sector drive. On a pseudo-4k-sector drive this would be somewhere in the sector 4 of a real 4k-sector, so this is not a good starting point. The next 4k-aligned sector on a pseudo-4k-sector drive is sector 40 (sector 5 on a real 4k-sector drive).
The first partition is the partition for the FreeBSD boot code. It needs to have enough space for gptzfsboot. Only allocating the space needed for gptzfsboot looks a little bit dangerous regarding future updates, so my harddisks are configured to allocate half a megabyte for it. Additionally I leave some unused sectors as a safety margin after this first partition.
The second partition is the root pool (respectively the swap partitions). I let it start at sector 2048, which would be sector 256 on a real 4k-sector drive (if you do not want to waste less than half a megabyte just calculate a lower start sector which is divisible by 8 (-> start % 8 = 0)). It is a 4 GB partition, this is enough for the basesystem with some debug kernels. Everything else (/usr/{src,ports,obj,local}) will be in the data partition.
The last partition is directly after the second and uses the rest of the harddisk rounded down to a full GB (if the disk needs to be replaced with a similar sized disk there is some safety margin left, as the number of sectors in harddisks fluctuates a little bit even in the same models from the same manufacturing charge). For my harddisks this means a little bit more than half a gigabyte of wasted storage space.
The commands to partition the disks
In the following I use ada0 as the device of the disk, but it also works with daX or adX or similar. I installed one disk from an existing 9–current system instead of using some kind of installation media (beware, the pool is linked to the system which creates it, I booted a life-USB image to import it on the new system and copied the zpool.cache to /boot/zfs/ after importing on the new system).
Create the GPT:
gpart create -s gpt ada0
Create the boot partition:
gpart add -b 40 -s 1024 -t freebsd-boot ada0
Create the root/swap partitions and name them with a GPT label:
gpart add -b 2048 -s 4G -t freebsd-zfs -l rpool0 ada0
or for the swap
gpart add -b 2048 -s 4G -t freebsd-swap -l swap0 ada0
Create the data partition and name them with a GPT label:
gpart add -s 927G -t freebsd-zfs -l data0 ada0
Install the boot code in partition 1:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
The result looks like this:
# gpart show ada0
=> 34 1953525101 ada0 GPT (931G)
34 6 - free - (3.0k)
40 1024 1 freebsd-boot (512k)
1064 984 - free - (492k)
2048 8388608 2 freebsd-zfs (4.0G)
8390656 1944059904 3 freebsd-zfs (927G)
1952450560 1074575 - free - (524M)
Create the pools with 4k-ready internal structures
Creating a ZFS pool on one of the ZFS partitions without preparation will not create a 4k-ready pool on a pseudo-4k-drive. I used gnop (the settings do not survive a reboot) to make the partition temporary a 4k-sector partition (only the command for the root pool is shown, for the data partition gnop has to be used in the same way):
gnop create -S 4096 ada0p2
zpool create -O utf8only=on -o failmode=panic rpool ada0p2.nop
zpool export rpool
gnop destroy ada0p2.nop
zpool import rpool
After the pool is created, it will keep the 4k-sectors setting, even when accessed without gnop. You can ignore the options I used to create the pool, they are just my preferences (and the utf8only setting can only be done at pool creation time). If you prepare this on a system which already has a zpool on its own, you can maybe specify “-o cachefile=/boot/zfs/zpool2.cache” and copy it to the new pool as zpool.cache to make it bootable without the need of a life-image for the new system (I did not test this).
Verifying if a pool is pseudo-4k-ready
To verify that the pool will use 4k-sectors, you can have a look at the ashift values of the pool (the ashift is per vdev, so if you e.g. concattenate several mirrors, the ashift needs to be verified for each mirror, and if you concattenate just a bunch of disks, the ashift needs to be verified for all disks). It needs to be 12. To get the ashift value you can use zdb:
zdb rpool | grep ashift
Setting up the root pool
One of the benefits of root-on-zfs is that I can have multiple FreeBSD boot environments (BE). This means that I not only can have several different kernels, but also several different userland versions. To handle them comfortably, I use manageBE from Philipp Wuensche. This requires a specific setup of the root pool:
zfs create rpool/ROOT
zfs create rpool/ROOT/r220832M
zpool set bootfs=rpool/ROOT/r220832M rpool
zfs set freebsd:boot-environment=1 rpool/ROOT/r220832M # manageBE setting
The r220832M is my initial BE. I use the SVN revision of the source tree which was used during install of this BE as the name of the BE here. You also need to add the following line to /boot/loader.conf:
vfs.root.mountfrom="zfs:rpool/ROOT/r220832M"
As I want to have a shared /var and /tmp for all my BEs, I create them separately:
zfs create -o exec=off -o setuid=off -o mountpoint=/rpool/ROOT/r220832M/var rpool/var
zfs create -o setuid=off -o mountpoint=/rpool/ROOT/r220832M/tmp rpool/tmp
As I did this on the old system, I did not set the mountpoints to /var and /tmp, but this has to be done later.
Now the userland can be installed (e.g. buildworld/installworld/buildkernel/buildkernel/mergemaster with DESTDIR=/rpool/ROOT/r220832M/, do not forget to put a good master.passwd/passwd/group in the root pool).
When the root pool is ready make sure an empty /etc/fstab is inside, and configure the root as follows (only showing what is necessary for root-on-zfs):
loader.conf:
---snip---
vfs.root.mountfrom="zfs:rpool/ROOT/r220832M"
zfs_load="yes"
opensolaris_load="yes"
---snip---
rc.conf
---snip---
zfs_enable="YES"
---snip---
At this point of the setup I unmounted all zfs on rpool, set the mountpoint of rpool/var to /var and of rpool/tmp to /tmp, exported the pool and installed the harddisk in the new system. After booting a life-USB-image, importing the pool, putting the resulting zpool.cache into the pool (rpool/ROOT/r220832M/boot/zfs/), I rebooted into the rpool and attached the other harddisks to the pool (“zpool attach rpool ada0p2 ada1p2”, “zpool attach rpool ada0p2 ada2p2”):
After updating to a more recent version of 9-current, the BE looks like this now:
# ./bin/manageBE list
Poolname: rpool
BE Active Active Mountpoint Space
Name Now Reboot - Used
---- ------ ------ ---------- -----
r221295M yes yes / 2.66G
cannot open '-': dataset does not exist
r221295M@r221295M no no -
r220832M no no /rpool/ROOT/r220832M 561M
Used by BE snapshots: 561M
The little bug above (the error message which is probably caused by the snapshot which shows up here probably because I use listsnapshots=on) is already reported to the author of manageBE.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: boot code,
byte sectors,
gpt,
harddisks,
home server,
imap webmail,
partition table,
pmbr,
sector size,
v28 —
At work we have a Solaris 8 with a UFS which told the application that it can not create new files. The df command showed plenty if free inodes, and there was also enough space free in the FS. The reason that the application got the error was that while there where still plenty of fragments free, no free block was available anymore. You can not create a new file only with fragments, you need to have at least one free block for each new file.
To see the number of free blocks of a UFS you can call “fstyp –v | head –18″ and look at the value behind “nbfree”.
To get this working again we cleaned up the FS a little bit (compressing/deleting log files), but this is only a temporary solution. Unluckily we can not move this application to a Solaris 10 with ZFS, so I was playing around a little bit to see what we can do.
First I made a histogram of the file sizes. The backup of the FS I was playing with had a little bit more than 4 million files in this FS. 28.5% of them where smaller than or equal 512 bytes, 31.7% where smaller than or equal 1k (fragment size), 36% smaller than or equal 8k (block size) and 74% smaller than or equal 16k. The following graph shows in red the critical part, files which need a block and produce fragments, but can not life with only fragments.
Then I played around with newfs options for this one specific FS with this specific data mix. Changing the number of inodes did not change much the outcome for our problem (as expected). Changing the optimization from “time” to “space” (and restoring all the data from backup into the empty FS) gave us 1000 more free blocks. On a FS which had 10 Mio free blocks when empty this is not much, but we expect that the restore consumes less fragments and more full blocks than the live-FS of the application (we can not compare, as the content of the live-FS changed a lot since we had the problem). We assume that e.g. the logs of the application are split over a lot of fragments instead of full blocks, due to small writes to the logs by the application. The restore should write all the data in big chunks, so our expectation is that the FS will use more full blocks and less fragments. Because of this we expect that the live-FS with this specific data mix could benefit from changing the optimization.
I also played around with the fragment size. The expectation was that it will only change what is reported in the output of df (reducing the reported available space for the same amount of data). Here is the result:
The difference between 1k (default) and 2k is not much. For 8k we would have to much unused space lost. The fragment size of 4k looks like it is acceptable to get a better monitoring status of this particular data mix.
Based upon this we will probably create a new FS with a fragment size of 4k and we will probably switch the optimization directly to “space”. This way we will have a better reporting on the fill level of the FS for our data mix (but we will not be able to fully use the real space of the FS) and as such our monitoring should alert us in time to do a cleanup of the FS or to increase the size of the FS.
GD Star Rating
loading…
GD Star Rating
loading…
Tags: df command,
fragment size,
free blocks,
free inodes,
histogram,
million files,
solaris 8,
temporary solution,
time to space,
zfs —