The boot disks of some Solaris LDOMs were migrated from one storage system to another one via ZFS mirroring the rpool to the new system and detaching the old LUN.
After a reboot on the new storage system, Solaris 10 and 11(.3) panic at boot.
- rpool not on slice 0 but on slice 2
- bug in Solaris when doing such a mirror and “just” rebooting <- this is the real issue: it seems Solaris cannot handle a change of the name of the underlying device for an rpool, as just moving the partitioning to slice 0 does not fix the panic.
# boot from network (or an alternate pool which was not yet moved), import/export the pools, boot from the pools
boot net -
# go to shell
# if needed: change the partitioning so that slice 0 has the same values as slice 2 (respectively make sure the rpool is in slice 0)
zpool import -R /tmp/yyy rpool
zpool export rpool
A colleague noticed that on a Solaris 11 system a Solaris 10 branded zone “gains” two new daemons which are running with UID 16 and 17. Those users are not automatically added to /etc/passwd, /etc/shadow (and /etc/group)… at least not when the zones are imported from an existing Solaris 10 zone.
I added the two users (netadm, netcfg) and the group (netadm) to the Solaris 10 branded zones by hand (copy&paste of the lines in /etc/passwd, /etc/shadow, /etc/group + run pwconv) for our few Solaris 10 branded zones on Solaris 11.
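A minimal sketch of the manual fix, assuming the entries look like the ones below. Only the user names, the group name and the UIDs 16 and 17 come from what we saw. The GID 65 and the comment fields are assumptions, so copy the real lines from the Solaris 11 global zone. `ETC` is a hypothetical scratch directory so the sketch can be tried safely; on a real zone it would be the zone's /etc.

```shell
# Append a line to a file only if no entry with the same name (first field) exists yet.
add_entry() {
    file=$1
    line=$2
    name=${line%%:*}
    grep -q "^${name}:" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Hypothetical scratch directory; point ETC at the zone's /etc on a real system.
ETC="${ETC:-./etc.sketch}"
mkdir -p "$ETC"
touch "$ETC/passwd" "$ETC/group"

# Assumed entries (GID 65 and the comment fields are not confirmed by this post).
add_entry "$ETC/group"  'netadm::65:'
add_entry "$ETC/passwd" 'netadm:x:16:65:Network Admin:/:'
add_entry "$ETC/passwd" 'netcfg:x:17:65:Network Configuration Admin:/:'

# On the real zone, run pwconv afterwards so /etc/shadow gets matching entries.
```

Running it a second time changes nothing, which makes it safe to loop over several zones.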
Some weeks ago we installed critical patch updates (CPU) on a Solaris 10 system (internal system, a year's worth of CPUs to install; nothing in them affected us or was considered a security risk, but we decided to apply them regardless so as not to fall behind too much). Afterwards we noticed that two zones were doing a lot of DNS requests. We had already noticed this before the zones went into production, and had configured a positive time to live in nscd.conf for “hosts”. Additionally we noticed a lot of DNS requests for IPv6 addresses (AAAA lookups), while absolutely no IPv6 address is configured in the zones (not even for localhost… and those are exclusive IP zones). Apparently the caching behaviour changed with one of the patches in the CPU; I am not sure if we had the AAAA lookups before.
Today I got some time to debug this. After adding caching of “ipnodes” in addition to “hosts” (and I configured a negative time to live for both at the same time), the DNS requests came down to a sane amount.
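For illustration, the relevant nscd.conf entries could look like the sketch below. The TTL values are placeholders and not the ones we use. `NSCD_CONF` is a hypothetical scratch file so nothing is overwritten; on the real system the lines belong in /etc/nscd.conf.

```shell
# Hypothetical scratch file; merge the lines into /etc/nscd.conf on the real system.
NSCD_CONF="${NSCD_CONF:-./nscd.conf.sketch}"

# TTLs are in seconds; the values are placeholders.
cat > "$NSCD_CONF" <<'EOF'
enable-cache            hosts    yes
positive-time-to-live   hosts    3600
negative-time-to-live   hosts    300
enable-cache            ipnodes  yes
positive-time-to-live   ipnodes  3600
negative-time-to-live   ipnodes  300
EOF

# Afterwards nscd has to re-read its config, on Solaris 10 e.g. with:
#   svcadm restart name-service-cache
```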
For the AAAA lookups I have not found a solution. By my reading of the documentation I would assume there are no IPv6 DNS lookups if no IPv6 address is configured.
After hours (spread over weeks) I have come to the conclusion that there is a lot of potential to improve the documentation of card readers (but I doubt the card reader vendors will do it) and of pcsc. It is not easy to arrive at a point where you understand everything. The compatibility list does not help much, as the card readers are partly past their end of life and the models which replace them are not listed. And the one I bought does not support all the features I need. I even ported the driver to FreeBSD (not committed, I wanted to test everything first) and a lot of stuff works, but one critical problem is that I cannot store a certificate on the crypto card, as the card reader or the driver does not support extended APDUs (needed to transfer more than 255 bytes to the card reader).
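To illustrate where the 255 byte limit comes from: as far as I understand ISO 7816-4, a short APDU encodes the length of the command data (the Lc field) in a single byte, while an extended APDU uses a 00 byte followed by a two byte length. A small sketch of that encoding (the function name `lc_field` is made up for this example):

```shell
# Print the Lc field (hex) for a command APDU carrying n bytes of data.
# Short form: one byte (1..255). Extended form: 00 + two bytes, big-endian (256..65535).
lc_field() {
    n=$1
    if [ "$n" -ge 1 ] && [ "$n" -le 255 ]; then
        printf '%02x\n' "$n"
    elif [ "$n" -ge 256 ] && [ "$n" -le 65535 ]; then
        printf '00%04x\n' "$n"
    else
        echo "data length out of range: $n" >&2
        return 1
    fi
}

lc_field 255     # still fits into a short APDU
lc_field 2048    # needs an extended APDU
```

A certificate is easily larger than 255 bytes, so without extended APDU support in the reader and the driver it has to fail.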
Well, the status so far:
- I have a HOWTO what to install to use crypto cards in FreeBSD
- I have a HOWTO what to install / configure in Windows
- I have a HOWTO regarding creating keys on an OpenPGP v2 card and how to use this key with ssh on FreeBSD (or any other unix-like OS which can run pcsc)
- I have a card reader which does not support extended APDUs
- I want to make sure what I write in the HOWTOs is also suitable for the use with Windows / PuTTY
- it seems Windows needs a certificate and not only a key when using the Windows CAPI (using the vendor supplied card reader driver) in PuTTY-CSC (works at work with a USB token)
- the pcsc pkcs11 Windows DLL is not suitable yet for use on Windows 8 64bit
- I contacted the card reader vendor if the card reader or the driver is the problem regarding the extended APDUs
- I found problems in gpg4win / pcsc on Windows 8
- I have sent some money to the developers of gpg4win to support their work (if you use gnupg on Windows, try to send a few units of money to them, the work stagnated as they need to spend their time on paid work)
So either I need a new card reader, or have to wait for an update of the linux driver of the vendor… which probably means it may be a lot faster to buy a new card reader. When looking for one with at least a PIN pad, I either do not find anything which is listed as supported by pcsc on the vendor pages (it is incredible how hard it is to navigate the websites of some companies… a lot of buzzwords but no way to get to the real products), or they only list updated models where I do not know if they will work.
When I have something which works with FreeBSD and Windows, I will publish all the HOWTOs here at once.
The problem I see at work: A T4-2 with 3 guest LDOMs, virtualized disks and networks lost the complete network connectivity “out of the blue” once, and maybe “sporadic” directly after a cold boot. After a lot of discussion with Oracle, I have the impression that we have two problems here.
Total network loss of the machine (no zone, guest LDOM or the primary LDOM was able to receive or send IP packets). This happened once. No idea how to reproduce it. In the logs we see the message “[ID 920994 kern.warning] WARNING: vnetX: exceeded number of permitted handshake attempts (5) on channel xxx”. According to Oracle this is supposed to be fixed in patch 148677-01, which will come with Solaris 10u11. They suggested using a vsw interface instead of a vnet interface on the primary domain to at least lower the probability of this problem hitting us. They were not able to tell us how to reproduce the problem (it seems to be a race condition, at least I get this impression based upon the description of the Oracle engineer handling the SR). Only a reboot solved the problem. I was told we are the only client which reported this kind of problem; the patch for it is based upon an internal bug report from internal tests.
After cold boots sometimes some machines (not all) are not able to connect to an IP on the T4. A reboot helps, as does removing an interface from an aggregate and directly adding it again (see below for the system config). To try to reproduce the problem, we did a lot of warm reboots of the primary domain, and the problem never showed up. We did some cold reboots, and the problem showed up once.
In case someone else sees one of those problems on his machines too, please get in contact with me to see what we have in common to try to track this down further and to share info which may help in maybe reproducing the problems.
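A quick way to check whether a machine logged the handshake warning is sketched below. I assume the messages end up in /var/adm/messages, the usual Solaris location; pass another file as the first argument if your syslog config differs. The function name is made up for this example.

```shell
# Count the vnet handshake warnings per vnet interface in a syslog file.
count_handshake_warnings() {
    log=${1:-/var/adm/messages}
    grep 'exceeded number of permitted handshake attempts' "$log" |
        sed -n 's/.*WARNING: \(vnet[0-9]*\):.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage: count_handshake_warnings            (default log file)
#        count_handshake_warnings /some/other/logfile
```

It prints one line per affected interface with the number of warnings, and nothing if the message never showed up.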
- T4-2 with 4 HBAs and 8 NICs (4 * igb on-board, 4 * nxge on additional network card)
- 3 guest LDOMs and one io+control domain (both in the primary domain)
- the guest LDOMs use SAN disks over the 4 HBAs
- the primary domain uses a mirrored zpool on SSDs
- 5 vswitch in the hypervisor
- 4 aggregates (aggr1 – aggr4 with L2-policy), each one with one igb and one nxge NIC
- each aggregate is connected to a separate vswitch (the 5th vswitch is for machine-internal communication)
- each guest LDOM has three vnets, each vnet connected to a vswitch (1 guest LDOM has aggr1+2 only for zones (via vnets), 2 guest LDOMs have aggr 3+4 only for zones (via vnets), and all LDOMs have aggr2+3 (via vnets) for global-zone communication, all LDOMs are additionally connected to the machine-internal-only vswitch via the 3rd vnet)
- primary domain uses 2 vnets connected to the vswitch which is connected to aggr2 and aggr3 (consistency with the other LDOMs on this machine) and has no zones
- this means each entity (primary domain, guest LDOMs and each zone) has two vnets, and those two vnets are configured in a link-based IPMP setup (vnet-linkprop=phys-state)
- each vnet has VLAN tagging configured in the hypervisor (with the zones being in different VLANs than the LDOMs)
The proposed change by Oracle is to replace the 2 vnet interfaces in the primary domain with 2 vsw interfaces (which means doing the VLAN tagging in the primary domain directly instead of in the vnet config). To have IPMP working, this requires vsw-linkprop=phys-state. We have two systems with the same setup; on one system we already changed this and it is working as before. As we don’t know how to reproduce the 1st problem, we don’t know if the problem is fixed or not, or what the probability is of getting hit again by this problem.
Ideas / suggestions / info welcome.
The recent security incident triggered a discussion about how to secure ssh/gpg keys.
One way I want to focus on here (because it is the way I want to use at home), is to store the keys on a crypto card. I did some research for suitable crypto cards and found one which is called Feitian PKI Smartcard, and one which is called OpenPGP card. The OpenPGP card also exists in a USB version (basically a small version of the card is already integrated into a small USB card reader).
The Feitian card is reported to be able to handle RSA keys up to 2048 bits. It does not seem to handle DSA (or ECDSA) keys. The smartcard quick starter guide they have (the Tuning smartcard file system part) explains how to change the parameters of the card to store up to 9 keys on it.
The spec of the OpenPGP card says that it supports RSA keys up to 3072 bits, but there are reports that it is able to handle RSA keys up to 4096 bits (you need at least GPG 2.0.18 to handle keys that big on the crypto card). It looks to me like the card does not handle DSA (or ECDSA) keys. There are only slots for up to 3 keys on it.
If I go this way, I would also need a card reader. It seems a class 3 one (hardware PIN pad and display) would be the most “future-proof” way to go ahead. I found a Reiner SCT cyberJack secoder card reader, which is believed to be supported by OpenSC and seems to be a good balance between cost and features of the Reiner SCT card readers.
If anyone reading this can suggest a better crypto card (keys up to 4096 bits, more than 3 slots, and/or DSA/ECDSA support), or a better card reader, or has any practical experience with any of those components on FreeBSD, please add a comment.
Last week I had a look whether there is any news about an official update of the Galaxy Tab 10.1 to ICS. To my surprise there is one, at least in Italy. The one I found to download was marked more or less for the European market. Well… that was good enough for me, and I spent the night from Friday to Saturday updating the Tab by hand (unfortunately this includes a factory reset, no smooth migration from an old version, but at least I still have root access).
What I noticed so far:
- OpenGL ES speed improved from 4.2 to 6.6 FPS.
- I had some lock-ups so far, I do not know if this may be related to some restored data (app data and e.g. Bluetooth/WLAN config restored with TitaniumBackup) or to bugs (Dalvik cache and cache partition were clean, a factory reset was done too prior to restoring from the backup). I had to press the power button for some seconds to initiate a reboot. Most of the time it helped to wait a minute before entering the PIN for the SIM. One time it did not help at all, the only way to get it working was to take my WLAN Access Point (AP) offline, start the Tab, enter the PIN, and restart the AP. At that point I had GPS and WLAN activated in the Tab, in the lock-ups before I did not have GPS active. I had something similar with my Nexus S when it got ICS, somehow this resolved itself. Update 2012-08-14: I googled a bit, there was a bug in ICS 4.0.3 related to WLAN, but I have 4.0.4 on the Tab, so this may not be it. I also got the freeze without WLAN but with the mobile data connection active. 2nd update 2012-08-14: If I disable account syncing with the mobile data connection it does not freeze. I have not yet tried this with the WLAN connection. Update 2012-08-16: The synchronization of the calendar data caused the problem. Deleting all data for any app with calendar in the name and re-syncing fixed the problem. No freeze since I did this yesterday.
- When I open/close a folder (much missed feature in Android 3.x), the Tab speaks with me (something like “Folder XXX opened” in the configured language… that is a bit annoying).
- I like the default background image.
- Update 2012-08-14: The battery icon does stay green even when the battery is nearly empty. 🙁
I was not able to test the Email app yet, I am waiting for a warranty replacement of the PSU of my server at home (Murphy’s law: your PSU will break when you have just started a big renovation of your kitchen and do not have time to take care of it, and when you get time, a lot of the people at the PSU manufacturer who take care of warranty replacements are on holiday).
I also need to check the mobile data connectivity (quality and speed), but I would expect that it is not worse than before. Update 2012-08-14: The download speed test shows results similar to before, the upload speed test is slower, but this may be the mobile network here where I tested. At least I can confirm that it works, modulo the problem of the freezes described above.