Sol­ar­is 10/11(.3) boot panic/​crash after mov­ing rpool to a new stor­age sys­tem

Situ­ation

The boot disks of some Solaris LDOMs were migrated from one storage system to another by mirroring the rpool with ZFS onto the new system and then detaching the old LUN.

Is­sue

After the reboot, with the rpool now on the new storage system, Solaris 10 and 11(.3) panic at boot.

Cause

  • rpool not on slice 0 but on slice 2
  • a bug in Solaris when doing such a mirror and “just” doing a reboot <- this is the real issue. It seems Solaris cannot handle a change of the name of the underlying device of an rpool, as just moving the partitioning to slice 0 does not fix the panic.

Fix

# boot from network (or from an alternate pool which was not yet moved), import/export the pools, boot from the pools
boot net -
# go to shell
# if needed: change the partitioning so that slice 0 has the same values as slice 2 (i.e. make sure the rpool is in slice 0)
zpool import -R /tmp/yyy rpool
zpool export rpool
reboot
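
To see beforehand whether the "rpool not on slice 0" case applies, the device list of the pool can be checked. The following is only a minimal sketch: the function name is made up for illustration, it assumes device names of the usual cXtYdZsN / cXdZsN form, and it needs a POSIX awk (nawk on Solaris 10).

```sh
# List the devices of a pool that do not live on slice 0.
# Reads `zpool status <pool>` output on stdin and prints
# "<device> <slice>" for every device found on another slice.
# (non_s0_devices is a hypothetical helper name, not a Solaris tool.)
non_s0_devices() {
    awk '$1 ~ /^c[0-9]+(t[0-9A-Fa-f]+)?d[0-9]+s[0-9]+$/ {
        slice = $1
        sub(/^.*s/, "", slice)      # keep only the digits after the last "s"
        if (slice != "0") print $1, slice
    }'
}
```

Usage would be `zpool status rpool | non_s0_devices`; no output means every device of the pool is on slice 0.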

 

New users in Sol­ar­is 10 branded zones on Sol­ar­is 11 not handled auto­mat­ic­ally

A colleague noticed that on a Solaris 11 system a Solaris 10 branded zone “gains” two new daemons which run with UID 16 and 17. The corresponding users are not automatically added to /etc/passwd and /etc/shadow (nor is their group added to /etc/group)… at least not when the zones are imported from an existing Solaris 10 zone.

For our few Solaris 10 branded zones on Solaris 11 I added the two users (netadm, netcfg) and the group (netadm) by hand (copy & paste of the lines in /etc/passwd, /etc/shadow and /etc/group, plus a pwconv run).
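
The copy & paste step can be prepared with a small helper that prints the lines which are missing in the zone. This is a sketch only: the function name and the way the files are passed in are assumptions, and it only prints the lines, it does not modify any file.

```sh
# Print the entries of a reference file (e.g. the global zone's /etc/passwd)
# for the given names that are absent from a target file (the zone's copy).
# usage: missing_entries <reference-file> <target-file> <name>...
# (missing_entries is a hypothetical helper name.)
missing_entries() {
    ref=$1; target=$2; shift 2
    for name in "$@"; do
        # an entry exists if some line starts with "<name>:"
        if ! grep "^${name}:" "$target" >/dev/null; then
            grep "^${name}:" "$ref"
        fi
    done
}
```

Called once per file (passwd with `netadm netcfg`, group with `netadm`), the output can be reviewed and appended to the zone's files before running pwconv inside the zone.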

In­crease of DNS re­quests after a crit­ic­al patch up­date of Sol­ar­is 10

Some weeks ago we installed critical patch updates (CPU) on a Solaris 10 system (an internal system with a year of CPUs to install; nothing in them affected us or was considered a security risk, but we decided to apply them regardless to not fall behind too much). Afterwards we noticed that two zones were doing a lot of DNS requests. We had noticed this already before the zones went into production, and back then we configured a positive time to live in nscd.conf for “hosts”. Additionally we noticed a lot of DNS requests for IPv6 addresses (AAAA lookups), while absolutely no IPv6 address is configured in the zones (not even for localhost… and those are exclusive IP zones). Apparently the caching behaviour changed with one of the patches in the CPU; I am not sure if we had the AAAA lookups before.

Today I got some time to debug this. After adding caching of “ipnodes” in addition to “hosts” (and configuring a negative time to live for both at the same time), the DNS requests came down to a sane amount.
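
To check which of those time-to-live settings are present on a given zone, the nscd.conf can be scanned for the corresponding directives. A minimal sketch, assuming the usual "attribute cachename value" line format of nscd.conf; the function name is made up, and on Solaris 10 it needs nawk or the XPG4 awk because of the -v option.

```sh
# Report which time-to-live directives an nscd.conf sets for the given caches.
# usage: nscd_ttl_report <nscd.conf> <cache>...
# (nscd_ttl_report is a hypothetical helper name.)
nscd_ttl_report() {
    conf=$1; shift
    for cache in "$@"; do
        for kind in positive negative; do
            # directive format: <attribute> <cache name> <value>;
            # commented-out lines do not match because $1 then starts with "#"
            value=$(awk -v attr="${kind}-time-to-live" -v cache="$cache" \
                '$1 == attr && $2 == cache { v = $3 } END { print v }' "$conf")
            echo "$cache $kind ${value:-unset}"
        done
    done
}
```

Running `nscd_ttl_report /etc/nscd.conf hosts ipnodes` prints one line per cache and TTL kind, with "unset" where no directive was found.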

For the AAAA lookups I have not found a solution. By my reading of the documentation I would assume that there are no IPv6 DNS lookups if no IPv6 address is configured.

Status crypto cards HOWTO: prob­lems with the card read­er (sup­port could be bet­ter)

After hours (spread over weeks) I have come to the conclusion that there is a lot of potential to improve the documentation of card readers (but I doubt the card reader vendors will do it) and of pcsc. It is not easy to arrive at a point where you understand everything. The compatibility list does not help much, as the listed card readers are partly past their end of life and the models which replace them are not listed. And the one I bought does not support all the features I need. I even ported the driver to FreeBSD (not committed, I wanted to test everything first) and a lot of stuff works, but one critical problem is that I cannot store a certificate on the crypto card, as the card reader or the driver does not support extended APDUs (needed to transfer more than 255 bytes to the card reader).

Well, the status so far:

  • I have a HOWTO what to in­stall to use crypto cards in FreeBSD
  • I have a HOWTO what to install / configure in Windows
  • I have a HOWTO regarding creating keys on an openpgp v2 card and how to use this key with ssh on FreeBSD (or any other unix-like OS which can run pcsc)
  • I have a card read­er which does not sup­port ex­ten­ded AP­DUs
  • I want to make sure what I write in the HOW­TOs is also suit­able for the use with Win­dows /​ PuTTY
  • it seems Win­dows needs a cer­ti­fic­ate and not only a key when us­ing the Win­dows CAPI (us­ing the vendor sup­plied card read­er driver) in PuTTY-​CSC (works at work with a USB token)
  • the pc­sc pkcs11 Win­dows DLL is not suit­able yet for use on Win­dows 8 64bit
  • I asked the card reader vendor whether the card reader or the driver is the problem regarding the extended APDUs
  • I found prob­lems in gpg4win /​ pc­sc on Win­dows 8
  • I have sent some money to the developers of gpg4win to support their work (if you use gnupg on Windows, try to send a few units of money to them; the work stagnated as they need to spend their time on paid work)

So either I need a new card reader, or I have to wait for an update of the vendor's Linux driver… which probably means it will be a lot faster to buy a new card reader. When looking for one with at least a PIN pad, I either do not find anything which is listed as supported by pcsc on the vendor pages (it is incredible how hard it is to navigate the websites of some companies… a lot of buzzwords but no way to get to the real products), or they only list updated models where I do not know if they will work.

When I have some­thing which works with FreeBSD and Win­dows, I will pub­lish all the HOW­TOs here at once.

Com­plete net­work loss on Sol­ar­is 10u10 CPU 2012-​10 on vir­tu­al­ized T4-​2

The problem I see at work: a T4-2 with 3 guest LDOMs, virtualized disks and networks lost its complete network connectivity “out of the blue” once, and maybe loses it sporadically directly after a cold boot. After a lot of discussion with Oracle, I have the impression that we have two problems here.

1st prob­lem:
Total network loss of the machine (no zone, guest LDOM or the primary LDOM was able to receive or send IP packets). This happened once. No idea how to reproduce it. In the logs we see the message “[ID 920994 kern.warning] WARNING: vnetX: exceeded number of permitted handshake attempts (5) on channel xxx”. According to Oracle this is supposed to be fixed in 148677-01, which will come with Solaris 10u11. They suggested using a vsw interface instead of a vnet interface on the primary domain to at least lower the probability of this problem hitting us. They were not able to tell us how to reproduce the problem (it seems to be a race condition, at least that is my impression based upon the description of the Oracle engineer handling the SR). Only a reboot solved the problem. I was told we are the only client which reported this kind of problem; the patch for it is based upon an internal bug report from internal tests.
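
To check whether a machine has logged this warning, and for which vnet, the syslog file can be searched for the message text quoted above. This is only a sketch with a made-up function name; it assumes the warning appears on one line in the form shown above.

```sh
# Count the vnet handshake warnings per interface in a syslog file.
# usage: vnet_handshake_warnings <logfile>
# (vnet_handshake_warnings is a hypothetical helper name.)
vnet_handshake_warnings() {
    grep 'exceeded number of permitted handshake attempts' "$1" |
        sed 's/.*WARNING: \(vnet[0-9]*\):.*/\1/' |   # keep only the vnet name
        sort | uniq -c |
        awk '{ print $2, $1 }'                        # "<vnet> <count>"
}
```

On Solaris the file to pass would typically be /var/adm/messages (and its rotated copies).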

2nd prob­lem:
After cold boots some­times some ma­chines (not all) are not able to con­nect to an IP on the T4. A re­boot helps, as does re­mov­ing an in­ter­face from an ag­greg­ate and dir­ectly adding it again (see be­low for the sys­tem con­fig). To try to re­pro­duce the prob­lem, we did a lot of warm re­boots of the primary do­main, and the prob­lem nev­er showed up. We did some cold re­boots, and the prob­lem showed up once.

In case someone else sees one of those problems on their machines too, please get in contact with me, so that we can see what we have in common, try to track this down further, and share info which may help in reproducing the problems.

Sys­tem setup:

  • T4-​2 with 4 HBAs and 8 NICs (4 * igb on-​board, 4 * nxge on ad­di­tion­al net­work card)
  • 3 guest LDOMs and one io+control do­main (both in the primary do­main)
  • the guest LDOMs use SAN disks over the 4 HBAs
  • the primary do­main uses a mirrored zpool on SSDs
  • 5 vswitch in the hy­per­visor
  • 4 ag­greg­ates (aggr1 – aggr4 with L2-​policy), each one with one igb and one nxge NIC
  • each ag­greg­ate is con­nec­ted to a sep­ar­ate vswitch (the 5th vswitch is for machine-​internal com­mu­nic­a­tion)
  • each guest LDOM has three vnets, each vnet connected to a vswitch (1 guest LDOM has aggr1+2 only for zones (via vnets), 2 guest LDOMs have aggr3+4 only for zones (via vnets), and all LDOMs have aggr2+3 (via vnets) for global-zone communication, all LDOMs are additionally connected to the machine-internal-only vswitch via the 3rd vnet)
  • primary do­main uses 2 vnets con­nec­ted to the vswitch which is con­nec­ted to aggr2 and aggr3 (con­sist­ency with the oth­er LDOMs on this ma­chine) and has no zones
  • this means each entity (primary domain, guest LDOMs and each zone) has two vnets, and those two vnets are configured in a link-based IPMP setup (vnet-linkprop=phys-state)
  • each vnet has VLAN tag­ging con­figured in the hy­per­visor (with the zones be­ing in dif­fer­ent VLANs than the LDOMs)

The change proposed by Oracle is to replace the 2 vnet interfaces in the primary domain with 2 vsw interfaces (which means doing the VLAN tagging in the primary domain directly instead of in the vnet config). To have IPMP working this requires vsw-linkprop=phys-state. We have two systems with the same setup; on one system we already changed this and it is working as before. As we don’t know how to reproduce the 1st problem, we don’t know if the problem is fixed or not, or what the probability is of getting hit by it again.

Ideas /​ sug­ges­tions /​ info wel­come.

Which crypto card to use with FreeBSD (ssh/​gpg)

The re­cent se­cur­ity in­cid­ent triggered a dis­cus­sion how to se­cure ssh/​gpg keys.

One way I want to fo­cus on here (be­cause it is the way I want to use at home), is to store the keys on a crypto card. I did some re­search for suit­able crypto cards and found one which is called Fei­tian PKI Smart­card, and one which is called Open­P­GP card. The Open­P­GP card also ex­ists in a USB ver­sion (ba­sic­ally a small ver­sion of the card is already in­teg­rated in­to a small USB card read­er).

The Feitian card is reported to be able to handle RSA keys up to 2048 bits. It does not seem to handle DSA (or ECDSA) keys. Their smartcard quick starter guide (the “Tuning smartcard file system” part) explains how to change the parameters of the card to store up to 9 keys on it.

The spec of the OpenPGP card says that it supports RSA keys up to 3072 bits, but there are reports that it is able to handle RSA keys up to 4096 bits (you need at least GPG 2.0.18 to handle keys that big on the crypto card). It looks to me like the card does not handle DSA (or ECDSA) keys. There are only slots for up to 3 keys on it.
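
Whether an installed GPG is new enough for the big keys can be checked with a dotted-version comparison. A minimal sketch, assuming purely numeric version components; the function name is made up, and on Solaris it would need nawk instead of the old awk.

```sh
# Succeed (exit status 0) if dotted version $1 is at least dotted version $2.
# usage: version_at_least <have> <want>
# (version_at_least is a hypothetical helper name.)
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        n = split(have, h, "[.]"); m = split(want, w, "[.]")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            a = h[i] + 0; b = w[i] + 0   # missing components count as 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0                           # all components equal
    }'
}
```

With the version taken from the last field of the first line of `gpg --version`, a check could look like `version_at_least "$have" 2.0.18 && echo "big keys on the card should work"`.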

If I go this way, I would also need a card read­er. It seems a class 3 one (hard­ware PIN pad and dis­play) would be the most “future-​proof” way to go ahead. I found a Rein­er SCT cy­ber­Jack secoder card read­er, which is be­lieved to be sup­por­ted by OpenSC and seems to be a good bal­ance between cost and fea­tures of the Rein­er SCT card read­ers.

If anyone reading this can suggest a better crypto card (keys up to 4096 bits, more than 3 slots, and/or DSA/ECDSA support), or a better card reader, or has any practical experience with any of those components on FreeBSD, please add a comment.

ICS on the Sam­sung Galaxy Tab 10.1

Last week I had a look whether there is any news about an official update of the Galaxy Tab 10.1 to ICS. To my surprise there is one, at least in Italy. The one I found to download was marked more or less for the European market. Well… that was good enough for me, and I spent the night from Friday to Saturday updating the Tab by hand (unfortunately this includes a factory reset, no smooth migration from an old version, but at least I still have root access).

What I no­ticed so far:

  • OpenGL ES speed im­proved from 4.2 to 6.6 FPS.
  • I had some lock-ups so far. I do not know if this may be related to some restored data (app data and e.g. Bluetooth/WLAN config restored with TitaniumBackup) or to bugs (Dalvik cache and cache partition were clean, and a factory reset was done too prior to restoring from the backup). I had to press the power button for some seconds to initiate a reboot. Most of the time it helped to wait a minute before entering the PIN for the SIM. One time it did not help at all; the only way to get it working was to take my WLAN Access Point (AP) offline, start the Tab, enter the PIN, and restart the AP. At that point I had GPS and WLAN activated in the Tab, in the lock-ups before I did not have GPS active. I had something similar with my Nexus S when it got ICS, somehow this resolved itself. Update 2012-08-14: I googled a bit, there was a bug in ICS 4.0.3 related to WLAN, but I have 4.0.4 on the Tab, so this is probably not it. I also got the freeze without WLAN but with the mobile data connection active. 2nd update 2012-08-14: If I disable account syncing with the mobile data connection it does not freeze. I have not yet tried this with the WLAN connection. Update 2012-08-16: The synchronization of the calendar data caused the problem. Deleting all data for any app with calendar in the name and re-syncing fixed the problem. No freeze since I did this yesterday.
  • When I open/​close a folder (much missed fea­ture in An­droid 3.x), the Tab speaks with me (some­thing like “Folder XXX opened” in the con­figured lan­guage… that is a bit an­noy­ing).
  • I like the de­fault back­ground im­age.
  • Up­date 2012-​08-​14: The bat­tery icon does stay green even when the bat­tery is nearly empty. 🙁

I was not able to test the Email app yet, I am waiting for a warranty replacement of the PSU of my server at home (Murphy’s law: your PSU will break when you have just started a big renovation of your kitchen and do not have time to take care of it, and when you get time, a lot of the people at the PSU manufacturer who take care of warranty replacements are on holiday).

I also need to check the mobile data connectivity (quality and speed), but I would expect that it is not worse than before. Update 2012-08-14: The download speed test shows results similar to before, the upload speed test is slower, but this may be due to the mobile network here where I tested. At least I can confirm that it works, modulo the problem of the freezes described above.