Alexander Leidinger

Just another weblog

Jan
15

Com­plete net­work loss on Solaris 10u10 CPU 2012-10 on vir­tu­al­ized T4-2

The prob­lem I see at work: A T4-2 with 3 guest LDOMs, vir­tu­al­ized disks and net­works lost the com­plete net­work con­nec­tiv­ity “out of the blue” once, and maybe “spo­radic” directly after a cold boot. After a lot of dis­cus­sion with Ora­cle, I have the impres­sion that we have two prob­lems here.

1st prob­lem:
Total net­work loss of the machine (no zone or guest LDOM or the pri­mary LDOM was able to have receive or send IP pack­ets). This hap­pened once. No idea how to repro­duce it. In the logs we see the mes­sage “[ID 920994 kern.warning] WARNING: vnetX: exceeded num­ber of per­mit­ted hand­shake attempts (5) on chan­nel xxx”. Accord­ing to Ora­cle this is sup­posed to be fixed in 148677 – 01 which will come with Solaris 10u11. They sug­gested to use a vsw inter­face instead of a vnet inter­face on the pri­mary domain to at least lower the prob­a­bil­ity of this prob­lem hit­ting us. They were not able to tell us how to repro­duce the prob­lem (seems to be a race con­di­tion, at least I get this impres­sion based upon the descrip­tion of the Ora­cle engi­neer han­dling the SR). Only a reboot helped to get the prob­lem solved. I was told we are the only client which reported this kind of prob­lem, the patch for this prob­lem is based upon an inter­nal bugre­port from inter­nal tests.

2nd prob­lem:
After cold boots some­times some machines (not all) are not able to con­nect to an IP on the T4. A reboot helps, as does remov­ing an inter­face from an aggre­gate and directly adding it again (see below for the sys­tem con­fig). To try to repro­duce the prob­lem, we did a lot of warm reboots of the pri­mary domain, and the prob­lem never showed up. We did some cold reboots, and the prob­lem showed up once.

In case some­one else sees one of those prob­lems on his machines too, please get in con­tact with me to see what we have in com­mon to try to track this down fur­ther and to share info which may help in maybe repro­duc­ing the problems.

Sys­tem setup:

  • T4-2 with 4 HBAs and 8 NICs (4 * igb on-board, 4 * nxge on addi­tional net­work card)
  • 3 guest LDOMs and one io+control domain (both in the pri­mary domain)
  • the guest LDOMs use SAN disks over the 4 HBAs
  • the pri­mary domain uses a mir­rored zpool on SSDs
  • 5 vswitch in the hypervisor
  • 4 aggre­gates (aggr1 — aggr4 with L2-policy), each one with one igb and one nxge NIC
  • each aggre­gate is con­nected to a sep­a­rate vswitch (the 5th vswitch is for machine-internal communication)
  • each guest LDOM has three vnets, each vnets con­nected to a vswitch (1 guest LDOM has aggr1+2 only for zones (via vnets), 2 guest LDOMs have aggr 3+4 only for zones (via vnets), and all LDOMs have aggr2+3 (via vnets) for global-zone com­mu­ni­ca­tion, all LDOMs are addi­tion­ally con­nected to the machine-internal-only vswitch via the 3rd vnet)
  • pri­mary domain uses 2 vnets con­nected to the vswitch which is con­nected to aggr2 and aggr3 (con­sis­tency with the other LDOMs on this machine) and has no zones
  • this means each entity (pri­mary domain, guest LDOMs and each zone) has two vnets in and those two vnets are con­fig­ured in a link-based IPMP setup (vnet-linkprop=phys-state)
  • each vnet has VLAN tag­ging con­fig­ured in the hyper­vi­sor (with the zones being in dif­fer­ent VLANs than the LDOMs)

The pro­posed change by Ora­cle is to replace the 2 vnet inter­faces in the pri­mary domain with 2 vsw inter­faces (which means to do VLAN tag­ging in the pri­mary domain directly instead of in the vnet con­fig). To have IPMP work­ing this means to have vsw-linkprop=phys-state. We have two sys­tems with the same setup, on one sys­tem we already changed this and it is work­ing as before. As we don’t know how to repro­duce the 1st prob­lem, we don’t know if the prob­lem is fixed or not, respec­tively what the prob­a­bil­ity is to get hit again by this problem.

Ideas / sug­ges­tions / info welcome.

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Jan
24

Strange per­for­mance prob­lem with the IBM HTTP Server (mod­i­fied apache)

Recently we had a strange per­for­mance prob­lem at work. A web appli­ca­tion was hav­ing slow response times from time to time and users com­plained. We did not see an uncom­mon CPU/mem/swap usage on any involved machine. I gen­er­ated heat-maps from per­for­mance mea­sure­ments and there where no obvi­ous traces of slow behav­ior. We did not find any rea­son why the appli­ca­tion should be slow for clients, but obvi­ously it was.

Then some­one men­tioned two recent apache DoS prob­lems. Num­ber one — the cookie hash issue — did not seem to be the cause, we did not see a huge CPU or mem­ory con­sump­tion which we would expect to see with such an attack. The sec­ond one — the slow reads prob­lem (no max con­nec­tion dura­tion time­out in apache, can be exploited by a small receive win­dow for TCP) — looked like it could be an issue. The slow read DoS prob­lem can be detected by look­ing at the server-status page.

What you would see on the server-status page are a lot of worker threads in the ‘W’ (write data) state. This is sup­posed to be an indi­ca­tion of slow reads. We did see this.

As our site is behind a reverse proxy with some kind of IDS/IPS fea­ture, we took the reverse proxy out of the pic­ture to get a bet­ter view of who is doing what (we do not have X-Forwarded-For configured).

At this point we noticed still a lot of con­nec­tion in the ‘W’ state from the rev-proxy. This was strange, it was not sup­posed to do this. After restart­ing the rev-proxy (while the clients went directly to the web­servers) we had those ‘W’ entries still in the server-status. This was get­ting really strange. And to add to this, the dura­tion of the ‘W’ state from the rev-proxy tells that this state is active since sev­eral thou­sand sec­onds. Ugh. WTF?

Ok, next step: killing the offend­ers. First I ver­i­fied in the list of con­nec­tions in the server-status (extended-status is acti­vated) that all worker threads with the rev–proxy con­nec­tion of a given PID are in this strange state and no client request is active. Then I killed this par­tic­u­lar PID. I wanted to do this until I do not have those strange con­nec­tions any­more. Unfor­tu­nately I arrived at PIDs which were listed in the server-status (even after a refresh), but not avail­able in the OS. That is bad. Very bad.

So the next step was to move all clients away from one web­server, and then to reboot this web­server com­pletely to be sure the entire sys­tem is in a known good state for future mon­i­tor­ing (the big ham­mer approach).

As we did not know if this strange state was due to some kind of mis-administration of the sys­tem or not, we decided to have the rev-proxy again in front of the web­server and to mon­i­tor the systems.

We sur­vived about one and a half day. After that all worker threads on all web­servers where in this state. DoS. At this point we where sure there was some­thing mali­cious going on (some days later our man­age­ment showed us a mail from a com­pany which offered secu­rity con­sult­ing 2 months before to make sure we do not get hit by a DDoS dur­ing the hol­i­day sea­son… a coincidence?).

Next step, ver­i­fi­ca­tion of miss­ing secu­rity patches (unfor­tu­nately it is not us who decides which patches we apply to the sys­tems). What we noticed is, that the rev-proxy is miss­ing a patch for a DoS prob­lem, and for the web­servers a new fix­pack was sched­uled to be released not far in the future (as of this writ­ing: it is avail­able now).

Since we applied the DoS fix for the rev-proxy, we do not have a prob­lem any­more. This is not really con­clu­sive, as we do not really know if this fixed the prob­lem or if the attacker stopped attack­ing us.

From read­ing what the DoS patch fixes, we would assume we should see some con­tin­u­ous traf­fic going on between the rev-rpoxy and the web­server, but there was noth­ing when we observed the strange state.

We are still not allowed to apply patches as we think we should do, but at least we have a bet­ter mon­i­tor­ing in place to watch out for this par­tic­u­lar prob­lem (acti­vate the extended sta­tus in apache/IHS, look for lines with state ‘W’ and a long dura­tion (col­umn ‘SS’), raise an alert if the dura­tion is higher than the max. possible/expected/desired dura­tion for all pos­si­ble URLs).

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Sep
30

Forc­ing a route in Solaris?

I have a lit­tle prob­lem find­ing a clean solu­tion to the fol­low­ing problem.

A machine with two net­work inter­faces and no default route. The first inter­face gets an IP at boot time and the cor­re­spond­ing sta­tic route is inserted dur­ing boot into the rout­ing table with­out prob­lems. The sec­ond inter­face only gets an IP address when the shared-IP zones on the machine are started, dur­ing boot the inter­face is plumbed but with­out any address. The net­works on those inter­faces are not con­nected and the machine is not a gate­way (this means we have a machine–admin­is­tra­tion net­work and a production-network). The sta­tic routes we want to have for the addresses of the zones are not added to the rout­ing table, because the next hop is not reach­able at the time the routing-setup is done. As soon as the zones are up (and the inter­face gets an IP), a re-run of the routing-setup adds the miss­ing sta­tic routes.

Unfor­tu­nately I can not tell Solaris to keep the sta­tic route even if the next hop is not reach­able ATM (at least I have not found an option to the route com­mand which does this).

One solu­tion to this prob­lem would be to add an address at boot to the inter­face which does not have an address at boot-time ATM (prob­a­bly with the dep­re­cated flag set). The prob­lem is, that this sub­net (/28) has not enough free addresses any­more, so this is not an option.

Another solu­tion is to use a script which re-runs the routing-setup after the zones are started. This is a prag­matic solu­tion, but not a clean solution.

As I under­stand the in.routed man-page in.routed is not an option with the default con­fig, because the machine shall not route between the net­works, and shall not change the rout­ing based upon RIP mes­sages from other machines. Unfor­tu­nately I do not know enough about it to be sure, and I do not get the time to play around with this. I have seen some inter­st­ing options regard­ing this in the man-page, but play­ing around with this and sniff­ing the net­work to see what hap­pens, is not an option ATM. Any­one with a config/tutorial for this “do not broad­cast any­thing, do not accept any­thing from outside”-case (if possible)?

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Sep
21

Speed traps with chmod

I have the habit to chmod with the rel­a­tive nota­tion (e.g. g+w or a+r or go-w or sim­i­lar) instead of the absolute one (e.g. 0640 or u=rw,g=r,o=). Recently I had to chmod a lot of files. As usual I was using the rel­a­tive nota­tion. With a lot of files, this took a lot of time. Time was not really an issue, so I did not stop it to restart with a bet­ter per­form­ing com­mand (e.g. find /path –type f –print0 | xargs –0 chmod 0644; find /path –type d –print0 | xargs –0 chmod 0755), but I thought a lit­tle tips&tricks post­ing may be in order, as not every­one knows the difference.

The rel­a­tive notation

When you spec­ify g+w, it means to remove the write access for the group, but keep every­thing else like it is. Nat­u­rally this means that chmod first has to lookup the cur­rent access rights. So for each async write request, there has to be a read-request first.

The absolute notation

The absolute nota­tion is what most peo­ple are used to (at least the numeric one). It does not need to read the access rights before chang­ing them, so there is less I/O to be done to get what you want. The draw­back is that it is not so nice for recur­sive changes. You do not want to have the x-bit for data files, but you need it for direc­to­ries. If you only have a tree with data files where you want to have an uni­form access, the exam­ple above via find is prob­a­bly faster (for sure if the direc­tory meta-data is still in RAM).

If you have a mix of bina­ries and data, it is a lit­tle bit more tricky to come up with a way which is faster. If the data has a name-pattern, you could use it in the find.

And if you have a non-uniform access for the group bits and want to make sure the owner has write access to every­thing, it may be faster to use the rel­a­tive nota­tion than to find a replace­ment command-sequence with the absolute notation.

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Aug
10

Rants about JASS (Solaris Secu­rity Toolkit)

Recently I switched to a new client where the Solaris Secu­rity Toolkit (JASS) is exten­sively used. I am now in the process of updat­ing some things, among them are JET and JASS. As part of this work I reeval­u­ate the local JASS mod­i­fi­ca­tions. Pre­vi­ously a cus­tom JASS pack­age was used, but in case JASS is updated by Ora­cle at some point in time (and an update is really needed, see below), this would need some amount of work to find out the dif­fer­ences and to for­ward port them to the new ver­sion. If every­thing is well doc­u­mented, this should not be hard to do, but the per­son doing the work also needs to find the up-to-date docs.

To make it more easy I decided to change this. I now install the offi­cial JASS pack­age via JET together with the lat­est patch for it, and then let JET copy our mod­i­fi­ca­tions over the installed pack­age. Instead of mod­i­fy­ing exist­ing dri­vers, I cre­ated our own dri­vers with a ref­er­ence to the dri­ver which served as a base.

While doing this I encoun­tered sev­eral short­com­ings of JASS on Solaris 10.

There are sev­eral FS based checks which do not make sense to do for the FS of zones in a global zone (at least not the way I use JASS, so maybe a con­fig­urable way of chang­ing the behav­ior should serve for every­one). If zones are installed in /zones, you do not need to check for files with­out valid UIDs (you surely find a lot of files, as the users are defined inside the zones and not in the global zone) or sim­i­lar things (even not for world writable files, as the zones are installed in a root-access-only sub­tree and inside the zones there may be other secu­rity con­straints con­fig­ured inside JASS, read: it is the respon­si­bil­ity of JASS inside the zone to do this). An easy solu­tion would be to exclude those FS which con­tain zones (and as we only have one sub­tree, I just hard­coded this in sev­eral scripts).

I also miss the pos­si­bil­ity (maybe I over­looked a sim­ple way) for the ssh check to limit the Allow­Root­Lo­gin to spe­cific hosts. JASS only checks yes or no, but can not limit it to spe­cific hosts (e.g. via “Match IP/hostname”). Often you do not need to per­mit root-logins (RBAC/sudo/…), but some­times it is the only way to han­dle a par­tic­u­lar edge-case (or to speed up an action dra­mat­i­cally), and in such cases you do not want to allow root-logins more than necessary.

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,