Alexander Leidinger

Just another weblog

Jan
24

Strange per­for­mance prob­lem with the IBM HTTP Server (mod­i­fied apache)

Recently we had a strange per­for­mance prob­lem at work. A web appli­ca­tion was hav­ing slow response times from time to time and users com­plained. We did not see an uncom­mon CPU/mem/swap usage on any involved machine. I gen­er­ated heat-maps from per­for­mance mea­sure­ments and there where no obvi­ous traces of slow behav­ior. We did not find any rea­son why the appli­ca­tion should be slow for clients, but obvi­ously it was.

Then some­one men­tioned two recent apache DoS prob­lems. Num­ber one — the cookie hash issue — did not seem to be the cause, we did not see a huge CPU or mem­ory con­sump­tion which we would expect to see with such an attack. The sec­ond one — the slow reads prob­lem (no max con­nec­tion dura­tion time­out in apache, can be exploited by a small receive win­dow for TCP) — looked like it could be an issue. The slow read DoS prob­lem can be detected by look­ing at the server-status page.

What you would see on the server-status page are a lot of worker threads in the ‘W’ (write data) state. This is sup­posed to be an indi­ca­tion of slow reads. We did see this.

As our site is behind a reverse proxy with some kind of IDS/IPS fea­ture, we took the reverse proxy out of the pic­ture to get a bet­ter view of who is doing what (we do not have X-Forwarded-For configured).

At this point we noticed still a lot of con­nec­tion in the ‘W’ state from the rev-proxy. This was strange, it was not sup­posed to do this. After restart­ing the rev-proxy (while the clients went directly to the web­servers) we had those ‘W’ entries still in the server-status. This was get­ting really strange. And to add to this, the dura­tion of the ‘W’ state from the rev-proxy tells that this state is active since sev­eral thou­sand sec­onds. Ugh. WTF?

Ok, next step: killing the offend­ers. First I ver­i­fied in the list of con­nec­tions in the server-status (extended-status is acti­vated) that all worker threads with the rev–proxy con­nec­tion of a given PID are in this strange state and no client request is active. Then I killed this par­tic­u­lar PID. I wanted to do this until I do not have those strange con­nec­tions any­more. Unfor­tu­nately I arrived at PIDs which were listed in the server-status (even after a refresh), but not avail­able in the OS. That is bad. Very bad.

So the next step was to move all clients away from one web­server, and then to reboot this web­server com­pletely to be sure the entire sys­tem is in a known good state for future mon­i­tor­ing (the big ham­mer approach).

As we did not know if this strange state was due to some kind of mis-administration of the sys­tem or not, we decided to have the rev-proxy again in front of the web­server and to mon­i­tor the systems.

We sur­vived about one and a half day. After that all worker threads on all web­servers where in this state. DoS. At this point we where sure there was some­thing mali­cious going on (some days later our man­age­ment showed us a mail from a com­pany which offered secu­rity con­sult­ing 2 months before to make sure we do not get hit by a DDoS dur­ing the hol­i­day sea­son… a coincidence?).

Next step, ver­i­fi­ca­tion of miss­ing secu­rity patches (unfor­tu­nately it is not us who decides which patches we apply to the sys­tems). What we noticed is, that the rev-proxy is miss­ing a patch for a DoS prob­lem, and for the web­servers a new fix­pack was sched­uled to be released not far in the future (as of this writ­ing: it is avail­able now).

Since we applied the DoS fix for the rev-proxy, we do not have a prob­lem any­more. This is not really con­clu­sive, as we do not really know if this fixed the prob­lem or if the attacker stopped attack­ing us.

From read­ing what the DoS patch fixes, we would assume we should see some con­tin­u­ous traf­fic going on between the rev-rpoxy and the web­server, but there was noth­ing when we observed the strange state.

We are still not allowed to apply patches as we think we should do, but at least we have a bet­ter mon­i­tor­ing in place to watch out for this par­tic­u­lar prob­lem (acti­vate the extended sta­tus in apache/IHS, look for lines with state ‘W’ and a long dura­tion (col­umn ‘SS’), raise an alert if the dura­tion is higher than the max. possible/expected/desired dura­tion for all pos­si­ble URLs).

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Sep
30

Forc­ing a route in Solaris?

I have a lit­tle prob­lem find­ing a clean solu­tion to the fol­low­ing problem.

A machine with two net­work inter­faces and no default route. The first inter­face gets an IP at boot time and the cor­re­spond­ing sta­tic route is inserted dur­ing boot into the rout­ing table with­out prob­lems. The sec­ond inter­face only gets an IP address when the shared-IP zones on the machine are started, dur­ing boot the inter­face is plumbed but with­out any address. The net­works on those inter­faces are not con­nected and the machine is not a gate­way (this means we have a machine–admin­is­tra­tion net­work and a production-network). The sta­tic routes we want to have for the addresses of the zones are not added to the rout­ing table, because the next hop is not reach­able at the time the routing-setup is done. As soon as the zones are up (and the inter­face gets an IP), a re-run of the routing-setup adds the miss­ing sta­tic routes.

Unfor­tu­nately I can not tell Solaris to keep the sta­tic route even if the next hop is not reach­able ATM (at least I have not found an option to the route com­mand which does this).

One solu­tion to this prob­lem would be to add an address at boot to the inter­face which does not have an address at boot-time ATM (prob­a­bly with the dep­re­cated flag set). The prob­lem is, that this sub­net (/28) has not enough free addresses any­more, so this is not an option.

Another solu­tion is to use a script which re-runs the routing-setup after the zones are started. This is a prag­matic solu­tion, but not a clean solution.

As I under­stand the in.routed man-page in.routed is not an option with the default con­fig, because the machine shall not route between the net­works, and shall not change the rout­ing based upon RIP mes­sages from other machines. Unfor­tu­nately I do not know enough about it to be sure, and I do not get the time to play around with this. I have seen some inter­st­ing options regard­ing this in the man-page, but play­ing around with this and sniff­ing the net­work to see what hap­pens, is not an option ATM. Any­one with a config/tutorial for this “do not broad­cast any­thing, do not accept any­thing from outside”-case (if possible)?

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Sep
21

Speed traps with chmod

I have the habit to chmod with the rel­a­tive nota­tion (e.g. g+w or a+r or go-w or sim­i­lar) instead of the absolute one (e.g. 0640 or u=rw,g=r,o=). Recently I had to chmod a lot of files. As usual I was using the rel­a­tive nota­tion. With a lot of files, this took a lot of time. Time was not really an issue, so I did not stop it to restart with a bet­ter per­form­ing com­mand (e.g. find /path –type f –print0 | xargs –0 chmod 0644; find /path –type d –print0 | xargs –0 chmod 0755), but I thought a lit­tle tips&tricks post­ing may be in order, as not every­one knows the difference.

The rel­a­tive notation

When you spec­ify g+w, it means to remove the write access for the group, but keep every­thing else like it is. Nat­u­rally this means that chmod first has to lookup the cur­rent access rights. So for each async write request, there has to be a read-request first.

The absolute notation

The absolute nota­tion is what most peo­ple are used to (at least the numeric one). It does not need to read the access rights before chang­ing them, so there is less I/O to be done to get what you want. The draw­back is that it is not so nice for recur­sive changes. You do not want to have the x-bit for data files, but you need it for direc­to­ries. If you only have a tree with data files where you want to have an uni­form access, the exam­ple above via find is prob­a­bly faster (for sure if the direc­tory meta-data is still in RAM).

If you have a mix of bina­ries and data, it is a lit­tle bit more tricky to come up with a way which is faster. If the data has a name-pattern, you could use it in the find.

And if you have a non-uniform access for the group bits and want to make sure the owner has write access to every­thing, it may be faster to use the rel­a­tive nota­tion than to find a replace­ment command-sequence with the absolute notation.

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Aug
10

Rants about JASS (Solaris Secu­rity Toolkit)

Recently I switched to a new client where the Solaris Secu­rity Toolkit (JASS) is exten­sively used. I am now in the process of updat­ing some things, among them are JET and JASS. As part of this work I reeval­u­ate the local JASS mod­i­fi­ca­tions. Pre­vi­ously a cus­tom JASS pack­age was used, but in case JASS is updated by Ora­cle at some point in time (and an update is really needed, see below), this would need some amount of work to find out the dif­fer­ences and to for­ward port them to the new ver­sion. If every­thing is well doc­u­mented, this should not be hard to do, but the per­son doing the work also needs to find the up-to-date docs.

To make it more easy I decided to change this. I now install the offi­cial JASS pack­age via JET together with the lat­est patch for it, and then let JET copy our mod­i­fi­ca­tions over the installed pack­age. Instead of mod­i­fy­ing exist­ing dri­vers, I cre­ated our own dri­vers with a ref­er­ence to the dri­ver which served as a base.

While doing this I encoun­tered sev­eral short­com­ings of JASS on Solaris 10.

There are sev­eral FS based checks which do not make sense to do for the FS of zones in a global zone (at least not the way I use JASS, so maybe a con­fig­urable way of chang­ing the behav­ior should serve for every­one). If zones are installed in /zones, you do not need to check for files with­out valid UIDs (you surely find a lot of files, as the users are defined inside the zones and not in the global zone) or sim­i­lar things (even not for world writable files, as the zones are installed in a root-access-only sub­tree and inside the zones there may be other secu­rity con­straints con­fig­ured inside JASS, read: it is the respon­si­bil­ity of JASS inside the zone to do this). An easy solu­tion would be to exclude those FS which con­tain zones (and as we only have one sub­tree, I just hard­coded this in sev­eral scripts).

I also miss the pos­si­bil­ity (maybe I over­looked a sim­ple way) for the ssh check to limit the Allow­Root­Lo­gin to spe­cific hosts. JASS only checks yes or no, but can not limit it to spe­cific hosts (e.g. via “Match IP/hostname”). Often you do not need to per­mit root-logins (RBAC/sudo/…), but some­times it is the only way to han­dle a par­tic­u­lar edge-case (or to speed up an action dra­mat­i­cally), and in such cases you do not want to allow root-logins more than necessary.

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Apr
19

Solaris UFS full while df shows plenty of free space/inodes

At work we have a Solaris 8 with a UFS which told the appli­ca­tion that it can not cre­ate new files. The df com­mand showed plenty if free inodes, and there was also enough space free in the FS. The rea­son that the appli­ca­tion got the error was that while there where still plenty of frag­ments free, no free block was avail­able any­more. You can not cre­ate a new file only with frag­ments, you need to have at least one free block for each new file.

To see the num­ber of free blocks of a UFS you can call “fstyp –v | head –18″ and look at the value behind “nbfree”.

To get this work­ing again we cleaned up the FSlit­tle bit (compressing/deleting log files), but this is only a tem­po­rary solu­tion. Unluck­ily we can not move this appli­ca­tion to a Solaris 10 with ZFS, so I was play­ing around a lit­tle bit to see what we can do.

First I made a his­togram of the file sizes. The backup of the FS I was play­ing with had a lit­tle bit more than 4 mil­lion files in this FS. 28.5% of them where smaller than or equal 512 bytes, 31.7% where smaller than or equal 1k (frag­ment size), 36% smaller than or equal 8k (block size) and 74% smaller than or equal 16k. The fol­low­ing graph shows in red the crit­i­cal part, files which need a block and pro­duce frag­ments, but can not life with only fragments.

chart

Then I played around with newfs options for this one spe­cific FS with this spe­cific data mix. Chang­ing the num­ber of inodes did not change much the out­come for our prob­lem (as expected). Chang­ing the opti­miza­tion from “time” to “space” (and restor­ing all the data from backup into the empty FS) gave us 1000 more free blocks. On a FS which had 10 Mio free blocks when empty this is not much, but we expect that the restore con­sumes less frag­ments and more full blocks than the live-FS of the appli­ca­tion (we can not com­pare, as the con­tent of the live-FS changed a lot since we had the prob­lem). We assume that e.g. the logs of the appli­ca­tion are split over a lot of frag­ments instead of full blocks, due to small writes to the logs by the appli­ca­tion. The restore should write all the data in big chunks, so our expec­ta­tion is that the FS will use more full blocks and less frag­ments. Because of this we expect that the live-FS with this spe­cific data mix could ben­e­fit from chang­ing the optimization.

I also played around with the frag­ment size. The expec­ta­tion was that it will only change what is reported in the out­put of df (reduc­ing the reported avail­able space for the same amount of data). Here is the result:

chart

The dif­fer­ence between 1k (default) and 2k is not much. For 8k we would have to much unused space lost. The frag­ment size of 4k looks like it is accept­able to get a bet­ter mon­i­tor­ing sta­tus of this par­tic­u­lar data mix.

Based upon this we will prob­a­bly cre­ate a new FS with a frag­ment size of 4k and we will prob­a­bly switch the opti­miza­tion directly to “space”. This way we will have a bet­ter report­ing on the fill level of the FS for our data mix (but we will not be able to fully use the real space of the FS) and as such our mon­i­tor­ing should alert us in time to do a cleanup of the FS or to increase the size of the FS.

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,