Alexander Leidinger

Just another weblog

Jan
24

Strange per­for­mance prob­lem with the IBM HTTP Server (mod­i­fied apache)

Recently we had a strange per­for­mance prob­lem at work. A web appli­ca­tion was hav­ing slow response times from time to time and users com­plained. We did not see an uncom­mon CPU/mem/swap usage on any involved machine. I gen­er­ated heat-maps from per­for­mance mea­sure­ments and there where no obvi­ous traces of slow behav­ior. We did not find any rea­son why the appli­ca­tion should be slow for clients, but obvi­ously it was.

Then some­one men­tioned two recent apache DoS prob­lems. Num­ber one — the cookie hash issue — did not seem to be the cause, we did not see a huge CPU or mem­ory con­sump­tion which we would expect to see with such an attack. The sec­ond one — the slow reads prob­lem (no max con­nec­tion dura­tion time­out in apache, can be exploited by a small receive win­dow for TCP) — looked like it could be an issue. The slow read DoS prob­lem can be detected by look­ing at the server-status page.

What you would see on the server-status page are a lot of worker threads in the ‘W’ (write data) state. This is sup­posed to be an indi­ca­tion of slow reads. We did see this.

As our site is behind a reverse proxy with some kind of IDS/IPS fea­ture, we took the reverse proxy out of the pic­ture to get a bet­ter view of who is doing what (we do not have X-Forwarded-For configured).

At this point we noticed still a lot of con­nec­tion in the ‘W’ state from the rev-proxy. This was strange, it was not sup­posed to do this. After restart­ing the rev-proxy (while the clients went directly to the web­servers) we had those ‘W’ entries still in the server-status. This was get­ting really strange. And to add to this, the dura­tion of the ‘W’ state from the rev-proxy tells that this state is active since sev­eral thou­sand sec­onds. Ugh. WTF?

Ok, next step: killing the offend­ers. First I ver­i­fied in the list of con­nec­tions in the server-status (extended-status is acti­vated) that all worker threads with the rev–proxy con­nec­tion of a given PID are in this strange state and no client request is active. Then I killed this par­tic­u­lar PID. I wanted to do this until I do not have those strange con­nec­tions any­more. Unfor­tu­nately I arrived at PIDs which were listed in the server-status (even after a refresh), but not avail­able in the OS. That is bad. Very bad.

So the next step was to move all clients away from one web­server, and then to reboot this web­server com­pletely to be sure the entire sys­tem is in a known good state for future mon­i­tor­ing (the big ham­mer approach).

As we did not know if this strange state was due to some kind of mis-administration of the sys­tem or not, we decided to have the rev-proxy again in front of the web­server and to mon­i­tor the systems.

We sur­vived about one and a half day. After that all worker threads on all web­servers where in this state. DoS. At this point we where sure there was some­thing mali­cious going on (some days later our man­age­ment showed us a mail from a com­pany which offered secu­rity con­sult­ing 2 months before to make sure we do not get hit by a DDoS dur­ing the hol­i­day sea­son… a coincidence?).

Next step, ver­i­fi­ca­tion of miss­ing secu­rity patches (unfor­tu­nately it is not us who decides which patches we apply to the sys­tems). What we noticed is, that the rev-proxy is miss­ing a patch for a DoS prob­lem, and for the web­servers a new fix­pack was sched­uled to be released not far in the future (as of this writ­ing: it is avail­able now).

Since we applied the DoS fix for the rev-proxy, we do not have a prob­lem any­more. This is not really con­clu­sive, as we do not really know if this fixed the prob­lem or if the attacker stopped attack­ing us.

From read­ing what the DoS patch fixes, we would assume we should see some con­tin­u­ous traf­fic going on between the rev-rpoxy and the web­server, but there was noth­ing when we observed the strange state.

We are still not allowed to apply patches as we think we should do, but at least we have a bet­ter mon­i­tor­ing in place to watch out for this par­tic­u­lar prob­lem (acti­vate the extended sta­tus in apache/IHS, look for lines with state ‘W’ and a long dura­tion (col­umn ‘SS’), raise an alert if the dura­tion is higher than the max. possible/expected/desired dura­tion for all pos­si­ble URLs).

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Dec
21

A phoronix bench­mark cre­ates a huge bench­mark­ing discussion

The recent Phoronix bench­mark which com­pared a release can­di­date of FreeBSD 9 with Ora­cle Linux Server 6.1 cre­ated a huge dis­cus­sion in the FreeBSD mail­inglists. The rea­son was that some peo­ple think the num­bers pre­sented there give a wrong pic­ture of FreeBSD. Partly because not all bench­mark num­bers are pre­sented in the most promi­nent page (as linked above), but only at a dif­fer­ent place. This gives the impres­sion that FreeBSD is infe­rior in this bench­mark while it just puts the focus (for a rea­son, accord­ing to some peo­ple) on a dif­fer­ent part of the bench­mark (to be more spe­cific, blog­bench is doing disk reads and writes in par­al­lel, FreeBSD gives higher pri­or­ity to writes than to reads, FreeBSD 9 out­per­forms OLS 6.1 in the writes while OLS 6.1 shines with the reads, and only the reads are pre­sented on the first page). Other com­plaints are that it is told that the default install was used (in this case UFS as the FS), when it was not (ZFS as the FS).

The author of the Phoronix arti­cle par­tic­i­pated in parts of the dis­cus­sion and asked for spe­cific improve­ment sug­ges­tions. A FreeBSD com­mit­ter seems to be already work­ing to get some issues resolved. What I do not like per­son­ally, is that the arti­cle is not updated with a remark that some things pre­sented do not reflect the real­ity and a retest is necessary.

As there was much talk in the thread but not much obvi­ous activ­ity from our side to resolve some issues, I started to improve the FreeBSD wiki page about bench­mark­ing so that we are able to point to it in case some­one wants to bench­mark FreeBSD. Oth­ers already chimed in and improved some things too. It is far from per­fect, some more eyes — and more impor­tantly some more fin­gers which add con­tent — are needed. Please go to the wiki page and try to help out (if you are afraid to write some­thing in the wiki, please at least tell your sug­ges­tions on a FreeBSD mail­inglist so that oth­ers can improve the wiki page).

What we need too, is a wiki page about FreeBSD tun­ing (a first step would be to take the man-page and con­vert it into a wiki page, then to improve it, and then to feed back the changes to the man-page while keep­ing the wiki page to be able to cross ref­er­ence parts from the bench­mark­ing page).

I already told about this in the thread about the Phoronix bench­mark: every­one is wel­come to improve the sit­u­a­tion. Do not talk, write some­thing. No mat­ter if it is an improve­ment to the bench­mark­ing page, tun­ing advise, or a tool which inspects the sys­tem and sug­gests some tun­ing. If you want to help in the wiki, cre­ate a First­name­Last­name account and ask a FreeBSD comit­ter for write access.

A while ago (IIRC we have to think in months or even years) there was some frame­work for auto­matic FreeBSD bench­mark­ing. Unfor­tu­nately the author run out of time. The frame­work was able to install a FreeBSD sys­tem on a machine, run some spec­i­fied bench­mark (not much bench­marks where inte­grated), and then install another FreeBSD ver­sion to run the same bench­mark, or to rein­stall the same ver­sion to run another bench­mark. IIRC there was also some DB behind which col­lected the results and maybe there was even some way to com­pare them. It would be nice if some­one could get some time to talk with the author to get the frame­work and set it up some­where, so that we have a con­trolled envi­ron­ment where we can do our own bench­marks in an auto­matic and repeat­able fash­ion with sev­eral FreeBSD versions.

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Oct
04

IBM HTTP Server (7) and Verisign Inter­me­di­ate Certificates

I was fight­ing with the right way to add a recent Verisign cer­tifi­cate to a key­store for the IBM HTTP Server (IHS). I have used the ikey­man util­ity on Solaris.

The prob­lem indi­ca­tor was the error mes­sageSSL0208E: SSL Hand­shake Failed, Cer­tifi­cate val­i­da­tion error” in the SSL log of IHS.

The IBM web­sites where not really help­ful to track down the prob­lem (the miss­ing stuff). The Verisign instruc­tions did not lead to a work­ing solu­tion either.

What was done before: the Verisign Inter­me­di­ate Cer­tifi­cates where imported as “Signer Cer­tifi­cates”, and the cer­tifi­cate for the web­server was imported within “Per­sonal Cer­tifi­cates”. With­out the signer cer­tifi­cates the per­sonal cer­tifi­cate would not import due to an inter­me­di­ate cer­tifi­cated miss­ing (no valid trust-chain).

What I did to resolve the problem:

  •  I removed all Verisign certificates.
  •  I added the Verisign Root Cer­tifi­cate and the Verisign Inter­me­di­ate Cer­tifi­cate A as a signer cer­tifi­cate (use the “Add” but­ton). I also tried to add the Verisign Inter­me­di­ate Cer­tifi­cate B, but it com­plained that some part of it was already there as part of the Inter­me­di­ate Cer­tifi­cate A. I skipped this part.
  •  Then I con­verted the server cer­tifi­cate and key to a PKS12 file via “openssl pkcs12 –export –in server-cert.arm –out cert-for-ihs.p12 –inkey server-key.arm –name name_for_cert_in_ihs”.
  • After that I imported the cert-for-ihs.p12 as a “Per­sonal Cer­tifi­cate”. The import dia­log offers 3 items to import. I selected the “name_for_cert_in_ihs” and the one con­tain­ing “cn=verisign class 3 pub­lic pri­mary cer­ti­fi­ca­tion author­ity — g5” (when I selected the 3rd one too, it com­plained that a part of it was already imported with a dif­fer­ent name).

With this mod­i­fied key­store in place, I just had to select the cer­tifi­cate via “SSLServerCert name_for_cert_in_ihs” in the IHS con­fig and the prob­lem was fixed.

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Sep
30

Forc­ing a route in Solaris?

I have a lit­tle prob­lem find­ing a clean solu­tion to the fol­low­ing problem.

A machine with two net­work inter­faces and no default route. The first inter­face gets an IP at boot time and the cor­re­spond­ing sta­tic route is inserted dur­ing boot into the rout­ing table with­out prob­lems. The sec­ond inter­face only gets an IP address when the shared-IP zones on the machine are started, dur­ing boot the inter­face is plumbed but with­out any address. The net­works on those inter­faces are not con­nected and the machine is not a gate­way (this means we have a machine–admin­is­tra­tion net­work and a production-network). The sta­tic routes we want to have for the addresses of the zones are not added to the rout­ing table, because the next hop is not reach­able at the time the routing-setup is done. As soon as the zones are up (and the inter­face gets an IP), a re-run of the routing-setup adds the miss­ing sta­tic routes.

Unfor­tu­nately I can not tell Solaris to keep the sta­tic route even if the next hop is not reach­able ATM (at least I have not found an option to the route com­mand which does this).

One solu­tion to this prob­lem would be to add an address at boot to the inter­face which does not have an address at boot-time ATM (prob­a­bly with the dep­re­cated flag set). The prob­lem is, that this sub­net (/28) has not enough free addresses any­more, so this is not an option.

Another solu­tion is to use a script which re-runs the routing-setup after the zones are started. This is a prag­matic solu­tion, but not a clean solution.

As I under­stand the in.routed man-page in.routed is not an option with the default con­fig, because the machine shall not route between the net­works, and shall not change the rout­ing based upon RIP mes­sages from other machines. Unfor­tu­nately I do not know enough about it to be sure, and I do not get the time to play around with this. I have seen some inter­st­ing options regard­ing this in the man-page, but play­ing around with this and sniff­ing the net­work to see what hap­pens, is not an option ATM. Any­one with a config/tutorial for this “do not broad­cast any­thing, do not accept any­thing from outside”-case (if possible)?

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,
Sep
15

Sony BRAVIA TV & DLNA formats

As I wrote ear­lier, I try to get some infos which for­mats my Sony BRAVIA 5800 TV is able to play over the net­work. Sony is not really help­ful (they tell only names some­one with a DLNA spec could cor­rectly inter­pret). Now I took the time to move my TV into a dif­fer­ent sub­net (the same where my NAS is in, not like before in a DMZ), and I installed minidlna. After some net­work sniff­ing, the use of the Intel UPnP Device Spy and some minidlna–source read­ing I have now a bet­ter idea what my Sony TV expects.

The DLNA-specification seems to man­date a MIME-type and some DLNA-specific iden­ti­fier which describes the con­tent a player (a DLNA-Renderer) is able to dis­play. In the fol­low­ing I will present the MIME-type, the DLNA-identifier, and prob­a­bly a Sony-specific identifier.

Regard­ing pic­tures the TV only accepts JPEGs, bit in small, medium and large sizes. I did not bother to look up what this means in real val­ues, so far this is not of high inter­est for me. For audio the TV accepts MP3s and LPCM (raw PCM sam­ples). The raw sniffed data from the TV looks like this:

image/jpeg:DLNA.ORG_PN=JPEG_SM
image/jpeg:DLNA.ORG_PN=JPEG_MED
image/jpeg:DLNA.ORG_PN=JPEG_LRG
audio/mpeg:DLNA.ORG_PN=MP3
audio/L16:DLNA.ORG_PN=LPCM

The more inter­est­ing part for me is the video part. The TV sup­ports MPEG2 Video (the MPEG_ part in the DLNA.ORG_PN) and H.264 (the AVC_ part in the DLNA.ORG_PN). For MPEG2 it sup­ports pro­gram streams (PS in DLNA.ORG_PN) and trans­port streams (TS in DLNA.ORG_PN). For PS it sup­ports PAL and NTSC res­o­lu­tions (720×576 is PAL, HD res­o­lu­tions like 720p or 1080i or 1080p are not sup­ported). The packet-length of a trans­port steam can be 188 bytes or 192 bytes. If the width is >= 1288 or the height is >= 720, minidlna adds HD in DLNA.ORG_PN, else it will add SD. The EU in DLNA.ORG_PN is for SD video with a height of 576 or 288 pix­els. Depend­ing of the com­bi­na­tion of the packet-length and if there is a time­stamp in use or not, the DLNA.ORG_PN will have a _ISO or a _T appended.

It also sup­ports H.264. The DLNA.ORG_PN starts with a AVC in this case. Only trans­port streams (TS  in DLNA.ORG_PN) is sup­ported. As with MPEG2, the packet-length of the TS can be 188 or 192 bytes. Depend­ing of the com­bi­na­tion of the packet-length and if there is a time­stamp in use or not, the DLNA.ORG_PN will have a _ISO or a _T appended. Depend­ing on the pro­file used, minidlna adds some more infos to the DLNA.ORG_PN, BL if it is a baseline-profile, MP if it is a main-profile, and HP if it is a high-profile. I do not see this in the valid video for­mats my TV requested over the wire. As with the MPEG2 for­mat, SD or HD is added (in minidlna) depend­ing on the width and height, but also on the bitrate of the video. For the main-profile the width has to be <= 720, the height <= 576 and the bitrate <= 10M (base 10, not base 2) for SD, and the width has to be <=1920, the height <= 1152 and the bitrate <= 20M (base 10, not base 2) for HD. For the high-profile the width has to be <=1920, the height <=1152, the bitrate <= 30M (base 10, not base 2) and the audio has to be AC3 to get the HD added in DLNA.ORG_PN. The audio is spec­i­fied in DLNA.ORG_PN as MPEG1_L3 for MP3, AC3 for AC3, and AAC or AAC_MULT5 for AAC (stereo or 5-channel). As can be seen below, the TV seems only to sup­port AC3 audio for AVC. The TV also has _24_, _50_ and _60_ in DLNA.ORG_PN. I did not find those things in the minidlna source (but I have not really searched for this). I could imag­ine that _24_ stands for 24 pic­tures per sec­ond, and the _50_ and _60_ for pro­gres­sive videos (with 50 respec­tively 60 pic­tures per sec­ond), but this is pure spec­u­la­tion from my side. Here is the raw sniffed data:

video/mpeg:DLNA.ORG_PN=AVC_TS_HD_24_AC3_ISO;SONY.COM_PN=AVC_TS_HD_24_AC3_ISO
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=AVC_TS_HD_24_AC3;SONY.COM_PN=AVC_TS_HD_24_AC3
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=AVC_TS_HD_24_AC3_T;SONY.COM_PN=AVC_TS_HD_24_AC3_T

video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=MPEG_PS_PAL
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=MPEG_PS_NTSC

video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=MPEG_TS_SD_50_L2_T
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=MPEG_TS_SD_60_L2_T
video/mpeg:DLNA.ORG_PN=MPEG_TS_SD_EU_ISO
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=MPEG_TS_SD_EU
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=MPEG_TS_SD_EU_T
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=MPEG_TS_SD_50_AC3_T
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=MPEG_TS_SD_60_AC3_T
video/mpeg:DLNA.ORG_PN=MPEG_TS_HD_50_L2_ISO;SONY.COM_PN=HD2_50_ISO
video/mpeg:DLNA.ORG_PN=MPEG_TS_HD_60_L2_ISO;SONY.COM_PN=HD2_60_ISO
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=MPEG_TS_HD_50_L2_T;SONY.COM_PN=HD2_50_T
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=MPEG_TS_HD_60_L2_T;SONY.COM_PN=HD2_60_T

video/mpeg:DLNA.ORG_PN=AVC_TS_HD_50_AC3_ISO;SONY.COM_PN=AVC_TS_HD_50_AC3_ISO
video/mpeg:DLNA.ORG_PN=AVC_TS_HD_60_AC3_ISO;SONY.COM_PN=AVC_TS_HD_60_AC3_ISO
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=AVC_TS_HD_50_AC3;SONY.COM_PN=AVC_TS_HD_50_AC3
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=AVC_TS_HD_60_AC3;SONY.COM_PN=AVC_TS_HD_60_AC3
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=AVC_TS_HD_50_AC3_T;SONY.COM_PN=AVC_TS_HD_50_AC3_T
video/vnd.dlna.mpeg-tts:DLNA.ORG_PN=AVC_TS_HD_60_AC3_T;SONY.COM_PN=AVC_TS_HD_60_AC3_T

video/x-mp2t-mphl-188

So far I did not get the time to exper­i­ment with this. I also have the impres­sion that minidlna has still some rough edges (the sin­tel video I used to test before with a dif­fer­ent media server, does not show up in the list with minidlna).

GD Star Rat­ing
load­ing…
GD Star Rat­ing
load­ing…
Share

Tags: , , , , , , , , ,