Alexander Leidinger

Just another weblog

Jan
24

Strange performance problem with the IBM HTTP Server (modified apache)

Recently we had a strange performance problem at work. A web application was having slow response times from time to time and users complained. We did not see unusual CPU/mem/swap usage on any involved machine. I generated heat-maps from performance measurements and there were no obvious traces of slow behavior. We did not find any reason why the application should be slow for clients, but obviously it was.

Then someone mentioned two recent apache DoS problems. Number one, the cookie hash issue, did not seem to be the cause: we did not see the huge CPU or memory consumption we would expect to see with such an attack. The second one, the slow-read problem (no maximum connection duration timeout in apache; it can be exploited with a small TCP receive window), looked like it could be an issue. The slow-read DoS problem can be detected by looking at the server-status page.

What you would see on the server-status page is a lot of worker threads in the 'W' (write data) state. This is supposed to be an indication of slow reads. We did see this.
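
Such a check can be scripted against mod_status's machine-readable output (http://host/server-status?auto), where the Scoreboard line shows one character per worker slot and 'W' means "sending reply". A minimal sketch in Python; the sample data is made up, and in practice you would fetch the page from your server (ExtendedStatus must be enabled):

```python
# Count apache/IHS workers in the 'W' (sending reply) state from the
# mod_status machine-readable output (/server-status?auto).
# A persistently high count can indicate a slow-read attack.

def count_writing_workers(status_text):
    """Return the number of 'W' slots in the mod_status Scoreboard line."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            return line.split(":", 1)[1].strip().count("W")
    return 0

# Invented sample of a ?auto response, for illustration only.
sample = "Total Accesses: 42\nScoreboard: WW__W...KR_W\n"
print(count_writing_workers(sample))
```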

As our site is behind a reverse proxy with some kind of IDS/IPS feature, we took the reverse proxy out of the picture to get a better view of who is doing what (we do not have X-Forwarded-For configured).

At this point we noticed that there were still a lot of connections in the 'W' state from the rev-proxy. This was strange; it was not supposed to do this. After restarting the rev-proxy (while the clients went directly to the webservers) those 'W' entries were still in the server-status. This was getting really strange. And to add to this, the duration of the 'W' state from the rev-proxy showed that this state had been active for several thousand seconds. Ugh. WTF?

Ok, next step: killing the offenders. First I verified in the list of connections in the server-status (extended-status is activated) that all worker threads of a given PID with a rev-proxy connection were in this strange state and no client request was active. Then I killed this particular PID. I wanted to repeat this until no such strange connections were left. Unfortunately I arrived at PIDs which were listed in the server-status (even after a refresh), but which no longer existed in the OS. That is bad. Very bad.
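
The manual check described above can be sketched as follows, assuming the extended server-status has already been parsed into per-worker records; the proxy address and sample rows are placeholders. A PID only qualifies for killing when every one of its workers shows the strange 'W' state towards the rev-proxy and none serves a real client request:

```python
# Sketch: select PIDs whose workers are ALL stuck in 'W' towards the
# reverse proxy, so no in-flight client request gets killed by accident.

REV_PROXY = "10.0.0.1"  # placeholder address of the reverse proxy

def kill_candidates(workers, rev_proxy=REV_PROXY):
    """workers: iterable of dicts with 'pid', 'mode' and 'client' fields."""
    candidates = []
    for pid in sorted({w["pid"] for w in workers}):
        mine = [w for w in workers if w["pid"] == pid]
        if all(w["mode"] == "W" and w["client"] == rev_proxy for w in mine):
            candidates.append(pid)
    return candidates

# Invented example rows, for illustration only.
workers = [
    {"pid": 100, "mode": "W", "client": "10.0.0.1"},
    {"pid": 100, "mode": "W", "client": "10.0.0.1"},
    {"pid": 200, "mode": "W", "client": "10.0.0.1"},
    {"pid": 200, "mode": "R", "client": "192.0.2.7"},  # real client: spare it
]
print(kill_candidates(workers))  # only PID 100 qualifies
```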

So the next step was to move all clients away from one webserver, and then to reboot this webserver completely, to be sure the entire system was in a known-good state for future monitoring (the big-hammer approach).

As we did not know whether this strange state was due to some kind of mis-administration of the system or not, we decided to put the rev-proxy in front of the webserver again and to monitor the systems.

We survived about one and a half days. After that, all worker threads on all webservers were in this state. DoS. At this point we were sure there was something malicious going on (some days later our management showed us a mail from a company which had offered security consulting two months earlier, to make sure we would not get hit by a DDoS during the holiday season… a coincidence?).

Next step: verification of missing security patches (unfortunately it is not us who decide which patches we apply to the systems). What we noticed is that the rev-proxy was missing a patch for a DoS problem, and that for the webservers a new fixpack was scheduled to be released in the near future (as of this writing, it is available).

Since we applied the DoS fix to the rev-proxy, we have not had the problem anymore. This is not really conclusive, as we do not know whether the fix solved the problem or the attacker simply stopped attacking us.

From reading what the DoS patch fixes, we would have expected to see some continuous traffic between the rev-proxy and the webserver, but there was nothing when we observed the strange state.

We are still not allowed to apply patches as we think we should, but at least we have better monitoring in place to watch for this particular problem: activate the extended status in apache/IHS, look for lines with state 'W' and a long duration (column 'SS'), and raise an alert if the duration is higher than the maximum possible/expected/desired duration for all possible URLs.
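
The alerting rule above can be sketched like this; the threshold and the row data are invented for illustration, and the real mode ('M') and duration ('SS') values have to come from parsing the extended server-status page:

```python
# Sketch of the monitoring rule: flag workers stuck in 'W' longer than
# the maximum plausible request duration for any URL on the site.

MAX_EXPECTED_SECONDS = 300  # assumption: no legitimate request runs longer

def stuck_writers(workers, threshold=MAX_EXPECTED_SECONDS):
    """workers: iterable of dicts with 'pid', 'mode' (M) and 'ss' (SS)."""
    return [w for w in workers if w["mode"] == "W" and w["ss"] > threshold]

# Invented example rows, for illustration only.
rows = [
    {"pid": 1234, "mode": "W", "ss": 5},     # normal write in progress
    {"pid": 1234, "mode": "_", "ss": 12},    # idle worker
    {"pid": 5678, "mode": "W", "ss": 4200},  # stuck for over an hour: alert
]
for w in stuck_writers(rows):
    print("ALERT: PID %d in 'W' for %d s" % (w["pid"], w["ss"]))
```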

Jan
24

More drivers available in the FreeBSD-kernel doxygen docs

Yesterday I committed some more configs to generate doxygen documentation of FreeBSD-kernel drivers. I mechanically generated missing configs for subdirectories of src/sys/dev/. This means there is no dependency information included in the configs, and as such you will not get links e.g. to the PCI documentation if a driver calls functions in the PCI driver (feel free to tell me about such dependencies).

If you want to generate the HTML or PDF version of some subsystem, just go to src/tools/kerneldoc/subsys/ and run "make" to get a list of targets to build. As an example, "make dev_sound" will generate the HTML version for the sound system, and "make pdf-dev_sound" generates the PDF version. The sound system is probably the nicest example, as it includes a page with TODO items and even has some real API docs instead of just the call-graphs and other automatically generated information.

Some drivers already have (some) doxygen markup (I did just a quick grep for '/*[*!]' to detect doxygen markup indicators, no idea about the coverage or quality), namely:

There is more documentation than only for those drivers; I just listed those because they contain at least parts of doxygen documentation.

Jan
18

How to fix FreeBSD for corporations (and users with smaller installations), the tools viewpoint

There is a huge discussion going on on hackers@ about how FreeBSD is not suitable for large installations (anymore?). As of this writing, the discussion seems to be forming some discussion clusters. We have some sub-topics which could lead to some good improvements.

One subtopic is release engineering: changes like a more guided approach to what should be merged to which branch, the frequency of releases, and maybe some kind of long-term branch(es). There is some discussion about getting some joint funding from interested parties to pay someone to take care of the long-term branch(es).

Another subtopic is the way bugs are handled in our old bugtracking software and how patches go unnoticed there.

And both of them are connected (some parts more, some less) by the question of what can be done in a volunteer project.

To me it looks like the proposals "just" need some refinements and some "volunteers" to put value (this means manpower and/or money) behind what they said.

What I want to discuss here is how tools could help make PRs/patches more visible to developers (there is already the possibility to get emails from the small bugbuster team about patches in the PR database, but you have to ask them to get them) and how to make it easier to get patches into FreeBSD.

Making bugs more visible to developers

The obvious first: we need a different bugtracking system. We already know about it. There is (or was…) even someone working, IIRC, on an evaluation of what could be done and how easy/hard it would be. I am not aware of any outcome, despite the fact that it has been months (or even a year) since this was announced. I do not blame anyone here; I would like to get time to finish some FreeBSD volunteer work myself.

In my opinion this needs to be handled in a commercial way: someone needs to be officially paid (with a deadline) to produce a result. Unfortunately there is the problem that the requirements are formulated so that people do not have to change their workflows/procedures.

IIRC people ask that they should be able to send a mail to the bugtracker without the need for authentication. Personally, I think the bugtracking issue is in a state where we need to change our workflows/procedures. It is convenient to get mails from the bugtracker and only have to reply to a mail to add something. On the other hand, if I report bugs somewhere, and if I really care about the problem resolution, I am willing to log in to whatever interface to get this damn problem solved.

Sending a problem report from the system where I have the issue, in an easy way, is a very useful feature. Currently we have send-pr for this, and it uses email. This means it requires a working email setup. As a user I do not care whether the tool uses email or HTTP or HTTPS, I just want an easy way to submit the problem. I would not mind if I first had to do a "send-problem register me@tld" (once), "send-problem login me@tld" (once per system+user I want to send from) and then maybe a "send-problem template write_template_here.txt" (to get some template text to fill out), edit the template file and then run "send-problem send my_report.txt file1 file2 …". That would be a different workflow, but still easy.
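
To be clear, no such send-problem tool exists; the following Python sketch only illustrates how the proposed register/login/template/send workflow could map onto subcommands of a small client (all command names, messages, and the template format are invented):

```python
# Hypothetical sketch of a 'send-problem' client: one subcommand per step
# of the proposed workflow. A real tool would talk HTTPS to the tracker;
# here each step just returns a status string.

TEMPLATE = "Synopsis:\nCategory:\nDescription:\n"

def handle(cmd, args):
    if cmd == "register":   # one-time: create an account on the tracker
        return "registration requested for %s" % args[0]
    if cmd == "login":      # once per system+user: obtain a session token
        return "logged in as %s" % args[0]
    if cmd == "template":   # write an empty report template to fill out
        with open(args[0], "w") as f:
            f.write(TEMPLATE)
        return "template written to %s" % args[0]
    if cmd == "send":       # upload the filled-out report plus attachments
        return "sent %s (attachments: %s)" % (
            args[0], ", ".join(args[1:]) or "none")
    return "unknown command: %s" % cmd

print(handle("register", ["me@tld"]))
print(handle("send", ["my_report.txt", "file1", "file2"]))
```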

Email notifications are surely needed, but if I really care about a problem, I can be bothered to register first. So in my opinion we need a different bugtracker desperately enough that we should drop the requirements tied to our current workflow/procedures (even if it means we cannot get a command-line way of submitting bugs at all). The primary goal of the software needs to be to make it easy to track and resolve bugs. The submission of bugs should not be hard either. Looking at the state of the world ATM, a webinterface with authentication is not a big burden to accept if I really want to get my problem fixed. A command-line tool would be nice to have, but given the current state of our bugtracker it needs to be optional instead of a hard requirement.

Apart from making it easy to track and resolve problems, the software also needs to be able to make us aware of the biggest problems. Now… you may ask what a big problem is. Well… IMO it does not matter what I think is big or small here. The person with a problem needs to decide what is a big problem to him. And people with the same problem need to be able to tell that it is also a big problem for them. So a feature which allows users to "vote" or "+1" or "AOL" (or however you want to call it) would let users with problems voice their opinion on the relevance of the problem to our userbase. This also means there needs to be a way to see the highest-voted problems. An automatic mail would be best, but as above, this is optional. If I as a developer really care about this, I can be bothered to log in to a webinterface (or maybe someone volunteers to copy & paste and send a mail… we need to be willing to rethink our procedures).
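
The voting idea boils down to a tally per problem report plus a way to list the highest-voted entries; a minimal sketch (the PR identifiers and vote counts are made up):

```python
# Sketch of the "+1" feature: one entry per user vote, tallied per PR,
# with the top-N list being what a developer (or a periodic mail job)
# would look at.
from collections import Counter

def top_problems(votes, n=3):
    """votes: iterable of PR identifiers, one entry per user vote."""
    return Counter(votes).most_common(n)

votes = ["pr/1001", "pr/1002", "pr/1001", "pr/1003", "pr/1001", "pr/1002"]
print(top_problems(votes))  # pr/1001 leads with three votes
```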

Getting patches into a FreeBSD branch more easily

It looks to me like this topic requires a little bit more involvement from multiple tools. In my opinion we need to switch to a distributed version control system: one which allows me to easily create my own branch of FreeBSD on my own hardware, and which allows other users to use my branch easily (if I want to allow others to branch from my branch). It also needs to let me push my changes towards FreeBSD. Obviously not directly into the official sources, but into some kind of staging area. Other people should be able to have a look at this staging area and review what I did. They need to be able to make comments for others to see, or give some kind of (multi-dimensional?) rating for the patch (code quality / works for me / does not work / …). Based upon the review/rating and maybe some automated evaluation (compile test / regression test / benchmark run), a committer could push the patch into the official FreeBSD tree (ideal would be some automated notification system, a push-button solution for integration and so on, but as above we should not be afraid if we do not get all the bells and whistles).
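
The multi-dimensional rating part could work roughly like this; the axes and thresholds are invented for illustration, and a real system would also feed in the automated compile/regression results:

```python
# Sketch: a staged patch only surfaces for a committer once every rating
# axis clears a minimal bar. Axes and bars are placeholders.

THRESHOLDS = {"code_quality": 3, "works_for_me": 2}  # minimum scores

def ready_for_committer(ratings, thresholds=THRESHOLDS):
    """ratings: dict mapping rating axis -> aggregated score for a patch."""
    return all(ratings.get(axis, 0) >= bar for axis, bar in thresholds.items())

print(ready_for_committer({"code_quality": 4, "works_for_me": 2}))  # True
print(ready_for_committer({"code_quality": 4}))                     # False
```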

If we had something like this in place, creating some kind of long-term-release branch could be handled more easily in a collaborative manner. Companies which use the same long-term-release branch could submit their backports of fixes/features this way. They could also see if similar branches (there could be related but different branches, like 9.4-security-fixes-only <= 9.4-official-errata-only <= 9.4-bugfixes <= 9.4-bugfixes-and-driverupdates <= …) could be merged into their in-house branch (and maybe consequently pushed back to the official branch they branched from, if the patch comes from a different branch).

It does not matter here whether we create a fixed set of branches for each release, or only some special-purpose branches based upon the phase of the moon (ideally we would create a lot of branches for every release, companies/users could cherry-pick/submit what they want, and the status of a long-term branch would be based solely on the inflow of patches and not on what the security team, release manager, or a random developer thinks it should be… but reality will probably be somewhere in the middle).

I do not know if tools exist to make all this happen, or which tools could be put together to make it happen. I also deliberately did not mention tools I am aware of which already provide (small) parts of this. These are just some ideas to think about. Interested parties are invited to join the discussion on hackers@ (which is far from discussing specific tools or features), but you are also free to add some comments here.
