Strange performance problem with the IBM HTTP Server (modified Apache)

Recently we had a strange performance problem at work. A web application had slow response times from time to time and users complained. We did not see unusual CPU/mem/swap usage on any involved machine. I generated heat-maps from performance measurements and there were no obvious traces of slow behavior. We did not find any reason why the application should be slow for clients, but obviously it was.

Then someone mentioned two recent Apache DoS problems. Number one (the cookie hash issue) did not seem to be the cause: we did not see the huge CPU or memory consumption we would expect with such an attack. The second one, the slow read problem (Apache has no maximum connection duration timeout, which can be exploited with a small TCP receive window), looked like it could be an issue. The slow read DoS problem can be detected by looking at the server-status page.

What you would see on the server-status page is a lot of worker threads in the ‘W’ (write data) state. This is supposed to be an indication of slow reads. We did see this.

As our site is behind a reverse proxy with some kind of IDS/IPS feature, we took the reverse proxy out of the picture to get a better view of who is doing what (we do not have X-Forwarded-For configured).

At this point we still noticed a lot of connections in the ‘W’ state from the rev-proxy. This was strange, it was not supposed to do this. After restarting the rev-proxy (while the clients went directly to the webservers) we still had those ‘W’ entries in the server-status. This was getting really strange. And to add to this, the duration of the ‘W’ state from the rev-proxy showed that this state had been active for several thousand seconds. Ugh. WTF?

Ok, next step: killing the offenders. First I verified in the list of connections in the server-status (extended-status is activated) that all worker threads with the rev-proxy connection of a given PID were in this strange state and no client request was active. Then I killed this particular PID. I wanted to do this until I did not have those strange connections anymore. Unfortunately I arrived at PIDs which were listed in the server-status (even after a refresh), but did not exist in the OS. That is bad. Very bad.
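The "listed in server-status but not in the OS" check can be sketched in a few lines. This is only an illustration, assuming a Unix-like system and that the PIDs were already copied out of the status page by hand or by a script; the function names `pid_exists` and `ghost_pids` are made up for this example.

```python
import os


def pid_exists(pid):
    """Return True if the OS currently knows a process with this PID."""
    try:
        # Signal 0 sends nothing; the kernel only performs the
        # existence and permission checks.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists, it just belongs to another user.
        return True
    return True


def ghost_pids(status_pids):
    """PIDs taken from server-status (assumed input) that the OS does not know."""
    return sorted(pid for pid in set(status_pids) if not pid_exists(pid))
```

Any PID returned by `ghost_pids` is one the webserver still reports although the process is gone, which is the state described above.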

So the next step was to move all clients away from one webserver, and then to reboot this webserver completely to be sure the entire system is in a known good state for future monitoring (the big hammer approach).

As we did not know if this strange state was due to some kind of mis-administration of the system or not, we decided to put the rev-proxy in front of the webserver again and to monitor the systems.

We survived about one and a half days. After that all worker threads on all webservers were in this state. DoS. At this point we were sure there was something malicious going on (some days later our management showed us a mail from a company which had offered security consulting 2 months before to make sure we do not get hit by a DDoS during the holiday season… a coincidence?).

Next step: verification of missing security patches (unfortunately it is not us who decide which patches we apply to the systems). What we noticed is that the rev-proxy was missing a patch for a DoS problem, and for the webservers a new fixpack was scheduled to be released in the near future (as of this writing: it is available now).

Since we applied the DoS fix for the rev-proxy, we have not had the problem anymore. This is not really conclusive, as we do not know if this fixed the problem or if the attacker stopped attacking us.

From reading what the DoS patch fixes, we would assume we should see some continuous traffic going on between the rev-proxy and the webserver, but there was nothing when we observed the strange state.

We are still not allowed to apply patches as we think we should, but at least we have better monitoring in place to watch out for this particular problem (activate the extended status in Apache/IHS, look for lines with state ‘W’ and a long duration (column ‘SS’), raise an alert if the duration is higher than the max. possible/expected/desired duration for all possible URLs).
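A minimal sketch of such a check, not the monitoring we actually run: it assumes the extended server-status page is available as an HTML string and that the worker table has header cells named ‘M’ and ‘SS’ (the exact set of columns differs between Apache/IHS versions, so treat the layout as an assumption). The names `TableRows` and `stuck_writers` are invented for this example.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def stuck_writers(status_html, max_seconds):
    """Return the rows in state 'W' whose 'SS' value exceeds max_seconds."""
    parser = TableRows()
    parser.feed(status_html)
    header = None
    hits = []
    for row in parser.rows:
        if "M" in row and "SS" in row:
            header = row  # the worker table starts here
            continue
        if header is None or len(row) != len(header):
            continue  # rows of other tables or of unexpected shape
        rec = dict(zip(header, row))
        if rec["M"] == "W" and rec["SS"].isdigit() and int(rec["SS"]) > max_seconds:
            hits.append(rec)
    return hits
```

The threshold `max_seconds` would be the longest duration any legitimate URL is expected to take; anything returned by `stuck_writers` is a candidate for an alert.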

More drivers available in the FreeBSD-kernel doxygen docs

Yesterday I committed some more configs to generate doxygen documentation of FreeBSD-kernel drivers. I mechanically generated missing configs for subdirectories of src/sys/dev/. This means there is no dependency information included in the configs, and as such you will not get links e.g. to the PCI documentation, if a driver calls functions in the PCI driver (feel free to tell me about such dependencies).

If you want to generate the HTML or PDF version of some subsystem, just go to src/tools/kerneldoc/subsys/ and run “make” to get a list of targets to build. As an example, “make dev_sound” will generate the HTML version for the sound system, “make pdf-dev_sound” generates the PDF version. The sound system is probably the nicest example, as it includes a page with TODO items, and even has some real API docs instead of just the call-graphs and similar automatically generated information.

Some drivers already have (some) doxygen markup (I just did a quick grep for ‘/*[*!]’ to detect doxygen markup indicators, no idea about the coverage or quality), namely:

There is documentation for more than just those drivers, I only listed those as they have at least some doxygen documentation inside.
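The quick grep can be approximated with a short script. This is a sketch, not the command I ran: it assumes the directory passed in is laid out like src/sys/dev/ (one subdirectory per driver) and only looks at .c and .h files; the function name `drivers_with_doxygen` is made up for this example.

```python
import os
import re

# Same pattern as the grep: a comment opener followed by '*' or '!'.
# It shares the grep's false positives (e.g. decorative comment banners).
DOXY_MARK = re.compile(r"/\*[*!]")


def drivers_with_doxygen(dev_root):
    """Return the top-level subdirectories that contain a doxygen marker."""
    found = set()
    for dirpath, _dirs, files in os.walk(dev_root):
        for name in files:
            if not name.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                if DOXY_MARK.search(fh.read()):
                    rel = os.path.relpath(dirpath, dev_root)
                    # Files directly in dev_root show up as "."
                    found.add(rel.split(os.sep)[0])
    return sorted(found)
```

As with the grep, a hit says nothing about coverage or quality, only that at least one marker exists somewhere in that driver.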

How to fix FreeBSD for corporations (and users with smaller installations), the tools-viewpoint

There is a huge discussion going on on hackers@ about how FreeBSD is not suitable for large installations (anymore?). As of this writing, the discussion seems to be forming some discussion-clusters. We have some sub-topics which could lead to some good improvements.

One subtopic is release engineering: changes like a more guided approach to what should be merged to which branch, the frequency of releases and maybe some kind of long-term-branch(es). There is some discussion about getting some kind of joint funding from interested parties to pay someone to take care of long-term-branch(es).

Another subtopic is the way bugs are handled in our old bugtracking software and how patches go unnoticed there.

And both of them are connected (parts more, parts less) by what can be done in a volunteer project.

To me it looks like the proposals “just” need some refinements and some “volunteers” to put value (this means manpower and/or money) behind what they said.

What I want to discuss here is how tools could help with making PRs/patches more visible to developers (there is already the possibility to get emails from the small bugbuster-team about patches in the PR database, but you have to ask them to get them) and how to make it easier to get patches into FreeBSD.

Making bugs more visible to developers

The obvious first: We need a different bugtracking system. We already know about it. There is (or was…) even someone working IIRC on an evaluation of what could be done and how easy/hard it would be. I am not aware of any outcome, despite the fact that it has been months (or even a year) since this was announced. I do not blame anyone here, I would like to get time to finish some FreeBSD volunteer work myself.

In my opinion this needs to be handled in a commercial way. Someone needs to be officially paid (with a deadline) to produce a result. Unfortunately there is the problem that the requirements are written in a way that people do not have to change their workflows/procedures.

IIRC people ask that they should be able to send a mail to the bugtracker without the need for authentication. Personally I think the bugtracking issue is in a state where we need to change our workflows/procedures. It is convenient to get mails from the bugtracker and only have to reply to the mail to add something. On the other hand, if I report bugs somewhere, and if I really care about the problem resolution, I am willing to log in to whatever interface to get this damn problem solved.

Sending a problem report from the system where I have the issue in an easy way is a very useful feature. Currently we have send-pr for this and it uses email. This means it requires a working email setup. As a user I do not care if the tool uses email or HTTP or HTTPS, I just want to have an easy way to submit the problem. I would not mind if I first have to do a “send-problem register me@tld” (once), “send-problem login me@tld” (once per system+user I want to send from) and then maybe a “send-problem template write_template_here.txt” (to get some template text to fill out), edit the template file and then run “send-problem send my_report.txt file1 file2 …”. That would be a different workflow, but still easy.
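To make the proposed workflow a bit more concrete, here is a sketch of how the command line of such a tool could be laid out. “send-problem” does not exist; the subcommands are the ones from the paragraph above, and the argument names (`email`, `path`, `report`, `attachments`) are assumptions for illustration only. The sketch only parses the command line, it does not talk to any bugtracker.

```python
import argparse


def build_parser():
    """Command line layout of the hypothetical send-problem tool."""
    parser = argparse.ArgumentParser(prog="send-problem")
    sub = parser.add_subparsers(dest="command", required=True)

    register = sub.add_parser("register", help="create an account (once)")
    register.add_argument("email")

    login = sub.add_parser("login", help="authorize this system+user (once)")
    login.add_argument("email")

    template = sub.add_parser("template", help="write a template to fill out")
    template.add_argument("path")

    send = sub.add_parser("send", help="submit a filled-out report")
    send.add_argument("report")
    send.add_argument("attachments", nargs="*")  # optional extra files
    return parser
```

Whether the transport behind “send” is email, HTTP or HTTPS is deliberately not visible at this level, which is the point: the user only sees four simple commands.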

Email notifications are surely needed, but if I really care about a problem, I can be bothered to register first. So in my opinion, we need a different bugtracker desperately enough that we need to drop our requirements regarding our current workflow/procedures (even if it means we can not get a command line way of submitting bugs at all). The primary goal of the software needs to be to make it easy to track and resolve bugs. The submission of bugs should not be hard either. If I look at the state of the world as it is ATM, I would say a webinterface with authentication is not a big burden to take on if I really want to get my problem fixed. Some command line tool would be nice to have, but given the current state of our bugtracker it needs to be optional instead of a hard requirement.

Apart from making it easy to track and resolve problems, the software also needs to be able to make us aware of the biggest problems. Now… you may ask what is a big problem. Well… IMO it does not matter to you what I think is big or small here. The person with a problem needs to decide what is a big problem to him. And people with the same problem need to be able to tell that it is also a big problem for them. So a feature which allows users to “vote” or “+1” or “AOL” (or however you want to call it) would let users with problems voice their opinion on the relevance of the problem to our userbase. This also means there needs to be a way to see the highest voted problems. An automatic mail would be best, but as above this is optional. If I as a developer really care about this, I can be bothered to log in to a webinterface (or maybe someone volunteers to make a copy & paste and send a mail… we need to be willing to rethink our procedures).
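The voting idea needs very little machinery. A minimal sketch, assuming each problem report has some identifier and each voter some user name (the class `VoteBoard` and the identifiers used with it are hypothetical, this is not how any existing bugtracker stores votes):

```python
from collections import defaultdict


class VoteBoard:
    """Track which users marked which problem report as important to them."""

    def __init__(self):
        self._voters = defaultdict(set)  # report id -> set of user names

    def vote(self, report_id, user):
        # A set is used so that repeated votes by one user count once.
        self._voters[report_id].add(user)

    def top(self, n=10):
        """Return up to n (report id, vote count) pairs, highest count first."""
        ranked = sorted(
            self._voters.items(),
            key=lambda item: (-len(item[1]), item[0]),  # ties sorted by id
        )
        return [(report_id, len(users)) for report_id, users in ranked[:n]]
```

The result of `top()` is what an automatic mail, or a volunteer doing copy & paste, would send to the developers.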

Getting patches more easily into a FreeBSD branch

It looks to me that this topic requires a little bit more involvement from multiple tools. In my opinion we need to switch to a distributed version control system. One which allows me to easily create my own branch of FreeBSD on my own hardware, and which lets other users use my branch easily (if I want to allow others to branch from my branch). It also needs to let me push my changes towards FreeBSD. Obviously not directly into the official sources, but into some kind of staging area. Other people should be able to have a look at this staging area and review what I did. They need to be able to make comments for others to see, or give some kind of (multi-dimensional?) rating for the patch (code quality / works for me / does not work / …). Based upon the review/rating and maybe some automated evaluation (compile test / regression test / benchmark run) a committer could push the patch into the official FreeBSD tree (ideal would be some automated notification system, a push button solution for integration and so on, but as above we should not be afraid if we do not get all the bells and whistles).

If we had something like this in place, some kind of long-term-release branch could be created and used more easily in a collaborative manner. Companies which use the same long-term-release branch could submit their backports of fixes/features this way. They could also see if patches from similar branches (there could be related but different branches, like 9.4-security-fixes-only <= 9.4-official-errata-only <= 9.4-bugfixes <= 9.4-bugfixes-and-driverupdates <= …) could be merged to their in-house branch (and maybe consequently pushed back to the official branch they branched from, if the patch comes from a different branch).

It does not matter here if we would create a fixed set of branches for each release, or if we only create some special-purpose branches based upon the phase of the moon (ideally we would create a lot of branches for every release, companies/users can cherry pick/submit what they want, and the status of a long-term-branch is solely based upon the inflow of patches and not on what the security team or release manager or a random developer thinks it should be… but the reality will probably be somewhere in the middle).

I do not know if tools exist to make all this happen, or which tools could be put together to make it happen. I also deliberately did not mention tools I am aware of which already provide (small) parts of this. These are just some ideas to think about. Interested parties are invited to join the discussion on hackers@ (which is far away from discussing specific tools or features), but you are also free to add some comments here.