Strange performance problem with the IBM HTTP Server (modified Apache)

Recently we had a strange performance problem at work. A web application had slow response times from time to time, and users complained. We did not see unusual CPU/mem/swap usage on any of the machines involved. I generated heat maps from performance measurements, and there were no obvious traces of slow behavior. We did not find any reason why the application should be slow for clients, but obviously it was.

Then someone mentioned two recent Apache DoS problems. Number one, the cookie hash issue, did not seem to be the cause: we did not see the huge CPU or memory consumption we would expect from such an attack. The second one, the slow read problem (Apache has no timeout for the maximum duration of a connection, which can be exploited with a small TCP receive window), looked like it could be an issue. The slow read DoS problem can be detected by looking at the server-status page.

What you would see on the server-status page is a lot of worker threads in the ‘W’ (write data) state. This is supposed to be an indication of slow reads. We did see this.

As our site is behind a reverse proxy with some kind of IDS/IPS feature, we took the reverse proxy out of the picture to get a better view of who is doing what (we do not have X-Forwarded-For configured).

At this point we still noticed a lot of connections in the ‘W’ state from the rev-proxy. This was strange; it was not supposed to do this. After restarting the rev-proxy (while the clients went directly to the webservers) we still had those ‘W’ entries in the server-status. This was getting really strange. And to add to this, the duration of the ‘W’ state from the rev-proxy showed that this state had been active for several thousand seconds. Ugh. WTF?

OK, next step: killing the offenders. First I verified in the list of connections in the server-status (extended-status is activated) that all worker threads of a given PID with a rev-proxy connection were in this strange state and that no client request was active. Then I killed this particular PID. I wanted to repeat this until no strange connections were left. Unfortunately I arrived at PIDs which were listed in the server-status (even after a refresh) but did not exist in the OS. That is bad. Very bad.
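The check for PIDs that the server-status still lists but the OS no longer knows can be sketched as follows. This is only a minimal illustration, not what we ran at the time: it assumes a POSIX system, and the function names pid_exists and stale_pids are made up for this example.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        raise ValueError("pid must be positive")
    try:
        # Signal 0 sends nothing; the kernel only performs the error checks.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False   # no such process
    except PermissionError:
        return True    # the process exists but belongs to another user
    return True


def stale_pids(status_pids):
    """Return the PIDs listed in server-status that the OS does not know."""
    return sorted(p for p in set(status_pids) if not pid_exists(p))
```

The list passed to stale_pids would be the PID column copied from the server-status page. Any PID it returns is a scoreboard entry without a process behind it.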

So the next step was to move all clients away from one webserver, and then to reboot this webserver completely to be sure the entire system is in a known good state for future monitoring (the big hammer approach).

As we did not know whether this strange state was due to some kind of mis-administration of the system, we decided to put the rev-proxy in front of the webserver again and to monitor the systems.

We survived about one and a half days. After that, all worker threads on all webservers were in this state. DoS. At this point we were sure there was something malicious going on (some days later our management showed us a mail from a company which had offered security consulting 2 months before, to make sure we do not get hit by a DDoS during the holiday season… a coincidence?).

Next step: verification of missing security patches (unfortunately it is not us who decide which patches we apply to the systems). What we noticed is that the rev-proxy was missing a patch for a DoS problem, and that for the webservers a new fixpack was scheduled to be released in the near future (as of this writing, it is available).

Since we applied the DoS fix for the rev-proxy, we have not had the problem anymore. This is not really conclusive, as we do not know whether the patch fixed the problem or the attacker simply stopped attacking us.

From reading what the DoS patch fixes, we would assume we should have seen some continuous traffic between the rev-proxy and the webserver, but there was nothing when we observed the strange state.

We are still not allowed to apply patches the way we think we should, but at least we have better monitoring in place to watch out for this particular problem: activate the extended status in Apache/IHS, look for lines with state ‘W’ and a long duration (column ‘SS’), and raise an alert if the duration is higher than the maximum possible/expected/desired duration for all possible URLs.
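The monitoring idea can be sketched roughly like this. It is a hedged example, not our production check: it assumes the extended server-status page is an HTML table whose header row contains the columns ‘M’ and ‘SS’ (the exact column set differs between versions), and the names TableRows and stuck_writers are invented for this sketch.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect every table row of an HTML page as a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _end_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()          # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()         # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def stuck_writers(html, max_seconds):
    """Return the rows in state 'W' whose 'SS' value exceeds max_seconds."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    header, hits = None, []
    for row in parser.rows:
        if "M" in row and "SS" in row:
            header = row             # the header row names the columns
            continue
        if header is None or len(row) != len(header):
            continue                 # not a worker row of the status table
        record = dict(zip(header, row))
        if (record["M"] == "W" and record["SS"].isdigit()
                and int(record["SS"]) > max_seconds):
            hits.append(record)
    return hits
```

The html argument would be the text of your own server-status page, for example fetched with urllib.request.urlopen. A non-empty result is the condition to alert on, with max_seconds set to the longest duration you consider legitimate for any URL.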

More drivers available in the FreeBSD-kernel doxygen docs

Yesterday I committed some more configs to generate doxygen documentation of FreeBSD-kernel drivers. I mechanically generated the missing configs for subdirectories of src/sys/dev/. This means there is no dependency information included in the configs, and as such you will not get links to e.g. the PCI documentation if a driver calls functions in the PCI driver (feel free to tell me about such dependencies).

If you want to generate the HTML or PDF version of some subsystem, just go to src/tools/kerneldoc/subsys/ and run “make” to get a list of targets to build. As an example, “make dev_sound” will generate the HTML version for the sound system, and “make pdf-dev_sound” generates the PDF version. The sound system is probably the “nicest” example, as it includes a page with TODO items and even has some real API docs instead of just call graphs and similar automatically generated information.

Some drivers already have (some) doxygen markup (I just did a quick grep for “/*[*!]” to detect doxygen markup indicators; I have no idea about the coverage or quality), namely:

There is more documentation than only for those drivers; I just listed those because they contain at least parts of doxygen documentation.
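The quick grep for doxygen markup indicators mentioned above could be reproduced with a small script like the one below. It is a sketch under assumptions: only .c and .h files are scanned, each first-level subdirectory of the given directory is treated as one driver, and the function name drivers_with_doxygen is hypothetical. Like the grep, it also counts decorative banner comments that start with “/**”, so it says nothing about coverage or quality.

```python
import re
from pathlib import Path

# Matches the comment openers "/**" and "/*!" that doxygen recognizes.
DOXY_MARK = re.compile(r"/\*[*!]")


def drivers_with_doxygen(dev_root):
    """Map each first-level directory name to its number of doxygen markers."""
    root = Path(dev_root)
    counts = {}
    for src in sorted(root.rglob("*")):
        if src.suffix not in (".c", ".h") or not src.is_file():
            continue
        text = src.read_text(errors="replace")
        found = len(DOXY_MARK.findall(text))
        if found:
            # First path component below dev_root, e.g. "sound".
            driver = src.relative_to(root).parts[0]
            counts[driver] = counts.get(driver, 0) + found
    return counts
```

Called as drivers_with_doxygen("src/sys/dev"), it returns a dictionary that contains only the drivers with at least one marker.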

How to fix FreeBSD for corporations (and users with smaller installations), the tools viewpoint

There is a huge discussion going on on hackers@ about how FreeBSD is not suitable for large installations (anymore?). As of this writing, the discussion seems to be forming some clusters. We have some sub-topics which could lead to some good improvements.

One subtopic is release engineering: changes like a more guided approach to what should be merged to which branch, the frequency of releases, and maybe some kind of long-term branch(es). There is some discussion about maybe getting joint funding in some way from interested parties to pay someone to take care of the long-term branch(es).

Another subtopic is the way bugs are handled in our old bugtracking software and how patches go unnoticed there.

And both of them are connected (parts more, parts less) by what can be done in a volunteer project.

To me it looks like the proposals “just” need some refinements and some “volunteers” who put value (this means manpower and/or money) behind what they said.

What I want to discuss here is how tools could help make PRs/patches more visible to developers (there is already the possibility to get emails from the small bugbuster team about patches in the PR database, but you have to ask them for it) and how to make it easier to get patches into FreeBSD.

Making bugs more visible to developers

The obvious first: we need a different bugtracking system. We already know about it. There is (or was…) even someone working, IIRC, on an evaluation of what could be done and how easy/hard it would be. I am not aware of any outcome, despite the fact that it has been months (or even a year) since this was announced. I do not blame anyone here; I would like to get time to finish some FreeBSD volunteer work myself.

In my opinion this needs to be handled in a commercial way. Someone needs to be officially paid (with a deadline) to produce a result. Unfortunately there is the problem that the requirements are written in such a way that people do not have to change their workflows/procedures.

IIRC people ask that they should be able to send a mail to the bugtracker without the need for authentication. Personally I think the bugtracking issue is in a state where we need to change our workflows/procedures. It is convenient to get mails from the bugtracker and only have to reply to the mail to add something. On the other hand, if I report bugs somewhere, and if I really care about the problem resolution, I am willing to log in to whatever interface to get this damn problem solved.

Sending a problem report in an easy way from the system where I have the issue is a very useful feature. Currently we have send-pr for this, and it uses email. This means it requires a working email setup. As a user I do not care if the tool uses email or HTTP or HTTPS, I just want to have an easy way to submit the problem. I would not mind if I first have to do a “send-problem register me@tld” (once), a “send-problem login me@tld” (once per system+user I want to send from) and then maybe a “send-problem template write_template_here.txt” (to get some template text to fill out), edit the template file and then run “send-problem send my_report.txt file1 file2 …”. That would be a different workflow, but still easy.
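To make the proposed workflow a bit more concrete, here is a sketch of how the command line of such a tool could be laid out. The send-problem tool does not exist; the subcommands are the ones named above, and everything else (the function name build_parser, the argument names, the help texts) is an assumption made for this illustration. The sketch only parses the command line and does not talk to any bugtracker.

```python
import argparse


def build_parser():
    """Build the command line parser for the hypothetical send-problem tool."""
    parser = argparse.ArgumentParser(prog="send-problem")
    sub = parser.add_subparsers(dest="command", required=True)

    register = sub.add_parser("register", help="create an account (once)")
    register.add_argument("email")

    login = sub.add_parser("login", help="authorize this system+user (once)")
    login.add_argument("email")

    template = sub.add_parser("template", help="write a template to fill out")
    template.add_argument("path")

    send = sub.add_parser("send", help="submit a filled-out report")
    send.add_argument("report")
    send.add_argument("attachments", nargs="*", help="additional files")
    return parser
```

With this layout, build_parser().parse_args(["send", "my_report.txt", "file1", "file2"]) yields the report file and the list of attachments. Whether the transport behind it is email, HTTP or HTTPS stays an implementation detail the user never sees.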

Email notifications are surely needed, but if I really care about a problem, I can be bothered to register first. So in my opinion, we need a different bugtracker desperately enough that we need to drop our requirements regarding our current workflows/procedures (even if it means we do not get a command line way of submitting bugs at all). The primary goal of the software needs to be to make it easy to track and resolve bugs. The submission of bugs should not be hard either. If I look at the state of the world as it is ATM, I would say a web interface with authentication is not a big burden if I really want to get my problem fixed. Some command line tool would be nice to have, but given the current state of our bugtracker it needs to be optional instead of a hard requirement.

Apart from making it easy to track and resolve problems, the software also needs to be able to make us aware of the biggest problems. Now… you may ask what a big problem is. Well… IMO it does not matter to you what I think is big or small here. The person with a problem needs to decide what is a big problem to him. And people with the same problem need to be able to tell that it is also a big problem for them. So a feature which allows users to “vote” or “+1” or “AOL” (or however you want to call it) would let users with problems voice their opinion on the relevance of the problem to our userbase. This also means there needs to be a way to see the highest voted problems. An automatic mail would be best, but as above this is optional. If I as a developer really care about this, I can be bothered to log in to a web interface (or maybe someone volunteers to copy & paste and send a mail… we need to be willing to rethink our procedures).
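The voting idea needs very little machinery. The following sketch assumes one vote per user and problem and a simple “most votes first” ranking. The class name VoteBoard and the problem identifiers in the usage note are made up, and a real bugtracker would of course store this in its database.

```python
from collections import defaultdict


class VoteBoard:
    """Keep track of which users voted for which problem."""

    def __init__(self):
        self._voters = defaultdict(set)   # problem id -> set of user ids

    def vote(self, problem_id, user):
        """Record one vote per user and problem; repeated votes are ignored."""
        self._voters[problem_id].add(user)

    def top(self, n=10):
        """Return (problem id, votes) pairs, most votes first, ties by id."""
        ranked = sorted(self._voters.items(),
                        key=lambda item: (-len(item[1]), item[0]))
        return [(problem, len(users)) for problem, users in ranked[:n]]
```

After board.vote("kern/1", "alice") and board.vote("kern/1", "bob"), the call board.top(1) returns [("kern/1", 2)]. The output of top() is what an automatic mail or a web page of the highest voted problems would show.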

Getting patches more easily into a FreeBSD branch

It looks to me that this topic requires a little bit more involvement from multiple tools. In my opinion we need to switch to a distributed version control system. One which allows me to easily create my own branch of FreeBSD on my own hardware, and which lets other users use my branch easily (if I want to allow others to branch from my branch). It also needs to let me push my changes towards FreeBSD. Obviously not directly into the official sources, but into some kind of staging area. Other people should be able to have a look at this staging area and review what I did. They need to be able to make comments for others to see, or give some kind of (multi-dimensional?) rating for the patch (code quality / works for me / does not work / …). Based upon the review/rating and maybe some automated evaluation (compile test / regression test / benchmark run) a committer could push the patch into the official FreeBSD tree (ideal would be some automated notification system, a push-button solution for integration and so on, but as above we should not be afraid if we do not get all the bells and whistles).

If we had something like this in place, some kind of long-term-release branch could be maintained more easily in a collaborative manner. Companies which use the same long-term-release branch could submit their backports of fixes/features this way. They could also see if changes from similar branches (there could be related but different branches, like 9.4-security-fixes-only <= 9.4-official-errata-only <= 9.4-bugfixes <= 9.4-bugfixes-and-driverupdates <= …) could be merged to their in-house branch (and maybe consequently pushed back to the official branch they branched from, if the patch comes from a different branch).

It does not matter here if we would create a fixed set of branches for each release, or if we only create some special-purpose branches based upon the phase of the moon (ideally we would create a lot of branches for every release, companies/users could cherry-pick/submit what they want, and the status of a long-term branch would be solely based upon the inflow of patches and not on what the security team or release manager or a random developer thinks it should be… but the reality will probably be somewhere in the middle).

I do not know if tools exist to make all this happen, or which tools could be put together to make it happen. I also deliberately did not mention tools I am aware of which already provide (small) parts of this. These are just some ideas to think about. Interested parties are invited to join the discussion on hackers@ (which is far away from discussing specific tools or features), but you are also free to add some comments here.