Recently we had a strange performance problem at work. A web application had slow response times from time to time, and users complained. We did not see unusual CPU, memory or swap usage on any of the machines involved. I generated heat-maps from performance measurements and there were no obvious traces of slow behavior. We did not find any reason why the application should be slow for clients, but obviously it was.
Then someone mentioned two recent Apache DoS problems. The first one, the cookie hash issue, did not seem to be the cause: we did not see the huge CPU or memory consumption we would expect from such an attack. The second one, the slow read problem (Apache has no maximum connection duration timeout, which can be exploited by advertising a small TCP receive window), looked like it could be the issue. The slow read DoS can be detected by looking at the server-status page.
What you would see on the server-status page is a lot of worker threads in the ‘W’ (write data) state. This is supposed to be an indication of slow reads. We did see this.
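The "lots of ‘W’ workers" symptom is easy to check automatically. The machine-readable variant of the status page (`server-status?auto`) contains a `Scoreboard:` line with one character per worker slot, where ‘W’ means the worker is sending a reply. A minimal sketch, assuming you have already fetched that text; the function names and the 50% threshold are hypothetical and should be tuned to your own traffic:

```python
from collections import Counter


def scoreboard_counts(status_text: str) -> Counter:
    """Count worker states in the 'Scoreboard:' line of server-status?auto."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            # Each character is one worker slot, e.g. 'W' = sending reply, '_' = waiting
            return Counter(line.split(":", 1)[1].strip())
    raise ValueError("no Scoreboard line found in status output")


def looks_like_slow_reads(status_text: str, max_w_fraction: float = 0.5) -> bool:
    """Flag the server if more than max_w_fraction of the live slots are in 'W'."""
    counts = scoreboard_counts(status_text)
    # '.' marks an open slot with no process behind it; leave those out of the total
    slots = sum(n for state, n in counts.items() if state != ".")
    if slots == 0:
        return False
    return counts["W"] / slots > max_w_fraction


sample = "Total Accesses: 1234\nScoreboard: WWWW_K..\n"
print(looks_like_slow_reads(sample))  # True: 4 of 6 live slots are in 'W'
```

A high ‘W’ ratio alone is only a hint (large downloads also produce ‘W’ workers), which is why the duration of each ‘W’ entry matters as well.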
As our site is behind a reverse proxy with some kind of IDS/IPS feature, we took the reverse proxy out of the picture to get a better view of who is doing what (we do not have X-Forwarded-For configured).
At this point we still noticed a lot of connections in the ‘W’ state from the rev-proxy. This was strange; it was not supposed to do this. After restarting the rev-proxy (while the clients went directly to the webservers), those ‘W’ entries were still in the server-status. This was getting really strange. On top of that, the duration shown for the rev-proxy's ‘W’ entries said they had been in this state for several thousand seconds. Ugh. WTF?
Ok, next step: killing the offenders. First I verified in the list of connections in the server-status (extended status is activated) that all worker threads of a given PID which had a rev-proxy connection were in this strange state and that no client request was active. Then I killed this particular PID. I planned to repeat this until no such connections were left. Unfortunately I arrived at PIDs which were listed in the server-status (even after a refresh) but did not exist in the OS. That is bad. Very bad.
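The manual check before each kill can be expressed as a small filter. A sketch, assuming the extended status table has already been parsed into rows with the PID, ‘M’ (mode) and Client columns; `WorkerRow`, `kill_candidates` and the example addresses are hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WorkerRow:
    pid: int      # 'PID' column
    mode: str     # 'M' column: scoreboard letter such as 'W', '_', 'K'
    client: str   # 'Client' column


IDLE_MODES = {"_", "."}  # waiting for a connection / open slot


def kill_candidates(rows: list[WorkerRow], proxy_ip: str) -> list[int]:
    """Return PIDs whose proxy workers are all in 'W' and that serve no other client."""
    by_pid: dict[int, list[WorkerRow]] = defaultdict(list)
    for row in rows:
        by_pid[row.pid].append(row)

    candidates = []
    for pid, workers in sorted(by_pid.items()):
        proxy_rows = [w for w in workers if w.client == proxy_ip]
        # Any non-idle worker talking to someone other than the proxy = real client request
        other_busy = any(
            w.client != proxy_ip and w.mode not in IDLE_MODES for w in workers
        )
        if proxy_rows and all(w.mode == "W" for w in proxy_rows) and not other_busy:
            candidates.append(pid)
    return candidates


rows = [
    WorkerRow(100, "W", "10.0.0.1"),
    WorkerRow(100, "_", "192.0.2.7"),
    WorkerRow(200, "W", "10.0.0.1"),
    WorkerRow(200, "W", "198.51.100.3"),
    WorkerRow(300, "K", "10.0.0.1"),
]
print(kill_candidates(rows, "10.0.0.1"))  # [100]
```

As the story shows, such a list can still contain PIDs that no longer exist in the OS, so it is worth cross-checking each PID before acting on it.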
So the next step was to move all clients away from one webserver, and then to reboot this webserver completely to be sure the entire system is in a known good state for future monitoring (the big hammer approach).
As we did not know whether this strange state was due to some kind of mis-administration of the system, we decided to put the rev-proxy back in front of the webserver and to monitor the systems.
We survived about a day and a half. After that, all worker threads on all webservers were in this state. DoS. At this point we were sure something malicious was going on (some days later our management showed us a mail from a company which, two months earlier, had offered security consulting to make sure we would not get hit by a DDoS during the holiday season… a coincidence?).
Next step: checking for missing security patches (unfortunately it is not us who decides which patches get applied to the systems). We noticed that the rev-proxy was missing a patch for a DoS problem, and that a new fixpack for the webservers was scheduled for release in the near future (as of this writing it is available).
Since we applied the DoS fix to the rev-proxy, the problem has not come back. This is not conclusive, though: we do not know whether the fix solved the problem or whether the attacker simply stopped attacking us.
From reading what the DoS patch fixes, we would expect to see some continuous traffic between the rev-proxy and the webserver, but there was none while we observed the strange state.
We are still not allowed to apply patches the way we think we should, but at least we now have better monitoring in place to watch for this particular problem: activate the extended status in Apache/IHS, look for lines with state ‘W’ and a long duration (column ‘SS’), and raise an alert if the duration is higher than the maximum possible/expected/desired duration of any URL.
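The alert rule can be sketched in a few lines. This assumes the extended status table has been parsed into rows with the ‘M’, ‘SS’ and Request columns, and that you maintain a table of the longest acceptable duration per URL; `StatusRow`, `long_running_writes` and the budget values are hypothetical. The threshold is the largest budget, i.e. longer than any URL should ever take:

```python
from dataclasses import dataclass


@dataclass
class StatusRow:
    pid: int
    mode: str     # 'M' column
    ss: int       # 'SS' column, in seconds
    request: str  # 'Request' column, e.g. "GET /report HTTP/1.1"


def long_running_writes(rows: list[StatusRow], url_budgets_s: dict[str, int]) -> list[StatusRow]:
    """Return 'W' rows running longer than the slowest URL should ever take."""
    limit = max(url_budgets_s.values())
    return [r for r in rows if r.mode == "W" and r.ss > limit]


budgets = {"/login": 5, "/report": 120}  # hypothetical per-URL maximum durations
rows = [
    StatusRow(101, "W", 4000, "GET /report HTTP/1.1"),
    StatusRow(102, "W", 30, "GET /report HTTP/1.1"),
    StatusRow(103, "K", 9999, "GET /login HTTP/1.1"),
]
for r in long_running_writes(rows, budgets):
    print(f"ALERT pid={r.pid} in 'W' for {r.ss}s: {r.request}")
```

Only PID 101 is reported: PID 102 is still within budget and PID 103 is not in the ‘W’ state.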
Yesterday I committed some more configs to generate doxygen documentation of FreeBSD kernel drivers. I mechanically generated the missing configs for the subdirectories of src/sys/dev/. This means there is no dependency information in the configs, so you will not get links (e.g. to the PCI documentation) when a driver calls functions of the PCI driver (feel free to tell me about such dependencies).
If you want to generate the HTML or PDF version of some subsystem, just go to src/tools/kerneldoc/subsys/ and run “make” to get a list of targets to build. As an example, “make dev_sound” generates the HTML version for the sound system, and “make pdf-dev_sound” generates the PDF version. The sound system is probably the “nicest” example, as it includes a page with TODO items and even has some real API docs instead of just the call-graphs and similar automatically generated information.
Some drivers already have (some) doxygen markup (I just did a quick grep for ‘/*[*!]’ to detect doxygen markup indicators; no idea about the coverage or quality), namely:
Documentation is generated for more than just these drivers; I listed them only because they contain at least some doxygen documentation.
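The quick grep mentioned above can be reproduced with a short script that counts doxygen comment openers (‘/**’ or ‘/*!’) per source file. A sketch; `files_with_doxygen_markers` is a hypothetical helper, and like the grep it also matches decorative ‘/***…’ separator lines, so the counts say nothing about coverage or quality:

```python
import re
from pathlib import Path

# Same pattern as the grep: '/*' followed by '*' or '!'
DOXYGEN_START = re.compile(r"/\*[*!]")


def files_with_doxygen_markers(root, suffixes=(".c", ".h")):
    """Map each matching source file (relative to root) to its number of markers."""
    hits = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(errors="replace")
            n = len(DOXYGEN_START.findall(text))
            if n:
                hits[str(path.relative_to(root))] = n
    return hits


# Usage, e.g. from the top of a source tree:
# for name, count in files_with_doxygen_markers("src/sys/dev").items():
#     print(count, name)
```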
There is a huge discussion going on at hackers@ about how FreeBSD is not suitable for large installations (anymore?). As of this writing, the discussion seems to be splitting into several sub-topics, some of which could lead to good improvements.
One sub-topic is release engineering: changes such as a more guided approach to what should be merged to which branch, the frequency of releases, and maybe some kind of long-term branch(es). There is also some discussion about getting joint funding from interested parties to pay someone to take care of the long-term branch(es).
Another sub-topic is the way bugs are handled in our old bugtracking software and how patches go unnoticed there.
Both of them are connected, some parts more and some less, by the question of what can be done in a volunteer project.
To me it looks like the proposals “just” need some refinement and some “volunteers” to back what they said with something of value (this means manpower and/or money).
What I want to discuss here is how tools could help make PRs/patches more visible to developers (it is already possible to get emails about patches in the PR database from the small bugbuster team, but you have to ask them for it) and how to make it easier to get patches into FreeBSD.
Making bugs more visible to developers
The obvious first: we need a different bugtracking system. We already know this. IIRC there is (or was…) even someone working on an evaluation of what could be done and how easy or hard it would be. I am not aware of any outcome, even though it has been months (or even a year) since this was announced. I do not blame anyone here; I would like to find time to finish some FreeBSD volunteer work myself.
In my opinion this needs to be handled in a commercial way. Someone needs to be officially paid (with a deadline) to produce a result. Unfortunately, the requirements are phrased such that people do not have to change their workflows/procedures.
IIRC people ask to be able to send a mail to the bugtracker without having to authenticate. Personally I think the bugtracking issue has reached a state where we need to change our workflows/procedures. It is convenient to get mails from the bugtracker and to only have to reply to a mail to add something. On the other hand, if I report bugs somewhere and really care about the problem being resolved, I am willing to log in to whatever interface it takes to get this damn problem solved.
Being able to easily send a problem report from the system where I have the issue is a very useful feature. Currently we have send-pr for this, and it uses email, which means it requires a working email setup. As a user I do not care whether the tool uses email, HTTP or HTTPS; I just want an easy way to submit the problem. I would not mind having to first run “send-problem register me@tld” (once), then “send-problem login me@tld” (once per system+user I want to send from), and then maybe “send-problem template write_template_here.txt” (to get some template text to fill out), edit the template file, and finally run “send-problem send my_report.txt file1 file2 …”. That would be a different workflow, but still an easy one.
Email notifications are surely needed, but if I really care about a problem, I can be bothered to register first. So in my opinion we need a different bugtracker so desperately that we have to drop our requirements regarding our current workflow/procedures (even if it means we cannot submit bugs from the command line at all). The primary goal of the software needs to be to make it easy to track and resolve bugs. Submitting bugs should not be hard either. Looking at the state of the world ATM, I would say a web interface with authentication is not a big burden if I really want to get my problem fixed. A command line tool would be nice to have, but given the current state of our bugtracker it needs to be optional instead of a hard requirement.
Apart from making it easy to track and resolve problems, the software also needs to make us aware of the biggest problems. Now… you may ask what a big problem is. Well… IMO what I think is big or small does not matter here. The person with a problem needs to decide what is a big problem for him. And people with the same problem need to be able to say that it is a big problem for them too. So a feature to “vote” or “+1” or “AOL” (or whatever you want to call it) would let users with problems voice their opinion on the relevance of the problem to our userbase. This also means there needs to be a way to see the highest-voted problems. An automatic mail would be best, but as above this is optional. If I as a developer really care about this, I can be bothered to log in to a web interface (or maybe someone volunteers to copy & paste the list and send a mail… we need to be willing to rethink our procedures).
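The voting feature needs very little machinery: one vote per user per problem report, and a ranking by vote count. A minimal sketch of that data structure, assuming PRs are identified by strings; `VoteBoard` and the PR ids are hypothetical and do not describe any existing bugtracker's API:

```python
from collections import defaultdict


class VoteBoard:
    """Track '+1' votes per problem report, at most one per user."""

    def __init__(self):
        self._voters = defaultdict(set)  # PR id -> set of user ids

    def vote(self, pr_id: str, user: str) -> None:
        # A repeated "+1" from the same user is ignored because voters are a set
        self._voters[pr_id].add(user)

    def top(self, n: int) -> list[tuple[str, int]]:
        """Return the n most-voted PRs as (pr_id, votes), ties broken by PR id."""
        ranked = sorted(self._voters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        return [(pr, len(users)) for pr, users in ranked[:n]]


board = VoteBoard()
for pr, user in [("kern/1", "alice"), ("kern/1", "bob"), ("kern/1", "alice"),
                 ("bin/2", "carol"),
                 ("kern/3", "dave"), ("kern/3", "erin"), ("kern/3", "frank")]:
    board.vote(pr, user)
print(board.top(2))  # [('kern/3', 3), ('kern/1', 2)]
```

The output of `top()` is exactly what a periodic "most wanted" mail to developers would contain.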
Getting patches into a FreeBSD branch more easily
It looks to me like this topic requires a bit more involvement from multiple tools. In my opinion we need to switch to a distributed version control system: one which makes it easy to create my own branch of FreeBSD on my own hardware and to let other users use my branch (if I want to allow others to branch from my branch). It also needs to let me push my changes towards FreeBSD. Obviously not directly into the official sources, but into some kind of staging area. Other people should be able to look at this staging area and review what I did. They need to be able to make comments for others to see, or give some kind of (multi-dimensional?) rating for the patch (code quality / works for me / does not work / …). Based on the reviews/ratings and maybe some automated evaluation (compile test / regression test / benchmark run), a committer could push the patch into the official FreeBSD tree (ideally with some automated notification system, a push-button solution for integration and so on, but as above we should not be afraid if we do not get all the bells and whistles).
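To make the staging-area idea more concrete, here is one possible gate a committer (or a bot) could apply before integrating a patch. It is only a sketch: the rating categories mirror the examples above, while `StagedPatch`, `ready_for_committer`, the check names and the "at least two positive ratings, no negative ones" rule are all hypothetical choices:

```python
from dataclasses import dataclass, field
from enum import Enum


class Rating(Enum):
    WORKS_FOR_ME = "works for me"
    DOES_NOT_WORK = "does not work"
    GOOD_CODE = "code quality ok"
    BAD_CODE = "code quality poor"


@dataclass
class StagedPatch:
    name: str
    checks: dict = field(default_factory=dict)   # e.g. {"compile": True, "regression": True}
    ratings: list = field(default_factory=list)  # Rating values from reviewers


def ready_for_committer(p: StagedPatch, min_positive: int = 2) -> bool:
    """All automated checks passed, no negative review, enough positive reviews."""
    if not p.checks or not all(p.checks.values()):
        return False
    if Rating.DOES_NOT_WORK in p.ratings or Rating.BAD_CODE in p.ratings:
        return False
    positive = sum(r in (Rating.WORKS_FOR_ME, Rating.GOOD_CODE) for r in p.ratings)
    return positive >= min_positive


patch = StagedPatch("fix-driver", {"compile": True, "regression": True},
                    [Rating.WORKS_FOR_ME, Rating.GOOD_CODE])
print(ready_for_committer(patch))  # True
```

The final push into the official tree stays a human decision; the gate only decides which staged patches are worth a committer's attention.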
If we had something like this in place, some kind of long-term release branch could more easily be maintained collaboratively. Companies which use the same long-term release branch could submit their backports of fixes/features this way. They could also see whether similar branches (there could be related but different branches, like 9.4-security-fixes-only <= 9.4-official-errata-only <= 9.4-bugfixes <= 9.4-bugfixes-and-driverupdates <= …) could be merged into their in-house branch (and maybe consequently push back to the official branch they branched from, if the patch comes from a different branch).
It does not matter here whether we create a fixed set of branches for each release or only some special-purpose branches based on the phase of the moon (ideally we would create a lot of branches for every release, companies/users could cherry-pick/submit what they want, and the status of a long-term branch would be based solely on the inflow of patches and not on what the security team or release manager or a random developer thinks it should be… but the reality will probably be somewhere in the middle).
I do not know whether tools exist to make all this happen, or which tools could be put together to make it happen. I also deliberately did not mention tools I know of which already provide (small) parts of this. These are just some ideas to think about. Interested parties are invited to join the discussion on hackers@ (which is far away from discussing specific tools or features), but you are also free to add some comments here.