Today I read an interesting investigation and problem analysis by Jim Gettys.
It is a series of articles he wrote over several months and has not finished as of this writing (if you are deeply interested in it, go and read them; the most interesting ones are from December and January, and the comments on the articles also contribute to the big picture). Basically he says that a lot of the network problems users experience at home (with ADSL/cable or WLAN) occur because the buffers in the network hardware or in the operating systems are too big. He also proposes workarounds until this problem is attacked by OS vendors and equipment manufacturers.
Basically he says the network congestion algorithms cannot do their work well, because the oversized network buffers get in their way: they do not report packet loss in a timely fashion, or they avoid dropping packets in situations where a drop would be better, because the loss would trigger action in the congestion algorithms.
He investigated the behavior of Linux, OS X and Windows (the systems he had available). I wanted to have a quick look at the situation in FreeBSD in this regard, but at least with my network card I was not able to see/find the corresponding buffer sizes in the driver within 30 seconds.
I think it would be very good if this issue were investigated in FreeBSD. Apart from maybe taking some action in the source, we should also write a section for the Handbook which explains the issue and how to benchmark and tune it (one problem here is that there are situations where you want/need such big buffers, so we cannot just downsize them across the board).
Unfortunately I have too much on my plate to look further into this. 🙁 I hope one of the network people in FreeBSD picks up the ball and starts playing.
From if_rl it looks like it is hardcoded in each driver. if_rlreg.h says the 8169 supports up to 1024 TX descriptors, but we set RL_8169_TX_DESC_CNT to 256.
Based upon the experiments Jim Gettys did in the LAN and WLAN, it looks like it would be better to make this configurable.
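If such a knob existed, it would presumably be a loader tunable you could set at boot. A purely hypothetical sketch of what that could look like (this tunable does not exist today, the name is made up for illustration):

```shell
# /boot/loader.conf -- HYPOTHETICAL: no such tunable exists in the tree today.
# Idea: let the admin pick the TX descriptor ring size within chip limits
# (the 8169 supports up to 1024 per if_rlreg.h).
hw.re.tx_desc_cnt="1024"
```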
At least on the TCP level the buffers are self-tuning. You can configure a maximum buffer size and an increment step (net.inet.tcp.sendbuf* and net.inet.tcp.recvbuf*), which I presume is also the minimal size (at least that’s the way I’d have implemented something like this).
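For reference, these are the knobs as I understand them (names from memory for the FreeBSD of that era; check `sysctl -d` on your system):

```shell
# Inspect the TCP buffer auto-tuning settings:
sysctl net.inet.tcp.sendbuf_auto net.inet.tcp.sendbuf_max net.inet.tcp.sendbuf_inc
sysctl net.inet.tcp.recvbuf_auto net.inet.tcp.recvbuf_max net.inet.tcp.recvbuf_inc

# Example: raise the send-side ceiling to 2 MB for high-latency links:
sysctl net.inet.tcp.sendbuf_max=2097152
```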
I increased the max buffer size to bridge disconnected periods on a 3G connection while travelling by train.
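To see why a larger maximum helps on such a link, the bandwidth-delay product gives the amount of data that must be in flight to keep the pipe full; the link numbers below are hypothetical, just for illustration:

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to saturate the link."""
    return int(bandwidth_bps * rtt_s / 8)

# Hypothetical 3G link: 7.2 Mbit/s with a 500 ms round-trip time.
print(bdp_bytes(7_200_000, 0.5))  # 450000 bytes, i.e. ~440 KiB
```

So a stall of even a fraction of a second on a high-latency link quickly exceeds a small fixed buffer, which is why the ceiling matters.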
I think driver buffers are only relevant if they are not much smaller than the TCP buffers. I guess UDP packets (which I’d use for realtime streaming) are only subject to driver buffering, so I can see why this discussion is relevant for realtime applications.
I expect the fixed buffer sizes were chosen to allow full link saturation. How much work would it be to make each buffer self-tuning? And how much if there were a common framework for this task in the kernel?
Back to the TCP level: one problem I see is that there is one TCP stack per machine/jail. A now-fixed bug in the wpi driver caused packet loss on other interfaces, like lo0, effectively making all X applications die. That such a thing is possible is ridiculous.
Another implication is that my enormous 3G-motivated buffers also affect other interfaces, like the LAN or the loopback interface, which is neither necessary nor desired.
I understand it was a lot of work to give jails their own stack, but I wonder: wouldn’t it be better to have one stack per interface/alias? Of course that would necessitate an abstraction layer that distributes TCP requests to the right TCP stack.
To get a separate network stack per jail you need to wait for the VIMAGE work to be production-ready (a lot of the code is already in 9‑current).
I know about send-/recvspace, and that it is auto-tuning (in Linux this seems to be a fixed, interface-specific setting, while in FreeBSD it is a global but auto-adapting option).
The size of the driver buffers is what this posting is about, and I have confirmation that it is not configurable. I was told that making it configurable at run time is a major task (some Intel drivers are already prepared, as they share code with the Linux driver). Making it a boot-time tunable could be feasible, but our “NIC guru” does not know how much free time he can invest in this. He wants to take care of it at least in the new drivers he develops.