The problem I see at work: A T4-2 with 3 guest LDOMs, virtualized disks and networks lost the complete network connectivity “out of the blue” once, and maybe “sporadic” directly after a cold boot. After a lot of discussion with Oracle, I have the impression that we have two problems here.
Total network loss of the machine (no zone or guest LDOM or the primary LDOM was able to have receive or send IP packets). This happened once. No idea how to reproduce it. In the logs we see the message “[ID 920994 kern.warning] WARNING: vnetX: exceeded number of permitted handshake attempts (5) on channel xxx”. According to Oracle this is supposed to be fixed in 148677 – 01 which will come with Solaris 10u11. They suggested to use a vsw interface instead of a vnet interface on the primary domain to at least lower the probability of this problem hitting us. They were not able to tell us how to reproduce the problem (seems to be a race condition, at least I get this impression based upon the description of the Oracle engineer handling the SR). Only a reboot helped to get the problem solved. I was told we are the only client which reported this kind of problem, the patch for this problem is based upon an internal bugreport from internal tests.
After cold boots sometimes some machines (not all) are not able to connect to an IP on the T4. A reboot helps, as does removing an interface from an aggregate and directly adding it again (see below for the system config). To try to reproduce the problem, we did a lot of warm reboots of the primary domain, and the problem never showed up. We did some cold reboots, and the problem showed up once.
In case someone else sees one of those problems on his machines too, please get in contact with me to see what we have in common to try to track this down further and to share info which may help in maybe reproducing the problems.
- T4-2 with 4 HBAs and 8 NICs (4 * igb on-board, 4 * nxge on additional network card)
- 3 guest LDOMs and one io+control domain (both in the primary domain)
- the guest LDOMs use SAN disks over the 4 HBAs
- the primary domain uses a mirrored zpool on SSDs
- 5 vswitch in the hypervisor
- 4 aggregates (aggr1 — aggr4 with L2-policy), each one with one igb and one nxge NIC
- each aggregate is connected to a separate vswitch (the 5th vswitch is for machine-internal communication)
- each guest LDOM has three vnets, each vnets connected to a vswitch (1 guest LDOM has aggr1+2 only for zones (via vnets), 2 guest LDOMs have aggr 3+4 only for zones (via vnets), and all LDOMs have aggr2+3 (via vnets) for global-zone communication, all LDOMs are additionally connected to the machine-internal-only vswitch via the 3rd vnet)
- primary domain uses 2 vnets connected to the vswitch which is connected to aggr2 and aggr3 (consistency with the other LDOMs on this machine) and has no zones
- this means each entity (primary domain, guest LDOMs and each zone) has two vnets in and those two vnets are configured in a link-based IPMP setup (vnet-linkprop=phys-state)
- each vnet has VLAN tagging configured in the hypervisor (with the zones being in different VLANs than the LDOMs)
The proposed change by Oracle is to replace the 2 vnet interfaces in the primary domain with 2 vsw interfaces (which means to do VLAN tagging in the primary domain directly instead of in the vnet config). To have IPMP working this means to have vsw-linkprop=phys-state. We have two systems with the same setup, on one system we already changed this and it is working as before. As we don’t know how to reproduce the 1st problem, we don’t know if the problem is fixed or not, respectively what the probability is to get hit again by this problem.
Ideas / suggestions / info welcome.
GD Star Rating
GD Star Rating
In mid-April a woman from the marketing department of No Starch Press contacted me and asked if I am interested to do a public review of the FreeBSD Device Drivers book by Joseph Kong (no link to a book shop, go and have a look in your preferred one). Just this simple question, no strings attached.
I had my nose in some device drivers in the past, but I never wrote one, and never had a look at the big picture. I was interested to know how everything fits together, so this made me a good victim for a review (novice enough to learn something new and to have a look if enough is explained, and experienced enough to understand what is going on in the FreeBSD kernel).
Some minutes after I agreed to review it (but with a little notice that I do not know how long I need to review it), I had the PDF version of the book. That was faster than I expected (maybe I am too old-school and used to have paper versions of books in my hands).
Let the review begin… but bear with me, this is the first time I do a real public review of a book (instead of a technical review for an author). And as this is my very own personal opinion, I will not allow comments here. This page is all about my opinion while reading the book, questions I have while reading the book shall serve as a hint about the quality of the book and they should be answered in the book, not here.
In short, the book is not perfect, but it is a good book. There is room for improvement, but on a very high level. If you want to write a device driver for FreeBSD, this book is a must. I suggest to read it completely, even chapters which do not belong to the type of driver you want to write (specially the case studies of real drivers). The reason is that each chapter has some notes which may not only apply to the chapter in question, but to all kinds of device drivers. The long review follows now.
The first chapter is titled “Building and running modules”. The author begins with description of the usual device driver types (NIC driver, pseudo-device, …) and how they can be added to the kernel (statically linked in or as a module). The first code example is a small and easy kernel module, so that we do not have to reboot the system we use to develop a driver (except we make a fault during driver development which causes the machine to panic or hang). Every part of the example is well explained. This is followed by an overview about character devices (e.g. disks) and a simple character-device driver (so far a pseudo-device, as we do not have real hardware we access) which is not only as-well explained as the module-example, but there is also a note where the code was simplified and what should be done instead.
After reading this chapter you should be able to write your own kernel module in 5 minutes (well, after 5 minutes it will not be able to do a lot — just a “hello world” – but at least you can already load/unload/execute some code into/from/in the kernel).
I have not tried any example myself, but I compiled a lot of modules and drivers I modified in the past and remember to have seen the described parts.
The second chapter explains how to allocate and free memory in the kernel. There is the possibility to allocate maybe-contiguous memory (the normal case, when your hardware does not do DMA or does not have the requirement that the memory region it makes DMA from/too needs to be contiguous), and really contiguous. For the size argument of the freeing of the the contiguous memory there is the sentence “Generally, size should be equal the amount allocated.”. Immediately I wanted to know what happens if you specify a different size (as a non-native english speaker I understand this sentence in a way that I am allowed to specify a different size and as such are able to free only parts of the allocated memory). Unfortunately this is not answered. I had a look into the source, the kernel frees memory pages, so the size argument (and addr argument) will be rounded to include a full page. This means theoretically I am able to free parts of the allocated memory, but this is a source-maintenance nightmare (needs knowledge about the machine specific page boundaries and you need to make sure that you do the absolutely correct size calculations). To me this looks more like as long as nobody is pointing a gun at my head and tells me to use a different size, specifying the same size as made during the allocation of this memory region is the way to go.
After reading this chapter you should know how to kill the system by allocating all the RAM in the kernel.
Again, I did not try to compile the examples in this chapter, but the difference of the memory allocation in the kernel compared with memory allocation in the userland is not that big.
The third chapter explains the device communication and control interfaces (ioctl/sysctl) of a driver. The ioctl part teached me some parts I always wanted to know when I touched some ioctls, but never bothered to find out before. Unfortunately this makes me a little bit nervous about the way ioctls are handled in the FreeBSD linuxulator, but this is not urgent ATM (and can probably be handled by a commend in the right place). The sysctl part takes a little bit longer to follow through, but there is also more to learn about it. If you just modify an existing driver with an existing sysctl interface, it probably just comes down to copy&paste with little modifications, but if you need to make more complex changes or want to add a sysctl interface to a driver, this part of the book is a good way to understand what is possible and how everything fits together. Personally I would have wished for a more detailed guide when to pick the ioctl interface and when the sysctl interface than what was written in the conclusion of the chapter, but it is probably not that easy to come up with a good list which fits most drivers.
After reading this chapter you should be able to get data in and out of the kernel in 10 minutes.
As before, I did not compile the examples in this chapter. I already added ioctls and sysctls in various places in the FreeBSD kernel.
Chapter 4 is about thread synchronization – mutexes, shared/exclusive locks, reader/writer locks and condition variables. For me this chapter is not as good as the previous ones. While I got a good explanation of everything, I missed a nice overview table which compares the various methods of thread synchronization. Brendan Gregg did a nice table to give an overview of DTrace variable types and when to use them. Something like this would have been nice in this chapter too. Apart from this I got all the info I need (but hey, I already wrote a NFS client for an experimental computer with more than 200000 CPUs in 1998, so I’m familiar with such synchronization primitives).
Delayed execution is explained in chapter 5. Most of the information presented there was new to me. While there where not much examples presented (there will be some in a later chapter), I got a good overview about what exists. This time there was even an overview when to use which type of delayed execution infrastructure. I would have preferred to have this overview in the beginning of the chapter, but that is maybe some kind of personal preference.
In chapter 6 a complete device driver is dissected. It is the virtual null modem terminal driver. The chapter provides real-world examples of event-handlers, callouts and taskqueues which where not demonstrated in chapter five. At the same time the chapter serves as a description of the functions a TTY driver needs to have.
Automated device detection with Newbus and the corresponding resource allocation (I/O ports, device memory and interrupts) are explained in chapter 7. It is easy… if you have a real device to play with. Unfortunately the chapter missed a paragraph or two about the suspend and resume methods. If you think about it, it is not hard to come up with what they are supposed to do, but a little explicit description of what they shall do, in what state the hardware should be put and what to assume when being called would have been nice.
Chapter 8 is about interrupts. It is easy to add an interrupt handler (or to remove one), the hard part is to generate an interrupt. The example code uses the parallel port, and the chapter also contains a little explanation how to generate an interrupt… if you are not afraid to touch real hardware (the parallel port) with a resistor.
In chapter 9 the lpt(4) driver is explained, as most of the topics discussed so far are used inside. The explanation how everything is used is good, but what I miss sometimes is why they are used. The most prominent (and only) example here for me is why are callouts used to catch stray interrupts? That callouts are a good way of handling this is clear to me, the big question is why can there be stray interrupts. Can this happen only for the parallel port (respectively a limited amount of devices), or does every driver for real interrupt driven hardware need to come with something like this? I assume this is something specific to the device, but a little explanation regarding this would have been nice.
Accessing I/O ports and I/O memory for devices are explained in chapter 10 based upon a driver for a LED device (turn on and off 2 LEDs on an ISA bus). All the functions to read and write data are well explained, just the part about the memory barrier is a little bit short. It is not clear why the CPU reordering of memory accesses matter to what looks like function calls. Those function calls may be macros, but this is not explained in the text. Some little examples when to use the barriers instead of an abstract description would also have been nice at this point.
Chapter 11 is similar to chapter 10, just that a PCI bus driver is discussed instead of an ISA bus driver. The differences are not that big, but important.
In chapter 12 it is explained how to do DMA in a driver. This part is not easy to understand. I would have wanted to have more examples and explanations of the DMA tag and DMA map parts. I am also surprised to see different supported architectures for the flags BUS_DMA_COHERENT and BUS_DMA_NOCACHE for different functions. Either this means FreeBSD is not coherent in those parts, or it is a bug in the book, or it is supposed to be like this and the reasons are not explained in the book. As there is no explicit note about this, it probably leads to confusion of readers which pay enough attention here. It would also have been nice to have an explanation when to use those flags which are only implemented on a subset of the architectures FreeBSD supports. Anyway, the explanations give enough information to understand what is going on and to be able to have a look at other device drivers for real-live examples and to get a deeper understanding of this topic.
Disk drivers and block I/O (bio) requests are described in chapter 13. With this chapter I have a little problem. The author used the word “undefined” in several places where I as a non-native speaker would have used “not set” or “set to 0″. The word “undefined” implies for me that there may be garbage inside, whereas from a technical point of view I can not imagine that some random value in those places would have the desired result. In my opinion each such place is obvious, so I do not expect that an experienced programmer would lose time/hairs/sanity over it, but inexperienced programmers which try to assemble the corresponding structures on the (uninitialized) heap (for whatever reason), may struggle with this.
Chapter 14 is about the CAM layer. While the previous chapter showed how to write a driver for a disk device, chapter 14 gave an overview about how to an HBA to the CAM layer. It is just an overview, it looks like CAM needs a book on its own to be fully described. The simple (and most important) cases are described, with the hardware-specific parts being an exercise for the person writing the device driver. I have the impression it gives enough details to let someone with hardware (or protocol), and more importantly documentation for this device, start writing a driver.
It would have been nice if chapter 13 and 14 would have had a little schematic which describes at which level of the kernel-subsystems the corresponding driver sits. And while I am at it, a schematic with all the driver components discussed in this book at the beginning as an overview, or in the end as an annex, would be great too.
An overview of USB drivers is given in chapter 15 with the USB printer driver as an example for the explanation of the USB driver interfaces. If USB would not be as complex as it is, it would be a nice chapter to start driver-writing experiments (due to the availability of various USB devices). Well… bad luck for curious people. BTW, the author gives pointers to the official USB docs, so if you are really curious, feel free to go ahead.
Chapter 16 is the first part about network drivers. It deals with ifnet (e.g. stuff needed for ifconfig), ifmedia (simplified: which kind of cable and speed is supported), mbufs and MSI(-X). As in other chapters before, a little overview and a little picture in the beginning would have been nice.
Finally, in chapter 17, the packet reception and transmission of network drivers is described. Large example code is broken up into several pieces here, for more easy discussion of related information.
One thing I miss after reaching the end of the book is a discussion of sound drivers. And this is surely not the only type of drivers which is not discussed, I can come up with crypto, firewire, gpio, watchdog, smb and iic devices within a few seconds. While I think that it is much more easy to understand all those drivers now after reading the book, it would have been nice to have at least a little overview of other driver types and maybe even a short description of their driver methods.
Conclusion: As I wrote already in the beginning, the book is not perfect, but it is good. While I have not written a device driver for FreeBSD, the book provided enough insight to be able to write one and to understand existing drivers. I really hope there will be a second edition which addresses the minor issues I had while reading it to make it a perfect book.
GD Star Rating
GD Star Rating
Tags: book questions
, device driver
, device drivers
, freebsd kernel
, marketing department
, paper versions
, pdf version
, personal opinion
, preferred one
I have a little problem finding a clean solution to the following problem.
A machine with two network interfaces and no default route. The first interface gets an IP at boot time and the corresponding static route is inserted during boot into the routing table without problems. The second interface only gets an IP address when the shared-IP zones on the machine are started, during boot the interface is plumbed but without any address. The networks on those interfaces are not connected and the machine is not a gateway (this means we have a machine–administration network and a production-network). The static routes we want to have for the addresses of the zones are not added to the routing table, because the next hop is not reachable at the time the routing-setup is done. As soon as the zones are up (and the interface gets an IP), a re-run of the routing-setup adds the missing static routes.
Unfortunately I can not tell Solaris to keep the static route even if the next hop is not reachable ATM (at least I have not found an option to the route command which does this).
One solution to this problem would be to add an address at boot to the interface which does not have an address at boot-time ATM (probably with the deprecated flag set). The problem is, that this subnet (/28) has not enough free addresses anymore, so this is not an option.
Another solution is to use a script which re-runs the routing-setup after the zones are started. This is a pragmatic solution, but not a clean solution.
As I understand the in.routed man-page in.routed is not an option with the default config, because the machine shall not route between the networks, and shall not change the routing based upon RIP messages from other machines. Unfortunately I do not know enough about it to be sure, and I do not get the time to play around with this. I have seen some intersting options regarding this in the man-page, but playing around with this and sniffing the network to see what happens, is not an option ATM. Anyone with a config/tutorial for this “do not broadcast anything, do not accept anything from outside”-case (if possible)?
GD Star Rating
GD Star Rating
Tags: administration network
, boot time
, clean solution
, default config
, default route
, network interfaces
, pragmatic solution
, routing table
, static route
, static routes
As I already wrote, theoretically ADSL RAM is available at my place. The analysis of the situation revealed first that the ISP side of my line uses outdated hardware. After the technician I know unofficially took care about it (remotely switching me to a different port), I have seen an immediate improvement of the signal to noise ratio. It is about 20 dB better.
Unfortunately this was not enough to be able to switch to the rate adaptive mode. According to their database the line length allows to give me 1.5 MBit. My line is running already at 2 MBit and my ADSL modem tells me it could do 8 MBit, so I disagree a bit with their database.
As the technician agrees with me, the next step would be to temporary move my house by some hundred meters towards the ISP endpoint of the line, unfortunately the higher management seems to be having some business ideas with our region (FTTT, Fiber To The Town (which means we will probably get 16 MBit via ADSL) … but maybe even FTTH), so they are now monitoring the database for such changes since a while.
I have the impression they seem to prevent such changes to the database because they think that if people get 2 MBit (instead of nothing, large parts of a town nearby does not even have the slowest ADSL connection) or 8 MBit (instead of 2 MBit), they are not interested in getting FTTH (or 16 MBit). Together with their IPTV initiative I do not really understand it. To get their IPTV, you need to have at least a 8 MBit line. With 8 MBit you can only cover one TV at SD resolution (at least with their IPTV offer), if you want HD resolution, you need to switch to their VDSL stuff (which is not available in our town). What people are doing currently is to switch to a cable provider where they can get about 32 MBit (I do not switch, switching is a risky action here, I rather stay with a slow connection that to have no connection at all for some months). With 32 MBit (and TV) people have less a need to switch to fiber (and pay 150 EUR for the work to get fiber into the house) than with 2 MBit or nothing.
The final outcome is, that the technician I know does not want to ask someone to play with the database to move my house temporary (which I can understand). The good part of those news is, that I may get more than 8 MBit in the not so distant future (the current planning is to finish the FTTT work until autumn).
GD Star Rating
GD Star Rating
Tags: adsl connection
, adsl modem
, business ideas
, cable provider
, mbit line
, outdated hardware
, signal to noise
, signal to noise ratio