Alexander Leidinger

Just another weblog

Jan
15

Com­plete net­work loss on Solaris 10u10 CPU 2012-10 on vir­tu­al­ized T4-2

The prob­lem I see at work: A T4-2 with 3 guest LDOMs, vir­tu­al­ized disks and net­works lost the com­plete net­work con­nec­tiv­ity “out of the blue” once, and maybe “spo­radic” directly after a cold boot. After a lot of dis­cus­sion with Ora­cle, I have the impres­sion that we have two prob­lems here.

1st prob­lem:
Total network loss of the machine (no zone, guest LDOM or the primary LDOM was able to receive or send IP packets). This happened once. No idea how to reproduce it. In the logs we see the message “[ID 920994 kern.warning] WARNING: vnetX: exceeded number of permitted handshake attempts (5) on channel xxx”. According to Oracle this is supposed to be fixed in 148677-01 which will come with Solaris 10u11. They suggested to use a vsw interface instead of a vnet interface on the primary domain to at least lower the probability of this problem hitting us. They were not able to tell us how to reproduce the problem (it seems to be a race condition, at least I get this impression based upon the description of the Oracle engineer handling the SR). Only a reboot solved the problem. I was told we are the only client which reported this kind of problem; the patch for this problem is based upon an internal bug report from internal tests.

2nd prob­lem:
After cold boots some­times some machines (not all) are not able to con­nect to an IP on the T4. A reboot helps, as does remov­ing an inter­face from an aggre­gate and directly adding it again (see below for the sys­tem con­fig). To try to repro­duce the prob­lem, we did a lot of warm reboots of the pri­mary domain, and the prob­lem never showed up. We did some cold reboots, and the prob­lem showed up once.

In case some­one else sees one of those prob­lems on his machines too, please get in con­tact with me to see what we have in com­mon to try to track this down fur­ther and to share info which may help in maybe repro­duc­ing the problems.

Sys­tem setup:

  • T4-2 with 4 HBAs and 8 NICs (4 * igb on-board, 4 * nxge on addi­tional net­work card)
  • 3 guest LDOMs and one io+control domain (both in the pri­mary domain)
  • the guest LDOMs use SAN disks over the 4 HBAs
  • the pri­mary domain uses a mir­rored zpool on SSDs
  • 5 vswitch in the hypervisor
  • 4 aggre­gates (aggr1 — aggr4 with L2-policy), each one with one igb and one nxge NIC
  • each aggre­gate is con­nected to a sep­a­rate vswitch (the 5th vswitch is for machine-internal communication)
  • each guest LDOM has three vnets, each vnet connected to a vswitch (1 guest LDOM has aggr1+2 only for zones (via vnets), 2 guest LDOMs have aggr 3+4 only for zones (via vnets), and all LDOMs have aggr2+3 (via vnets) for global-zone communication, all LDOMs are additionally connected to the machine-internal-only vswitch via the 3rd vnet)
  • pri­mary domain uses 2 vnets con­nected to the vswitch which is con­nected to aggr2 and aggr3 (con­sis­tency with the other LDOMs on this machine) and has no zones
  • this means each entity (pri­mary domain, guest LDOMs and each zone) has two vnets in and those two vnets are con­fig­ured in a link-based IPMP setup (vnet-linkprop=phys-state)
  • each vnet has VLAN tag­ging con­fig­ured in the hyper­vi­sor (with the zones being in dif­fer­ent VLANs than the LDOMs)

The pro­posed change by Ora­cle is to replace the 2 vnet inter­faces in the pri­mary domain with 2 vsw inter­faces (which means to do VLAN tag­ging in the pri­mary domain directly instead of in the vnet con­fig). To have IPMP work­ing this means to have vsw-linkprop=phys-state. We have two sys­tems with the same setup, on one sys­tem we already changed this and it is work­ing as before. As we don’t know how to repro­duce the 1st prob­lem, we don’t know if the prob­lem is fixed or not, respec­tively what the prob­a­bil­ity is to get hit again by this problem.

Ideas / sug­ges­tions / info welcome.


Jul
13

Book review: FreeBSD Device Drivers

In mid-April a woman from the mar­ket­ing depart­ment of No Starch Press con­tacted me and asked if I am inter­ested to do a pub­lic review of the FreeBSD Device Dri­vers book by Joseph Kong (no link to a book shop, go and have a look in your pre­ferred one). Just this sim­ple ques­tion, no strings attached.

I had my nose in some device dri­vers in the past, but I never wrote one, and never had a look at the big pic­ture. I was inter­ested to know how every­thing fits together, so this made me a good vic­tim for a review (novice enough to learn some­thing new and to have a look if enough is explained, and expe­ri­enced enough to under­stand what is going on in the FreeBSD ker­nel).

Some min­utes after I agreed to review it (but with a lit­tle notice that I do not know how long I need to review it), I had the PDF ver­sion of the book. That was faster than I expected (maybe I am too old-school and used to have paper ver­sions of books in my hands).

Let the review begin… but bear with me, this is the first time I do a real pub­lic review of a book (instead of a tech­ni­cal review for an author). And as this is my very own per­sonal opin­ion, I will not allow com­ments here. This page is all about my opin­ion while read­ing the book, ques­tions I have while read­ing the book shall serve as a hint about the qual­ity of the book and they should be answered in the book, not here.

In short, the book is not perfect, but it is a good book. There is room for improvement, but on a very high level. If you want to write a device driver for FreeBSD, this book is a must. I suggest to read it completely, even chapters which do not belong to the type of driver you want to write (especially the case studies of real drivers). The reason is that each chapter has some notes which may not only apply to the chapter in question, but to all kinds of device drivers. The long review follows now.

The first chapter is titled “Building and running modules”. The author begins with a description of the usual device driver types (NIC driver, pseudo-device, …) and how they can be added to the kernel (statically linked in or as a module). The first code example is a small and easy kernel module, so that we do not have to reboot the system we use to develop a driver (unless we make a mistake during driver development which causes the machine to panic or hang). Every part of the example is well explained. This is followed by an overview about character devices (e.g. disks) and a simple character-device driver (so far a pseudo-device, as we do not have real hardware we access) which is not only as well explained as the module example, but also comes with a note about where the code was simplified and what should be done instead.

After read­ing this chap­ter you should be able to write your own ker­nel mod­ule in 5 min­utes (well, after 5 min­utes it will not be able to do a lot — just a “hello world” – but at least you can already load/unload/execute some code into/from/in the kernel).

I have not tried any exam­ple myself, but I com­piled a lot of mod­ules and dri­vers I mod­i­fied in the past and remem­ber to have seen the described parts.

The second chapter explains how to allocate and free memory in the kernel. There is the possibility to allocate maybe-contiguous memory (the normal case, when your hardware does not do DMA or does not have the requirement that the memory region it makes DMA from/to needs to be contiguous), and really contiguous memory. For the size argument of the freeing of the contiguous memory there is the sentence “Generally, size should be equal the amount allocated.”. Immediately I wanted to know what happens if you specify a different size (as a non-native English speaker I understand this sentence in a way that I am allowed to specify a different size and as such am able to free only parts of the allocated memory). Unfortunately this is not answered. I had a look into the source: the kernel frees memory pages, so the size argument (and addr argument) will be rounded to include a full page. This means theoretically I am able to free parts of the allocated memory, but this is a source-maintenance nightmare (it needs knowledge about the machine-specific page boundaries and you need to make sure that you do the absolutely correct size calculations). To me this looks more like: as long as nobody is pointing a gun at my head and tells me to use a different size, specifying the same size as used during the allocation of this memory region is the way to go.
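To make the rounding a bit more tangible, here is a minimal sketch of the arithmetic in Python (not kernel C, and not taken from the FreeBSD source). It only mirrors my reading described above. The function name `freed_page_span` and the 4096-byte page size are assumptions made up for this illustration.

```python
from typing import Tuple


def freed_page_span(addr: int, size: int, page_size: int = 4096) -> Tuple[int, int]:
    """Return (start, length) of the whole pages covering [addr, addr + size).

    Illustration only: page_size=4096 is an assumption, real page sizes
    are machine specific.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    start = addr - (addr % page_size)   # round the start address down to a page boundary
    end = addr + size
    end += (-end) % page_size           # round the end up to the next page boundary
    return start, end - start


# A three-page allocation at 0x10000, freed once with the full size
# and once with a much smaller size.
print(freed_page_span(0x10000, 3 * 4096))  # (65536, 12288): all three pages
print(freed_page_span(0x10000, 100))       # (65536, 4096): a whole page, not just 100 bytes
```

Even in this toy model a "partial" free always affects whole pages, which is why passing the originally allocated size looks like the only maintainable choice to me.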

After read­ing this chap­ter you should know how to kill the sys­tem by allo­cat­ing all the RAM in the kernel.

Again, I did not try to com­pile the exam­ples in this chap­ter, but the dif­fer­ence of the mem­ory allo­ca­tion in the ker­nel com­pared with mem­ory allo­ca­tion in the user­land is not that big.

The third chapter explains the device communication and control interfaces (ioctl/sysctl) of a driver. The ioctl part taught me some things I always wanted to know when I touched some ioctls, but never bothered to find out before. Unfortunately this makes me a little bit nervous about the way ioctls are handled in the FreeBSD linuxulator, but this is not urgent ATM (and can probably be handled by a comment in the right place). The sysctl part takes a little bit longer to follow through, but there is also more to learn about it. If you just modify an existing driver with an existing sysctl interface, it probably just comes down to copy&paste with small modifications, but if you need to make more complex changes or want to add a sysctl interface to a driver, this part of the book is a good way to understand what is possible and how everything fits together. Personally I would have wished for a more detailed guide on when to pick the ioctl interface and when the sysctl interface than what was written in the conclusion of the chapter, but it is probably not that easy to come up with a good list which fits most drivers.

After read­ing this chap­ter you should be able to get data in and out of the ker­nel in 10 minutes.

As before, I did not com­pile the exam­ples in this chap­ter. I already added ioctls and sysctls in var­i­ous places in the FreeBSD kernel.

Chap­ter 4 is about thread syn­chro­niza­tion – mutexes, shared/exclusive locks, reader/writer locks and con­di­tion vari­ables. For me this chap­ter is not as good as the pre­vi­ous ones. While I got a good expla­na­tion of every­thing, I missed a nice overview table which com­pares the var­i­ous meth­ods of thread syn­chro­niza­tion. Bren­dan Gregg did a nice table to give an overview of DTrace vari­able types and when to use them. Some­thing like this would have been nice in this chap­ter too. Apart from this I got all the info I need (but hey, I already wrote a NFS client for an exper­i­men­tal com­puter with more than 200000 CPUs in 1998, so I’m famil­iar with such syn­chro­niza­tion primitives).

Delayed execution is explained in chapter 5. Most of the information presented there was new to me. While there were not many examples presented (there will be some in a later chapter), I got a good overview about what exists. This time there was even an overview of when to use which type of delayed execution infrastructure. I would have preferred to have this overview in the beginning of the chapter, but that is maybe some kind of personal preference.

In chapter 6 a complete device driver is dissected. It is the virtual null modem terminal driver. The chapter provides real-world examples of event handlers, callouts and taskqueues which were not demonstrated in chapter five. At the same time the chapter serves as a description of the functions a TTY driver needs to have.

Auto­mated device detec­tion with New­bus and the cor­re­spond­ing resource allo­ca­tion (I/O ports, device mem­ory and inter­rupts) are explained in chap­ter 7. It is easy… if you have a real device to play with. Unfor­tu­nately the chap­ter missed a para­graph or two about the sus­pend and resume meth­ods. If you think about it, it is not hard to come up with what they are sup­posed to do, but a lit­tle explicit descrip­tion of what they shall do, in what state the hard­ware should be put and what to assume when being called would have been nice.

Chap­ter 8 is about inter­rupts. It is easy to add an inter­rupt han­dler (or to remove one), the hard part is to gen­er­ate an inter­rupt. The exam­ple code uses the par­al­lel port, and the chap­ter also con­tains a lit­tle expla­na­tion how to gen­er­ate an inter­rupt… if you are not afraid to touch real hard­ware (the par­al­lel port) with a resistor.

In chap­ter 9 the lpt(4) dri­ver is explained, as most of the top­ics dis­cussed so far are used inside. The expla­na­tion how every­thing is used is good, but what I miss some­times is why they are used. The most promi­nent (and only) exam­ple here for me is why are call­outs used to catch stray inter­rupts? That call­outs are a good way of han­dling this is clear to me, the big ques­tion is why can there be stray inter­rupts. Can this hap­pen only for the par­al­lel port (respec­tively a lim­ited amount of devices), or does every dri­ver for real inter­rupt dri­ven hard­ware need to come with some­thing like this? I assume this is some­thing spe­cific to the device, but a lit­tle expla­na­tion regard­ing this would have been nice.

Accessing I/O ports and I/O memory for devices is explained in chapter 10 based upon a driver for a LED device (turn on and off 2 LEDs on an ISA bus). All the functions to read and write data are well explained, just the part about the memory barrier is a little bit short. It is not clear why the CPU reordering of memory accesses matters to what looks like function calls. Those function calls may be macros, but this is not explained in the text. Some small examples of when to use the barriers instead of an abstract description would also have been nice at this point.

Chap­ter 11 is sim­i­lar to chap­ter 10, just that a PCI bus dri­ver is dis­cussed instead of an ISA bus dri­ver. The dif­fer­ences are not that big, but important.

In chapter 12 it is explained how to do DMA in a driver. This part is not easy to understand. I would have wanted to have more examples and explanations of the DMA tag and DMA map parts. I am also surprised to see different supported architectures for the flags BUS_DMA_COHERENT and BUS_DMA_NOCACHE for different functions. Either this means FreeBSD is not coherent in those parts, or it is a bug in the book, or it is supposed to be like this and the reasons are not explained in the book. As there is no explicit note about this, it probably leads to confusion of readers who pay enough attention here. It would also have been nice to have an explanation of when to use those flags which are only implemented on a subset of the architectures FreeBSD supports. Anyway, the explanations give enough information to understand what is going on and to be able to have a look at other device drivers for real-life examples and to get a deeper understanding of this topic.

Disk drivers and block I/O (bio) requests are described in chapter 13. With this chapter I have a little problem. The author used the word “undefined” in several places where I as a non-native speaker would have used “not set” or “set to 0”. The word “undefined” implies for me that there may be garbage inside, whereas from a technical point of view I can not imagine that some random value in those places would have the desired result. In my opinion each such place is obvious, so I do not expect that an experienced programmer would lose time/hairs/sanity over it, but inexperienced programmers who try to assemble the corresponding structures on the (uninitialized) heap (for whatever reason) may struggle with this.

Chapter 14 is about the CAM layer. While the previous chapter showed how to write a driver for a disk device, chapter 14 gives an overview of how to attach an HBA to the CAM layer. It is just an overview; it looks like CAM needs a book of its own to be fully described. The simple (and most important) cases are described, with the hardware-specific parts being an exercise for the person writing the device driver. I have the impression it gives enough details to let someone with hardware (or protocol), and more importantly documentation for this device, start writing a driver.

It would have been nice if chap­ter 13 and 14 would have had a lit­tle schematic which describes at which level of the kernel-subsystems the cor­re­spond­ing dri­ver sits. And while I am at it, a schematic with all the dri­ver com­po­nents dis­cussed in this book at the begin­ning as an overview, or in the end as an annex, would be great too.

An overview of USB dri­vers is given in chap­ter 15 with the USB printer dri­ver as an exam­ple for the expla­na­tion of the USB dri­ver inter­faces. If USB would not be as com­plex as it is, it would be a nice chap­ter to start driver-writing exper­i­ments (due to the avail­abil­ity of var­i­ous USB devices). Well… bad luck for curi­ous peo­ple. BTW, the author gives point­ers to the offi­cial USB docs, so if you are really curi­ous, feel free to go ahead. :)

Chap­ter 16 is the first part about net­work dri­vers. It deals with ifnet (e.g. stuff needed for ifcon­fig), ifme­dia (sim­pli­fied: which kind of cable and speed is sup­ported), mbufs and MSI(-X). As in other chap­ters before, a lit­tle overview and a lit­tle pic­ture in the begin­ning would have been nice.

Finally, in chapter 17, the packet reception and transmission of network drivers is described. Large example code is broken up into several pieces here, for easier discussion of related information.

One thing I miss after reaching the end of the book is a discussion of sound drivers. And this is surely not the only type of driver which is not discussed; I can come up with crypto, firewire, gpio, watchdog, smb and iic devices within a few seconds. While I think that it is much easier to understand all those drivers now after reading the book, it would have been nice to have at least a little overview of other driver types and maybe even a short description of their driver methods.

Con­clu­sion: As I wrote already in the begin­ning, the book is not per­fect, but it is good. While I have not writ­ten a device dri­ver for FreeBSD, the book pro­vided enough insight to be able to write one and to under­stand exist­ing dri­vers. I really hope there will be a sec­ond edi­tion which addresses the minor issues I had while read­ing it to make it a per­fect book.

May
31

Free DLNA server which works good with my Sony BRAVIA TV

In several previous posts I wrote about my quest for the right source format to stream video to my Sony BRAVIA TV (built in 2009). Last weekend I finally found something which satisfies me.

What I found was serviio, a free UPnP-AV (DLNA) server. It is written in Java and runs on Windows, Linux and FreeBSD (it is not listed on the website, but we have a not-so-up-to-date version in the ports tree). If necessary it transcodes the input to an appropriate format for the DLNA renderer (in my case the TV).

I tested it with my slow Net­book, so that I was able to see with which input for­mat it will just remux the input con­tainer to a MPEG trans­port stream, and which input for­mat would be really re-encoded to a for­mat the TV understands.

The bot­tom line of the tests is, that I just need to use a sup­ported con­tainer (like MKV or MP4 or AVI) with H.264-encoded video (e.g. encoded by x264) and AC3 audio.
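As a small sketch, the result of my tests can be written down as a predicate. This only encodes what I observed with my TV. It is not serviio's actual decision logic, and the function name, the set of containers and the codec identifiers are assumptions chosen for the illustration.

```python
# Containers I found to be passed through (remuxed) for my TV; an assumption
# based on my own tests, not an official list.
REMUX_CONTAINERS = {"mkv", "mp4", "avi"}


def needs_reencode(container: str, video_codec: str, audio_codec: str) -> bool:
    """Return True if, by my observations, the file would be really re-encoded.

    A file in a supported container with H.264 video and AC3 audio is
    expected to be only remuxed into an MPEG transport stream.
    """
    remux_only = (
        container.lower() in REMUX_CONTAINERS
        and video_codec.lower() == "h264"
        and audio_codec.lower() == "ac3"
    )
    return not remux_only


print(needs_reencode("mkv", "h264", "ac3"))  # False: remux only
print(needs_reencode("mkv", "h264", "dts"))  # True: audio is not AC3
```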

The TV is able to choose between several audio streams, but I have not tested if serviio is able to serve files with multiple audio streams (my wife has a different mother language than me, so it is interesting for us to have multiple audio streams for a movie), and I do not know if DLNA supports something like this.

Now I just have to replace minidlna (which only works well with my TV for MP3s and pictures) with serviio on my FreeBSD file server and we can forget about the disk-juggling.

Sep
30

Forc­ing a route in Solaris?

I have a lit­tle prob­lem find­ing a clean solu­tion to the fol­low­ing problem.

A machine with two net­work inter­faces and no default route. The first inter­face gets an IP at boot time and the cor­re­spond­ing sta­tic route is inserted dur­ing boot into the rout­ing table with­out prob­lems. The sec­ond inter­face only gets an IP address when the shared-IP zones on the machine are started, dur­ing boot the inter­face is plumbed but with­out any address. The net­works on those inter­faces are not con­nected and the machine is not a gate­way (this means we have a machine–admin­is­tra­tion net­work and a production-network). The sta­tic routes we want to have for the addresses of the zones are not added to the rout­ing table, because the next hop is not reach­able at the time the routing-setup is done. As soon as the zones are up (and the inter­face gets an IP), a re-run of the routing-setup adds the miss­ing sta­tic routes.
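The behaviour can be modelled with a few lines of Python. This is a simplified sketch under the assumption that "next hop reachable" means the gateway lies in a subnet of an address which is currently configured on an interface. It is not how Solaris implements it, and all addresses, routes and names below are made-up examples.

```python
import ipaddress


def installable_routes(routes, interface_addrs):
    """Split static routes into (installable, deferred).

    routes: list of (destination, gateway) strings.
    interface_addrs: addresses currently configured, e.g. "192.168.1.10/24".
    A route is treated as installable only if its gateway is inside one of
    the directly connected networks (simplifying assumption).
    """
    connected = [ipaddress.ip_interface(a).network for a in interface_addrs]
    installable, deferred = [], []
    for dest, gateway in routes:
        gw = ipaddress.ip_address(gateway)
        if any(gw in net for net in connected):
            installable.append((dest, gateway))
        else:
            deferred.append((dest, gateway))
    return installable, deferred


# Hypothetical example: administration network plus a /28 production network.
ROUTES = [
    ("172.16.0.0/16", "192.168.1.1"),  # via the administration network
    ("172.17.0.0/16", "10.0.0.1"),     # via the production network
]

# At boot only the first interface has an address.
print(installable_routes(ROUTES, ["192.168.1.10/24"]))
# After the zones are up the second interface has an address too.
print(installable_routes(ROUTES, ["192.168.1.10/24", "10.0.0.5/28"]))
```

In the first call the production route ends up in the deferred list, in the second call both routes are installable. This corresponds to the re-run of the routing-setup adding the missing routes.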

Unfor­tu­nately I can not tell Solaris to keep the sta­tic route even if the next hop is not reach­able ATM (at least I have not found an option to the route com­mand which does this).

One solu­tion to this prob­lem would be to add an address at boot to the inter­face which does not have an address at boot-time ATM (prob­a­bly with the dep­re­cated flag set). The prob­lem is, that this sub­net (/28) has not enough free addresses any­more, so this is not an option.

Another solu­tion is to use a script which re-runs the routing-setup after the zones are started. This is a prag­matic solu­tion, but not a clean solution.

As I understand the in.routed man-page, in.routed is not an option with the default config, because the machine shall not route between the networks, and shall not change the routing based upon RIP messages from other machines. Unfortunately I do not know enough about it to be sure, and I do not get the time to play around with this. I have seen some interesting options regarding this in the man-page, but playing around with this and sniffing the network to see what happens is not an option ATM. Anyone with a config/tutorial for this “do not broadcast anything, do not accept anything from outside”-case (if possible)?

Apr
27

ADSL RAM … finally aban­doned (but with good news)

As I already wrote, the­o­ret­i­cally ADSL RAM is avail­able at my place. The analy­sis of the sit­u­a­tion revealed first that the ISP side of my line uses out­dated hard­ware. After the tech­ni­cian I know unof­fi­cially took care about it (remotely switch­ing me to a dif­fer­ent port), I have seen an imme­di­ate improve­ment of the sig­nal to noise ratio. It is about 20 dB better.

Unfor­tu­nately this was not enough to be able to switch to the rate adap­tive mode. Accord­ing to their data­base the line length allows to give me 1.5 MBit. My line is run­ning already at 2 MBit and my ADSL modem tells me it could do 8 MBit, so I dis­agree a bit with their database.

As the technician agrees with me, the next step would be to temporarily move my house by some hundred meters towards the ISP endpoint of the line. Unfortunately the higher management seems to have some business ideas for our region (FTTT, Fiber To The Town (which means we will probably get 16 MBit via ADSL) … but maybe even FTTH), so they have been monitoring the database for such changes for a while now.

I have the impression they want to prevent such changes to the database because they think that if people get 2 MBit (instead of nothing, large parts of a town nearby do not even have the slowest ADSL connection) or 8 MBit (instead of 2 MBit), they are not interested in getting FTTH (or 16 MBit). Together with their IPTV initiative I do not really understand it. To get their IPTV, you need to have at least an 8 MBit line. With 8 MBit you can only cover one TV at SD resolution (at least with their IPTV offer); if you want HD resolution, you need to switch to their VDSL stuff (which is not available in our town). What people are doing currently is to switch to a cable provider where they can get about 32 MBit (I do not switch, switching is a risky action here, I rather stay with a slow connection than to have no connection at all for some months). With 32 MBit (and TV) people have less of a need to switch to fiber (and pay 150 EUR for the work to get fiber into the house) than with 2 MBit or nothing.

The final outcome is that the technician I know does not want to ask someone to play with the database to move my house temporarily (which I can understand). The good part of this news is that I may get more than 8 MBit in the not so distant future (the current planning is to finish the FTTT work by autumn).
