No phone @Work for a week…

For a week now we (7 to 10 people) have not been able to make or receive phone calls with the phones at work. Luckily this only affects the remote half of the team which works for this client; the onsite people do not have this problem.

As bad as this is from a client-relations point of view, I have to say it makes things quiet and relaxing here ATM… We get requests via email or our ticket system (or a coworker passes on some info via mobile phone), and we can handle them without much disturbance.

Before you think badly of the company I work for… we are just a subcontractor; the phone lines are not handled by us (but I was told that the big boss is now looking at the issue).

The running gag of the week is imitating the error tone of the phone.

EMC^2/Legato Networker 7.5.1.4 tests

Regarding our last problems with NW:

  • OK: the “restart NW-server directly after deleting a client with index entries” crash is fixed
  • Mostly OK: shutting down a storage node does not crash the NW-server anymore… most of the time (sometimes there is still some strange behavior in this regard; we do not have enough evidence, but there may still be some sleeping dragons)
  • ?: we have not yet checked the disaster recovery part
  • NOK: in some cases the post-cmd is still run one minute after the pre-cmd. Maybe this is related to a session/save-set which has not yet started although the pre-cmd has already run. If so, this could also affect the case where there is more than one minute of delay between the end of one session/save-set on a machine and the start of another session/save-set on the same machine (support is investigating)
  • NOK: some Oracle-RMAN backups (custom save command, perl script) show a running session in the NW monitoring and some do not; after the backup, mminfo sometimes lists the group of an RMAN save-set and sometimes not (for the same client). This is under investigation by support
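
To make the post-cmd suspicion a bit more concrete, here is a minimal Python sketch of the hypothesis. It assumes (this is our guess, not confirmed behavior) that the post-cmd fires whenever no session/save-set of a client has been active for more than one minute after the pre-cmd. The function name, the one-minute constant and the input format are made up for illustration; the timestamps would have to be collected by hand from the logs.

```python
from datetime import timedelta

# Assumption: the post-cmd fires once no session has been active for this long.
IDLE_LIMIT = timedelta(minutes=1)


def risky_gaps(pre_cmd_time, sessions, idle_limit=IDLE_LIMIT):
    """Return the idle windows that are longer than idle_limit.

    sessions is a list of (start, end) datetime pairs for one client.
    The first window runs from the pre-cmd to the first session start;
    later windows run from the latest session end seen so far to the
    next session start.
    """
    gaps = []
    busy_until = pre_cmd_time
    for start, end in sorted(sessions):
        if start - busy_until > idle_limit:
            gaps.append((busy_until, start))
        # Overlapping sessions must not move busy_until backwards.
        busy_until = max(busy_until, end)
    return gaps
```

If the hypothesis holds, every window returned here would be a candidate for a post-cmd which runs too early; an empty list for an affected client would speak against it.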

So, for us 7.5.1.4 is still a beta version.

Funny discussion @Work

Warning, some people need to be in a special mood to appreciate the following! Obviously, we were in such a special mood…

My question: Shall I take care of X?

Coworker's answer: Yes, you can!

My response after taking care of it: Yes, I can! Yeah! Yes, I did! Yeah!

Coworker's response: Maybe I can make a T-shirt like Obama…

What I think about it: Yes, you can!

EMC^2/Legato Networker 7.5.1.4 status

The update to 7.5.1.4 went fine. No major problems encountered. So far we have not seen any regressions. The complete system feels a little bit more stable (no restarts necessary so far; before, some were necessary from time to time). We still have to test all our problem cases:

  • restart the NW-server directly after deleting a client with index entries (a manual copy of /nsr is needed beforehand, in case the mediadb corruption bug is not fixed as promised)
  • shut down a storage node to test if the NW-server still crashes in this case
  • start with an empty mediadb but populated clients (empty /nsr/mm, but untouched /nsr/res) and scan some tapes to check if “shadow clients” (my term for clients which have the same client ID but get newly created during the scanning with a new client ID and a name of “~<original-name>-<number>”) still get created instead of populating the index of the correct client
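
For the shadow-client check, a small helper could spot such names in a list of client names after the scan. This is only a sketch in Python: it relies on nothing but the “~<original-name>-<number>” naming pattern described above, the function names are made up, and how the list of client names is obtained is left open.

```python
import re

# Pattern taken from the description above: "~<original-name>-<number>".
SHADOW_NAME = re.compile(r"^~(?P<original>.+)-(?P<number>\d+)$")


def split_shadow_name(client_name):
    """Return (original_name, number) for a shadow client name, else None."""
    match = SHADOW_NAME.match(client_name)
    if match is None:
        return None
    return match.group("original"), int(match.group("number"))


def find_shadow_clients(client_names):
    """Map each original client name to the shadow names derived from it."""
    shadows = {}
    for name in client_names:
        parsed = split_shadow_name(name)
        if parsed is not None:
            shadows.setdefault(parsed[0], []).append(name)
    return shadows
```

An empty result after scanning the tapes would suggest that the index of the correct client got populated; any entry means shadow clients are still being created.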

The first two are supposed to be fixed; the last one may not be.

Not fixed (according to support) is the problem of needing a restart of the NW-server when moving a tape library from one storage node to another storage node.

It also seems that our problem with the manual cloning of save sets is not solved. There are still some clone processes which do not get out of the “server busy” loop, no matter how idle the NW-server is. In this case it can be seen that nsrclone is waiting in nanosleep (use pstack or dtrace to see it). The strange thing is that a save set which is “failing” with such behavior will always cause this behavior. We need to have a deeper look to see if we can find similarities between such save sets, and differences from save sets which can be cloned without problems.
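
To check a suspicious nsrclone process without staring at stack dumps by hand, something like the following Python sketch could be used. It assumes that pstack is in the PATH and takes the PID as its argument, and it does nothing smarter than look for the string "nanosleep" in the output; the function names are made up.

```python
import subprocess


def stack_mentions_nanosleep(stack_text):
    """Return True if any line of a stack dump mentions nanosleep."""
    return any("nanosleep" in line for line in stack_text.splitlines())


def nsrclone_is_sleeping(pid):
    """Run pstack on a PID and report whether nanosleep shows up in the stack."""
    result = subprocess.run(
        ["pstack", str(pid)], capture_output=True, text=True, check=True
    )
    return stack_mentions_nanosleep(result.stdout)
```

A single snapshot only shows one moment, so a process would have to be sampled several times before concluding that it is really stuck in the “server busy” loop.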