EMC^2/Legato NetWorker 7.5.1.4 status

The update to 7.5.1.4 went fine; no major problems were encountered. So far we have not seen any regressions. The whole system feels a little more stable (no restarts have been necessary so far, whereas before some were necessary from time to time). We still have to test all our problem cases:

  • restart the NW server directly after deleting a client with index entries (a manual copy of /nsr is needed beforehand, in case the mediadb corruption bug is not fixed as promised; see the sketch after this list)
  • shut down a storage node to test whether the NW server still crashes in this case
  • start with an empty mediadb but populated clients (empty /nsr/mm, but untouched /nsr/res) and scan some tapes to check whether “shadow clients” still get created instead of the index of the correct client being populated; “shadow client” is my term for a client entry which refers to an existing client but gets newly created during scanning with a new client ID and a name of the form “~<original-name>-<number>” (a quick check for this is sketched below)
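
For the first case, the “manual copy of /nsr” could look like the following; a minimal Python sketch, assuming the NetWorker daemons are stopped (or at least quiesced) first so the mediadb is not captured mid-write. The destination directory and archive naming are my own choices, not anything NetWorker prescribes.

```python
#!/usr/bin/env python3
"""Snapshot /nsr before the destructive client-deletion test."""
import tarfile
import time
from pathlib import Path

NSR_DIR = Path("/nsr")            # NetWorker server state (mediadb, res, client indexes)
BACKUP_DIR = Path("/var/tmp")     # hypothetical destination, adjust as needed

def snapshot_nsr() -> Path:
    """Create a timestamped tarball of /nsr so mediadb corruption can be rolled back."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = BACKUP_DIR / f"nsr-snapshot-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(str(NSR_DIR), arcname="nsr")
    return archive

if __name__ == "__main__":
    print(f"wrote {snapshot_nsr()}")
```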

The first two are supposed to be fixed; the last one may not be.
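
For the last case, a coarse way to spot shadow clients after a scanner run might be to grep the mediadb’s client list for the “~<original-name>-<number>” pattern. A minimal sketch using NetWorker’s mminfo; the assumption that the first output line is a column header (and the regex itself) are mine, so treat this as a starting point.

```python
#!/usr/bin/env python3
"""Look for "shadow clients" (names like ~<original>-<n>) in the mediadb."""
import re
import subprocess

SHADOW = re.compile(r"^~.+-\d+$")   # name pattern from the post: ~<original-name>-<number>

def mediadb_clients() -> set[str]:
    """Client names known to the mediadb, via 'mminfo -a -r client'."""
    out = subprocess.run(["mminfo", "-a", "-r", "client"],
                         capture_output=True, text=True)
    # Skip the report header line; mminfo exits non-zero if nothing matches,
    # in which case stdout is simply empty here.
    return {line.strip() for line in out.stdout.splitlines()[1:] if line.strip()}

if __name__ == "__main__":
    shadows = sorted(c for c in mediadb_clients() if SHADOW.match(c))
    print("\n".join(shadows) if shadows else "no shadow clients in the mediadb")
```

This only compares names; a full check would also have to compare client IDs, e.g. via nsradmin.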

Not fixed (according to support) is the problem of needing a restart of the NW server when moving a tape library from one storage node to another. It also seems that our problem with manual cloning of save sets is not solved: there are still some clone processes which never get out of the “server busy” loop, no matter how idle the NW server is. In this case nsrclone can be seen waiting in nanosleep (use pstack or DTrace to see it; a sketch of such a check follows below). The strange thing is that a save set which “fails” with this behavior always fails this way. We need to take a deeper look to see whether there are similarities between such save sets, and differences from save sets which can be cloned without problems.
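
To see the nanosleep wait without attaching a debugger, something like this could be used; a minimal sketch, assuming Solaris-style pgrep(1) and pstack(1) are on the PATH. Note that a single pstack snapshot showing nanosleep does not prove a process is stuck in the “server busy” loop; it is just the symptom we observed.

```python
#!/usr/bin/env python3
"""Report which nsrclone processes are currently waiting in nanosleep."""
import subprocess

def pids_of(name: str) -> list[int]:
    """PIDs of all processes whose executable name matches exactly."""
    out = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
    return [int(p) for p in out.stdout.split()]

def waits_in_nanosleep(pid: int) -> bool:
    """True if pstack shows nanosleep somewhere in the process's stack."""
    out = subprocess.run(["pstack", str(pid)], capture_output=True, text=True)
    return "nanosleep" in out.stdout

if __name__ == "__main__":
    pids = pids_of("nsrclone")
    if not pids:
        print("no nsrclone processes running")
    for pid in pids:
        state = "waiting in nanosleep" if waits_in_nanosleep(pid) else "not in nanosleep"
        print(f"nsrclone pid {pid}: {state}")
```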
