Alexander Leidinger

Just another weblog


EMC^2/Legato Networker status

The update went fine. No major problems encountered. So far we have not seen any regressions. The complete system feels a little bit more stable (no restarts have been necessary so far; before, some were necessary from time to time). We still have to test all our problem cases:

  • restart the NW-server directly after deleting a client with index entries (a manual copy of /nsr is needed beforehand, in case the mediadb corruption bug is not fixed as promised)
  • shut down a storage node to test if the NW-server still crashes in this case
  • start with an empty mediadb but populated clients (empty /nsr/mm, but untouched /nsr/res) and scan some tapes to check if “shadow clients” (my term for clients which have the same client ID but get newly created during the scanning with a new client ID and a name of “~<original-name>-<number>”) still get created instead of populating the index of the correct client

The first two are supposed to be fixed; the last one may not be fixed.
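
For the third test case, the shadow clients can be picked out of a list of client names by their name pattern. Here is a minimal sketch in Python, assuming the client names have already been exported into a plain list by whatever means. The helper name `find_shadow_clients` and the sample names are made up for illustration; nothing here is a NetWorker API.

```python
import re

# Name pattern of a "shadow client": "~<original-name>-<number>".
SHADOW_RE = re.compile(r"^~(?P<original>.+)-(?P<number>\d+)$")


def find_shadow_clients(client_names):
    """Map each original client name to the shadow names found for it."""
    shadows = {}
    for name in client_names:
        match = SHADOW_RE.match(name)
        if match:
            shadows.setdefault(match.group("original"), []).append(name)
    return shadows


if __name__ == "__main__":
    # Hypothetical client names, only to show the call.
    names = ["alpha", "~alpha-1", "db-server", "~db-server-2", "beta"]
    for original, found in find_shadow_clients(names).items():
        print(original, "->", ", ".join(found))
```

An empty result after the tape scan would suggest that the index entries went to the correct clients; any hit would be a candidate for a closer look.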

Not fixed (according to the support) is the problem of needing a restart of the NW-server when moving a tape library from one storage node to another storage node. It also seems that our problem with the manual cloning of save sets is not solved. There are still some clone processes which do not get out of the “server busy” loop, no matter how idle the NW-server is. In this case it can be seen that nsrclone is waiting in nanosleep (use pstack or dtrace to see it). The strange thing is that a save set which is “failing” with such behavior will always cause this behavior. We need to have a deeper look to see if we find similarities between such save sets and differences to save sets which can be cloned without problems.
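
The comparison could be done mechanically once the attributes of the save sets are collected somewhere. The following is only a sketch under assumptions: each save set is represented as a plain dictionary of attribute names to values, and the attribute names in the example (`client`, `level`, `pool`) are placeholders, not a statement about which attributes actually matter. It reports the attribute/value pairs which all “failing” save sets share and which do not show up on any save set that clones fine.

```python
def attribute_split(failing, working):
    """Return attribute/value pairs shared by every failing save set
    and seen on no working save set."""
    if not failing:
        return set()

    # Pairs common to all failing save sets.
    common = set(failing[0].items())
    for saveset in failing[1:]:
        common &= set(saveset.items())

    # Every pair that occurs on at least one working save set.
    seen_in_working = set()
    for saveset in working:
        seen_in_working |= set(saveset.items())

    return common - seen_in_working


if __name__ == "__main__":
    # Hypothetical data, only to show the call.
    failing = [
        {"client": "a", "level": "full", "pool": "X"},
        {"client": "b", "level": "full", "pool": "X"},
    ]
    working = [{"client": "a", "level": "incr", "pool": "X"}]
    print(attribute_split(failing, working))
```

A pair that survives this filter is only a hint where to look, not a proof of the cause, in particular with a small number of save sets.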
