The update to 126.96.36.199 went fine, with no major problems. So far we have not seen any regressions. The complete system feels a little more stable: no restarts have been necessary so far, whereas before some were necessary from time to time. We still have to test all our problem cases:
- restart the NW-server directly after deleting a client with index entries (a manual copy of /nsr is needed beforehand, in case the mediadb corruption bug is not fixed as promised)
- shut down a storage node to test whether the NW-server still crashes in this case
- start with an empty mediadb but populated clients (empty /nsr/mm, but untouched /nsr/res) and scan some tapes to check whether “shadow clients” still get created instead of the index of the correct client being populated (“shadow clients” is my term for clients which have the same client ID as an existing client but get newly created during scanning with a new client ID and a name of “~<original-name>-<number>”)
The first two are supposed to be fixed; the last one may not be.
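The manual copy of /nsr mentioned in the first test case could be scripted roughly as follows. This is only a minimal sketch: the function name `snapshot_tree` and the timestamped directory naming are my own choices, and it assumes that the NetWorker daemons are stopped while the copy runs and that the target file system has enough free space.

```python
import shutil
import time
from pathlib import Path


def snapshot_tree(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped directory below `backup_root`.

    Hypothetical helper, not part of NetWorker. Returns the path of the copy.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}-{stamp}"
    # symlinks=True copies symbolic links as links instead of following them.
    # copytree creates missing parent directories and fails if `target` exists.
    shutil.copytree(source, target, symlinks=True)
    return target
```

Calling `snapshot_tree(Path("/nsr"), Path("/backup"))` would then produce a directory such as /backup/nsr-<timestamp>, where /backup is just a placeholder for wherever the copy should live.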
Not fixed (according to support) is the need to restart the NW-server when moving a tape library from one storage node to another. It also seems that our problem with the manual cloning of save sets is not solved. There are still some clone processes which never get out of the “server busy” loop, no matter how idle the NW-server is. In this case nsrclone can be seen waiting in nanosleep (use pstack or dtrace to see it). The strange thing is that a save set which “fails” with this behavior will always cause it. We need to take a deeper look to see whether we can find similarities among such save sets and differences from save sets which can be cloned without problems.
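One way to approach this comparison, once the attributes of each save set have been collected by hand or by a script, is to look for attribute values that all failing save sets share and that no cloneable save set has. The sketch below assumes each save set is represented as a plain dictionary of strings; the attribute names and values in the sample data are made up for illustration and are not real output from our server.

```python
def common_attributes(save_sets):
    """Return the attribute/value pairs shared by every save set in the list."""
    if not save_sets:
        return {}
    shared = set(save_sets[0].items())
    for record in save_sets[1:]:
        shared &= set(record.items())
    return dict(shared)


def distinguishing_attributes(failing, working):
    """Return pairs common to all failing save sets that no working one has."""
    shared = common_attributes(failing)
    return {
        key: value
        for key, value in shared.items()
        if all(record.get(key) != value for record in working)
    }


# Made-up sample data: two save sets that hang in the clone, one that clones.
failing = [
    {"client": "alpha", "level": "full", "pool": "A"},
    {"client": "beta", "level": "full", "pool": "A"},
]
working = [
    {"client": "alpha", "level": "incr", "pool": "A"},
]

print(distinguishing_attributes(failing, working))  # {'level': 'full'}
```

With only a handful of save sets such a result is at best a hint about where to look next, not proof of a cause.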