EMC^2/Legato Net­work­er 7.5.1.4 tests

Regard­ing our last prob­lems with NW:

  • OK: the “restart NW-server direct­ly after delet­ing a client with index entries”-crash is fixed
  • Mostly OK: shutting down a storage node does not crash the NW-server anymore… most of the time (sometimes we still see strange behavior in this regard; we do not have enough evidence yet, but there may still be some sleeping dragons)
  • ?: we did not yet check the disaster recovery part
  • NOK: the post-cmd is still run one minute after the pre-cmd in some cases. Maybe this happens when the pre-cmd has already run but the session/save-set has not yet started. If so, it could also affect the case where more than one minute passes between the end of one session/save-set on a machine and the start of another session/save-set on the same machine (the support is investigating)
  • NOK: some Oracle-RMAN backups (custom save command, perl script) show a running session in the NW-monitoring and some do not. After the backup, mminfo sometimes lists the group of an RMAN-save-set and sometimes not (for the same client). This is under investigation by the support
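To check the timing hypothesis from the post-cmd item against real data, one could look for idle gaps of more than one minute between consecutive sessions of the same client. The following is only a sketch: the function name, the tuple layout and the 60 second threshold are my assumptions for illustration, and the session times would have to come from your own NW reports.

```python
from datetime import datetime, timedelta


def find_long_gaps(sessions, threshold=timedelta(seconds=60)):
    """Return (previous_end, next_start) pairs where the client was idle
    for longer than threshold.

    sessions: iterable of (start, end) datetime pairs for ONE client.
    """
    gaps = []
    latest_end = None
    for start, end in sorted(sessions):
        # A gap exists only if nothing was running since the latest end.
        if latest_end is not None and start - latest_end > threshold:
            gaps.append((latest_end, start))
        # Sessions may overlap, so remember the latest end seen so far.
        if latest_end is None or end > latest_end:
            latest_end = end
    return gaps
```

Every pair returned marks a window in which, if the hypothesis is right, the post-cmd could have been run too early.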

So, for us 7.5.1.4 is still a beta version.

Ideas list page created

Yesterday I created an ideas list page where I intend to list random ideas or idea-like thoughts I have. It already has an entry regarding automatic translation of food recipes and some entries with idea-like thoughts regarding alternative energy. If someone reads them and decides to pursue one, I would like to get some feedback to see if my idea was crap or not.

WP plu­g­ins and PHP safe_mode

Obviously a lot of WP plugin authors do not check if their plugin is PHP safe_mode/open_basedir compatible. Yes, I know, it is deprecated and does not offer 100% safety, but it is at least an additional road-block in some cases and may prevent some malicious behavior… If I can choose between a 100% break-in possibility and a <100% break-in possibility, I choose the latter.

I think most of them also do not check with suhosin. Most of the time they also fail to list other PHP extension requirements; they just assume you have a full install.

  • quickstats wants the PHP ctype extension and does not seem to play well with sql.safe_mode, while the rest of WP does not seem to have an obvious problem with it
  • wp-stats-dashboard wants the PHP curl and json extensions (curl does not play well with safe_mode or open_basedir => needs to be disabled) and needs suhosin.executor.include.max_traversal set to 6; it still does not work 100% correctly: I deleted the cache directory contents to let it recreate the stats, but it still does not display as many visits as I can see in the stats on the postings page
  • bot-tracker wants the PHP ses­sion extension
  • broken-link-checker tries to write to /var/tmp/ (safe_mode/open_basedir incompatible)
  • one-time-password does not play well with safe_mode/open_basedir
  • smartlink­er tells me that the vari­able cook­ieString is not defined
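Why a plugin that writes to /var/tmp/ breaks under open_basedir can be illustrated with a small model of the check. This is a deliberately simplified sketch in Python, not PHP's actual implementation: it treats every entry of the colon-separated list as a directory, ignores symlinks, and the directory list and file names in the example are made up.

```python
import posixpath


def allowed_by_basedir(path, open_basedir):
    """Simplified model: True if the absolute path lies inside one of the
    directories listed in the colon-separated open_basedir string."""
    target = posixpath.normpath(path)
    for entry in open_basedir.split(":"):
        if not entry:
            continue
        base = posixpath.normpath(entry)
        # The path is inside base if base is the common ancestor of both.
        if posixpath.commonpath([target, base]) == base:
            return True
    return False


if __name__ == "__main__":
    basedir = "/usr/local/www/wordpress:/tmp"  # hypothetical setting
    for candidate in (
        "/usr/local/www/wordpress/wp-content/cache/x",  # hypothetical file
        "/var/tmp/links.tmp",                           # hypothetical file
    ):
        print(candidate, allowed_by_basedir(candidate, basedir))
```

A plugin which hard-codes a temporary directory outside the allowed list ends up in the second case, no matter how the rest of WP is configured.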

DNS prob­lem (DomainKey TXT entry)

The TXT entry outgoing-alex._domainkey.leidinger.net, which worked for a long time, suddenly stopped being delivered by named (thanks to Henri Hennebert for the HEADS-UP). Every other query I try works so far; it is really just this one entry (resolve timeout). I notified the responsible person, investigation is ongoing.

This means that any verification of my outgoing mail will not work, as the data necessary to verify the signature is not available. 🙁
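To reproduce such a timeout independently of the local resolver, the TXT query can be sent directly to a name server with a short timeout. The following is a minimal sketch using only the Python standard library; the function names, the query ID and the two second timeout are my own choices, the server address (IPv4 assumed) is something you have to supply, and the response is returned raw without being parsed.

```python
import socket
import struct

TXT = 16  # DNS record type TXT
IN = 1    # DNS class IN


def build_txt_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for the TXT records of name."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    # Each label is encoded as a length byte followed by the label itself.
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", TXT, IN)


def query_txt(server, name, timeout=2.0):
    """Send the query via UDP; return the raw answer, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_txt_query(name), (server, 53))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
        return data
```

Calling query_txt() with the address of the name server and "outgoing-alex._domainkey.leidinger.net" and getting None back would correspond to the resolve timeout described above, while other names queried the same way should return data.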

EMC^2/Legato Net­work­er 7.5.1.4 status

The update to 7.5.1.4 went fine. No major problems encountered. So far we did not see any regressions. The complete system feels a little bit more stable (no restarts necessary so far; before, some were necessary from time to time). We still have to test all our problem cases:

  • restart NW-server directly after deleting a client with index entries (manual copy of /nsr needed beforehand, in case the mediadb corruption bug is not fixed as promised)
  • shut down a storage node to test if the NW-server still crashes in this case
  • start with an emp­ty medi­adb but pop­u­lat­ed clients (emp­ty /nsr/mm, but untouched /nsr/res) and scan some tapes to check if “shad­ow clients” (my term for clients which have the same client ID but get new­ly cre­at­ed dur­ing the scan­ning with a new client ID and a name of “~<original-name>-<number>”) still get cre­at­ed instead of pop­u­lat­ing the index of the cor­rect client

The first two are supposed to be fixed; the last one may not be fixed.
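After such a scan, the shadow clients can be spotted by their name alone. A small sketch that matches the “~<original-name>-<number>” pattern described above; the function name is hypothetical, the client names in the example are made up, and how you obtain the list of client names from NetWorker is left open.

```python
import re

# "~", then the original client name, then "-" and a number.
SHADOW_RE = re.compile(r"^~(?P<name>.+)-(?P<number>\d+)$")


def split_shadow_name(client_name):
    """Return (original_name, number) for a shadow client name, else None."""
    match = SHADOW_RE.match(client_name)
    if match is None:
        return None
    return match.group("name"), int(match.group("number"))


if __name__ == "__main__":
    # Made-up client names, for illustration only.
    for name in ("db01.example.com", "~db01.example.com-1"):
        print(name, "->", split_shadow_name(name))
```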

Not fixed (according to the support) is the problem of needing a restart of the NW-server when moving a tape library from one storage node to another storage node. It also seems that our problem with the manual cloning of save sets is not solved. There are still some clone processes which do not get out of the “server busy” loop, no matter how idle the NW-server is. In this case it can be seen that nsrclone is waiting in nanosleep (use pstack or dtrace to see it). The strange thing is that a save set which is “failing” with such behavior will always cause this behavior. We need to have a deeper look to see if we find similarities between such save sets, and differences to save sets which can be cloned without problems.
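The pstack check can be scripted when several clone processes have to be watched. A rough sketch, assuming pstack is in the PATH and is called as `pstack <pid>`; the helper names are mine, and the exact frame format of the output depends on the platform, so the code only looks for the symbol name as a substring.

```python
import subprocess


def frames_with_symbol(stack_text, symbol="nanosleep"):
    """Return the lines of a stack dump that mention symbol."""
    return [line.strip() for line in stack_text.splitlines() if symbol in line]


def is_sleeping(pid, symbol="nanosleep"):
    """Run pstack on pid and report whether any frame mentions symbol."""
    result = subprocess.run(
        ["pstack", str(pid)],
        capture_output=True,
        text=True,
        check=True,  # raise if pstack fails, e.g. because the PID is gone
    )
    return bool(frames_with_symbol(result.stdout, symbol))
```

Running is_sleeping() repeatedly for the PID of a stuck nsrclone and always getting True would match the “server busy” loop described above.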