The problem I see at work: a T4-2 with 3 guest LDOMs and virtualized disks and networks lost its complete network connectivity “out of the blue” once, and maybe loses connectivity sporadically directly after a cold boot. After a lot of discussion with Oracle, I have the impression that we have two problems here.
The first problem: total network loss of the machine (no zone, no guest LDOM and not even the primary LDOM was able to receive or send IP packets). This happened once, and we have no idea how to reproduce it. In the logs we see the message “[ID 920994 kern.warning] WARNING: vnetX: exceeded number of permitted handshake attempts (5) on channel xxx”. According to Oracle this is supposed to be fixed in 148677-01, which will come with Solaris 10u11. They suggested using a vsw interface instead of a vnet interface on the primary domain to at least lower the probability of this problem hitting us. They were not able to tell us how to reproduce the problem (it seems to be a race condition, at least that is my impression based upon the description from the Oracle engineer handling the SR). Only a reboot solved the problem. I was told we are the only client who reported this kind of problem; the patch for it is based upon an internal bug report from internal tests.
The second problem: after cold boots, sometimes some machines (not all) are not able to connect to an IP on the T4. A reboot helps, as does removing an interface from an aggregate and directly adding it again (see below for the system config). To try to reproduce the problem, we did a lot of warm reboots of the primary domain, and the problem never showed up. We did some cold reboots, and the problem showed up once.
In case someone else sees one of those problems on their machines too, please get in contact with me, so that we can see what we have in common, track this down further, and share info which may help in reproducing the problems.
- T4-2 with 4 HBAs and 8 NICs (4 * igb on-board, 4 * nxge on additional network card)
- 3 guest LDOMs and one io+control domain (both in the primary domain)
- the guest LDOMs use SAN disks over the 4 HBAs
- the primary domain uses a mirrored zpool on SSDs
- 5 vswitches in the hypervisor
- 4 aggregates (aggr1 to aggr4 with L2-policy), each one with one igb and one nxge NIC
- each aggregate is connected to a separate vswitch (the 5th vswitch is for machine-internal communication)
- each guest LDOM has three vnets, each vnet connected to a vswitch (1 guest LDOM has aggr1+2 only for zones (via vnets), 2 guest LDOMs have aggr3+4 only for zones (via vnets), and all LDOMs have aggr2+3 (via vnets) for global-zone communication; all LDOMs are additionally connected to the machine-internal-only vswitch via the 3rd vnet)
- the primary domain uses 2 vnets connected to the vswitches which are connected to aggr2 and aggr3 (for consistency with the other LDOMs on this machine) and has no zones
- this means each entity (primary domain, guest LDOMs and each zone) has two vnets, and those two vnets are configured in a link-based IPMP setup (vnet-linkprop=phys-state)
- each vnet has VLAN tagging configured in the hypervisor (with the zones being in different VLANs than the LDOMs)
The change proposed by Oracle is to replace the 2 vnet interfaces in the primary domain with 2 vsw interfaces (which means doing the VLAN tagging in the primary domain directly instead of in the vnet config). For IPMP to keep working, this requires vsw-linkprop=phys-state. We have two systems with the same setup; on one system we already changed this and it is working as before. As we don’t know how to reproduce the 1st problem, we don’t know whether the problem is fixed or not, or what the probability is of getting hit by it again.
Ideas / suggestions / info welcome.
Recently we had a strange performance problem at work. A web application had slow response times from time to time and users complained. We did not see any unusual CPU/mem/swap usage on any involved machine. I generated heat-maps from performance measurements and there were no obvious traces of slow behavior. We did not find any reason why the application should be slow for clients, but obviously it was.
Then someone mentioned two recent Apache DoS problems. Number one (the cookie hash issue) did not seem to be the cause: we did not see the huge CPU or memory consumption which we would expect to see with such an attack. The second one, the slow reads problem (no max connection duration timeout in Apache, which can be exploited with a small receive window for TCP), looked like it could be an issue. The slow read DoS problem can be detected by looking at the server-status page.
What you would see on the server-status page are a lot of worker threads in the ‘W’ (write data) state. This is supposed to be an indication of slow reads. We did see this.
As our site is behind a reverse proxy with some kind of IDS/IPS feature, we took the reverse proxy out of the picture to get a better view of who is doing what (we do not have X-Forwarded-For configured).
At this point we still noticed a lot of connections in the ‘W’ state from the rev-proxy. This was strange, it was not supposed to do this. After restarting the rev-proxy (while the clients went directly to the webservers) we still had those ‘W’ entries in the server-status. This was getting really strange. And to add to this, the duration of the ‘W’ state from the rev-proxy showed that this state had been active for several thousand seconds. Ugh. WTF?
Ok, next step: killing the offenders. First I verified in the list of connections in the server-status (extended-status is activated) that all worker threads with the rev-proxy connection of a given PID were in this strange state and no client request was active. Then I killed this particular PID. I wanted to repeat this until those strange connections were gone. Unfortunately I arrived at PIDs which were listed in the server-status (even after a refresh), but which did not exist in the OS. That is bad. Very bad.
So the next step was to move all clients away from one webserver, and then to reboot this webserver completely to be sure the entire system is in a known good state for future monitoring (the big hammer approach).
As we did not know if this strange state was due to some kind of mis-administration of the system or not, we decided to have the rev-proxy again in front of the webserver and to monitor the systems.
We survived about one and a half days. After that all worker threads on all webservers were in this state. DoS. At this point we were sure there was something malicious going on (some days later our management showed us a mail from a company which had offered security consulting 2 months before, to make sure we do not get hit by a DDoS during the holiday season… a coincidence?).
Next step: verification of missing security patches (unfortunately it is not us who decides which patches we apply to the systems). We noticed that the rev-proxy was missing a patch for a DoS problem, and that for the webservers a new fixpack was scheduled to be released in the near future (as of this writing: it is available now).
Since we applied the DoS fix for the rev-proxy, we do not have a problem anymore. This is not really conclusive, as we do not really know if this fixed the problem or if the attacker stopped attacking us.
From reading what the DoS patch fixes, we would assume we should have seen some continuous traffic going on between the rev-proxy and the webserver, but there was nothing when we observed the strange state.
We are still not allowed to apply patches the way we think we should, but at least we have better monitoring in place to watch out for this particular problem (activate the extended status in apache/IHS, look for lines with state ‘W’ and a long duration (column ‘SS’), raise an alert if the duration is higher than the max. possible/expected/desired duration for all possible URLs).
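The monitoring check described above could look roughly like the following sketch. It assumes the extended server-status page has already been fetched as HTML and contains a well-formed table whose header row has the columns `PID`, `M` and `SS`; the function name `stuck_writers`, the sample data and the threshold of 300 seconds are made up for illustration.

```python
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collect every <tr> as a list of cell texts (assumes closed td/th/tr tags)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def stuck_writers(html, max_seconds):
    """Return the rows in state 'W' whose SS value exceeds max_seconds."""
    parser = _TableRows()
    parser.feed(html)
    header = None
    hits = []
    for row in parser.rows:
        if header is None:
            # The first row that names both columns is taken as the header.
            if "M" in row and "SS" in row:
                header = row
            continue
        if len(row) != len(header):
            continue  # rows of other tables on the page (e.g. the legend)
        rec = dict(zip(header, row))
        if rec["M"] == "W" and rec["SS"].isdigit() and int(rec["SS"]) > max_seconds:
            hits.append(rec)
    return hits


# Invented sample data, reduced to a few columns.
SAMPLE = """
<table>
<tr><th>Srv</th><th>PID</th><th>M</th><th>SS</th><th>Client</th></tr>
<tr><td><b>0-0</b></td><td>101</td><td>W</td><td>4200</td><td>10.0.0.5</td></tr>
<tr><td><b>1-0</b></td><td>102</td><td>W</td><td>3</td><td>10.0.0.9</td></tr>
<tr><td><b>2-0</b></td><td>103</td><td>K</td><td>9000</td><td>10.0.0.5</td></tr>
</table>
"""

hits = stuck_writers(SAMPLE, max_seconds=300)
for hit in hits:
    print("ALERT: PID %s in 'W' for %s s" % (hit["PID"], hit["SS"]))
```

Looking the columns up by their header names rather than by position is a deliberate choice here, as the column layout of the page may differ between server versions.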
Tags: dos problem, dos problems, memory consumption, performance measurements, performance problem, proxy connection, reverse proxy, slow response times, swap usage, worker threads
I have a little problem finding a clean solution to the following problem.
The setup: a machine with two network interfaces and no default route. The first interface gets an IP at boot time, and the corresponding static route is inserted into the routing table during boot without problems. The second interface only gets an IP address when the shared-IP zones on the machine are started; during boot the interface is plumbed but without any address. The networks on those interfaces are not connected and the machine is not a gateway (this means we have a machine-administration network and a production-network). The static routes we want to have for the addresses of the zones are not added to the routing table, because the next hop is not reachable at the time the routing-setup is done. As soon as the zones are up (and the interface gets an IP), a re-run of the routing-setup adds the missing static routes.
Unfortunately I cannot tell Solaris to keep the static route even if the next hop is not reachable ATM (at least I have not found an option to the route command which does this).
One solution to this problem would be to add an address at boot to the interface which does not have an address at boot-time ATM (probably with the deprecated flag set). The problem is that this subnet (/28) does not have enough free addresses anymore, so this is not an option.
Another solution is to use a script which re-runs the routing-setup after the zones are started. This is a pragmatic solution, but not a clean solution.
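A minimal sketch of such a re-run script is shown below. Both callables are hypothetical placeholders: on a real system `is_ready` would check that the second interface has an address, and `rerun` would trigger the routing-setup again. The exact commands are left out because they depend on the system.

```python
import time


def rerun_until_ready(is_ready, rerun, attempts=30, delay=10.0, sleep=time.sleep):
    """Poll is_ready(); call rerun() once it returns True.

    Returns True if rerun() was called, False if the condition never
    became true within the given number of attempts.
    """
    for attempt in range(attempts):
        if is_ready():
            rerun()
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before polling again
    return False


# Simulated run: the "interface" gets its address on the third poll.
states = iter([False, False, True])
calls = []
result = rerun_until_ready(
    is_ready=lambda: next(states),
    rerun=lambda: calls.append("routing-setup"),
    attempts=5,
    delay=0,
)
```

This only automates the pragmatic workaround, it does not make it any cleaner.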
As I understand the in.routed man-page, in.routed is not an option with the default config, because the machine shall not route between the networks and shall not change the routing based upon RIP messages from other machines. Unfortunately I do not know enough about it to be sure, and I do not get the time to play around with this. I have seen some interesting options regarding this in the man-page, but playing around with them and sniffing the network to see what happens is not an option ATM. Anyone with a config/tutorial for this “do not broadcast anything, do not accept anything from outside”-case (if possible)?
Tags: administration network, boot time, clean solution, default config, default route, network interfaces, pragmatic solution, routing table, static route, static routes
I have the habit of running chmod with the relative notation (e.g. g+w or a+r or go-w or similar) instead of the absolute one (e.g. 0640 or u=rw,g=r,o=). Recently I had to chmod a lot of files. As usual I was using the relative notation. With a lot of files, this took a lot of time. Time was not really an issue, so I did not stop it to restart with a better performing command (e.g. find /path -type f -print0 | xargs -0 chmod 0644; find /path -type d -print0 | xargs -0 chmod 0755), but I thought a little tips&tricks posting may be in order, as not everyone knows the difference.
The relative notation
When you specify g+w, it means to add the write access for the group, but keep everything else like it is. Naturally this means that chmod first has to look up the current access rights. So for each async write request, there has to be a read-request first.
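The read-before-write pattern can be sketched with the underlying system calls. This mirrors the reasoning above, not necessarily what a particular chmod implementation does internally; the helper names are made up for illustration.

```python
import os
import stat
import tempfile


def chmod_relative_add(path, bits):
    """Like `chmod g+w`: read the current mode, then write it back with bits added."""
    current = stat.S_IMODE(os.stat(path).st_mode)  # the extra read
    os.chmod(path, current | bits)                 # the write


def chmod_absolute(path, mode):
    """Like `chmod 0640`: the new mode does not depend on the old one."""
    os.chmod(path, mode)


def demo():
    """Apply both variants to a scratch file and return the resulting modes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        open(path, "w").close()
        chmod_absolute(path, 0o640)
        before = stat.S_IMODE(os.stat(path).st_mode)
        chmod_relative_add(path, stat.S_IWGRP)  # g+w
        after = stat.S_IMODE(os.stat(path).st_mode)
        return before, after
```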
The absolute notation
The absolute notation is what most people are used to (at least the numeric one). It does not need to read the access rights before changing them, so there is less I/O to be done to get what you want. The drawback is that it is not so nice for recursive changes. You do not want to have the x-bit for data files, but you need it for directories. If you only have a tree with data files where you want to have uniform access, the example above via find is probably faster (for sure if the directory meta-data is still in RAM).
If you have a mix of binaries and data, it is a little bit more tricky to come up with a way which is faster. If the data has a name-pattern, you could use it in the find.
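As an illustration of combining absolute modes with a name-pattern, here is a small sketch that walks a tree, sets one mode on all directories and another on the files matching a pattern, and leaves everything else alone. The function name `chmod_tree`, the default modes and the `*.dat` pattern are assumptions for the example.

```python
import fnmatch
import os
import stat
import tempfile


def chmod_tree(root, file_mode=0o644, dir_mode=0o755, pattern="*"):
    """Set dir_mode on directories and file_mode on matching non-symlink files."""
    changed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        os.chmod(dirpath, dir_mode)
        changed += 1
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not change the target of a symlink
            os.chmod(path, file_mode)
            changed += 1
    return changed


def demo():
    """Build a scratch tree with one data file and one script, then fix it up."""
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "sub")
        os.mkdir(sub, 0o700)
        data = os.path.join(sub, "a.dat")
        tool = os.path.join(sub, "tool.sh")
        for path in (data, tool):
            open(path, "w").close()
            os.chmod(path, 0o700)
        chmod_tree(root, pattern="*.dat")

        def mode(p):
            return stat.S_IMODE(os.stat(p).st_mode)

        return mode(sub), mode(data), mode(tool)
```

In the demo the script keeps its x-bit because it does not match the pattern, while the data file and the directories get their absolute modes without a prior lookup of the old ones.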
And if you have a non-uniform access for the group bits and want to make sure the owner has write access to everything, it may be faster to use the relative notation than to find a replacement command-sequence with the absolute notation.
Tags: command sequence, little bit, meta data, path type, speed traps, time time
Recently I switched to a new client where the Solaris Security Toolkit (JASS) is extensively used. I am now in the process of updating some things, among them are JET and JASS. As part of this work I reevaluate the local JASS modifications. Previously a custom JASS package was used, but in case JASS is updated by Oracle at some point in time (and an update is really needed, see below), this would need some amount of work to find out the differences and to forward port them to the new version. If everything is well documented, this should not be hard to do, but the person doing the work also needs to find the up-to-date docs.
To make this easier I decided to change it. I now install the official JASS package via JET together with the latest patch for it, and then let JET copy our modifications over the installed package. Instead of modifying existing drivers, I created our own drivers with a reference to the driver which served as a base.
While doing this I encountered several shortcomings of JASS on Solaris 10.
There are several FS based checks which do not make sense for the FS of zones when run in the global zone (at least not the way I use JASS, so maybe a configurable way of changing the behavior would serve everyone). If zones are installed in /zones, you do not need to check there for files without valid UIDs (you will surely find a lot of files, as the users are defined inside the zones and not in the global zone) or similar things (not even for world writable files, as the zones are installed in a root-access-only subtree and inside the zones there may be other security constraints configured in JASS, read: it is the responsibility of JASS inside the zone to do this). An easy solution would be to exclude those FS which contain zones (and as we only have one subtree, I just hardcoded this in several scripts).
I also miss the possibility (maybe I overlooked a simple way) for the ssh check to limit PermitRootLogin to specific hosts. JASS only checks yes or no, but cannot limit it to specific hosts (e.g. via “Match IP/hostname”). Often you do not need to permit root-logins (RBAC/sudo/…), but sometimes it is the only way to handle a particular edge-case (or to speed up an action dramatically), and in such cases you do not want to allow root-logins more than necessary.
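The idea of excluding the zone subtree from a file system scan can be sketched as follows. This is not how JASS implements its checks (those are its own scripts); it is only an illustration of pruning a subtree during a walk, with a made-up function name and a scratch directory standing in for the real file system.

```python
import os
import tempfile


def files_outside(root, excluded):
    """Yield file paths below root without descending into excluded directories."""
    skip = {os.path.abspath(p) for p in excluded}
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those subtrees.
        dirnames[:] = [
            d for d in dirnames
            if os.path.abspath(os.path.join(dirpath, d)) not in skip
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)


def demo():
    """Scan a scratch tree that contains a 'zones' subtree to be skipped."""
    with tempfile.TemporaryDirectory() as root:
        zone_etc = os.path.join(root, "zones", "z1", "root", "etc")
        global_etc = os.path.join(root, "etc")
        os.makedirs(zone_etc)
        os.makedirs(global_etc)
        open(os.path.join(zone_etc, "passwd"), "w").close()
        open(os.path.join(global_etc, "hosts"), "w").close()
        found = files_outside(root, [os.path.join(root, "zones")])
        return sorted(os.path.relpath(p, root) for p in found)
```

A per-file check (for example for files without a valid UID) would then only be applied to the paths this generator yields.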
Tags: easy solution, forward port, point in time, security constraints, solaris 10, solaris security, world writable