Troubleshooting & Resolving GFS issues
In order for RHCS/GFS to function correctly the cluster must remain quorate. Once inquorate the cluster halts all operations, which means all GFS operations are halted. The cluster manager does this in order to avoid data corruption (so it's a good thing). The end goal of resolving RHCS/GFS issues is to get the cluster quorate again.
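A quick way to confirm quorum from any node (both commands are covered in more detail in the Commands sections later in this document):
clustat | grep 'Member Status'   ( should report Quorate )
cman_tool status | grep -E 'Membership state|Quorum'   ( should report Cluster-Member plus the current quorum count )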
Resolving Issues
GFS Deadlock condition
A deadlock condition can be identified if the cluster is quorate (clustat or cman_tool status) but file operations are extremely slow.
Method 1:
- First have the app team stop whatever is using GFS
- Then try to restart GFS services on all cluster nodes
/etc/init.d/gfs restart
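If it is more convenient to drive the restart from a single node, a simple ssh loop works; the host names below are the example gfsnode01-03 nodes from the clustat sample later in this document, so substitute your own:
for node in gfsnode01 gfsnode02 gfsnode03 ; do ssh $node /etc/init.d/gfs restart ; done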
Method 2:
- Identify the node that is having the issue. Most likely it is the one where one or more of the GFS filesystems are hung. On each cluster node issue the command “gfs_tool counters {mountpoint}” for every GFS filesystem.
$ for mnt in `grep gfs /proc/mounts | cut -f2 -d' '` ; do echo $mnt ; gfs_tool counters $mnt ; done
- If there are filesystems that are hung on only one node (per the output of the command above), that node is most likely the hung one.
- Attempt to remove the suspected hung node from the cluster. On the suspected hung node issue the command “cman_tool leave remove”. If this command does not return within 5 minutes continue to the next step.
- Set the GFS services to not autostart on reboot.
# for srv in ccsd cman fenced clvmd gfs; do /sbin/chkconfig $srv off; done
- Fence the suspected hung node from the cluster. Issue the command “fence_node {hung node fence name}” from one of the other nodes. The node name is not the same as the host name; get the correct node name from “clustat”. If this command fails or does not return within 5 minutes proceed to the next step.
- Reboot the node. If the node hangs while shutting down then power it down via the out-of-band console interface.
- Monitor the fence and cluster status by viewing the messages file on the other nodes (one simple way to watch for this is shown after these steps). Once the fence is successful the cluster should resume activity, including GFS. If you set the services to not autostart, you will have to start them manually. Check GFS filesystem status via “gfs_tool counters {mountpoint}”.
/etc/init.d/ccsd start
/etc/init.d/cman start
/etc/init.d/fenced start
/etc/init.d/clvmd start
/etc/init.d/gfs start
- If you set the services to not autostart, you will have to re-enable them.
# for srv in ccsd cman fenced clvmd gfs; do /sbin/chkconfig $srv on; done
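To watch the fence from one of the surviving nodes, tailing the messages file for fence-related entries is usually enough (the pattern is just a loose match on the fence messages mentioned above):
tail -f /var/log/messages | grep -i fence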
Method 3:
- Identify the filesystem that is having the issue. More than likely the support team will have done this for you, but you can also issue the command “gfs_tool counters {mountpoint}”. Look for 'locks held' over 1M. If the gfs_tool command hangs, move to Method 2.
- Issue the command 'gfs_tool settune {mountpoint} glock_purge 80'
- Monitor 'locks held' via 'gfs_tool -c counters {mountpoint}'
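To apply the purge setting to every mounted GFS filesystem at once, the mount loop from Method 2 can be reused (80 is simply the example percentage from the step above):
for mnt in `grep gfs /proc/mounts | cut -f2 -d' '` ; do gfs_tool settune $mnt glock_purge 80 ; done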
HW Failures
If a cluster node has a HW issue it must be evicted from the cluster. This happens either spontaneously, when the failed server shuts down, or automatically, when another cluster node fences the failed node. In either case the cluster must be made quorate again in order for cluster operations to continue. There have been several instances where fencing does not complete successfully (or, more correctly, the cluster does not believe the fence completed successfully when in fact it had). You can tell the cluster believes the fence failed because there will be numerous “fence failed” entries in /var/log/messages (several per second; a quick way to check for these is shown after the steps below). Also the node that was fenced will probably be “stuck” in a powered-off state; attempts to power it on using the out-of-band console interface will fail. To resolve:
- Power on the fenced server
- Once the failed server has restarted and all of the cluster services are running, the fence event is marked “successful” and cluster operations will continue.
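To check for the repeating failed-fence messages described above (the pattern below is only a loose match on that wording, adjust as needed):
grep -ic 'fence.*failed' /var/log/messages   ( a count that keeps climbing means the cluster still considers the fence failed )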
Obviously the above procedure can result in a long outage, so to minimize the impact please note the following:
- Once quorate the cluster will pause for a time equal to post_join_delay before resuming cluster activities. Default post_join_delay is set to 600 seconds, so even after the cluster regains quorum it will wait 10 minutes before GFS becomes available (a quick way to check the configured value is shown after this list).
- It is possible to issue the command “fence_tool -c”, which tells the cluster that all nodes are clean. Use this with caution: if it is not true, data corruption could occur. It can be used in the above instance where a server had a HW issue and the fence event was unsuccessful, provided you have confirmed that the failed node really is powered down. Issuing “fence_tool -c” should resume cluster operations more quickly than waiting for the failed node to reboot and start cluster services.
- Also clvmd does not allow lvm changes if a cluster is inquorate or a node is missing.
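post_join_delay is an attribute of the fence_daemon element in /etc/cluster/cluster.conf, so the configured value can be checked from any node with a simple grep (no output means no explicit value is set there):
grep post_join_delay /etc/cluster/cluster.conf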
Command - clustat
Use this tool to see general cluster status. Run the command from all cluster nodes. All nodes should report the same status - either Quorate or Inquorate - and the one that reports differently is likely the unhealthy node. Sample output:
clustat
msg_open: No route to host   ( this can be ignored when rgmanager is not running, such as GFS-only clusters )
Member Status: Quorate   ( this is the state of the cluster from this node's perspective )
Resource Group Manager not running; no service information available.   ( same as above, rgmanager does not run on GFS-only clusters )
  Member Name    Status
  gfsnode01      Online, Local   ( this is the localhost )
  gfsnode02      Online
  gfsnode03      Online
Command - cman_tool
This is the primary tool for cluster operations.
Similar to clustat above but with greater detail; however, it lacks a view of all cluster nodes. Run it from all nodes; all should report similar information (noted below). Again, the one that differs is likely the unhealthy node.
cman_tool status
Protocol version: 5.0.1
Config version: 5   ( version in /etc/cluster/cluster.conf, must match on all nodes )
Cluster name: GFSCLUSTER01   ( must match on all nodes )
Cluster ID: 14
Cluster Member: Yes   ( if 'No' then node is unhealthy or not in cluster )
Membership state: Cluster-Member   ( if anything but 'Cluster-Member' then node is likely unhealthy )
Nodes: 3   ( should match # of nodes from clustat )
Expected_votes: 3   ( when healthy, same as 'Total_votes' )
Total_votes: 3   ( when healthy, same as 'Nodes' )
Quorum: 2   ( equal to expected_votes/2 + 1; this is the number of healthy nodes required to resume cluster operations )
Active subsystems: 54   ( if removing a node from the cluster this must be zero )
Node name: gfsnode01
Node ID: 1
Node addresses: 172.16.0.5
Changing Cluster Membership - Removing Nodes
This is the proper method for cleanly removing a node from the cluster. If at all possible use this method rather than simply stopping cluster services on the node. This method ensures that quorum is recalculated and should not result in an interruption of cluster operations. In order for this to complete successfully all cluster resources (see 'Active subsystems' in the status output above) must be released, so for GFS-only clusters stop the gfs service first to release those filesystem resources (a sketch of the full sequence follows the command below).
cman_tool leave remove
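On a GFS-only cluster the full removal typically looks like the sketch below, which just reverses the start order used in Method 2 and slots cman_tool leave remove in place of stopping cman. Treat it as an outline rather than a verified procedure, and check cman_tool services between steps to confirm resources are released:
/etc/init.d/gfs stop
/etc/init.d/clvmd stop
/etc/init.d/fenced stop
cman_tool leave remove
/etc/init.d/ccsd stop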
Changing Cluster Membership - Adding Nodes
This operation happens automatically when the cman service is started. It is likely only needed if a node was temporarily removed from the cluster (using cman_tool leave remove, above).
cman_tool join -w (the '-w' says to wait for the join to complete successfully )
Viewing Cluster Resources
Used primarily when attempting to remove a node - the cman_tool leave remove command will fail if the node is still using cluster resources. Use this command to find what is still in use.
cman_tool services
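A related pre-removal check, based on the 'Active subsystems' note in the cman_tool status sample above:
cman_tool status | grep 'Active subsystems'   ( per the note above, this must be zero before removing the node )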
Modifying Votes
cman_tool expected -e {votes}
cman_tool votes -v {votes}
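For example, if one node of the three-node cluster shown above is removed permanently, expected votes can be lowered so quorum is recalculated (2/2 + 1 = 2, so both remaining nodes are required); the value here is only illustrative:
cman_tool expected -e 2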
GFS Status
Print out statistics about a filesystem. If -c is used, gfs_tool continues to run printing out the stats once a second.
gfs_tool -c counters {mountpoint}
Print GFS-specific 'df' information (do not be alarmed if inodes shows 100%, GFS can dynamically allocate inodes)
gfs_tool df {mountpoint}
Growing a GFS filesystem
You can only grow a GFS filesystem, never shrink. The only way to shrink a GFS filesystem is to destroy and recreate it (not fun). To grow a GFS filesystem:
gfs_grow -vv {mountpoint}
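gfs_grow can only consume space that already exists on the underlying device, so on these clvmd-backed volumes the clustered logical volume generally has to be extended first. The volume path and size below are purely hypothetical:
lvextend -L +10G /dev/{vgname}/{lvname}   ( hypothetical VG/LV and size - extend the clustered LV, then grow the filesystem )
gfs_grow -vv {mountpoint}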
Also note: sometimes 'df' does not show the new filesystem size on all nodes. To fix:
/etc/init.d/gfs reload