Channel: AIX for System Administrators
RSCT (Reliable Scalable Cluster Technology)

RSCT (as its name says) is a cluster technology. It comes with AIX by default (no additional installation is needed) and consists of several low-level components (daemons, subsystems). These components create a basic cluster environment (nodes, heartbeat between the nodes, etc.), which RSCT monitors. If a node crashes, an event is generated and RSCT informs the RSCT-aware clients. (PowerHA, or more precisely its cluster manager, clstrmgrES, is itself an RSCT-aware client.) Historically RSCT was a separate product, but starting with AIX 5.1 it is shipped with the operating system. On AIX 7.2 the current RSCT fileset version is 3.2. The RSCT filesets can be checked or removed (lslpp ...). By comparison, CAA is built so deeply into AIX that there are no separate CAA filesets.

The key point is that RSCT provides services, such as cluster monitoring, which are used by PowerHA, and PowerHA in turn provides "high availability services" to applications. For example, to respond to an unexpected event, it is necessary to know when it occurs; monitoring for such failures is the job of RSCT. Besides PowerHA, other RSCT-aware clients include GPFS, SSP and the HMC.

RSCT’s role in a PowerHA cluster is to provide:
- Failure detection and diagnosis for topology components (nodes, networks, and network adapters)
- Notification to the cluster manager of events
- Coordination of the recovery actions (fallovers, fallbacks and dealing with individual NIC failures by moving or swapping IP addresses)

We can use the ctversion command (or lslpp) to find out which version of RSCT is running on a particular AIX system:
# /opt/rsct/install/bin/ctversion


==================================

RSCT components



The main RSCT components are:
Resource: A resource is the fundamental concept of the RSCT architecture; it is an instance of a physical or logical entity. Examples of resources include lv01 on node A, Ethernet device en0 on node B, and IP address 9.117.7.21. A set of resources that have similar characteristics is called a resource class.

Resource Monitoring and Control (RMC): This is the main component of RSCT. It creates events based on the messages received from the resource managers; client programs can then use these event notifications to trigger recovery actions. It also coordinates between the various RSCT components.

RSCT resource managers: Resource managers are software layers between a resource (for example a filesystem) and RMC. They issue the actual commands for each resource and, based on the configuration, decide how the system should react to specific events. Examples are the File System Resource Manager, Host Resource Manager, Audit Log Resource Manager, Event Response Resource Manager ...

RSCT security services: These provide the security infrastructure that enables RSCT components to authenticate each other. (These days only RMC and the resource managers use the RSCT security services.)

Group Services: This subsystem is responsible for coordinating and monitoring changes across all cluster nodes and ensuring that all of them finish properly. In a PowerHA setup, from the Group Services point of view the "application running on multiple nodes" is the cluster manager (clstrmgrES). Group Services reports failures to the cluster manager as soon as Topology Services informs it. (On PowerHA 7.1, CAA informs Group Services.) The cluster manager then makes cluster-wide coordinated responses to the failure. (The PowerHA cluster manager is an RSCT client and registers itself with both the RSCT RMC and the RSCT Group Services components. After an event has been reported to the PowerHA cluster manager, it responds to this event with recovery commands and event scripts. These scripts are coordinated via the RSCT Group Services component.)

Topology Services: This provides node and network monitoring and failure detection (heartbeats). It is responsible for building heartbeat rings in order to detect important events and report them to RSCT Group Services, which in turn reports them to the cluster manager. In the heartbeat ring, each Topology Services daemon sends a heartbeat message to one of its neighbors and expects to receive a heartbeat from another. In this system of heartbeat messages, each member monitors one of its neighbors. If the neighbor stops responding, the member that is monitoring it sends a message to the "group leader". Topology Services is also responsible for the transmission of any RSCT-related messages between cluster nodes. Starting with PowerHA 7.1.0, RSCT Topology Services is deactivated and all its functions are performed by CAA topology services.
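
The neighbor monitoring in a heartbeat ring can be pictured with a small toy model. This is only a sketch, not RSCT code: the function name monitor_of, the node names and the direction of the ring (each node sends its heartbeat to the next node in the list, so it is monitored by its successor) are assumptions made for illustration.

```shell
#!/bin/sh
# Toy model of a heartbeat ring (illustration only, not how RSCT is implemented).
# Assumption: nodes are listed in ring order and each node sends its heartbeat
# to the next node, so every node is monitored by its successor in the list.

# monitor_of NODE RING...  -> prints the node that would notice NODE going silent
monitor_of() {
    target=$1
    shift
    first=$1          # the successor of the last node wraps around to the first
    found=0
    for node in "$@"; do
        if [ "$found" -eq 1 ]; then
            echo "$node"
            return 0
        fi
        if [ "$node" = "$target" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 1 ]; then
        echo "$first"
        return 0
    fi
    return 1          # target is not a member of the ring
}

monitor_of nodeB nodeA nodeB nodeC    # prints nodeC
monitor_of nodeC nodeA nodeB nodeC    # prints nodeA (wrap-around)
```

In this model the node printed by monitor_of is the one that would send the message to the "group leader" when its neighbor stops responding.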

==================================

RSCT domains

RSCT can provide 2 types of "clusters", which in RSCT terminology are called domains. Depending on the role of the nodes (whether all of them are on an equal level or there is a special control node among them), there are 2 kinds of RSCT domain: management domain and peer domain.

Management Domain: (set of nodes that is configured for manageability or monitoring)
An RSCT management domain is a set of nodes that can be managed and monitored from one of the nodes, which is designated as the management control point (MCP). All nodes other than the MCP are considered managed nodes. Topology Services and Group Services are not used in a management domain.

Peer Domain: (set of nodes that is configured for high availability)
An RSCT peer domain is a set of nodes that know about each other and share resources with each other. On each node within the peer domain, RMC depends on Topology Services, Group Services, and cluster security services. If PowerHA V7 is installed, Topology Services is not used, and CAA is used instead.

In order to understand how various RSCT components are used in a cluster, we need to keep in mind that nodes of a cluster can be configured for manageability, high availability or both.

Combination of management and peer domains
We can have a combination of management domains and peer domains. This example shows one Hardware Management Console (HMC) that is managing three LPARs. The HMC and Node A, Node B and Node C form a management domain. Additionally, PowerHA is installed on Node B and Node C, so these 2 nodes form a peer domain too. In a Power Systems environment, the HMC is always the management server (MCP) in the RSCT management domain. LPARs are automatically configured as managed nodes.



==================================

RSCT and CAA


Cluster Aware AIX (CAA) introduces clustering capabilities to AIX (setup of a cluster, detecting the state of nodes and interfaces). When RSCT operates on nodes in a CAA cluster, a peer domain is created that is equivalent to the CAA cluster, and can be used to manage the cluster by using peer domain commands. 

Only one CAA cluster can be defined on a set of nodes. Therefore, if a CAA cluster is defined then the peer domain that represents it is the only peer domain which can exist there. If no CAA cluster is configured, then existing and new peer domains can also be used. 

A CAA cluster and the equivalent RSCT peer domain operate hand in hand: a change made to the CAA cluster by using CAA commands is reflected automatically in the corresponding peer domain, and the existing peer domain commands result in equivalent changes to the CAA cluster. So, for example, when you create a CAA cluster by using the mkcluster command, the equivalent peer domain is created as well, just as if the mkrpdomain RSCT command had been used. Similarly, node add and delete operations that use either peer domain or cluster commands are applied to both the CAA cluster and the peer domain.

Starting with RSCT version 3.1.0.0, the Group Services subsystem can operate in a Cluster Aware AIX (CAA) environment. In this environment, Group Services relies on CAA to provide node and adapter liveness information and node-to-node communication, thus removing its dependency on RSCT Topology Services. Instead of connecting to the Topology Services daemon, it gets information directly from the low-level cluster services in the CAA environment.

RSCT version 3.1.2.0, or later, can be installed on the nodes and can coexist with prior RSCT releases. Because CAA delivers fundamental node and interface liveness information, the Topology Services subsystem is not active in a peer domain based on CAA. 
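
The version thresholds mentioned above (3.1.0.0 and 3.1.2.0) can be checked in a script with a small comparison helper. This is a minimal sketch: version_ge is a hypothetical helper name, and the version string passed to it is assumed to have been taken by hand from ctversion or from the RSCTActiveVersion column of lsrpdomain.

```shell
#!/bin/sh
# version_ge A B: succeed (exit 0) when dotted version A is >= B.
# Fields are compared numerically, left to right; missing fields count as 0.
# (version_ge is a hypothetical helper, not an RSCT command.)
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]")
        nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0
            yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example with a version string typed in by hand:
if version_ge 3.1.5.0 3.1.0.0; then
    echo "Group Services can operate in a CAA environment"
fi
```

A plain string comparison would sort 3.10 before 3.9, which is why the fields are compared as numbers.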

===========================

PowerHA, RSCT and CAA

When a PowerHA cluster is configured and synchronized, 3 different layers work together in a coordinated way: PowerHA, RSCT and CAA. We need to configure PowerHA only, and it takes care of the other 2 layers (RSCT and CAA). In typical situations, there is no need to use CAA or RSCT commands at all, because both layers are managed by PowerHA.

To check whether the services of each layer are up, we can use different commands, like clmgr, lsrpdomain, and lscluster.



Checking whether PowerHA is running:
# clmgr -a state query cluster
STATE="STABLE"

Checking whether RSCT is running:
# lsrpdomain
Name              OpState RSCTActiveVersion MixedVersions TSPort GSPort
CL1_N1_cluster    Online  3.1.5.0           Yes           12347  12348
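
For scripted checks, the OpState column can be read with awk. The following is a sketch under the assumption that the output has the column layout shown above (a header line starting with "Name", then one line per domain); rpdomain_online is a hypothetical helper name, and the sample text is the output from this article.

```shell
#!/bin/sh
# rpdomain_online: read lsrpdomain output on stdin; succeed only if at least
# one peer domain is listed and every listed domain has OpState "Online".
# (Hypothetical helper; assumes the column layout shown in the article.)
rpdomain_online() {
    awk '
        $1 == "Name" { next }                      # skip the header line
        NF >= 2      { seen = 1; if ($2 != "Online") bad = 1 }
        END          { exit ((seen && !bad) ? 0 : 1) }
    '
}

# Try it on the sample output from above:
sample='Name              OpState RSCTActiveVersion MixedVersions TSPort GSPort
CL1_N1_cluster    Online  3.1.5.0           Yes           12347  12348'

if printf '%s\n' "$sample" | rpdomain_online; then
    echo "peer domain is online"
fi
```

On a cluster node the same function would be fed from the real command: lsrpdomain | rpdomain_online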

To check whether CAA is running:
# lscluster -m | egrep "Node name|State of node"
 Node name: powerha-c2n1
 State of node: UP
 Node name: powerha-c2n2
 State of node: UP NODE_LOCAL
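
The same filtered output can be turned into a list of nodes that are not UP. This is a sketch that assumes the "Node name:" / "State of node:" line pairs shown above; caa_down_nodes is a hypothetical helper name, and the sample text is the output from this article.

```shell
#!/bin/sh
# caa_down_nodes: read "lscluster -m" output on stdin and print the name of
# every node whose state does not start with UP.
# (Hypothetical helper; assumes the line format shown in the article.)
caa_down_nodes() {
    awk -F': *' '
        /Node name:/     { name = $2 }
        /State of node:/ { split($2, st, " "); if (st[1] != "UP") print name }
    '
}

# Try it on the sample output from above:
sample=' Node name: powerha-c2n1
 State of node: UP
 Node name: powerha-c2n2
 State of node: UP NODE_LOCAL'

down=$(printf '%s\n' "$sample" | caa_down_nodes)
if [ -z "$down" ]; then
    echo "all CAA nodes are UP"
else
    echo "nodes not UP: $down"
fi
```

On a cluster node the input would come from the real command: lscluster -m | caa_down_nodes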

If we stop PowerHA, the clmgr command will show "OFFLINE", but the RSCT and CAA commands will still show that their services are running. CAA and RSCT are stopped and started together. By default, CAA and RSCT are started automatically as part of an operating system restart (if the system is configured by PowerHA). There are situations when we need to stop all three cluster components, for example, when we must change the RSCT or CAA software.

For example, to stop all cluster components, use: clmgr off cluster STOP_CAA=yes

Then the status of each service will be:
# clmgr -a state query cluster
STATE="OFFLINE"

# lsrpdomain
Name              OpState RSCTActiveVersion MixedVersions TSPort GSPort
CL1_N1_cluster    Offline 3.1.5.0           Yes           12347  12348

# lscluster -m
lscluster: Cluster services are not active on this node because it has been
stopped.

The fact that CAA was stopped manually is preserved across reboots. So, if you want to start PowerHA on a node where CAA and RSCT were stopped manually, you must use the START_CAA argument.
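
As a sketch of how a start script could pick the right command, the helper below only prints a command line and does not run anything. Several things here are assumptions: the "clmgr on cluster" form simply mirrors the "clmgr off cluster" form used above, START_CAA=yes mirrors STOP_CAA=yes, start_cluster_cmd is a hypothetical name, and treating an Offline peer domain as the sign that CAA and RSCT were stopped by hand is a simplification.

```shell
#!/bin/sh
# start_cluster_cmd OPSTATE: print (do not run) the clmgr start command that
# matches the peer domain state reported by lsrpdomain.
# Assumptions: "clmgr on cluster" mirrors the "clmgr off cluster" form used in
# the article, and an OpState other than Online means CAA/RSCT were stopped
# manually, so START_CAA=yes is added.
start_cluster_cmd() {
    case "$1" in
        Online) echo "clmgr on cluster" ;;
        *)      echo "clmgr on cluster START_CAA=yes" ;;
    esac
}

start_cluster_cmd Offline    # prints: clmgr on cluster START_CAA=yes
start_cluster_cmd Online     # prints: clmgr on cluster
```

Printing the command first makes it easy to review it before it is run on a production cluster.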

Starting with AIX 7.1 TL4 or AIX 7.2, we can use the clctrl command to stop or start CAA and RSCT.
clctrl -stop: stops CAA and RSCT, and stops PowerHA too
clctrl -start: starts CAA and RSCT. It does not start PowerHA; to start it, use clmgr or smitty

===========================

