
SAP HANA system replication in pacemaker cluster

by mirhenge 2019. 1. 21.

https://access.redhat.com/articles/3004101#testing-manual-move-of-saphana-resource-to-another-node




2. SAP HANA System Replication

The following example shows how to set up system replication between 2 nodes running SAP HANA.

Configuration used in the example:

SID:                   RH2
Instance Number:       02
node1 FQDN:            node1.example.com
node2 FQDN:            node2.example.com
node1 HANA site name:  DC1
node2 HANA site name:  DC2
SAP HANA 'SYSTEM' user password: <HANA_SYSTEM_PASSWORD>
SAP HANA administrative user:    rh2adm

Ensure that each node can resolve the FQDNs of both nodes without issues. To make sure the FQDNs can be resolved even without DNS, you can place them into /etc/hosts as in the example below.

# /etc/hosts
192.168.0.11 node1.example.com node1
192.168.0.12 node2.example.com node2
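A quick way to confirm that the entries are really present is sketched below. The function name hosts_file_has is hypothetical and not part of any SAP or Red Hat tooling; it only inspects a hosts file, whereas a command such as getent hosts node1.example.com checks what the system resolver actually returns.

```shell
# Hypothetical helper (not from the original article): succeed if NAME
# appears as a hostname or alias on a non-comment line of a hosts file.
# The file defaults to /etc/hosts.
hosts_file_has() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Example use on each node:
# for h in node1.example.com node2.example.com; do
#     hosts_file_has "$h" || echo "missing from /etc/hosts: $h"
# done
```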

For system replication to work, the SAP HANA log_mode variable must be set to normal. This can be verified with the HANA SYSTEM database user by running the command below on both nodes.

[rh2adm]# hdbsql -u system -p <HANA_SYSTEM_PASSWORD> -i 02 "select value from SYS.M_INIFILE_CONTENTS where key='log_mode'"
VALUE
"normal"
1 row selected
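If the check is scripted, the query output can be tested instead of read by eye. The sketch below assumes the output format shown above, with the value printed in double quotes; log_mode_is_normal is a hypothetical name.

```shell
# Hypothetical helper: read hdbsql output on stdin and succeed only if
# it contains the quoted value "normal".
log_mode_is_normal() {
    grep -q '"normal"'
}

# Example use (as rh2adm, with the query from this guide):
# hdbsql -u system -p <HANA_SYSTEM_PASSWORD> -i 02 \
#     "select value from SYS.M_INIFILE_CONTENTS where key='log_mode'" \
#     | log_mode_is_normal || echo "log_mode is not normal"
```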

Note that the primary and secondary roles assigned to the nodes below apply only during setup. The roles (primary/secondary) may change during cluster operation, depending on the cluster configuration.

Many of the configuration steps are performed as the SAP HANA administrative user, whose name was determined during installation. The examples use rh2adm because the SID is RH2. To become the SAP HANA administrative user, use the command below.

[root]# sudo -i -u rh2adm
[rh2adm]#

2.1. Configure HANA primary node

SAP HANA system replication will only work after an initial backup has been performed. The following command will create an initial backup in the /tmp/foo directory. Please note that the size of the backup depends on the database size and that the backup may take some time to complete. The directory in which the backup will be placed must be writable by the SAP HANA administrative user.
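The writability requirement can be checked before starting the backup. This is a minimal sketch with a hypothetical function name; run it as the SAP HANA administrative user and pass the directory that will receive the backup files.

```shell
# Hypothetical helper: succeed only if DIR exists and the current user
# is allowed to write to it.
backup_dir_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Example use (as rh2adm):
# backup_dir_writable /tmp || echo "backup target is not writable"
```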

a) On single container systems, the following command can be used for the backup:

[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "BACKUP DATA USING FILE ('/tmp/foo')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)

b) On multiple container systems (MDC), SYSTEMDB and all tenant databases need to be backed up:

[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> -d SYSTEMDB "BACKUP DATA USING FILE ('/tmp/foo')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> -d SYSTEMDB "BACKUP DATA FOR RH2 USING FILE ('/tmp/foo-RH2')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)

After the initial backup, initialize the replication using the command below.

[rh2adm]# hdbnsutil -sr_enable --name=DC1
checking for active nameserver ...
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.

Verify that initialization is showing current node as 'primary' and that SAP HANA is running on it.

[rh2adm]# hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: DC1
Host Mappings:

2.2. Configure HANA secondary node

The secondary node needs to be registered to the primary node, which is now running. SAP HANA on the secondary node must be shut down before registration; stop it with the command below.

[rh2adm]# HDB stop

(SAP HANA 2.0 only) Copy the SAP HANA system PKI SSFS_RH2.KEY and SSFS_RH2.DAT files from the primary node to the secondary node.

[rh2adm]# scp root@node1:/usr/sap/RH2/SYS/global/security/rsecssfs/key/SSFS_RH2.KEY /usr/sap/RH2/SYS/global/security/rsecssfs/key/SSFS_RH2.KEY
[rh2adm]# scp root@node1:/usr/sap/RH2/SYS/global/security/rsecssfs/data/SSFS_RH2.DAT /usr/sap/RH2/SYS/global/security/rsecssfs/data/SSFS_RH2.DAT
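To confirm that the copied files are identical to those on the primary node, their checksums can be compared. The helper below is a hypothetical sketch that relies on sha256sum from coreutils; run it on both nodes for each file and compare the printed digests.

```shell
# Hypothetical helper: print only the SHA-256 digest of FILE.
file_sha256() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Example use (run on node1 and node2, then compare the output):
# file_sha256 /usr/sap/RH2/SYS/global/security/rsecssfs/key/SSFS_RH2.KEY
# file_sha256 /usr/sap/RH2/SYS/global/security/rsecssfs/data/SSFS_RH2.DAT
```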

To register secondary node use the command below.

[rh2adm]# hdbnsutil -sr_register --remoteHost=node1 --remoteInstance=02 --replicationMode=syncmem --name=DC2
adding site ...
checking for inactive nameserver ...
nameserver node2:30201 not responding.
collecting information ...
updating local ini files ...
done.

Start SAP HANA on the secondary node.

[rh2adm]# HDB start

Verify that the secondary node is running and that 'mode' is syncmem. Output should look similar to the output below.

[rh2adm]# hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: syncmem
site id: 2
site name: DC2
active primary site: 1

Host Mappings:
~~~~~~~~~~~~~~
node2 -> [DC1] node1
node2 -> [DC2] node2
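The mode check can also be done in a script. The sketch below assumes the "mode: <value>" line format shown in the output above; sr_mode is a hypothetical name.

```shell
# Hypothetical helper: print the value of the first "mode:" line from
# `hdbnsutil -sr_state` output supplied on stdin.
sr_mode() {
    awk -F': *' '$1 == "mode" { print $2; exit }'
}

# Example use (as rh2adm on the secondary node):
# [ "$(hdbnsutil -sr_state | sr_mode)" = "syncmem" ] || echo "unexpected mode"
```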

2.3. Testing SAP HANA System Replication

To manually test the SAP HANA System Replication setup you can follow the procedure described in following SAP documents:

2.4. Checking SAP HANA System Replication state

To check the current state of SAP HANA System Replication you can execute the following command as the SAP HANA administrative user on current primary SAP HANA node.

On a single container system:

[rh2adm]# python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py

| Host  | Port  | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary     | Replication | Replication | Replication    |
|       |       |              |           |         |           | Host      | Port      | Site ID   | Site Name | Active Status | Mode        | Status      | Status Details |
| ----- | ----- | ------------ | --------- | ------- | --------- | --------- | --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- |
| node1 | 30201 | nameserver   |         1 |       1 | DC1       | node2     |     30201 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| node1 | 30207 | xsengine     |         2 |       1 | DC1       | node2     |     30207 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| node1 | 30203 | indexserver  |         3 |       1 | DC1       | node2     |     30203 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |

status system replication site "2": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 1
site name: DC1

On a multiple container system (MDC):

[rh2adm]# python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py
| Database | Host  | Port  | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary     | Replication | Replication | Replication    |
|          |       |       |              |           |         |           | Host      | Port      | Site ID   | Site Name | Active Status | Mode        | Status      | Status Details |
| -------- | ----- | ----- | ------------ | --------- | ------- | --------- | ----------| --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- |
| SYSTEMDB | node1 | 30201 | nameserver   |         1 |       1 | DC1       | node2     |     30201 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| RH2      | node1 | 30207 | xsengine     |         2 |       1 | DC1       | node2     |     30207 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| RH2      | node1 | 30203 | indexserver  |         3 |       1 | DC1       | node2     |     30203 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |

status system replication site "2": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 1
site name: DC1
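For scripted checks, the summary line of the report is usually enough. The following sketch assumes the "overall system replication status: ..." line appears as in the output above; sr_overall_active is a hypothetical name.

```shell
# Hypothetical helper: succeed only if the "overall system replication
# status" line in systemReplicationStatus.py output (stdin) says ACTIVE.
sr_overall_active() {
    status=$(awk -F': *' '$1 == "overall system replication status" { print $2; exit }')
    [ "$status" = "ACTIVE" ]
}

# Example use (as rh2adm on the current primary node):
# python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py \
#     | sr_overall_active || echo "replication is not ACTIVE"
```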

3. Configuring monitoring account in SAP HANA for cluster resource agents (SAP HANA 1.0 SPS12 and earlier)

Starting with SAP HANA 2.0 SPS0, a monitoring account is not needed.
A technical user with CATALOG READ and MONITOR ADMIN privileges must exist in SAP HANA so that the resource agents can run queries on the system replication status. The example below shows how to create such a user, assign it the correct permissions and disable password expiration for it.

monitoring user username: rhelhasync
monitoring user password: <MONITORING_USER_PASSWORD>

3.1. Creating monitoring user

When SAP HANA System Replication is active, only the primary system is able to access the database. Accessing the secondary system will fail.

On the primary system run the following commands to create the monitoring user.

[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "create user rhelhasync password \"<MONITORING_USER_PASSWORD>\""
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "grant CATALOG READ to rhelhasync"
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "grant MONITOR ADMIN to rhelhasync"
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "ALTER USER rhelhasync DISABLE PASSWORD LIFETIME"

3.2. Store monitoring user credentials on all nodes

The SAP HANA userkey allows the "root" user on OS level to access SAP HANA via monitoring user without asking for password. This is needed by resource agents so they can run queries on HANA System Replication status.

[root]# /usr/sap/RH2/HDB02/exe/hdbuserstore SET SAPHANARH2SR localhost:30215 rhelhasync "<MONITORING_USER_PASSWORD>"
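The port 30215 in the command above matches the common SAP HANA convention of 3<instance number>15 for the SQL port. The article does not state this, so treat it as an assumption and verify the port on your system, especially on MDC installations where tenant ports can differ. A small sketch with a hypothetical function name:

```shell
# Sketch (assumption: SQL port follows the 3<NN>15 convention).
# Prints the port for a two-digit instance number, fails otherwise.
hana_sql_port() {
    case $1 in
        [0-9][0-9]) printf '3%s15\n' "$1" ;;
        *) echo "instance number must be two digits" >&2; return 1 ;;
    esac
}

# Example use:
# hana_sql_port 02
```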

To verify that the userkey has been created correctly in root's userstore, you can run hdbuserstore list command on each node and check if the monitoring account is present in the output as shown below:

[root]# /usr/sap/RH2/HDB02/exe/hdbuserstore list

DATA FILE      :  /root/.hdb/node1/SSFS_HDB.DAT
KEY FILE       :  /root/.hdb/node1/SSFS_HDB.KEY

KEY SAPHANARH2SR
  ENV : localhost:30215
  USER: rhelhasync

Please also verify that it is possible to run hdbsql commands as root using the SAPHANARH2SR userkey without being prompted for a password by running the following command on the primary node of the SAP HANA SR setup:

[root]# /usr/sap/RH2/HDB02/exe/hdbsql -U SAPHANARH2SR -i 02 "select distinct REPLICATION_STATUS from SYS.M_SERVICE_REPLICATION"
REPLICATION_STATUS
"ACTIVE"
1 row selected

If you get an error message about the password, or if you are prompted for a password, verify with the hdbsql command or HANA Studio that the password of the user created with the hdbsql commands above is not configured 'to be changed on first login' and has not expired. You can use the command below.
(Note: be sure to write the name of the monitoring user in capital letters.)

[root]# /usr/sap/RH2/HDB02/exe/hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "select * from sys.users where USER_NAME='RHELHASYNC'"

USER_NAME,USER_ID,USER_MODE,EXTERNAL_IDENTITY,CREATOR,CREATE_TIME,VALID_FROM,VALID_UNTIL,LAST_SUCCESSFUL_CONNECT,LAST_INVALID_CONNECT_ATTEMPT,INVALID_CONNECT_ATTEMPTS,ADMIN_GIVEN_PASSWORD,LAST_PASSWORD_CHANGE_TIME,PASSWORD_CHANGE_NEEDED,IS_PASSWORD_LIFETIME_CHECK_ENABLED,USER_DEACTIVATED,DEACTIVATION_TIME,IS_PASSWORD_ENABLED,IS_KERBEROS_ENABLED,IS_SAML_ENABLED,IS_X509_ENABLED,IS_SAP_LOGON_TICKET_ENABLED,IS_SAP_ASSERTION_TICKET_ENABLED,IS_RESTRICTED,IS_CLIENT_CONNECT_ENABLED,HAS_REMOTE_USERS,PASSWORD_CHANGE_TIME
"RHELHASYNC",156529,"LOCAL",?,"SYSTEM","2017-05-12 15:10:49.971000000","2017-05-12 15:10:49.971000000",?,"2017-05-12 15:21:12.117000000",?,0,"TRUE","2017-05-12 15:10:49.971000000","FALSE","FALSE","FALSE",?,"TRUE","FALSE","FALSE","FALSE","FALSE","FALSE","FALSE","TRUE","FALSE",?
1 row selected

4. Configuring SAP HANA in a pacemaker cluster

Please refer to the Reference Document for the High Availability Add-On for Red Hat Enterprise Linux 7 to first set up a pacemaker cluster. Note that the cluster must conform to the article Support Policies for RHEL High Availability Clusters - General Requirements for Fencing/STONITH.

This guide assumes that the following things are working properly:

  • Pacemaker cluster is configured according to documentation and has proper and working fencing
  • SAP HANA startup on boot is disabled on all cluster nodes as the start and stop will be managed by the cluster
  • SAP HANA system replication and takeover using tools from SAP are working properly between cluster nodes
  • SAP HANA contains monitoring account that can be used by the cluster from both cluster nodes
  • Both nodes are subscribed to 'High-availability' and 'RHEL for SAP HANA' (RHEL 6,RHEL 7) channels

4.1. Configure general cluster properties

When testing SAP HANA you may wish to limit the number of failovers by setting up stickiness and migration threshold using commands below. These settings are optional and so are not required for proper setup of SAP HANA in pacemaker. Commands should be executed only on one node but they will take effect in the whole cluster.

[root]# pcs resource defaults resource-stickiness=1000
[root]# pcs resource defaults migration-threshold=5000

To remove the above options after testing, you can use the commands below.

[root]# pcs resource defaults resource-stickiness=
[root]# pcs resource defaults migration-threshold=

In previous versions of this guide you might find the recommendation to set no-quorum-policy to ignore, which is currently NOT supported. In the default configuration there is no need to change the no-quorum-policy property of the cluster. If you would like to achieve the behaviour provided by this option, please check the article Can I configure pacemaker to continue to manage resources after a loss of quorum in RHEL 6 or 7? for more information.

4.2. Create cloned SAPHanaTopology resource

The SAPHanaTopology resource gathers the status and configuration of SAP HANA System Replication on each node. SAPHanaTopology requires the following attributes to be configured.

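| Attribute Name | Description                                                                        |
| -------------- | ---------------------------------------------------------------------------------- |
| SID            | SAP System Identifier (SID) of SAP HANA installation. Must be same for all nodes. |
| InstanceNumber | 2-digit SAP Instance identifier.                                                   |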

Below is an example command to create the SAPHanaTopology cloned resource.

[root]# pcs resource create SAPHanaTopology_RH2_02 SAPHanaTopology SID=RH2 InstanceNumber=02 --clone clone-max=2 clone-node-max=1 interleave=true

Resulting resource should look like the following.

[root]# pcs resource show SAPHanaTopology_RH2_02-clone

 Clone: SAPHanaTopology_RH2_02-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
  Resource: SAPHanaTopology_RH2_02 (class=ocf provider=heartbeat type=SAPHanaTopology)
   Attributes: SID=RH2 InstanceNumber=02
   Operations: start interval=0s timeout=180 (SAPHanaTopology_RH2_02-start-interval-0s)
               stop interval=0s timeout=60 (SAPHanaTopology_RH2_02-stop-interval-0s)
               monitor interval=60 timeout=60 (SAPHanaTopology_RH2_02-monitor-interval-60)

Once the resource is started you will see the collected information stored in the form of node attributes that can be viewed with the command crm_mon -A1. Below is an example of what attributes can look like when only SAPHanaTopology is started.

[root]# crm_mon -A1
...
Node Attributes:
* Node node1:
    + hana_rh2_remoteHost               : node2
    + hana_rh2_roles                    : 1:P:master1::worker:
    + hana_rh2_site                     : DC1
    + hana_rh2_srmode                   : syncmem
    + hana_rh2_vhost                    : node1
* Node node2:
    + hana_rh2_remoteHost               : node1
    + hana_rh2_roles                    : 1:S:master1::worker:
    + hana_rh2_site                     : DC2
    + hana_rh2_srmode                   : syncmem
    + hana_rh2_vhost                    : node2
...
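When scripting checks against these node attributes, a single value can be picked out of the crm_mon -A1 output. The sketch below assumes the "* Node <name>:" and "+ <attribute> : <value>" layout shown above, which may differ in other pacemaker versions; node_attr is a hypothetical name.

```shell
# Hypothetical helper: print the value of attribute ATTR for node NODE
# from `crm_mon -A1` output supplied on stdin.
node_attr() {
    awk -v node="$1" -v attr="$2" '
        /^\* Node / { cur = $3; sub(/:$/, "", cur); next }   # track current node
        cur == node && $1 == "+" && $2 == attr { print $NF; exit }
    '
}

# Example use:
# crm_mon -A1 | node_attr node2 hana_rh2_site
```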

4.3. Create Master/Slave SAPHana resource

The SAPHana resource is responsible for starting, stopping and relocating the SAP HANA database. This resource must be run as a Master/Slave cluster resource. The resource has the following attributes.

| Attribute Name            | Required? | Default value | Description |
| ------------------------- | --------- | ------------- | ----------- |
| SID                       | yes       | none          | SAP System Identifier (SID) of SAP HANA installation. Must be same for all nodes. |
| InstanceNumber            | yes       | none          | 2-digit SAP Instance identifier. |
| PREFER_SITE_TAKEOVER      | no        | yes           | Should cluster prefer to switchover to slave instance instead of restarting master locally? ("no": Do prefer restart locally; "yes": Do prefer takeover to remote site) |
| AUTOMATED_REGISTER        | no        | false         | Should the former SAP HANA primary be registered as secondary after takeover and DUPLICATE_PRIMARY_TIMEOUT? ("false": no, manual intervention will be needed; "true": yes, the former primary will be registered by resource agent as secondary) |
| DUPLICATE_PRIMARY_TIMEOUT | no        | 7200          | Time difference (in seconds) needed between primary time stamps, if a dual-primary situation occurs. If the time difference is less than the time gap, then the cluster holds one or both instances in a "WAITING" status. This is to give an admin a chance to react on a failover. A failed former primary will be registered after the time difference is passed. After this registration to the new primary all data will be overwritten by the system replication. |
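The time-gap rule described for DUPLICATE_PRIMARY_TIMEOUT comes down to comparing two time stamps. The sketch below only illustrates that comparison; it is not the resource agent's actual code, and the function name is hypothetical.

```shell
# Illustration only (not the resource agent's code): succeed when two
# primary time stamps, in seconds, differ by at least TIMEOUT seconds.
# TIMEOUT defaults to 7200, the default of DUPLICATE_PRIMARY_TIMEOUT.
primary_gap_reached() {
    a=$1
    b=$2
    timeout=${3:-7200}
    if [ "$a" -ge "$b" ]; then diff=$((a - b)); else diff=$((b - a)); fi
    [ "$diff" -ge "$timeout" ]
}

# Example use with hypothetical time stamps that are 8085 seconds apart:
# primary_gap_reached 1495204085 1495196000 && echo "gap reached"
```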

Below is an example command to create the SAPHana Master/Slave resource.

[root]# pcs resource create SAPHana_RH2_02 SAPHana SID=RH2 InstanceNumber=02 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false --master meta notify=true clone-max=2 clone-node-max=1 interleave=true

When running pcs-0.9.158-6.el7, or newer, use the command below to avoid deprecation warning. More information about the change is explained in What are differences between master and --master option in pcs resource create command?.

[root]# pcs resource create SAPHana_RH2_02 SAPHana SID=RH2 InstanceNumber=02 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false master notify=true clone-max=2 clone-node-max=1 interleave=true

Resulting resource should look like the following.

[root]# pcs resource show SAPHana_RH2_02-master

 Master: SAPHana_RH2_02-master
  Meta Attrs: notify=true clone-max=2 clone-node-max=1 interleave=true
  Resource: SAPHana_RH2_02 (class=ocf provider=heartbeat type=SAPHana)
   Attributes: SID=RH2 InstanceNumber=02 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
   Operations: start interval=0s timeout=180 (SAPHana_RH2_02-start-interval-0s)
               stop interval=0s timeout=240 (SAPHana_RH2_02-stop-interval-0s)
               monitor interval=120 timeout=60 (SAPHana_RH2_02-monitor-interval-120)
               monitor interval=121 role=Slave timeout=60 (SAPHana_RH2_02-monitor-interval-121)
               monitor interval=119 role=Master timeout=60 (SAPHana_RH2_02-monitor-interval-119)
               promote interval=0s timeout=320 (SAPHana_RH2_02-promote-interval-0s)
               demote interval=0s timeout=320 (SAPHana_RH2_02-demote-interval-0s)

Once the resource is started it will add additional node attributes describing the current state of SAP HANA databases on nodes as seen below.

[root]# crm_mon -A1
...
Node Attributes:
* Node node1:
    + hana_rh2_clone_state              : PROMOTED
    + hana_rh2_op_mode                  : delta_datashipping
    + hana_rh2_remoteHost               : node2
    + hana_rh2_roles                    : 4:P:master1:master:worker:master
    + hana_rh2_site                     : DC1
    + hana_rh2_sync_state               : PRIM
    + hana_rh2_srmode                   : syncmem
    + hana_rh2_vhost                    : node1
    + lpa_rh2_lpt                       : 1495204085
    + master-hana                       : 150
* Node node2:
    + hana_rh2_clone_state              : DEMOTED
    + hana_rh2_remoteHost               : node1
    + hana_rh2_roles                    : 4:S:master1:master:worker:master
    + hana_rh2_site                     : DC2
    + hana_rh2_srmode                   : syncmem
    + hana_rh2_sync_state               : SOK
    + hana_rh2_vhost                    : node2
    + lpa_rh2_lpt                       : 30
    + master-hana                       : 100
...

4.4. Create Virtual IP address resource

The cluster will contain a Virtual IP address that is used to reach the Master instance of SAP HANA. Below is an example command to create an IPaddr2 resource with IP 192.168.0.15.

[root]# pcs resource create vip_RH2_02 IPaddr2 ip="192.168.0.15"

The resulting resource should look like the one below.

[root]# pcs resource show vip_RH2_02

 Resource: vip_RH2_02 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.15
  Operations: start interval=0s timeout=20s (vip_RH2_02-start-interval-0s)
              stop interval=0s timeout=20s (vip_RH2_02-stop-interval-0s)
              monitor interval=10s timeout=20s (vip_RH2_02-monitor-interval-10s)

4.5. Create constraints

For correct operation we need to ensure that SAPHanaTopology resources are started before starting the SAPHana resources and also that the virtual IP address is present on the node where the Master resource of SAPHana is running. To achieve this, the following 2 constraints need to be created.

4.5.1. constraint - start `SAPHanaTopology` before `SAPHana`

Example command below will create the constraint that mandates the start order of these resources. There are 2 things worth mentioning here:

  • symmetrical=false attribute defines that we care only about the start of resources and they don't need to be stopped in reverse order.
  • Both resources (SAPHana and SAPHanaTopology) have the attribute interleave=true, which allows these resources to start in parallel on the nodes. As a result, despite the ordering constraint, the cluster does not wait for SAPHanaTopology to start on all nodes; it can start the SAPHana resource on any node as soon as SAPHanaTopology is running there.

Command for creating the constraint:

[root]# pcs constraint order SAPHanaTopology_RH2_02-clone then SAPHana_RH2_02-master symmetrical=false

The resulting constraint should look like the one in the example below.

[root]# pcs constraint
...
Ordering Constraints:
  start SAPHanaTopology_RH2_02-clone then start SAPHana_RH2_02-master (kind:Mandatory) (non-symmetrical)
...

4.5.2. constraint - colocate the `IPaddr2` resource with Master of `SAPHana` resource

Below is an example command that will colocate the IPaddr2 resource with SAPHana resource that was promoted as Master.

[root]# pcs constraint colocation add vip_RH2_02 with master SAPHana_RH2_02-master 2000

Note that the constraint uses a score of 2000 instead of the default INFINITY. This allows the IPaddr2 resource to stay active when no Master is promoted in the SAPHana resource, so the address can still be used by tools like SAP Management Console or SAP LVM to query status information about the SAP Instance.

The resulting constraint should look like the one in the example below.

[root]# pcs constraint
...
Colocation Constraints:
  vip_RH2_02 with SAPHana_RH2_02-master (score:2000) (rsc-role:Started) (with-rsc-role:Master)
...

4.6. Testing the manual move of SAPHana resource to another node (SAP Hana takeover by cluster)

To test moving the SAPHana resource from one node to another, use the command below. Note that the --master option should NOT be used with this command because of the way the SAPHana resource works internally.

[root]# pcs resource move SAPHana_RH2_02-master

IMPORTANT: After each pcs resource move command invocation the cluster creates location constraints to achieve the move of the resource. These constraints must be removed in order to allow automatic failover in the future. To remove them you can use the command pcs resource clear SAPHana_RH2_02-master.
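To see whether any such constraints are still present, the constraint ids can be filtered out of the pcs constraint --full output. The sketch below relies on pcs naming these constraints with a cli-ban- or cli-prefer- prefix; the function name is hypothetical.

```shell
# Hypothetical helper: list the ids of location constraints created by
# `pcs resource move` (ids starting with cli-ban- or cli-prefer-),
# reading `pcs constraint --full` output on stdin.
leftover_move_constraints() {
    grep -Eo 'cli-(ban|prefer)-[A-Za-z0-9_.-]+' | sort -u
}

# Example use:
# pcs constraint --full | leftover_move_constraints
```

An empty result suggests that pcs resource clear has already removed the constraints.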

