1. Environment Preparation
Select two servers with identical hardware configurations as load machines, and install the same OS, Red Hat Enterprise Linux 6.5, on both nodes.
- Disable the firewall on each node with the following commands:
# chkconfig iptables off
# chkconfig ip6tables off
After restarting, you can verify that the firewall is disabled with:
# chkconfig --list iptables
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off
# chkconfig --list ip6tables
ip6tables       0:off   1:off   2:off   3:off   4:off   5:off   6:off
- Disable SELinux on each node by modifying the /etc/sysconfig/selinux file:
# vi /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
After restarting, confirm SELinux is disabled with:
# sestatus
SELinux status:                 disabled
2. Allocate Physical Partitions
Allocate a physical partition on each load machine, ensuring both partitions have the same capacity (e.g., /dev/sdb1).
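If the partition does not yet exist, it can be created with fdisk; the following is a minimal sketch, assuming the spare disk is /dev/sdb on both machines (the disk name and size are illustrative, and the size must match on both nodes):
# Inside fdisk, create one primary partition: n, p, 1, identical size on both nodes, then w to write
[root@OCSJZ13 ~]# fdisk /dev/sdb
[root@OCSJZ13 ~]# partprobe /dev/sdb
# Verify the new partition
[root@OCSJZ13 ~]# fdisk -l /dev/sdb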
3. Configure Hostnames
Configure the hostnames for both machines (IPs 132.121.12.173 and 132.121.12.174) in /etc/hosts on each node:
# On 132.121.12.173
[root@132.121.12.173 ~]# vim /etc/hosts
132.121.12.173 OCSJZ13
132.121.12.174 OCSJZ14
# On 132.121.12.174
[root@132.121.12.174 ~]# vim /etc/hosts
132.121.12.173 OCSJZ13
132.121.12.174 OCSJZ14
4. Set Up SSH Trust
Establish SSH trust between the root users of the two load machines:
# On OCSJZ13
[root@OCSJZ13 ~]# ssh-keygen -t rsa (press Enter for all prompts)
[root@OCSJZ13 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@OCSJZ14 (enter the root password of OCSJZ14)
# On OCSJZ14
[root@OCSJZ14 ~]# ssh-keygen -t rsa (press Enter for all prompts)
[root@OCSJZ14 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@OCSJZ13 (enter the root password of OCSJZ13)
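To confirm the trust is in place, each machine should be able to run a remote command on the other without a password prompt, for example:
[root@OCSJZ13 ~]# ssh OCSJZ14 date
[root@OCSJZ14 ~]# ssh OCSJZ13 date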
5. Set Up SSH Trust for Gbase User
Establish SSH trust from the gbase user on the load machines to all cluster nodes. This must be done on both load machines, but only one-way: the load machines must be able to log in to every cluster node without a password; the reverse direction is not required.
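The procedure mirrors step 4; the following is a minimal sketch, where node1 is a stand-in for each cluster node's hostname (the real node list depends on the cluster):
[gbase@OCSJZ13 ~]$ ssh-keygen -t rsa (press Enter for all prompts)
# node1 is an illustrative cluster node; repeat for every node in the cluster
[gbase@OCSJZ13 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub gbase@node1
Repeat the same steps as the gbase user on OCSJZ14.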
6. Prepare Installation Packages
On both load machines:
- Extract pkgs.tar.bz2 and install all RPM packages.
- Extract toolkit.tar.bz2, copy all programs from the sbin directory to /usr/sbin, and copy all files from the service directory to /etc/init.d (see the sketch below).
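A minimal sketch of these steps, assuming both archives sit in /root and unpack into top-level sbin and service directories (the extraction paths are illustrative):
[root@OCSJZ13 ~]# mkdir -p /root/pkgs /root/toolkit
[root@OCSJZ13 ~]# tar xjf pkgs.tar.bz2 -C /root/pkgs
[root@OCSJZ13 ~]# rpm -ivh /root/pkgs/*.rpm
[root@OCSJZ13 ~]# tar xjf toolkit.tar.bz2 -C /root/toolkit
[root@OCSJZ13 ~]# cp /root/toolkit/sbin/* /usr/sbin/
[root@OCSJZ13 ~]# cp /root/toolkit/service/* /etc/init.d/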
7. Configure DRBD
Configure DRBD on both load machines with the same configuration file, /etc/drbd.d/global_common.conf:
[root@OCSJZ13 ~]# vim /etc/drbd.d/global_common.conf
global {
usage-count yes;
}
common {
handlers {
pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
}
net {
protocol C;
}
syncer {
rate 100M;
}
}
Create the resource file /etc/drbd.d/dispdrbd.res on both load machines:
[root@OCSJZ13 ~]# vim /etc/drbd.d/dispdrbd.res
resource dispdrbd {
on OCSJZ13 {
device /dev/drbd0;
disk /dev/sdb1;
address 192.168.0.173:7789;
meta-disk internal;
}
on OCSJZ14 {
device /dev/drbd0;
disk /dev/sdb1;
address 192.168.0.174:7789;
meta-disk internal;
}
}
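Before creating the metadata, the resource definition can be sanity-checked on each node; drbdadm dump parses the configuration and prints it back, so syntax errors surface immediately:
[root@OCSJZ13 ~]# drbdadm dump dispdrbd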
Create the DRBD resources on both load machines:
[root@OCSJZ13 ~]# dd if=/dev/zero bs=1M count=1 of=/dev/sdb1
[root@OCSJZ13 ~]# drbdadm create-md dispdrbd
# Start DRBD service on both machines
[root@OCSJZ13 ~]# service drbd start
# If synchronization does not start, run the following on one of the two machines (its local data will be resynchronized from the peer)
[root@OCSJZ14 ~]# drbdadm invalidate dispdrbd
After synchronization completes (when cat /proc/drbd shows ds:UpToDate/UpToDate), set the primary node (assuming 132.121.12.173 is the primary):
[root@OCSJZ13 ~]# drbdsetup /dev/drbd0 primary
# If setting primary fails
[root@OCSJZ13 ~]# drbdadm -- --overwrite-data-of-peer primary all
# Format the shared device
[root@OCSJZ13 ~]# mkfs.ext4 /dev/drbd0
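As an optional sanity check (not required), the new filesystem can be mounted once by hand on the primary and unmounted again, so that Pacemaker can manage the mount later:
[root@OCSJZ13 ~]# mount /dev/drbd0 /mnt
[root@OCSJZ13 ~]# df -h /mnt
[root@OCSJZ13 ~]# umount /mnt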
8. Configure NFS
On both load machines:
[root@OCSJZ13 ~]# mkdir /nfsshare
[root@OCSJZ13 ~]# vim /etc/exports
/nfsshare *(rw,sync,no_root_squash)
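Because Pacemaker will start rpcbind and nfs itself (see step 11), it is usually advisable to remove them from the normal boot sequence on both machines, assuming the stock RHEL 6 init scripts:
[root@OCSJZ13 ~]# chkconfig nfs off
[root@OCSJZ13 ~]# chkconfig rpcbind off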
9. Configure Corosync
On the primary node:
[root@OCSJZ13 ~]# corosync-keygen
[root@OCSJZ13 ~]# chmod 0400 /etc/corosync/authkey
[root@OCSJZ13 ~]# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
[root@OCSJZ13 ~]# vim /etc/corosync/corosync.conf
compatibility: whitetank
totem {
version: 2
secauth: off
threads: 0
interface {
member {
memberaddr: 132.121.12.173
}
member {
memberaddr: 132.121.12.174
}
ringnumber: 0
bindnetaddr: 132.121.12.1
mcastport: 5422
ttl: 1
}
transport: udpu
}
logging {
fileline: off
to_stderr: no
to_logfile: yes
to_syslog: yes
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
}
service {
ver: 0
name: pacemaker
use_mgmtd: yes
}
Copy the configuration to the secondary node:
[root@OCSJZ13 ~]# scp /etc/corosync/authkey /etc/corosync/corosync.conf root@OCSJZ14:/etc/corosync
# On the secondary node
[root@OCSJZ14 ~]# chmod 0400 /etc/corosync/authkey
[root@OCSJZ14 ~]# vim /etc/corosync/corosync.conf (verify that the copied settings are correct)
10. Start Services
Start the corosync service on both load machines:
[root@OCSJZ13 ~]# service corosync start
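To confirm that corosync came up cleanly on both nodes, check the service status and scan the cluster log for TOTEM membership messages (an illustrative check):
[root@OCSJZ13 ~]# service corosync status
[root@OCSJZ13 ~]# grep -i totem /var/log/cluster/corosync.log | tail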
11. Configure Pacemaker
On one of the load machines, configure CRM:
[root@OCSJZ13 ~]# crm configure
Disable STONITH:
crm(live)configure# property stonith-enabled=false
Set the cluster to keep running resources when quorum is lost, which is required for a two-node cluster (a single surviving node can never have quorum):
crm(live)configure# property no-quorum-policy=ignore
Specify the default stickiness value for resources:
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# commit
Configure DRBD (primary/secondary mode):
crm(live)configure# primitive dispdrbd ocf:linbit:drbd params drbd_resource=dispdrbd op monitor role=Master interval=50s timeout=30s op monitor role=Slave interval=60s timeout=30s
crm(live)configure# master ms_dispdrbd dispdrbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
Configure the mount point:
crm(live)configure# primitive webfs ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/nfsshare" fstype="ext4"
crm(live)configure# colocation webfs_on_ms_dispdrbd inf: webfs ms_dispdrbd:Master
crm(live)configure# order webfs_after_ms_dispdrbd inf: ms_dispdrbd:promote webfs:start
Configure the virtual IP (this IP is used to provide external services and must not conflict with other IPs in the system; the user-provided IP for this setup is 132.121.12.175):
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip="132.121.12.175" cidr_netmask="24" op monitor interval="5s"
crm(live)configure# colocation vip_on_ms_dispdrbd inf: vip ms_dispdrbd:Master
Configure NFS:
crm(live)configure# primitive rpcbind lsb:rpcbind op monitor interval="10s"
crm(live)configure# colocation rpcbind_on_ms_dispdrbd inf: rpcbind ms_dispdrbd:Master
crm(live)configure# primitive nfsshare lsb:nfs op monitor interval="30s"
crm(live)configure# colocation nfsshare_on_ms_dispdrbd inf: nfsshare ms_dispdrbd:Master
crm(live)configure# order nfsshare_after_rpcbind mandatory: rpcbind nfsshare:start
crm(live)configure# order nfsshare_after_vip mandatory: vip nfsshare:start
crm(live)configure# order nfsshare_after_webfs mandatory: webfs nfsshare:start
Check the status of the Pacemaker service using the crm status command:
[root@OCSJZ13 ~]# crm status
============
Last updated: Thu Feb 7 21:20:35 2013
Last change: Thu Feb 7 20:36:54 2013 via cibadmin on OCSJZ13
Stack: openais
Current DC: OCSJZ13 - partition with quorum
Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
2 Nodes configured, 2 expected votes
5 Resources configured.
============
Online: [ OCSJZ13 OCSJZ14 ]
Master/Slave Set: ms_dispdrbd [dispdrbd]
Masters: [ OCSJZ13 ]
Slaves: [ OCSJZ14 ]
webfs (ocf::heartbeat:Filesystem): Started OCSJZ13
vip (ocf::heartbeat:IPaddr): Started OCSJZ13
nfsshare (lsb:nfs): Started OCSJZ13
12. Configure Related Configuration Files (All Configuration Files Must Be Consistent on Both Load Machines)
Configure gciplist.conf:
[root@OCSJZ13 ~]# gciplist HOSTNAME >/etc/gciplist.conf
(Note: HOSTNAME is the IP address of any node in the cluster. This file must be regenerated each time the cluster topology changes, i.e., whenever nodes are added or removed.)
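For example (132.121.12.176 below is an illustrative cluster-node IP):
# 132.121.12.176 stands in for the IP of any cluster node
[root@OCSJZ13 ~]# gciplist 132.121.12.176 >/etc/gciplist.conf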
Configure dispmon.conf:
The load-monitoring service can run in one of two modes, background service (daemon) mode or cron job mode, and the two modes are configured differently.
1) Background Service Mode
[root@OCSJZ13 ~]# vim /etc/dispmon.conf
mode=daemon
exec=fetch_load.sh
(Note: fetch_load.sh is the filename of the user's load-control executable, which must be deployed in the /usr/sbin directory.)
2) Cron Job Mode
[root@OCSJZ13 ~]# vim /etc/dispmon.conf
mode=crontab
Configure dispcron.conf (only needed for cron job mode):
This file must be placed in the /etc directory and uses the same format as a crontab file.
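A sketch of a possible entry (the schedule is illustrative; fetch_load.sh is the user's load-control script mentioned above):
[root@OCSJZ13 ~]# vim /etc/dispcron.conf
# Illustrative: run the load-control script every 5 minutes
*/5 * * * * /usr/sbin/fetch_load.sh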
Note: The dispmon.conf and dispcron.conf configuration files must be edited while the dispmon service is stopped on both load machines.
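For example, using the init scripts deployed in step 6:
[root@OCSJZ13 ~]# service dispmon stop
[root@OCSJZ14 ~]# service dispmon stop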
13. Complete the Configuration of Related Services
[root@OCSJZ13 ~]# crm configure
Configure the dispserver service:
crm(live)configure# primitive dispserver lsb:dispsvr op monitor interval="10s"
crm(live)configure# colocation dispsvr_on_ms_dispdrbd inf: dispserver ms_dispdrbd:Master
crm(live)configure# order dispsvr_after_nfsshare mandatory: nfsshare dispserver:start
Configure the dispmon monitoring service:
crm(live)configure# primitive dispmon lsb:dispmon op monitor interval="10s"
crm(live)configure# colocation dispmon_on_ms_dispdrbd inf: dispmon ms_dispdrbd:Master
crm(live)configure# order dispmon_after_dispserver mandatory: dispserver dispmon:start
crm(live)configure# commit
crm(live)configure# quit