There are two masters in an Oushu Database cluster– a primary master and a standby master. Clients connect to the primary master and queries can be executed only on the primary master.
You deploy a backup or mirror of the master instance on a separate host machine from the primary master so that the cluster can tolerate a single host failure. A backup master or standby master serves as a warm standby if the primary master becomes non-operational. You create a standby master from the primary master while the primary is online.
The primary master continues to provide services to users while Oushu Database takes a transactional snapshot of the primary master instance. In addition to taking a transactional snapshot and deploying it to the standby master, Oushu Database also records changes to the primary master. After Oushu Database deploys the snapshot to the standby master, Oushu Database deploys the updates to synchronize the standby master with the primary master.
After the primary master and standby master are synchronized, Oushu Database keeps the standby master up to date using walsender and walreceiver, write-ahead log (WAL)-based replication processes. The walreceiver is a standby master process. The walsender process is a primary master process. The two processes use WAL-based streaming replication to keep the primary and standby masters synchronized.
Since the master does not house user data, only system catalog tables are synchronized between the primary and standby masters. When these tables are updated, changes are automatically copied to the standby master to keep it current with the primary.
If the primary master fails, the replication process stops, and an administrator can activate the standby master. Upon activation of the standby master, the replicated logs reconstruct the state of the primary master at the time of the last successfully committed transaction. The activated standby then functions as the Oushu Database master, accepting connections on the port specified when the standby master was initialized.
If the master fails, the administrator uses command line tools to instruct the standby master to take over as the new primary master.
- (Important) When Administrator modifies host-based authentication file pg_hba.conf in primary master, it’s demanded that updates are synchronized manually to the standby master so that the activated standby master can deals with authentication correctly after master fails.
- You can configure a virtual IP address for the master and standby so that client programs do not have to switch to a different network address when the ‘active’ master changes. If the master host fails, the virtual IP address can be swapped to the actual acting master.
- When you use JDBC to connect Oushu Database cluster, it’s suggested that both master address and standby address are added in the url unless virtual IP is provided, which enables client programs to connect Oushu Database cluster after standby master activated. For example:
String url = "jdbc:postgresql://master:port,standby:port/dbname";
Configuring Master Mirroring¶
You can configure a new Oushu Database system with a standby master during Oushu Database’s installation process, or you can add a standby master later. This topic assumes you are adding a standby master to an existing node in your Oushu Database cluster.
- Ensure the host machine for the standby master has been installed with Oushu Database and configured accordingly:
- The gpadmin system user has been created.
- Oushu Database binaries are installed.
- Oushu Database environment variables are set.
- SSH keys have been exchanged.
- Oushu Database Master Data directory has been created.
- Initialize the Oushu Database master standby:
log in to the Oushu Database master and re-initialize the Oushu Database master standby node:
$ ssh gpadmin@<oushu_master> oushu_master$ source /usr/local/hawq/greenplum_path.sh oushu_master$ hawq config -c hawq_standby_address_host -v <new_standby_master> oushu_master$ hawq init standby
where <new_standby_master> identifies the hostname of the standby master.
Failing Over to a Standby Master¶
If the primary master fails, log replication stops. You must explicitly activate the standby master in this circumstance. Upon activation of the standby master, Oushu Database reconstructs the state of the master at the time of the last successfully committed transaction.
Manually activate the standby master¶
- Ensure that a standby master host has been configured for the system.
- log in to the Oushu Database master and activate the Oushu Database standby master node:
oushu_master$ hawq activate standby
After you activate the standby master, it becomes the active or primary master for the Oushu Database cluster.
- (Optional, but recommended.) Configure a new standby master. See Configuring Master Mirroring .
- Check the status of the Oushu Database cluster by executing the following command on the master:
oushu_master$ hawq state
The newly-activated master’s status should be Active. If you configured a new standby master, its status is Passive. When a standby master is not configured, the command displays -No entries found, the message indicating that no standby master instance is configured.
- To check on the status of master mirroring, query the gp_master_mirroring system view:
oushu_master$ psql dbname -c 'SELECT * FROM gp_segment_configuration;'
- Finally, check the status of master mirroring by querying the gp_master_mirroring system view. See Checking on the State of Master Mirroring for instructions.
Automatically activate the standby master¶
The latest version of Oushu Database support automatic activation of standby master，you can enable this feature following these steps:
- Ensure that a standby master host has been configured for the system.
- Add these two properties into hawq-site.xml before starting cluster，or you can modify their values later：
<property> <name>enable_master_auto_ha</name> <value>on</value> </property> <property> <name>ha_zookeeper_quorum</name> <value>ZKHOST1:2181,ZKHOST2:2181,ZKHOST3:2181</value> </property>
|GUC name||default value||description|
|enable_master_auto_ha||off||whether to allow failing over automatically|
|ha_zookeeper_quorum||localhost:2181||locations of the zookeeper service in current cluster|
We strongly recommend you to set ha_zookeeper_quorum manually and correctly. In fact you can reference the value of ha.zookeeper.quorum in core-site.xml when install hdfs .
（WARNING：if enable_master_auto_ha is on，it’s demanded that ha_zookeeper_quorum exists hawq-site.xml. Otherwise you will see error information when Oushu Database start.）
- After starting Oushu Database, you can check whether these GUC values are set correcly：
oushu_master$ hawq config -s enable_master_auto_ha GUC : enable_master_auto_ha Value : on oushu_master$ hawq config -s ha_zookeeper_quorum GUC : ha_zookeeper_quorum Value : x.x.x.x:2181,y.y.y.y:2181,z.z.z.z:2181
- You can modify the GUC value in master, then restart the cluster to activate new settings：
oushu_master$ hawq config -c enable_master_auto_ha -v on oushu_master$ hawq config -c ha_zookeeper_quorum -v x.x.x.x:2181,y.y.y.y:2181,z.z.z.z:2181 oushu_master$ hawq restart cluster
- As long as zookeeper service is healthy, Oushu Database will automatically fail over to standby master in 3 minutes after master fails.
- (Optional, but recommended.) Configure a new standby master manually. See Configuring Master Mirroring for instructions.
Checking on the State of Master Mirroring¶
To check on the status of master mirroring, query the gp_master_mirroring system view:
oushu_master$ psql dbname -c 'SELECT * FROM gp_master_mirroring;'
If a standby master has not been set up for the cluster, you will see the following output:
summary_state | detail_state | log_time | error_message ----------------+--------------+----------+--------------- Not Configured | | | (1 row)
If the standby is configured and in sync with the master, you will see output similar to the following:
summary_state | detail_state | log_time | error_message ---------------+--------------+------------------------+--------------- Synchronized | | 2018-03-30 21:53:47+00 | (1 row)
Resynchronizing Standby with the Master¶
The standby can become out-of-date if the log synchronization process between the master and standby has stopped or has fallen behind. If this occurs, you will observe output similar to the following after querying the gp_master_mirroring view:
summary_state | detail_state | log_time | error_message ------------------+--------------+------------------------+--------------- Not Synchronized | | | (1 row)
To resynchronize the standby with the master:
oushu_master$ hawq init standby -n
This command stops and restarts the master and then synchronizes the standby.