Manual Installation

Installation Preparation

This section requires four hosts to be prepared before installation. The components to be installed on each node are shown in the following table:

Host          oushum1         oushum2         oushus1         oushus2
IP            192.168.1.11    192.168.1.12    192.168.1.21    192.168.1.22
HAWQ Master   primary         standby         no              no
HAWQ Segment  no              no              yes             yes
NameNode      standby         active          no              no
DataNode      no              no              yes             yes
Zookeeper     yes             yes             yes             no

** Note: The '/etc/hosts' file must be configured on every node; the host names in the table above need to be appended to '/etc/hosts' on each server: **

192.168.1.11 oushum1
192.168.1.12 oushum2
192.168.1.21 oushus1
192.168.1.22 oushus2
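
For example, the entries above could be appended on each node with a heredoc (a minimal sketch; adjust the IP addresses and hostnames to match your environment):

cat >> /etc/hosts <<'EOF'
192.168.1.11 oushum1
192.168.1.12 oushum2
192.168.1.21 oushus1
192.168.1.22 oushus2
EOF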

Configure YUM repo

First, log in to oushum1 and switch to the root user:

ssh oushum1
su - root

There are two ways to configure the YUM repo: the first is to configure the Oushu repo and install over the Internet; the second is to download the Oushu installation package and configure a local repo to install from.

The first way: if the hosts have Internet access, you can download the Oushu Database YUM repo directly.
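
The choice between the avx and noavx repos below depends on whether the CPU supports AVX instructions; a quick way to check this on Linux (not part of the original steps) is:

grep -q avx /proc/cpuinfo && echo "AVX supported" || echo "AVX not supported"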

If the system is RedHat/CentOS 7.0, 7.1, or 7.2 and the CPU supports AVX, configure the following YUM repo:

wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database.repo

If the system is RedHat/CentOS 7.0, 7.1, or 7.2 and the CPU does not support AVX, configure the following YUM repo:

wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database-noavx.repo

If the system is RedHat/CentOS 7.3 and the CPU supports AVX, configure the following YUM repo:

wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database-cent73.repo

If the system is RedHat/CentOS 7.3 and the CPU does not support AVX, configure the following YUM repo:

wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database-cent73-noavx.repo

If the system is RedHat/CentOS 7.4 and the CPU supports AVX, configure the following YUM repo:

wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database-cent74.repo

If the system is RedHat/CentOS 7.4 and the CPU does not support AVX, configure the following YUM repo:

wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database-cent74-noavx.repo

The second way: if the hosts have no Internet access, you can download the installation package and configure a local repo.

Download the installation package from cloud.oushu-tech.com and copy it to the following directory (or transfer it via physical media):

mkdir -p /data1/localrepo
cd /data1/localrepo

Unpack the software package and install httpd:

tar xzf oushu-database-full-3.1.1.0-rhel7-x86_64.tar.gz
yum -y install httpd
systemctl start httpd
chown -R apache:root /data1/localrepo

Install the local repo:

/data1/localrepo/oushu-database-full-3.1.1.0/setup_repo.sh

Disable SELinux:

setenforce 0

Rebuild the YUM cache:

yum clean all
yum makecache

After completing the above steps successfully, please delete the compressed package to save disk space:

rm -f /data1/localrepo/oushu-database-full-3.1.1.0-rhel7-x86_64.tar.gz

Exchange public keys and set up password-free login

Create a file 'hostfile' that contains all the server names in the cluster:

cd /root
touch hostfile

Edit 'hostfile' as follows (these are the hostnames of each server in the cluster):

oushum1
oushum2
oushus1
oushus2

Because we will use the HAWQ command "hawq ssh-exkeys" to simplify password-free login setup and "hawq ssh" to simplify cluster operations, first install HAWQ on oushum1:

yum install -y hawq
source /usr/local/hawq/greenplum_path.sh  #Set environment variable of HAWQ

Execute the following commands on each server of cluster:

sudo sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config && sudo service sshd restart

Execute on oushum1:

hawq ssh-exkeys -f hostfile  #Exchange public key, so that you do not need to enter password when using command 'ssh' (In this step, you need to enter the root user password)

Note: If the root passwords across the cluster are inconsistent, change them so that they are the same. If you want to run 'ssh-exkeys' as a non-root user, make sure that every server in the cluster has that user and that its password is the same everywhere.

How to change the root user's password:

sudo echo 'password' | sudo passwd  --stdin root

Run this with sudo privileges, replacing 'password' with the password you want to set.

Install the YUM repo on the other nodes

On oushum1, distribute the YUM repo file to the other nodes using the 'hawq scp' command:

hawq scp -f hostfile /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d

Disable SELinux

hawq ssh -f hostfile -e 'sed -i "s/^SELINUX\=enforcing/SELINUX\=disabled/g" /etc/selinux/config'
hawq ssh -f hostfile -e 'setenforce 0'

The command "hawq ssh -f hostfile" executes the command given with the '-e' option on every node listed in hostfile.
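
For example, a simple connectivity check that runs 'hostname' on every node (an optional, illustrative command):

hawq ssh -f hostfile -e 'hostname'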

Disable the firewall

If the firewall is running, disable it:

hawq ssh -f hostfile -e 'systemctl stop iptables'
hawq ssh -f hostfile -e 'systemctl disable iptables'
hawq ssh -f hostfile -e 'systemctl stop firewalld'
hawq ssh -f hostfile -e 'systemctl disable firewalld'

Install and configure NTP

hawq ssh -f hostfile -e 'yum install -y ntp'
hawq ssh -f hostfile -e 'systemctl enable ntpd' # Enable the NTPD service at boot
hawq ssh -f hostfile -e 'systemctl start ntpd' # Start the NTPD service

Install Java

Use "hawq ssh" to install the JDK on all nodes and configure JAVA_HOME from oushum1. In this installation, the JAVA_HOME path is '/usr/java/default'. If the JDK is already installed, you only need to create a symbolic link from the JDK installation directory (such as /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64/) to '/usr/java/default'.

hawq ssh -f hostfile -e 'yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel'
hawq ssh -f hostfile -e 'mkdir /usr/java'
hawq ssh -f hostfile -e 'ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.144-0.b01.el7_4.x86_64/ /usr/java/default'

About the data directory

In this installation, we assume each machine has two data disks, mounted at /data1 and /data2 respectively. If you have more disks, adjust the directory creation steps and configuration files accordingly, especially the HDFS data directories and the location of the HAWQ temporary file directories.
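
To confirm that the data disks are mounted as assumed, you can check the mount points on every node (an optional check):

hawq ssh -f hostfile -e 'df -h /data1 /data2'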

Install Zookeeper

Create a file 'zkhostfile' containing the names of the servers on which ZooKeeper will be installed:

touch zkhostfile

Add the following hostnames to zkhostfile:

oushum1
oushum2
oushus1

Install Zookeeper:

source /usr/local/hawq/greenplum_path.sh

hawq ssh -f zkhostfile -e 'yum install -y zookeeper'

Create zookeeper data directory

hawq ssh -f zkhostfile -e 'mkdir -p /data1/zookeeper/data'
hawq ssh -f zkhostfile -e 'chown -R zookeeper:zookeeper /data1/zookeeper'

Configure the 'myid' file on the three ZooKeeper nodes. Create the file 'myid' on oushum1, oushum2, and oushus1 with contents 1, 2, and 3 respectively, and then give the zookeeper user ownership of it:

source /usr/local/hawq/greenplum_path.sh

echo 1 > myid
hawq scp -h oushum1 ./myid =:/data1/zookeeper/data

echo 2 > myid
hawq scp -h oushum2 ./myid =:/data1/zookeeper/data

echo 3 > myid
hawq scp -h oushus1 ./myid =:/data1/zookeeper/data


hawq ssh -f zkhostfile -e "sudo chown zookeeper:zookeeper /data1/zookeeper/data"
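
Optionally, verify that the myid file on each ZooKeeper node has the expected value:

hawq ssh -f zkhostfile -e 'hostname; cat /data1/zookeeper/data/myid'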

On oushum1, add the following entries to the file '/etc/zookeeper/conf/zoo.cfg':

server.1=oushum1:2888:3888
server.2=oushum2:2888:3888
server.3=oushus1:2888:3888

Note: The value 'n' in server.n corresponds to the content of that node's myid file; pay attention to this when configuring the ZooKeeper nodes.

Modify the dataDir in the file ‘/etc/zookeeper/conf/zoo.cfg’ on oushum1:

dataDir=/data1/zookeeper/data

Copy zoo.cfg to all zookeeper nodes:

cd ~
hawq scp -f zkhostfile /etc/zookeeper/conf/zoo.cfg =:/etc/zookeeper/conf/zoo.cfg

On oushum1, use "hawq ssh" to start ZooKeeper on all ZooKeeper nodes:

hawq ssh -f zkhostfile -e 'sudo -u zookeeper /usr/hdp/current/zookeeper-server/bin/zkServer.sh start'

After starting ZooKeeper, use the following command to check whether it started successfully:

hawq ssh -f zkhostfile -e 'sudo -u zookeeper /usr/hdp/current/zookeeper-server/bin/zkServer.sh status'
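
As an additional check, you can query ZooKeeper's four-letter-word interface, which should reply 'imok' (this assumes nc is installed and the command is not restricted by the ZooKeeper version in use):

echo ruok | nc oushum1 2181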

Install HDFS

Since Hadoop depends on a particular version of snappy, uninstall any existing snappy package to ensure a smooth installation:

hawq ssh -f hostfile -e 'yum remove -y snappy'

HDFS is installed and configured in HA mode. Install the Hadoop executables:

hawq ssh -f hostfile -e 'yum install -y hadoop hadoop-hdfs'

Configure the NameNode directories. There are two nodes to configure: oushum1 and oushum2. Similar to the previous hostfile, create 'nnhostfile':

touch nnhostfile

Edit nnhostfile so that it contains the hostnames of the Hadoop NameNodes:

oushum1
oushum2

Create DataNode host file dnhostfile, similar to the previous nnhostfile:

touch dnhostfile

Edit dnhostfile so that it contains the hostnames of the Hadoop DataNodes:

oushus1
oushus2

Create NameNode directory:

hawq ssh -f nnhostfile -e 'mkdir -p /data1/hdfs/namenode'
hawq ssh -f nnhostfile -e 'chmod -R 755 /data1/hdfs'
hawq ssh -f nnhostfile -e 'chown -R hdfs:hadoop /data1/hdfs'

Create DataNode directory:

hawq ssh -f dnhostfile -e 'mkdir -p /data1/hdfs/datanode'
hawq ssh -f dnhostfile -e 'mkdir -p /data2/hdfs/datanode'
hawq ssh -f dnhostfile -e 'chmod -R 755 /data1/hdfs'
hawq ssh -f dnhostfile -e 'chmod -R 755 /data2/hdfs'
hawq ssh -f dnhostfile -e 'chown -R hdfs:hadoop /data1/hdfs'
hawq ssh -f dnhostfile -e 'chown -R hdfs:hadoop /data2/hdfs'

According to the configuration of each node, modify the Hadoop configuration files in the directory '/etc/hadoop/conf' on oushum1. You can refer to the examples below; the files involved are mainly 'core-site.xml', 'hdfs-site.xml', and 'hadoop-env.sh'.

Modify the configuration file '/etc/hadoop/conf/core-site.xml' on oushum1. You need to enable HA by removing the comment markers shown below:

<!-- HA
...
HA -->

Remove the following:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfs-nn:9000</value>
</property>

Modify the following:

<configuration>
 <property>
    <name>fs.defaultFS</name>
    <value>hdfs://oushu/</value>
 </property>
 <property>
    <name>ha.zookeeper.quorum</name>
    <value>oushum1:2181,oushum2:2181,oushus1:2181</value>
 </property>
 ...
 <property>
    <name>ipc.server.listen.queue.size</name>
    <value>3300</value>
 </property>
 ...
</configuration>

Modify the configuration file '/etc/hadoop/conf/hdfs-site.xml' on oushum1. You need to enable HA by removing the comment markers shown below:

<!-- HA
...
HA -->

With HA enabled, the changes are as follows:

<configuration>
 <property>
    <name>dfs.name.dir</name>
    <value>file:/data1/hdfs/namenode</value>
    <final>true</final>
 </property>
 <property>
    <name>dfs.data.dir</name>
    <value>file:/data1/hdfs/datanode,file:/data2/hdfs/datanode</value>
    <final>true</final>
 </property>
 ...
 <property>
    <name>dfs.block.local-path-access.user</name>
    <value>gpadmin</value>
 </property>
 ...
 <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
 </property>
 ...
 <property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
    <description>
        If "true", access tokens are used as capabilities for accessing
        datanodes.
        If "false", no access tokens are checked on accessing datanodes.
    </description>
 </property>
 ...
 <property>
    <name>dfs.nameservices</name>
    <value>oushu</value>
 </property>
 <property>
    <name>dfs.ha.namenodes.oushu</name>
    <value>nn1,nn2</value>
 </property>
 <property>
    <name>dfs.namenode.rpc-address.oushu.nn1</name>
    <value>oushum2:9000</value>
 </property>
 <property>
    <name>dfs.namenode.http-address.oushu.nn1</name>
    <value>oushum2:50070</value>
 </property>
 <property>
    <name>dfs.namenode.rpc-address.oushu.nn2</name>
    <value>oushum1:9000</value>
 </property>
 <property>
    <name>dfs.namenode.http-address.oushu.nn2</name>
    <value>oushum1:50070</value>
 </property>
 <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://oushum1:8485;oushum2:8485;oushus1:8485/oushu</value>
 </property>
 <property>
    <name>dfs.ha.automatic-failover.enabled.oushu</name>
    <value>true</value>
 </property>
 <property>
    <name>dfs.client.failover.proxy.provider.oushu</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 </property>
 <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data1/hdfs/journaldata</value>
 </property>
 ...
</configuration>

Modify the file '/etc/hadoop/conf/hadoop-env.sh' on oushum1:

...
export JAVA_HOME="/usr/java/default"
...
export HADOOP_CONF_DIR="/etc/hadoop/conf"
...
export HADOOP_NAMENODE_OPTS="-Xmx6144m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70"
export HADOOP_DATANODE_OPTS="-Xmx2048m -Xss256k"
...
export HADOOP_LOG_DIR=/var/log/hadoop/$USER
...

Copy the configuration directory '/etc/hadoop/conf' from oushum1 to all nodes:

hawq scp -r -f hostfile /etc/hadoop/conf =:/etc/hadoop/

On oushum1, format the ZKFailoverController:

sudo -u hdfs -E hdfs zkfc -formatZK

Start the JournalNode on all nodes where the journal is configured. Create 'jhostfile', similar to the previous hostfile, containing the JournalNode hostnames:

oushum1
oushum2
oushus1

Execute the following command to start the JournalNodes:

hawq ssh -f jhostfile -e 'sudo -u hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start journalnode'

Format and start the NameNode on oushum1:

sudo -u hdfs -E hdfs namenode -format -clusterId ss
sudo -u hdfs -E /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode

Bootstrap the other NameNode on oushum2 and start it:

hawq ssh -h oushum2 -e 'sudo -u hdfs hdfs namenode -bootstrapStandby'
hawq ssh -h oushum2 -e 'sudo -u hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode'

Start all the DataNodes with 'hawq ssh':

hawq ssh -f dnhostfile -e 'sudo -u hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start datanode'

Start the zkfc process on oushum2 with hawq ssh, making it the active NameNode:

hawq ssh -h oushum2 -e 'sudo -u hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start zkfc'

Start the zkfc process on oushum1 with hawq ssh, making it the standby NameNode:

hawq ssh -h oushum1 -e 'sudo -u hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start zkfc'

Check whether HDFS is running correctly:

su - hdfs
hdfs dfsadmin -report
hdfs dfs -mkdir /testnode
hdfs dfs -put /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh /testnode/
hdfs dfs -ls -R /

You can also check the NameNode web UI: http://oushum1:50070/
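
You can also confirm which NameNode is active and which is standby; nn1 and nn2 are the NameNode IDs configured in hdfs-site.xml above (an optional check):

sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2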

Install HAWQ

Add the following to the configuration file '/etc/sysctl.conf' on oushum1:

kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 10000 65535
net.core.netdev_max_backlog = 200000
net.netfilter.nf_conntrack_max = 524288
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# increase network
net.core.rmem_max=2097152
net.core.wmem_max=2097152

Copy the configuration file '/etc/sysctl.conf' from oushum1 to all nodes:

hawq scp -r -f hostfile /etc/sysctl.conf =:/etc/

On oushum1, use the "hawq ssh" command to apply the settings in '/etc/sysctl.conf' on all nodes:

hawq ssh -f hostfile -e "sysctl -p"

Create file ‘/etc/security/limits.d/gpadmin.conf’ on oushum1:

* soft nofile 1048576
* hard nofile 1048576
* soft nproc 131072
* hard nproc 131072

Copy the configuration file '/etc/security/limits.d/gpadmin.conf' from oushum1 to all nodes:

hawq scp -r -f hostfile /etc/security/limits.d/gpadmin.conf =:/etc/security/limits.d
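
To verify that the limits are picked up, you can check the open-file and process limits on every node (an optional check; the new values only apply to freshly created login sessions):

hawq ssh -f hostfile -e 'ulimit -n; ulimit -u'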

On oushum1, create the directory '/hawq/default_filespace' on HDFS and give gpadmin ownership of it:

sudo -u hdfs hdfs dfs -mkdir -p /hawq/default_filespace
sudo -u hdfs hdfs dfs -chown -R gpadmin /hawq
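
Optionally, verify that the directory exists and is owned by gpadmin:

sudo -u hdfs hdfs dfs -ls /hawq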

On oushum1, similar to hostfile, create 'mhostfile' to record the hostnames of the master and the standby master:

touch mhostfile

The content of mhostfile:

oushum1
oushum2

On oushum1, similar to hostfile, create 'shostfile' to record the hostnames of all HAWQ segments:

touch shostfile

The content of shostfile:

oushus1
oushus2

On oushum1, use "hawq ssh" to create the master metadata directory and temporary file directories on the master and standby, and give gpadmin ownership of them:

#create master's metadata directory
hawq ssh -f mhostfile -e 'mkdir -p /data1/hawq/masterdd'

#create temporary file directory
hawq ssh -f mhostfile -e 'mkdir -p /data1/hawq/tmp'
hawq ssh -f mhostfile -e 'mkdir -p /data2/hawq/tmp'

hawq ssh -f mhostfile -e 'chown -R gpadmin:gpadmin /data1/hawq'
hawq ssh -f mhostfile -e 'chown -R gpadmin:gpadmin /data2/hawq'

On oushum1, use "hawq ssh" to create the segment metadata directory and temporary file directories on all segment nodes, and give gpadmin ownership of them:

#create segment's metadata directory
hawq ssh -f shostfile -e 'mkdir -p /data1/hawq/segmentdd'

#create temporary file directory
hawq ssh -f shostfile -e 'mkdir -p /data1/hawq/tmp'
hawq ssh -f shostfile -e 'mkdir -p /data2/hawq/tmp'

hawq ssh -f shostfile -e 'chown -R gpadmin:gpadmin /data1/hawq'
hawq ssh -f shostfile -e 'chown -R gpadmin:gpadmin /data2/hawq'

On oushum1, switch to the gpadmin user. All HAWQ configuration files below should be edited with this user's permissions:

su - gpadmin

On oushum1, modify the file '/usr/local/hawq/etc/slaves' and write the hostnames of the HAWQ segment nodes into it. In this installation, oushus1 and oushus2 should be listed:

oushus1
oushus2

Modify the file '/usr/local/hawq/etc/hdfs-client.xml'; as with HDFS, remove the HA comment markers first:

<configuration>
 <property>
    <name>dfs.nameservices</name>
    <value>oushu</value>
 </property>
 <property>
    <name>dfs.ha.namenodes.oushu</name>
    <value>nn1,nn2</value>
 </property>
 <property>
    <name>dfs.namenode.rpc-address.oushu.nn1</name>
    <value>oushum2:9000</value>
 </property>
 <property>
    <name>dfs.namenode.rpc-address.oushu.nn2</name>
    <value>oushum1:9000</value>
 </property>
 <property>
    <name>dfs.namenode.http-address.oushu.nn1</name>
    <value>oushum2:50070</value>
 </property>
 <property>
    <name>dfs.namenode.http-address.oushu.nn2</name>
    <value>oushum1:50070</value>
 </property>
 ...
 <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
    <description>Optional.  This is a path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients.If the string "_PORT" is present in this path, it will be replaced by the TCP port of the DataNode.</description>
 </property>
 ...
</configuration>

On oushum1, modify the file '/usr/local/hawq/etc/hawq-site.xml'. Note: hawq_dfs_url uses the value of dfs.nameservices, which is configured in 'hdfs-site.xml'.

<configuration>
 <property>
    <name>hawq_master_address_host</name>
    <value>oushum1</value>
 </property>
 ...
 <property>
    <name>hawq_standby_address_host</name>
    <value>oushum2</value>
    <description>The host name of hawq standby master.</description>
 </property>
 ...
 <property>
    <name>hawq_dfs_url</name>
    <value>oushu/hawq/default_filespace</value>
    <description>URL for accessing HDFS.</description>
 </property>
 <property>
    <name>hawq_master_directory</name>
    <value>/data1/hawq/masterdd</value>
    <description>The directory of hawq master.</description>
 </property>
 <property>
    <name>hawq_segment_directory</name>
    <value>/data1/hawq/segmentdd</value>
    <description>The directory of hawq segment.</description>
 </property>
 <property>
    <name>hawq_master_temp_directory</name>
    <value>/data1/hawq/tmp,/data2/hawq/tmp</value>
    <description>The temporary directory reserved for hawq master. NOTE: please DONOT add " " between directories. </description>
 </property>
 <property>
    <name>hawq_segment_temp_directory</name>
    <value>/data1/hawq/tmp,/data2/hawq/tmp</value>
    <description>The temporary directory reserved for hawq segment. NOTE: please DONOT add " " between directories. </description>
 </property>
 ...
 <property>
    <name>hawq_rm_yarn_address</name>
    <value>oushum1:8032</value>
    <description>The address of YARN resource manager server.</description>
 </property>
 <property>
    <name>hawq_rm_yarn_scheduler_address</name>
    <value>oushum1:8030</value>
    <description>The address of YARN scheduler server.</description>
 </property>
 ...
 <property>
    <name>hawq_rm_yarn_app_name</name>
    <value>hawq</value>
    <description>The application name to register hawq resource manager in YARN.</description>
 </property>
 ...
 <property>
    <name>hawq_re_cgroup_hierarchy_name</name>
    <value>hawq</value>
    <description>The name of the hierarchy to accomodate CGroup directories/files for resource enforcement.For example, /sys/fs/cgroup/cpu/hawq for CPU sub-system.</description>
 </property>
 ...
</configuration>
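
To double-check that hawq_dfs_url matches the HDFS nameservice, you can look up dfs.nameservices in the Hadoop configuration (an optional check):

grep -A 1 'dfs.nameservices' /etc/hadoop/conf/hdfs-site.xml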

On oushum1, switch to the root user:

su - root

Copy all configuration files in '/usr/local/hawq/etc' from oushum1 to all nodes:

source /usr/local/hawq/greenplum_path.sh

hawq scp -r -f hostfile /usr/local/hawq/etc =:/usr/local/hawq

On oushum1, switch to the gpadmin user and create 'hhostfile':

su - gpadmin
source /usr/local/hawq/greenplum_path.sh  #Set HAWQ environment variables
touch hhostfile

The hhostfile records the hostnames of all HAWQ nodes:

oushum1
oushum2
oushus1
oushus2

Log in to each server as the root user and change the gpadmin user's password:

sudo echo 'password' | sudo passwd  --stdin gpadmin

Exchange keys for the gpadmin user, entering the gpadmin user's password when prompted:

su - gpadmin
source /usr/local/hawq/greenplum_path.sh  #Set HAWQ environment variables
hawq ssh-exkeys -f hhostfile

On oushum1, initialize the HAWQ cluster as the gpadmin user. When prompted "Continue with HAWQ init", enter Y:

hawq init cluster

Note:

#When initializing the HAWQ cluster, make sure that the masterdd and segmentdd directories under '/data*/hawq/' are empty, and that the directory '/hawq/default_filespace' created on HDFS is empty as well.
#In addition, if 'hawq init cluster' fails, you can execute the following command to stop the HAWQ cluster, find the cause, and reinitialize it.

hawq stop cluster

#On the HAWQ master node, use the following commands to clear all HAWQ data directories before reinitializing.

hawq ssh -f hhostfile -e 'rm -fr /data1/hawq/masterdd/*'
hawq ssh -f hhostfile -e 'rm -fr /data1/hawq/segmentdd/*'

#On the HDFS NameNode, use the following command to clear the directory /hawq/default_filespace. If /hawq/default_filespace already contains user data, back it up first to avoid data loss:

hdfs dfs -rm -f -r /hawq/default_filespace/*

You also need to check whether the HDFS parameters are configured correctly, preferably as the gpadmin user. If the parameters are not configured correctly, HDFS may still start normally, but errors will occur under high load.

su - gpadmin

source /usr/local/hawq/greenplum_path.sh

hawq check -f hostfile --hadoop /usr/hdp/current/hadoop-client/ --hdfs-ha

Check whether HAWQ runs successfully:

su - gpadmin
source /usr/local/hawq/greenplum_path.sh
psql -d postgres
select * from gp_segment_configuration;  #Make sure that all nodes are up

create table t(i int);
insert into t select generate_series(1,1000);
select count(*) from t;

Install PostGIS (optional)

Install PostGIS on each node.

Install PostGIS yum repo:

wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/yumrepo/release/postgis/centos7/1.2.3/oushu-postgis.repo

On oushum1, distribute the PostGIS repo file to the other nodes using the 'hawq scp' command:

source /usr/local/hawq/greenplum_path.sh


hawq scp -f hostfile /etc/yum.repos.d/oushu-postgis.repo =:/etc/yum.repos.d

Install PostGIS:

hawq ssh -f hostfile -e 'yum install -y oushu-postgis'

Register the PostGIS component

Register the PostGIS component in the databases that need it:

su - gpadmin

source /usr/local/hawq/greenplum_path.sh

psql -d postgres -f $GPHOME/share/postgresql/contrib/postgis-2.0/postgis.sql

psql -d postgres -f $GPHOME/share/postgresql/contrib/postgis-2.0/postgis_comments.sql

psql -d postgres -f $GPHOME/share/postgresql/contrib/postgis-2.0/spatial_ref_sys.sql

psql -d postgres -f $GPHOME/share/postgresql/contrib/postgis-2.0/legacy.sql

psql -d postgres -f $GPHOME/share/postgresql/contrib/postgis-2.0/legacy_minimal.sql

Once the registration is complete, you can use the PostGIS component to analyze the data in the database.

Installation confirmation

PostGIS is installed under the public schema by default:

postgres=# \d


                     List of relations
  Schema |       Name        | Type  |  Owner  |   Storage
---------+-------------------+-------+---------+-------------
  public | geography_columns | view  | gpadmin | none
  public | geometry_columns  | view  | gpadmin | none
  public | spatial_ref_sys   | table | gpadmin | append only

For example:

First create a regular table (cities) to store information about cities. The column "the_geom" stores the two-dimensional spatial coordinates of each city:

CREATE TABLE cities ( id int4, name varchar(50) ,the_geom geometry);

Insert city data:

INSERT INTO cities (id, the_geom, name) VALUES (1,ST_GeomFromText('POINT(-0.1257 51.508)',4326),'London, England');
INSERT INTO cities (id, the_geom, name) VALUES (2,ST_GeomFromText('POINT(-81.233 42.983)',4326),'London, Ontario');
INSERT INTO cities (id, the_geom, name) VALUES (3,ST_GeomFromText('POINT(27.91162491 -33.01529)',4326),'East London,SA');

All standard SQL operations can be used on PostGIS tables, but the coordinates are stored as unreadable hexadecimal values:

SELECT * FROM cities;

 id |      name       |                      the_geom
----+-----------------+------------------------------------------------------
  1 | London, England | 0101000020E6100000BBB88D06F016C0BF1B2FDD2406C14940
  2 | London, Ontario | 0101000020E6100000F4FDD478E94E54C0E7FBA9F1D27D4540
  3 | East London,SA  | 0101000020E610000040AB064060E93B4059FAD005F58140C0
(3 rows)

You can use ST_X(the_geom) and ST_Y(the_geom) to display the individual coordinate dimensions:

SELECT id, ST_AsText(the_geom), ST_AsEwkt(the_geom), ST_X(the_geom), ST_Y(the_geom) FROM cities;

 id |          st_astext           |               st_asewkt                |    st_x     |   st_y
----+------------------------------+----------------------------------------+-------------+------------
  2 | POINT(-81.233 42.983)        | SRID=4326;POINT(-81.233 42.983)        |     -81.233 |    42.983
  3 | POINT(27.91162491 -33.01529) | SRID=4326;POINT(27.91162491 -33.01529) | 27.91162491 | -33.01529
  1 | POINT(-0.1257 51.508)        | SRID=4326;POINT(-0.1257 51.508)        |     -0.1257 |    51.508
(3 rows)

Configure HAWQ for Secure HDFS (Optional)

** Attention: HDFS pluggable storage does not work with Kerberos **

Prerequisites

Setup Secure HDFS

Setup MIT Kerberos KDC

Synchronize system time between KDC and HAWQ cluster

Procedure

Log in to the KDC server and distribute the Kerberos configuration file /etc/krb5.conf to each HAWQ node:

scp /etc/krb5.conf <hawq-node>:/etc/krb5.conf

Create a Kerberos service principal for the postgres user:

kadmin.local -q "addprinc -randkey postgres@REALM"
kadmin.local -q "xst -k /etc/security/keytabs/hawq.keytab postgres@REALM"

Distribute the keytab file to the HAWQ master:

scp /etc/security/keytabs/hawq.keytab <hawq-master>:/etc/security/keytabs/hawq.keytab

Log in to the HAWQ master and change the owner and permissions of the keytab:

chown gpadmin:gpadmin /etc/security/keytabs/hawq.keytab
chmod 400 /etc/security/keytabs/hawq.keytab

Modify $GPHOME/etc/hawq-site.xml and add the following:

<property>
   <name>enable_secure_filesystem</name>
   <value>ON</value>
</property>
<property>
   <name>krb_server_keyfile</name>
   <value>/etc/security/keytabs/hawq.keytab</value>
</property>

Modify $GPHOME/etc/hdfs-client.xml and uncomment the KDC section:

<!-- KDC
...
KDC -->

Distribute the modified configuration files to each node of the HAWQ cluster.
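
For example, the same 'hawq scp' pattern used during installation can be reused here (a sketch; hhostfile is the HAWQ host file created earlier):

hawq scp -r -f hhostfile /usr/local/hawq/etc =:/usr/local/hawq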

Change the owner of the HAWQ HDFS data directory to postgres:

sudo -u hdfs hadoop fs -chown -R postgres:gpadmin /<dfs_url>