Quick Start

This section demonstrates how to install a HAWQ cluster on a single-node CentOS/RedHat 7.x physical server via RPM. Throughout this section we assume the hostname of the machine is oushu (the hostname can be obtained with the command: hostname; please replace every occurrence of “oushu” in this text with your server's actual hostname). This deployment takes about 30 minutes.

Installation Preparation

First, log in to the server as the root user. Check whether the CPU supports AVX instructions:

cat /proc/cpuinfo | grep avx

Install the Oushu YUM repo:

#If the system is Redhat/CentOS 7.0, 7.1, or 7.2 and the CPU supports AVX, configure the following YUM repo:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database.repo

#If the system is Redhat/CentOS 7.0, 7.1, or 7.2 and the CPU does not support AVX, configure the following YUM repo:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database-noavx.repo

#If the system is Redhat/CentOS 7.3 and the CPU supports AVX, configure the following YUM repo:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database-cent73.repo

#If the system is Redhat/CentOS 7.3 and the CPU does not support AVX, configure the following YUM repo:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database-cent73-noavx.repo

#If the system is Redhat/CentOS 7.4 and the CPU supports AVX, configure the following YUM repo:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database-cent74.repo

#If the system is Redhat/CentOS 7.4 and the CPU does not support AVX, configure the following YUM repo:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/3.1.2.0/oushu-database-cent74-noavx.repo
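
After installing a repo file, you can confirm that yum sees the new repository (standard yum commands; the repo id shown will depend on which file you installed):

yum clean all
yum repolist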

Disable SELinux:

sed -i "s/^SELINUX\=enforcing/SELINUX\=disabled/g" /etc/selinux/config
setenforce 0

Turn off the firewall:

systemctl stop iptables
systemctl disable iptables
systemctl stop firewalld
systemctl disable firewalld

Install Java:

yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
mkdir -p /usr/java
#Check the exact JDK version installed on your machine, then link it
ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-2.b16.el7_4.x86_64  /usr/java/default
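
The JDK directory name above corresponds to one specific openjdk build and will likely differ on your machine. A minimal sketch that links whatever 1.8.0 build is actually installed (assuming exactly one is present):

ls /usr/lib/jvm/
ln -sfn "$(ls -d /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.*.x86_64 | head -1)" /usr/java/default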

Install HDFS

Install HDFS and create the directories it needs. Here we assume the machine has two data disks, mounted at /data1 and /data2 respectively. If you have a different number of disks, adjust the directory creation steps and the configuration files accordingly, especially the HDFS data directories and the location of the HAWQ temporary file directories.

#Since hadoop depends on a particular version of snappy, uninstall any existing snappy first to ensure a smooth installation
yum -y remove snappy

#Install the HDFS RPMs; they create the hdfs user automatically
yum install -y hadoop hadoop-hdfs

#Create a NameNode directory in /data1
mkdir -p /data1/hdfs/namenode

#Create a DataNode directory on each disk and change permissions
mkdir -p /data1/hdfs/datanode
chmod -R 755 /data1/hdfs
chown -R hdfs:hadoop /data1/hdfs

mkdir -p /data2/hdfs/datanode
chmod -R 755 /data2/hdfs
chown -R hdfs:hadoop /data2/hdfs

Copy the following configuration files to ‘/etc/hadoop/conf’ (if prompted to overwrite, enter y).

Edit the fs.defaultFS attribute in ‘/etc/hadoop/conf/core-site.xml’. Note: before formatting the NameNode, make sure the “oushu” in the fs.defaultFS value has been replaced with your actual hostname. Other systems use this URL to access HDFS:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://oushu:9000</value>
</property>
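
Since the note above asks you to replace oushu with your real hostname, here is a one-line sketch that performs the substitution in place (it assumes “oushu” appears nowhere else in the file):

sed -i "s/oushu/$(hostname)/g" /etc/hadoop/conf/core-site.xml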

Edit the file ‘/etc/hadoop/conf/hadoop-env.sh’ and add the following parameters, which configure the Java home, Hadoop configuration directory, log directory, and JVM options. The -Xmx value for the NameNode should be adjusted according to the amount of data stored in HDFS: the more HDFS data, the larger the -Xmx value.

export JAVA_HOME="/usr/java/default"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export HADOOP_NAMENODE_OPTS="-Xmx6144m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70"
export HADOOP_DATANODE_OPTS="-Xmx2048m -Xss256k"
export HADOOP_LOG_DIR=/var/log/hadoop/$USER

The file ‘/etc/hadoop/conf/hdfs-site.xml’ uses the /data1 and /data2 disks by default. If you have more disks, change the dfs.data.dir attribute so that HDFS uses all of them.

<property>
    <name>dfs.data.dir</name>
    <value>/data1/hdfs/datanode,/data2/hdfs/datanode</value>
    <final>true</final>
</property>
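
For example, a machine with a hypothetical third disk mounted at /data3 (with /data3/hdfs/datanode created and chowned like the others) would use:

<property>
    <name>dfs.data.dir</name>
    <value>/data1/hdfs/datanode,/data2/hdfs/datanode,/data3/hdfs/datanode</value>
    <final>true</final>
</property>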

Format the NameNode, then start the NameNode and DataNode:

Note: if you are asked whether to format, enter y to confirm.

sudo -u hdfs -E hdfs namenode -format
sudo -u hdfs -E /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode
sudo -u hdfs -E /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start datanode
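
To confirm that both daemons came up before moving on, check for their processes (a plain ps check; the bracketed patterns keep grep from matching itself):

ps -ef | grep -E '[N]ameNode|[D]ataNode'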

HDFS logs are in the directory ‘/var/log/hadoop/hdfs/’. If any error occurs during configuration, check the logs and correct the problem.

Check if HDFS is running successfully:

su - hdfs
hdfs dfsadmin -report
hdfs dfs -mkdir /testnode
hdfs dfs -put /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh /testnode/
hdfs dfs -ls -R /

You can also check the HDFS web interface at http://oushu:50070/

Install HAWQ

Install the HAWQ RPM; it creates the gpadmin user automatically:

yum install -y hawq

Add the following to the configuration file ‘/etc/sysctl.conf’:

kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 10000 65535
net.core.netdev_max_backlog = 200000
net.netfilter.nf_conntrack_max = 524288
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# increase network
net.core.rmem_max=2097152
net.core.wmem_max=2097152

Make the system configuration take effect:

sysctl -p
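
You can spot-check that the new values took effect, for example:

sysctl kernel.shmmax net.core.rmem_max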

Create the HAWQ local metadata directories and temporary file directories:

#Create the HAWQ local metadata directories; the two directories below are used by the master and segment respectively
mkdir -p /data1/hawq/masterdd
mkdir -p /data1/hawq/segmentdd

#Create HAWQ temporary file directories, one per disk, so that HAWQ can use all disks
mkdir -p /data1/hawq/tmp
chmod -R 755 /data1/hawq
chown -R gpadmin:gpadmin /data1/hawq

mkdir -p /data2/hawq/tmp
chmod -R 755 /data2/hawq
chown -R gpadmin:gpadmin /data2/hawq

Create the HAWQ data directory on HDFS:

sudo -u hdfs hdfs dfs -mkdir -p /hawq_default
sudo -u hdfs hdfs dfs -chown -R gpadmin /hawq_default
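
A quick check that the directory exists and now belongs to gpadmin:

sudo -u hdfs hdfs dfs -ls / | grep hawq_default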

Edit the file ‘/usr/local/hawq/etc/slaves’: remove localhost from the file and add oushu. This file lists all slave nodes, one node per line. The revised file is:

oushu

Edit the file ‘/usr/local/hawq/etc/hawq-site.xml’. Since it uses the /data1 and /data2 disks by default, if you have more disks you need to change the values of hawq_master_temp_directory and hawq_segment_temp_directory to ensure all disks are used.

<property>
    <name>hawq_master_address_host</name>
    <value>oushu</value>
    <description>The host name of hawq master.</description>
</property>

<property>
    <name>hawq_dfs_url</name>
    <value>oushu:9000/hawq_default</value>
    <description>URL for accessing HDFS.</description>
</property>

<property>
    <name>hawq_master_directory</name>
    <value>/data1/hawq/masterdd</value>
    <description>The directory of hawq master.</description>
</property>

<property>
    <name>hawq_segment_directory</name>
    <value>/data1/hawq/segmentdd</value>
    <description>The directory of hawq segment.</description>
</property>

<property>
    <name>hawq_master_temp_directory</name>
    <value>/data1/hawq/tmp,/data2/hawq/tmp</value>
    <description>The temporary directory reserved for hawq master. Note: please DO NOT add spaces between directories.</description>
</property>

<property>
    <name>hawq_segment_temp_directory</name>
    <value>/data1/hawq/tmp,/data2/hawq/tmp</value>
    <description>The temporary directory reserved for hawq segment. Note: please DO NOT add spaces between directories.</description>
</property>
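
As with HDFS above, a machine with a hypothetical third disk mounted at /data3 (with /data3/hawq/tmp created and chowned like the others) would list all three directories in both properties, for example:

<property>
    <name>hawq_master_temp_directory</name>
    <value>/data1/hawq/tmp,/data2/hawq/tmp,/data3/hawq/tmp</value>
</property>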

Log in as the gpadmin user:

su - gpadmin

Set up passwordless SSH:

source /usr/local/hawq/greenplum_path.sh
hawq ssh-exkeys -h oushu
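
If the key exchange succeeded, an SSH command to the node should now run without prompting for a password:

ssh oushu hostname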

Initialize HAWQ. If asked whether to init, enter y to confirm.

hawq init cluster

The management logs are in ‘/home/gpadmin/hawqAdminLogs/’. The HAWQ master logs are in ‘/data1/hawq/masterdd/pg_log’, and the segment logs are in ‘/data1/hawq/segmentdd/pg_log’. If any error occurs during initialization, check the logs and correct the problem.
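
Besides reading the logs, you can also query the cluster status with the same hawq utility:

hawq state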

Check that HAWQ works properly:

su - gpadmin
source /usr/local/hawq/greenplum_path.sh
psql -d postgres
select * from gp_segment_configuration;  -- make sure that all nodes are up

create table t(i int);
insert into t select generate_series(1,1000);
select count(*) from t;
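
The count should return 1000. You can drop the test table afterwards:

drop table t;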

Try the New Executor

This section shows how the new executor works, using the TPC-H lineitem table.

Create an external table that generates the TPC-H lineitem data:

CREATE EXTERNAL WEB TABLE E_LINEITEM ( L_ORDERKEY    INT8 ,
                                       L_PARTKEY     INTEGER ,
                                       L_SUPPKEY     INTEGER ,
                                       L_LINENUMBER  INTEGER ,
                                       L_QUANTITY    FLOAT ,
                                       L_EXTENDEDPRICE  FLOAT ,
                                       L_DISCOUNT    FLOAT ,
                                       L_TAX         FLOAT ,
                                       L_RETURNFLAG  VARCHAR(1) ,
                                       L_LINESTATUS  VARCHAR(1) ,
                                       L_SHIPDATE    TEXT ,
                                       L_COMMITDATE  TEXT ,
                                       L_RECEIPTDATE TEXT ,
                                       L_SHIPINSTRUCT CHAR(25) ,
                                       L_SHIPMODE     VARCHAR(10) ,
                                       L_COMMENT      VARCHAR(44) )
                                       EXECUTE 'bash -c "$GPHOME/bin/dbgen -b $GPHOME/bin/dists.dss -T L -s 1 -N 6 -n $((GP_SEGMENT_ID + 1))"'
                                       on 6 format 'text' (delimiter '|');

Create an ORC table:

CREATE TABLE lineitem
       (  L_ORDERKEY    INT8,
          L_PARTKEY     INTEGER,
          L_SUPPKEY     INTEGER,
          L_LINENUMBER  INTEGER,
          L_QUANTITY    FLOAT,
          L_EXTENDEDPRICE  FLOAT,
          L_DISCOUNT    FLOAT,
          L_TAX         FLOAT,
          L_RETURNFLAG  TEXT,
          L_LINESTATUS  TEXT,
          L_SHIPDATE    TEXT,
          L_COMMITDATE  TEXT,
          L_RECEIPTDATE TEXT,
          L_SHIPINSTRUCT TEXT,
          L_SHIPMODE     TEXT,
          L_COMMENT      TEXT)
   FORMAT 'orc';

Insert the data:

INSERT INTO lineitem SELECT * FROM e_lineitem;
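
After loading, it may help query planning to collect statistics on the new table (HAWQ supports the standard PostgreSQL-style ANALYZE):

ANALYZE lineitem;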

Try the new executor. (The new executor supports only the ORC format; if a query contains an unsupported function, it automatically falls back to the old executor.)

-----get the number of rows in the table------
postgres=# set new_executor = on;
SET
postgres=# SELECT COUNT(*) FROM lineitem;
count
---------
6001215
(1 row)
Time: 17.006 ms

postgres=# set new_executor = off;
SET
postgres=# SELECT COUNT(*) FROM lineitem;
count
---------
6001215
(1 row)
Time: 213.248 ms

-----TPC-H query 1------
postgres=# set new_executor = on;
SET
postgres=#  SELECT
        l_returnflag,
        l_linestatus,
        sum(l_quantity)::bigint as sum_qty,
        sum(l_extendedprice)::bigint as sum_base_price,
        sum(l_extendedprice * (1 - l_discount))::bigint as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax))::bigint as sum_charge,
        avg(l_quantity)::bigint as avg_qty,
        avg(l_extendedprice)::bigint as avg_price,
        avg(l_discount)::bigint as avg_disc,
        count(*) as count_order
FROM
        lineitem
WHERE
        l_shipdate <= '1998-08-20'
GROUP BY
        l_returnflag,
        l_linestatus;

l_returnflag | l_linestatus | sum_qty  | sum_base_price | sum_disc_price |  sum_charge  | avg_qty | avg_price | avg_disc | count_order
--------------+--------------+----------+----------------+----------------+--------------+---------+-----------+----------+-------------
R            | F            | 37719753 |    56568041381 |    53741292685 |  55889619120 |      26 |     38251 |        0 |     1478870
N            | F            |   991417 |     1487504710 |     1413082168 |   1469649223 |      26 |     38284 |        0 |       38854
A            | F            | 37734107 |    56586554401 |    53758257135 |  55909065223 |      26 |     38273 |        0 |     1478493
N            | O            | 73808911 |   110700990251 |   105167436999 | 109377979031 |      26 |     38248 |        0 |     2894278
(4 rows)

Time: 234.376 ms

postgres=# set new_executor = off;
SET
postgres=#  SELECT
        l_returnflag,
        l_linestatus,
        sum(l_quantity)::bigint as sum_qty,
        sum(l_extendedprice)::bigint as sum_base_price,
        sum(l_extendedprice * (1 - l_discount))::bigint as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax))::bigint as sum_charge,
        avg(l_quantity)::bigint as avg_qty,
        avg(l_extendedprice)::bigint as avg_price,
        avg(l_discount)::bigint as avg_disc,
        count(*) as count_order
FROM
        lineitem
WHERE
        l_shipdate <= '1998-08-20'
GROUP BY
        l_returnflag,
        l_linestatus;

l_returnflag | l_linestatus | sum_qty  | sum_base_price | sum_disc_price |  sum_charge  | avg_qty | avg_price | avg_disc | count_order
--------------+--------------+----------+----------------+----------------+--------------+---------+-----------+----------+-------------
R            | F            | 37719753 |    56568041381 |    53741292685 |  55889619120 |      26 |     38251 |        0 |     1478870
N            | F            |   991417 |     1487504710 |     1413082168 |   1469649223 |      26 |     38284 |        0 |       38854
A            | F            | 37734107 |    56586554401 |    53758257135 |  55909065223 |      26 |     38273 |        0 |     1478493
N            | O            | 73808911 |   110700990251 |   105167436999 | 109377979031 |      26 |     38248 |        0 |     2894278
(4 rows)

Time: 2341.147 ms