Quick Start

This section walks through an RPM installation of a single-node cluster on a physical machine running CentOS/RedHat 7.x. We assume the server's hostname is oushu (you can get it directly with the command: hostname; please replace every occurrence of oushu in this document with your actual hostname). The deployment takes about 30 minutes.

Installation Preparation

First, log in as root. Check whether the CPU supports the avx instruction set:

cat /proc/cpuinfo | grep avx
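
If this prints any output, the CPU supports AVX; an empty result means it does not. For an explicit yes/no answer, this equivalent one-liner may be convenient:

grep -q avx /proc/cpuinfo && echo "AVX supported" || echo "AVX not supported"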

Install the oushu yum repository:

# For Redhat/CentOS 7.0, 7.1, or 7.2 systems whose CPUs support avx, configure this YUM repository:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/latest/oushu-database.repo

# For Redhat/CentOS 7.0, 7.1, or 7.2 systems without avx support, configure this YUM repository:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/latest/oushu-database-noavx.repo

# For Redhat/CentOS 7.3 systems whose CPUs support avx, configure this YUM repository:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/latest/oushu-database-cent73.repo

# For Redhat/CentOS 7.3 systems without avx support, configure this YUM repository:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/latest/oushu-database-cent73-noavx.repo

# For Redhat/CentOS 7.4 systems whose CPUs support avx, configure this YUM repository:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/latest/oushu-database-cent74.repo

# For Redhat/CentOS 7.4 systems without avx support, configure this YUM repository:
wget -P /etc/yum.repos.d/ http://yum.oushu-tech.com/oushurepo/oushudatabaserepo/centos7/latest/oushu-database-cent74-noavx.repo
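
After placing the repo file, a quick sanity check is to confirm that yum can see the new repository (the exact repository id matched by grep may differ from what is shown here):

yum clean all
yum repolist | grep -i oushu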

Disable SELinux:

sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
setenforce 0
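
To confirm the runtime state, getenforce should now report Permissive (it will report Disabled only after a reboot with the updated config):

getenforce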

Turn off the firewall:

systemctl stop iptables
systemctl disable iptables
systemctl stop firewalld
systemctl disable firewalld

Install Java:

yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
mkdir -p /usr/java
# Note: check the Java version installed on this machine; the symlink target below must match it
ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-2.b16.el7_4.x86_64  /usr/java/default
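
The exact JDK directory name depends on the build yum installed, so the path above may differ on your machine. A sketch that resolves the installed 1.8.0 directory dynamically (assumes exactly one matching directory):

# Pick the OpenJDK 1.8.0 directory that yum actually installed
JDK_DIR=$(ls -d /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.* | head -n 1)
ln -sfn "$JDK_DIR" /usr/java/default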

Install HDFS

Install HDFS and create the directories it uses. Here we assume the machine has two data disks, mounted at /data1 and /data2. If you have more disks, adjust the directory creation steps and the configuration files below accordingly, in particular the HDFS data directories and the OushuDB temporary file directories.

# hadoop depends on a specific version of snappy, so remove any existing snappy first to ensure a smooth installation
yum -y remove snappy

# Install the HDFS RPMs; the RPM installation automatically creates the hdfs user
yum install -y hadoop hadoop-hdfs

# Create the NameNode directory on /data1
mkdir -p /data1/hdfs/namenode

# Create a DataNode data directory on each disk, and set ownership and permissions
mkdir -p /data1/hdfs/datanode
chmod -R 755 /data1/hdfs
chown -R hdfs:hadoop /data1/hdfs

mkdir -p /data2/hdfs/datanode
chmod -R 755 /data2/hdfs
chown -R hdfs:hadoop /data2/hdfs

Copy the following files into /etc/hadoop/conf/ (if prompted to overwrite, enter y to confirm).

Edit the fs.defaultFS property in /etc/hadoop/conf/core-site.xml; other systems access HDFS through this URL. Note: before running format, make sure the value of fs.defaultFS in core-site.xml has oushu replaced with your actual hostname:

<property>
   <name>fs.defaultFS</name>
   <value>hdfs://oushu:9000</value>
</property>
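
If you prefer not to edit the file by hand, a sed one-liner can do the substitution (a sketch; it replaces every literal occurrence of oushu in the file with your hostname):

sed -i "s/oushu/$(hostname)/g" /etc/hadoop/conf/core-site.xml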

Edit /etc/hadoop/conf/hadoop-env.sh and add the parameters below. They configure the Java home, the Hadoop configuration directory, the log directory, and JVM options. Adjust the NameNode -Xmx value to the amount of data stored in HDFS: the more data HDFS holds, the larger -Xmx should be.

export JAVA_HOME="/usr/java/default"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export HADOOP_NAMENODE_OPTS="-Xmx6144m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70"
export HADOOP_DATANODE_OPTS="-Xmx2048m -Xss256k"
export HADOOP_LOG_DIR=/var/log/hadoop/$USER

Because /etc/hadoop/conf/hdfs-site.xml uses the two disks /data1 and /data2 by default, if you have more disks you need to change the dfs.data.dir property so that HDFS uses all of them:

<property>
    <name>dfs.data.dir</name>
    <value>/data1/hdfs/datanode,/data2/hdfs/datanode</value>
    <final>true</final>
</property>

Format the NameNode, then start the NameNode and DataNode.

Note: if you are asked during the format whether to proceed, enter y to confirm.

sudo -E -u hdfs hdfs namenode -format
sudo -E -u hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode
sudo -E -u hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start datanode

HDFS logs are in /var/log/hadoop/hdfs/. If startup fails because of a configuration error, check the error logs and fix the configuration accordingly.

Check that HDFS is running properly:

su - hdfs
hdfs dfsadmin -report
hdfs dfs -mkdir /testnode
hdfs dfs -put /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh /testnode/
hdfs dfs -ls -R /

You can also open the HDFS web UI: http://oushu:50070/

Install OushuDB

Install the OushuDB RPM; the RPM installation automatically creates the gpadmin user.

yum install -y hawq

Add the following content to /etc/sysctl.conf:

kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 10000 65535
net.core.netdev_max_backlog = 200000
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# increase network buffer sizes
net.core.rmem_max=2097152
net.core.wmem_max=2097152

Apply the system configuration:

sysctl -p
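
You can spot-check that a setting took effect, for example:

sysctl kernel.shmmax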

Create the OushuDB local metadata directories and temporary file directories:

# Create the OushuDB local metadata directories; the two directories below are used by the master and the segment respectively
mkdir -p /data1/hawq/masterdd
mkdir -p /data1/hawq/segmentdd

# Create the OushuDB temporary file directories. Create one temp directory per disk so that OushuDB can use all disks.
mkdir -p /data1/hawq/tmp
chmod -R 755 /data1/hawq
chown -R gpadmin:gpadmin /data1/hawq

mkdir -p /data2/hawq/tmp
chmod -R 755 /data2/hawq
chown -R gpadmin:gpadmin /data2/hawq

Create the OushuDB data directory on HDFS:

sudo -u hdfs hdfs dfs -mkdir -p /hawq_default
sudo -u hdfs hdfs dfs -chown -R gpadmin /hawq_default
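
Verify that the directory was created with gpadmin as its owner:

sudo -u hdfs hdfs dfs -ls /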

Edit /usr/local/hawq/etc/slaves: remove localhost from the file and add oushu. The slaves file holds the addresses of all slave nodes, one node per line. After the edit the file reads:

oushu
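
Equivalently, for this single-node setup you can generate the file from the shell (note: this overwrites the existing slaves file):

echo "$(hostname)" > /usr/local/hawq/etc/slaves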

Edit /usr/local/hawq/etc/hawq-site.xml. Because it uses the two disks /data1 and /data2 by default, if you have more disks you need to change the hawq_master_temp_directory and hawq_segment_temp_directory values to use all of them:

<property>
      <name>hawq_master_address_host</name>
      <value>oushu</value>
      <description>The host name of hawq master.</description>
</property>

<property>
      <name>hawq_dfs_url</name>
      <value>oushu:9000/hawq_default</value>
      <description>URL for accessing HDFS.</description>
</property>

<property>
       <name>magma_nodes_url</name>
       <value>localhost:6666</value>
       <description>urls for accessing magma.</description>
</property>

<property>
      <name>hawq_master_directory</name>
      <value>/data1/hawq/masterdd</value>
      <description>The directory of hawq master.</description>
</property>

<property>
      <name>hawq_segment_directory</name>
      <value>/data1/hawq/segmentdd</value>
      <description>The directory of hawq segment.</description>
</property>

<property>
      <name>hawq_master_temp_directory</name>
      <value>/data1/hawq/tmp,/data2/hawq/tmp</value>
      <description>The temporary directory reserved for hawq master. Note: please do NOT add spaces between the directories.</description>
</property>

<property>
      <name>hawq_segment_temp_directory</name>
      <value>/data1/hawq/tmp,/data2/hawq/tmp</value>
      <description>The temporary directory reserved for hawq segment. Note: please do NOT add spaces between the directories.</description>
</property>

OushuDB 4.0 adds separate configuration and start/stop support for Magma. To use the magma service, first create the magma node data directories:

# Create the magma node data directories
mkdir -p /data1/hawq/magma_segmentdd
mkdir -p /data2/hawq/magma_segmentdd

chown -R gpadmin:gpadmin /data1/hawq
chown -R gpadmin:gpadmin /data2/hawq

Then edit the configuration file /usr/local/hawq/etc/magma-site.xml:

<property>
    <name>nodes_file</name>
    <value>slaves</value>
    <description>The magma nodes file name at GPHOME/etc</description>
</property>

<property>
    <name>node_data_directory</name>
    <value>file:///data1/hawq/magma_segmentdd,file:///data2/hawq/magma_segmentdd</value>
    <description>The data directory for magma node</description>
</property>

<property>
    <name>node_log_directory</name>
    <value>~/hawq-data-directory/segmentdd/pg_log</value>
    <description>The log directory for magma node</description>
</property>

<property>
    <name>node_address_port</name>
    <value>6666</value>
    <description>The port the magma node listens on</description>
</property>

<property>
    <name>magma_range_number</name>
    <value>2</value>
</property>

<property>
    <name>magma_replica_number</name>
    <value>3</value>
</property>

<property>
    <name>magma_datadir_capacity</name>
    <value>3</value>
</property>

<property>
    <name>compact_trigger_ap_ratio_limit</name>
    <value>0.2</value>
    <description>The threshold of triggering compact in MAGMAAP format.</description>
</property>

<property>
    <name>compact_trigger_tp_ratio_limit</name>
    <value>0.5</value>
    <description>The threshold of triggering compact in MAGMAAP catalog</description>
</property>

Log in as the gpadmin user:

su - gpadmin

Set up passwordless ssh:

source /usr/local/hawq/greenplum_path.sh
hawq ssh-exkeys -h oushu
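
To confirm that passwordless ssh works, the following should print the hostname without prompting for a password:

ssh oushu hostname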

Initialize OushuDB. When asked whether to initialize, enter y to confirm.

hawq init cluster               # OushuDB 4.0: by default, does not start the magma service
hawq init cluster --with_magma  # new in OushuDB 4.0; 3.x releases do not support this option

# The --with_magma option is new in OushuDB 4.0; only the hawq init|start|stop cluster commands accept it.
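
After initialization you can ask the management tool for the cluster status (hawq state, as in the Apache HAWQ tooling OushuDB derives from):

hawq state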

OushuDB management tool logs are in /home/gpadmin/hawqAdminLogs/; OushuDB master and segment logs are in /data1/hawq/masterdd/pg_log/ and /data1/hawq/segmentdd/pg_log/ respectively. If initialization fails because of a configuration error, check the error logs and fix the configuration accordingly.

Check that OushuDB is running correctly:

su - gpadmin
source /usr/local/hawq/greenplum_path.sh
psql -d postgres
select * from gp_segment_configuration;  -- confirm all nodes are in the "up" state

create table t(i int);
insert into t select generate_series(1,1000);
select count(*) from t;

Trying the New Executor

This section demonstrates the new executor using the TPC-H lineitem table.

Create the e_lineitem external table, which is used to generate the TPC-H lineitem data:

CREATE EXTERNAL WEB TABLE E_LINEITEM ( L_ORDERKEY    INT8 ,
                                       L_PARTKEY     INTEGER ,
                                       L_SUPPKEY     INTEGER ,
                                       L_LINENUMBER  INTEGER ,
                                       L_QUANTITY    FLOAT ,
                                       L_EXTENDEDPRICE  FLOAT ,
                                       L_DISCOUNT    FLOAT ,
                                       L_TAX         FLOAT ,
                                       L_RETURNFLAG  VARCHAR(1) ,
                                       L_LINESTATUS  VARCHAR(1) ,
                                       L_SHIPDATE    TEXT ,
                                       L_COMMITDATE  TEXT ,
                                       L_RECEIPTDATE TEXT ,
                                       L_SHIPINSTRUCT CHAR(25) ,
                                       L_SHIPMODE     VARCHAR(10) ,
                                       L_COMMENT      VARCHAR(44) )
                                       EXECUTE 'bash -c "$GPHOME/bin/dbgen -b $GPHOME/bin/dists.dss -T L -s 1 -N 6 -n $((GP_SEGMENT_ID + 1))"'
                                       on 6 format 'text' (delimiter '|');

Create the ORC table:

CREATE TABLE lineitem
       (  L_ORDERKEY    INT8,
          L_PARTKEY     INTEGER,
          L_SUPPKEY     INTEGER,
          L_LINENUMBER  INTEGER,
          L_QUANTITY    FLOAT,
          L_EXTENDEDPRICE  FLOAT,
          L_DISCOUNT    FLOAT,
          L_TAX         FLOAT,
          L_RETURNFLAG  TEXT,
          L_LINESTATUS  TEXT,
          L_SHIPDATE    TEXT,
          L_COMMITDATE  TEXT,
          L_RECEIPTDATE TEXT,
          L_SHIPINSTRUCT TEXT,
          L_SHIPMODE     TEXT,
          L_COMMENT      TEXT)
   WITH (APPENDONLY = true, OIDS = FALSE, ORIENTATION = orc);

Insert the data:

INSERT INTO lineitem SELECT * FROM e_lineitem;
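
After the bulk load, you may optionally run ANALYZE (standard PostgreSQL practice) in the same psql session so the planner has fresh statistics; it is not required for the comparison below:

ANALYZE lineitem;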

The examples below show the substantial performance improvement the new executor delivers.

----- Get the table row count -----
postgres=# set new_executor = on;
SET
postgres=# SELECT COUNT(*) FROM lineitem;
count
---------
6001215
(1 row)
Time: 17.006 ms

postgres=# set new_executor = off;
SET
postgres=# SELECT COUNT(*) FROM lineitem;
count
---------
6001215
(1 row)
Time: 213.248 ms

----- TPC-H query 1 -----
postgres=# set new_executor = on;
SET
postgres=#  SELECT
        l_returnflag,
        l_linestatus,
        sum(l_quantity)::bigint as sum_qty,
        sum(l_extendedprice)::bigint as sum_base_price,
        sum(l_extendedprice * (1 - l_discount))::bigint as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax))::bigint as sum_charge,
        avg(l_quantity)::bigint as avg_qty,
        avg(l_extendedprice)::bigint as avg_price,
        avg(l_discount)::bigint as avg_disc,
        count(*) as count_order
FROM
        lineitem
WHERE
        l_shipdate <= '1998-08-20'
GROUP BY
        l_returnflag,
        l_linestatus;

l_returnflag | l_linestatus | sum_qty  | sum_base_price | sum_disc_price |  sum_charge  | avg_qty | avg_price | avg_disc | count_order
--------------+--------------+----------+----------------+----------------+--------------+---------+-----------+----------+-------------
R            | F            | 37719753 |    56568041381 |    53741292685 |  55889619120 |      26 |     38251 |        0 |     1478870
N            | F            |   991417 |     1487504710 |     1413082168 |   1469649223 |      26 |     38284 |        0 |       38854
A            | F            | 37734107 |    56586554401 |    53758257135 |  55909065223 |      26 |     38273 |        0 |     1478493
N            | O            | 73808911 |   110700990251 |   105167436999 | 109377979031 |      26 |     38248 |        0 |     2894278
(4 rows)

Time: 234.376 ms

postgres=# set new_executor = off;
SET
postgres=#  SELECT
        l_returnflag,
        l_linestatus,
        sum(l_quantity)::bigint as sum_qty,
        sum(l_extendedprice)::bigint as sum_base_price,
        sum(l_extendedprice * (1 - l_discount))::bigint as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax))::bigint as sum_charge,
        avg(l_quantity)::bigint as avg_qty,
        avg(l_extendedprice)::bigint as avg_price,
        avg(l_discount)::bigint as avg_disc,
        count(*) as count_order
FROM
        lineitem
WHERE
        l_shipdate <= '1998-08-20'
GROUP BY
        l_returnflag,
        l_linestatus;

l_returnflag | l_linestatus | sum_qty  | sum_base_price | sum_disc_price |  sum_charge  | avg_qty | avg_price | avg_disc | count_order
--------------+--------------+----------+----------------+----------------+--------------+---------+-----------+----------+-------------
R            | F            | 37719753 |    56568041381 |    53741292685 |  55889619120 |      26 |     38251 |        0 |     1478870
N            | F            |   991417 |     1487504710 |     1413082168 |   1469649223 |      26 |     38284 |        0 |       38854
A            | F            | 37734107 |    56586554401 |    53758257135 |  55909065223 |      26 |     38273 |        0 |     1478493
N            | O            | 73808911 |   110700990251 |   105167436999 | 109377979031 |      26 |     38248 |        0 |     2894278
(4 rows)

Time: 2341.147 ms