Oushu Database Cluster Installation

Prerequisites

A Hadoop cluster, a ZooKeeper cluster, an NTP service, a Java environment, and so on.
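A quick optional sanity check of these prerequisites from any node; the service name ntpd, the hostname master01, and the default ZooKeeper port 2181 are assumptions, adjust to your environment:

java -version                             # Java runtime available
systemctl is-active ntpd                  # NTP service running
hdfs dfs -ls /                            # HDFS cluster reachable
echo stat | nc master01 2181 | head -n 1  # ZooKeeper responding on its default port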

Hosts and services:

Role          master01  master02  slave01  slave02  slave03
HAWQ Master   primary   standby   no       no       no
HAWQ Segment  no        no        yes      yes      yes
NameNode      standby   active    no       no       no
DataNode      no        no        yes      yes      yes
Zookeeper     yes       yes       yes      no       no
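All nodes must resolve each other's hostnames consistently. A minimal /etc/hosts sketch for this layout; the IP addresses are placeholders, substitute your own:

192.168.1.11  master01
192.168.1.12  master02
192.168.1.21  slave01
192.168.1.22  slave02
192.168.1.23  slave03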

Configure the YUM repository

Example: configure master01 first (ssh master01; su root).

Online

  • On Redhat/CentOS 7.0, 7.1, or 7.2 with CPUs that support AVX instructions (see the check after this list), configure this YUM repository:

    wget -P /etc/yum.repos.d/ http://yum.oushu.io/oushurepo/oushudatabaserepo/centos7/3.0.1.0/oushu-database.repo

  • On Redhat/CentOS 7.0, 7.1, or 7.2 without AVX support, configure this YUM repository:

    wget -P /etc/yum.repos.d/ http://yum.oushu.io/oushurepo/oushudatabaserepo/centos7/3.0.1.0/oushu-database-noavx.repo

  • On Redhat/CentOS 7.3 with AVX support, configure this YUM repository:

    wget -P /etc/yum.repos.d/ http://yum.oushu.io/oushurepo/oushudatabaserepo/centos7/3.0.1.0/oushu-database-cent73.repo

  • On Redhat/CentOS 7.3 without AVX support, configure this YUM repository:

    wget -P /etc/yum.repos.d/ http://yum.oushu.io/oushurepo/oushudatabaserepo/centos7/3.0.1.0/oushu-database-cent73-noavx.repo
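To check whether a CPU supports AVX, look for the avx flag in /proc/cpuinfo:

grep -m1 -o avx /proc/cpuinfo   # prints "avx" if supported; no output means use a noavx repo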

Offline

Download or copy the installation package (into the directory /data1/localrepo/):

wget http://yum.oushu.io/oushurepo/tarball/release/oushu-database/centos7/3.0.1.0/oushu-database-full-3.0.1.0-rhel7-x86_64.tar.gz

Extract the package and install httpd:

tar xzf oushu-database-full-3.0.1.0-rhel7-x86_64.tar.gz
yum -y install httpd
systemctl start httpd
chown -R root:root /data1/localrepo  # the directory holding the downloaded/copied package

Install the local repository:

/data1/localrepo/oushu-database-full-3.0.1.0/setup_repo.sh

Disable SELinux and rebuild the YUM cache:

setenforce 0
yum clean all
yum makecache
rm -f /data1/localrepo/oushu-database-full-3.0.1.0-rhel7-x86_64.tar.gz  # optional
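If the repository is set up correctly, yum should now see the Oushu packages; a quick check:

yum info hawq | head   # should show the hawq package provided by the local repository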

Configure the remaining nodes in the same way.

Note: to disable SELinux permanently, run:

sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
setenforce 0

Disable the firewall:

systemctl disable iptables
systemctl stop firewalld
systemctl disable firewalld
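Since every node needs SELinux and the firewall disabled, one way to do it from master01 in a single pass is a loop like this sketch, which assumes root SSH access to the other nodes:

for h in master02 slave01 slave02 slave03; do
  ssh root@$h "sed -i 's/^SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config && \
               setenforce 0 && systemctl stop firewalld && systemctl disable firewalld"
done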

Install HAWQ

On the master01 node, append the following to /etc/sysctl.conf:

kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1281 65535
net.core.netdev_max_backlog = 200000
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# increase network buffers
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152

Copy this file to every node and apply the settings with sysctl -p, for example:
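A minimal sketch of the copy-and-apply step, assuming root SSH access from master01:

for h in master02 slave01 slave02 slave03; do
  scp /etc/sysctl.conf root@$h:/etc/sysctl.conf
  ssh root@$h sysctl -p
done
sysctl -p   # apply on master01 itself as well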

On master01, create the file /etc/security/limits.d/gpadmin.conf:

* soft nofile 1048576
* hard nofile 1048576
* soft nproc 131072
* hard nproc 131072

Copy this file into /etc/security/limits.d/ on every node, for example:
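The same scp pattern works here; note that the new limits only apply to sessions started after the file is in place (again assuming root SSH access):

for h in master02 slave01 slave02 slave03; do
  scp /etc/security/limits.d/gpadmin.conf root@$h:/etc/security/limits.d/
done
ssh slave01 "su - gpadmin -c 'ulimit -n'"   # should print 1048576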

Install HAWQ and set up its environment:

yum install -y hawq
source /usr/local/hawq/greenplum_path.sh  # set up the HAWQ environment variables

Create the HDFS directory and grant gpadmin ownership of it:

hdfs dfs -mkdir -p /hawq/default_filespace
hdfs dfs -chown -R gpadmin /hawq
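To confirm the directory and its ownership:

hdfs dfs -ls /hawq   # default_filespace should be listed with gpadmin as its owner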

On master01, create a file named mhostfile listing the hostnames of the HAWQ master and standby master, similar to a hostfile:

master01
master02

On master01, create a file named shostfile listing the hostnames of all HAWQ segments, similar to a hostfile:

slave01
slave02
slave03

On master01, use "hawq ssh" to create the master metadata directory and the temporary file directories on the master and standby nodes, and grant gpadmin ownership:

# create the master metadata directory
hawq ssh -f mhostfile -e 'mkdir -p /data1/hawq/masterdd'
# create the temporary file directories
hawq ssh -f mhostfile -e 'mkdir -p /data1/hawq/tmp'
hawq ssh -f mhostfile -e 'mkdir -p /data2/hawq/tmp'
hawq ssh -f mhostfile -e 'chown -R gpadmin:gpadmin /data1/hawq'
hawq ssh -f mhostfile -e 'chown -R gpadmin:gpadmin /data2/hawq'

On master01, use "hawq ssh" to create the segment metadata directory and the temporary file directories on all segments, and grant gpadmin ownership:

# create the segment metadata directory
hawq ssh -f shostfile -e 'mkdir -p /data1/hawq/segmentdd'
# create the temporary file directories
hawq ssh -f shostfile -e 'mkdir -p /data1/hawq/tmp'
hawq ssh -f shostfile -e 'mkdir -p /data2/hawq/tmp'
hawq ssh -f shostfile -e 'chown -R gpadmin:gpadmin /data1/hawq'
hawq ssh -f shostfile -e 'chown -R gpadmin:gpadmin /data2/hawq'

On master01, switch to the gpadmin user; all HAWQ configuration files must be edited with this user's permissions:

su - gpadmin

On master01, edit /usr/local/hawq/etc/slaves and list the hostnames of all HAWQ segment nodes. For this installation the file should contain slave01, slave02, and slave03:

slave01
slave02
slave03

Edit /usr/local/hawq/etc/hdfs-client.xml and uncomment the HA section, filling in this cluster's NameNode hosts (master02 is the active NameNode, master01 the standby):

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>oushu</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.oushu</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.oushu.nn1</name>
        <value>master02:9000</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.oushu.nn2</name>
        <value>master01:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.oushu.nn1</name>
        <value>master02:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.oushu.nn2</name>
        <value>master01:50070</value>
    </property>
    <property>
        <name>dfs.domain.socket.path</name>
        <value>/var/lib/hadoop-hdfs/dn_socket</value>
        <description>Optional. This is a path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients. If the string "_PORT" is present in this path, it will be replaced by the TCP port of the DataNode.</description>
    </property>
</configuration>
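These values must match the HDFS HA settings of the Hadoop cluster itself; one way to cross-check them against the live Hadoop configuration:

hdfs getconf -confKey dfs.nameservices
hdfs getconf -confKey dfs.ha.namenodes.oushu
hdfs getconf -confKey dfs.namenode.rpc-address.oushu.nn1
hdfs getconf -confKey dfs.namenode.rpc-address.oushu.nn2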

On master01, edit /usr/local/hawq/etc/hawq-site.xml. Note: the "oushu" prefix in hawq_dfs_url is the value of dfs.nameservices:

<configuration>
    <property>
        <name>hawq_master_address_host</name>
        <value>master01</value>
    </property>
    <property>
        <name>hawq_standby_address_host</name>
        <value>master02</value>
        <description>The host name of the hawq standby master.</description>
    </property>
    <property>
        <name>hawq_dfs_url</name>
        <value>oushu/hawq/default_filespace</value>
        <description>URL for accessing HDFS.</description>
    </property>
    <property>
        <name>hawq_master_directory</name>
        <value>/data1/hawq/masterdd</value>
        <description>The directory of the hawq master.</description>
    </property>
    <property>
        <name>hawq_segment_directory</name>
        <value>/data1/hawq/segmentdd</value>
        <description>The directory of the hawq segment.</description>
    </property>
    <property>
        <name>hawq_master_temp_directory</name>
        <value>/data1/hawq/tmp,/data2/hawq/tmp</value>
        <description>The temporary directories reserved for the hawq master. NOTE: do NOT add spaces between directories.</description>
    </property>
    <property>
        <name>hawq_segment_temp_directory</name>
        <value>/data1/hawq/tmp,/data2/hawq/tmp</value>
        <description>The temporary directories reserved for hawq segments. NOTE: do NOT add spaces between directories.</description>
    </property>
    <property>
        <name>hawq_rm_yarn_address</name>
        <value>master01:8032</value>
        <description>The address of the YARN resource manager server.</description>
    </property>
    <property>
        <name>hawq_rm_yarn_scheduler_address</name>
        <value>master01:8030</value>
        <description>The address of the YARN scheduler server.</description>
    </property>
    <property>
        <name>hawq_rm_yarn_app_name</name>
        <value>hawq</value>
        <description>The application name used to register the hawq resource manager in YARN.</description>
    </property>
    <property>
        <name>hawq_re_cgroup_hierarchy_name</name>
        <value>hawq</value>
        <description>The name of the hierarchy that accommodates CGroup directories/files for resource enforcement, e.g. /sys/fs/cgroup/cpu/hawq for the CPU sub-system.</description>
    </property>
</configuration>
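Note that the two hawq_rm_yarn_* addresses only take effect when HAWQ's resource manager runs in YARN mode (hawq_global_rm_type set to yarn; the default standalone mode is none). To confirm the addresses against the Hadoop side, one option, where the config path is an assumption for HDP-style layouts:

grep -A1 'yarn.resourcemanager.address' /etc/hadoop/conf/yarn-site.xml
grep -A1 'yarn.resourcemanager.scheduler.address' /etc/hadoop/conf/yarn-site.xml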

Switch back to the root user (su root) and copy the configuration files under /usr/local/hawq/etc on master01 to every node.
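A minimal sketch of that copy, assuming root SSH access from master01:

for h in master02 slave01 slave02 slave03; do
  scp /usr/local/hawq/etc/* root@$h:/usr/local/hawq/etc/
done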
On master01, switch to the gpadmin user and create hhostfile:

su - gpadmin
source /usr/local/hawq/greenplum_path.sh  # set up the HAWQ environment variables
touch hhostfile

hhostfile lists the hostnames of every HAWQ node, as follows:

master01
master02
slave01
slave02
slave03

Log in to every machine as root and set the gpadmin user's password:

echo 'password' | passwd --stdin gpadmin
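To avoid logging in to each machine by hand, the same can be scripted from master01, assuming root SSH access; replace 'password' with a real one:

for h in master01 master02 slave01 slave02 slave03; do
  ssh root@$h "echo 'password' | passwd --stdin gpadmin"
done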

Exchange SSH keys for the gpadmin user, entering each node's gpadmin password when prompted:

su - gpadmin
source /usr/local/hawq/greenplum_path.sh  # set up the HAWQ environment variables
hawq ssh-exkeys -f hhostfile
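If the key exchange succeeded, a command across all hosts should now run without any password prompt:

hawq ssh -f hhostfile -e 'hostname'   # prints each node's hostname, no password asked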

On master01, as the gpadmin user, initialize the HAWQ cluster. When prompted "Continue with HAWQ init", enter Y:

hawq init cluster

Before initializing the HAWQ cluster, make sure that the masterdd and segmentdd directories under the /data*/hawq/ directories are empty, and that /hawq/default_filespace on HDFS is empty.
If hawq init cluster fails, stop the cluster with the command below, clear the directories, find the root cause, and then re-initialize:

hawq stop cluster

On the HAWQ master node, following the directory layout of this installation, use the commands below to empty all HAWQ data directories, leaving the masterdd and segmentdd subdirectories themselves in place:

hawq ssh -f hhostfile -e 'rm -fr /data1/hawq/masterdd/*'
hawq ssh -f hhostfile -e 'rm -fr /data1/hawq/segmentdd/*'

On the HDFS NameNode, clear /hawq/default_filespace with the command below. If /hawq/default_filespace contains user data, back it up first to avoid losing it:

hdfs dfs -rm -f -r /hawq/default_filespace/*

Check that the HDFS parameters are configured correctly, preferably as the gpadmin user. With incorrect parameters HDFS may sometimes still start, but it will fail under heavy load.

su - gpadmin
source /usr/local/hawq/greenplum_path.sh
hawq check -f hhostfile --hadoop /usr/hdp/current/hadoop-client/ --hdfs-ha

Check that HAWQ is running properly:

su - gpadmin
source /usr/local/hawq/greenplum_path.sh
psql -d postgres
select * from gp_segment_configuration;  -- confirm every node is in the "up" state
create table t(i int);
insert into t select generate_series(1,1000);
select count(*) from t;