Environment
1. Operating system: CentOS 7.2, 64-bit
Network settings:
hostname       | IP
cluster-master | 172.18.0.2
cluster-slave1 | 172.18.0.3
cluster-slave2 | 172.18.0.4
cluster-slave3 | 172.18.0.5
Docker installation
# Install Docker
curl -sSL https:
## Switch to a registry mirror
### See this article for reference: http:
curl -sSL https:
## Enable start on boot
systemctl enable docker
systemctl start docker
Pull the CentOS image
docker pull daocloud.io/library/centos:latest
Use docker images to check the downloaded image.
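For example:

docker images daocloud.io/library/centos

The pulled image should appear in the list with the latest tag.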
Create the containers
Given the cluster layout, each container needs a fixed IP, so first create a Docker subnet with a fixed address range using the following command:
docker network create --subnet=172.18.0.0/16 netgroup
Once the Docker subnet has been created, containers with fixed IPs can be created:
docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-master -h cluster-master -p 18088:18088 -p 9870:9870 --net netgroup --ip 172.18.0.2 daocloud.io/library/centos /usr/sbin/init
docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave1 -h cluster-slave1 --net netgroup --ip 172.18.0.3 daocloud.io/library/centos /usr/sbin/init
docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave2 -h cluster-slave2 --net netgroup --ip 172.18.0.4 daocloud.io/library/centos /usr/sbin/init
docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave3 -h cluster-slave3 --net netgroup --ip 172.18.0.5 daocloud.io/library/centos /usr/sbin/init
Open a console and enter the Docker container:

docker exec -it cluster-master /bin/bash
Install OpenSSH and set up passwordless login
1. cluster-master
Install:
[root@cluster-master /]# yum -y install openssh openssh-server openssh-clients
[root@cluster-master /]# systemctl start sshd
[root@cluster-master /]# vi /etc/ssh/ssh_config    # edit the client config here, typically setting StrictHostKeyChecking to no so new host keys are accepted without prompting
[root@cluster-master /]# systemctl restart sshd
2. Install OpenSSH on each of the slaves
# Install OpenSSH
[root@cluster-slave1 /]# yum -y install openssh openssh-server openssh-clients
[root@cluster-slave1 /]# systemctl start sshd
3. Distribute the cluster-master public key
On the master, run ssh-keygen -t rsa and press Enter through all prompts. This creates the ~/.ssh directory containing id_rsa (the private key) and id_rsa.pub (the public key). Then redirect id_rsa.pub into the authorized_keys file:
ssh-keygen -t rsa
[root@cluster-master /]# cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
Once the file has been generated, use scp to distribute the public key file to the slave hosts in the cluster:
[root@cluster-master /]# ssh root@cluster-slave1 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave1:~/.ssh
[root@cluster-master /]# ssh root@cluster-slave2 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave2:~/.ssh
[root@cluster-master /]# ssh root@cluster-slave3 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave3:~/.ssh
After distribution, test that passwordless login works (e.g. ssh root@cluster-slave1 should no longer ask for a password).
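A quick sketch for checking all three slaves in one go (just a convenience; each host can also be tested individually):

for host in cluster-slave1 cluster-slave2 cluster-slave3; do
    ssh root@$host hostname
done

If each command prints the slave's hostname without prompting for a password, key-based login is working.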
Ansible installation
[root@cluster-master /]# yum -y install epel-release
[root@cluster-master /]# yum -y install ansible
Now edit Ansible's hosts file (by default /etc/ansible/hosts):
[cluster]
cluster-master
cluster-slave1
cluster-slave2
cluster-slave3

[master]
cluster-master

[slaves]
cluster-slave1
cluster-slave2
cluster-slave3
Configure the Docker containers' hosts file
Because /etc/hosts is rewritten every time a container starts, edits made directly to it do not survive a restart. To make sure the containers still have the cluster hosts after a restart, we rewrite /etc/hosts after the container starts, by appending the following to ~/.bashrc:
:>/etc/hosts
cat >>/etc/hosts<<EOF
127.0.0.1 localhost
172.18.0.2 cluster-master
172.18.0.3 cluster-slave1
172.18.0.4 cluster-slave2
172.18.0.5 cluster-slave3
EOF
Make the configuration take effect (source ~/.bashrc); you can see that /etc/hosts now contains the required entries:
[root@cluster-master ansible]# cat /etc/hosts
127.0.0.1 localhost
172.18.0.2 cluster-master
172.18.0.3 cluster-slave1
172.18.0.4 cluster-slave2
172.18.0.5 cluster-slave3
Use Ansible to distribute .bashrc to the cluster slaves:
ansible cluster -m copy -a "src=~/.bashrc dest=~/"
Software environment setup
Download JDK 1.8 and extract it to the /opt directory.
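A sketch of the JDK step, assuming the JDK 8 tarball has already been downloaded into /opt (the archive and directory names below are illustrative and should be adjusted to the actual download; the /opt/jdk8 symlink matches the JAVA_HOME used later in ~/.bashrc):

cd /opt
tar -xzvf jdk-8u*-linux-x64.tar.gz    # extracts into a directory such as jdk1.8.0_xxx
ln -s /opt/jdk1.8.0_* /opt/jdk8       # so that JAVA_HOME=/opt/jdk8 resolves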
Download Hadoop 3 to the /opt directory, extract the package, and create a symlink:
tar -xzvf hadoop-3.2.0.tar.gz
ln -s hadoop-3.2.0 hadoop
Configure the Java and Hadoop environment variables by editing the ~/.bashrc file:
export HADOOP_HOME=/opt/hadoop-3.2.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

export JAVA_HOME=/opt/jdk8
export PATH=$JAVA_HOME/bin:$PATH
Make the file take effect:
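For example:

source ~/.bashrc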
Prepare the configuration files Hadoop needs to run:
cd $HADOOP_HOME/etc/hadoop/
1. Edit core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://cluster-master:9000</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>4320</value>
    </property>
</configuration>
2. Edit hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>staff</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>
3. Edit mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>cluster-master:9001</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>cluster-master:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>cluster-master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>cluster-master:19888</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/jobhistory/done</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/jobhistory/done_intermediate</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
</configuration>
4. Edit yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>cluster-master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>cluster-master:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>cluster-master:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>cluster-master:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>cluster-master:18141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>cluster-master:18088</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>86400</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
    </property>
</configuration>
Package Hadoop for distribution to the slaves
tar -cvf hadoop-dis.tar hadoop hadoop-3.2.0
Use ansible-playbook to distribute .bashrc and hadoop-dis.tar to the slave hosts:
---
- hosts: cluster
  tasks:
    - name: copy .bashrc to slaves
      copy: src=~/.bashrc dest=~/
      notify:
        - exec source
    - name: copy hadoop-dis.tar to slaves
      unarchive: src=/opt/hadoop-dis.tar dest=/opt
  handlers:
    - name: exec source
      shell: source ~/.bashrc
Save the YAML above as hadoop-dis.yaml and run it:
ansible-playbook hadoop-dis.yaml
hadoop-dis.tar is unpacked automatically into /opt on the slave hosts.
Starting Hadoop
Format the NameNode
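A minimal sketch of the format command, assuming the Hadoop binaries are on the PATH via the ~/.bashrc above:

hdfs namenode -format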
If the output contains wording along the lines of "storage format success", the NameNode has been formatted successfully.
Start the cluster
cd $HADOOP_HOME/sbin
start-all.sh
After startup, use the jps command to check whether the daemons are running.
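To check every container at once, one option (not part of the original steps) is to run jps through the Ansible inventory configured earlier; the full JDK path is used because non-interactive shells may not source ~/.bashrc, and it assumes the JDK is present at /opt/jdk8 on every node:

ansible cluster -m shell -a '/opt/jdk8/bin/jps'

On the master you would expect to see NameNode, SecondaryNameNode and ResourceManager; on the slaves, DataNode and NodeManager.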
Note:
In practice, the DataNode service on the slave nodes did not start. Inspecting the directory layout on the slaves showed that the directories set in the configuration files had not been created, for example in core-site.xml:
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
and in hdfs-site.xml:
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/tmp/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data</value>
</property>
Create these directories manually on the nodes, then delete the same directories on the master along with the logs directory under $HADOOP_HOME, and re-format the NameNode.
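A sketch of that cleanup, reusing the Ansible groups and the directory values from the configuration files above (run from cluster-master):

# create the missing directories on every slave
ansible slaves -m shell -a 'mkdir -p /home/hadoop/tmp /home/hadoop/data'
# on the master, remove the old data directories and logs
rm -rf /home/hadoop/tmp /home/hadoop/data $HADOOP_HOME/logs
# re-format the NameNode
hdfs namenode -format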
Start the cluster services again:
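Presumably with the same commands as before:

cd $HADOOP_HOME/sbin
start-all.sh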
Checking the slave nodes again, the node services (such as DataNode) should now be running.
Verify the services
Visit port 18088 (the YARN web UI) and port 9870 (the HDFS NameNode web UI) on the host to check whether the services are up.
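One way to check from the host machine, assuming the port mappings set in the docker run commands above (9870 for the HDFS NameNode web UI, 18088 for the YARN ResourceManager web UI):

curl -I http://localhost:9870
curl -I http://localhost:18088

An HTTP response from each indicates the corresponding web UI is reachable.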
Reposted from: https://www.jianshu.com/p/d7fa21504784