apache-spark-2.4.0-bin-hadoop2.7 Cluster Installation

1. Environment Preparation and Version Overview

1. Linux Version

CentOS release 6.8 (Final)
Linux version 2.6.32-642.el6.x86_64 ([email protected]) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Tue May 10 17:27:01 UTC 2016
ISO image: CentOS-6.8-x86_64-minimal.iso

2. JDK Version

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
Environment variables: edit /etc/profile with vim, append the lines below, then run source /etc/profile:
export JAVA_HOME=/home/software/jdk/jdk1.8.0_131
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/rt.jar
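After sourcing the profile, a quick check that the JDK is on the PATH:

source /etc/profile
java -version    # expected: java version "1.8.0_131"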

3. Hadoop and Spark Versions

hadoop-2.7.7
spark-2.4.0-bin-hadoop2.7
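Both tarballs are available from the Apache archive. Fetching and unpacking them might look like the following; the /home/software prefix matches the JDK and Hadoop paths used elsewhere in this guide, while the spark subdirectory is an assumption:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
mkdir -p /home/software/hadoop /home/software/spark
tar -zxf hadoop-2.7.7.tar.gz -C /home/software/hadoop/
tar -zxf spark-2.4.0-bin-hadoop2.7.tgz -C /home/software/spark/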

2. Spark Standalone Mode HA

1. Node Layout

Node roles (per the configuration below):
z1: Master, ZooKeeper
z2: Worker, standby Master, ZooKeeper
z3: Worker, ZooKeeper
z4: Worker

Note the /etc/hosts configuration on every node:
192.168.1.211 z1
192.168.1.212 z2
192.168.1.213 z3
192.168.1.214 z4
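start-all.sh launches the workers over SSH using the hosts in the slaves file, so passwordless SSH from z1 to all other nodes is assumed; a typical setup on z1:

ssh-keygen -t rsa                                # accept the defaults
for h in z1 z2 z3 z4; do ssh-copy-id $h; done    # copy the public key to every node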

2. Installation

  • spark-env.sh configuration
#export SPARK_MASTER_HOST=z1
export SPARK_MASTER_PORT=7077
export JAVA_HOME=/home/software/jdk/jdk1.8.0_131
# High availability: recover master state through ZooKeeper
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=z1:2181,z2:2181,z3:2181 -Dspark.deploy.zookeeper.dir=/spark"
#export SPARK_WORKER_MEMORY=2g
#export SPARK_EXECUTOR_MEMORY=2g
#export SPARK_DRIVER_MEMORY=2g
#export SPARK_WORKER_CORES=1
  • slaves configuration (one worker host per line)
z2
z3
z4
  • History Server configuration (important). These properties go in conf/spark-defaults.conf; see the sketch after this list.
spark.eventLog.enabled true
spark.eventLog.dir hdfs://bigdata/user/spark/historyLog
spark.history.fs.logDirectory hdfs://bigdata/user/spark/historyLog
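spark-env.sh and slaves live in $SPARK_HOME/conf and must match on every node, and the event-log directory must exist before the first job runs. A minimal sketch, using $SPARK_HOME for the Spark install directory (the actual path is not given in this guide):

# distribute the configuration from z1 to the other nodes
for h in z2 z3 z4; do
  scp $SPARK_HOME/conf/spark-env.sh $SPARK_HOME/conf/slaves $SPARK_HOME/conf/spark-defaults.conf $h:$SPARK_HOME/conf/
done

# create the event-log directory on HDFS
hdfs dfs -mkdir -p /user/spark/historyLog

# start the History Server (web UI on port 18080 by default)
$SPARK_HOME/sbin/start-history-server.sh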

3.启动

Step 1: Start the three ZooKeeper instances: on each of z1, z2, and z3, run ./zkServer.sh start from the ZooKeeper bin directory.

Step 2: On z1, run ./start-all.sh from Spark's sbin directory; this starts the master on z1 and the workers listed in slaves.
Step 3: On another node (z2 here), run ./start-master.sh to bring up a standby master.
Step 4: Open http://z1:8080 in a browser for the Spark master web UI; the standby master's UI at http://z2:8080 should report status STANDBY.
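A quick failover check, assuming the daemons above are running (the PID is whatever jps reports for the Master process):

# on z1: find and kill the active Master JVM
jps | grep Master
kill <pid-of-Master>

# ZooKeeper then elects the standby; within a minute or two,
# http://z2:8080 should switch from STANDBY to ALIVE, and running
# applications re-register with the new master.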

4. Submitting Jobs

Client mode:
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://z1:7077 --deploy-mode client examples/jars/spark-examples_2.11-2.4.0.jar 10000
Cluster mode:
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://z1:7077 --deploy-mode cluster examples/jars/spark-examples_2.11-2.4.0.jar 10000
With the standby master in place, both masters can be listed (--master spark://z1:7077,z2:7077) so submissions keep working after a failover.
When submitting in cluster mode, the application jar must be reachable from every node or the driver will fail to start; uploading the jar to HDFS is recommended:
bin/spark-submit --class cn.com.spark.GroupTest --master spark://z1:7077 --deploy-mode cluster hdfs://z2:8020/spark/examples/simple-spark-master-1.0-SNAPSHOT-jar-with-dependencies.jar
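The jar referenced above is assumed to have been staged on HDFS beforehand, for example:

hdfs dfs -mkdir -p /spark/examples
hdfs dfs -put simple-spark-master-1.0-SNAPSHOT-jar-with-dependencies.jar /spark/examples/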

5. Configuration Options

For the full list of options, see the official documentation (details vary between versions):
http://spark.apache.org/docs/latest/configuration.html

6. Setting Configuration Options in Code

In some cases, you may want to avoid hard-coding certain configurations in a SparkConf. Spark lets you set arbitrary spark.* properties, including spark.hadoop.* Hadoop properties, directly in code:

val conf = new SparkConf().set("spark.hadoop.abc.def", "xyz")
val sc = new SparkContext(conf)
You can also modify or add configurations at submission time (properties set on a SparkConf take precedence over spark-submit flags, which in turn override spark-defaults.conf):
./bin/spark-submit \
--name "My app" \
--master local[4] \
--conf spark.eventLog.enabled=false \
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
--conf spark.hadoop.abc.def=xyz \
myApp.jar

3. Running Spark on YARN

Note: with the setup below, Spark runs on YARN and there is no need to start Spark's own master and worker daemons; YARN handles scheduling, so jobs are simply submitted directly.

1. spark-env.sh Configuration

# Spark on YARN settings
export JAVA_HOME=/home/software/jdk/jdk1.8.0_131
export HADOOP_CONF_DIR=/home/software/hadoop/hadoop-2.7.7/etc/hadoop

HADOOP_CONF_DIR (or YARN_CONF_DIR) must point to the directory containing the client-side Hadoop configuration files; Spark reads them to locate the ResourceManager and HDFS.

2. Submitting Jobs

Cluster mode:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.11-2.4.0.jar 1000

Client mode:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client examples/jars/spark-examples_2.11-2.4.0.jar 1000

With the application jar on HDFS:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster hdfs://z2:8020/spark/examples/jars/spark-examples_2.11-2.4.0.jar 1000
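On YARN, executor resources are usually sized explicitly at submission; a sketch with illustrative, untuned values:

spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  --num-executors 3 \
  --executor-cores 1 \
  --executor-memory 1g \
  --driver-memory 1g \
  examples/jars/spark-examples_2.11-2.4.0.jar 1000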


Depending on an external jar (here the MySQL JDBC driver):
Cluster mode: job output can only be inspected through the YARN application logs. The driver gets the jar via --driver-class-path, and the executors via --jars or spark.executor.extraClassPath; the variants below show different ways of wiring this up:
./spark-submit --class cn.com.spark.sql.JdbcTest --master yarn --deploy-mode cluster --driver-class-path /root/mysql-connector-java-5.1.39.jar --jars /root/mysql-connector-java-5.1.39.jar --conf spark.executor.extraClassPath=/root/mysql-connector-java-5.1.39.jar hdfs://z2:8020/spark/examples/simple-spark-master-JdbcTest.jar

./spark-submit --class cn.com.spark.sql.JdbcTest --master yarn --deploy-mode cluster --driver-class-path /root/mysql-connector-java-5.1.39.jar --conf spark.executor.extraClassPath=/root/mysql-connector-java-5.1.39.jar hdfs://z2:8020/spark/examples/simple-spark-master-JdbcTest.jar


../bin/spark-submit --class cn.com.spark.sql.JdbcTest --master yarn --deploy-mode cluster --driver-class-path /root/mysql-connector-java-5.1.39.jar --jars /root/mysql-connector-java-5.1.39.jar hdfs://bigdata/spark/examples/simple-spark-master-JdbcTest.jar
Client mode: output is printed to the console, so results can be checked directly:
spark-submit --class cn.com.spark.sql.JdbcTest --master yarn --deploy-mode client --driver-class-path /root/mysql-connector-java-5.1.39.jar --jars /root/mysql-connector-java-5.1.39.jar hdfs://bigdata/spark/examples/simple-spark-master-JdbcTest.jar
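In cluster mode, the simplest way to read a finished job's output is the YARN CLI (this assumes log aggregation is enabled in yarn-site.xml; the application id below is illustrative):

yarn logs -applicationId application_1545000000000_0001

spark-submit prints the application id on submission, and it is also visible in the ResourceManager web UI.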

3. Starting spark-shell on YARN

./spark-shell --master yarn --deploy-mode client

(The interactive shell only supports client deploy mode.)
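Once the shell comes up, a one-line smoke test confirms that executors are being scheduled by YARN:

scala> sc.parallelize(1 to 1000).sum()
res0: Double = 500500.0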

4. The spark.yarn.jars and spark.yarn.archive Settings

To avoid re-uploading the Spark runtime jars on every submission, stage them on HDFS once and reference them in spark-defaults.conf:

# spark.yarn.jars hdfs://bigdata/spark/excute/jars/*.jar

Alternatively, compress the contents of $SPARK_HOME/jars, upload the archive to HDFS, and set:

# spark.yarn.archive hdfs://bigdata/spark/excute/archive/spark-libs.zip

Without either setting, every job submitted to YARN first uploads everything under $SPARK_HOME/jars to HDFS, which slows down startup.
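A sketch of building and staging the archive, following the HDFS paths in the settings above (including their "excute" spelling):

# pack the runtime jars; the jars must sit at the archive root
cd $SPARK_HOME/jars
zip -q /tmp/spark-libs.zip *.jar

# upload to the location referenced by spark.yarn.archive
hdfs dfs -mkdir -p /spark/excute/archive
hdfs dfs -put /tmp/spark-libs.zip /spark/excute/archive/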
For example:
spark-submit --class cn.com.spark.sql.JdbcTest --master yarn --deploy-mode cluster --driver-class-path /root/mysql-connector-java-5.1.39.jar --jars /root/mysql-connector-java-5.1.39.jar --num-executors 3 hdfs://bigdata/spark/examples/simple-spark-master-JdbcTest.jar
The upload is visible in the execution log:
(Figure: spark-submit execution log)

