Zeppelin is a very useful tool. Let's start with what it's for.
1. Zeppelin use cases:
(1) It provides a web-based notebook, similar to IPython's, for data analysis and visualization.
(2) It can plug in different data-processing engines, including Spark, Hive, Tajo, and others, and natively supports Scala, Java, shell, Markdown, and more.
(3) It visualizes query results.
From the official site:
Zeppelin supports Spark, PySpark, Spark R, Spark SQL with dependency loader.
Zeppelin lets you connect any JDBC data source seamlessly: PostgreSQL, MySQL, MariaDB, Redshift, Apache Hive, and so on.
Python is supported with Matplotlib, Conda, Pandas SQL and PySpark integrations.
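As an illustration of the points above, a Zeppelin note typically mixes paragraphs in different languages, each selected by a `%` prefix. This is only a sketch — the table name and file path below are hypothetical:

```
%md
## Daily report (rendered as Markdown)

%sql
-- the result is rendered as a table/chart in the notebook
SELECT dt, COUNT(*) FROM events GROUP BY dt

%spark
// Scala, executed by the configured Spark interpreter
val df = spark.read.json("/data/events.json")
df.printSchema()
```

Each block above is a separate paragraph in the note; the `%` directive on its first line picks the interpreter that runs it.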
2. Version compatibility:
https://zeppelin.apache.org/supported_interpreters.html
3. Interpreters and interpreter groups
Interpreter:
The concept of Zeppelin interpreter allows any language/data-processing-backend to be plugged into Zeppelin.
Interpreter group:
By default, every interpreter belongs to a single group, but a group may contain multiple interpreters.
For example, the Spark interpreter group includes Spark (Scala) support, PySpark, Spark SQL, and the dependency loader.
Zeppelin interpreters from the same group run in the same JVM. For more information, see the interpreter section of the Zeppelin documentation.
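Concretely, the interpreters in the spark group are addressed with dotted `%` prefixes, and because they run in the same JVM they share one SparkContext. A paragraph sketch (the computations are arbitrary, just to show the prefixes):

```
%spark
// Scala paragraph (the group's default interpreter)
sc.parallelize(1 to 10).count()

%spark.pyspark
# Python paragraph; uses the same SparkContext as the Scala paragraph
sc.parallelize(range(10)).count()

%spark.sql
-- Spark SQL paragraph; sees tables registered by the paragraphs above
show tables
```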
4.Zeppelin 安装
官方安装地址:https://zeppelin.apache.org/docs/0.8.2/quickstart/install.html#downloading-binary-package
# Configure the bind address and port in conf/zeppelin-env.sh
export ZEPPELIN_ADDR=xxx.xxx.xxx.xxx # Bind address (default 127.0.0.1)
export ZEPPELIN_PORT=9099
Start:
bin/zeppelin-daemon.sh start
Then log in at:
http://172.21.xx.xx:9099/#/
# Configure authenticated users in conf/shiro.ini
[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
# To enable admin user, uncomment the following line and set an appropriate password.
admin = password1, admin # the default admin user
user1 = password2, role1, role2
#/** = anon #anonymous access, no login required
/** = authc #authentication required
# Restart
bin/zeppelin-daemon.sh restart
5. Configuring Zeppelin for Spark on YARN
地址:https://zeppelin.apache.org/docs/0.8.2/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin
# Configure conf/zeppelin-env.sh
export MASTER=yarn-client
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=$SPARK_HOME # assumes SPARK_HOME is already set in the environment; otherwise point it at your Spark installation
# Then set the interpreter properties on the Interpreter page of the web UI
master                    yarn-cluster
spark.app.name            xxx-Zeppelin
spark.cores.max           2
spark.driver.memory       4g
spark.executor.instances  30
spark.executor.memory     5g
spark.yarn.queue          default
spark.jars                /home/xx/lib/xx-xxx-1.0.0.jar
# Test run
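A quick way to verify the setup is to run a trivial Spark paragraph: if an application with the name configured above appears in the YARN ResourceManager UI and the paragraph returns a result, the interpreter is working. A minimal sketch (the computation itself is arbitrary):

```
%spark
// should launch a YARN application and return 5050.0
sc.parallelize(1 to 100).sum()
```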