Getting Started with Apache Zeppelin


Zeppelin is a genuinely useful tool; let's start with what it is for.

1. What Zeppelin is for:

(1) A web-based notebook, similar to IPython's, for data analysis and visualization.

(2) Pluggable data-processing engines, including Spark, Hive, Tajo and others, with built-in support for Scala, Java, shell, Markdown and more.

(3) Visualization of query results.

From the official site:

Zeppelin supports Spark, PySpark, Spark R, Spark SQL with dependency loader.

Zeppelin lets you connect any JDBC data sources seamlessly. Postgresql, Mysql, MariaDB, Redshift, Apache Hive and so on.

Python is supported with Matplotlib, Conda, Pandas SQL and PySpark integrations.
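As a quick illustration of the Python integration, here is a sketch of a Zeppelin paragraph (assumes the `%python` interpreter is bound to the note and Matplotlib is installed; the data points are made up):

```
%python
import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [2, 4, 8])
plt.show()   # Zeppelin renders the figure inline in the paragraph output
```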

2. Version compatibility:

https://zeppelin.apache.org/supported_interpreters.html

3. Interpreters and interpreter groups

Interpreter:

The concept of Zeppelin interpreter allows any language/data-processing-backend to be plugged into Zeppelin.

Interpreter group:

By default, every interpreter belongs to a single group, but a group may contain multiple interpreters.

For example, the Spark interpreter group includes Spark support, PySpark, Spark SQL and the dependency loader.

Zeppelin interpreters from the same group run in the same JVM; see the official interpreter documentation for details.
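Because interpreters in one group share a JVM, paragraphs in a single note can pass state between them. A sketch (the file path is hypothetical; the exact `%` magics depend on your interpreter bindings):

```
%spark
val df = spark.read.json("/tmp/events.json")   // hypothetical input path
df.createOrReplaceTempView("events")

%sql
SELECT count(*) FROM events
```

The temp view registered in the `%spark` (Scala) paragraph is visible to the `%sql` paragraph precisely because both interpreters live in the same Spark interpreter group.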

4. Installing Zeppelin

Official installation guide: https://zeppelin.apache.org/docs/0.8.2/quickstart/install.html#downloading-binary-package

# Configure the bind address and port in conf/zeppelin-env.sh

export ZEPPELIN_ADDR=xxx.xxx.xxx.xxx # Bind address (default 127.0.0.1)

export ZEPPELIN_PORT=9099

Start the daemon:

bin/zeppelin-daemon.sh start

Then open it in a browser and log in:

http://172.21.xx.xx:9099/#/

# Configure login users in conf/shiro.ini

[users]

# List of users with their password allowed to access Zeppelin.

# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections

# To enable admin user, uncomment the following line and set an appropriate password.

admin = password1, admin  # the default admin user

user1 = password2, role1, role2

#/** = anon  # anonymous access, no login required

/** = authc  # authentication required
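Besides [users], shiro.ini also accepts [roles] and [urls] sections. A hedged sketch that keeps interpreter settings admin-only (check the shipped conf/shiro.ini.template for the exact defaults in your version):

```
[roles]
# Role definitions referenced by the user lines above.
admin = *
role1 = *

[urls]
# The version endpoint stays open; interpreter settings require the admin role.
/api/version = anon
/api/interpreter/** = authc, roles[admin]
/** = authc
```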

# Restart to apply the changes

bin/zeppelin-daemon.sh restart

5. Configuring Zeppelin for Spark on YARN

Reference: https://zeppelin.apache.org/docs/0.8.2/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin

# Configure conf/zeppelin-env.sh

export MASTER=yarn-client

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_HOME=/path/to/spark  # must point at an actual Spark installation

# Configure the Spark interpreter settings in the web UI

master                    yarn-cluster
spark.app.name            xxx-Zeppelin
spark.cores.max           2
spark.driver.memory       4g
spark.executor.instances  30
spark.executor.memory     5g
spark.yarn.queue          default
spark.jars                /home/xx/lib/xx-xxx-1.0.0.jar
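These settings imply a sizeable YARN footprint, so it is worth a back-of-the-envelope check before submitting (heap only — YARN adds per-container overhead such as spark.executor.memoryOverhead on top):

```shell
# Rough heap demand (GB) implied by the interpreter settings above.
executors=30
executor_mem_g=5
driver_mem_g=4
echo $(( executors * executor_mem_g + driver_mem_g ))   # prints 154
```

If the target queue cannot grant roughly this much memory, the application will sit in ACCEPTED state on YARN instead of running.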

# Test run
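As a minimal smoke test, create a new note in the UI and run a paragraph against the Spark interpreter (a sketch assuming Spark 2.x, where the `spark` session is available):

```
%spark
sc.version               // should report the Spark version running on YARN
spark.range(5).count()   // forces a small job through the YARN executors
```

If this paragraph succeeds, you should also see the application (named per spark.app.name) in the YARN ResourceManager UI.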
