Getting Started with Apache Zeppelin

Zeppelin is a handy tool; let's start with what it can do.

1. Uses of Zeppelin:

(1) Provides a web-based notebook, similar to IPython's, for data analysis and visualization.

(2) Connects to different data-processing engines, including Spark, Hive, and Tajo, with built-in support for Scala, Java, shell, Markdown, and more.

(3) Visualizes query results.

From the official site:

Zeppelin supports Spark, PySpark, Spark R, Spark SQL with dependency loader.

Zeppelin lets you connect any JDBC data sources seamlessly. Postgresql, Mysql, MariaDB, Redshift, Apache Hive and so on.

Python is supported with Matplotlib, Conda, Pandas SQL and PySpark integrations.

2. Version compatibility:

https://zeppelin.apache.org/supported_interpreters.html

3. Interpreters and interpreter groups

Interpreter:

The concept of Zeppelin interpreter allows any language/data-processing-backend to be plugged into Zeppelin.

Interpreter group:

By default, every interpreter belongs to a single group, but a group may contain more than one interpreter.

The Spark interpreter group includes Spark support, PySpark, Spark SQL, and the dependency loader.

Zeppelin interpreters from the same group run in the same JVM. For more information, see the official interpreter documentation.
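As a sketch of how interpreters are selected in practice: each note paragraph starts with a `%` magic naming the interpreter. The magics below (%spark, %sh, %md) are the default names; adjust them to whatever interpreters your instance has bound.

```
%spark
// Runs in the Spark interpreter (Scala); sc is the shared SparkContext.
val total = sc.parallelize(1 to 100).sum()
println(total)

%sh
# Runs in the shell interpreter.
echo "hello from zeppelin"

%md
**Markdown** paragraphs are rendered directly.
```

Because %spark, %pyspark, and %sql belong to the same interpreter group, they run in one JVM and share the same SparkContext.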

4. Installing Zeppelin

Official installation guide: https://zeppelin.apache.org/docs/0.8.2/quickstart/install.html#downloading-binary-package
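For reference, a typical download-and-unpack sequence for the 0.8.2 binary package. The mirror URL and package name below are assumptions; take the actual link from the download page above.

```
# URL assumed; check the download page for the current mirror and version.
wget https://archive.apache.org/dist/zeppelin/zeppelin-0.8.2/zeppelin-0.8.2-bin-all.tgz
tar -xzf zeppelin-0.8.2-bin-all.tgz
cd zeppelin-0.8.2-bin-all
```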

# Configure the bind address and port in conf/zeppelin-env.sh

export ZEPPELIN_ADDR=xxx.xxx.xxx.xxx # Bind address (default 127.0.0.1)

export ZEPPELIN_PORT=9099

Start the daemon:

bin/zeppelin-daemon.sh start

Open it in a browser and log in:

http://172.21.xx.xx:9099/#/
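Once the daemon is up, you can also sanity-check it from the command line. Zeppelin ships a REST API, and its version endpoint makes a convenient health check:

```
# Should return a small JSON document containing the Zeppelin version.
curl http://172.21.xx.xx:9099/api/version
```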

# Configure login users in conf/shiro.ini

[users]
# List of users with their passwords allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
# To enable the admin user, uncomment the following line and set an appropriate password.
# admin is the default user. Note: Shiro INI does not support inline comments, so keep comments on their own lines.
admin = password1, admin
user1 = password2, role1, role2

[urls]
# Anonymous access, no login required:
#/** = anon
# Require authenticated login:
/** = authc

# Restart to pick up the shiro.ini changes

bin/zeppelin-daemon.sh restart
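To verify that the authc rule took effect, you can exercise the login endpoint of Zeppelin's REST API. The userName/password form fields below follow the REST docs; the credentials are the ones defined in shiro.ini above.

```
# Without a session this should now be rejected:
curl -i http://172.21.xx.xx:9099/api/notebook

# Log in and store the session cookie for later requests:
curl -i -c cookies.txt --data 'userName=admin&password=password1' \
  http://172.21.xx.xx:9099/api/login
```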

5. Configuring Zeppelin for Spark on YARN

Reference: https://zeppelin.apache.org/docs/0.8.2/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin

# Configure conf/zeppelin-env.sh

export MASTER=yarn-client

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_HOME=/path/to/spark # point this at your local Spark installation

# Configure the following interpreter properties in the web UI

master                    yarn-cluster
spark.app.name            xxx-Zeppelin
spark.cores.max           2
spark.driver.memory       4g
spark.executor.instances  30
spark.executor.memory     5g
spark.yarn.queue          default
spark.jars                /home/xx/lib/xx-xxx-1.0.0.jar

# Test run
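A minimal smoke test for the YARN setup, as a sketch: paste the paragraph below into a new note. If the interpreter is configured correctly, running it should start a YARN application named xxx-Zeppelin (visible in the ResourceManager UI) and print the count.

```
%spark
// Counts the even numbers in 1..1000000 across the YARN executors.
val evens = sc.parallelize(1 to 1000000).filter(_ % 2 == 0).count()
println(evens)  // 500000
```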
