最近測試thrift JDBC Server 使用Spark的beeline腳本與JDBC Server進行交互使用,實現即席查詢。Thrift JDBC Server默認監聽端口是10000。
本文是基於spark sql 1.6 + parquet 做的測試,緊接著還會做spark sql 2.3 + carbondata 做測試。
使用Thrift JDBC Server前需要注意:
1、首先把hive-site.xml配置文件拷貝到$SPARK_HOME/conf目錄下;
2、修改$SPARK_HOME/conf/spark-env.sh中的SPARK_CLASSPATH添加jdbc驅動的jar包
export SPARK_CLASSPATH=$HIVE_HOME/lib/mysql-connector-java-5.1.36.jar:$SPARK_CLASSPATH
1.在hive中創建外部表
create external table asset_standard(
std_time string,
action string,
std_cnt bigint,
std_product string,
std_category string,
std_brand string,
offset bigint)
STORED AS PARQUET
location '/mclxxx/standard';
2.spark啟動thriftserver
start-thriftserver.sh --help 查看幫助,與spark-sql/spakr-shell的幫助相同。
cd $SPARK_HOME
sbin/start-thriftserver.sh --master yarn-client \
--queue default \
--driver-memory 4g \
--num-executors 12 \
--executor-memory 15g \
--executor-cores 2 \
--hiveconf hive.server2.thrift.port=10005
3.beeline連接Spark on hive:
cd $SPARK_HOME
[172.21.x.7:mcloud@bj-yh-mc-jobcenter-001:/usr/local/spark]$ bin/beeline
Beeline version 1.6.1 by Apache Hive
0: jdbc:hive2://localhost:10005> show tables;
+-------------------------------+--------------+--+
| tableName | isTemporary |
+-------------------------------+--------------+--+
| apptype | false |
| asset_standard | false |
+-------------------------------+--------------+--+
4.測試結果:
普通的sql查詢速度還是可以的。但是有多個group by的查詢速度並不是太快, 不能滿足我們現在的需求,但是做離線分析還是很不錯的。
select temp.cnt, count(temp.offset) from (Select offset, sum(std_cnt) as cnt from asset_standard where action = 1 group by offset) temp group by cnt
掃10億記錄,耗時68.814 seconds.
5.Java客戶端連接測試:
import java.sql.Connectionimport java.sql.DriverManagerimport java.sql.ResultSetimport java.sql.Statement/*** Created by tend on 2018/5/8.*/object SqlTest {def main(args: Array[String]) {val url:String = "jdbc:hive2://172.21.x.7:10005/default";Class.forName("org.apache.hive.jdbc.HiveDriver");val conn:Connection = DriverManager.getConnection(url,"mcloud","");val stmt:Statement = conn.createStatement();val sql:String = "select temp.cnt, count(temp.offset) from (Select offset, sum(std_cnt) as cnt from asset_standard where action = 1 group by offset) temp group by cnt";System.out.println("Running"+sql);val res:ResultSet = stmt.executeQuery(sql);while(res.next()){println("id: "+res.getLong(1)+"\ttype: "+res.getLong(2));}}}
6.問題
start-thriftserver.sh 啟動時報錯:
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address hostname1/192.168.65.7:10000.
at org.apache.thrift.transport.TServerSocket.
at org.apache.thrift.transport.TServerSocket.
at org.apache.thrift.transport.TServerSocket.
at org.apache.hive.service.auth.HiveAuthFactory.getServerSocket(HiveAuthFactory.java:241)
at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:66)
at java.lang.Thread.run(Thread.java:744)
指定:--hiveconf hive.server2.thrift.port=10005 解決。
閱讀更多 從大數據說起 的文章