JDBC連接Spark SQL 打造數據分析平臺(即席查詢)

最近測試thrift JDBC Server 使用Spark的beeline腳本與JDBC Server進行交互使用,實現即席查詢。Thrift JDBC Server默認監聽端口是10000。

本文是基於spark sql 1.6 + parquet 做的測試,緊接著還會做spark sql 2.3 + carbondata 做測試。

使用Thrift JDBC Server前需要注意:

1、首先把hive-site.xml配置文件拷貝到$SPARK_HOME/conf目錄下;

2、修改$SPARK_HOME/conf/spark-env.sh中的SPARK_CLASSPATH添加jdbc驅動的jar包

export SPARK_CLASSPATH=$HIVE_HOME/lib/mysql-connector-java-5.1.36.jar:$SPARK_CLASSPATH

1.在hive中創建外部表

create external table asset_standard(

std_time string,

action string,

std_cnt bigint,

std_product string,

std_category string,

std_brand string,

offset bigint)

STORED AS PARQUET

location '/mclxxx/standard';

2.spark啟動thriftserver

start-thriftserver.sh --help 查看幫助,與spark-sql/spakr-shell的幫助相同。

cd $SPARK_HOME

sbin/start-thriftserver.sh --master yarn-client \

--queue default \

--driver-memory 4g \

--num-executors 12 \

--executor-memory 15g \

--executor-cores 2 \

--hiveconf hive.server2.thrift.port=10005

3.beeline連接Spark on hive:

cd $SPARK_HOME

[172.21.x.7:mcloud@bj-yh-mc-jobcenter-001:/usr/local/spark]$ bin/beeline

Beeline version 1.6.1 by Apache Hive

0: jdbc:hive2://localhost:10005> show tables;

+-------------------------------+--------------+--+

| tableName | isTemporary |

+-------------------------------+--------------+--+

| apptype | false |

| asset_standard | false |

+-------------------------------+--------------+--+

4.測試結果:

普通的sql查詢速度還是可以的。但是有多個group by的查詢速度並不是太快, 不能滿足我們現在的需求,但是做離線分析還是很不錯的。

select temp.cnt, count(temp.offset) from (Select offset, sum(std_cnt) as cnt from asset_standard where action = 1 group by offset) temp group by cnt

掃10億記錄,耗時68.814 seconds.

5.Java客戶端連接測試:

import java.sql.Connectionimport java.sql.DriverManagerimport java.sql.ResultSetimport java.sql.Statement/*** Created by tend on 2018/5/8.*/object SqlTest {def main(args: Array[String]) {val url:String = "jdbc:hive2://172.21.x.7:10005/default";Class.forName("org.apache.hive.jdbc.HiveDriver");val conn:Connection = DriverManager.getConnection(url,"mcloud","");val stmt:Statement = conn.createStatement();val sql:String = "select temp.cnt, count(temp.offset) from (Select offset, sum(std_cnt) as cnt from asset_standard where action = 1 group by offset) temp group by cnt";System.out.println("Running"+sql);val res:ResultSet = stmt.executeQuery(sql);while(res.next()){println("id: "+res.getLong(1)+"\ttype: "+res.getLong(2));}}} 

6.問題

start-thriftserver.sh 啟動時報錯:

org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address hostname1/192.168.65.7:10000.

at org.apache.thrift.transport.TServerSocket.(TServerSocket.java:109)

at org.apache.thrift.transport.TServerSocket.(TServerSocket.java:91)

at org.apache.thrift.transport.TServerSocket.(TServerSocket.java:87)

at org.apache.hive.service.auth.HiveAuthFactory.getServerSocket(HiveAuthFactory.java:241)

at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:66)

at java.lang.Thread.run(Thread.java:744)

指定:--hiveconf hive.server2.thrift.port=10005 解決。


分享到:


相關文章: