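Spark SQL follows the SQL standard: a union matches columns by position, not by name. The columns of the two tables must therefore line up one to one in the same order, otherwise the result is wrong. The same applies when unioning two parquet datasets: if their column order differs, the final result is silently incorrect.
Two examples follow:
1. Both parquet datasets have the same column order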
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local Spark context and SQLContext for the demo
val conf = new SparkConf().setAppName("UDAF").setMaster("local")
val sc = new SparkContext(conf)
sc.setLogLevel("ERROR")
val sqlContext = new SQLContext(sc)
import org.apache.spark.sql.functions._
import sqlContext.implicits._
// First dataset: columns are (offset, mcId, idfa, imei)
val names = Array((1L,"peter","peter",""),(2L,"Leo","Leo",""),
(3L,"Marry","","Marry"), (4L,"Jack","","Jack"),
(5L,"Tom","Tom",""), (6L,"id1","id1",""),
(5L,"Tom","Tom",""), (2L,"Leo","","Leo"),
(2L,"Leo","Leo",""))
val numsDF = sc.parallelize(names, 1).toDF("offset","mcId","idfa","imei")
// Second dataset: same columns in the same order
val names2 = Array((11L,"peter2","peter2",""),(12L,"Leo2","Leo2",""))
val numsDF2 = sc.parallelize(names2, 1).toDF("offset","mcId","idfa","imei")
numsDF2.show(20,false)
// Column order matches, so the positional union is correct
numsDF.unionAll(numsDF2).show(20,false)
Output:
+------+------+------+----+
|offset|mcId  |idfa  |imei|
+------+------+------+----+
|11    |peter2|peter2|    |
|12    |Leo2  |Leo2  |    |
+------+------+------+----+
+------+------+------+-----+
|offset|mcId  |idfa  |imei |
+------+------+------+-----+
|1     |peter |peter |     |
|2     |Leo   |Leo   |     |
|3     |Marry |      |Marry|
|4     |Jack  |      |Jack |
|5     |Tom   |Tom   |     |
|6     |id1   |id1   |     |
|5     |Tom   |Tom   |     |
|2     |Leo   |      |Leo  |
|2     |Leo   |Leo   |     |
|11    |peter2|peter2|     |
|12    |Leo2  |Leo2  |     |
+------+------+------+-----+
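Example 1 is correct only because both DataFrames list their columns in the same order. A guard along the following lines could confirm that precondition before a union. This is a minimal sketch in plain Scala; `ColumnOrderCheck` and `firstMismatch` are hypothetical names used for illustration, not part of Spark.

```scala
object ColumnOrderCheck {
  // Returns the first index at which the two column lists disagree,
  // or None when they contain the same names in the same order.
  // If the lists differ only in length, the index just past the
  // shorter list is reported.
  def firstMismatch(left: Seq[String], right: Seq[String]): Option[Int] = {
    val common = math.min(left.length, right.length)
    (0 until common).find(i => left(i) != right(i)) match {
      case found @ Some(_) => found
      case None => if (left.length != right.length) Some(common) else None
    }
  }
}
```

With the DataFrames above it could be called as `ColumnOrderCheck.firstMismatch(numsDF.columns.toSeq, numsDF2.columns.toSeq)`, failing fast whenever the result is not `None`.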
2. The two parquet datasets have different column orders
// Reuses sc and sqlContext (and its implicits) from example 1
val names = Array((1L,"peter","peter",""),(2L,"Leo","Leo",""),
(3L,"Marry","","Marry"), (4L,"Jack","","Jack"),
(5L,"Tom","Tom",""), (6L,"id1","id1",""),
(5L,"Tom","Tom",""), (2L,"Leo","","Leo"),
(2L,"Leo","Leo",""))
val numsDF = sc.parallelize(names, 1).toDF("offset","mcId","idfa","imei")
// Second dataset: same column names, but ordered (offset, idfa, imei, mcId)
val names2 = Array((10L,"peter2","","peter2"),(11L,"Leo2","","Leo2"))
val numsDF2 = sc.parallelize(names2, 1).toDF("offset","idfa","imei","mcId")
numsDF2.show(20,false)
// unionAll matches columns by position, so numsDF2's values land in the wrong columns
numsDF.unionAll(numsDF2).show(20,false)
Output:
+------+------+----+------+
|offset|idfa  |imei|mcId  |
+------+------+----+------+
|10    |peter2|    |peter2|
|11    |Leo2  |    |Leo2  |
+------+------+----+------+
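The error shows up in the union result below. Because columns are matched by position, the values from numsDF2's idfa, imei, and mcId columns end up under mcId, idfa, and imei respectively. In the last two rows, idfa is empty and imei holds the ID, although in numsDF2 it was the other way round.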
+------+------+-----+------+
|offset|mcId  |idfa |imei  |
+------+------+-----+------+
|1     |peter |peter|      |
|2     |Leo   |Leo  |      |
|3     |Marry |     |Marry |
|4     |Jack  |     |Jack  |
|5     |Tom   |Tom  |      |
|6     |id1   |id1  |      |
|5     |Tom   |Tom  |      |
|2     |Leo   |     |Leo   |
|2     |Leo   |Leo  |      |
|10    |peter2|     |peter2|
|11    |Leo2  |     |Leo2  |
+------+------+-----+------+
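One way to avoid the misalignment shown above is to reorder the second DataFrame's columns by name before the union. The following is a minimal sketch, assuming both DataFrames contain exactly the same column names; `AlignedUnion` and `unionAligned` are hypothetical names, not part of Spark. On Spark 2.3 and later, the built-in `unionByName` should serve the same purpose.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object AlignedUnion {
  // Hypothetical helper, not a Spark API: selects the columns of `right`
  // in the order used by `left`, then unions the two by position.
  def unionAligned(left: DataFrame, right: DataFrame): DataFrame = {
    require(left.columns.toSet == right.columns.toSet,
      "both DataFrames must contain the same column names")
    val reordered = right.select(left.columns.map(c => col(c)): _*)
    left.unionAll(reordered)
  }
}
```

Calling `AlignedUnion.unionAligned(numsDF, numsDF2).show(20,false)` on the data from example 2 should then place the rows with offset 10 and 11 correctly, with the ID under both mcId and idfa and an empty imei.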