Spark's unionAll: I may have been using it wrong!

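Spark SQL follows the SQL standard for UNION: columns from the two tables are matched by position, not by name, so the column order must correspond one-to-one or the result is wrong. The same holds for the DataFrame API. When you unionAll two parquet datasets, their columns must line up in the same order; otherwise no error is raised and the final result is simply wrong.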
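To make the positional rule concrete, here is a minimal toy model in plain Scala. It is not Spark code: `Table`, `unionByPosition` and `unionByName` are hypothetical names invented for this sketch, and it only assumes that a table is an ordered list of column names plus rows of values.

```scala
// Toy model of a table: ordered column names plus rows of string values.
// NOT Spark code; all names here are hypothetical and only illustrate the idea.
final case class Table(columns: Vector[String], rows: Vector[Vector[String]])

object UnionModel {
  // SQL-style UNION ALL: keep the left header and append the right rows unchanged,
  // so values are matched to columns purely by their position.
  def unionByPosition(left: Table, right: Table): Table = {
    require(left.columns.size == right.columns.size, "column counts must match")
    Table(left.columns, left.rows ++ right.rows)
  }

  // Name-based alternative: reorder every right row into the left column order first.
  def unionByName(left: Table, right: Table): Table = {
    require(left.columns.toSet == right.columns.toSet, "column names must match")
    val indexInRight = left.columns.map(c => right.columns.indexOf(c))
    val reordered = right.rows.map(row => indexInRight.map(i => row(i)))
    Table(left.columns, left.rows ++ reordered)
  }
}
```

With a left header of offset, mcId, idfa, imei and a right table declared as offset, idfa, imei, mcId, `unionByPosition` puts the right table's idfa value under mcId, while `unionByName` moves each value under the column it was declared for.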

Two examples follow:

1. The two parquet datasets have the same column order

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local Spark setup shared by both examples
val conf = new SparkConf().setAppName("UDAF").setMaster("local")
val sc = new SparkContext(conf)
sc.setLogLevel("ERROR")
val sqlContext = new SQLContext(sc)
import org.apache.spark.sql.functions._
import sqlContext.implicits._

// First DataFrame: columns offset, mcId, idfa, imei
val names = Array((1L,"peter","peter",""),(2L,"Leo","Leo",""),
  (3L,"Marry","","Marry"), (4L,"Jack","","Jack"),
  (5L,"Tom","Tom",""), (6L,"id1","id1",""),
  (5L,"Tom","Tom",""), (2L,"Leo","","Leo"),
  (2L,"Leo","Leo",""))
val numsDF = sc.parallelize(names, 1).toDF("offset","mcId","idfa","imei")

// Second DataFrame: same columns in the same order
val names2 = Array((11L,"peter2","peter2",""),(12L,"Leo2","Leo2",""))
val numsDF2 = sc.parallelize(names2, 1).toDF("offset","mcId","idfa","imei")

numsDF2.show(20,false)
numsDF.unionAll(numsDF2).show(20,false)

+------+------+------+----+
|offset|mcId  |idfa  |imei|
+------+------+------+----+
|11    |peter2|peter2|    |
|12    |Leo2  |Leo2  |    |
+------+------+------+----+

+------+------+------+-----+
|offset|mcId  |idfa  |imei |
+------+------+------+-----+
|1     |peter |peter |     |
|2     |Leo   |Leo   |     |
|3     |Marry |      |Marry|
|4     |Jack  |      |Jack |
|5     |Tom   |Tom   |     |
|6     |id1   |id1   |     |
|5     |Tom   |Tom   |     |
|2     |Leo   |      |Leo  |
|2     |Leo   |Leo   |     |
|11    |peter2|peter2|     |
|12    |Leo2  |Leo2  |     |
+------+------+------+-----+

2. The two parquet datasets have different column orders

// Run as a separate example; it reuses sc and the imports from example 1.
// First DataFrame: columns offset, mcId, idfa, imei
val names = Array((1L,"peter","peter",""),(2L,"Leo","Leo",""),
  (3L,"Marry","","Marry"), (4L,"Jack","","Jack"),
  (5L,"Tom","Tom",""), (6L,"id1","id1",""),
  (5L,"Tom","Tom",""), (2L,"Leo","","Leo"),
  (2L,"Leo","Leo",""))
val numsDF = sc.parallelize(names, 1).toDF("offset","mcId","idfa","imei")

// Second DataFrame: same column names, but declared as offset, idfa, imei, mcId
val names2 = Array((10L,"peter2","","peter2"),(11L,"Leo2","","Leo2"))
val numsDF2 = sc.parallelize(names2, 1).toDF("offset","idfa","imei","mcId")

numsDF2.show(20,false)
numsDF.unionAll(numsDF2).show(20,false)

Output:

+------+------+----+------+
|offset|idfa  |imei|mcId  |
+------+------+----+------+
|10    |peter2|    |peter2|
|11    |Leo2  |    |Leo2  |
+------+------+----+------+

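The error appears in the union result below. In the rows with offset 10 and 11, the idfa value (peter2, Leo2) ends up in the mcId column, the empty imei value ends up in idfa, and the mcId value ends up in imei, because unionAll matched the columns by position.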

+------+------+-----+------+
|offset|mcId  |idfa |imei  |
+------+------+-----+------+
|1     |peter |peter|      |
|2     |Leo   |Leo  |      |
|3     |Marry |     |Marry |
|4     |Jack  |     |Jack  |
|5     |Tom   |Tom  |      |
|6     |id1   |id1  |      |
|5     |Tom   |Tom  |      |
|2     |Leo   |     |Leo   |
|2     |Leo   |Leo  |      |
|10    |peter2|     |peter2|
|11    |Leo2  |     |Leo2  |
+------+------+-----+------+
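One way to avoid the mismatch is to reorder the second DataFrame by column name before the union. The sketch below is an addition, not part of the original examples: the object name `UnionAligned` and the helper `alignTo` are hypothetical, and it assumes both DataFrames contain exactly the same column names and that the same SQLContext-era API used above is available.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.col

object UnionAligned {
  // Hypothetical helper: select df's columns by name, in the order used by reference.
  def alignTo(df: DataFrame, reference: DataFrame): DataFrame =
    df.select(reference.columns.map(col): _*)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("UnionAligned").setMaster("local"))
    sc.setLogLevel("ERROR")
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val numsDF = sc.parallelize(Seq((1L, "peter", "peter", "")), 1)
      .toDF("offset", "mcId", "idfa", "imei")
    // Same column names as numsDF, but declared in a different order.
    val numsDF2 = sc.parallelize(Seq((10L, "peter2", "", "peter2")), 1)
      .toDF("offset", "idfa", "imei", "mcId")

    // Align by name first, then let unionAll match by position.
    numsDF.unionAll(alignTo(numsDF2, numsDF)).show(20, false)

    sc.stop()
  }
}
```

After the alignment, the row with offset 10 keeps peter2 under idfa and an empty imei. On Spark 2.3 and later, `unionByName` performs the name-based matching directly, so the explicit `select` is only needed on older versions.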

