hive如何实现不等值连接

由于 hive 与传统关系型数据库面对的业务场景及底层技术架构都有着很大差异,因此,传统数据库领域的一些技能放到 Hive 中可能已不再适用。因为 hive 受限于 MapReduce 算法模型,只支持 equi-joins(等值 join),低版本的hive不支持非等值连接,那么如何实现非等值连接,今天我们就来讨论下!

hive如何实现不等值连接

一对一或一对多连接

1.准备数据

create table test1 
(
id int
,age int
,name string
,sex string
)
;
insert into test1 values
(1,12,'zs','m')
,(2,15,'ls','m')
,(3,20,'ww','m')
,(4,20,'ll','f')
,(5,23,'xh','f')
;
create table test2
(
id int
,age int
,name string
,class int
)
;
insert into test2 values
(1,11,'zsang',1)
,(2,15,'lsa',1)
,(3,22,'ww',2)
,(4,40,'lling',1)
,(4,45,'llan',1)
,(5,25,'cxh',2)
;

2.想要实现以下需求:

1) 

select
t1.id as id
, t1.name as name1
, t2.name as name2
from test1 t1
left join test2 t2
on t2.name like concat(t1.name,'%')
and t1.name <> t2.name
;
2)
select
t1.id as id
, t1.age as age1
, t2.age as age2
from test1 t1
left join test2 t2
on t1.id=t2.id
and t1.age < t2.age
;

因为hive 不支持不等值连接,所以需要想办法来实现不等值连接,决定先过滤后关联

1)

select 
t1.id
,t1.name as name1
, t2.name as name2
from test1 t1
left join
(
select
t1.id
,t2.name
from test1 t1
inner join test2 t2
where t2.name like concat(t1.name,'%')
and t1.name <> t2.name
) t2
on t1.id =t2.id
;

结果:

hive如何实现不等值连接

2)

select 
t1.id
,t1.age as age1
, t2.age as age2
from test1 t1
left join
(
select
t1.id
,t2.age
from test1 t1
inner join test2 t2
on t1.id =t2.id
where t1.age < t2.age
) t2
on t1.id =t2.id
;

结果:

hive如何实现不等值连接

以上方法只支持一对一或一对多的连接,如果多对多连接就不支持了

多对多连接

1.准备数据

在以上test1表中增加一条数据

insert into test1 values (4,30,'li','f')

此时test1中数据为:

select * from test1;

id age name sex

1 12 zs m

2 15 ls m

3 20 ww m

4 20 ll f

4 30 li f

5 23 xh f

;

select * from test2;

id age name class

1 11 zsang 1

2 15 lsa 1

3 22 ww 2

4 40 lling 1

4 45 llan 1

5 25 cxh 2

若还用上面方法来实现该需求的话

select 
t1.id
,t1.age as age1
, t2.age as age2
from test1 t1
left join
(
select
t1.id
,t2.age
from test1 t1
inner join test2 t2
on t1.id =t2.id
where t1.age < t2.age
) t2
on t1.id =t2.id
;

结果为:

id age1 age2

1 12 null

2 15 null

3 20 22

4 20 40

4 20 40

4 20 45

4 20 45

5 23 25

4 30 40

4 30 40

4 30 45

4 30 45

;

以上结果明显不对,那么该如何实现呢?

select 
t1.id as id
,t1.age as age1
,case when t1.age < t2.age then t2.age else null end as age2
from test1 t1
left join test2 t2
on t1.id=t2.id

结果:

hive如何实现不等值连接

这是我暂时想到的办法,如果大家有什么好的办法,欢迎留言评论,我也会将好的办法更新!!!


分享到:


相關文章: