本文介绍了使用Impala获取连续旅行的次数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

样本数据

touristid|day
ABC|1
ABC|1
ABC|2
ABC|4
ABC|5
ABC|6
ABC|8
ABC|10

输出应为

touristid|trip
ABC|4

4后的逻辑是连续天数,连续天数sqq 1,1,2为第一,然后4,5,6为第二,然后8为第三,而10为第四我想要使​​用impala查询的输出

Logic behind 4 is count of consecutive days distinct consecutive days sqq 1,1,2 is 1st then 4,5,6 is 2nd then 8 is 3rd and 10 is 4thI want this output using impala query

推荐答案

使用lag()函数获取前一天,如果day-prev_day> 1,则计算new_trip_flag,然后计数(new_trip_flag).

Get previous day using lag() function, calculate new_trip_flag if the day-prev_day>1, then count(new_trip_flag).

演示:

with table1 as (
select 'ABC' as touristid, 1  as day union all
select 'ABC' as touristid, 1  as day union all
select 'ABC' as touristid, 2  as day union all
select 'ABC' as touristid, 4  as day union all
select 'ABC' as touristid, 5  as day union all
select 'ABC' as touristid, 6  as day union all
select 'ABC' as touristid, 8  as day union all
select 'ABC' as touristid, 10 as day
)

select touristid, count(new_trip_flag) trip_cnt
  from
       ( -- calculate new_trip_flag
         select touristid,
                case when (day-prev_day) > 1 or prev_day is NULL then true end  new_trip_flag
           from
                ( -- get prev_day
                  select touristid, day,
                         lag(day) over(partition by touristid order by day) prev_day
                    from table1
                )s
        )s
 group by touristid;

结果:

touristid   trip_cnt
ABC         4

在Hive中也一样.

这篇关于使用Impala获取连续旅行的次数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-11 07:03