问题:

现有一张员工的销售记录表,表样式如下。现在需要统计每个员工在2023年10月份,截止到每天的月累计销售额。注意:存在有的员工在某几天是没有销售记录的。

   sale_date   emp_id  emp_name sale_amount
 '2023-10-02' ,'101' ,'张三' , 1000 
 '2023-10-03' ,'101' ,'张三' , 3000 
 '2023-10-05' ,'101' ,'张三' , 4000 
 '2023-10-10' ,'101' ,'张三' , 2000 
 '2023-10-13' ,'101' ,'张三' , 5000 
 '2023-10-15' ,'101' ,'张三' , 4000 
 '2023-10-27' ,'101' ,'张三' , 2000  
 '2023-10-01' ,'102' ,'李四' , 1111 
 '2023-10-03' ,'102' ,'李四' , 2222 
 '2023-10-08' ,'102' ,'李四' , 3333 
 '2023-10-11' ,'102' ,'李四' , 111 
 '2023-10-23' ,'102' ,'李四' , 4550 
 '2023-10-28' ,'102' ,'李四' , 6666 

SQL解决:

如果是每个员工在每天都有销售记录,那么直接开窗就可以计算出来当月截至到每天的累计销售额。现在由于销售记录有缺失,他虽然在某天没有销售记录,但是他是有当月累计销售额的,且当月累计销售额与前一天的累计销售额一样,这种依然需要统计出来。所以我们首先考虑将每个人每天的销售记录补齐,当天没有销售记录的那么销售额给0,然后就可以开窗计算当月截至到每天的累计销售额了。

with temp as (
select '2023-10-02' as sale_date,'101' as emp_id,'张三' as emp_name, 1000 as sale_amount
union all
select '2023-10-03' as sale_date,'101' as emp_id,'张三' as emp_name, 3000 as sale_amount
union all
select '2023-10-05' as sale_date,'101' as emp_id,'张三' as emp_name, 4000 as sale_amount
union all
select '2023-10-10' as sale_date,'101' as emp_id,'张三' as emp_name, 2000 as sale_amount
union all
select '2023-10-13' as sale_date,'101' as emp_id,'张三' as emp_name, 5000 as sale_amount
union all
select '2023-10-15' as sale_date,'101' as emp_id,'张三' as emp_name, 4000 as sale_amount
union all
select '2023-10-27' as sale_date,'101' as emp_id,'张三' as emp_name, 2000 as sale_amount
union all 
select '2023-10-01' as sale_date,'102' as emp_id,'李四' as emp_name, 1111 as sale_amount
union all
select '2023-10-03' as sale_date,'102' as emp_id,'李四' as emp_name, 2222 as sale_amount
union all
select '2023-10-08' as sale_date,'102' as emp_id,'李四' as emp_name, 3333 as sale_amount
union all 
select '2023-10-11' as sale_date,'102' as emp_id,'李四' as emp_name, 111 as sale_amount
union all
select '2023-10-23' as sale_date,'102' as emp_id,'李四' as emp_name, 4550 as sale_amount
union all
select '2023-10-28' as sale_date,'102' as emp_id,'李四' as emp_name, 6666 as sale_amount
)


select 
    dt
    ,t1.emp_id
    ,t1.emp_name
    ,nvl(t2.sale_amount,0) as sale_amount
    ,sum(if(t2.sale_amount is null ,0,t2.sale_amount))over(partition by t1.emp_id order by t1.dt) as total_sale_amount
from  
(select
    date_add(t.start_date,tab.pos) as dt
    ,t.emp_id
    ,t.emp_name
    from
    (
        select
        emp_id
        ,emp_name
        ,'2023-10-01' as start_date
        ,'2023-10-31' as end_date
        from temp
        group by emp_id,emp_name
    )t
    lateral view posexplode(split(repeat(',',datediff(end_date,start_date)),',')) tab as pos,val
) t1
left join
(
    select
    sale_date
    ,emp_id
    ,emp_name
    ,sale_amount
    from temp
)t2
on t1.dt=t2.sale_date and t1.emp_id=t2.emp_id and t1.emp_name=t2.emp_name
;

补充:repeat函数和space函数

1、repeat函数
在 Hive 中,REPEAT 函数用于将指定字符串重复多次,返回一个新的字符串。它的语法如下:

REPEAT(str, n)

其中,str 表示要重复的字符串,n 表示要重复的次数。

sql1:
SELECT REPEAT('hello', 3) as words;
---结果为
words
hellohellohello

sql2:
SELECT repeat(',',datediff('2023-10-05','2023-10-01')) as symbol;
---结果为:
symbol
,,,,

sql3:
select posexplode(split(repeat(',',datediff('2023-10-05','2023-10-01')),',')) as (pos,val); ----4个逗号,其实是5个位置
---结果为:
pos                val	
0	
1 	
2
3
4

备注:repeat 多和posexplod或explode等函数一起使用;

2、space函数
在 Hive 中,SPACE 函数用于返回由多个空格组成的字符串。它的语法如下:

SPACE(n)

其中,n 表示空格的数量。

sql1:
SELECT SPACE(5) as symbol;
---结果为
symbol

sql2:
select posexplode(split(space(datediff('2023-10-05','2023-10-01')),'')) as (pos,val); ----4个空格,其实是5个位置
---结果为:
pos                val	
0	
1 	
2
3
4

备注:space多和posexplod或explode等函数一起使用;

12-28 03:12