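This article covers how to apply RANK/PERCENT_RANK to a column in BigQuery only for rows that meet a WHERE-style condition. It may be a useful reference if you are solving a similar problem.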

Problem description


I have a fairly wide BigQuery table with roughly 20 to 30 columns. Each of them needs a complementary percentile column showing where that row's value falls relative to all other rows in the table. However, a column should only receive a percentile value if the value in another column meets a certain threshold. To show this, I created the reproducible example below:

WITH
  correct_games_played AS
    (
      SELECT "a" as name, 7 as num1, 0.4 as num2, 0.55 as num3
      UNION ALL SELECT "b" as name, 13 as num1, 0.53 as num2, 0.37 as num3
      UNION ALL SELECT "c" as name, 4 as num1, 0.42 as num2, 0.32 as num3
      UNION ALL SELECT "d" as name, 17 as num1, 0.6 as num2, 0.23 as num3
      UNION ALL SELECT "e" as name, 7 as num1, 0.3 as num2, 0.25 as num3
      UNION ALL SELECT "f" as name, 16 as num1, 0.7 as num2, 0.43 as num3
      UNION ALL SELECT "g" as name, 10 as num1, 0.53 as num2, 0.52 as num3
      UNION ALL SELECT "h" as name, 5 as num1, 0.54 as num2, 0.21 as num3
      UNION ALL SELECT "i" as name, 9 as num1, 0.56 as num2, 0.17 as num3
      UNION ALL SELECT "j" as name, 3 as num1, 0.75 as num2, 0.53 as num3
    )

  SELECT 
    a.*,
    -- RANK() OVER(ORDER BY a.num1 DESC) AS num1_rank,
    -- RANK() OVER(ORDER BY a.num2 DESC) AS num2_rank,
    -- RANK() OVER(ORDER BY a.num3 DESC) AS num3_rank
    RANK() OVER(ORDER BY a.num1 DESC) AS num1_rank,
    RANK() OVER(ORDER BY a.num2 WHERE a.num1 > 4 DESC) AS num2_rank
    RANK() OVER(ORDER BY a.num3 WHERE a.num1 > 3 DESC) AS num3_rank
  FROM correct_games_played as a


This script throws the error "Syntax error: Expected ")" but got keyword WHERE at [22:37]"; however, it works if I replace these RANK() calls with the commented-out ones. My objective is really quite simple:

  • num2_rank: rank the values in a.num2 only for rows where a.num1 is greater than 4; otherwise display NULL
  • num3_rank: rank the values in a.num3 only for rows where a.num1 is greater than 3; otherwise display NULL
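To pin down the target output, here is a plain-Python sketch of the desired semantics (not BigQuery code): it mimics RANK() with DESC ordering, computed only over the rows that pass each column's condition, and returns None where SQL would show NULL. The function name conditional_rank is hypothetical, and the rows are copied from the example CTE above.

```python
from typing import Callable, List, Optional

# Sample rows copied from the question's correct_games_played CTE.
ROWS: List[dict] = [
    {"name": "a", "num1": 7, "num2": 0.40, "num3": 0.55},
    {"name": "b", "num1": 13, "num2": 0.53, "num3": 0.37},
    {"name": "c", "num1": 4, "num2": 0.42, "num3": 0.32},
    {"name": "d", "num1": 17, "num2": 0.60, "num3": 0.23},
    {"name": "e", "num1": 7, "num2": 0.30, "num3": 0.25},
    {"name": "f", "num1": 16, "num2": 0.70, "num3": 0.43},
    {"name": "g", "num1": 10, "num2": 0.53, "num3": 0.52},
    {"name": "h", "num1": 5, "num2": 0.54, "num3": 0.21},
    {"name": "i", "num1": 9, "num2": 0.56, "num3": 0.17},
    {"name": "j", "num1": 3, "num2": 0.75, "num3": 0.53},
]


def conditional_rank(rows: List[dict], value_col: str,
                     cond: Callable[[dict], bool]) -> List[Optional[int]]:
    """Mimic RANK() OVER (ORDER BY value_col DESC) computed only over the rows
    where cond(row) is true; every other row gets None (SQL NULL)."""
    eligible = [r[value_col] for r in rows if cond(r)]
    ranks: List[Optional[int]] = []
    for r in rows:
        if cond(r):
            # RANK(): 1 + number of eligible values strictly greater than this
            # one, so ties share a rank and the following rank is skipped.
            ranks.append(1 + sum(1 for v in eligible if v > r[value_col]))
        else:
            ranks.append(None)
    return ranks


if __name__ == "__main__":
    num2_rank = conditional_rank(ROWS, "num2", lambda r: r["num1"] > 4)
    num3_rank = conditional_rank(ROWS, "num3", lambda r: r["num1"] > 3)
    for row, r2, r3 in zip(ROWS, num2_rank, num3_rank):
        print(row["name"], r2, r3)
```

With this data, b and g tie at 0.53 in num2 and both get rank 5, the next rank is 7, and rows c and j get None for num2_rank because their num1 is not greater than 4.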


My table is quite wide, and each column may need its own condition to decide whether its values should be ranked. Any help with this would be greatly appreciated, thanks!
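For a wide table, one option is to keep the per-column conditions in a single mapping and generate the SELECT list from it instead of writing 20 to 30 expressions by hand. Below is a minimal sketch of two hypothetical helpers (conditional_percentile_columns and build_query are illustrative names, not part of any library). It assumes each condition is a SQL boolean expression over the same row, and it assumes, without having been run against BigQuery here, that partitioning the window by that same expression keeps non-qualifying rows out of the percentile calculation.

```python
from typing import Dict, List


def conditional_percentile_columns(conditions: Dict[str, str]) -> List[str]:
    """Build one BigQuery SELECT-list expression per column (hypothetical helper).

    `conditions` maps a column name to a SQL boolean expression. Rows where the
    expression is not true get NULL; PARTITION BY the same expression keeps the
    qualifying rows in their own window. Names and conditions are pasted in
    verbatim, so only pass trusted, pre-validated identifiers.
    """
    exprs = []
    for col, cond in conditions.items():
        # DESC mirrors the question's ranking; use ASC if the largest value
        # should map to a percentile of 1.0 instead of 0.0.
        exprs.append(
            f"IF({cond}, PERCENT_RANK() OVER "
            f"(PARTITION BY ({cond}) ORDER BY {col} DESC), NULL) AS {col}_pct"
        )
    return exprs


def build_query(table: str, conditions: Dict[str, str]) -> str:
    """Assemble SELECT *, <one conditional percentile per column> FROM table."""
    select_list = ",\n  ".join(["*"] + conditional_percentile_columns(conditions))
    return f"SELECT\n  {select_list}\nFROM {table}"


if __name__ == "__main__":
    print(build_query("correct_games_played",
                      {"num2": "num1 > 4", "num3": "num1 > 3"}))
```

Adding a column then means adding one entry to the mapping; the generated text should still be reviewed before it is run.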

Recommended answer


The following is for BigQuery Standard SQL:

#standardSQL
WITH correct_games_played AS (
  SELECT "a" AS name, 7 AS num1, 0.4 AS num2, 0.55 AS num3 UNION ALL 
  SELECT "b" AS name, 13 AS num1, 0.53 AS num2, 0.37 AS num3 UNION ALL 
  SELECT "c" AS name, 4 AS num1, 0.42 AS num2, 0.32 AS num3 UNION ALL 
  SELECT "d" AS name, 17 AS num1, 0.6 AS num2, 0.23 AS num3 UNION ALL 
  SELECT "e" AS name, 7 AS num1, 0.3 AS num2, 0.25 AS num3 UNION ALL 
  SELECT "f" AS name, 16 AS num1, 0.7 AS num2, 0.43 AS num3 UNION ALL 
  SELECT "g" AS name, 10 AS num1, 0.53 AS num2, 0.52 AS num3 UNION ALL 
  SELECT "h" AS name, 5 AS num1, 0.54 AS num2, 0.21 AS num3 UNION ALL 
  SELECT "i" AS name, 9 AS num1, 0.56 AS num2, 0.17 AS num3 UNION ALL 
  SELECT "j" AS name, 3 AS num1, 0.75 AS num2, 0.53 AS num3
)
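SELECT *,
  RANK() OVER(ORDER BY num1 DESC) AS num1_rank,
  -- Non-qualifying rows get a NULL sort key; NULLs sort last under DESC,
  -- so qualifying rows receive ranks 1..k. The outer IF then replaces the
  -- rank with NULL for rows that do not meet the condition.
  IF(num1 > 4, RANK() OVER(ORDER BY IF(num1 > 4, num2, NULL) DESC), NULL) AS num2_rank,
  IF(num1 > 3, RANK() OVER(ORDER BY IF(num1 > 3, num3, NULL) DESC), NULL) AS num3_rank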
FROM correct_games_played
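The NULL-sort-key trick is enough for RANK(), but for PERCENT_RANK() the non-qualifying rows still sit in the window, so they enlarge the denominator (rank - 1) / (rows - 1). The sketch below uses Python's built-in sqlite3 module purely as a stand-in engine (not BigQuery; it needs SQLite 3.25 or later for window functions) to compare the answer's pattern with a variant that partitions by the condition. CASE replaces BigQuery's IF() so the query runs in SQLite; carrying the partitioned form back to BigQuery by swapping CASE for IF is an assumption, not something verified here.

```python
import sqlite3

# Sample rows copied from the question's correct_games_played CTE.
ROWS = [
    ("a", 7, 0.40, 0.55), ("b", 13, 0.53, 0.37), ("c", 4, 0.42, 0.32),
    ("d", 17, 0.60, 0.23), ("e", 7, 0.30, 0.25), ("f", 16, 0.70, 0.43),
    ("g", 10, 0.53, 0.52), ("h", 5, 0.54, 0.21), ("i", 9, 0.56, 0.17),
    ("j", 3, 0.75, 0.53),
]

# CASE is used instead of BigQuery IF() so the same query runs in SQLite.
QUERY = """
SELECT name,
  -- Pattern from the answer: non-qualifying rows stay in the window as NULLs
  CASE WHEN num1 > 4 THEN PERCENT_RANK() OVER (
    ORDER BY CASE WHEN num1 > 4 THEN num2 END DESC) END AS pct_nulls_in_window,
  -- Partitioned variant: only rows with num1 > 4 share one window
  CASE WHEN num1 > 4 THEN PERCENT_RANK() OVER (
    PARTITION BY num1 > 4 ORDER BY num2 DESC) END AS pct_partitioned
FROM games
ORDER BY name
"""


def run() -> dict:
    """Return {name: (pct_nulls_in_window, pct_partitioned)} for the sample."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE games (name TEXT, num1 INTEGER, num2 REAL, num3 REAL)")
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?)", ROWS)
    result = {name: (naive, part) for name, naive, part in conn.execute(QUERY)}
    conn.close()
    return result


if __name__ == "__main__":
    for name, (naive, part) in run().items():
        print(name, naive, part)
```

For row e (the lowest qualifying num2), the first column gives 7/9 because the window still holds all 10 rows, while the partitioned column gives 1.0 because only the 8 qualifying rows are counted.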


11-01 12:33