sql - Retrieve records based on the latest state/attribute value

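I am using a denormalized structure in Redshift. The plan is to keep creating records and, when retrieving, to consider only the most recent attributes for each user.
Here is the table: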

user_id   state  created_at
1         A      15-10-2015 02:00:00 AM
2         A      15-10-2015 02:00:01 AM
3         A      15-10-2015 02:00:02 AM
1         B      15-10-2015 02:00:03 AM
4         A      15-10-2015 02:00:04 AM
5         B      15-10-2015 02:00:05 AM

The desired result set is:

user_id   state  created_at
2         A      15-10-2015 02:00:01 AM
3         A      15-10-2015 02:00:02 AM
4         A      15-10-2015 02:00:04 AM

I have a query that retrieves the result above:

select user_id, first_value AS state
from (
   select user_id, first_value(state) OVER (
                     PARTITION BY user_id
                     ORDER BY created_at desc
                     ROWS between UNBOUNDED PRECEDING and CURRENT ROW)
   from customer_properties
   order by created_at) t
where first_value = 'A'

Is this the best way to retrieve it, or can the query be improved?

Best answer

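The best query depends on various details: the selectivity of the query predicate, cardinalities, and data distribution. If state = 'A' is a selective condition (few rows qualify), this query should be substantially faster: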

SELECT c.user_id, c.state
FROM   customer_properties c
LEFT   JOIN customer_properties c1 ON c1.user_id = c.user_id
                                  AND c1.created_at > c.created_at
WHERE  c.state = 'A'
AND    c1.user_id IS NULL;

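This holds provided there is an index on (state) (or even (state, user_id, created_at)) and another one on (user_id, created_at).
There are various ways to make sure that a later version of the row does not exist: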
Select rows which are not present in other table
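
One of those alternatives is NOT EXISTS. The following is a minimal sketch, not taken from the original answer, that checks a NOT EXISTS variant of the anti-join against the sample rows from the question. It uses SQLite through Python's sqlite3 module purely as a stand-in for Redshift, and it assumes the timestamps are stored as ISO-formatted text so that they compare in chronological order. The helper name latest_state_a is hypothetical.

```python
import sqlite3

# Sample rows from the question. Timestamps are rewritten in ISO format
# (an assumption of this sketch) so text comparison orders them correctly.
ROWS = [
    (1, "A", "2015-10-15 02:00:00"),
    (2, "A", "2015-10-15 02:00:01"),
    (3, "A", "2015-10-15 02:00:02"),
    (1, "B", "2015-10-15 02:00:03"),
    (4, "A", "2015-10-15 02:00:04"),
    (5, "B", "2015-10-15 02:00:05"),
]

# Same logic as the LEFT JOIN / IS NULL query, expressed with NOT EXISTS:
# keep a row with state 'A' only if no later row exists for the same user.
NOT_EXISTS_QUERY = """
SELECT c.user_id, c.state
FROM   customer_properties c
WHERE  c.state = 'A'
AND    NOT EXISTS (
   SELECT 1
   FROM   customer_properties c1
   WHERE  c1.user_id = c.user_id
   AND    c1.created_at > c.created_at
   )
ORDER  BY c.user_id
"""


def latest_state_a():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_properties ("
        "user_id INTEGER NOT NULL, state TEXT NOT NULL, created_at TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO customer_properties VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(NOT_EXISTS_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(latest_state_a())
```

On the sample data this returns users 2, 3 and 4, matching the desired result set. Whether NOT EXISTS or the LEFT JOIN form performs better on Redshift would have to be measured on the real table.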
If 'A' is a common value in state, this more generic query will be faster:

SELECT user_id, state
FROM (
   SELECT user_id, state
        , row_number() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
   FROM   customer_properties
   ) t
WHERE  t.rn = 1
AND    t.state = 'A';

I removed NULLS LAST, assuming that created_at is defined NOT NULL. Also, I don't think Redshift has it:
PostgreSQL sort by datetime asc, null first?
Both queries should work with the limited functionality of Redshift. With modern Postgres, there are better options:
Select first row in each GROUP BY group?
Optimize GROUP BY query to retrieve latest record per user
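Your original query would return all rows for a user_id if that user's latest row matches. You would then have to fold the duplicates, which is needless work ...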
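
The duplicate problem does not show up in the sample data, because users 2, 3 and 4 have only one row each. The sketch below is an illustration rather than part of the original answer. It adds one hypothetical older row for user 2, then compares a query shaped like the original (first_value) with the row_number() query. SQLite via Python's sqlite3 stands in for Redshift, and window functions need SQLite 3.25 or later. Because SQLite does not name the window column first_value automatically, the sketch aliases it as latest_state. The helper name run is hypothetical.

```python
import sqlite3

# Sample rows from the question plus ONE hypothetical extra row (user 2,
# older state 'B'), so that a user whose latest state is 'A' has two rows.
# Timestamps are ISO-formatted text (an assumption of this sketch).
ROWS = [
    (2, "B", "2015-10-15 01:59:00"),  # hypothetical, not in the question
    (1, "A", "2015-10-15 02:00:00"),
    (2, "A", "2015-10-15 02:00:01"),
    (3, "A", "2015-10-15 02:00:02"),
    (1, "B", "2015-10-15 02:00:03"),
    (4, "A", "2015-10-15 02:00:04"),
    (5, "B", "2015-10-15 02:00:05"),
]

# Shaped like the query in the question: every row of a user carries the
# user's latest state, so every row passes the filter.
FIRST_VALUE_QUERY = """
SELECT user_id, latest_state AS state
FROM (
   SELECT user_id, first_value(state) OVER (
             PARTITION BY user_id
             ORDER BY created_at DESC
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS latest_state
   FROM   customer_properties
   ) t
WHERE  latest_state = 'A'
ORDER  BY user_id
"""

# The row_number() query from the answer: only the latest row per user survives.
ROW_NUMBER_QUERY = """
SELECT user_id, state
FROM (
   SELECT user_id, state
        , row_number() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
   FROM   customer_properties
   ) t
WHERE  t.rn = 1
AND    t.state = 'A'
ORDER  BY user_id
"""


def run(query):
    """Load ROWS into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_properties ("
        "user_id INTEGER NOT NULL, state TEXT NOT NULL, created_at TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO customer_properties VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print("first_value:", run(FIRST_VALUE_QUERY))
    print("row_number: ", run(ROW_NUMBER_QUERY))
```

With the extra row, the first_value form lists user 2 twice, while the row_number() form returns each of users 2, 3 and 4 exactly once.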

Regarding sql - Retrieve records based on the latest state/attribute value, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33135937/