一、原题描述

Table: Failed

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| fail_date    | date    |
+--------------+---------+
Primary key for this table is fail_date.
Failed table contains the days of failed tasks.

Table: Succeeded

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| success_date | date    |
+--------------+---------+
Primary key for this table is success_date.
Succeeded table contains the days of succeeded tasks.

 

A system is running one task every day. Every task is independent of the previous tasks. The tasks can fail or succeed.

Write an SQL query to generate a report of period_state for each continuous interval of days in the period from 2019-01-01 to 2019-12-31.

period_state is 'failed' if tasks in this interval failed or 'succeeded' if tasks in this interval succeeded. Interval of days are retrieved as start_date and end_date.

Order result by start_date.

The query result format is in the following example:

Failed table:
+-------------------+
| fail_date         |
+-------------------+
| 2018-12-28        |
| 2018-12-29        |
| 2019-01-04        |
| 2019-01-05        |
+-------------------+

Succeeded table:
+-------------------+
| success_date      |
+-------------------+
| 2018-12-30        |
| 2018-12-31        |
| 2019-01-01        |
| 2019-01-02        |
| 2019-01-03        |
| 2019-01-06        |
+-------------------+


Result table:
+--------------+--------------+--------------+
| period_state | start date   | end date     |
+--------------+--------------+--------------+
| succeeded    | 2019-01-01   | 2019-01-03   |
| failed       | 2019-01-04   | 2019-01-05   |
| succeeded    | 2019-01-06   | 2019-01-06   |
+--------------+--------------+--------------+

The report ignored the system state in 2018 as we care about the system in the period 2019-01-01 to 2019-12-31.
From 2019-01-01 to 2019-01-03 all tasks succeeded and the system state was "succeeded".
From 2019-01-04 to 2019-01-05 all tasks failed and system state was "failed".
From 2019-01-06 to 2019-01-06 all tasks succeeded and system state was "succeeded".

二、简要翻译

两张表, Failed 和 Succeeded,用来记录一个每日定时跑的系统任务的失败和成功。要求返回一张结果表,按顺序展示该任务失败和成功的连续时间段以及起止时间。

三、SQL分析

1、将两张表里面的数据按照时间是否连续进行分组。
 1 #失败表
 2 SELECT
 3       fail_date,
 4       IF(
 5         @date = DATE_SUB(fail_date, INTERVAL 1 DAY),
 6         @rank := @rank,
 7         @rank := @rank + 1
 8       ) AS rank,
 9       @date := fail_date
10     FROM
11       Failed,
12       (SELECT
13         @date := NULL,
14         @rank := 0) r
15     WHERE fail_date > '2018-12-31'
16       AND fail_date < '2020-01-01'
17     ORDER BY fail_date;
18
19 #成功表
20 SELECT
21       success_date,
22       IF(
23         @date = DATE_SUB(success_date, INTERVAL 1 DAY),
24         @rank := @rank,
25         @rank := @rank + 1
26       ) AS rank,
27       @date := success_date
28     FROM
29       Succeeded,
30       (SELECT
31         @date := NULL,
32         @rank := 0) r
33     WHERE success_date > '2018-12-31'
34       AND success_date < '2020-01-01'
35     ORDER BY success_date

2、将上一步的两张表分组查询最大日期,最小日期,然后合并再排序

 1 SELECT
 2   *
 3 FROM
 4   (SELECT
 5     'succeeded' AS period_state,
 6     MIN(success_date) AS start_date,
 7     MAX(success_date) AS end_date
 8   FROM
 9     (SELECT
10       success_date,
11       IF(
12         @date = DATE_SUB(success_date, INTERVAL 1 DAY),
13         @rank := @rank,
14         @rank := @rank + 1
15       ) AS rank,
16       @date := success_date
17     FROM
18       Succeeded,
19       (SELECT
20         @date := NULL,
21         @rank := 0) r
22     WHERE success_date > '2018-12-31'
23       AND success_date < '2020-01-01'
24     ORDER BY success_date) t1
25   GROUP BY t1.rank
26   UNION
27   ALL
28   SELECT
29     'failed' AS period_state,
30     MIN(fail_date) AS start_date,
31     MAX(fail_date) AS end_date
32   FROM
33     (SELECT
34       fail_date,
35       IF(
36         @date = DATE_SUB(fail_date, INTERVAL 1 DAY),
37         @rank := @rank,
38         @rank := @rank + 1
39       ) AS rank,
40       @date := fail_date
41     FROM
42       Failed,
43       (SELECT
44         @date := NULL,
45         @rank := 0) r
46     WHERE fail_date > '2018-12-31'
47       AND fail_date < '2020-01-01'
48     ORDER BY fail_date) t2
49   GROUP BY t2.rank) temp
50 ORDER BY start_date ;
 
02-12 01:22