I recently increased the level of normalization in my database, going from something like this:

+--------------------------------------+
| state_changes                        |
+----+-------+-----------+------+------+
| ID | Name  | Timestamp | Val1 | Val2 |
+----+-------+-----------+------+------+
| 0  | John  | 17:19:01  |  A   |  X   |
| 1  | Bob   | 17:19:02  |  E   |  W   |
| 2  | John  | 17:19:05  |  E   |  Y   |
| 3  | John  | 17:19:06  |  B   |  Y   |
| 4  | John  | 17:19:12  |  C   |  Z   |
| 5  | John  | 17:19:15  |  A   |  Z   |
+----+-------+-----------+------+------+

to something more like this:
+-------------------------------+   +-------------------------------+
| state_changes_1               |   | state_changes_2               |
+----+-------+-----------+------+   +----+-------+-----------+------+
| ID | Name  | Timestamp | Val1 |   | ID | Name  | Timestamp | Val2 |
+----+-------+-----------+------+   +----+-------+-----------+------+
| 0  | John  | 17:19:01  |  A   |   | 0  | John  | 17:19:01  |  X   |
| 1  | Bob   | 17:19:02  |  E   |   | 1  | Bob   | 17:19:02  |  W   |
| 2  | John  | 17:19:05  |  E   |   | 2  | John  | 17:19:05  |  Y   |
| 3  | John  | 17:19:06  |  B   |   | 3  | John  | 17:19:06  |  Y   |
| 4  | John  | 17:19:12  |  C   |   | 4  | John  | 17:19:12  |  Z   |
| 5  | John  | 17:19:15  |  A   |   | 5  | John  | 17:19:15  |  Z   |
+----+-------+-----------+------+   +----+-------+-----------+------+

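How can I now write a query to "compact" the two resulting tables wherever values are repeated?
I want to ignore the ID field when considering row uniqueness;
I want to ignore Timestamp when considering row uniqueness,
but rows must be consecutive (in Name, Timestamp order) to be considered duplicates.
In this example, the result should be: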
+-------------------------------+   +-------------------------------+
| state_changes_1               |   | state_changes_2               |
+----+-------+-----------+------+   +----+-------+-----------+------+
| ID | Name  | Timestamp | Val1 |   | ID | Name  | Timestamp | Val2 |
+----+-------+-----------+------+   +----+-------+-----------+------+
| 0  | John  | 17:19:01  |  A   |   | 0  | John  | 17:19:01  |  X   |
| 1  | Bob   | 17:19:02  |  E   |   | 1  | Bob   | 17:19:02  |  W   |
| 3  | John  | 17:19:06  |  B   |   | 2  | John  | 17:19:05  |  Y   |
| 4  | John  | 17:19:12  |  C   |   | 4  | John  | 17:19:12  |  Z   |
| 5  | John  | 17:19:15  |  A   |   +----+-------+-----------+------+
+----+-------+-----------+------+
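
To pin down the rule stated above, here is a minimal Python sketch that applies it to the state_changes_2 sample rows. It is only an illustration of the requirement, not a way to process billions of rows; the names `ROWS` and `compact` are made up for this example.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows of state_changes_2 from the question: (ID, Name, Timestamp, Val2).
ROWS = [
    (0, "John", "17:19:01", "X"),
    (1, "Bob", "17:19:02", "W"),
    (2, "John", "17:19:05", "Y"),
    (3, "John", "17:19:06", "Y"),
    (4, "John", "17:19:12", "Z"),
    (5, "John", "17:19:15", "Z"),
]


def compact(rows):
    """Keep the first row of every run of equal (Name, value) pairs.

    Rows are sorted by (Name, Timestamp); ID and Timestamp are ignored
    when deciding whether two neighbouring rows are duplicates.
    """
    ordered = sorted(rows, key=itemgetter(1, 2))
    # groupby only merges adjacent items, which matches the
    # "must be consecutive" requirement.
    runs = groupby(ordered, key=itemgetter(1, 3))
    return [next(run) for _, run in runs]


if __name__ == "__main__":
    for row in sorted(compact(ROWS)):
        print(row)
```

On the sample data this keeps IDs 0, 1, 2 and 4, matching the state_changes_2 result shown above. A value that reappears after a different value (for example A, B, A for the same Name) is kept, because the two A rows are not adjacent.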

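My tables have billions of rows, so I am looking for something written with efficiency in mind; that said, I am realistic, so I am happy for the query to take an hour or two to run (including index rebuilds).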

Best answer

I tried this on MySQL 5.1.58, and it seems to work with your test data.

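-- User variables remember the Name and Val1 of the previously visited row.
SET @name = NULL;
SET @val1 = NULL;

-- Walk the rows in (Name, Timestamp) order; a row that repeats the previous
-- row's Name and Val1 gets Val1 = NULL, which marks it for deletion.
UPDATE state_changes_1
SET Val1 = IF(Name=@name AND Val1=@val1, NULL, (@val1:=Val1)),
    Name = (@name:=Name)
ORDER BY Name, `Timestamp`;

-- Remove the marked rows (assumes Val1 is nullable and never NULL in real data).
DELETE FROM state_changes_1 WHERE Val1 IS NULL;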
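
As a point of comparison, the same "previous row" test can be expressed with the LAG window function instead of user variables. This is a different technique from the answer above and is not available in MySQL 5.1.58 (window functions arrived in MySQL 8.0). The sketch below runs it against an in-memory SQLite database purely so that it is self-contained; SQLite 3.25 or newer is assumed, and porting the statement to a real MySQL server would need to be checked separately.

```python
import sqlite3

# In-memory SQLite stands in for the real server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_changes_2 "
    "(ID INTEGER PRIMARY KEY, Name TEXT, Timestamp TEXT, Val2 TEXT)"
)
conn.executemany(
    "INSERT INTO state_changes_2 VALUES (?, ?, ?, ?)",
    [
        (0, "John", "17:19:01", "X"),
        (1, "Bob", "17:19:02", "W"),
        (2, "John", "17:19:05", "Y"),
        (3, "John", "17:19:06", "Y"),
        (4, "John", "17:19:12", "Z"),
        (5, "John", "17:19:15", "Z"),
    ],
)

# For every row, look up Val2 of the previous row with the same Name
# (ordered by Timestamp) and delete the row when the value is unchanged.
DELETE_REPEATS = """
DELETE FROM state_changes_2
WHERE ID IN (
    SELECT ID FROM (
        SELECT ID, Val2,
               LAG(Val2) OVER (PARTITION BY Name ORDER BY Timestamp) AS prev_val
        FROM state_changes_2
    ) AS t
    WHERE Val2 = prev_val
)
"""
conn.execute(DELETE_REPEATS)

remaining = [row[0] for row in conn.execute(
    "SELECT ID FROM state_changes_2 ORDER BY ID"
)]
print(remaining)
```

The first row of each Name has no predecessor, so its `prev_val` is NULL, the comparison is not true, and the row is kept.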

On the topic of mysql - compacting tables after normalizing, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7474565/
