SQL 映射

聚合管道允许MongoDB 提供原生聚合功能,对应于 SQL 中许多常见的数据聚合操作。比如:GROUP BY、COUNT()、UNION ALL

测试数据

For MySQL

root@localhost 14:40:40 [test]> select * from orders; 
+-----+---------+---------------------+-------+--------+
| _id | cust_id | ord_date            | price | status |
+-----+---------+---------------------+-------+--------+
|   1 | A       | 2023-06-01 00:00:00 |    15 |      1 |
|   2 | A       | 2023-06-08 00:00:00 |    60 |      1 |
|   3 | B       | 2023-06-08 00:00:00 |    55 |      1 |
|   4 | B       | 2023-06-18 00:00:00 |    26 |      1 |
|   5 | B       | 2023-06-19 00:00:00 |    40 |      1 |
|   6 | C       | 2023-06-19 00:00:00 |    38 |      1 |
|   7 | C       | 2023-06-20 00:00:00 |    21 |      1 |
|   8 | D       | 2023-06-20 00:00:00 |    76 |      1 |
|   9 | D       | 2023-06-20 00:00:00 |    51 |      1 |
|  10 | D       | 2023-06-23 00:00:00 |    23 |      1 |
+-----+---------+---------------------+-------+--------+
10 rows in set (0.00 sec)


root@localhost 14:41:19 [test]> select * from orders_item; 
+-----+----------+---------+-----+-------+
| _id | order_id | sku     | qty | price |
+-----+----------+---------+-----+-------+
|   1 |        4 | apple   |  10 |   2.5 |
|   2 |        6 | carrots |  10 |     1 |
|   3 |        6 | apples  |  10 |   2.5 |
|   4 |        1 | apple   |   5 |   2.5 |
|   5 |        1 | apples  |   5 |   2.5 |
|   6 |        2 | apple   |   8 |   2.5 |
|   7 |        2 | banana  |   5 |    10 |
|   8 |        9 | carrots |   5 |     1 |
|   9 |        9 | apples  |  10 |   2.5 |
|  10 |        9 | apple   |  10 |   2.5 |
|  11 |        3 | apple   |  10 |   2.5 |
|  12 |        3 | pears   |  10 |   2.5 |
|  13 |        5 | banana  |   5 |    10 |
|  14 |        7 | apple   |  10 |   2.5 |
|  15 |        8 | banana  |   5 |    10 |
|  16 |        8 | apples  |  10 |   2.5 |
|  17 |       10 | apple   |  10 |   2.5 |
+-----+----------+---------+-----+-------+
17 rows in set (0.01 sec)

For Mongodb :

sit_rs1:PRIMARY> db.orders.find().sort({"_id": 1}); 
{ "_id" : 1, "cust_id" : "A", "ord_date" : ISODate("2023-06-01T00:00:00Z"), "price" : 15, "items" : [ { "sku" : "apple", "qty" : 5, "price" : 2.5 }, { "sku" : "apples", "qty" : 5, "price" : 2.5 } ], "status" : "1" }
{ "_id" : 2, "cust_id" : "A", "ord_date" : ISODate("2023-06-08T00:00:00Z"), "price" : 60, "items" : [ { "sku" : "apple", "qty" : 8, "price" : 2.5 }, { "sku" : "banana", "qty" : 5, "price" : 10 } ], "status" : "1" }
{ "_id" : 3, "cust_id" : "B", "ord_date" : ISODate("2023-06-08T00:00:00Z"), "price" : 55, "items" : [ { "sku" : "apple", "qty" : 10, "price" : 2.5 }, { "sku" : "pears", "qty" : 10, "price" : 2.5 } ], "status" : "1" }
{ "_id" : 4, "cust_id" : "B", "ord_date" : ISODate("2023-06-18T00:00:00Z"), "price" : 26, "items" : [ { "sku" : "apple", "qty" : 10, "price" : 2.5 } ], "status" : "1" }
{ "_id" : 5, "cust_id" : "B", "ord_date" : ISODate("2023-06-19T00:00:00Z"), "price" : 40, "items" : [ { "sku" : "banana", "qty" : 5, "price" : 10 } ], "status" : "1" }
{ "_id" : 6, "cust_id" : "C", "ord_date" : ISODate("2023-06-19T00:00:00Z"), "price" : 38, "items" : [ { "sku" : "carrots", "qty" : 10, "price" : 1 }, { "sku" : "apples", "qty" : 10, "price" : 2.5 } ], "status" : "1" }
{ "_id" : 7, "cust_id" : "C", "ord_date" : ISODate("2023-06-20T00:00:00Z"), "price" : 21, "items" : [ { "sku" : "apple", "qty" : 10, "price" : 2.5 } ], "status" : "1" }
{ "_id" : 8, "cust_id" : "D", "ord_date" : ISODate("2023-06-20T00:00:00Z"), "price" : 76, "items" : [ { "sku" : "banana", "qty" : 5, "price" : 10 }, { "sku" : "apples", "qty" : 10, "price" : 2.5 } ], "status" : "1" }
{ "_id" : 9, "cust_id" : "D", "ord_date" : ISODate("2023-06-20T00:00:00Z"), "price" : 51, "items" : [ { "sku" : "carrots", "qty" : 5, "price" : 1 }, { "sku" : "apples", "qty" : 10, "price" : 2.5 }, { "sku" : "apple", "qty" : 10, "price" : 2.5 } ], "status" : "1" }
{ "_id" : 10, "cust_id" : "D", "ord_date" : ISODate("2023-06-23T00:00:00Z"), "price" : 23, "items" : [ { "sku" : "apple", "qty" : 10, "price" : 2.5 } ], "status" : "1" }

示例一:客户订单统计

按客户分组,统计每个客户订单数量,并计算订单总价格,按价格从高到低排序。 可以使用聚合管道的方式,如下:

$group

  • 按指定的标识符表达式对输入文档进行分组,并将累加器表达式(如果指定)应用于每个组。消耗所有输入文档并为每个不同组输出一个文档。输出文档仅包含标识符字段和累积字段(如果指定)。

$sort

  • 按指定的排序键对文档流重新排序。仅顺序发生变化;文件保持不变。对于每个输入文档,输出一个文档。

SQL 示例:

root@localhost 14:41:26 [test]> SELECT cust_id, count(*), SUM(price) AS total FROM orders GROUP BY cust_id order by total desc; 
+---------+----------+-------+
| cust_id | count(*) | total |
+---------+----------+-------+
| D       |        3 |   150 |
| B       |        3 |   121 |
| A       |        2 |    75 |
| C       |        2 |    59 |
+---------+----------+-------+
4 rows in set (0.00 sec)

MongoDB 示例:

sit_rs1:PRIMARY> db.orders.aggregate( 
... [
...    { $group: { _id: "$cust_id", count: { $sum: 1 }, total: { $sum: "$price" } } }, 
...    { $sort: { total: -1 } }
... ] 
... )
{ "_id" : "D", "count" : 3, "total" : 150 }
{ "_id" : "B", "count" : 3, "total" : 121 }
{ "_id" : "A", "count" : 2, "total" : 75 }
{ "_id" : "C", "count" : 2, "total" : 59 }

示例二:日期订单统计

对于每个唯一的cust_id 按 cust_id、ord_date 分组 ,对price字段求和并仅在总和大于 30时返回。不包括日期的时间部分。

$group

  • 按指定的标识符表达式对输入文档进行分组,并将累加器表达式(如果指定)应用于每个组。消耗所有输入文档并为每个不同组输出一个文档。输出文档仅包含标识符字段和累积字段(如果指定)。

$match

  • 过滤文档流以仅允许匹配的文档未经修改地传递到下一个管道阶段。 $match使用标准 MongoDB 查询。对于每个输入文档,输出一个文档(匹配)或零个文档(不匹配)。

$sort

  • 按指定的排序键对文档流重新排序。仅顺序发生变化;文件保持不变。对于每个输入文档,输出一个文档。

SQL 示例:

root@localhost 14:42:51 [test]> SELECT cust_id, DATE(ord_date),  SUM(price) AS total FROM orders GROUP BY cust_id, DATE(ord_date) HAVING total > 30 order by total desc;
+---------+----------------+-------+
| cust_id | DATE(ord_date) | total |
+---------+----------------+-------+
| D       | 2023-06-20     |   127 |
| A       | 2023-06-08     |    60 |
| B       | 2023-06-08     |    55 |
| B       | 2023-06-19     |    40 |
| C       | 2023-06-19     |    38 |
+---------+----------------+-------+
5 rows in set (0.00 sec)

MongoDB 示例:

sit_rs1:PRIMARY> db.orders.aggregate( 
... [
...    { $group: { _id: { cust_id: "$cust_id",  ord_date: { $dateToString: {  format: "%Y-%m-%d",   date: "$ord_date"  } } }, total: { $sum: "$price" } } },
...    { $match: { total: { $gt: 30 } } },
...    { $sort: { total: -1 } }
... 
... ] 
... )
{ "_id" : { "cust_id" : "D", "ord_date" : "2023-06-20" }, "total" : 127 }
{ "_id" : { "cust_id" : "A", "ord_date" : "2023-06-08" }, "total" : 60 }
{ "_id" : { "cust_id" : "B", "ord_date" : "2023-06-08" }, "total" : 55 }
{ "_id" : { "cust_id" : "B", "ord_date" : "2023-06-19" }, "total" : 40 }
{ "_id" : { "cust_id" : "C", "ord_date" : "2023-06-19" }, "total" : 38 }

示例三:SKU商品统计

对于每个唯一的cust_id 按 用户分组 ,对 items 数组字段进行分解,统计每个用户的 SKU 总数量,如下:

$unwind

  • 从输入文档解构数组字段以输出每个元素的文档。每个输出文档都用一个元素值替换数组。对于每个输入文档,输出n 个文档,其中n是数组元素的数量,对于空数组可以为零。

$group

  • 按指定的标识符表达式对输入文档进行分组,并将累加器表达式(如果指定)应用于每个组。消耗所有输入文档并为每个不同组输出一个文档。输出文档仅包含标识符字段和累积字段(如果指定)。

$sort

  • 按指定的排序键对文档流重新排序。仅顺序发生变化;文件保持不变。对于每个输入文档,输出一个文档。

SQL 示例:

root@localhost 17:58:04 [test]> SELECT cust_id, SUM(i.qty) as qty  FROM orders o,  orders_item i WHERE i.order_id = o._id GROUP BY cust_id order by qty desc; 
+---------+------+
| cust_id | qty  |
+---------+------+
| D       |   50 |
| B       |   35 |
| C       |   30 |
| A       |   23 |
+---------+------+
4 rows in set (0.00 sec)

MongoDB 示例:

sit_rs1:PRIMARY> db.orders.aggregate( 
... [
...    { $unwind: "$items" },
...    { $group: { _id: "$cust_id", qty: { $sum: "$items.qty" } } },
...    { $sort: { qty: -1 }}
... ] 
... )
{ "_id" : "D", "qty" : 50 }
{ "_id" : "B", "qty" : 35 }
{ "_id" : "C", "qty" : 30 }
{ "_id" : "A", "qty" : 23 }
07-28 13:21