Online Statistics Gathering for Bulk Loads

Oracle12c后,在大批量灌数后,提供了自动收集统计信息的方式。

大批量灌数后立即查询是很多数据库的痛点,通常都会因为统计信息不准导致计划出问题,下面总结下Oracle的解决方法:

什么场景会触发自动收集统计信息?(已验证)

  1. create table as。
  2. insert into select必须带APPEND hint,普通的insert into不行。
  3. delete清空表,做bulk insert不会收集统计信息,因为表有segment,必须是空表才行。

分区表:遵循上述规则的前提下

  1. 对父表的insert只会收集父表统计信息。
  2. 对子表的insert只会收集子表统计信息。

收集哪些统计信息?

  1. 行数
  2. 页面数
  3. 平均行长度
SELECT table_name, num_rows, BLOCKS, AVG_SPACE, CHAIN_CNT, AVG_ROW_LEN FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS     BLOCKS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
-------------------- ---------- ---------- ---------- ---------- -----------
TAB1                       1000          7          0          0          23

什么时候不会触发自动收集统计信息?

  1. 非空segments(delete ×,truncate ok)
  2. 用户自定义的schema。
  3. 嵌套表、索引组织表、external。
  4. 带虚列的表。
  5. 表的统计信息已经锁定。
  6. PUBLISH为false。
  7. multitable INSERT statement,即insert all into a … into b … into c …。

文档:

  • Bulk Loads后在线收集统计信息
    • “Bulk Loads统计信息收集” 具体触发场景
      • INSERT INTO ... SELECT
      • CREATE TABLE AS SELECT
    • “Bulk Loads统计信息收集” 目标:更准确的统计信息,提升性能
      • 分析性业务往往需要灌入大量数据,老版本中是会建议用户灌完数后主动收集统计信息。
      • 但在实践中,由于疏忽或等维护窗口没有收集统计信息,是生成错误计划的主要原因。
    • “Bulk Loads统计信息收集” 分区表
      • 如果插入父表,则收集父表统计信息,不会收集子表。
      • 如果插入子表,则收集子表统计信息,不会手机父表。
    • “Bulk Loads统计信息收集” 后的直方图?
      • 数据库不会自动生成直方图,需要手动执行GATHER_TABLE_STATS。
      • 注意在bulk loads统计信息收集中,Oracle不会收集列统计信息。
    • “Bulk Loads统计信息收集” 限制,下面场景不会自动收集统计信息:
      • 目标对象已经有数据了。
      • oracle自己的schema。
      • 表类型限制:nested table, index-organized table (IOT), external table, or global temporary table。
      • 表的PUBLISH为false。
      • 表的统计信息已经锁定了。
      • multitable INSERT statement,即insert all into a … into b … into c …
    • “Bulk Loads统计信息收集” 的hint开关:
      • CREATE TABLE employees2 AS SELECT /*+NO_GATHER_OPTIMIZER_STATISTICS*/ * FROM employees
      • CREATE TABLE employees2 AS SELECT /*+GATHER_OPTIMIZER_STATISTICS*/ * FROM employees

一些实验和SQL

场景一:create table as

DROP TABLE tab1 PURGE;
CREATE TABLE tab1 AS SELECT level AS id, 'Description of '||level AS description FROM dual CONNECT BY level <= 1000;

COLUMN table_name FORMAT A20
SELECT table_name, num_rows FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS
-------------------- ----------
TAB1                       1000


SELECT NUM_ROWS, AVG_ROW_LEN, BLOCKS, EMPTY_BLOCKS, LAST_ANALYZED FROM DBA_TAB_STATISTICS WHERE TABLE_NAME='TAB1';
  NUM_ROWS AVG_ROW_LEN     BLOCKS EMPTY_BLOCKS LAST_ANAL
---------- ----------- ---------- ------------ ---------
      1000          23          7            0 25-JUN-23

删除统计信息

SELECT table_name, num_rows, BLOCKS, AVG_SPACE, CHAIN_CNT, AVG_ROW_LEN FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS     BLOCKS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
-------------------- ---------- ---------- ---------- ---------- -----------
TAB1                       1000          7          0          0          23


EXEC DBMS_STATS.delete_table_stats(USER, 'TAB1');
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'TAB1', options => 'GATHER AUTO');


SELECT table_name, num_rows, BLOCKS, AVG_SPACE, CHAIN_CNT, AVG_ROW_LEN FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS     BLOCKS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
-------------------- ---------- ---------- ---------- ---------- -----------
TAB1                       1000          7          0          0          23

场景二:insert into select(必须带APPEND才能收集)

带APPEND直插:统计信息会更新

TRUNCATE TABLE tab1;
INSERT /*+ APPEND */ INTO tab1 SELECT level AS id,'Description of ' || level AS description FROM dual CONNECT BY level <= 500;
COMMIT;

COLUMN table_name FORMAT A20
SELECT table_name, num_rows FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS
-------------------- ----------
TAB1                        500

SELECT NUM_ROWS, AVG_ROW_LEN, BLOCKS, EMPTY_BLOCKS, LAST_ANALYZED FROM DBA_TAB_STATISTICS WHERE TABLE_NAME='TAB1';
  NUM_ROWS AVG_ROW_LEN     BLOCKS EMPTY_BLOCKS LAST_ANAL
---------- ----------- ---------- ------------ ---------
       500          23          5            0 25-JUN-23

不带APPEND:统计信息未更新

TRUNCATE TABLE tab1;
INSERT INTO tab1 SELECT level AS id,'Description of ' || level AS description FROM dual CONNECT BY level <= 800;
COMMIT;

COLUMN table_name FORMAT A20
SELECT table_name, num_rows FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS
-------------------- ----------
TAB1                        500

SELECT NUM_ROWS, AVG_ROW_LEN, BLOCKS, EMPTY_BLOCKS, LAST_ANALYZED FROM DBA_TAB_STATISTICS WHERE TABLE_NAME='TAB1';
  NUM_ROWS AVG_ROW_LEN     BLOCKS EMPTY_BLOCKS LAST_ANAL
---------- ----------- ---------- ------------ ---------
       500          23          5            0 25-JUN-23

select count(*) from TAB1;
  COUNT(*)
----------
       800

场景三:delete清空表不会收集统计信息,表必须是空的(no segments)

使用DELET清空表,表中其实还是有数据的。表示空的(没有段)是触发收集的必要条件。

DELETE FROM tab1;
COMMIT;
INSERT /*+ APPEND */ INTO tab1 SELECT level AS id,'Description of ' || level AS description FROM dual CONNECT BY level <= 900;
COMMIT;

COLUMN table_name FORMAT A20
SELECT table_name, num_rows FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS
-------------------- ----------
TAB1                        500

SELECT NUM_ROWS, AVG_ROW_LEN, BLOCKS, EMPTY_BLOCKS, LAST_ANALYZED FROM DBA_TAB_STATISTICS WHERE TABLE_NAME='TAB1';
  NUM_ROWS AVG_ROW_LEN     BLOCKS EMPTY_BLOCKS LAST_ANAL
---------- ----------- ---------- ------------ ---------
       500          23          5            0 25-JUN-23

使用truncate清空,会收集统计信息。

TRUNCATE TABLE tab1;
INSERT /*+ APPEND */ INTO tab1 SELECT level AS id,'Description of ' || level AS description FROM dual CONNECT BY level <= 1200;
COMMIT;

COLUMN table_name FORMAT A20
SELECT table_name, num_rows FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS
-------------------- ----------
TAB1                       1200

SELECT NUM_ROWS, AVG_ROW_LEN, BLOCKS, EMPTY_BLOCKS, LAST_ANALYZED FROM DBA_TAB_STATISTICS WHERE TABLE_NAME='TAB1';
  NUM_ROWS AVG_ROW_LEN     BLOCKS EMPTY_BLOCKS LAST_ANAL
---------- ----------- ---------- ------------ ---------
      1200          23          8            0 25-JUN-23

场景四:使用hint显示关闭该功能

NO_GATHER_OPTIMIZER_STATISTICS可以关闭该功能。

TRUNCATE TABLE tab1;
INSERT /*+ APPEND */ INTO tab1 SELECT /*+ NO_GATHER_OPTIMIZER_STATISTICS */ level AS id,'Description of ' || level AS description FROM dual CONNECT BY level <= 200;
COMMIT;

COLUMN table_name FORMAT A20
SELECT table_name, num_rows FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS
-------------------- ----------
TAB1                       1200

SELECT NUM_ROWS, AVG_ROW_LEN, BLOCKS, EMPTY_BLOCKS, LAST_ANALYZED FROM DBA_TAB_STATISTICS WHERE TABLE_NAME='TAB1';
  NUM_ROWS AVG_ROW_LEN     BLOCKS EMPTY_BLOCKS LAST_ANAL
---------- ----------- ---------- ------------ ---------
      1200          23          8            0 25-JUN-23

分区表场景一:父表insert select append只收集父表统计信息

DROP TABLE tab1 PURGE;

CREATE TABLE tab1(id NUMBER, created_date DATE)
PARTITION BY RANGE (created_date)
(PARTITION tab1_2015 VALUES LESS THAN (TO_DATE('01/01/2016', 'DD/MM/YYYY')) TABLESPACE users,
 PARTITION tab1_2016 VALUES LESS THAN (TO_DATE('01/01/2017', 'DD/MM/YYYY')) TABLESPACE users);

INSERT /*+ APPEND */ INTO tab1
SELECT level, TO_DATE('01/01/2016', 'DD/MM/YYYY')
FROM dual CONNECT BY level <= 100;
COMMIT;

COLUMN table_name FORMAT A20
SELECT table_name, num_rows FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS
-------------------- ----------
TAB1                        100

COLUMN table_name FORMAT A20
COLUMN partition_name FORMAT A20
SELECT table_name, partition_name, num_rows FROM user_tab_partitions WHERE table_name = 'TAB1' ORDER BY partition_name;
TABLE_NAME           PARTITION_NAME         NUM_ROWS
-------------------- -------------------- ----------
TAB1                 TAB1_2015
TAB1                 TAB1_2016

删除统计信息

EXEC DBMS_STATS.delete_table_stats(USER, 'TAB1');

SELECT table_name, num_rows FROM user_tables WHERE table_name = 'TAB1';

TABLE_NAME             NUM_ROWS
-------------------- ----------
TAB1

分区表场景二:直接插入子分区表只收集子表统计信息

TRUNCATE TABLE tab1;
INSERT /*+ APPEND */ INTO tab1 PARTITION (TAB1_2016)
SELECT level, TO_DATE('01/01/2016', 'DD/MM/YYYY') FROM dual CONNECT BY level <= 100;
COMMIT;

COLUMN table_name FORMAT A20
SELECT table_name, num_rows FROM user_tables WHERE table_name = 'TAB1';
TABLE_NAME             NUM_ROWS
-------------------- ----------
TAB1


COLUMN table_name FORMAT A20
COLUMN partition_name FORMAT A20
SELECT table_name, partition_name, num_rows FROM user_tab_partitions WHERE table_name = 'TAB1' ORDER BY partition_name;
TABLE_NAME           PARTITION_NAME         NUM_ROWS
-------------------- -------------------- ----------
TAB1                 TAB1_2015
TAB1                 TAB1_2016                   100
06-26 17:21