Author: 陈江(恬泰)
一个逻辑运算符析取合取式(Item_cond)下如果有多个子查询,可以尝试对这多个子查询进行折叠,折叠成1个或者改写为true、false。 显而易见可以大大加速sql执行。
注:该功能在PolarDB 8.02.2.22 版本中, 默认在ro上开启,在rw上开启需设置如下参数
set subquery_coalesce_mode=on; // 开启折叠,默认只执行启发式折叠
set subquery_coalesce=on; // 开启基于代价折叠,折叠会增加新算子,大部分场景是个better plan,但并不总是
// 如果开启了cbqt,会忽略这个开关,基于代价进行枚举。
规则生效场景:
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE c2 = 0) --子查询1
AND EXISTS (SELECT 1 FROM t2) --子查询2
==》
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE c2 = 0)
*注:
子查询1是子查询2的子集,子查询2直接消除掉
SELECT * FROM t1 WHERE EXIST (SELECT 1 FROM t2 WHERE c2 = 0) --子查询1
or EXIST (SELECT 1 FROM t2) --子查询2
==》
SELECT * FROM t1 WHERE EXISTs (SELECT 1 FROM t2)
*注: or条件下改写为 false or EXISTs (SELECT 1 FROM t2), 保留大集合
select * from t1 where exists (select t1.a as f from t1 where a >10 and b < 10)
or exists (select a from t1 where a > 10 and c <3);
=>
select * from t1 where exists (select t1.a as f from t1 where a >10 and (b < 10 or c <3);
也可以是and_cond表达式
select * from t1 where not exists (select t1.a as f from t1 where a >10 and b < 10)
and not exists (select a from t1 where a > 10 and c <3);
=>
select * from t1 where not exists (select t1.a as f from t1 where a >10 and (b < 10 or c <3);
场景:
SELECT * FROM t1 WHERE EXISTs (SELECT 1 FROM t2 WHERE c1 = 0) --子查询1
AND NOT EXISTs (SELECT 1 FROM t2) --子查询2
==>
SELECT * FROM t1 WHERE false
场景:
>any vs <all、<=all
<any vs >all, >=all
SELECT * FROM t1 WHERE t1.c1 > any (SELECT c1 FROM t2 where c1 > 100) and
t1.c1 < all (select c1 from t2 where c1 > 10)
==>
SELECT * FROM t1 WHERE false; //any 是all 子集
SELECT * FROM t1 WHERE EXISTs (SELECT 1 FROM t2 ) --子查询1
or NOT EXISTs (SELECT 1 FROM t2 WHERE c1 = 0) --子查询2
==>
SELECT * FROM t1 WHERE true
SELECT * FROM t1 WHERE EXIST (SELECT 1 FROM t2) --子查询1
AND NOT EXIST (SELECT 1 FROM t2 WHERE c2 = 0) --子查询2
=>
SELECT * FROM t1 WHERE EXIST (SELECT 1 FROM t2 having sum (case when extra_cond 1 else 0) ==0)
注:
sql/耗时(64dop) | 折叠(增加sum算子) | 不折叠(semijoin执行) |
---|---|---|
Q21 | 3 min 50 sec | 2 min 49.45 sec |
场景:只支持in/not in子查询
SELECT * FROM t1 WHERE t1.c1 = any (SELECT c1 FROM t2 where c1 > 10) and
t1.c1 != all (select c1 from t2 where c1 > 100)
==>
SELECT * FROM t1 WHERE t1.c1 =
any (SELECT c1 FROM t2 where c1 > 10 and LNNVL(c1 >100));
注: 该plan也是基于代价进行改写的,通常都是改写更优。
SELECT * FROM t1 WHERE t1.c1 = any (SELECT c1 FROM t2 ) or
t1.c1 != all (select c1 from t2 where c1 > 100)
==>
SELECT * FROM t1 WHERE 1;
PolarDB子查询折叠基本上是能转尽转,可以折叠在where条件, join on条件, having条件内部的任何冗余子查询, 能大大加速SQL执行。