Calcite源码分析,参考:

http://matt33.com/2019/03/07/apache-calcite-process-flow/

https://matt33.com/2019/03/17/apache-calcite-planner/

Rule作为Calcite查询优化的核心,

具体看几个有代表性的Rule,看看是如何实现的

最简单的例子,Join结合律,JoinAssociateRule

首先所有的Rule都继承RelOptRule类

/**
* Planner rule that changes a join based on the associativity rule.
*
* <p>((a JOIN b) JOIN c) &rarr; (a JOIN (b JOIN c))</p>
*
* <p>We do not need a rule to convert (a JOIN (b JOIN c)) &rarr;
* ((a JOIN b) JOIN c) because we have
* {@link JoinCommuteRule}.
*
* @see JoinCommuteRule
*/
public class JoinAssociateRule extends RelOptRule {

RelOptRule用于transform expression

它维护的Operand tree表明,该rule可以适用于何种树结构,具体看下面的例子

/**
* A <code>RelOptRule</code> transforms an expression into another. It has a
* list of {@link RelOptRuleOperand}s, which determine whether the rule can be
* applied to a particular section of the tree.
*
* <p>The optimizer figures out which rules are applicable, then calls
* {@link #onMatch} on each of them.</p>
*/
public abstract class RelOptRule { /**
* Root of operand tree.
*/
private final RelOptRuleOperand operand; /** Factory for a builder for relational expressions.
*
* <p>The actual builder is available via {@link RelOptRuleCall#builder()}. */
public final RelBuilderFactory relBuilderFactory; /**
* Flattened list of operands.
*/
public final List<RelOptRuleOperand> operands; //~ Constructors ----------------------------------------------------------- /**
* Creates a rule with an explicit description.
*
* @param operand root operand, must not be null
* @param description Description, or null to guess description
* @param relBuilderFactory Builder for relational expressions
*/
public RelOptRule(RelOptRuleOperand operand,
RelBuilderFactory relBuilderFactory, String description) {
this.operand = Objects.requireNonNull(operand);
this.relBuilderFactory = Objects.requireNonNull(relBuilderFactory);this.description = description;
this.operands = flattenOperands(operand);
assignSolveOrder();
}

比如对于Join结合律,调用super,即RelOptRule的构造函数

  /**
* Creates a JoinAssociateRule.
*/
public JoinAssociateRule(RelBuilderFactory relBuilderFactory) {
super(
operand(Join.class,
operand(Join.class, any()),
operand(RelSubset.class, any())),
relBuilderFactory, null);
}

operand也是一个树形结构,Top的Operand的类型是Join,他有两个children,其中一个也是join,另一个是RelSubset

他们的children是any()

  /**
* Creates a list of child operands that signifies that the operand matches
* any number of child relational expressions.
*
* @return List of child operands that signifies that the operand matches
* any number of child relational expressions
*/
public static RelOptRuleOperandChildren any() {
return RelOptRuleOperandChildren.ANY_CHILDREN;
}

然后要理解Rule如果被使用的,来看看HepPlaner里面的代码

因为HepPlaner逻辑比较简单,就是遍历所有的HepRelVertex,看看RelOptRule是否可以匹配,如果匹配就应用Rule

所以会调用到applyRule,输入包含Rule和Vertex

  private HepRelVertex applyRule(
RelOptRule rule,
HepRelVertex vertex,
boolean forceConversions) { final List<RelNode> bindings = new ArrayList<>();
final Map<RelNode, List<RelNode>> nodeChildren = new HashMap<>();
boolean match =
matchOperands(
rule.getOperand(),
vertex.getCurrentRel(),
bindings,
nodeChildren); if (!match) {
return null;
} HepRuleCall call =
new HepRuleCall(
this,
rule.getOperand(),
bindings.toArray(new RelNode[0]),
nodeChildren,
parents); // Allow the rule to apply its own side-conditions.
if (!rule.matches(call)) {
return null;
} fireRule(call); if (!call.getResults().isEmpty()) {
return applyTransformationResults(
vertex,
call,
parentTrait);
}
return null;
}

1. 首先调用,matchOperands,看看是否match,

  private boolean matchOperands(
RelOptRuleOperand operand, //Rule中的Operand
RelNode rel, //vertex中的RelNode
List<RelNode> bindings,
Map<RelNode, List<RelNode>> nodeChildren) {
if (!operand.matches(rel)) { //先比较top node和operand是否match
return false;
}
bindings.add(rel); //如果match,把top relnode加入bindings
//接着来比较child是否match
//child有几种类型:
//Any,随意,直接返回true
//Unordered,无序,对于每个operand只要有任何一个child RelNode可满足即可
//Default,Some,Operand和RelNode严格有序匹配
List<HepRelVertex> childRels = (List) rel.getInputs();
switch (operand.childPolicy) {
case ANY:
return true;
case UNORDERED:
// For each operand, at least one child must match. If
// matchAnyChildren, usually there's just one operand.
for (RelOptRuleOperand childOperand : operand.getChildOperands()) {
boolean match = false;
for (HepRelVertex childRel : childRels) {
match =
matchOperands( //递归调用matchOperands
childOperand,
childRel.getCurrentRel(),
bindings,
nodeChildren);
if (match) {
break;
}
}
if (!match) {
return false;
}
}
final List<RelNode> children = new ArrayList<>(childRels.size());
for (HepRelVertex childRel : childRels) {
children.add(childRel.getCurrentRel());
}
nodeChildren.put(rel, children);
return true;
default:
int n = operand.getChildOperands().size();
if (childRels.size() < n) {
return false;
}
//一一按顺序对应match
for (Pair<HepRelVertex, RelOptRuleOperand> pair
: Pair.zip(childRels, operand.getChildOperands())) {
boolean match =
matchOperands(
pair.right,
pair.left.getCurrentRel(),
bindings,
nodeChildren);
if (!match) {
return false;
}
}
return true;
}
}

逻辑是先比较自身,然后再递归比较children

比较函数,

可以看出,从类型,Trait,Predicate上来比较,是否匹配

 /**
* Returns whether a relational expression matches this operand. It must be
* of the right class and trait.
*/
public boolean matches(RelNode rel) {
if (!clazz.isInstance(rel)) {
return false;
}
if ((trait != null) && !rel.getTraitSet().contains(trait)) {
return false;
}
return predicate.test(rel);
}

child的类型分为,

/**
* Policy by which operands will be matched by relational expressions with
* any number of children.
*/
public enum RelOptRuleOperandChildPolicy {
/**
* Signifies that operand can have any number of children.
*/
ANY, /**
* Signifies that operand has no children. Therefore it matches a
* leaf node, such as a table scan or VALUES operator.
*
* <p>{@code RelOptRuleOperand(Foo.class, NONE)} is equivalent to
* {@code RelOptRuleOperand(Foo.class)} but we prefer the former because
* it is more explicit.</p>
*/
LEAF, /**
* Signifies that the operand's children must precisely match its
* child operands, in order.
*/
SOME, /**
* Signifies that the rule matches any one of its parents' children.
* The parent may have one or more children.
*/
UNORDERED,
}

不同的类型按照注释中的规则进行match

2. 如果match,继续会把当前结果封装成HepRuleCall,继承自RelOptRuleCall

是RelOptRule的一次invocation,即会记录调用中用到的上下文数据和结果

/**
* A <code>RelOptRuleCall</code> is an invocation of a {@link RelOptRule} with a
* set of {@link RelNode relational expression}s as arguments.
*/
public abstract class RelOptRuleCall { /**
* Generator for {@link #id} values.
*/
private static int nextId = 0; //~ Instance fields -------------------------------------------------------- public final int id;
protected final RelOptRuleOperand operand0; //Rule的Operand树的root
protected Map<RelNode, List<RelNode>> nodeInputs; //所有的RelNode和他们的inputs
public final RelOptRule rule;
public final RelNode[] rels; //match到的所有RelNodes
private final RelOptPlanner planner;
private final List<RelNode> parents; //对于Top RelNodes而言的parents

3. 调用rule.matches(call)

默认实现是,return true,即不检查,这里允许Rule加一下特定的match检测

4. 调用fireRule(call)

而其中主要是调用,ruleCall.getRule().onMatch(ruleCall);

所以做的事情是需要在Rule的onMatch里面定义的

下面看下JoinAssociateRule的实现,

  public void onMatch(final RelOptRuleCall call) {
final Join topJoin = call.rel(0);
final Join bottomJoin = call.rel(1);
final RelNode relA = bottomJoin.getLeft();
final RelNode relB = bottomJoin.getRight();
final RelSubset relC = call.rel(2);
final RelOptCluster cluster = topJoin.getCluster();
final RexBuilder rexBuilder = cluster.getRexBuilder(); if (relC.getConvention() != relA.getConvention()) {
// relC could have any trait-set. But if we're matching say
// EnumerableConvention, we're only interested in enumerable subsets.
return;
} // topJoin
// / \
// bottomJoin C
// / \
// A B final int aCount = relA.getRowType().getFieldCount();
final int bCount = relB.getRowType().getFieldCount();
final int cCount = relC.getRowType().getFieldCount();
final ImmutableBitSet aBitSet = ImmutableBitSet.range(0, aCount);
final ImmutableBitSet bBitSet =
ImmutableBitSet.range(aCount, aCount + bCount);

这部分都是在初始化,上面好理解,下面算这些count是干啥?

其实是在算RexInputRef,

RexNode和RelNode的共同点是,他们都代表一个树

差别是他们代表的东西不同,RelNode代表关系代数算子组成的数,而RexNode表示表达式树

比如RexLiteral表示常量,RexVariable表示变量,RexCall表示操作来连接Literal和Variable

RexVariable往往表示输入的某个field,为了效率,这里只会记录field的id,即这个RexInputRef

/**
* Variable which references a field of an input relational expression.
*
* <p>Fields of the input are 0-based. If there is more than one input, they are
* numbered consecutively. For example, if the inputs to a join are</p>
*
* <ul>
* <li>Input #0: EMP(EMPNO, ENAME, DEPTNO) and</li>
* <li>Input #1: DEPT(DEPTNO AS DEPTNO2, DNAME)</li>
* </ul>
*
* <p>then the fields are:</p>
*
* <ul>
* <li>Field #0: EMPNO</li>
* <li>Field #1: ENAME</li>
* <li>Field #2: DEPTNO (from EMP)</li>
* <li>Field #3: DEPTNO2 (from DEPT)</li>
* <li>Field #4: DNAME</li>
* </ul>
*
* <p>So <code>RexInputRef(3, Integer)</code> is the correct reference for the
* field DEPTNO2.</p>
*/
public class RexInputRef extends RexSlot {

而RexInputRef的index不是固定的,是按顺序排的,

所以按上面把fieldcount都算出来,就可以排出对应的index

接着做的一堆都是为了调整conditions的位置和相对应的index

    // Goal is to transform to
//
// newTopJoin
// / \
// A newBottomJoin
// / \
// B C // Split the condition of topJoin and bottomJoin into a conjunctions. A
// condition can be pushed down if it does not use columns from A.
final List<RexNode> top = new ArrayList<>();
final List<RexNode> bottom = new ArrayList<>();
JoinPushThroughJoinRule.split(topJoin.getCondition(), aBitSet, top, bottom);
JoinPushThroughJoinRule.split(bottomJoin.getCondition(), aBitSet, top,
bottom); // Mapping for moving conditions from topJoin or bottomJoin to
// newBottomJoin.
// target: | B | C |
// source: | A | B | C |
final Mappings.TargetMapping bottomMapping =
Mappings.createShiftMapping(
aCount + bCount + cCount,
0, aCount, bCount,
bCount, aCount + bCount, cCount);
final List<RexNode> newBottomList = new ArrayList<>();
new RexPermuteInputsShuttle(bottomMapping, relB, relC)
.visitList(bottom, newBottomList);
RexNode newBottomCondition =
RexUtil.composeConjunction(rexBuilder, newBottomList);

1. 把原先TopJoin和BottomJoin里面的conditions,分成和A相关的和无关的

因为调整完以后,和A相关的conditions要放到TopJoin,而和A无关的需要放到BottomJoin

aBitSet里面记录了所有A的fields的index,

  /**
* Splits a condition into conjunctions that do or do not intersect with
* a given bit set.
*/
static void split(
RexNode condition,
ImmutableBitSet bitSet,
List<RexNode> intersecting,
List<RexNode> nonIntersecting) {
for (RexNode node : RelOptUtil.conjunctions(condition)) {//把conjunction的条件拆分开
ImmutableBitSet inputBitSet = RelOptUtil.InputFinder.bits(node); //找出condition所用到的fields
if (bitSet.intersects(inputBitSet)) {
intersecting.add(node);
} else {
nonIntersecting.add(node);
}
}
}

如何找出condition所用到的fields?

如下,在递归的过程做回把所有碰到的inputRef记录下来

    /**
* Returns a bit set describing the inputs used by an expression.
*/
public static ImmutableBitSet bits(RexNode node) {
return analyze(node).inputBitSet.build();
} /** Returns an input finder that has analyzed a given expression. */
public static InputFinder analyze(RexNode node) {
final InputFinder inputFinder = new InputFinder();
node.accept(inputFinder); //Visitor模式
return inputFinder;
} //如果RexNode是InputRef,就记录下该Ref的index
public Void visitInputRef(RexInputRef inputRef) {
inputBitSet.set(inputRef.getIndex());
return null;
} //如果RexNode是Call,继续递归accept该Call的相关Operands
public R visitCall(RexCall call) {
if (!deep) {
return null;
} R r = null;
for (RexNode operand : call.operands) {
r = operand.accept(this);
}
return r;
}

2. 由于树结构变了,会导致整个field的index发生改变

对于原先的,bottomJoin,A的field从0开始,B是从aCount开始
而对于变换后的,BottomJoin,B的field从0开始,C是从bCount开始

所以需要完成Ref的修正,

createShiftMapping,生成的mapping,包含原先的index,和当前新的index

第一个参数是index的范围,一共aCount + bCount + cCount,所以index需要小于这个值

然后,只有B,C的index需要调整,A的index不需要调整

其中,B的原先是从aCount开始,当前从0开始,一共bCount个fields;C的原先是从aCount + bCount开始,当前从bCount开始,一共cCount个fields

接着,RexPermuteInputsShuttle做具体的更新

/**
* Shuttle which applies a permutation to its input fields.
*
* @see RexPermutationShuttle
* @see RexUtil#apply(org.apache.calcite.util.mapping.Mappings.TargetMapping, RexNode)
*/
public class RexPermuteInputsShuttle extends RexShuttle {
//~ Instance fields -------------------------------------------------------- private final Mappings.TargetMapping mapping;
private final ImmutableList<RelDataTypeField> fields; //对于每个InputRef,根据Mapping更新index
@Override public RexNode visitInputRef(RexInputRef local) {
final int index = local.getIndex();
int target = mapping.getTarget(index);
return new RexInputRef(
target,
local.getType());
}

最后,构建新的join,

    RexNode newBottomCondition =
RexUtil.composeConjunction(rexBuilder, newBottomList); final Join newBottomJoin =
bottomJoin.copy(bottomJoin.getTraitSet(), newBottomCondition, relB,
relC, JoinRelType.INNER, false); // Condition for newTopJoin consists of pieces from bottomJoin and topJoin.
// Field ordinals do not need to be changed.
RexNode newTopCondition = RexUtil.composeConjunction(rexBuilder, top);
@SuppressWarnings("SuspiciousNameCombination")
final Join newTopJoin =
topJoin.copy(topJoin.getTraitSet(), newTopCondition, relA,
newBottomJoin, JoinRelType.INNER, false); call.transformTo(newTopJoin);
05-28 03:36