2020-11-10 ycdxsb research / codeql

CodeQL and CWE-401 (2)

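Before writing the query script for the second class of CWE-401, we need to cover some background. Most of the CodeQL documentation I could find only promotes the DataFlow module, but the second kind of query needs the reachability analysis module: import semmle.code.cpp.controlflow.StackVariableReachability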

The CodeQL reachability analysis module#

This module contains three abstract classes: StackVariableReachability, StackVariableReachabilityWithReassignment and StackVariableReachabilityExt

StackVariableReachability#

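StackVariableReachability has three abstract predicates that you must implement yourself. isSource and isSink are self-explanatory. isBarrier marks a node that must not appear on the path between the source and the sink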

/** Holds if `node` is a source for the reachability analysis using variable `v`. */
abstract predicate isSource(ControlFlowNode node, StackVariable v);

/** Holds if `sink` is a (potential) sink for the reachability analysis using variable `v`. */
abstract predicate isSink(ControlFlowNode node, StackVariable v);

/** Holds if `node` is a barrier for the reachability analysis using variable `v`. */
abstract predicate isBarrier(ControlFlowNode node, StackVariable v);

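There is also the reachability predicate reaches, documented as "Holds if the source node can reach the sink node without crossing a barrier". In other words, reachable means there is a control-flow path from the source to the sink that does not pass through any barrier.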

predicate reaches(ControlFlowNode source, SemanticStackVariable v, ControlFlowNode sink) {
  exists(BasicBlock bb, int i |
    isSource(source, v) and
    bb.getNode(i) = source and
    not bb.isUnreachable()
  |
    exists(int j |
      j > i and
      sink = bb.getNode(j) and
      isSink(sink, v) and
      not exists(int k | isBarrier(bb.getNode(k), v) | k in [i + 1 .. j - 1])
    )
    or
    not exists(int k | isBarrier(bb.getNode(k), v) | k > i) and
    bbSuccessorEntryReaches(bb, v, sink, _)
  )
}

reaches works within a single function, and it handles two cases.

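Case 1: the source, sink and barrier are in the same basic block. The predicate checks that the source and the sink are both in the block, that the sink comes after the source in control-flow order, and that no barrier lies between them.
Case 2: the source, sink and barrier span basic blocks. The predicate first checks that the source is in the current basic block and that no barrier follows it in that block, then continues looking for the sink and for barriers in the successor basic blocks by calling bbSuccessorEntryReaches. I won't go into the details here.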
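
To make the two cases concrete, here is a small C function annotated with one possible mapping of source, sink and barrier. The mapping is an assumption made for illustration (allocation as source, return statements as sinks, free as barrier), and copy_len is a hypothetical name, not something taken from the CodeQL sources.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: returns the length of a heap copy of `s`, or -1 on failure. */
int copy_len(const char *s) {
    assert(s != NULL);
    char *buf = malloc(strlen(s) + 1); /* source: `buf` is defined here             */
    if (buf == NULL)
        return -1;                     /* sink in a successor block, no barrier on  */
                                       /* the way: the cross-block case applies     */
    strcpy(buf, s);
    int len = (int)strlen(buf);
    free(buf);                         /* barrier under the assumed mapping         */
    return len;                        /* sink, but only reachable through the      */
                                       /* barrier, so `reaches` would not hold      */
}
```

Under this mapping, only `return -1` would be reported as reachable from the allocation; every path to `return len` crosses the `free`.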

StackVariableReachabilityExt#

StackVariableReachabilityExt is similar to StackVariableReachability, as its doc comment shows

/**
 * Same as `StackVariableReachability`, but `isBarrier` works on control-flow
 * edges rather than nodes and is therefore parameterized by the original
 * source node as well. Otherwise, this class is used like
 * `StackVariableReachability`.
 */

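The difference from StackVariableReachability is that isBarrier works on control-flow edges (node -> next) rather than on nodes, and that it also takes the original source node as a parameter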

/** `node` is a barrier for the reachability analysis using variable `v` and starting from `source`. */
abstract predicate isBarrier(
  ControlFlowNode source, ControlFlowNode node, ControlFlowNode next, StackVariable v
);
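
A sketch of why an edge-based barrier is useful, assuming we want to stop tracking an allocation once it is known to be NULL. The function sum_first is hypothetical. The null check has two outgoing edges, and only one of them should act as a barrier, which a node-based isBarrier cannot express.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example: sums 0..n-1 through a heap buffer, -1 on allocation failure. */
long sum_first(size_t n) {
    assert(n > 0);
    long *a = malloc(n * sizeof *a);   /* source                                      */
    if (a == NULL)                     /* node: the NULL check                        */
        return -1;                     /* next on the null edge: nothing to free, so  */
                                       /* this edge can be treated as a barrier       */
    long total = 0;                    /* next on the non-null edge: allocation live  */
    for (size_t i = 0; i < n; i++) {
        a[i] = (long)i;
        total += a[i];
    }
    free(a);
    return total;
}
```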

StackVariableReachabilityWithReassignment#

StackVariableReachabilityWithReassignment is similar to the previous two. The difference is that this class takes variable reassignment into account

/**
 * Reachability analysis for control-flow nodes involving stack variables.
 * Unlike `StackVariableReachability`, this analysis takes variable
 * reassignments into account.
 *
 * This class is used like `StackVariableReachability`, except that
 * subclasses should override `isSourceActual` and `isSinkActual` instead of
 * `isSource` and `isSink`, and that there is a `reachesTo` predicate in
 * addition to `reaches`.
 */

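It also provides reachesTo in addition to reaches. reachesTo builds on the parent class's reaches and additionally reports the last variable (v0) that the value was reassigned to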

/**
 * As `reaches`, but also specifies the last variable it was reassigned to (`v0`).
 */
predicate reachesTo(
  ControlFlowNode source, SemanticStackVariable v, ControlFlowNode sink, SemanticStackVariable v0
) {
  exists(ControlFlowNode def |
    actualSourceReaches(source, v, def, v0) and
    StackVariableReachability.super.reaches(def, v0, sink) and
    isSinkActual(sink, v0)
  )
}
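
The kind of code that reassignment tracking is meant for can be sketched in C as follows. The function dup_and_measure is a hypothetical example. In reachesTo terms, `v` would presumably be `p` and `v0` would be `q`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: the allocation moves from `p` to `q` before it is freed. */
size_t dup_and_measure(const char *s) {
    assert(s != NULL);
    char *p = malloc(strlen(s) + 1);   /* the allocation is first held in p        */
    if (p == NULL)
        return 0;
    strcpy(p, s);
    char *q = p;                       /* reassignment: q now holds the same block */
    size_t n = strlen(q);
    free(q);                           /* released through q, not through p        */
    return n;
}
```

An analysis that only follows `p` would miss the `free(q)` and could report a leak that does not exist.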

Getting more familiar with the reachability module#

Two of the queries shipped by the CodeQL developers help us understand how the three classes above are used. One is https://github.com/github/codeql/blob/main/cpp/ql/src/Critical/MemoryMayNotBeFreed.ql and the other is https://github.com/github/codeql/blob/main/cpp/ql/src/Critical/FileMayNotBeClosed.ql

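They look for CWE-401 and CWE-775 respectively. In my view both are cases of mismatched function calls: malloc is called without a matching free, or fopen is called without a matching fclose. Here we use MemoryMayNotBeFreed to get more familiar with the reachability module.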

Direct call or indirect call#

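A function is generally called in one of two ways: directly, or indirectly through a function pointer. To cover both, we first need a predicate that describes function calls: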

/**
 * call is either a direct call to f, or a possible call to f
 * via a function pointer.
 */
predicate mayCallFunction(Expr call, Function f) {
  call.(FunctionCall).getTarget() = f or
  call.(VariableCall).getVariable().getAnAssignedValue().getAChild*().(FunctionAccess).getTarget() =
    f
}

FunctionCall is a direct call, while VariableCall is an indirect call. It handles cases like the following

/**
 * A C/C++ call which is performed through a variable of function pointer type.
 * 
 * int call_via_ptr(int (*pfn)(int)) {
 *   return pfn(5);
 * }
 * 
 */

The predicate takes the variable being called and checks whether it is assigned a function's address somewhere.
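
As a minimal sketch, the two call forms look like this in C. The names twice and demo_calls are hypothetical. The local `fp` is initialised with the address of `twice`, which is the kind of assigned value that mayCallFunction looks for.

```c
#include <assert.h>
#include <stddef.h>

static int twice(int x) { return 2 * x; }

/* Hypothetical example: calls `twice` once directly and once through a pointer. */
int demo_calls(void) {
    int (*fp)(int) = twice;   /* fp is assigned the address of twice              */
    assert(fp != NULL);
    int direct = twice(5);    /* direct call: the target is known syntactically   */
    int indirect = fp(5);     /* indirect call: the callee is the variable fp     */
    return direct + indirect;
}
```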

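CodeQL does not have much documentation, but reading the example queries and their comments teaches a lot that the documentation does not cover, including ways of approaching a problem and APIs that already exist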

Assigned to a global or a class field#

predicate assignedToFieldOrGlobal(StackVariable v, Expr e) {
  // assigned to anything except a StackVariable
  // (typically a field or global, but for example also *ptr = v)
  e.(Assignment).getRValue() = v.getAnAccess() and
  not e.(Assignment).getLValue().(VariableAccess).getTarget() instanceof StackVariable
  or
  exists(Expr midExpr, Function mid, int arg |
    // indirect assignment
    e.(FunctionCall).getArgument(arg) = v.getAnAccess() and
    mayCallFunction(e, mid) and
    midExpr.getEnclosingFunction() = mid and
    assignedToFieldOrGlobal(mid.getParameter(arg), midExpr)
  )
  or
  // assigned to a field via constructor field initializer
  e.(ConstructorFieldInit).getExpr() = v.getAnAccess()
}

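In general, two kinds of allocation are not necessarily freed in the current function. The first is memory stored in a global variable, which can be left for the system to reclaim when the program exits. The second is memory stored in a class field, which the class releases in its destructor.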

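assignedToFieldOrGlobal identifies these cases to avoid false positives. It has three branches. The first is direct assignment: the right-hand side is the local variable and the left-hand side is not a local variable (typically a field or global, but also something like *ptr = v). The second is indirect assignment: the variable is passed as an argument to a function, and inside that function the corresponding parameter is assigned to a field or global. The third is assignment in a constructor field initializer.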
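
The first two branches can be illustrated with a C sketch. All names here (g_cache, holder, keep, escape_demo) are hypothetical. The constructor field initializer branch is C++ only and is not shown.

```c
#include <assert.h>
#include <stdlib.h>

static int *g_cache;                 /* global that outlives any single call */

struct holder { int *data; };

/* Indirect case: the parameter `p` is stored into a field inside the callee. */
static void keep(struct holder *h, int *p) { h->data = p; }

/* Hypothetical example: every allocation escapes, so none leaks in this function. */
int escape_demo(struct holder *h, int **out) {
    assert(h != NULL && out != NULL);
    int *a = malloc(sizeof *a);
    int *b = malloc(sizeof *b);
    int *c = malloc(sizeof *c);
    if (!a || !b || !c) { free(a); free(b); free(c); return -1; }
    *a = 1; *b = 2; *c = 3;
    g_cache = a;      /* direct: right side is a local, left side is a global  */
    *out = b;         /* direct: left side is not a plain local variable       */
    keep(h, c);       /* indirect: the callee stores its parameter in a field  */
    return 0;
}
```

After the call, ownership of the three blocks belongs to whoever owns `g_cache`, `*out` and `h->data`, which is why treating them as leaks in escape_demo would be a false positive.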

allocCallOrIndirect#

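allocCallOrIndirect finds the source points. Since we are looking for MemoryMayNotBeFreed cases, the source must be a call to a memory allocation function, either directly or through a function that returns an allocation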

predicate allocCallOrIndirect(Expr e) {
  // direct alloc call
  e.(AllocationExpr).requiresDealloc() and
  // We are only interested in alloc calls that are
  // actually freed somehow, as MemoryNeverFreed
  // will catch those that aren't.
  allocMayBeFreed(e)
  or
  exists(ReturnStmt rtn |
    // indirect alloc call
    mayCallFunction(e, rtn.getEnclosingFunction()) and
    (
      // return alloc
      allocCallOrIndirect(rtn.getExpr())
      or
      // return variable assigned with alloc
      exists(Variable v |
        v = rtn.getExpr().(VariableAccess).getTarget() and
        allocCallOrIndirect(v.getAnAssignedValue()) and
        not assignedToFieldOrGlobal(v, _)
      )
    )
  )
}
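
The two indirect branches ("return alloc" and "return variable assigned with alloc") correspond to wrapper functions like the ones sketched below. The names xalloc, new_int and wrapper_demo are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* "return alloc": the return expression is itself the allocation call. */
static void *xalloc(size_t n) {
    assert(n > 0);
    return malloc(n);
}

/* "return variable assigned with alloc": the returned local holds an allocation. */
static int *new_int(int value) {
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}

/* Hypothetical example: the caller owns both results and must free them. */
int wrapper_demo(void) {
    int *a = xalloc(sizeof *a);   /* indirect allocation via xalloc  */
    int *b = new_int(41);         /* indirect allocation via new_int */
    if (a == NULL || b == NULL) { free(a); free(b); return -1; }
    *a = 1;
    int sum = *a + *b;
    free(a);
    free(b);
    return sum;
}
```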

freeCallOrIndirect#

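To find unreleased memory we have to know where memory is released, so freeCallOrIndirect finds the release points. A realloc that has been verified to succeed also counts as releasing the old pointer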

/**
 * The point at which a call to 'realloc' on 'v' has been verified to
 * succeed.  A failed realloc does *not* free the input pointer, which
 * can cause memory leaks.
 */
predicate verifiedRealloc(FunctionCall reallocCall, Variable v, ControlFlowNode verified) {
  reallocCall.(AllocationExpr).getReallocPtr() = v.getAnAccess() and
  (
    exists(Variable newV, ControlFlowNode node |
      // a realloc followed by a null check at 'node' (return the non-null
      // successor, i.e. where the realloc is confirmed to have succeeded)
      newV.getAnAssignedValue() = reallocCall and
      node.(AnalysedExpr).getNonNullSuccessor(newV) = verified and
      // note: this case uses naive flow logic (getAnAssignedValue).
      // special case: if the result of the 'realloc' is assigned to the
      // same variable, we don't descriminate properly between the old
      // and the new allocation; better to not consider this a free at
      // all in that case.
      newV != v
    )
    or
    // a realloc(ptr, 0), which always succeeds and frees
    // (return the realloc itself)
    reallocCall.(AllocationExpr).getReallocPtr().getValue() = "0" and
    verified = reallocCall
  )
}

predicate freeCallOrIndirect(ControlFlowNode n, Variable v) {
  // direct free call
  n.(DeallocationExpr).getFreedExpr() = v.getAnAccess() and
  not exists(n.(AllocationExpr).getReallocPtr())
  or
  // verified realloc call
  verifiedRealloc(_, v, n)
  or
  exists(FunctionCall midcall, Function mid, int arg |
    // indirect free call
    n.(Call).getArgument(arg) = v.getAnAccess() and
    mayCallFunction(n, mid) and
    midcall.getEnclosingFunction() = mid and
    freeCallOrIndirect(midcall, mid.getParameter(arg))
  )
}
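
A C sketch of the release forms discussed here: a direct free, an indirect free through a wrapper, and a realloc whose result is stored in a different variable and checked for NULL. The names release and grow_demo are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Indirect free: the parameter is released inside the callee. */
static void release(int *p) { free(p); }

/* Hypothetical example: grows a buffer and keeps the old pointer until realloc succeeds. */
int grow_demo(size_t n) {
    assert(n > 0);
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;
    buf[0] = 7;
    int *bigger = realloc(buf, 2 * n * sizeof *buf);  /* result goes to another variable */
    if (bigger == NULL) {
        free(buf);          /* realloc failed: buf is still valid, direct free   */
        return -1;
    }
    /* non-null branch: realloc succeeded, so buf must not be freed again */
    int first = bigger[0];
    release(bigger);        /* indirect free through a wrapper                   */
    return first;
}
```

Writing `buf = realloc(buf, ...)` instead would overwrite the only pointer to the old block on failure, which is the situation the `newV != v` condition in verifiedRealloc deliberately refuses to treat as a free.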

AllocVariableReachability#

predicate allocationDefinition(StackVariable v, ControlFlowNode def) {
  exists(Expr expr | exprDefinition(v, def, expr) and allocCallOrIndirect(expr))
}

class AllocVariableReachability extends StackVariableReachabilityWithReassignment {
  AllocVariableReachability() { this = "AllocVariableReachability" }

  override predicate isSourceActual(ControlFlowNode node, StackVariable v) {
    allocationDefinition(v, node)
  }

  override predicate isSinkActual(ControlFlowNode node, StackVariable v) {
    // node may be used in allocationReaches
    exists(node.(AnalysedExpr).getNullSuccessor(v)) or
    freeCallOrIndirect(node, v) or
    assignedToFieldOrGlobal(v, node) or
    // node may be used directly in query
    v.getFunction() = node.(ReturnStmt).getEnclosingFunction()
  }

  override predicate isBarrier(ControlFlowNode node, StackVariable v) { definitionBarrier(v, node) }
}

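AllocVariableReachability extends StackVariableReachabilityWithReassignment. The source is a definition of the local variable v whose value comes from an allocation call. The sink can be a NULL check on v, a free of v, an assignment of v to a field or global, or a return statement in the function that declares v. The barrier uses the library predicate definitionBarrier, which handles the case where v is redefined.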

/**
 * Holds if `barrier` is either a (potential) definition of `v` or follows an
 * access that gets the address of `v`. In both cases, the value of
 * `v` after `barrier` cannot be assumed to be the same as before.
 */
predicate definitionBarrier(SemanticStackVariable v, ControlFlowNode barrier)

AllocReachability#

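AllocReachability extends StackVariableReachabilityExt. The source is the same as above. The sink must be a return statement in the same function as the variable. The barrier covers the cases where the memory is freed, assigned to a field or global, or confirmed to be NULL on the edge node -> next.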

/**
 * The value from allocation `def` is still held in Variable `v` upon entering `node`.
 */
predicate allocatedVariableReaches(StackVariable v, ControlFlowNode def, ControlFlowNode node) {
  exists(AllocVariableReachability r |
    // reachability
    r.reachesTo(def, _, node, v)
    or
    // accept def node itself
    r.isSource(def, v) and
    node = def
  )
}

class AllocReachability extends StackVariableReachabilityExt {
  AllocReachability() { this = "AllocReachability" }

  override predicate isSource(ControlFlowNode node, StackVariable v) {
    allocationDefinition(v, node)
  }

  override predicate isSink(ControlFlowNode node, StackVariable v) {
    v.getFunction() = node.(ReturnStmt).getEnclosingFunction()
  }

  override predicate isBarrier(
    ControlFlowNode source, ControlFlowNode node, ControlFlowNode next, StackVariable v
  ) {
    isSource(source, v) and
    next = node.getASuccessor() and
    // the memory (stored in any variable `v0`) allocated at `source` is freed or
    // assigned to a global at node, or NULL checked on the edge node -> next.
    exists(StackVariable v0 | allocatedVariableReaches(v0, source, node) |
      node.(AnalysedExpr).getNullSuccessor(v0) = next or
      freeCallOrIndirect(node, v0) or
      assignedToFieldOrGlobal(v0, node)
    )
  }
}

The final query#

from ControlFlowNode def, ReturnStmt ret
where
  allocationReaches(def, ret) and
  not exists(StackVariable v |
    allocatedVariableReaches(v, def, ret) and
    ret.getAChild*() = v.getAnAccess()
  )
select def, "The memory allocated here may not be released at $@.", ret, "this exit point"

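With the groundwork above, the final query can be described in natural language as:

The allocation at a control-flow node reaches a return statement without being freed, assigned to a field or global, or confirmed to be NULL,
and no variable that still holds the allocated pointer is accessed in that return statement, that is, the allocated memory is not returned to the caller.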
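
To tie this back to source code, here is a C sketch of one function that the query is presumably designed to report and one that it should leave alone. Both functions are hypothetical examples and I have not run the query against them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: leaks `copy` on the early-return path for an empty string. */
int count_upper_leaky(const char *s) {
    assert(s != NULL);
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return -1;              /* NULL confirmed on this edge: nothing to free */
    strcpy(copy, s);
    if (copy[0] == '\0')
        return 0;               /* exit point reached with copy still allocated */
    int n = 0;
    for (char *c = copy; *c; c++)
        if (*c >= 'A' && *c <= 'Z')
            n++;
    free(copy);
    return n;
}

/* Not a leak in this function: the allocation is handed to the caller. */
char *dup_string(const char *s) {
    assert(s != NULL);
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;                /* copy is accessed in the return expression    */
}
```

In count_upper_leaky, `return 0` is the exit point where the allocation is neither freed nor returned. In dup_string, the return statement accesses the variable that holds the allocation, which is the case the `not exists` clause in the query excludes.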

Appendix: MemoryMayNotBeFreed.ql#

/**
 * @name Memory may not be freed
 * @description A function may return before freeing memory that was allocated in the function. Freeing all memory allocated in the function before returning ties the lifetime of the memory blocks to that of the function call, making it easier to avoid and detect memory leaks.
 * @kind problem
 * @id cpp/memory-may-not-be-freed
 * @problem.severity warning
 * @tags efficiency
 *       security
 *       external/cwe/cwe-401
 */

import MemoryFreed
import semmle.code.cpp.controlflow.StackVariableReachability

/**
 * 'call' is either a direct call to f, or a possible call to f
 * via a function pointer.
 */
predicate mayCallFunction(Expr call, Function f) {
  call.(FunctionCall).getTarget() = f or
  call.(VariableCall).getVariable().getAnAssignedValue().getAChild*().(FunctionAccess).getTarget() =
    f
}

predicate allocCallOrIndirect(Expr e) {
  // direct alloc call
  e.(AllocationExpr).requiresDealloc() and
  // We are only interested in alloc calls that are
  // actually freed somehow, as MemoryNeverFreed
  // will catch those that aren't.
  allocMayBeFreed(e)
  or
  exists(ReturnStmt rtn |
    // indirect alloc call
    mayCallFunction(e, rtn.getEnclosingFunction()) and
    (
      // return alloc
      allocCallOrIndirect(rtn.getExpr())
      or
      // return variable assigned with alloc
      exists(Variable v |
        v = rtn.getExpr().(VariableAccess).getTarget() and
        allocCallOrIndirect(v.getAnAssignedValue()) and
        not assignedToFieldOrGlobal(v, _)
      )
    )
  )
}

/**
 * The point at which a call to 'realloc' on 'v' has been verified to
 * succeed.  A failed realloc does *not* free the input pointer, which
 * can cause memory leaks.
 */
predicate verifiedRealloc(FunctionCall reallocCall, Variable v, ControlFlowNode verified) {
  reallocCall.(AllocationExpr).getReallocPtr() = v.getAnAccess() and
  (
    exists(Variable newV, ControlFlowNode node |
      // a realloc followed by a null check at 'node' (return the non-null
      // successor, i.e. where the realloc is confirmed to have succeeded)
      newV.getAnAssignedValue() = reallocCall and
      node.(AnalysedExpr).getNonNullSuccessor(newV) = verified and
      // note: this case uses naive flow logic (getAnAssignedValue).
      // special case: if the result of the 'realloc' is assigned to the
      // same variable, we don't descriminate properly between the old
      // and the new allocation; better to not consider this a free at
      // all in that case.
      newV != v
    )
    or
    // a realloc(ptr, 0), which always succeeds and frees
    // (return the realloc itself)
    reallocCall.(AllocationExpr).getReallocPtr().getValue() = "0" and
    verified = reallocCall
  )
}

predicate freeCallOrIndirect(ControlFlowNode n, Variable v) {
  // direct free call
  n.(DeallocationExpr).getFreedExpr() = v.getAnAccess() and
  not exists(n.(AllocationExpr).getReallocPtr())
  or
  // verified realloc call
  verifiedRealloc(_, v, n)
  or
  exists(FunctionCall midcall, Function mid, int arg |
    // indirect free call
    n.(Call).getArgument(arg) = v.getAnAccess() and
    mayCallFunction(n, mid) and
    midcall.getEnclosingFunction() = mid and
    freeCallOrIndirect(midcall, mid.getParameter(arg))
  )
}

predicate allocationDefinition(StackVariable v, ControlFlowNode def) {
  exists(Expr expr | exprDefinition(v, def, expr) and allocCallOrIndirect(expr))
}

class AllocVariableReachability extends StackVariableReachabilityWithReassignment {
  AllocVariableReachability() { this = "AllocVariableReachability" }

  override predicate isSourceActual(ControlFlowNode node, StackVariable v) {
    allocationDefinition(v, node)
  }

  override predicate isSinkActual(ControlFlowNode node, StackVariable v) {
    // node may be used in allocationReaches
    exists(node.(AnalysedExpr).getNullSuccessor(v)) or
    freeCallOrIndirect(node, v) or
    assignedToFieldOrGlobal(v, node) or
    // node may be used directly in query
    v.getFunction() = node.(ReturnStmt).getEnclosingFunction()
  }

  override predicate isBarrier(ControlFlowNode node, StackVariable v) { definitionBarrier(v, node) }
}

/**
 * The value from allocation `def` is still held in Variable `v` upon entering `node`.
 */
predicate allocatedVariableReaches(StackVariable v, ControlFlowNode def, ControlFlowNode node) {
  exists(AllocVariableReachability r |
    // reachability
    r.reachesTo(def, _, node, v)
    or
    // accept def node itself
    r.isSource(def, v) and
    node = def
  )
}

class AllocReachability extends StackVariableReachabilityExt {
  AllocReachability() { this = "AllocReachability" }

  override predicate isSource(ControlFlowNode node, StackVariable v) {
    allocationDefinition(v, node)
  }

  override predicate isSink(ControlFlowNode node, StackVariable v) {
    v.getFunction() = node.(ReturnStmt).getEnclosingFunction()
  }

  override predicate isBarrier(
    ControlFlowNode source, ControlFlowNode node, ControlFlowNode next, StackVariable v
  ) {
    isSource(source, v) and
    next = node.getASuccessor() and
    // the memory (stored in any variable `v0`) allocated at `source` is freed or
    // assigned to a global at node, or NULL checked on the edge node -> next.
    exists(StackVariable v0 | allocatedVariableReaches(v0, source, node) |
      node.(AnalysedExpr).getNullSuccessor(v0) = next or
      freeCallOrIndirect(node, v0) or
      assignedToFieldOrGlobal(v0, node)
    )
  }
}

/**
 * The value returned by allocation `def` has not been freed, confirmed to be null,
 * or potentially leaked globally upon reaching `node`  (regardless of what variable
 * it's still held in, if any).
 */
predicate allocationReaches(ControlFlowNode def, ControlFlowNode node) {
  exists(AllocReachability r | r.reaches(def, _, node))
}

predicate assignedToFieldOrGlobal(StackVariable v, Expr e) {
  // assigned to anything except a StackVariable
  // (typically a field or global, but for example also *ptr = v)
  e.(Assignment).getRValue() = v.getAnAccess() and
  not e.(Assignment).getLValue().(VariableAccess).getTarget() instanceof StackVariable
  or
  exists(Expr midExpr, Function mid, int arg |
    // indirect assignment
    e.(FunctionCall).getArgument(arg) = v.getAnAccess() and
    mayCallFunction(e, mid) and
    midExpr.getEnclosingFunction() = mid and
    assignedToFieldOrGlobal(mid.getParameter(arg), midExpr)
  )
  or
  // assigned to a field via constructor field initializer
  e.(ConstructorFieldInit).getExpr() = v.getAnAccess()
}

from ControlFlowNode def, ReturnStmt ret
where
  allocationReaches(def, ret) and
  not exists(StackVariable v |
    allocatedVariableReaches(v, def, ret) and
    ret.getAChild*() = v.getAnAccess()
  )
select def, "The memory allocated here may not be released at $@.", ret, "this exit point"
