[Theory] Static Program Analysis Reading Notes (2): Dataflow Analysis

5. Dataflow Analysis with Monotone Frameworks

Dataflow Analysis

  • Sign Analysis
  • Constant Propagation Analysis
  • Live Variables Analysis
  • Available Expressions Analysis
  • Very Busy Expressions Analysis
  • Reaching Definitions Analysis
  • Initialized Variables Analysis

Classical dataflow analysis starts with a CFG and a lattice with finite height. The lattice describes the abstract information we wish to infer for the different CFG nodes.

The combination of a lattice and a space of monotone functions is called a monotone framework.

An analysis is sound if all solutions to the constraints correspond to correct information about the program. The solutions may be more or less imprecise, but computing the least solution will give the highest degree of precision possible.

  • naive fixed-point algorithm
  • round-robin algorithm
  • chaotic-iteration algorithm
  • work-list algorithm
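Of these, the work-list algorithm is the one used in practice: it re-evaluates only the constraints of nodes whose inputs may have changed. A minimal generic sketch in Python — the node/constraint encoding (`deps`, `transfer`) is invented here for illustration, not taken from the book:

```python
from collections import deque

def worklist_solve(nodes, deps, transfer, bottom):
    """Generic work-list fixed-point solver (illustrative sketch).

    nodes:    iterable of CFG node ids
    deps:     maps node n to the nodes whose constraints read n's value
    transfer: maps (node, current solution) to a new lattice element
    bottom:   least element of the lattice
    """
    sol = {n: bottom for n in nodes}
    work = deque(nodes)
    while work:
        n = work.popleft()
        new = transfer(n, sol)
        if new != sol[n]:                 # n's value changed:
            sol[n] = new
            work.extend(deps.get(n, ()))  # re-process its dependents
    return sol
```

Termination follows from monotonicity of the transfer functions and the finite height of the lattice: each node's value can only move up, finitely many times.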

A variable is live at a program point if there exists an execution in which its value is read later without being written to in between. This property is clearly undecidable, but it can be approximated by a static analysis called live variables analysis (or liveness analysis).

The typical use of live variables analysis is optimization: there is no need to store the value of a variable that is not live. For this reason, we want the analysis to be conservative in the direction where the answer “not live” can be trusted and “live” is the safe but useless answer.
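As a backward may analysis, liveness can be computed by iterating gen/kill equations to a fixed point: the variables live before a node are those it reads, plus those live after it that it does not overwrite. A toy round-robin sketch — the CFG encoding and the three-node example program in the test are invented for illustration:

```python
def liveness(cfg, gen, kill):
    """Live variables before each node of a tiny CFG (illustrative sketch).

    cfg:  maps node -> list of successor nodes
    gen:  maps node -> variables the node reads
    kill: maps node -> variables the node writes
    """
    live = {n: set() for n in cfg}
    changed = True
    while changed:                        # round-robin until a fixed point
        changed = False
        for n in cfg:
            out = set().union(*(live[s] for s in cfg[n]))  # join over successors
            new = gen[n] | (out - kill[n])
            if new != live[n]:
                live[n], changed = new, True
    return live
```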

  • parameterized lattice

The set of available expressions for all program points can be approximated using a dataflow analysis.

An expression is very busy if it will definitely be evaluated again before its value changes.

The reaching definitions for a given program point are those assignments that may have defined the current values of variables.

A dataflow analysis is specified by providing the lattice and the constraint rules.

A forward analysis is one that for each program point computes information about the past behavior. A backward analysis is one that for each program point computes information about the future behavior.

A may analysis is one that describes information that may possibly be true and, thus, computes an over-approximation. Conversely, a must analysis is one that describes information that must definitely be true and, thus, computes an under-approximation.

        Forward                   Backward
May     Reaching Definitions      Live Variables
Must    Available Expressions     Very Busy Expressions
  • transfer function

6. Widening

A central limitation of the monotone frameworks approach is the requirement that the lattices have finite height. We describe a technique called widening that overcomes that limitation (and a related technique called narrowing).

An interval analysis computes for every integer variable a lower and an upper bound for its possible values.
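Representing an interval as a pair (l, h) and the bottom element as None, the join and the abstract addition such an analysis uses can be sketched as follows (the Python encoding is illustrative):

```python
def join(x, y):
    """Least upper bound in the interval lattice; None is the bottom element."""
    if x is None:
        return y
    if y is None:
        return x
    return (min(x[0], y[0]), max(x[1], y[1]))

def add(x, y):
    """Abstract addition of intervals: [a,b] + [c,d] = [a+c, b+d]."""
    if x is None or y is None:       # anything plus bottom is bottom
        return None
    return (x[0] + y[0], x[1] + y[1])
```

For example, `join((0, 1), (3, 5))` yields `(0, 5)`: the smallest interval covering both operands.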

Intervals are interesting analysis results, since sound answers can be used for optimizations and bug detection related to array bounds checking, numerical overflows, and integer representations.

The interval analysis lattice has infinite height, so applying the naive fixed-point algorithms may never terminate.

To obtain convergence of the interval analysis we shall use a technique called widening.

Let f : L → L denote the function from the fixed-point theorem and the naive fixed-point algorithm. A particularly simple form of widening, which often suffices in practice, introduces a function ω: L → L so that the sequence

(\omega\circ f)^i(\perp)~~~~for~i=0,1,\cdots 

is guaranteed to converge on a fixed-point that is larger than or equal to each approximant f^i(\perp) of the naive fixed-point algorithm and thus represents sound information about the program. To ensure this property, it suffices that ω is monotone and extensive, and that the image ω(L) = {y ∈ L | ∃x ∈ L: y = ω(x)} has finite height. The widening function ω will intuitively coarsen the information sufficiently to ensure termination.
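For the interval lattice, one such ω rounds each bound outward to a fixed finite set B of bound points; the particular choice of B below is illustrative. Since ω's image is then finite, and ω is monotone and extensive, the widened iteration must terminate:

```python
import math

B = [-math.inf, -1, 0, 1, math.inf]    # illustrative finite set of bound points

def widen_interval(iv):
    """Unary widening ω for intervals: round bounds outward to points in B."""
    if iv is None:                     # bottom stays bottom
        return None
    l, h = iv
    wl = max(b for b in B if b <= l)   # round the lower bound down
    wh = min(b for b in B if b >= h)   # round the upper bound up
    return (wl, wh)
```

For example, the interval [2, 7] is coarsened to [1, ∞): sound but less precise, which is exactly the price paid for termination.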

Widening generally shoots above the target, but a subsequent technique called narrowing may improve the result. If we define

fix=\bigsqcup_i f^i(\perp)
fix~\omega=\bigsqcup_i(\omega\circ f)^i(\perp)

then we have fix\sqsubseteq fix~\omega. However, we also have that fix\sqsubseteq f(fix~\omega)\sqsubseteq fix~\omega, which means that a subsequent application of f may improve our result and still produce sound information. This technique, called narrowing, may in fact be iterated arbitrarily many times.
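A toy illustration of narrowing: the function f below is an invented abstract one-step effect of a loop like `i := 0; while (i < 10) i := i + 1`, on a lattice of upper bounds for i. Widening overshoots to the top element, and one more application of f recovers the precise bound:

```python
import math

def f(x):
    """Abstract one-step effect of a loop counting up to 10 (toy example)."""
    return min(x + 1, 10)

widened = math.inf        # simple widening has overshot to the top element
narrowed = f(widened)     # one narrowing step: still sound, and tighter (10)
```

Here fix = 10, fix ω = ∞, and f(fix ω) = 10 lands back on the precise result; in general narrowing only guarantees fix ⊑ f(fix ω) ⊑ fix ω.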

The simple kind of widening discussed above is sometimes unnecessarily aggressive: widening every interval in every abstract state in each iteration of the fixed-point algorithm is not necessary to ensure convergence. For this reason, traditional widening takes a more sophisticated approach that may lead to better analysis precision.

It involves a binary operator,

\triangledown: L \times L \rightarrow L

The widening operator \triangledown must satisfy

x \sqsubseteq x \triangledown y \wedge y \sqsubseteq x \triangledown y ~~~~\forall x,y\in L

and that for any increasing sequence z_0 \sqsubseteq z_1 \sqsubseteq z_2 \sqsubseteq \cdots, the sequence y_0,y_1,y_2,\cdots defined by y_0 = z_0 and y_{i+1} = y_i\triangledown z_{i+1} for i=0,1,\cdots converges after a finite number of steps.

With such an operator, we can obtain a safe approximation of the least fixed-point of f by computing the following sequence

\left\{\begin{matrix}
x_0 = \perp\\ 
x_{i+1} = x_i \triangledown f(x_i)
\end{matrix}\right.

This sequence eventually converges, that is, for some k we have x_{k+1} = x_k. Furthermore, the result is a safe approximation of the ordinary fixed-point: fix \sqsubseteq x_k.
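The standard interval instance of ∇ keeps bounds that are stable and sends bounds that are still growing straight to ±∞, so any increasing chain stabilizes after finitely many steps. A sketch, using the same illustrative pair/None encoding of intervals:

```python
import math

def widen(x, y):
    """Binary interval widening x ∇ y (standard textbook definition, sketched).

    Stable bounds are kept; bounds that grow from x to y jump to ±infinity.
    None is the bottom element.
    """
    if x is None:
        return y                       # bottom ∇ y = y
    if y is None:
        return x
    (xl, xh), (yl, yh) = x, y
    l = xl if xl <= yl else -math.inf  # lower bound still dropping? -> -inf
    h = xh if xh >= yh else math.inf   # upper bound still growing? -> +inf
    return (l, h)
```

In the iteration x_{i+1} = x_i ∇ f(x_i), a loop counter whose interval keeps growing, say from [0, 1] to [0, 2], is immediately widened to [0, ∞), after which the sequence is stable.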

7. Path Sensitivity and Relational Analysis

Until now, we have ignored the values of branch and loop conditions by simply treating if- and while-statements as a nondeterministic choice between the two branches, which is called control insensitive analysis. Such analyses are also path insensitive, because they do not distinguish different paths that lead to a given program point.

  • control sensitive (branch sensitive)
  • independent attribute analysis
  • relational analysis
  • path contexts
  • path sensitive
  • iterative refinement

The present analysis is also called an independent attribute analysis as the abstract value of the file is independent of the abstract value of the boolean flag. What we need is a relational analysis that can keep track of relations between variables. One approach to achieve this is by generalizing the analysis to maintain multiple abstract states per program point.
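A sketch of the difference, with invented flag/file values: the independent view keeps one abstract value per variable and so loses the correlation between them, while the relational view keeps one abstract state per path context:

```python
# Independent attribute analysis: one abstract value per variable, so the
# impossible combination (flag=false, file=open) cannot be ruled out.
independent = {"flag": {"true", "false"}, "file": {"open", "closed"}}

# Relational analysis: one abstract state per path context, preserving the
# correlation between the flag and the file state.
relational = {
    "flag=true":  {"file": {"open"}},
    "flag=false": {"file": {"closed"}},
}
```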

Often, Paths consists of combinations of predicates that appear in conditionals in the program. This quickly results in an exponential blow-up: for k predicates, each statement may need to be analyzed in 2^k different path contexts.

In practice, however, there is usually much redundancy in these analysis steps. Thus, in addition to the challenge of reasoning about the assert predicates relative to the lattice elements, considerable effort is required to avoid too many redundant computations in path sensitive analysis. One approach is iterative refinement: Paths initially contains a single universal path context, which is then iteratively refined by adding relevant predicates until either the desired properties can be established or disproved, or the analysis is unable to select relevant predicates and hence gives up.

8. Interprocedural Analysis

So far, we have only analyzed the body of individual functions, which is called intraprocedural analysis. We now consider interprocedural analysis of whole programs containing multiple functions and function calls.

We use the subset of the TIP language containing functions, but still ignore pointers and functions as values. As we shall see, the CFG for an entire program is then quite simple to obtain. It becomes more complicated when adding function values.

In the CFG, we represent each function call statement, say a call to a function f, using two nodes: a call node representing the connection from the caller to the entry of f, and an after-call node where execution resumes after returning from the exit of f. Next, we represent each return statement as an assignment to a special variable named result.
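That construction can be sketched as follows; the node and edge encoding here is invented for illustration:

```python
def expand_call(call_site, fn_entry, fn_exit):
    """Split one call statement into a call node and an after-call node.

    The call node passes the arguments to the callee's entry; the after-call
    node resumes after the callee's exit and reads the special 'result'
    variable that models the callee's return statement.
    """
    call_node = ("call", call_site)
    after_call = ("after-call", call_site)
    edges = [
        (call_node, fn_entry),    # caller -> entry of the callee
        (fn_exit, after_call),    # exit of the callee -> back to the caller
    ]
    return call_node, after_call, edges
```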

  • context insensitive
  • interprocedurally invalid paths
  • call contexts
  • call string approach
  • functional approach
  • parameter sensitivity
  • object sensitivity

Context sensitivity with the functional approach as presented here gives optimal precision, in the sense that it is as precise as if all function calls (even recursive ones) were inlined. This means that it completely avoids the problem of dataflow along interprocedurally invalid paths.

When analyzing object oriented programs, a popular choice is object sensitivity, which is essentially a variant of the functional approach that distinguishes calls not on the entire abstract states at function entries but only on the abstract values of the receiver objects.


Reference

Static Program Analysis
