Overview of nullability analysis
A regular Roslyn contributor, Yair, asked for some pointers about C# 8.0’s nullability analysis on the gitter channel. I thought I’d expand on my reply and share it more broadly.
This post assumes familiarity with the “nullable reference types” feature, including the concepts of nullability annotations (annotated, not-annotated, oblivious) and states (not-null, maybe-null).
Bound trees
The backbone of the compiler consists of four main stages:
- parsing source code into syntax trees,
- building symbols from declarations and binding the syntax of each method body into an initial bound tree,
- lowering the bound tree into a set of simpler bound nodes,
- emitting IL from the lowered bound nodes, along with some metadata.
Nullability analysis rests on the initial bound tree. This tree has a structure similar to the syntax tree, but instead of referencing un-interpreted identifiers (like x
or Method
) it references symbols. Symbols are an object model for entities declared by a program or loaded from metadata.
For example, symbols allow differentiating different uses of a given identifier in code. You could have a parameter x
, a local x
, a type x
or even a method x
. For each kind of symbol you can ask different questions, such as the type of the parameter or local, or the return and parameter types of a method.
When types are explicit in source (for example, string nonNullLocal = "";
, string? maybeNullLocal = "";
or MakeArray<string?>(item)
), the bound nodes and symbols capture an explicit/declared nullability: TypeWithAnnotations
with Annotated
or NotAnnotated
annotations in a context with nullability annotations enabled, or Oblivious
in a disabled context.
When types are inferred (for example, in var local = "";
or MakeArray(item)
), the bound node just uses an Oblivious
annotation, which the nullability analysis will later revise.
NullableWalker
NullableWalker
is responsible for most of the analysis. It is a visitor for the initial bound tree, which:
- computes the nullability state of expressions (and save those to answer queries from the IDE),
- keeps track of knowledge for different variables (more on that below), and
- produces warnings.
State tracking
As the analysis progresses through a method body, NullableWalker
tracks some knowledge for each variable (or more generally, storage location). At a certain point in the analysis, the state for a given variable is either MaybeNull
or NotNull
.
For all the tracked variables, this is represented as a state array, in which each variable gets an index/position/slot.
For instance, each parameter and local in a method gets a slot, which holds either a NotNull
or MaybeNull
state. Consider a parameter string? p1
: we give it a slot/index and we’ll initialize its state to maybe-null (ie. State[slot] = MaybeNull
, because its declared type is Annotated
), then when we visit p1 = "";
we can just override that state, and when we visit p1.ToString()
we consult that state to decide whether to warn for possible null dereference.
NullableWalker
not only tracks variables, it also tracks fields, so it assigns slots for those too. That way, it can warn on localStruct.field1.ToString()
, but not localStruct.field2.ToString()
independently.
Such nested slots are known to have a containing slot. With that information, we can look at an assignment like local2 = local1;
and we can not only copy the slot for local1
to set the state of local2
, but we can copy the nested slots thereby transfering all of our knowledge of local1
to local2
.
The state is generally just a simple array, but it can also be two arrays in some cases. That’s called “conditional state”. It is used for analyzing expressions like x == null
. We keep track of the states “if the expression were true” and “if the expression were false” separately. Slots are still used to index into those arrays as normal.
Cloning states is another common operation. When analyzing if (b) ... else ...
, we clone the state so that we can analyze each branch separately. We can merge those states when the branches rejoin (Meet
takes the worst case values). That gives us the state following the if
statement.
In code that isn’t reachable, as in if (false) { ... unreachable ...}
, every value you read is NotNull
regardless of tracked state to minimize warnings.
Simple example
Let’s wrap up this overview by looking at an assignment, x = y
. To analyze this expression, we’re going to:
- visit the right-hand-side expression and get a
TypeWithState
back which tells us the null-state ofy
at this point in the program, - visit the left-hand-side expression as an L-value (i.e., for assigning to) and get a
TypeWithAnnotations
back which tells us the declared type ofx
(not its state), - we check if the assignment from the state of
y
to the declared type ofx
poses problems, both in terms of top-level nullability (for instance, are we assigning anull
value to a un-annotatedstring
variable?), or nested nullability (for example, are we assigning aList<string>
value to aList<string?>
variable?), - we update the state of
x
based on the state ofy
, - return the state of
x
as the state of the assignment expression, in case it is a nested expression like(x = y).ToString()
.
In that example, y
might not be a simple bound node for accessing y
, but it could also involve implicit conversions. In that case, visiting y
at the step (1) will visit a bound conversion which holds y
as its operand. As long as the visit operation for each kind of bound node does its part (i.e., produce a TypeWithState
for the expression, produce proper side effects on state and diagnostics) then this process composes well.