Cezary Piątek posted a good overview of the C# nullable reference types feature.
It includes a critique of a code snippet. Examining that snippet is a good way to understand some of the C# LDM’s decisions.
In the following, Cezary expects a warning on a seemingly unreachable branch and no warning on the dereference.
#nullableenablepublicclassUser{publicstaticvoidMethod(UseruserEntity){if(userEntity==null)// Actual: no warning for seemingly unreachable branch. {}vars=userEntity.ToString();// Actual: warning CS8602: Dereference of a possibly null reference.}}
Why is there no warning on the seemingly unnecessary null test if (userEntity == null) ... or the apparently unreachable branch?
It’s because such tests are useful and encouraged in public APIs. Users should check inputs and the compiler should not get in the way of good practices. The branch of the if is therefore reachable.
Then, what is the state of userEntity within the if block?
We take the user’s null test seriously by considering userEntity to be maybe-null within the if block. So if the user did userEntity.ToString() inside the if, the compiler would rightly warn. This protects the user against a null reference exception that could realistically happen.
Given those, what should be the state of the userEntity at the exit of the if?
Because we’re merging branches where userEntity is maybe-null (when the condition of the if is true) and not-null (in the alternative), the state of userEntity is maybe-null. Therefore we warn on dereference on userEntity after the if.
Note that if the if block contained a throw, the userEntity would be considered not-null after the if. This is a common patter: if (userEntity is null) throw new ArgumentNullException(nameof(userEntity));.
A regular Roslyn contributor, Yair, asked for some pointers about C# 8.0’s nullability analysis on the gitter channel. I thought I’d expand on my reply and share it more broadly.
This post assumes familiarity with the “nullable reference types” feature, including the concepts of nullability annotations (annotated, not-annotated, oblivious) and states (not-null, maybe-null).
Bound trees
The backbone of the compiler consists of four main stages:
parsing source code into syntax trees,
building symbols from declarations and binding the syntax of each method body into an initial bound tree,
lowering the bound tree into a set of simpler bound nodes,
emitting IL from the lowered bound nodes, along with some metadata.
Nullability analysis rests on the initial bound tree. This tree has a structure similar to the syntax tree, but instead of referencing un-interpreted identifiers (like x or Method) it references symbols. Symbols are an object model for entities declared by a program or loaded from metadata.
For example, symbols allow differentiating different uses of a given identifier in code. You could have a parameter x, a local x, a type x or even a method x. For each kind of symbol you can ask different questions, such as the type of the parameter or local, or the return and parameter types of a method.
When types are explicit in source (for example, string nonNullLocal = "";, string? maybeNullLocal = ""; or MakeArray<string?>(item)), the bound nodes and symbols capture an explicit/declared nullability: TypeWithAnnotations with Annotated or NotAnnotated annotations in a context with nullability annotations enabled, or Oblivious in a disabled context.
When types are inferred (for example, in var local = ""; or MakeArray(item)), the bound node just uses an Oblivious annotation, which the nullability analysis will later revise.
NullableWalker
NullableWalker is responsible for most of the analysis. It is a visitor for the initial bound tree, which:
computes the nullability state of expressions (and save those to answer queries from the IDE),
keeps track of knowledge for different variables (more on that below), and
produces warnings.
State tracking
As the analysis progresses through a method body, NullableWalker tracks some knowledge for each variable (or more generally, storage location). At a certain point in the analysis, the state for a given variable is either MaybeNull or NotNull.
For all the tracked variables, this is represented as a state array, in which each variable gets an index/position/slot.
For instance, each parameter and local in a method gets a slot, which holds either a NotNull or MaybeNull state. Consider a parameter string? p1: we give it a slot/index and we’ll initialize its state to maybe-null (ie. State[slot] = MaybeNull, because its declared type is Annotated), then when we visit p1 = ""; we can just override that state, and when we visit p1.ToString() we consult that state to decide whether to warn for possible null dereference.
NullableWalker not only tracks variables, it also tracks fields, so it assigns slots for those too. That way, it can warn on localStruct.field1.ToString(), but not localStruct.field2.ToString() independently.
Such nested slots are known to have a containing slot. With that information, we can look at an assignment like local2 = local1; and we can not only copy the slot for local1 to set the state of local2, but we can copy the nested slots thereby transfering all of our knowledge of local1 to local2.
The state is generally just a simple array, but it can also be two arrays in some cases. That’s called “conditional state”. It is used for analyzing expressions like x == null. We keep track of the states “if the expression were true” and “if the expression were false” separately. Slots are still used to index into those arrays as normal.
Cloning states is another common operation. When analyzing if (b) ... else ..., we clone the state so that we can analyze each branch separately. We can merge those states when the branches rejoin (Meet takes the worst case values). That gives us the state following the if statement.
In code that isn’t reachable, as in if (false) { ... unreachable ...}, every value you read is NotNull regardless of tracked state to minimize warnings.
Simple example
Let’s wrap up this overview by looking at an assignment, x = y. To analyze this expression, we’re going to:
visit the right-hand-side expression and get a TypeWithState back which tells us the null-state of y at this point in the program,
visit the left-hand-side expression as an L-value (i.e., for assigning to) and get a TypeWithAnnotations back which tells us the declared type of x (not its state),
we check if the assignment from the state of y to the declared type of x poses problems, both in terms of top-level nullability (for instance, are we assigning a null value to a un-annotated string variable?), or nested nullability (for example, are we assigning a List<string> value to a List<string?> variable?),
we update the state of x based on the state of y,
return the state of x as the state of the assignment expression, in case it is a nested expression like (x = y).ToString().
In that example, y might not be a simple bound node for accessing y, but it could also involve implicit conversions. In that case, visiting y at the step (1) will visit a bound conversion which holds y as its operand. As long as the visit operation for each kind of bound node does its part (i.e., produce a TypeWithState for the expression, produce proper side effects on state and diagnostics) then this process composes well.