Curiosity is bliss

Julien Couvreur's programming blog and more

Overview of nullability analysis

A regular Roslyn contributor, Yair, asked for some pointers about C# 8.0’s nullability analysis on the gitter channel. I thought I’d expand on my reply and share it more broadly.

This post assumes familiarity with the “nullable reference types” feature, including the concepts of nullability annotations (annotated, not-annotated, oblivious) and states (not-null, maybe-null).

Bound trees

The backbone of the compiler consists of four main stages:

  • parsing source code into syntax trees,
  • building symbols from declarations and binding the syntax of each method body into an initial bound tree,
  • lowering the bound tree into a set of simpler bound nodes,
  • emitting IL from the lowered bound nodes, along with some metadata.

Nullability analysis rests on the initial bound tree. This tree has a structure similar to the syntax tree, but instead of referencing un-interpreted identifiers (like x or Method) it references symbols. Symbols are an object model for entities declared by a program or loaded from metadata. For example, symbols let you distinguish the different uses of a given identifier in code. You could have a parameter x, a local x, a type x or even a method x. For each kind of symbol you can ask different questions, such as the type of the parameter or local, or the return and parameter types of a method.

When types are explicit in source (for example, string nonNullLocal = "";, string? maybeNullLocal = ""; or MakeArray<string?>(item)), the bound nodes and symbols capture an explicit/declared nullability: TypeWithAnnotations with Annotated or NotAnnotated annotations in a context with nullability annotations enabled, or Oblivious in a disabled context.

When types are inferred (for example, in var local = ""; or MakeArray(item)), the bound node just uses an Oblivious annotation, which the nullability analysis will later revise.
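To make the distinction concrete, here is a small sketch of the explicit and inferred cases from the two paragraphs above. The MakeArray helper and the Demo method are hypothetical (only the name MakeArray comes from the text), and the comments merely restate which annotation the binder would be expected to record.

```csharp
#nullable enable

static class AnnotationExamples
{
    // Hypothetical helper, named after the MakeArray example in the text.
    public static T[] MakeArray<T>(T item) => new[] { item };

    public static int Demo()
    {
        // Explicit types: the declared annotation is captured at binding time.
        string nonNullLocal = "";       // NotAnnotated
        string? maybeNullLocal = "";    // Annotated
        string?[] explicitArray = MakeArray<string?>(maybeNullLocal);

        // Inferred types: the bound nodes start out Oblivious and the
        // nullability analysis revises them later.
        var local = "";
        var inferredArray = MakeArray(nonNullLocal);

        return nonNullLocal.Length + explicitArray.Length
             + local.Length + inferredArray.Length;
    }
}
```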

NullableWalker

NullableWalker is responsible for most of the analysis. It is a visitor for the initial bound tree, which:

  1. computes the nullability state of expressions (and saves those to answer queries from the IDE),
  2. keeps track of knowledge for different variables (more on that below), and
  3. produces warnings.

State tracking

As the analysis progresses through a method body, NullableWalker tracks some knowledge for each variable (or more generally, storage location). At a certain point in the analysis, the state for a given variable is either MaybeNull or NotNull. For all the tracked variables, this is represented as a state array, in which each variable gets an index/position/slot.

For instance, each parameter and local in a method gets a slot, which holds either a NotNull or MaybeNull state. Consider a parameter string? p1: we give it a slot/index and we’ll initialize its state to maybe-null (i.e., State[slot] = MaybeNull, because its declared type is Annotated). Then, when we visit p1 = "";, we can just override that state, and when we visit p1.ToString() we consult that state to decide whether to warn for a possible null dereference.
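The p1 scenario can be written out as a short sketch. The class and method names are made up for illustration; the comments describe the state the analysis would be expected to hold at each point.

```csharp
#nullable enable

static class StateTracking
{
    public static string Describe(string? p1)
    {
        // State of p1 here: maybe-null, because its declared type is annotated.
        // Uncommenting the next line should produce warning CS8602
        // (dereference of a possibly null reference).
        // p1.ToString();

        p1 = "";                // the assignment overrides the state: not-null

        return p1.ToString();   // the state is consulted here: no warning
    }
}
```

Calling StateTracking.Describe(null) returns an empty string, since the parameter is reassigned before it is dereferenced.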

NullableWalker not only tracks variables, it also tracks fields, so it assigns slots for those too. That way, it can warn on localStruct.field1.ToString() but not on localStruct.field2.ToString(), since each field is tracked independently. Such nested slots are known to have a containing slot. With that information, when we look at an assignment like local2 = local1;, we can not only copy the slot for local1 to set the state of local2, but also copy the nested slots, thereby transferring all of our knowledge of local1 to local2.
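Here is a rough illustration of nested slots, assuming a hypothetical Pair struct with two annotated fields. The comments describe the behavior I would expect from the analysis rather than a guarantee for every compiler version.

```csharp
#nullable enable

struct Pair
{
    public string? Field1;
    public string? Field2;
}

static class NestedSlots
{
    public static int Demo()
    {
        var local1 = new Pair();
        local1.Field2 = "hello";    // only the nested slot for Field2 becomes not-null

        var local2 = local1;        // copies the slot for local1 and its nested slots

        // local2.Field1.ToString();   // expected to warn: Field1 is still maybe-null
        return local2.Field2.Length;   // expected not to warn: knowledge was transferred
    }
}
```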

The state is generally just a simple array, but it can also be two arrays in some cases. That’s called “conditional state”. It is used for analyzing expressions like x == null. We keep track of the states “if the expression were true” and “if the expression were false” separately. Slots are still used to index into those arrays as normal.
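A null check is the simplest place to see conditional state at work. In this sketch (names are illustrative), the two branches of the if correspond to the “if true” and “if false” states of x == null.

```csharp
#nullable enable

static class ConditionalState
{
    public static int LengthOrZero(string? x)
    {
        if (x == null)
        {
            // "If the expression were true" state: x is maybe-null here.
            return 0;
        }

        // "If the expression were false" state: x is not-null, so no warning.
        return x.Length;
    }
}
```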

Cloning states is another common operation. When analyzing if (b) ... else ..., we clone the state so that we can analyze each branch separately. We can merge those states when the branches rejoin (Join takes the worst-case values). That gives us the state following the if statement.
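As a sketch of branch merging (the names are again hypothetical), each branch below leaves s in a different state, and the state after the if statement is the worst case of the two.

```csharp
#nullable enable

static class BranchMerge
{
    public static int Demo(bool b)
    {
        string? s;
        if (b)
        {
            s = "assigned";   // state of s in this branch: not-null
        }
        else
        {
            s = null;         // state of s in this branch: maybe-null
        }

        // After the branches rejoin, the merged state of s is maybe-null,
        // so a plain s.Length would be expected to warn.
        return s?.Length ?? -1;
    }
}
```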

In code that isn’t reachable, as in if (false) { ... unreachable ...}, every value you read is NotNull regardless of tracked state to minimize warnings.

Simple example

Let’s wrap up this overview by looking at an assignment, x = y. To analyze this expression, we’re going to:

  1. visit the right-hand-side expression and get a TypeWithState back which tells us the null-state of y at this point in the program,
  2. visit the left-hand-side expression as an L-value (i.e., for assigning to) and get a TypeWithAnnotations back which tells us the declared type of x (not its state),
  3. we check if the assignment from the state of y to the declared type of x poses problems, both in terms of top-level nullability (for instance, are we assigning a null value to an un-annotated string variable?), or nested nullability (for example, are we assigning a List<string> value to a List<string?> variable?),
  4. we update the state of x based on the state of y,
  5. return the state of x as the state of the assignment expression, in case it is a nested expression like (x = y).ToString().

In that example, the right-hand side might not be a simple bound node for accessing y; it could also involve implicit conversions. In that case, visiting y at step (1) will visit a bound conversion which holds y as its operand. As long as the visit operation for each kind of bound node does its part (i.e., produces a TypeWithState for the expression, and produces proper side effects on state and diagnostics), this process composes well.
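The five steps can be followed on a tiny example. This is only a sketch with made-up names; the comments map each part of the expression to the step that would handle it.

```csharp
#nullable enable

static class AssignmentExample
{
    public static string Demo(string y)
    {
        string? x = null;   // declared type of x is annotated; its state is maybe-null

        // Steps 1-2: the state of y is not-null; the declared type of x is string?.
        // Step 3:    assigning a not-null string to string? poses no problem.
        // Step 4:    the state of x becomes not-null.
        // Step 5:    the assignment expression itself is not-null,
        //            so dereferencing it produces no warning.
        return (x = y).ToString();
    }
}
```

Had the declared type of x been string and the state of y maybe-null, step 3 is where I would expect a warning such as CS8600 to be reported.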

Async Enumerables with Cancellation

In this post, I’ll explain how to produce and consume async enumerables with support for cancellation. Since originally publishing this post, we’ve added support in the language for a new attribute that solves this problem more elegantly. I’ve added a section detailing the new method.

Some context

Visual Studio 2019 (currently in preview) includes a preview of C# 8.0 and the async-streams feature.

Three parts compose this feature:

  1. async-iterator methods: you can write methods with the async modifier, returning either IAsyncEnumerable or IAsyncEnumerator, and using both yield and await syntax.
  2. await foreach: you can asynchronously enumerate collections that implement IAsyncEnumerable (or implement equivalent APIs).
  3. await using: you can asynchronously dispose resources that implement IAsyncDisposable.

await foreach follows a similar execution pattern to its synchronous sibling foreach: it first gets an enumerator from the enumerable (by calling GetAsyncEnumerator()), then repeatedly does await MoveNextAsync() on the enumerator and gets the item with Current until the enumerator is exhausted.

Here’s the code generated for an await foreach:

E e = ((C)(x)).GetAsyncEnumerator();
try
{
    while (await e.MoveNextAsync())
    {
        V v = (V)(T)e.Current;
        // body
    }
}
finally
{
    await e.DisposeAsync();
}

You may notice in the relevant APIs (copied below) that GetAsyncEnumerator accepts a CancellationToken parameter. But await foreach doesn’t make use of this parameter (it passes a default value).

This raises two questions: 1) how do you write an async enumerable with support for cancellation? and 2) how do you consume one?

Writing an async enumerable supporting cancellation (original method)

Let’s say that you intend to write IAsyncEnumerable<int> GetItemsAsync(int maxItems) supporting cancellation.

You cannot just write an async iterator method async IAsyncEnumerable<int> GetItemsAsync(int maxItems) because that does not give you access to any cancellation token.

You also cannot write an async iterator method async IAsyncEnumerable<int> GetItemsAsync(int maxItems, CancellationToken token) because:

  1. if a method has its own cancellation token and wants to enumerate an async enumerable it received, it could not use the token it wants with that enumerable (the cancellation token would be already built into the enumerable),
  2. the same cancellation token would be used in every enumerator when the collection is enumerated multiple times.

So instead, you need to implement the enumerable yourself and put your business logic in async IAsyncEnumerator<int> GetAsyncEnumerator(CancellationToken cancellationToken).

Here’s what that looks like:

public static IAsyncEnumerable<int> GetItemsAsync(int maxItems)
    => new MyCancellableCollection(maxItems);
    
class MyCancellableCollection : IAsyncEnumerable<int>
{
    private int _maxItems;
    internal MyCancellableCollection(int maxItems)
        => _maxItems = maxItems;
        
    public async IAsyncEnumerator<int> GetAsyncEnumerator(CancellationToken cancellationToken)
    {
        // Your method body using:
        // - `_maxItems`
        // - `cancellationToken.ThrowIfCancellationRequested();`
        // - `await` and `yield` constructs
    }
}

We recognize that this involves boilerplate. We are considering some language design options to further simplify this. Since originally publishing this, we’ve solved this problem more elegantly by extending the language. The next section explains the updated design.

Writing an async enumerable supporting cancellation (improved method)

In an updated preview of C# 8.0 (shipping in Visual Studio 2019 version 16.1), we’ll be adding support for the [EnumeratorCancellation] attribute. The attribute allows you to write an async-iterator method, returning IAsyncEnumerable<T> as you intend, but tells the compiler to store the token from GetAsyncEnumerator(CancellationToken) into one of your method’s parameters.

In the above example, you would just declare the method as async IAsyncEnumerable<int> GetItemsAsync(int maxItems, [EnumeratorCancellation] CancellationToken token). Because of the attribute, the token parameter will be set to a synthesized cancellation token that combines two tokens: the one passed as an argument to the method, and the other given to GetAsyncEnumerator. This synthesized token gets cancelled when either of the two given tokens is cancelled.

async IAsyncEnumerable<int> GetItemsAsync(int maxItems, [EnumeratorCancellation] CancellationToken token)
{
    // Your method body using:
    // - `maxItems`
    // - `token.ThrowIfCancellationRequested();`
    // - `await` and `yield` constructs
}

Note: in dev16.1 preview5, we have not yet implemented this method of combining tokens. We took a simpler approach whereby any non-default token given to GetAsyncEnumerator will override the token passed as an argument. I expect to implement the more elaborate method of combining tokens in the preview6 timeframe.
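For reference, here is what a filled-in version of that method could look like. This is a minimal sketch: the Producer class, the loop body and the Task.Delay call (a stand-in for real asynchronous work) are my own assumptions, not part of the feature.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

static class Producer
{
    public static async IAsyncEnumerable<int> GetItemsAsync(
        int maxItems,
        [EnumeratorCancellation] CancellationToken token = default)
    {
        for (int i = 0; i < maxItems; i++)
        {
            // Stop promptly if the consumer asked for cancellation.
            token.ThrowIfCancellationRequested();

            // Stand-in for real asynchronous work; it also observes the token.
            await Task.Delay(10, token);

            yield return i;
        }
    }
}
```

Giving the token parameter a default value is a design choice in this sketch: it lets callers that do not care about cancellation write GetItemsAsync(10) without supplying a token.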

Consuming an async enumerable with cancellation

With the above implementation, if you wrote await foreach (var item in GetItemsAsync(maxItems: 10)) ..., a default cancellation token would be passed to the cancellable method.

Users of enumerables could try and expand the low-level code for an await foreach to pass a token, but that’s a terrible solution (defeats the purpose of await foreach).

To help with this, we provide a WithCancellation<T>(this IAsyncEnumerable<T> source, CancellationToken cancellationToken) extension method. It allows you to pass your token in:

await foreach (var item in GetItemsAsync(maxItems: 10).WithCancellation(token)) ...

This helper method wraps the enumerable from GetItemsAsync along with the given cancellation token. When GetAsyncEnumerator() is invoked on this wrapper, it calls GetAsyncEnumerator(token) on the underlying enumerable.
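Putting the pieces together, a consumer could look like the sketch below. CountForeverAsync is a hypothetical endless producer written only for this example, and the consumer cancels its own token after a few items to show the token reaching the iterator.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

static class Consumer
{
    // Hypothetical endless producer; it only stops when its token is cancelled.
    static async IAsyncEnumerable<int> CountForeverAsync(
        [EnumeratorCancellation] CancellationToken token = default)
    {
        for (int i = 0; ; i++)
        {
            await Task.Delay(10, token);   // throws once the token is cancelled
            yield return i;
        }
    }

    public static async Task<int> CountUntilCancelledAsync(int limit)
    {
        using var cts = new CancellationTokenSource();
        int seen = 0;
        try
        {
            // WithCancellation hands cts.Token to GetAsyncEnumerator.
            await foreach (var item in CountForeverAsync().WithCancellation(cts.Token))
            {
                seen++;
                if (seen == limit)
                {
                    cts.Cancel();
                }
            }
        }
        catch (OperationCanceledException)
        {
            // The producer observed the cancelled token and stopped.
        }
        return seen;
    }
}
```

Without the call to WithCancellation, the producer would only ever see a default token and this loop would never end.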

Appendix: relevant interfaces

using System.Threading;

namespace System.Collections.Generic
{
    /// <summary>Exposes an enumerator that provides asynchronous iteration over values of a specified type.</summary>
    /// <typeparam name="T">The type of values to enumerate.</typeparam>
    public interface IAsyncEnumerable<out T>
    {
        /// <summary>Returns an enumerator that iterates asynchronously through the collection.</summary>
        /// <param name="cancellationToken">A <see cref="CancellationToken"/> that may be used to cancel the asynchronous iteration.</param>
        /// <returns>An enumerator that can be used to iterate asynchronously through the collection.</returns>
        IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default);
    }

    /// <summary>Supports a simple asynchronous iteration over a generic collection.</summary>
    /// <typeparam name="T">The type of objects to enumerate.</typeparam>
    public interface IAsyncEnumerator<out T> : IAsyncDisposable
    {
        /// <summary>Advances the enumerator asynchronously to the next element of the collection.</summary>
        /// <returns>
        /// A <see cref="ValueTask{Boolean}"/> that will complete with a result of <c>true</c> if the enumerator
        /// was successfully advanced to the next element, or <c>false</c> if the enumerator has passed the end
        /// of the collection.
        /// </returns>
        ValueTask<bool> MoveNextAsync();

        /// <summary>Gets the element in the collection at the current position of the enumerator.</summary>
        T Current { get; }
    }
    
    /// <summary>Provides a mechanism for releasing unmanaged resources asynchronously.</summary>
    public interface IAsyncDisposable
    {
        /// <summary>
        /// Performs application-defined tasks associated with freeing, releasing, or
        /// resetting unmanaged resources asynchronously.
        /// </summary>
        ValueTask DisposeAsync();
    }
}

Original code for IAsyncEnumerable, IAsyncEnumerator and IAsyncDisposable.

For further details, see the async-streams design doc.

Using C# 7.1

C# 7.0 was released as part of Visual Studio 2017 (version 15.0). While we work on C# 8.0, we will also ship features that are ready earlier as point releases.

C# 7.1 is the first such release. It will ship along with Visual Studio 2017 version 15.3. To try it out today, you can install Visual Studio Preview side-by-side, quickly and safely.

As you start using new C# 7.1 features in your code, a lightbulb will offer to upgrade your project, either to “C# 7.1” or “latest”. If you leave your project’s language version set to “default”, you can only use C# 7.0 features (“default” means the latest major version, so it does not include point releases).

Note: make sure you select Configuration: All Configurations, as Debug is the configuration selected by default when editing a project.

LangVer7_1.png

Here are more specific instructions for using C# 7.1 in ASP.NET and ASP.NET Core and .NET CLI. The NuGet compiler packages for this release are versioned 2.3.

You can provide feedback on the C# features on the Roslyn repository or via the “Report a Problem” button in Visual Studio.

C# 7.1 features

In addition to numerous issues fixed in this release, the compiler comes with the following features for C# 7.1 (summarized below): async Main, pattern-matching with generics, “default” expressions, and inferred tuple names.

You can find more details about C# 7.1 and our progress on C# 7.2 and 8.0 in the language feature status page.

Async Main

This makes it easier to get started with async code, by recognizing static async Task Main() {...await some asynchronous code...} as a valid entry-point to your program.
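A minimal program using an async entry point might look like this. The class name and the ComputeAsync helper are placeholders for whatever asynchronous work your program does.

```csharp
using System;
using System.Threading.Tasks;

static class AsyncMainDemo
{
    // Placeholder for some asynchronous work.
    public static async Task<int> ComputeAsync()
    {
        await Task.Delay(10);
        return 42;
    }

    // With C# 7.1, the compiler accepts this as the program's entry point.
    static async Task Main()
    {
        int result = await ComputeAsync();
        Console.WriteLine(result);
    }
}
```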

Pattern-matching with generics

This allows using open types in type patterns. For example, case T t:.
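As a small sketch of case T t: in a generic method, the hypothetical helper below returns the first element of the requested type, or a fallback when there is none.

```csharp
using System.Collections.Generic;

static class GenericPatterns
{
    // Hypothetical helper: returns the first element that is a T, or the fallback.
    public static T FirstOfType<T>(IEnumerable<object> items, T fallback)
    {
        foreach (object item in items)
        {
            switch (item)
            {
                case T t:       // type pattern using the open type T
                    return t;
            }
        }
        return fallback;
    }
}
```

For instance, FirstOfType<string>(new object[] { 1, "two", 3.0 }, "none") returns "two".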

“default” literal

This lets you omit the type in the default operator (default(T)) when the type can be inferred from the context. For instance, you can invoke void M(ImmutableArray<int> x) with M(default), or specify a default parameter value when declaring void M(CancellationToken x = default).
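The two uses mentioned above can be sketched as follows. M mirrors the signature from the text, while N, the bool return types and the Demo method are assumptions made so that the example has something to return.

```csharp
using System.Collections.Immutable;
using System.Threading;

static class DefaultLiteralDemo
{
    // The parameter type lets the compiler infer the type of `default` at the call site.
    public static bool M(ImmutableArray<int> x) => x.IsDefault;

    // `default` as a default parameter value, without repeating the type.
    public static bool N(CancellationToken x = default) => x.CanBeCanceled;

    public static bool Demo()
    {
        int count = default;   // inferred from the declared type: 0

        // M(default) passes default(ImmutableArray<int>); N() uses the default token.
        return M(default) && !N() && count == 0;
    }
}
```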

DefaultError.png

DefaultLightbulb.png

Inferred tuple names

This is a refinement on tuple literals (introduced in 7.0) which makes tuple element names redundant when they can be inferred from the expressions. Instead of writing var tuple = (a: this.a, b: X.Y.b), you can simply write var tuple = (this.a, X.Y.b);. The elements tuple.a and tuple.b will still be recognized.
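Here is a short sketch of the inference, using two made-up classes in place of this.a and X.Y.b from the text.

```csharp
class Other
{
    public int b = 2;
}

class Sample
{
    private readonly int a = 1;

    public int Sum(Other other)
    {
        // The element names a and b are inferred from the member accesses.
        var tuple = (this.a, other.b);

        return tuple.a + tuple.b;
    }
}
```

With the language version set to 7.0, the same tuple would only expose Item1 and Item2.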

InferredTupleNameLightbulb.png

Error version pragma

This is a small undocumented feature to assist with troubleshooting language version issues. Type #error version and you will see the version of the compiler that you’re using, as well as your current language version setting.

ErrorVersion.png