Is async/await suitable for methods that are both IO and CPU bound?

The MSDN documentation appears to state that async and await are suitable for IO-bound tasks whereas Task.Run should be used for CPU-bound tasks.

I'm working on an application that performs HTTP requests to retrieve HTML documents, which it then parses. I have a method that looks like this:

public async Task<HtmlDocument> LoadPage(Uri address)
{
    using (var httpResponse = await new HttpClient().GetAsync(address)) //IO-bound
    using (var responseContent = httpResponse.Content)
    using (var contentStream = await responseContent.ReadAsStreamAsync())
        return await Task.Run(() => LoadHtmlDocument(contentStream)); //CPU-bound
}

Is this a good and suitable use of async and await, or am I over-using it?


Solution 1:

There are two good answers already, but to add my $0.02:

If you're talking about consuming asynchronous operations, async/await works excellently for both I/O-bound and CPU-bound operations.

I think the MSDN docs do have a slight slant towards producing asynchronous operations, in which case you do want to use TaskCompletionSource (or similar) for I/O-bound work and Task.Run (or similar) for CPU-bound work. Once you've created the initial Task wrapper, it's best consumed with async and await.
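A minimal sketch of the two "producing" styles, assuming a made-up callback-based API (LegacyReader and its StartRead method are hypothetical stand-ins, not a real library). The I/O-bound case wraps the callback in a TaskCompletionSource so no thread waits while the operation is pending; the CPU-bound case queues the computation with Task.Run.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical callback-based I/O API, standing in for any API that reports
// completion through a callback instead of returning a Task.
public sealed class LegacyReader
{
    public void StartRead(Action<string> onCompleted, Action<Exception> onFailed)
    {
        // Simulated I/O: completes about 10 ms later on a thread pool thread.
        Task.Delay(10).ContinueWith(t => onCompleted("data"));
    }
}

public static class Producers
{
    // I/O-bound: adapt the callback to a Task. No thread is blocked while pending.
    public static Task<string> ReadAsync(LegacyReader reader)
    {
        // RunContinuationsAsynchronously keeps awaiting code from running
        // inline inside the legacy API's callback.
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
        reader.StartRead(
            result => tcs.TrySetResult(result),
            error => tcs.TrySetException(error));
        return tcs.Task;
    }

    // CPU-bound: run the computation on the thread pool.
    public static Task<long> SumOfSquaresAsync(int n)
    {
        return Task.Run(() =>
        {
            long total = 0;
            for (int i = 1; i <= n; i++) total += (long)i * i;
            return total;
        });
    }
}
```

Either way, the caller ends up with a plain Task and simply awaits it, e.g. `string s = await Producers.ReadAsync(reader);`.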

For your particular example, it really comes down to how long LoadHtmlDocument takes. If you remove the Task.Run, it will run on the context that was captured when LoadPage was called (possibly the UI thread). The Windows 8 guidelines specify that any operation that may take more than 50ms should be made async, and keep in mind that 50ms on your development machine may be considerably longer on a client's machine.
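If you want a rough number before deciding, you could time the parse with a Stopwatch. This is only a sketch: ParseTiming and Measure are hypothetical names, the 50 ms budget is the guideline quoted above, and a single run on a fast machine says little about the worst case on slower client hardware.

```csharp
using System;
using System.Diagnostics;

public static class ParseTiming
{
    // Budget taken from the 50 ms guideline.
    public static readonly TimeSpan Budget = TimeSpan.FromMilliseconds(50);

    // Times a synchronous piece of work (for example your own
    // LoadHtmlDocument call) and reports whether it stayed within the budget.
    public static (T Result, TimeSpan Elapsed, bool WithinBudget) Measure<T>(Func<T> work)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = work();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed, stopwatch.Elapsed <= Budget);
    }
}
```

Measuring with realistic (large) documents matters more than the exact threshold.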

So if you can guarantee that LoadHtmlDocument will run for less than 50ms, you can just execute it directly:

public async Task<HtmlDocument> LoadPage(Uri address)
{
  using (var httpResponse = await new HttpClient().GetAsync(address)) //IO-bound
  using (var responseContent = httpResponse.Content)
  using (var contentStream = await responseContent.ReadAsStreamAsync()) //IO-bound
    return LoadHtmlDocument(contentStream); //CPU-bound
}

I would also recommend ConfigureAwait(false), as @svick mentioned:

public async Task<HtmlDocument> LoadPage(Uri address)
{
  using (var httpResponse = await new HttpClient().GetAsync(address)
      .ConfigureAwait(continueOnCapturedContext: false)) //IO-bound
  using (var responseContent = httpResponse.Content)
  using (var contentStream = await responseContent.ReadAsStreamAsync()
      .ConfigureAwait(continueOnCapturedContext: false)) //IO-bound
    return LoadHtmlDocument(contentStream); //CPU-bound
}

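With ConfigureAwait(false), if either awaited operation (the HTTP request or reading the content stream) doesn't complete synchronously, the rest of the method, including LoadHtmlDocument, will run on a thread pool thread without an explicit call to Task.Run. If both happen to complete synchronously, LoadHtmlDocument still runs on the caller's thread.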
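A console app has no SynchronizationContext, so to see the effect this sketch installs a test-only one (MarkerContext is hypothetical; it runs posted callbacks on the thread pool but marks itself as current). Task.Delay stands in for an HTTP call that does not complete synchronously. With the default await, the code after the await runs on the captured context; with ConfigureAwait(false), it does not.

```csharp
#nullable enable
using System.Threading;
using System.Threading.Tasks;

// Test-only context: runs posted callbacks on the thread pool, but installs
// itself first so code after an await can tell where it resumed.
public sealed class MarkerContext : SynchronizationContext
{
    private int _posts;
    public int Posts => Volatile.Read(ref _posts);

    public override void Post(SendOrPostCallback d, object? state)
    {
        Interlocked.Increment(ref _posts);
        ThreadPool.QueueUserWorkItem(_ =>
        {
            SynchronizationContext? previous = Current;
            SetSynchronizationContext(this);
            try { d(state); }
            finally { SetSynchronizationContext(previous); }
        });
    }
}

public static class ConfigureAwaitDemo
{
    // Returns true if the code after the await ran with MarkerContext installed.
    public static async Task<bool> ResumedOnContextAsync(bool continueOnCapturedContext)
    {
        // The delay has not completed when awaited, so the method really suspends.
        await Task.Delay(50).ConfigureAwait(continueOnCapturedContext);
        return SynchronizationContext.Current is MarkerContext;
    }

    public static (bool WithCapture, bool WithoutCapture, int Posts) Run()
    {
        var context = new MarkerContext();
        SynchronizationContext? original = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        Task<bool> withCapture, withoutCapture;
        try
        {
            withCapture = ResumedOnContextAsync(true);
            withoutCapture = ResumedOnContextAsync(false);
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(original);
        }
        // Blocking is safe here: neither continuation needs this thread.
        return (withCapture.Result, withoutCapture.Result, context.Posts);
    }
}
```

Run() should report that only the default await resumed on the context, and that exactly one continuation was posted to it. On a real UI context, that posted continuation is work landing back on the UI thread.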

If you're interested in async performance at this level, you should check out Stephen Toub's video and MSDN article on the subject. He has tons of useful information.

Solution 2:

It is appropriate to await any operation that is asynchronous (i.e. is represented by a Task).

The key point is that for IO operations, whenever possible, you want to use a provided method that is, at its very core, asynchronous, rather than using Task.Run on a blocking synchronous method. If you're blocking a thread (even a thread pool thread) while performing IO, you're not leveraging the real power of the await model.
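As a sketch of the difference, here are two ways to read a file (FileReading and both method names are hypothetical). The first uses a FileStream opened with useAsync: true and awaits ReadToEndAsync, which can use the OS's asynchronous I/O where it is supported (for example overlapped I/O on Windows). The second just wraps the blocking File.ReadAllText in Task.Run, which frees the caller but holds a thread pool thread for the whole read.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class FileReading
{
    // Preferred: an API that is asynchronous at its core.
    public static async Task<string> ReadTextAsync(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize: 4096, useAsync: true))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return await reader.ReadToEndAsync().ConfigureAwait(false);
        }
    }

    // Works, but only moves the blocking: a pool thread waits on the disk.
    public static Task<string> ReadTextBlockingOnPoolAsync(string path)
    {
        return Task.Run(() => File.ReadAllText(path, Encoding.UTF8));
    }
}
```

Both return the same Task<string> to the caller; the difference is only in how many threads sit idle while the I/O is in flight.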

Once you have created a Task that represents your operation, you no longer care whether it's CPU- or IO-bound. To the caller, it's just some async operation that needs to be awaited.
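To illustrate, this sketch starts one CPU-bound task (prime counting via Task.Run) and one task standing in for I/O (Task.Delay, used here as a hypothetical placeholder for a network call), and the caller awaits both in exactly the same way. UniformAwait and its methods are made-up names.

```csharp
using System.Threading.Tasks;

public static class UniformAwait
{
    // CPU-bound work, offloaded to the thread pool.
    static Task<int> CountPrimesBelowAsync(int limit) => Task.Run(() =>
    {
        int count = 0;
        for (int n = 2; n < limit; n++)
        {
            bool prime = true;
            for (int d = 2; d * d <= n; d++)
                if (n % d == 0) { prime = false; break; }
            if (prime) count++;
        }
        return count;
    });

    // Stand-in for I/O: Task.Delay completes from a timer without occupying a thread.
    static async Task<string> FetchGreetingAsync()
    {
        await Task.Delay(20).ConfigureAwait(false);
        return "hello";
    }

    // The caller does not care which kind of work each task represents.
    public static async Task<string> CombineAsync()
    {
        Task<int> primes = CountPrimesBelowAsync(100);
        Task<string> greeting = FetchGreetingAsync();
        await Task.WhenAll(primes, greeting).ConfigureAwait(false);
        return $"{greeting.Result}: {primes.Result}";
    }
}
```

Reading .Result after Task.WhenAll is safe because both tasks have already completed at that point.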

Solution 3:

There are several things to consider:

  • In a GUI application, you want as little code as possible to execute on the UI thread. In that case, offloading a CPU-bound operation to another thread using Task.Run() is probably a good idea. That said, callers of your code could also do that themselves if they wanted to.
  • In something like an ASP.NET application, there is no UI thread and all you care about is performance. In that case, there is some overhead in using Task.Run() instead of running the code directly, but it shouldn't be significant if the operation actually takes some time. (There is also some overhead in returning to the synchronization context, which is one more reason why you should use ConfigureAwait(false) for most awaits in your library code.)
  • If your method is async (which, by the way, should also be reflected in the method's name, e.g. LoadPageAsync, not just in its return type), people will expect that it won't block the synchronization context thread, even for CPU-bound work.

Weighing all that, I think using await Task.Run() is the right choice here. It does have some overhead, but it also has some advantages, which can be significant.
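Putting the answers' advice together, here is one possible shape of the method. It is a sketch under a few assumptions: the parser is passed in as a delegate (in the question it would be LoadHtmlDocument), the HttpClient is supplied by the caller so one instance can be reused instead of calling new HttpClient() per request, and EnsureSuccessStatusCode is an extra check the original did not have. FixedHtmlHandler and ToyParser are hypothetical test doubles so the example runs without a network.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class PageLoader
{
    // Async suffix, ConfigureAwait(false) on every await, Task.Run for the parse.
    public static async Task<T> LoadPageAsync<T>(HttpClient client, Uri address, Func<Stream, T> parse)
    {
        using (HttpResponseMessage response = await client.GetAsync(address).ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode();
            using (Stream content = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
            {
                // CPU-bound: keep the parse off the caller's context even if
                // both awaits above completed synchronously.
                return await Task.Run(() => parse(content)).ConfigureAwait(false);
            }
        }
    }
}

// Test double: answers every request with a fixed HTML body.
public sealed class FixedHtmlHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("<html><body><p>a</p><p>b</p></body></html>", Encoding.UTF8, "text/html")
        };
        return Task.FromResult(response);
    }
}

// Hypothetical stand-in for LoadHtmlDocument: counts "<p>" occurrences.
public static class ToyParser
{
    public static int CountParagraphs(Stream stream)
    {
        using (var reader = new StreamReader(stream))
        {
            string html = reader.ReadToEnd();
            int count = 0, index = 0;
            while ((index = html.IndexOf("<p>", index, StringComparison.Ordinal)) >= 0)
            {
                count++;
                index += 3;
            }
            return count;
        }
    }
}
```

With a real parser you would call something like `await PageLoader.LoadPageAsync(sharedClient, uri, LoadHtmlDocument)`; whether the Task.Run is worth its overhead still depends on how long the parse takes, as discussed above.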