How do I implement an async I/O bound operation from scratch?
I'm trying to understand how and when to use async programming and got to I/O bound operations, but I don't understand them. I want to implement them from scratch. How can I do that?
Consider the example below which is synchronous:
private void DownloadBigImage() {
    var url = "https://cosmos-magazine.imgix.net/file/spina/photo/14402/180322-Steve-Full.jpg";
    new WebClient().DownloadFile(url, "image.jpg");
}
How do I implement the async version given only the normal synchronous method DownloadBigImage, without using Task.Run? Task.Run would use a thread from the thread pool only for waiting, which is just wasteful!
Also, do not use the special method that's already async! This is the purpose of this question: how do I make it myself without relying on methods which are already async? So, NO things like:
await new WebClient().DownloadFileTaskAsync(url, "image.jpg");
Examples and documentation available are very lacking in this regard. I found only this: https://docs.microsoft.com/en-us/dotnet/standard/async-in-depth which says:
The call to GetStringAsync() calls through lower-level .NET libraries (perhaps calling other async methods) until it reaches a P/Invoke interop call into a native networking library. The native library may subsequently call into a System API call (such as write() to a socket on Linux). A task object will be created at the native/managed boundary, possibly using TaskCompletionSource. The task object will be passed up through the layers, possibly operated on or directly returned, eventually returned to the initial caller.
Basically I have to use a "P/Invoke interop call into a native networking library"... but how?
I think this is a very interesting question and a fun learning exercise.
Fundamentally, you cannot use any existing API that is synchronous. Once it's synchronous there is no way to turn it truly asynchronous. You correctly identified that Task.Run and its equivalents are not a solution.
If you refuse to call any async .NET API then you need to use P/Invoke to call native APIs. This means that you need to call the WinHTTP API or use sockets directly. This is possible but I don't have the experience to guide you.
Alternatively, you can use async managed sockets to implement an async HTTP download.
Start with the synchronous code (this is a raw sketch):
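using (var s = new Socket(...))
{
    s.Connect(...);                 // blocks until the connection is established
    s.Send(GetHttpRequestBytes());  // blocks while the request is sent
    // blocks until the end of the stream, i.e. until the server closes the connection
    var response = new StreamReader(new NetworkStream(s)).ReadToEnd();
}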
This very roughly gets you an HTTP response as a string.
You can easily make this truly async by using await.
using (var s = new Socket(...))
{
    await s.ConnectAsync(...);
    await s.SendAsync(GetHttpRequestBytes());
    var response = await new StreamReader(new NetworkStream(s)).ReadToEndAsync();
}
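To make the sketch above concrete, here is one way it might be filled in. This is a minimal sketch under several assumptions, not a complete HTTP client: the class name RawHttp, the method GetAsync and the parameters of GetHttpRequestBytes are made up for illustration, the request is a bare HTTP/1.0 GET so that the server closes the connection when it is done, and there is no TLS, so it would not work against an https URL as-is (that would need something like SslStream on top).

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class RawHttp
{
    // Hypothetical helper: builds a minimal HTTP/1.0 GET request.
    // HTTP/1.0 plus "Connection: close" makes the server close the
    // connection after the response, so reading to end-of-stream terminates.
    public static byte[] GetHttpRequestBytes(string host, string path)
    {
        var request = "GET " + path + " HTTP/1.0\r\n" +
                      "Host: " + host + "\r\n" +
                      "Connection: close\r\n\r\n";
        return Encoding.ASCII.GetBytes(request);
    }

    // Returns the raw response (status line, headers and body) as a string.
    public static async Task<string> GetAsync(string host, int port, string path)
    {
        using (var s = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            await s.ConnectAsync(host, port);

            // SendAsync may send fewer bytes than requested, so loop until done.
            var request = GetHttpRequestBytes(host, path);
            int sent = 0;
            while (sent < request.Length)
            {
                var rest = new ArraySegment<byte>(request, sent, request.Length - sent);
                sent += await s.SendAsync(rest, SocketFlags.None);
            }

            // ownsSocket: false because the outer using already disposes the socket.
            using (var reader = new StreamReader(new NetworkStream(s, ownsSocket: false), Encoding.ASCII))
            {
                return await reader.ReadToEndAsync();
            }
        }
    }
}
```

A caller would use it as `var response = await RawHttp.GetAsync("example.com", 80, "/");`. No thread is blocked while the connect, send and receive are in flight.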
If you consider await cheating with respect to your exercise goals you would need to write this using callbacks. This is awful so I'm just going to write the connect part:
var s = new Socket(...);
s.BeginConnect(..., ar => {
    s.EndConnect(ar); // completes the operation; throws here if the connect failed
    //perform next steps here
}, null);
Again, this code is very raw but it shows the principle. Instead of waiting for an IO to complete (which happens implicitly inside of Connect) you register a callback that is called when the IO is done. That way your main thread continues to run. This turns your code into spaghetti.
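The callback style can be bridged back to a Task with TaskCompletionSource, which is the mechanism the quoted documentation mentions. Below is a minimal sketch of that idea for the connect step only; the names SocketTaskAdapters and ConnectTaskAsync are made up for this example. The callback completes or faults the task, and no thread waits in between.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class SocketTaskAdapters
{
    // Hypothetical helper: exposes BeginConnect/EndConnect as a Task.
    public static Task ConnectTaskAsync(Socket socket, EndPoint endPoint)
    {
        var tcs = new TaskCompletionSource<bool>();
        try
        {
            socket.BeginConnect(endPoint, ar =>
            {
                try
                {
                    // EndConnect rethrows any error from the connect attempt.
                    socket.EndConnect(ar);
                    tcs.TrySetResult(true);
                }
                catch (Exception ex)
                {
                    // Route the failure into the task instead of losing it
                    // on whatever thread happened to run the callback.
                    tcs.TrySetException(ex);
                }
            }, null);
        }
        catch (Exception ex)
        {
            // BeginConnect itself can throw (for example on a disposed socket).
            tcs.TrySetException(ex);
        }
        return tcs.Task;
    }
}
```

With this in place, `await SocketTaskAdapters.ConnectTaskAsync(s, endPoint);` behaves like an async connect. The framework's Task.Factory.FromAsync wraps Begin/End pairs in a similar way.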
You need to write safe disposal with callbacks. This is a problem because exception handling cannot span callbacks. Also, you likely need to write a read loop if you don't want to rely on the framework to do that. Async loops can be mind bending.
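To illustrate what such a callback-based read loop with manual disposal might look like, here is a rough sketch. The names CallbackReader and ReadToEndAsync are invented for this example, and the choice to dispose the socket inside the helper is an assumption about ownership. The callback re-arms itself until the peer closes the connection, and every exit path has to dispose the socket and complete the task by hand, since a using block or a single try/catch cannot span the callbacks.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class CallbackReader
{
    // Hypothetical helper: reads from a connected socket until the peer
    // closes the connection, then disposes the socket and returns the bytes.
    public static Task<byte[]> ReadToEndAsync(Socket socket)
    {
        var tcs = new TaskCompletionSource<byte[]>();
        var buffer = new byte[8192];
        var received = new MemoryStream();

        AsyncCallback onReceive = null;
        onReceive = ar =>
        {
            try
            {
                int n = socket.EndReceive(ar);
                if (n == 0)
                {
                    // Zero bytes means the peer closed the connection: done.
                    socket.Dispose();
                    tcs.TrySetResult(received.ToArray());
                    return;
                }
                received.Write(buffer, 0, n);

                // The "loop": issue the next read from inside the callback.
                // Note: if reads complete synchronously this can recurse; a
                // hardened version would check ar.CompletedSynchronously.
                socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, onReceive, null);
            }
            catch (Exception ex)
            {
                // Manual cleanup on the error path.
                socket.Dispose();
                tcs.TrySetException(ex);
            }
        };

        try
        {
            socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, onReceive, null);
        }
        catch (Exception ex)
        {
            socket.Dispose();
            tcs.TrySetException(ex);
        }
        return tcs.Task;
    }
}
```

Compare this with the await version, where the compiler generates the equivalent state machine and an ordinary while loop and using block do the same job.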
This is a great question which really isn't explained well in most texts about C# and async.
I searched for this for ages thinking I could and should maybe be implementing my own async I/O methods. If a method/library I was using didn't have async methods I thought I should somehow wrap these functions in code that made them asynchronous. It turns out that this isn't really feasible for most programmers. Yes, you can spawn a new thread using new Thread(() => {...}).Start() and that does make your code asynchronous, but it also creates a new thread, which is expensive overhead for an asynchronous operation. It can certainly free up your UI thread to ensure your app stays responsive, but it doesn't create a truly async operation the way that HttpClient.GetAsync() is a truly asynchronous operation.
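For contrast, the thread-per-operation approach might look like the sketch below. The names DedicatedThread and Run are invented for illustration. The caller gets a Task back and stays responsive, but a whole thread still sits blocked inside the synchronous call for the entire duration, which is exactly the cost described above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DedicatedThread
{
    // Hypothetical helper: runs a blocking function on its own thread and
    // reports the outcome through a Task. Asynchronous for the caller,
    // but not truly async I/O: one thread is occupied while it waits.
    public static Task<T> Run<T>(Func<T> blockingWork)
    {
        var tcs = new TaskCompletionSource<T>();
        var thread = new Thread(() =>
        {
            try
            {
                tcs.SetResult(blockingWork());
            }
            catch (Exception ex)
            {
                tcs.SetException(ex);
            }
        });
        thread.IsBackground = true; // do not keep the process alive for this thread
        thread.Start();
        return tcs.Task;
    }
}
```

Something like `await DedicatedThread.Run(() => { DownloadBigImage(); return true; });` would keep a UI responsive, at the price of one blocked thread per download.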
This is because async methods in the .NET libraries use something called the "standard P/Invoke asynchronous I/O system in .NET" to call low-level OS code that doesn't require a dedicated thread while the I/O (networking or storage) is in progress. No thread is tied up waiting for the operation; the OS signals the .NET runtime when the operation is done.
I'm not familiar with the details but this knowledge is enough to free me from trying to implement async I/O and make me focus on using the async methods already present in the .NET libraries (such as HttpClient.GetAsync()). More interesting info can be found in Microsoft's async deep dive, and there is a nice description by Stephen Cleary.