Cleanest way to write retry logic?

Solution 1:

Blanket catch statements that simply retry the same call can be dangerous if used as a general exception handling mechanism. Having said that, here's a lambda-based retry wrapper that you can use with any method. I chose to factor the number of retries and the retry timeout out as parameters for a bit more flexibility:

public static class Retry
{
    public static void Do(
        Action action,
        TimeSpan retryInterval,
        int maxAttemptCount = 3)
    {
        Do<object>(() =>
        {
            action();
            return null;
        }, retryInterval, maxAttemptCount);
    }

    public static T Do<T>(
        Func<T> action,
        TimeSpan retryInterval,
        int maxAttemptCount = 3)
    {
        var exceptions = new List<Exception>();

        for (int attempted = 0; attempted < maxAttemptCount; attempted++)
        {
            try
            {
                if (attempted > 0)
                {
                    Thread.Sleep(retryInterval);
                }
                return action();
            }
            catch (Exception ex)
            {
                exceptions.Add(ex);
            }
        }
        throw new AggregateException(exceptions);
    }
}

You can now use this utility method to perform retry logic:

Retry.Do(() => SomeFunctionThatCanFail(), TimeSpan.FromSeconds(1));

or:

Retry.Do(SomeFunctionThatCanFail, TimeSpan.FromSeconds(1));

or:

int result = Retry.Do(SomeFunctionWhichReturnsInt, TimeSpan.FromSeconds(1), 4);

Or you could even make an async overload.

Solution 2:

You should try Polly. It's a .NET library written by me that allows developers to express transient exception handling policies such as Retry, Retry Forever, Wait and Retry or Circuit Breaker in a fluent manner.

Example

Policy
    .Handle<SqlException>(ex => ex.Number == 1205)
    .Or<ArgumentException>(ex => ex.ParamName == "example")
    .WaitAndRetry(3, retryAttempt => TimeSpan.FromSeconds(3))
    .Execute(() => DoSomething());

Solution 3:

public void TryThreeTimes(Action action)
{
    var tries = 3;
    while (true) {
        try {
            action();
            break; // success!
        } catch {
            if (--tries == 0)
                throw;
            Thread.Sleep(1000);
        }
    }
}

Then you would call:

TryThreeTimes(DoSomething);

...or alternatively...

TryThreeTimes(() => DoSomethingElse(withLocalVariable));

A more flexible option:

public void DoWithRetry(Action action, TimeSpan sleepPeriod, int tryCount = 3)
{
    if (tryCount <= 0)
        throw new ArgumentOutOfRangeException(nameof(tryCount));

    while (true) {
        try {
            action();
            break; // success!
        } catch {
            if (--tryCount == 0)
                throw;
            Thread.Sleep(sleepPeriod);
        }
   }
}

To be used as:

DoWithRetry(DoSomething, TimeSpan.FromSeconds(2), tryCount: 10);

A more modern version with support for async/await:

public async Task DoWithRetryAsync(Func<Task> action, TimeSpan sleepPeriod, int tryCount = 3)
{
    if (tryCount <= 0)
        throw new ArgumentOutOfRangeException(nameof(tryCount));

    while (true) {
        try {
            await action();
            return; // success!
        } catch {
            if (--tryCount == 0)
                throw;
            await Task.Delay(sleepPeriod);
        }
   }
}

To be used as:

await DoWithRetryAsync(DoSomethingAsync, TimeSpan.FromSeconds(2), tryCount: 10);

Solution 4:

This is possibly a bad idea. First, it is emblematic of the maxim "the definition of insanity is doing the same thing twice and expecting different results each time". Second, this coding pattern does not compose well with itself. For example:

Suppose your network hardware layer resends a packet three times on failure, waiting, say, a second between failures.

Now suppose the software layer resends a notification about a failure three times on packet failure.

Now suppose the notification layer reactivates the notification three times on a notification delivery failure.

Now suppose the error reporting layer reactivates the notification layer three times on a notification failure.

And now suppose the web server reactivates the error reporting three times on error failure.

And now suppose the web client resends the request three times upon getting an error from the server.

Now suppose the line on the network switch that is supposed to route the notification to the administrator is unplugged. When does the user of the web client finally get their error message? I make it at about twelve minutes later.

Lest you think this is just a silly example: we have seen this bug in customer code, though far, far worse than I've described here. In the particular customer code, the gap between the error condition happening and it finally being reported to the user was several weeks because so many layers were automatically retrying with waits. Just imagine what would happen if there were ten retries instead of three.

Usually the right thing to do with an error condition is report it immediately and let the user decide what to do. If the user wants to create a policy of automatic retries, let them create that policy at the appropriate level in the software abstraction.

Solution 5:

The Transient Fault Handling Application Block provides an extensible collection of retry strategies including:

  • Incremental
  • Fixed interval
  • Exponential back-off

It also includes a collection of error detection strategies for cloud-based services.

For more information see this chapter of the Developer's Guide.

Available via NuGet (search for 'topaz').