Return DataReader from DataLayer in Using statement

We have a lot of data layer code that follows this very general pattern:

public DataTable GetSomeData(string filter)
{
    string sql = "SELECT * FROM [SomeTable] WHERE SomeColumn = @Filter";

    DataTable result = new DataTable();
    using (SqlConnection cn = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;
        cn.Open();

        result.Load(cmd.ExecuteReader());
    }
    return result;
}

I think we can do a little better. My main complaint right now is that it forces all the records to be loaded into memory, even for large sets. I'd like to take advantage of a DataReader's ability to keep only one record in RAM at a time, but if I return the DataReader directly, the connection is cut off when leaving the using block.

How can I improve this to allow returning one row at a time?
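
To make the problem concrete, the broken version I'm describing is essentially this sketch; it compiles, but the returned reader is useless:

public SqlDataReader GetSomeDataBroken(string filter)
{
    string sql = "SELECT * FROM [SomeTable] WHERE SomeColumn = @Filter";

    using (SqlConnection cn = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;
        cn.Open();
        return cmd.ExecuteReader();
    } // the using block disposes the connection here, so the reader
      // throws as soon as the caller tries to Read()
}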


Once again, the act of composing my thoughts for the question reveals the answer. Specifically, the last sentence where I wrote "one row at a time". I realized I don't really care that it's a datareader, as long as I can enumerate it row by row. That led me to this:

public IEnumerable<IDataRecord> GetSomeData(string filter)
{
    string sql = "SELECT * FROM [SomeTable] WHERE SomeColumn = @Filter";

    using (SqlConnection cn = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;
        cn.Open();

        using (IDataReader rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
            {
                yield return (IDataRecord)rdr;
            }
        }
    }
}

This will work even better once we move to 3.5 and can start using other linq operators on the results, and I like it because it sets us up to start thinking in terms of a "pipeline" between each layer for queries that return a lot of results.
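
For example, once we're on 3.5, a caller could stream rows through a pipeline like this (the Name column here is hypothetical):

var names = GetSomeData("some filter")
    .Where(r => !r.IsDBNull(r.GetOrdinal("Name")))
    .Select(r => r.GetString(r.GetOrdinal("Name")))
    .Take(10)
    .ToList();

Only one record is in memory at a time, and because everything is deferred, Take(10) means the reader stops after ten matching rows instead of draining the whole result set.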

The downside is that it will be awkward for queries that return more than one result set, but that is exceedingly rare.

Update
Since I first started playing with this pattern in 2009, I have learned that it's best to make the return type a generic IEnumerable<T> and add a Func<IDataRecord, T> parameter to convert the reader state to a business object inside the loop. Otherwise the lazy iteration can bite you: every yielded item is the same underlying reader object, so without the projection you can end up seeing the last row of the query every time.
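
A sketch of that updated version (Employee in the usage line is a placeholder type):

public IEnumerable<T> GetSomeData<T>(string filter, Func<IDataRecord, T> selector)
{
    string sql = "SELECT * FROM [SomeTable] WHERE SomeColumn = @Filter";

    using (SqlConnection cn = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;
        cn.Open();

        using (IDataReader rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
            {
                // project while the reader is positioned on this row,
                // so callers never hold a reference to shared reader state
                yield return selector(rdr);
            }
        }
    }
}

Called as, for example:

var employees = GetSomeData("foo", r => new Employee(r.GetInt32(0), r.GetString(1)));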


What you want is a supported pattern; you'll have to use

cmd.ExecuteReader(CommandBehavior.CloseConnection);

and remove both using()'s from your GetSomeData() method. Exception safety has to be supplied by the caller, by guaranteeing a Close on the reader (with CommandBehavior.CloseConnection, closing the reader also closes the underlying connection).
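
A sketch of what that looks like, reusing the question's placeholder schema:

public SqlDataReader GetSomeData(string filter)
{
    string sql = "SELECT * FROM [SomeTable] WHERE SomeColumn = @Filter";

    SqlConnection cn = new SqlConnection(GetConnectionString()); // no using here
    SqlCommand cmd = new SqlCommand(sql, cn);
    cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;
    cn.Open();

    // CloseConnection ties the connection's lifetime to the reader's
    return cmd.ExecuteReader(CommandBehavior.CloseConnection);
}

The caller then supplies the exception safety:

using (SqlDataReader rdr = GetSomeData("foo"))
{
    while (rdr.Read())
    {
        // process one row at a time
    }
} // disposing the reader closes it, which closes the connection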


In times like these I find that lambdas can be of great use. Consider this: instead of the data layer giving us the data, let us give the data layer our data-processing method:

public void GetSomeData(string filter, Action<IDataReader> processor)
{
    ...

    using (IDataReader reader = cmd.ExecuteReader())
    {
        processor(reader);
    }
}

Then the business layer would call it:

GetSomeData("my filter", (IDataReader reader) => 
    {
        while (reader.Read())
        {
            ...
        }
    });

The key is the yield keyword.

Similar to Joel's original answer, but a little more fleshed out:

public IEnumerable<S> Get<S>(string query, Action<IDbCommand> parameterizer, 
                             Func<IDataRecord, S> selector)
{
    using (var conn = new T()) // T: your connection type, a parameter on the enclosing class with where T : IDbConnection, new()
    {
        using (var cmd = conn.CreateCommand())
        {
            if (parameterizer != null)
                parameterizer(cmd);
            cmd.CommandText = query;
            cmd.Connection.ConnectionString = _connectionString;
            cmd.Connection.Open();
            using (var r = cmd.ExecuteReader())
                while (r.Read())
                    yield return selector(r);
        }
    }
}

And I have this extension method:

public static void Parameterize(this IDbCommand command, string name, object value)
{
    var parameter = command.CreateParameter();
    parameter.ParameterName = name;
    parameter.Value = value;
    command.Parameters.Add(parameter);
}

So I call:

foreach(var user in Get(query, cmd => cmd.Parameterize("saved", 1), userSelector))
{

}

This is fully generic and fits any model that complies with the ADO.NET interfaces. The connection and reader objects are disposed after the collection is enumerated.

As an aside, if you do need the whole result in memory, filling a DataTable using an IDataAdapter's Fill method can be faster than DataTable.Load.
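
A sketch of the adapter variant, again with the question's placeholder schema:

public DataTable GetSomeDataTable(string filter)
{
    string sql = "SELECT * FROM [SomeTable] WHERE SomeColumn = @Filter";

    DataTable result = new DataTable();
    using (SqlConnection cn = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    using (SqlDataAdapter da = new SqlDataAdapter(cmd))
    {
        cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;
        da.Fill(result); // Fill opens and closes the connection itself
    }
    return result;
}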