Database Best-Practices for Beginners

So, I am a fairly new programmer working towards an undergraduate Comp Sci degree with a very small amount of work experience. In looking for internship-type jobs for my program, I have noticed that what I've heard from several profs -- "working with databases makes up 90% of all modern computer science jobs" -- looks like it is actually true. However, my program doesn't really have any courses with databases until 3rd year, so I'm trying to at least learn some things myself in the mean time.

I've seen very little on SO and the internet in general for somebody like myself. There seem to be tons of tutorials on the mechanics of how to read and write data in a database, but little on the associated best practices. To demonstrate what I am talking about, and to help get across my actual question, here is what can easily be found on the internet:

public static void Main ()
{
    using (var conn = new OdbcConnection())
    {
        var command = new OdbcCommand();
        command.Connection = conn;
        command.CommandText = "SELECT * FROM Customer WHERE id = 1";
        var dbAdapter = new OdbcDataAdapter();
        dbAdapter.SelectCommand = command;
        var results = new DataTable();
        dbAdapter.Fill(results);
    }

    // then you would do something like
    string customerName = (string) results.Rows[0]["name"]; 
}

And so forth. This is pretty simple to understand but obviously full of problems. I started out with code like this and quickly started saying things like "well it seems dumb to just have SQL all over the place, I should put all that in a constants file." And then I realized that it was silly to have those same lines of code all over the place and just put all that stuff with connection objects etc inside a method:

public DataTable GetTableFromDB (string sql)
{
    // code similar to first sample
}    

string getCustomerSql = String.Format(Constants.SelectAllFromCustomer, customerId);
DataTable customer = GetTableFromDB(getCustomerSql);
string customerName = (string) customer.Rows[0]["name"];

This seemed to be a big improvement. Now it's super-easy to, say, change from an OdbcConnection to an SQLiteConnection. But that last line, accessing the data, still seemed awkward; and it is still a pain to change a field name (like going from "name" to "CustName" or something). I started reading about using typed Data sets or custom business objects. I'm still kind of confused by all the terminology, but decided to look into it anyway. I figure that it is stupid to rely on a shiny Database Wizard to do all this stuff for me (like in the linked articles) before I actually learn what is going on, and why. So I took a stab at it myself and started getting things like:

public class Customer
{
    public string Name {get; set;}
    public int Id {get; set;}

    public void Populate ()
    {
        string getCustomerSql = String.Format(Constants.SelectAllFromCustomer, this.Id);
        DataTable customer = GetTableFromDB(getCustomerSql);
        this.Name = (string) customer.Rows[0]["name"]; 
    }

    public static IEnumerable<Customer> GetAll()
    {
        foreach ( ... ) { 
            // blah blah
            yield return customer;
        }
    }
}

to hide the ugly table stuff and provide some strong typing, allowing outside code to just do things like

var customer = new Customer(custId);
customer.Populate();
string customerName = customer.Name;

which is really nice. And if the Customer table changes, changes in the code only need to happen in one place: inside the Customer class.

So, at the end of all this rambling, my question is this. Has my slow evolution of database code been going in the right direction? And where do I go next? This style is all well and good for small-ish databases, but when there are tons of different tables, writing out all those classes for each one would be a pain. I have heard about software that can generate that type of code for you, but am kind of still confused by the DAL/ORM/LINQ2SQL/etc jargon and those huge pieces of software are kind of overwhelming. I'm looking for some good not-overwhelmingly-complex resources that can point me in the right direction. All I can find on this topic are complex articles that go way over my head, or articles that just show you how to use the point-and-click wizards in Visual Studio and such. Also note that I'm looking for information on working with Databases in code, not information on Database design/normalization...there's lots of good material on that out there.

Thanks for reading this giant wall of text.


Very good question indeed and you are certainly on the right track!

Being a computer engineer myself, databases and how to write code to interact with databases was also never a big part of my university degree and sure enough I'm responsible for all the database code at work.

Here's my experience, using legacy technology from the the early 90s on one project and modern technology with C# and WPF on another.

I'll do my best to explain terminology as I go but I'm certainly not an expert myself yet.

Tables, Objects, and Mappings Oh My!

A database contains tables but what really is that? It's just flat data related to other flat data and if you dive in and start grabbing things its going to get messy quickly! Strings will be all over the place, SQL statements repeated, records loaded twice, etc... It's therefore generally a good practice to represent each table record ( or collection of tables records depending on their relationships ) as an single object, generally referred to as a Model. This helps to encapsulate the data and provide functionality for maintaining and updating its state.

In your posting your Customer class would act as the Model! So you've already realized that benefit.

Now there are a variety of tools/frameworks (LINQ2SQL, dotConnect, Mindscape LightSpeed) that will write all your Model code for you. In the end they are mapping objects to relational tables or O/R mapping as they refer to it.

As expected when your database changes so do your O/R mappings. Like you touched on, if your Customer changes, you have to fix it in one place, again why we put things in classes. In the case of my legacy project, updating models consumed a lot of time because their were so many, while in my newer project it's a few clicks BUT ultimately the result is the same.

Who should know what?

In my two projects there has been two different ways of how objects interact with their tables.

In some camps, Models should know everything about their tables, how to save themselves, have direct shared access to the connection/session and can perform actions like Customer.Delete() and Customer.Save() all by themselves.

Other camps, put reading, writing, deleting, logic in a managing class. For example, MySessionManager.Save( myCustomer ). This methodology has the advantage of being able to easily implement change tracking on objects and ensuring all objects reference the same underlying table record. Implementing it however is more complex than the previously mention method of localized class/table logic.

Conclusion

You're on the right track and in my opinion interacting with databases is extremely rewarding. I can remember my head spinning when I first started doing research myself.

I would recommend experimenting a bit, start a small project maybe a simple invoicing system, and try writing the models yourself. After that try another small project and try leveraging a database O/R mapping tool and see the difference.


Your evolution is definitely in the right direction. A few more things to consider:

  • Use prepared statements versus String.Format to bind your parameters. This will protect you from SQL injection attacks.
  • Use the DBProviderFactory and System.Data.Common inferfaces to further disconnect your implementation from a specific database.
  • After that, look at methods to generate your SQL commands and map data into objects automatically. If you don't want to jump into a big complex ORM, look for simple examples: ADO.NET ORM in 10 minutes, Light ORM library, or Creating an ORM in .NET. If you decide to go this route, you'll ultimately be better served by a mature library like the Entity Framework, Hibernate, or SubSonic.

My advice if you want to learn about databases, the first step is forget about the programming language, next, forget about which database you are using and learn SQL. Sure there are many differences between mySQL, MS SQLserver and Oracle but there is so much that is the same.

Learn about joins, select as, date formats, normalization. Learn what happens when you have millions and millions of records and things start to slow down, then learn to fix it.

Create a test project related to something that interests you, for example a bike store. See what happens when you add a few million products, and a few million customers and think of all the ways the data needs to relate.

Use a desktop app for running queries on a local database (sequel pro, mysql workbench etc) as it's much quicker than uploading source code to a server. And have fun with it!


IMHO, you're definitely going in the right direction for really nice to work with maintainable code! However I'm not convinced the approach will scale to a real app. A few thoughts that may be helpful

  1. While the code you're writing will be really nice to work with and really maintainable, it involves a lot of work up-front, this is part of the reason the wizards are so popular. They aren;t the nicest thing to work with, but save a lot of time.
  2. Querying from the database is just the beginning; another reason for the use of typed datasets and wizards in general is that in most applications, users are at some stage going to edit your information and send it back for updating. Single records are fine, but what if your data is best represented in a Normalised way with a hierarchy of tables 4 deep? Writing code to auto-generate the update/insert/delete statements by hand for all that call be hellish, so tools are the only way forward. typed DataSets will generate all the code to perform these updates for you and have some very powerful functionality for handling disconnected (e.g. Client-side) updates/rollbacks of recent modifications.
  3. What the last guys said about SQL injection (which is a SERIOUSLY big deal in industry) and protecting yourself by using a DBCommand object and adding DbParameters.

In general there's a really big problem in going from code to databases referred to as an impedance mismatch. Bridging the gap is very tricky and that's why the majority of industry relies on tools to do the heavy lifting. My advice would be to try the wizards out - because while stepping through a wizard is no test in skill, learning all their drawbacks/bugs and their various workarounds is a really useful skill in industry, and will allow you to get to some more advanced scenarios in data management more quickly (e.g. the disconnected update of a 4-deep table hierarchy I mentioned).


If you're a bit scared of things like Linq to SQL and the Entity Framework, you could step half way in between and explore something like iBATIS.NET. It is simply a data mapper tool that takes some of the pain of the database connection management and mapping your result sets to custom domain objects.

You still have to write all of your object classes and SQL, but it maps all of your data to the classes for you using reflection, and you don't have to worry about all of the underlying connectivity (you could easily write a tool to generate your classes). When you're up and running with iBATIS (assuming you might be interested), your code will start to look like this:

var customer = Helpers.Customers.SelectByCustomerID(1);

That SelectByCustomerID function exists inside the Customers mapper, whose definition might look like:

public Customer SelectByCustomerID(int id)
{
    Return Mapper.QueryForObject<Customer>("Customers.SelectByID", id);
}

The "Customers.SelectByID" maps to an XML statement definition where "Customers" is the namespace and "SelectByID" is the ID of the map containing your SQL:

<statements>
    <select id="SelectByID" parameterClass="int" resultClass="Customer">
        SELECT * FROM Customers WHERE ID = #value#
    </select>
</statements>

Or when you want to change a customer you can do things like:

customer.FirstName = "George"
customer.LastName = "Costanza"

Helpers.Customers.Update(customer);

LINQ to SQL and the Entity Framework get fancier by producing the SQL for you automatically. I like iBATIS because I still have full control of the SQL and what my domain objects look like.

Check out iBATIS (now migrated to Google under the name MyBatis.NET). Another great package is NHibernate, which is a few steps ahead of iBATIS and closer to a full ORM.