SQL Database Design Best Practice (Addresses)

Of course I realize that there's no one "right way" to design a SQL database, but I wanted to get some opinions on what is better or worse in my particular scenario.

Currently, I'm designing an order entry module (Windows .NET 4.0 application with SQL Server 2008) and I'm torn between two design decisions when it comes to data that can be applied in more than one spot. In this question I'll refer specifically to Addresses.

Addresses can be used by a variety of objects (orders, customers, employees, shipments, etc..) and they almost always contain the same data (Address1/2/3, City, State, Postal Code, Country, etc). I was originally going to include each of these fields as a column in each of the related tables (e.g. Orders will contain Address1/2/3, City, State, etc.. and Customers will also contain this same column layout). But a part of me wants to apply DRY/Normalization principles to this scenario, i.e. have a table called "Addresses" which is referenced via Foreign Key in the appropriate table.

CREATE TABLE DB.dbo.Addresses
    (
        Id          INT
                    NOT NULL
                    IDENTITY(1, 1)
                    PRIMARY KEY
                    CHECK (Id > 0),

        Address1    VARCHAR(120)
                                NOT NULL,

        Address2    VARCHAR(120),

        Address3    VARCHAR(120),

        City        VARCHAR(100)
                    NOT NULL,

        State       CHAR(2)
                    NOT NULL,

        Country     CHAR(2)
                    NOT NULL,

        PostalCode  VARCHAR(16)
                    NOT NULL
    )

CREATE TABLE DB.dbo.Orders
    (
        Id          INT
                    NOT NULL
                    IDENTITY(1000, 1)
                    PRIMARY KEY
                    CHECK (Id > 1000),

        Address     INT
                    CONSTRAINT fk_Orders_Address
                    FOREIGN KEY REFERENCES Addresses(Id)
                    CHECK (Address > 0)
                    NOT NULL,

        -- other columns....
    )

CREATE TABLE DB.dbo.Customers
    (
        Id          INT
                    NOT NULL
                    IDENTITY(1000, 1)
                    PRIMARY KEY
                    CHECK (Id > 1000),

        Address     INT
                    CONSTRAINT fk_Customers_Address
                    FOREIGN KEY REFERENCES Addresses(Id)
                    CHECK (Address > 0)
                    NOT NULL,

        -- other columns....
    )

From a design standpoint I like this approach because it creates a standard address format that is easily changeable, i.e. if I ever needed to add Address4 I would just add it in one place rather than to every table. However, I can see the number of JOINs required to build queries might get a little insane.

I guess I'm just wondering if any enterprise-level SQL architects out there have ever used this approach successfully, or if the number of JOINs that this creates would create a performance issue?


You're on the right track by breaking address out into its own table. I'd add a couple of additional suggestions.

  1. Consider taking the Address FK columns out of the Customers/Orders tables and creating junction tables instead. In other words, treat Customers/Addresses and Orders/Addresses as many-to-many relationships in your design now so you can easily support multiple addresses in the future. Yes, this means introducing more tables and joins, but the flexibility you gain is well worth the effort.

  2. Consider creating lookup tables for city, state and country entities. The city/state/country columns of the address table then consist of FKs pointing to these lookup tables. This allows you to guarantee consistent spellings across all addresses and gives you a place to store additional metadata (e.g., city population) if needed in the future.


I just have some cautions. For each of these, there's more than one way to fix the problem.

First, normalization doesn't mean "replace text with an id number".

Second, you don't have a key. I know, you have a column declared "PRIMARY KEY", but that's not enough.

insert into Addresses 
  (Address1, Address2, Address3, City, State, Country, PostalCode)
values
  ('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
  ('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
  ('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500'),
  ('President Obama', '1600 Pennsylvania Avenue NW', NULL, 'Washington', 'DC', 'US', '20500');

select * from Addresses;

1;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
2;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
3;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500
4;President Obama;1600 Pennsylvania Avenue NW;;Washington;DC;US;20500

In the absence of any other constraints, your "primary key" identifies a row; it doesn't identify an address. Identifying a row is usually not good enough.

Third, "Address1", "Address2", and "Address3" aren't attributes of addresses. They're attributes of mailing labels. (Lines on a mailing label.) That distinction might not be important to you. It's really important to me.

Fourth, addresses have a lifetime. Between birth and death, they sometimes change. They change when streets get re-routed, buildings get divided, buildings get undivided, and sometimes (I'm pretty sure) when a city employee has a pint too many. Natural disasters can eliminate whole communities. Sometimes buildings get renumbered. In our database, which is tiny compared to most, about 1% per year change like that.

When an address dies, you have to do two things.

  • Make sure nobody uses that address to mail, ship, or whatever.
  • Make sure its death doesn't affect historical data.

When an address itself changes, you have to do two things.

  • Some data must reflect that change. Make sure it does.
  • Some data must not reflect that change. Make sure it doesn't.

Fifth, DRY doesn't apply to foreign keys. Their whole purpose is to be repeated. The only question is how wide a key? An id number is narrow, but requires a join. (10 id numbers might require 10 joins.) An address is wide, but requires no joins. (I'm talking here about a proper address, not a mailing label.)

That's all I can think of off the top of my head.