Best way to model Customer <--> Address

Every Customer has a physical address and an optional mailing address. What is your preferred way to model this?

Option 1. Customer has foreign key to Address

   Customer   (id, phys_address_id, mail_address_id)
   Address    (id, street, city, etc.)

Option 2. Customer has one-to-many relationship to Address, which contains a field to describe the address type

   Customer   (id)
   Address    (id, customer_id, address_type, street, city, etc.)

Option 3. Address information is de-normalized and stored in Customer

   Customer   (id, phys_street, phys_city, etc. mail_street, mail_city, etc.)

One of my overriding goals is to simplify the object-relational mappings, so I'm leaning towards the first approach. What are your thoughts?


Solution 1:

I tend towards first approach for all the usual reasons of normalisation. This approach also makes it easier to perform data cleansing on mailing details.

If you are possibly going to allow multiple addresses (mail, residential, etc) or wish to be able to use effective dates, consider this approach

   Customer   (id, phys_address_id)
   Cust_address_type (cust_id, mail_address_id, address_type, start_date, end_date)
   Address    (id, street, city, etc.)

Solution 2:

One important fact you may need to consider (depending on your problem domain) is that people change addresses, and may want to let you know in advance of their address change; this is certainly true for utility companies, telcos, etc.

In this case you need to have a way to store multiple addresses for the customer with validity dates, so that the address can be set up in advance and automatically switch at the correct point. If this is a requirement, then a variation on (2) is the only sensible way to model it, e.g.

Customer (id, ...)
Address (id, customer_id, address_type, valid_from, valid_to)

On the other hand, if you don't need to cater for this (and you're sure you won't in the future) then probably (1) is simpler to manage because it's much easier to maintain data integrity as there's no issues with ensuring only one address of the same type exists, and the joins become simpler as they're only on one field.

So either (1) or (2) are fine depending on whether you need house-moves, but I'd steer clear of (3) because you're then repeating the definition of what an address is in the table, and you'll have to add multiple columns if you change what an address looks like. It's possibly slightly more performant, but to be honest when you're dealing with properly indexed joins in a relational database there isn't a lot to be gained, and it's likely to be slower in some scenarios where you don't need the address as the record size for a customer will be larger.

Solution 3:

We are moving forward with a model like this:

Person (id, given_name, family_name, title, suffix, birth_date)
Address (id, culture_id, line1, line2, city, state, zipCode, province, postalCode)
AddressType (id, descriptiveName)
PersonAddress (person_id, address_id, addressType_id, activeDates)

Most may consider this excessive. However, an undeniable common theme amongst the apps we develop is that they will have some of these fundamental entities - People, Organizations, Addresses, Phone Numbers, etc.. - and they all want to combine them in different ways. So, we're building in some generalization up-front that we are 100% certain we have use cases for.

The Address table will follow a table-per-hierarchy inheritance scheme to differentiate addresses based on culture; so a United States address will have a state and zip field, but Canadian addresses will have a province and postal code.

We use a separate connecting table to "give" a person an address. This keeps our other entities - Person & Address - free from ties to other entities when our experience is this tends to complicate matters down the road. It also makes it far simpler to connect Address entities to many other types of entities (People, Organizations, etc.) and with different contextual information associated with the link (like activeDates in my example).

Solution 4:

The second option would probably be the way I would go. And on the off-chance it would let users add additional address' (If you wanted to let them do that), that they could switch between at will for shipping and such.

Solution 5:

I'd prefer #1. Good normalization and communicates intent clearly. This model also allows the same address object (row) to be used for both addresses, something I have found to be quite valuable. It's far too easy to get lost in duplicating this information too much.