How should you build your database from source control?
There has been some discussion on the SO community wiki about whether database objects should be version controlled. However, I haven't seen much discussion about the best-practices for creating a build-automation process for database objects.
This has been a contentious point of discussion for my team - particularly since developers and DBAs often have different goals, approaches, and concerns when evaluating the benefits and risks of an automation approach to database deployment.
I would like to hear some ideas from the SO community about what practices have been effective in the real world.
I realize that it is somewhat subjective which practices are really best, but I think a good dialog about what work could be helpful to many folks.
Here are some of my teaser questions about areas of concern in this topic. These are not meant to be a definitive list - rather a starting point for people to help understand what I'm looking for.
- Should both test and production environments be built from source control?
- Should both be built using automation - or should production by built by copying objects from a stable, finalized test environment?
- How do you deal with potential differences between test and production environments in deployment scripts?
- How do you test that the deployment scripts will work as effectively against production as they do in test?
- What types of objects should be version controlled?
- Just code (procedures, packages, triggers, java, etc)?
- Indexes?
- Constraints?
- Table Definitions?
- Table Change Scripts? (eg. ALTER scripts)
- Everything?
- Which types of objects shouldn't be version controlled?
- Sequences?
- Grants?
- User Accounts?
- How should database objects be organized in your SCM repository?
- How do you deal with one-time things like conversion scripts or ALTER scripts?
- How do you deal with retiring objects from the database?
- Who should be responsible for promoting objects from development to test level?
- How do you coordinate changes from multiple developers?
- How do you deal with branching for database objects used by multiple systems?
- What exceptions, if any, can be reasonable made to this process?
- Security issues?
- Data with de-identification concerns?
- Scripts that can't be fully automated?
- How can you make the process resilient and enforceable?
- To developer error?
- To unexpected environmental issues?
- For disaster recovery?
- How do you convince decision makers that the benefits of DB-SCM truly justify the cost?
- Anecdotal evidence?
- Industry research?
- Industry best-practice recommendations?
- Appeals to recognized authorities?
- Cost/Benefit analysis?
- Who should "own" database objects in this model?
- Developers?
- DBAs?
- Data Analysts?
- More than one?
Solution 1:
Here are some some answers to your questions:
- Should both test and production environments be built from source control? YES
- Should both be built using automation - or should production by built by copying objects from a stable, finalized test environment?
- Automation for both. Do NOT copy data between the environments
- How do you deal with potential differences between test and production environments in deployment scripts?
- Use templates, so that actually you would produce different set of scripts for each environment (ex. references to external systems, linked databases, etc)
- How do you test that the deployment scripts will work as effectively against production as they do in test?
- You test them on pre-production environment: test deployment on exact copy of production environment (database and potentially other systems)
- What types of objects should be version controlled?
- Just code (procedures, packages, triggers, java, etc)?
- Indexes?
- Constraints?
- Table Definitions?
- Table Change Scripts? (eg. ALTER scripts)
- Everything?
-
Everything, and:
- Do not forget static data (lookup lists etc), so you do not need to copy ANY data between environments
- Keep only current version of the database scripts (version controlled, of course), and
- Store ALTER scripts: 1 BIG script (or directory of scripts named liked 001_AlterXXX.sql, so that running them in natural sort order will upgrade from version A to B)
- Which types of objects shouldn't be version controlled?
- Sequences?
- Grants?
- User Accounts?
- see 2. If your users/roles (or technical user names) are different between environments, you can still script them using templates (see 1.)
- How should database objects be organized in your SCM repository?
- How do you deal with one-time things like conversion scripts or ALTER scripts?
- see 2.
- How do you deal with retiring objects from the database?
- deleted from DB, removed from source control trunk/tip
- Who should be responsible for promoting objects from development to test level?
- dev/test/release schedule
- How do you coordinate changes from multiple developers?
- try NOT to create a separate database for each developer. you use source-control, right? in this case developers change the database and check-in the scripts. to be completely safe, re-create the database from the scripts during nightly build
- How do you deal with branching for database objects used by multiple systems?
- tough one: try to avoid at all costs.
- What exceptions, if any, can be reasonable made to this process?
- Security issues?
- do not store passwords for test/prod. you may allow it for dev, especially if you have automated daily/nightly DB rebuilds
- Data with de-identification concerns?
- Scripts that can't be fully automated?
- document and store with the release info/ALTER script
- How can you make the process resilient and enforceable?
- To developer error?
- tested with daily build from scratch, and compare the results to the incremental upgrade (from version A to B using ALTER). compare both resulting schema and static data
- To unexpected environmental issues?
- use version control and backups
- compare the PROD database schema to what you think it is, especially before deployment. SuperDuperCool DBA may have fixed a bug that was never in your ticket system :)
- For disaster recovery?
- How do you convince decision makers that the benefits of DB-SCM truly justify the cost?
- Anecdotal evidence?
- Industry research?
- Industry best-practice recommendations?
- Appeals to recognized authorities?
- Cost/Benefit analysis?
- if developers and DBAs agree, you do not need to convince anyone, I think (Unless you need money to buy a software like a dbGhost for MSSQL)
- Who should "own" database objects in this model?
- Developers?
- DBAs?
- Data Analysts?
- More than one?
- Usually DBAs approve the model (before check-in or after as part of code review). They definitely own performance related objects. But in general the team own it [and employer, of course :)]
Solution 2:
I treat the SQL as source-code when possible
If I can write it in standard's compliant SQL then it generally goes in a file in my source control. The file will define as much as possible such as SPs, Table CREATE statements.
I also include dummy data for testing in source control:
- proj/sql/setup_db.sql
- proj/sql/dummy_data.sql
- proj/sql/mssql_specific.sql
- proj/sql/mysql_specific.sql
And then I abstract out all my SQL queries so that I can build the entire project for MySQL, Oracle, MSSQL or anything else.
Build and test automation uses these build-scripts as they are as important as the app source and tests everything from integrity through triggers, procedures and logging.