How to document a database [closed]

Solution 1:

In my experience, ER (or UML) diagrams aren't the most useful artifact - with a large number of tables, diagrams (especially reverse engineered ones) are often a big convoluted mess that nobody learns anything from.

For my money, some good human-readable documentation (perhaps supplemented with diagrams of smaller portions of the system) will give you the most mileage. This will include, for each table:

  • Descriptions of what the table means and how it's functionally used (in the UI, etc.)
  • Descriptions of what each attribute means, if it isn't obvious
  • Explanations of the relationships (foreign keys) from this table to others, and vice-versa
  • Explanations of additional constraints and / or triggers
  • Additional explanation of major views & procs that touch the table, if they're not well documented already

With all of the above, don't document for the sake of documenting - documentation that restates the obvious just gets in people's way. Instead, focus on the stuff that confused you at first, and spend a few minutes writing really clear, concise explanations. That'll help you think it through, and it'll massively help other developers who run into these tables for the first time.

As others have mentioned, there are a wide variety of tools to help you manage this, like Enterprise Architect, Red Gate SQL Doc, and the built-in tools from various vendors. But while tool support is helpful (and even critical, in bigger databases), doing the hard work of understanding and explaining the conceptual model of the database is the real win. From that perspective, you can even do it in a text file (though doing it in Wiki form would allow several people to collaborate on adding to that documentation incrementally - so, every time someone figures out something, they can add it to the growing body of documentation instantly).

Solution 2:

One thing to consider is the COMMENT facility built into the DBMS. If you put comments on all of the tables and all of the columns in the DBMS itself, then your documentation will be inside the database system.

Using the COMMENT facility does not make any changes to the schema itself, it only adds data to the USER_TAB_COMMENTS catalog table.

Solution 3:

We use Enterprise Architect for our DB definitions. We include stored procedures, triggers, and all table definitions defined in UML. The three brilliant features of the program are:

  1. Import UML Diagrams from an ODBC Connection.
  2. Generate SQL Scripts (DDL) for the entire DB at once
  3. Generate Custom Templated Documentation of your DB.

You can edit your class / table definitions within the UML tool, and generate a fully descriptive with pictures included document. The autogenerated document can be in multiple formats including MSWord. We have just less than 100 tables in our schema, and it's quite managable.

I've never been more impressed with any other tool in my 10+ years as a developer. EA supports Oracle, MySQL, SQL Server (multiple versions), PostGreSQL, Interbase, DB2, and Access in one fell swoop. Any time I've had problems, their forums have answered my problems promptly. Highly recommended!!

When DB changes come in, we make then in EA, generate the SQL, and check it into our version control (svn). We use Hudson for building, and it auto-builds the database from scripts when it sees you've modified the checked-in sql.

(Mostly stolen from another answer of mine)

Solution 4:

In our team we came to useful approach to documenting legacy large Oracle and SQL Server databases. We use Dataedo for documenting database schema elements (data dictionary) and creating ERD diagrams. Dataedo comes with documentation repository so all your team can work on documenting and reading recent documentation online. And you don’t need to interfere with database (Oracle comments or SQL Server MS_Description).

First you import schema (all tables, views, stored procedures and functions – with triggers, foreign keys etc.). Then you define logical domains/modules and group all objects (drag & drop) into them to be able to analyze and work on smaller chunks of database. For each module you create an ERD diagram and write top level description. Then, as you discover meaning of tables and views write a short description for each. Do the same for each column. Dataedo enables you to add meaningful title for each object and column – it’s useful if object names are vague or invalid. Pro version enables you to describe foreign keys, unique keys/constraints and triggers – which is useful but not essential to understand a database.

You can access documentation through UI or you can export it to PDF or interactive HTML (the latter is available only in Pro version).

Described here is a continuous process rather than one time job. If your database changes (eg. new columns, views) you should sync your documentation on regular basis (couple clicks with Dataedo).

See sample documentation: http://dataedo.com/download/Dataedo%20repository.pdf

Some guidelines on documentation process:

Diagrams:

  • Keep your diagrams small and readable – just include important tables, relations and columns – only the one that have any meaning to understand big picture – primary/business keys, important attributes and relations,
  • Use different color for key tables in a diagram,
  • You can have more than one diagram per module,
  • You can add diagram to description of most important tables/with most relations.

Descriptions:

  • Don’t document the obvious – don’t write description “Document date” for document.date column. If there’s nothing meaningful to add just leave it blank,
  • If objects stored in tables have types or statuses it’s good to list them in general description of a table,
  • Define format that is expected, eg. “mm/dd/yy” for a date that is stored in text field,
  • List all known/important values an it’s meaning, e.g. for status column could be something like this: “Document status: A – Active, C – Cancelled, D – Deleted”,
  • If there’s any API to a table – a view that should be used to read data and function/procedures to insert/update data – list it in the description of table,
  • Describe where does rows/columns’ values come from (procedure, form, interface etc.) ,
  • Use “[deprecated]” mark (or similar) for columns that should not be used (title column is useful for this, explain which field should be used instead in description field).

Solution 5:

Here is a good post on how to approach the database documentation: http://www.simple-talk.com/sql/database-administration/database-documentation---lands-of-trolls-why-and-how/