Name database design notation you prefer and why?

Extended 11 Dec 10

What does the question mean ?

Before answering the "notation" question, which is the small visible issue on the surface, we need to understand the issues that underpin it, which lead to the surface issues. Otherwise the relevance to the question is reduced to personal preferences; what some product or other provides, whether it is free, etc.

Standards

I assume that the reader has no argument that bridges built for the public out of the public purse (as opposed to hobby bridges built on private land for personal use) need to have standards, in order to ensure that they do not fall over; that they can carry the declared traffic, and that people will not die while using them. The intended standards are first presented by highly acknowledged and recognised engineering experts, and heavily practised and tested by their peers, eventually attaining the status of a Standard, and being declared as such by government standards bodies. The authorities then simply identify the standards that all suppliers must conform to, in order to supply bridges to the government.

It is a waste of time to engage is discussion with non-compliant engineering companies, as what or why their sub-standard offering is worthy of evaluation. Or to look at diagrams that have specific information (demanded by the Standard) missing from them. Or to talk to companies presenting themselves as bridge builders, but who have no civil engineers on the payroll.

Standards are Normalised

Another important aspect of Standards, is that there is no repetition. The bridge building standard only has to refer to the electrical standard for all wiring on the bridge, it does not repeat the content. Standards are Normalised. This allows each Standard to progress and change (eg. as appropriate, due to new building materials becoming available), without affecting other standards or their compliance.

Likewise, Standards do not compete. If one complies with the bridge building standard, there is no danger that one has broken the communications standard.

Standards relate to Higher Principles

Standards are thus the higher principles that every supplier who genuinely supplies quality and performance, will eagerly comply with. And the others will range from reluctant compliance all the way through to ignorance that an applicable standard exists.

Standards are Not a Notation

Third, the Standard is not merely a set of symbols and notations that the diagram must comply with. The unqualified and inexperienced bridge builders may say it is, because of their low level of understanding of the entire set of knowledge required to build faultless bridges, but no. The Standard is always a methodology or set of explicit steps, which are prescribed for the process, which if adhered to, produce the decisions that need to be made at each step, in a progression, so that each decision can be based on previous decisions that have been confirmed, and that can be relied upon. A simple diagram progresses through prescribed standard-compliant stages or phases, with complexity and notations being progressively added, such that the final diagram is compliant.

Standards are Complete

The fourth issue has to do with ensuring that the information conveyed in a diagram is complete and unambiguous. The Standard ensures that the information required is complete. The exercise of the methodology means that ambiguities have been formally identified and resolved. Sub-standard diagrams do not include that requirement of essential Standard information, they are incomplete and contain ambiguities.

Standards are Easy

In addition to the confidence of achieving a certain level of quality, it is actually easier and faster to go through the standard methodology. It is absurd for the non-complaint companies to retro-fit their diagrams by merely supplying standard notation to them. The absence of the prescribed process is visible to any standard-aware person, and the diagram will conflicted (the components s lacking integration).

Responsible, aware customers (government departments, aircraft manufacturers ... companies that expect to be around in the next decade) have reasonable expectations that the software it purchases from suppliers will be of a certain quality; easy to enhance and extend; perform well; will integrate easily with other software; etc.

Lack of Standards in IT

The problem with the IT industry is, unlike the car manufacturing or bridge building industries, since it has exploded in the last 30 years, we have been flooded with providers who are:

  • not educated in IT (do not know the formal steps required for the project)
  • not qualified in IT (have no idea re what is right and wrong)
  • not experienced in IT (they are experienced in what they know)
  • unaware of Standards
  • work in isolation from their qualified peers
  • work in the comfort and ease, the isolation, of a single vendor's "products"
  • have false confidence based on success in isolation
  • Unskilled and Unaware of It

This has been the subject of much research and publications, it is not merely my professional opinion, the credit goes to the academics who did the hard work to sift through and identify exactly what the specific issues are.

So in terms of the whole population of IT providers, IT companies as well as IT employees in non-IT companies, the awareness of quality, of the standards required to provide quality, and their importance, is much lower than it was 30 years ago. It is a 'build and forget' mentality; if it doesn't work, throw it out and build another one. But for the big end of town, the responsible aware customers, that is not acceptable.

Standards are Not Single-Vendor

By definition, Standards are arrived at by multiple vendors, at the international level.

One huge problem in the industry is, in spite of having good vendors who supply good tools for real modelling (the resulting diagrams which comply with Standards), we also have vendors who supply a full range of absurd pictures and non-compliant diagrams. This allows IT people to produce good-looking but completely meaningless diagrams. The point is, these horrible little tools actually give people confidence, that they have built a good bridge, a good database. First they create a bunch of spreadsheets, which they load into a container called a "database" and the software helps them think that they now have a "database"; then they use a tool and reverse-engineer the "database" to produce a "data model". It gives them false confidence; they are unaware that they have a funny picture of a bucket of fish, and they feel offended if anyone points that out. Stick with the tools that are standard-compliant by declaration and certification, and toss those that aren't.

The single-vendor non-standard tools may give one comfort and ease, and a false sense of confidence, in isolation. But they cannot be used if one wants to achieve diagrams that convey all required info; confidence that is gained by acceptance of qualified peers; quality that is derived from a prescribed set of steps; confidence that one does not have to keep on reworking the model.

Conventions are Not Standards

And before anyone points out the the horrible tools are "de facto standards", no they aren't, they are common conventions, with no reference to Standards, and we should not confuse the two by using the word "standard" in reference to them. The way one gets out of bed and makes coffee is a convention, possibly very common, but it is not a "standard". Your common vendor, in their commercial interest, may have commercialised you into believing that there is only one "standard" coffee machine and one "standard" coffee bean, but that bears no relation to the Standards to which the coffee machine manufacturer must comply, or the Standard of coffee bean imported into the country.

There is the mis-quotation that MS is infamous for, comparing the progress of car industry to the "progress" of Microsoft, which the car industry responded to publicly, with justified indignation, and wiped the grin off Bill Gate's face. Sun Microsystems also responded, famously, but I doubt that is known in MS circles. Note that MS credibility is gained by sheer volume: the number of internet sites providing and exchanging ever-changing "information" among MS devotees. They are isolated from genuine qualifications and Standards, and falsely believe that single-vendor conventions, and partial performance, using nice-looking pictures, actually comprise "software".

Standards are Not Expensive

That does not mean you have to buy expensive tools. For small projects, it is quite acceptable to draw diagrams using simple drawing tools, because the compliance-to-standards is in the cranium, in the prescribed methodology, not in the tool, and therefore it is possible for a qualified, standards-aware person to produce standard-compliant drawings (for small projects) with any almost tool. And note that those horrible tools which misrepresent themselves do not provide the standard notation; the vast majority of "data models" and "entity relation diagrams" out there, are grossly sub-standard.

Standards re Relational Databases

Standards or precise definitions or specific guidelines, for the following have existed for over 30 years. In progressive order, each containing the previous:

  1. Normalisation (for any Database)
    (a Standard is not required, it is an engineering principle; common across many disciplines; a simple process for those qualified; and absence of it is easily identified.)

  2. Relational Databases
    The Relational Model: E F Codd & C J Date

  3. The famous Twelve Rules of Relational Compliance
    E F Codd

  4. Relational Data Modelling
    IDEF1X; 1980's; NIST Standard FIPS 184 of 1993

There are many suppliers who have practised these methodologies, as prescribed, thereby conforming to the Standards, for up to 30 years.

  • Note, there is only one Relational Data Modelling Standard, there is no conflict.

  • Note, the notation is just what you see on the surface, the result, however it does identify that other notations have info missing from them, and the underlying methodology was not followed.

  • Note well, that Normalisation pre-dated the Relational Model, and was taken as given; that is why the RM does not have specific references to Normalisation as a principle, and only identifies the Normal Forms, in terms of the requirement for Relational Databases.

If you are genuinely qualified as a Relational Database Modeller, you will be intimately familiar with the first three; if you are a standard-compliant Relational Database Modeller, you will be intimately familiar with the fourth. Again, you cannot reasonably expect to comply with the Standard merely by learning the IDEF1X Notation, you need to actually learn and practise the methodology, but learning the Notation may be a reasonable introduction.

[Compliance to] Standard is Demanded [by Some]

There are aware, responsible customers, who demand compliance with these Standards.

And then there is the rest of field, both the unaware customers and the unaware suppliers, and everything in-between.

Which Notation ?

For most standards-aware practitioners, there is no need to discuss "which notation to use", given that there is only one Standard. Why would I draw a diagram (using either an expensive tool in a large project, or a simple drawing tool to answer a question on Stack Overflow), using some other notation when there is no other standard ? Why would I convey less than the Standard information, when I can convey the Standard info just as easily ? Why would I avoid using the Standard, when I know that using the Standard provides the formal confidence that the Data Model is correct, and will stand up to scrutiny (as opposed to the imagined confidence that is punctured by most cursory questioning) ?

If, and when, some qualified, recognised organisation comes up with a new methodology (and believe me, they do, all the time), we look into it. If and when the methodology achieves academic peer recognition and acknowledgement, we will take it seriously, try it out, become adept with it. If and when it is declared a Standard by the international standards bodies (as opposed to single vendor), we will provide it. Until then, we provide the full compliance to Standards that do exist.

Future Notations & Conventions

The couple of hundred single-vendor offerings in the last 20 years were not worth the time spent in investigation. Therefore single-vendor conventions, be they labelled "standards" or not, are not worth the time spent in investigation. In fact, since the Standard exists, and pre-dated the advent of the single vendor, any single-vendor offering would be an implicit declaration that they cannot comply with the Standard, and they are offering an anti-standard as a substitution.

Responses to Comments

  1. The easiest way to refute an amateur's allegation that some rubbish is a "standard" is to ask for its ISO/ANSI/IEC/NIST/etc publication data. As per (4) above IDEF1X is a published Standard, easy to look up.

  2. MS is famous for publishing anti-standards, and calling them "standards". The correct term is "convention". Wikipedia might call some MS notation a standard this week, but I've already noted that Wikipedia is not a reliable source. Remember, a single-vendor offering is, by definition, not a standard.

  3. IDEF1X is also a business process modelling standard

Not quite. The IDEF1X link will take you to the organisation that is most responsible for publicising it and educating people about it. If you check the tabs on that page, you will find several standards. One of the great powers (beauties?) of Standards is that they are integrated:

  • IDEF stands for I ntegrated Def inition
    • 0 is for Function Modelling
    • 1 is for Information Modelling - X is for Relational Database Modelling
  • they are all Standards published by NIST
    .
    I state that in my IDEF1X Notation document as well.
    .

11 Dec 10

  1. What is your attitude to design databases by UML notation (diagram)? The rationale is that it is broadly (and ubiquitously) known and to minimize the number of notations to know for this and that purposes

First, I would ask, show me a UML "Relational Data Model" that has anywhere near the exposition of detail and subtlety of an IDEF1X model, and then we have something to discuss. Until then, it is idle speculation, pub talk by non-relational people, about what they do not know, from a framework of what they do know.

But I won't avoid the question.

Second, there is a big problem, with horrendous consequences, when people have a fixed mindset about an area of knowledge (Good Thing), but then they approach every other area that they do not know, with the same mindset, unwilling to learn the specialised skills. Those poor people have not read about Maslow's Hammer. The OO types are the biggest offenders. If I have answered this question once, I have answered it ten times, and I have only been here 3 weeks. They ask, as if they were the first person to find this problem, "how do I persist my object classes into a database", and they treat the database (forget Relational) like it is a rubbish bin.

Scott Ambler and Martin Fowler have a lot to answer for when they meet their creator. Complete idiots, except for the income. First they write books on how to model objects the wrong way. Then they turn around (wait for it) and write books about how to fix the problem that they created. Sinful. And this isn't just my opinion, many other real industry leaders (as opposed to published idiots) make similar comments, they are famously written up in "Laugh or Cry" pages. Imagine "refactoring" a database. Only someone who has never read anything about databases would do that. A whole textbook on how to do that. Written by fools who have never seen a real database.

Any serious, experienced modeller knows that if you design (model) the database correctly, it never needs refactoring.

The only "databases" that need refactoring are those created by people who treat the db like trash, after reading said "books", and they have explicit steps, on how to keep trashing it, over and over again. You wait, next year they will have "multifactoring".

What's the point ? They never treated the database with respect; never learned about it; how to approach data transfers; how to model it. They just "modelled" it with a Hammer. To someone like me, who fixes these problems in a way that they never come back, "How do I model my multi-level objects classes" tells me immediately they are so clueless, they are "persisting" their Object models into the db, and have not even read enough to understand that the accurate question is "How do I model my Subtypes".

These are the issues on the surface. The flaws are fundamental and deeper, and affect everything they do re the database. Don't believe me, wait just a small amount of time after the "app" goes into production, and it hits the famous wall. They hit it so often, they even have an OO name for it: Object Relational Impedance Mismatch. Very scientific sounding name for plain stupidity. It hasn't occurred to them that if they designed the Relational database as a Relational database, and the OO app as an OO app, with a nice defined boundary, a transport layer between them, there would never be "Object Relational Impedance Mismatch".

The app (any language) and the db is like a good marriage. Each is completely different, they have their own needs, and they need each other. When they work together, it is a marriage made in heaven. As the great prophet Khalil Gibran wrote On Marriage:

Fill each other's cup but drink not from one cup ...
For the pillars of the temple stand apart,
And the oak tree and the cypress grow not in each other's shadow

When one partner treats the other like a slave, a receptacle, like they need not be recognised and understood, divorce and domestic violence are only a short distance away. Refactoring is merely a set of steps on how to make the right choice for your next mail order bride, how to train them to do the chores. Fixes the symptom for this month, but it does not go anywhere near the causative problem.

  • you can't "persist" object classes into a database. It is 2010, we have been writing ACID transactions for 30 years, not persistence objects. If you write persistence objects, you will have a single user app, with no concurrency, massively inefficient, full of broken transactions and data integrity problems.

  • you can't "design" databases like they are "object classes", because they aren't objects or classes. So stop wasting time, and learn how to design data in a multi-user location with some integrity. Or give the job to someone who knows how.

  • using OO notation or UML notation treats the database as a collection of objects, it only reinforces the Hammer, and makes sure it is the latest hardened steel with an imported Elm handle.

  • you can have a perfectly good marriage with a mail order bride. All it takes is a bit of recognition and respect.

    • that means, learn the terminology and the notation. Even if you gave the job to someone skilled, when they give you the finished diagram, you need to understand it. that is the boundary. You can't go "EeeeK! I've never seen that before".

    • you can't have some of the features of the database, but ignore the other fundamental requirements. I am certainly not saying that it is all-or-nothing, that too is immature. But you have to have a basic understanding of the person you are marrying; the better the understanding, the better the marriage.

    • you cannot open the database box, without addressing multiple online users; concurrency (and thus data currency); transactions; data and referential integrity; etc. These idiots (Fowler and Ambler, not the readers) are re-inventing the wheel, and they have not reached the wooden spoke stage yet; they have not recognised that the fat round thing that is bolted together is the impedance itself. Their "persistence objects" suffer all the problems (such as Lost Updates, avoiding low concurrency) that we eliminated 30 years ago

  • data that is modelled correctly does not change (it is easily extended, but that which exists, does not change). Apps come and go like ships. Object classes come and go with the developer. Therefore, if there were to be an order in the hierarchy (instead of an equal relationship), then the object should be modelled after the data.

    • note also, that well written apps (to Standard) are impervious to such changes; apps which take the "database is a slave" approach are brittle, and break at the slightest change; these are grossly sub-standard apps. But the OO people do not see that, they see "Impedance Mismatch".

    • if (a) the app and the database have reasonable independence, and (b) the boundaries are clear, each side can be changed and extended without affecting the other side.

    • alternately, if the app is truly the one-and-only main event, then to make it successful (avoid "refactoring" every year or so; write it correctly, once, so that it lasts), forget about databases, keep your objects on the C:\ drive, and persist them.

That's why, twenty years ago, some of us were saying, publishing articles, that Ambler and Fowler have it backwards. It is no wonder they keep crashing into things and refactoring themselves.

It is noteworthy that the secret behind Agile, is that it is fully Normalised. That is a 180 degree change for Ambler, his published works, so it is no surprise that it is something he cannot herald and declare openly.

And just to make sure it doesn't get lost in the wash. The Notation is on the surface, but it is telling, of what is inside. IDEF1X tells you about the Relational mindset of the person who modelled the database. UML Notation for a "relational database" tells you the mindset of the person who factored the data heap, and who expects to refactor it many, many times. Choose carefully.

I have more than a Hammer in my toolbox.

  • When I model data, I use IDEF1X
  • When I model functions, I use IDEF0 or SSADM (depending on the maturity of the users)
  • When I model objects, I use UML

I ride horses, shoot deer, fight fires, and chase women. Each activity has a different set of principles and rules, which I must follow in order for me to succeed reasonably. Imagine what life would be like if I mixed them up. Or if I could only shoot.


For ER models I prefer the IEM or Barker style just because I find that more people understand the crow's feet notation. If you want a model to be understood by a wider audience than yourself then it makes sense to use a recognised standard notation.

As for database vendor tools, that depends. I've never found it essential to use any particular tool, except where a customer was already using one. Oracle and Sybase have decent diagram tools. Microsoft Visio supports the standard notations, although as a database design tool it isn't as powerful as many others.

The only really bad example I can think of is the so-called design tool built into the Microsoft SQL Server toolset. It's just a joke. Completely unusable for any serious purpose and I don't know anyone who uses it except in Microsoft publications.