Why isn't SQL ANSI-92 standard better adopted over ANSI-89?

At every company I have worked at, I have found that people are still writing their SQL queries in the ANSI-89 standard:

select a.id, b.id, b.address_1
from person a, address b
where a.id = b.id

rather than the ANSI-92 standard:

select a.id, b.id, b.address_1
from person a
inner join address b
on a.id = b.id

For an extremely simple query like this, there's not a big difference in readability, but for large queries I find that having my join criteria grouped in with the table listing makes it much easier to see where I might have issues in my joins, and it lets me keep all my filtering in my WHERE clause. Not to mention that I find outer joins much more intuitive than the (+) syntax in Oracle.

As I try to evangelize ANSI-92 to people, are there any concrete performance benefits in using ANSI-92 over ANSI-89? I would try it on my own, but the Oracle setups we have here don't allow us to use EXPLAIN PLAN - wouldn't want people to try to optimize their code, would ya?

According to "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, of the six or eight RDBMS brands they tested, there was no difference in optimization or performance of SQL-89 versus SQL-92 style joins. One can assume that most RDBMS engines transform the syntax into an internal representation before optimizing or executing the query, so the human-readable syntax makes no difference.
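To see this equivalence on a small scale, here is a sketch that runs both forms of the question's query against SQLite through Python's built-in sqlite3 module. SQLite is just a convenient stand-in, not necessarily one of the engines the book tested, and the tables and rows are made up. The two queries return the same rows, and the EXPLAIN QUERY PLAN output for each is printed so the plans can be compared by eye.

```python
import sqlite3

# Hypothetical schema and data mirroring the question's person/address example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (id INTEGER PRIMARY KEY, address_1 TEXT);
    INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO address VALUES (1, '1 Main St'), (2, '2 Oak Ave');
""")

# SQL-89 style: the join condition lives in the WHERE clause.
old_style = """
    select a.id, b.id, b.address_1
    from person a, address b
    where a.id = b.id
    order by a.id
"""

# SQL-92 style: the join condition sits next to the table it joins.
new_style = """
    select a.id, b.id, b.address_1
    from person a
    inner join address b
    on a.id = b.id
    order by a.id
"""

old_rows = conn.execute(old_style).fetchall()
new_rows = conn.execute(new_style).fetchall()
print(old_rows == new_rows, old_rows)

# Print the detail column of each query plan so the two can be compared.
for label, sql in (("SQL-89", old_style), ("SQL-92", new_style)):
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(label, row[-1])
```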

I also try to evangelize the SQL-92 syntax. Sixteen years after it was approved, it's about time people start using it! And all brands of SQL database now support it, so there's no reason to continue to use the nonstandard (+) Oracle syntax or *= Microsoft/Sybase syntax.

As for why it's so hard to break the developer community of the SQL-89 habit, I can only assume that there's a large "base of the pyramid" of programmers who code by copy & paste, using ancient examples from books, magazine articles, or another code base, and these people don't learn new syntax abstractly. Some people pattern-match, and some people learn by rote.

I am gradually seeing people using SQL-92 syntax more frequently than I used to, though. I've been answering SQL questions online since 1994.

Well, the ANSI-92 standard includes some pretty heinous syntax. NATURAL JOINs are one example and the USING clause is another. IMHO, adding a column to a table shouldn't break code, but a NATURAL JOIN breaks in the most egregious fashion. The "best" way to break is with a compilation error: for example, if you SELECT * somewhere, the addition of a column could cause a compile failure. The next best way to fail is a run-time error. That's worse because your users may see it, but it still gives you a clear warning that you've broken something. If you use ANSI-92 and write queries with NATURAL joins, nothing fails at compile time and nothing fails at run time; the query just suddenly starts producing wrong results. These types of bugs are insidious. Reports go wrong, and financial disclosures could be incorrect.

For those unfamiliar with NATURAL JOINs: they join two tables on every column name that exists in both tables. That's really cool when you have a 4-column key and you're sick of typing it. The problem comes when Table1 has a pre-existing column named DESCRIPTION and you add a new column to Table2 named, oh I don't know, something innocuous like, mmm, DESCRIPTION, and now you're joining the two tables on a free-form VARCHAR2(1000) field.
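A minimal sketch of that failure mode, assuming SQLite (via Python's sqlite3 module) as a stand-in for Oracle, with hypothetical tables table1/table2 sharing a (k1, k2) key. The NATURAL JOIN query text never changes, yet after DESCRIPTION is added to table2 it silently returns no rows, while a join with an explicit ON clause keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables sharing a two-column key (k1, k2).
    CREATE TABLE table1 (k1 INTEGER, k2 INTEGER, description VARCHAR(1000));
    CREATE TABLE table2 (k1 INTEGER, k2 INTEGER, amount NUMERIC);
    INSERT INTO table1 VALUES (1, 1, 'widget'), (1, 2, 'gadget');
    INSERT INTO table2 VALUES (1, 1, 10), (1, 2, 20);
""")

natural = "SELECT k1, k2, amount FROM table1 NATURAL JOIN table2 ORDER BY k1, k2"
explicit = """
    SELECT t1.k1, t1.k2, t2.amount
    FROM table1 t1
    JOIN table2 t2 ON t1.k1 = t2.k1 AND t1.k2 = t2.k2
    ORDER BY t1.k1, t1.k2
"""

before = conn.execute(natural).fetchall()   # joined on (k1, k2)

# The innocuous-looking schema change: table2 gets its own DESCRIPTION column.
conn.execute("ALTER TABLE table2 ADD COLUMN description VARCHAR(1000)")
conn.execute("UPDATE table2 SET description = 'line item note'")

# Same query text, no error: it now also joins on the free-form description.
after = conn.execute(natural).fetchall()
# The explicit ON version is unaffected by the new column.
explicit_rows = conn.execute(explicit).fetchall()

print("before:", before)
print("after:", after)
print("explicit:", explicit_rows)
```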

The USING clause can lead to total ambiguity in addition to the problem described above. In another SO post, someone showed this ANSI-92 SQL and asked for help reading it.

SELECT c.* 
FROM companies AS c 
JOIN users AS u USING(companyid) 
JOIN jobs AS j USING(userid) 
JOIN useraccounts AS us USING(userid) 
WHERE j.jobid = 123

This is completely ambiguous. I put a UserID column in both the Companies and Users tables and there's no complaint. What if the UserID column in Companies is the ID of the last person to modify that row?

I'm serious: can anyone explain why such ambiguity was necessary? Why is it built straight into the standard?
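To make the ambiguity concrete, here is a sketch of two plausible readings of that query, each written with explicit ON conditions and run against SQLite through Python's sqlite3 module. The schema and data are hypothetical: companies.userid is assumed to mean "last user to modify the row", as in the scenario above. The USING version itself is deliberately not run, since engines differ on whether they accept a USING column that appears in more than one table on the left. With this data the two readings return different companies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data. companies.userid = last user to modify the row.
    CREATE TABLE companies    (companyid INTEGER, name TEXT, userid INTEGER);
    CREATE TABLE users        (userid INTEGER, companyid INTEGER);
    CREATE TABLE jobs         (jobid INTEGER, userid INTEGER);
    CREATE TABLE useraccounts (userid INTEGER, login TEXT);
    INSERT INTO companies    VALUES (10, 'Acme', 99), (20, 'Globex', 1);
    INSERT INTO users        VALUES (1, 10), (99, 20);
    INSERT INTO jobs         VALUES (123, 1);
    INSERT INTO useraccounts VALUES (1, 'ann'), (99, 'admin');
""")

# Reading A: jobs and useraccounts hang off the employee (users.userid).
reading_a = conn.execute("""
    SELECT c.*
    FROM companies AS c
    JOIN users        AS u  ON u.companyid = c.companyid
    JOIN jobs         AS j  ON j.userid    = u.userid
    JOIN useraccounts AS us ON us.userid   = u.userid
    WHERE j.jobid = 123
""").fetchall()

# Reading B: jobs hang off the company's last-modified-by userid.
reading_b = conn.execute("""
    SELECT c.*
    FROM companies AS c
    JOIN users        AS u  ON u.companyid = c.companyid
    JOIN jobs         AS j  ON j.userid    = c.userid
    JOIN useraccounts AS us ON us.userid   = u.userid
    WHERE j.jobid = 123
""").fetchall()

print(reading_a)  # Acme: the user who owns job 123 works there
print(reading_b)  # Globex: user 1 last modified that company row
```

Because every ON condition names the table its column comes from, neither version can change meaning when someone later adds a same-named column to another table.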

I think Bill is correct that there is a large base of developers who copy/paste their way through coding. In fact, I can admit that I'm kind of one when it comes to ANSI-92. Every example I ever saw showed multiple joins nested in parentheses. Honestly, that makes picking out the tables in the SQL difficult at best. But then an SQL-92 evangelist explained that the parentheses would actually force a join order. JESUS... all those copy/pasters I've seen are now actually forcing a join order, a job that 95% of the time is better left to the optimizer, especially for a copy/paster.

Tomalak got it right when he said,

people don't switch to new syntax just because it is there

It has to give me something and I don't see an upside. And if there is an upside, the negatives are an albatross too big to be ignored.

A few reasons come to mind:

people do it out of habit
people are lazy and prefer the "old style" joins because they involve less typing
beginners often have trouble wrapping their heads around the SQL-92 join syntax
people don't switch to new syntax just because it is there
people are unaware of the benefits the new (if you want to call it that) syntax has, primarily that it lets you filter a table before you do an outer join, rather than after it, when all you have is the WHERE clause.
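The last point is easiest to see side by side. This is a minimal sketch, assuming SQLite via Python's sqlite3 module and hypothetical person/address tables: the same predicate on address.kind keeps every person when it sits in the ON clause, but discards the NULL-extended rows when it sits in the WHERE clause, so the outer join effectively becomes an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: every person, plus an optional typed address.
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (person_id INTEGER, kind TEXT, address_1 TEXT);
    INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO address VALUES (1, 'home', '1 Main St'),
                               (2, 'work', '2 Oak Ave');
""")

# Filter in ON: address is restricted to 'home' rows as part of the outer
# join, so every person is kept and non-matches get NULL.
filter_in_on = conn.execute("""
    SELECT p.name, a.address_1
    FROM person p
    LEFT JOIN address a
      ON a.person_id = p.id AND a.kind = 'home'
    ORDER BY p.id
""").fetchall()

# Filter in WHERE: applied after the outer join, so rows where a.kind is
# NULL (no match) or 'work' are thrown away.
filter_in_where = conn.execute("""
    SELECT p.name, a.address_1
    FROM person p
    LEFT JOIN address a
      ON a.person_id = p.id
    WHERE a.kind = 'home'
    ORDER BY p.id
""").fetchall()

print(filter_in_on)     # all three people; only Ann has a home address
print(filter_in_where)  # only Ann
```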

For my part, I do all my joins in SQL-92 syntax, and I convert code where I can. It's the cleaner, more readable and more powerful way to do it. But it's hard to convince someone to use the new style when they think it costs them more typing without changing the query result.

In response to the NATURAL JOIN and USING post above.

WHY would you ever see the need to use these? They weren't available in ANSI-89 and were added in ANSI-92 as what I can only see as a shortcut.

I would never leave a join to chance and would always specify the table/alias and id.

For me, the only way to go is ANSI-92. It is more verbose, and the syntax isn't liked by ANSI-89 followers, but it neatly separates your JOINs from your FILTERING.

First let me say that in SQL Server the old outer join syntax (*=) does not always give correct results. There are times when it is interpreted as a cross join rather than an outer join, so right there is a good reason to stop using it. That outer join syntax is also a deprecated feature and will not be in the next version of SQL Server after SQL Server 2008. You'll still be able to write old-style inner joins, but why on earth would anyone want to? They are unclear and much harder to maintain: you can't easily tell what is part of the join and what really belongs in the WHERE clause.

One reason why I believe you should not use the old syntax is that understanding joins and what they do and do not do is a critical step for anyone who will write SQL code. You should not write any SQL code without understanding joins thoroughly. If you understand them well, you will probably come to the conclusion that the ANSI-92 syntax is clearer and easier to maintain. I've never met a SQL expert who didn't use the ANSI-92 syntax in preference to the old syntax.

Most people I have met or dealt with who use the old syntax truly don't understand joins and thus get into trouble when querying the database. This is my personal experience, so I'm not saying it is always true. But as a data specialist, I've had to fix too much of this junk over the years not to believe it.
