Performance of inner join compared to cross join

The effect of issuing an inner join is the same as stating a cross join with the join condition in the WHERE-clause. I noticed that many people in my company use cross joins, where I would use inner joins. I didn't notice any significant performance gain after changing some of these queries and was wondering if it was just a coincidence or if the DBMS optimizes such issues transparently (MySql in our case). And here a concrete example for discussion:

SELECT User.*
FROM User, Address
WHERE User.addressId = Address.id;

SELECT User.*
FROM User
INNER JOIN Address ON (User.addressId = Address.id);

Solution 1:

Cross Joins produce results that consist of every combination of rows from two or more tables. That means if table A has 6 rows and table B has 3 rows, a cross join will result in 18 rows. There is no relationship established between the two tables – you literally just produce every possible combination.

With an inner join, column values from one row of a table are combined with column values from another row of another (or the same) table to form a single row of data.

If a WHERE clause is added to a cross join, it behaves as an inner join as the WHERE imposes a limiting factor.

As long as your queries abide by common sense and vendor specific performance guidelines (i), I like to think of the decision on which type of join to use to be a simple matter of taste.

(i) Vendor Specific Performance Guidelines

  1. MySQL Performance Tuning and Optimization Resources
  2. PostgreSQL Performance Optimization

Solution 2:

There is no difference other than the inner join is a lot clearer because it defines the join, leaving the where clause to be the actual limiting condition.

Solution 3:

Use EXPLAIN to view the query plan for both queries, and see if there's any difference. Quite possibly MySQL will use the same execution plan in both cases. I use the INNER JOIN syntax mainly because it's a lot clearer.

Solution 4:

I find that work-places that allow the first syntax (comma separated tables) tend to have significant time taken up debugging cases where more rows are returned than intended. Unintentional cross joins are the bane of a system, and can bring even the most well-tuned database to it's knees. It has brought our pre-prod system to a screeching halt on at least two occasions in the last year.

The second syntax (join syntax) forces the writer to think about how the tables are joined together first, and then only return the interesting rows. It is impossible to accidentally do a cross join using this syntax, and thus the danger of accidental poorly performing queries is reduced.

However, that issue aside, I have never noticed any speed difference between the two syntaxes in any systems I have had.

Solution 5:

The first example is functionally the same as the second example. However, this syntax should be avoided for several reasons. First it is much easier to accidentally get a cross join when using this syntax especially when there are mulitple joins in the table. If you see a lot of this type of query with the keyword distinct, you probably have someone who is trying to fix the cross joins.

Next, the left and right join syntax using the older style is deprecated and will no longer be supported. Further, it doesn't work correctly now anyway. Sometimes it misinterprets the outer join and sends back the wrong results set. So any queries you have using = or = in the where clause should immediately be replaced.

Third, ANSI standard joins are easier to understand and maintain. An understanding of joins is one of the most critical basic skills that anyone querying any relational database needs to have. It has been my experience that some people who use the older style don't really understand joins and how they work and thus write queries that do not actually do what they intended.