Difference between INNER JOIN and LEFT SEMI JOIN
Solution 1:
An INNER JOIN
can return data from the columns from both tables, and can duplicate values of records on either side have more than one match. A LEFT SEMI JOIN
can only return columns from the left-hand table, and yields one of each record from the left-hand table where there is one or more matches in the right-hand table (regardless of the number of matches). It's equivalent to (in standard SQL):
SELECT name
FROM table_1 a
WHERE EXISTS(
SELECT * FROM table_2 b WHERE (a.name=b.name))
If there are multiple matching rows in the right-hand column, an INNER JOIN
will return one row for each match on the right table, while a LEFT SEMI JOIN
only returns the rows from the left table, regardless of the number of matching rows on the right side. That's why you're seeing a different number of rows in your result.
I am trying to get the names within table_1 that only appear in table_2.
Then a LEFT SEMI JOIN
is the appropriate query to use.
Solution 2:
Suppose there are 2 tables TableA and TableB with only 2 columns (Id, Data) and following data:
TableA:
+----+---------+
| Id | Data |
+----+---------+
| 1 | DataA11 |
| 1 | DataA12 |
| 1 | DataA13 |
| 2 | DataA21 |
| 3 | DataA31 |
+----+---------+
TableB:
+----+---------+
| Id | Data |
+----+---------+
| 1 | DataB11 |
| 2 | DataB21 |
| 2 | DataB22 |
| 2 | DataB23 |
| 4 | DataB41 |
+----+---------+
Inner Join on column Id
will return columns from both the tables and only the matching records:
.----.---------.----.---------.
| Id | Data | Id | Data |
:----+---------+----+---------:
| 1 | DataA11 | 1 | DataB11 |
:----+---------+----+---------:
| 1 | DataA12 | 1 | DataB11 |
:----+---------+----+---------:
| 1 | DataA13 | 1 | DataB11 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB21 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB22 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB23 |
'----'---------'----'---------'
Left Join (or Left Outer join) on column Id
will return columns from both the tables and matching records with records from left table (Null values from right table):
.----.---------.----.---------.
| Id | Data | Id | Data |
:----+---------+----+---------:
| 1 | DataA11 | 1 | DataB11 |
:----+---------+----+---------:
| 1 | DataA12 | 1 | DataB11 |
:----+---------+----+---------:
| 1 | DataA13 | 1 | DataB11 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB21 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB22 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB23 |
:----+---------+----+---------:
| 3 | DataA31 | | |
'----'---------'----'---------'
Right Join (or Right Outer join) on column Id
will return columns from both the tables and matching records with records from right table (Null values from left table):
┌────┬─────────┬────┬─────────┐
│ Id │ Data │ Id │ Data │
├────┼─────────┼────┼─────────┤
│ 1 │ DataA11 │ 1 │ DataB11 │
│ 1 │ DataA12 │ 1 │ DataB11 │
│ 1 │ DataA13 │ 1 │ DataB11 │
│ 2 │ DataA21 │ 2 │ DataB21 │
│ 2 │ DataA21 │ 2 │ DataB22 │
│ 2 │ DataA21 │ 2 │ DataB23 │
│ │ │ 4 │ DataB41 │
└────┴─────────┴────┴─────────┘
Full Outer Join on column Id
will return columns from both the tables and matching records with records from left table (Null values from right table) and records from right table (Null values from left table):
╔════╦═════════╦════╦═════════╗
║ Id ║ Data ║ Id ║ Data ║
╠════╬═════════╬════╬═════════╣
║ - ║ ║ ║ ║
║ 1 ║ DataA11 ║ 1 ║ DataB11 ║
║ 1 ║ DataA12 ║ 1 ║ DataB11 ║
║ 1 ║ DataA13 ║ 1 ║ DataB11 ║
║ 2 ║ DataA21 ║ 2 ║ DataB21 ║
║ 2 ║ DataA21 ║ 2 ║ DataB22 ║
║ 2 ║ DataA21 ║ 2 ║ DataB23 ║
║ 3 ║ DataA31 ║ ║ ║
║ ║ ║ 4 ║ DataB41 ║
╚════╩═════════╩════╩═════════╝
Left Semi Join on column Id
will return columns only from left table and matching records only from left table:
┌────┬─────────┐
│ Id │ Data │
├────┼─────────┤
│ 1 │ DataA11 │
│ 1 │ DataA12 │
│ 1 │ DataA13 │
│ 2 │ DataA21 │
└────┴─────────┘