Required to join 2 tables with their FKs in a 3rd table
so basically I `m following a tutorial question which asks me the below. I am not too sure how to join 2 tables which do not contain the others FK, their (i.e. both of their FKs) are located in a 3rd table. Could I get some help/explanation?
My Answer
SELECT Forest.Fo_name, Species.Sp_name, Species.Sp_woodtype
FROM Forest
INNER JOIN Species
ON Tree.Tr_species=Tree.Tr_forest
WHERE Fo_loc='ARTIC'
ORDER BY Fo_name, Sp_name
"For forests found in the regions coded as "ARTIC" list the forest name & species name and species wood type found therein. Eliminate any duplicates and order the output by forest name and species name"
Species table
+--------------+------------------+------+--------------------------------+
| Field | Type | Key | Glossary |
+--------------+------------------+------+--------------------------------+
| sp_name | C(10) | PK | Species Name |
| sp_woodtype | C(10) | | Wood Yielded by tree |
| sp_maxht | I | | Max. Height |
+--------------+------------------+------+--------------------------------+
Forest table
+--------------+------------------+------+--------------------------------+
| Field | Type | Key | Glossary |
+--------------+------------------+------+--------------------------------+
| Fo_name | C(10) | PK | Forest Name |
| Fo_size | I | | Forest Area |
| Fo_loc | C(10) | | Geographical Area |
| Fo_comp | C(10) | | Forest Owner |
+--------------+------------------+------+--------------------------------+
Tree table
+--------------+------------------+------+---------------------------------------------+
| Field | Type | Key | Glossary |
+--------------+------------------+------+---------------------------------------------+
| Tr_species | C(10) | FK | (FK of species.sp_name |
| Tr_forest | C(10) | FK | (FK of forest.fo_name |
| Tr_numb | I | PK | Sequence number |
| Tr_planted | Date | | Date of planting |
| Tr_loc | C(10) | | Forest quadrant |
| Tr_parent | I | FK | (FK of tree.numb) procreating tree reference|
+--------------+------------------+------+---------------------------------------------+
C(10) & I stand for character (10) & Integer respectively
Solution 1:
The Tree
table is the connection between the Forest table and the Species table. Think of it as two steps:
1) Starting from the Forest table, join to the Tree
table (from Forest.Fo_name
to Tree.Tr_forest
)
2) Now that the Tree is known, join to the Species
table (from Tree.species
to Species.sp_name
)
I'd write the final query like this:
SELECT Forest.Fo_name, Species.Sp_name, Species.Sp_woodtype
FROM Forest
JOIN Tree ON Forest.Fo_name=Tree.Tr_forest
JOIN Species ON Tree.species=Species.sp_name
WHERE Fo_loc='ARTIC'
ORDER BY Fo_name, Sp_name
Solution 2:
Foreign keys are not needed to join tables!
So the answer to how to join tables when no FK is between them is to just join them.
The real question is how do we choose to join what tables (or combine in any other way)?
Statements & tables
Every base table comes with a predicate--statement template parameterized by column names. The table value is the rows that make its predicate into a true proposition--statement.
// species [name] yields [woodtype] and has max height [maxht]
Species(name,woodtype,maxht)
// forest [name] has area [size] in area [loc] and owner [comp]
Forest(name,size,loc,comp)
// tree group [numb] is of species [species] in forest [forest] and was planted in [planted] in quadrant [loc] on date [date] with parent tree group [parent]
Tree(species,forest,numb,planted,loc,parent)
A query also has a predicate. Its value also is the rows that make its predicate true. Its predicate is built up according to its FROM
, WHERE
and other clauses. A table alias names a table value like its base table but with columns prefixed by the alias. So its predicate is its base table's predicate using alias-prefixed columns.
Species s
holds rows satisfying
species [s.name] yields [s.woodtype] and has max height [s.maxht]
CROSS & INNER JOIN put AND between predicates; UNION puts OR between them; EXCEPT inserts AND NOT and ON & WHERE AND in a condition; SELECT renames, adds & drops columns. (Etc for other operators.) So
Species s CROSS JOIN Forest f
holds rows where
species [s.name] yields [s.woodtype] and has max height [s.maxht]
AND forest [f.name] has area [f.size] in area [f.loc] and owner [f.comp]
(No matter what the constraints are!) If you wanted only the rows above having forests named after their wood type then you would just add a condition via ... WHERE f.name=s.woodtype
because that makes the value be rows satisfying ... AND f.name=s.woodtype
.
For forests found in the regions coded as "ARCTIC" list the forest name & species name and species wood type found therein. Eliminate any duplicates and order the output by forest name and species name.
That is a big informal predicate that the rows returned are to satisfy. If we try to rewrite it using only the predicates that we have been given plus AND, OR, AND NOT (etc) then we can only do it by AND
ing all three given predicates (hence, JOIN
of the base table names) and adding AND Forest.loc='ARCTIC'
(hence, ON
or WHERE
that condition).
FKs (etc) and querying (not)
PKs and FKs are special cases of integrity constraints. Given the predicates and what situations can arise, only some database values can arise. That's what integrity constraints describe. They let the DBMS keep out database values that shouldn't arise. (Also, optimize query execution.) Because a name is unique in Species we declare it a key. Ditto for Forest name and Tree numb. Because a species in Tree is a name in Species and name is a key of Species we declare FK Tree.species->Species.name. Ditto for forest and parent. Nothing to do with enabling joins. (Although they imply that a query result satisfies certain constraints, too.)
It does not matter to querying what the constraints are. If there were Tree species values that did not appear as any Species name value because the business rules or Tree or Species predicate(s) were different then there would be no FK Tree.species->Species.name. But each query would continue to return the rows satisfying its predicate as expressed in terms of base table predicates. (Since the possible business situations or predicate(s) would be different, the input rows could be different so output rows could be different.)
What determines query SQL
So the answer to how we choose what tables to join (or combine in any other way) is that we arrange base table names, JOIN, UNION, EXCEPT and WHERE (etc) as appropriate to give a query expression whose predicate is the one we want our rows to satisfy. This is usually taught as something informal to be done by feel but now you know what ties the SQL to the natural language. And constraints are irrelevant.
NOTE: The preceding assumes that we return no duplicates from queries. The reason why there are no duplicates in tables in the relational model is so that the above correspondence between table operators and logic connectives holds. However SQL tables can have duplicates. Where SQL differs from the relational model (in its many ways), there querying becomes less (literally) logical.
Re joins following foreign keys
Re choosing tables
Is there any rule of thumb to construct SQL query from a human-readable description?
Solution 3:
You could do multiple joins. Link the tree table to your main table forest, then link the species table:
SELECT
Forest.Fo_name,
Species.Sp_name,
Species.Sp_woodtype
FROM
Forest
INNER JOIN Tree ON Tree.Tr_forest=Forest.Fo_name
INNER JOIN Species ON Tree.Tr_species = Species.sp_name
WHERE
Fo_loc='ARTIC'
ORDER BY Fo_name, Sp_name
Solution 4:
Try the SQL 99 Method
SELECT DISTINCT F.Fo_name, S.Sp_name, Sp_woodtype
FROM Forest F, Species S, Tree T
WHERE F.Fo_name = T.Tr_Forest
AND S.Sp_name = Tr_species
AND f.Fo_loc = 'ARCTIC';
A F S are aliases used to make the SQL shorter and neater.
DISTINCT will remove duplicits.