Create a report with an query

I have a problem. Consider the following fact and dimension tables in a ROLAP system that collects values of harmful substances measured in foods that are sold in supermarkets.

Fact table:

• Contaminants (TimeID, ShopID, FoodID, substance, quantityPerOunce)
This describes which harmful substance in which quantity was found on a given
food in a given supermarket at a given time.

Dimension tables:

• Time (TimeID, dayNr, dayName, weekNr, monthNr, year)
• Food (FoodID, foodName, brand, foodType)
  Example data: (43, egg, Bioland, animalProduct)
• Place (ShopID, name, street1, region, country)

Write one SQL statement to create a report that answers the following query:

  • List the minimum quantities of the substance "PCB" in animal products and vegetables (both are foodTypes) that were measured per year in the regions Sachsen, Thüringen, and Hessen in Germany.
  • The result should contain years, regions, and the minimum values.

With the same statement, also list

  • the minimum values per year (i.e. aggregating over all regions in each year)
  • as well as a grand total with the minimum quantity of PCB in the mentioned regions for animal products and vegetables over all years and all regions.

SQL query

SELECT years, regions, min(quantityPerOunce)
FROM Contaminants as c, Time as t, Food as f, Place as p
WHERE c.TimeID = t.TimeID
AND c.FoodID = f.FoodID
AND c.ShopdID = p.ShopID
AND substance = "PCB"
AND foodType = "vegetables"
AND foodType = "animalProducts"
GROUP BY regions;

I don't know how to solve this kind of exercise. I tried it, but I don't know. And the join should be Equi-Join even if this not the best way.


Solution 1:

You are close. First, remember that in GROUP BY queries, the non-aggregate fields in your SELECT must also appear on the GROUP BY line. So, you should have:

GROUP BY years, regions;

Further, if you use this:

foodType = 'vegetables' AND foodType = 'animalProducts'

the query will return nothing, because the foodType can't be both at the same time.

As such, you need this:

(foodType = 'vegetables' OR foodType = 'animalProducts')

or alternatively:

foodType IN ('vegetables','animalProducts')

Your query assumes that regions only contains the three listed regions. If you aren't 100% sure about that, it would be better to specify them explicitly with:

AND regions IN ('Sachsen', 'Thüringen', 'Hessen')

This alone also assumes that these regions are only in Germany. This may be true. It might not be though, so it would be safest to also add:

AND country = 'Germany'

So, something along these lines:

SELECT years, regions, MIN(quantityPerOunce) AS min_quantityPerOunce
FROM Contaminants as c, Time as t, Food as f, Place as p
WHERE c.TimeID = t.TimeID
AND c.FoodID = f.FoodID
AND c.ShopdID = p.ShopID
AND substance = 'PCB'
AND foodType IN ('vegetables','animalProducts')
AND regions IN ('Sachsen', 'Thüringen', 'Hessen')
AND country = 'Germany'
GROUP BY years, regions;

Forgive me if I'm mistaken, but it does seem like this might be a school assignment, so it may help to think about general principles in the future:

  1. Identify ALL the nouns in the problem statement (the names of the regions, the name of the country, the names of the food types, the name of the substance) and make sure they are all represented in the query. They likely wouldn't be mentioned in the problem statement / client request if they weren't important. This is a good rule of thumb for professional settings as well as educational settings.

  2. As a rule, fields in the SELECT which aren't aggregates must also be in the GROUP BY. You can have fields in the GROUP BY which are not in the SELECT, but this is far less common.

  3. For parts of the request which list some items from the same field (regions, for example), use field IN (item1,item2,...,itemX) to allow an OR operator on each of the items.

As an addendum, if you have a dimension table called Time, you may want to enclose the name in double-quotes in some systems to avoid confusion with what is normally a system name of some kind.