SQL Select from table where joined values from a second table are a subset of values from a third table
I have the following tables in MS SQL Server: Tasks, Users, Tags, TaskTags (maps a task to a tag), and UserTags (maps a user to a tag).
Given a User U, I want to find all tasks T where every tag of T is also a tag of U (e.g. a task should be returned if its tags are a subset of the user's tags).
Here is a table script with some sample data (it can be run at http://sqlfiddle.com/ with MS SQL Server 17):
CREATE TABLE [dbo].[Tasks](
[TaskId] [int] NOT NULL PRIMARY KEY,
[TaskName] [nvarchar](MAX) NOT NULL
)
CREATE TABLE [dbo].[Users](
[UserId] [int] NOT NULL PRIMARY KEY,
[UserName] [nvarchar](MAX) NOT NULL
)
CREATE TABLE [dbo].[Tags](
[TagId] [int] NOT NULL PRIMARY KEY,
[TagName] [nvarchar](MAX) NOT NULL
)
CREATE TABLE [dbo].[TaskTags](
[TaskId] [int] NOT NULL,
[TagId] [int] NOT NULL
)
CREATE TABLE [dbo].[UserTags](
[UserId] [int] NOT NULL,
[TagId] [int] NOT NULL
)
INSERT INTO Tasks VALUES (1,'Task for all SWEs');
INSERT INTO Tasks VALUES (2,'Task for USA SWEs');
INSERT INTO Tasks VALUES (3,'Task for all PMs');
INSERT INTO Tasks VALUES (4,'Task for Europe PMs');
INSERT INTO Users VALUES (1,'Europe SWE');
INSERT INTO Users VALUES (2,'USA SWE');
INSERT INTO Users VALUES (3,'Europe PM');
INSERT INTO Users VALUES (4,'USA PM');
INSERT INTO Tags VALUES (1,'swe');
INSERT INTO Tags VALUES (2,'pm');
INSERT INTO Tags VALUES (3,'usa');
INSERT INTO Tags VALUES (4,'europe');
INSERT INTO TaskTags VALUES (1,1);
INSERT INTO TaskTags VALUES (2,1);
INSERT INTO TaskTags VALUES (2,3);
INSERT INTO TaskTags VALUES (3,2);
INSERT INTO TaskTags VALUES (4,2);
INSERT INTO TaskTags VALUES (4,4);
INSERT INTO UserTags VALUES (1,1);
INSERT INTO UserTags VALUES (1,4);
INSERT INTO UserTags VALUES (2,1);
INSERT INTO UserTags VALUES (2,3);
INSERT INTO UserTags VALUES (3,2);
INSERT INTO UserTags VALUES (3,4);
INSERT INTO UserTags VALUES (4,2);
INSERT INTO UserTags VALUES (4,3);
I was able to figure out the inverse of this problem, when the Task T is given. E.g. given Task T, return all Users U where the tags of T are a subset of U. Here is that query:
WITH thisTaskTags AS (
SELECT DISTINCT TaskTags.TagId
FROM TaskTags
WHERE TaskTags.TaskId = @taskId
)
SELECT UserTags.UserId
FROM UserTags JOIN thisTaskTags
ON UserTags.TagId = thisTaskTags.TagId CROSS JOIN
(SELECT COUNT(*) AS keycnt FROM thisTaskTags) k
GROUP BY UserTags.UserId
HAVING COUNT(thisTaskTags.TagId) = MAX(k.keycnt)
When @taskId = 1, UserIds 1 and 2 are returned, and when @taskId = 2, only UserId 2 is returned (correct behavior).
However when I tried to convert this to returning all tasks a given user should have, I ran into trouble. I tried this query:
WITH thisUserTags AS (
SELECT DISTINCT UserTags.TagId
FROM UserTags
WHERE UserTags.UserId = @userId
)
SELECT TaskTags.TaskId
FROM TaskTags JOIN thisUserTags
ON thisUserTags.TagId = TaskTags.TagId CROSS JOIN
(SELECT COUNT(*) AS keycnt FROM thisUserTags) k
GROUP BY TaskTags.TaskId
HAVING COUNT(thisUserTags.TagId) = MAX(k.keycnt);
However this only returns tasks where all the task tags match all the user tasks, e.g. if U had tags: [a,b,c] it would only get tasks with tags: [a,b,c] instead of [a], [b], [b,c], etc.
With concrete examples, if you set @userId = 1, no task IDs are returned, when the correct output would be getting 1 row, Task ID = 1. And when @userId = 2, only taskID 2 is returned, when both taskIDs 1 and 2 should be returned (i.e. if a task only has the "swe" tag, all "swe" users should get it, but if a task has both "swe" and "usa", only users who have both of those tags should get it).
I also tried this query:
SELECT DISTINCT Tasks.TaskId FROM Tasks
INNER JOIN TaskTags ON TaskTags.TaskId = Tasks.TaskId
WHERE TaskTags.TagId IN (SELECT TagId from UserTags where UserId = @userId)
GROUP BY Tasks.TaskId
But the issue with this is it returns any task that has any tag in common, so U with tags: [a,b,c] would get T with tags: [b,d] even though U doesn't have tag d.
Again with concrete examples, if @userId = 1, taskIDs 1,2, and 4 are returned, when only taskIds 1 and 2 should be returned (task ID 4 should only be assigned to users with both tags "europe" and "pm", here it is erroneously being assigned to a user with tags "europe" and "swe" due to the common "europe" tag).
Could someone shed some light here?
This is a classic Relational Division With Remainder question.
You just need to frame it right:
- You want all
Tasks
... - ... whose
TaskTags
divide the set of allUserTags
for a givenUser
- There can be a remainder of
UserTags
but not a remainder ofTaskTags
so the former is the dividend, the latter is the divisor.
A typical solution (there are many) is to left join the dividend to the divisor, group it up, then ensure that the number of matched dividends is the same as the number of divisors. In other words, all divisors have a match.
Since you only seem to want the Tasks
but not their TaskTags
, you can do all this in an EXISTS
subquery:
DECLARE @userId int = 1;
SELECT *
FROM Tasks t
WHERE EXISTS (SELECT 1
FROM TaskTags tt
LEFT JOIN UserTags ut ON ut.TagId = tt.TagId
AND ut.UserId = @userId
WHERE tt.TaskId = t.TaskId
HAVING COUNT(*) = COUNT(ut.UserId)
);
db<>fiddle
You're probably looking for something like the following...
declare @userId int = ...;
select Tasks.TaskId
from Tasks
where 0 = (
select count(1)
from (
select TagId from TaskTags where TaskTags.TaskId=Tasks.TaskId
except
select TagId from UserTags where UserTags.UserId=@userId
) TaskSpecificTags
);
It's not clear if you also want to return Tasks with 0 Tags so you may need to test that condition as well.