connect users to each other as 'aliases' in an SQL database without duplicating data-entries
Need a little advice. I got a project to store users in a database and to identify when more two or more users are actually the same person. So if I have Bob, Joe, Tom, Sam, Fred, and Lucy registered, and later on discover (either through admission or other means) that Joe, Tom, and Lucy are actually the same person, I'd like to record that relationship in such a way that if I search for any one of them the other names or 'aliases' also appear, regardless of the connections.
It could be known at different times that Tom = Lucy, and later Joe = Lucy, or there is a linear progression of Joe -> Tom -> Lucy but if I search for Lucy, both Tom and Joe should appear as connections or aliases and vice-versa.
Just looking for ideas on user table setup that would allow this to be possible with a simple search query without needing to build the connections with an extra 'aliases' field that is filled out by hand (ie putting every combo of connected users into a string field as a bad example). Sub table relations is where I am thinking at the moment. Not sure how to go about it in the optimal way without generating duplicated manual data entry.
Pseudo SQL is fine. Mostly trying to generate ideas on how best to accomplish this. Thanks for your input in advance!
This is essentially a connected components problem. Think of the data as an undirected graph where vertices are people and edges denote alias. We want to find connected components and update them whenever a new edge (alias) is added.
We can have a table of names with an additional alias
column whose value denote the component in which the corresponding row belongs to. Initially each person will be in their own component (alias
will be unique initially). When an alias between say person A
and B
is to be established, we update the alias
column of all persons in component that belongs to B
(or A
) with the value of alias
column of A
(or B
). This will merge the components.
CREATE TABLE persons (
id INTEGER PRIMARY KEY,
name varchar(100),
alias INTEGER NOT NULL
)
INSERT INTO persons (id, name, alias)
VALUES (1, 'Bob', 1), (2, 'Joe', 2), (3, 'Tom', 3),
(4, 'Sam', 4), (5, 'Fred', 5), (6, 'Lucy', 6)
-- Tom = Lucy
UPDATE persons
SET alias = (SELECT alias FROM persons WHERE name = 'Tom')
WHERE alias = (SELECT alias FROM persons WHERE name = 'Lucy')
-- Joe = Lucy
UPDATE persons
SET alias = (SELECT alias FROM persons WHERE name = 'Joe')
WHERE alias = (SELECT alias FROM persons WHERE name = 'Lucy')
-- linear progression of Tom = Lucy = Joe
-- Run the above queries one after another
-- Search for all alias of Tom
SELECT *
FROM persons WHERE alias = (SELECT alias FROM persons WHERE name = 'Tom')
Here is a SQL Fiddle with working example of queries.