SQL Server insert if not exists best practice [closed]

On the one hand, I have a competition results table which holds team members' names and their rankings.

On the other hand, I need to maintain a table of unique competitor names:

CREATE TABLE Competitors (cName nvarchar(64) primary key)

Now I have some 200,000 results in the first table, and when the Competitors table is empty I can run this:

INSERT INTO Competitors SELECT DISTINCT Name FROM CompResults

And the query only takes some 5 seconds to insert about 11,000 names.

So far this is not a critical application, so I can consider truncating the Competitors table once a month, when I receive the new competition results with some 10,000 rows.

But what is the best practice when new results are added, with new AND existing competitors? I don't want to truncate the existing Competitors table.

I need to perform an INSERT statement for new competitors only and do nothing if they already exist.

Semantically you are asking "insert Competitors where doesn't already exist":

INSERT Competitors (cName)
SELECT DISTINCT Name
FROM CompResults cr
WHERE
   NOT EXISTS (SELECT * FROM Competitors c
              WHERE cr.Name = c.cName)
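To see the pattern in isolation, here is a small throwaway sketch. The temp tables #Competitors and #CompResults and the sample names are made up for the demo; they only stand in for the real tables so nothing real gets touched. The multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Throwaway temp tables standing in for the real ones (hypothetical demo data).
CREATE TABLE #Competitors (cName NVARCHAR(64) PRIMARY KEY);
CREATE TABLE #CompResults (Name NVARCHAR(64) NOT NULL);

INSERT INTO #Competitors (cName) VALUES (N'Alice');
INSERT INTO #CompResults (Name) VALUES (N'Alice'), (N'Bob'), (N'Bob');

-- Same pattern as above: only names missing from #Competitors are inserted.
-- DISTINCT collapses the two 'Bob' rows so the primary key is not violated.
INSERT INTO #Competitors (cName)
SELECT DISTINCT cr.Name
FROM #CompResults AS cr
WHERE NOT EXISTS (SELECT 1 FROM #Competitors AS c WHERE c.cName = cr.Name);

-- The table now holds Alice and Bob.
SELECT cName FROM #Competitors ORDER BY cName;

DROP TABLE #CompResults;
DROP TABLE #Competitors;
```

Running the INSERT a second time before the DROPs would add nothing, since every name in #CompResults already exists by then.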

Another option is to left join your results table with your existing Competitors table and find the new competitors by keeping only the distinct records that don't match in the join:

INSERT Competitors (cName)
SELECT  DISTINCT cr.Name
FROM    CompResults cr left join
        Competitors c on cr.Name = c.cName
where   c.cName is null

The newer MERGE syntax also offers a compact, elegant and efficient way to do that:

MERGE INTO Competitors AS Target
USING (SELECT DISTINCT Name FROM CompResults) AS Source ON Target.cName = Source.Name
WHEN NOT MATCHED THEN
    INSERT (cName) VALUES (Source.Name);

Don't know why anyone else hasn't said this yet;

NORMALISE.

You've got a table that models competitions? Competitions are made up of Competitors? You need a distinct list of Competitors in one or more Competitions......

You should have the following tables.....

CREATE TABLE Competitor (
    [CompetitorID] INT IDENTITY(1,1) PRIMARY KEY
    , [CompetitorName] NVARCHAR(255)
    )

CREATE TABLE Competition (
    [CompetitionID] INT IDENTITY(1,1) PRIMARY KEY
    , [CompetitionName] NVARCHAR(255)
    )

CREATE TABLE CompetitionCompetitors (
    [CompetitionID] INT
    , [CompetitorID] INT
    , [Score] INT

    , PRIMARY KEY (
        [CompetitionID]
        , [CompetitorID]
        )
    )

With foreign key constraints on CompetitionCompetitors.CompetitionID and CompetitionCompetitors.CompetitorID pointing at the other two tables.
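Those constraints could look roughly like this. The constraint names (FK_...) are made up for the example; pick whatever fits your naming convention.

```sql
-- Sketch: foreign keys from the link table to the two parent tables.
-- Constraint names are hypothetical.
ALTER TABLE CompetitionCompetitors
    ADD CONSTRAINT FK_CompetitionCompetitors_Competition
        FOREIGN KEY ([CompetitionID]) REFERENCES Competition ([CompetitionID]);

ALTER TABLE CompetitionCompetitors
    ADD CONSTRAINT FK_CompetitionCompetitors_Competitor
        FOREIGN KEY ([CompetitorID]) REFERENCES Competitor ([CompetitorID]);
```

With these in place, a link row can't reference a competition or competitor that doesn't exist.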

With this kind of table structure -- your keys are all simple INTS -- there doesn't seem to be a good NATURAL KEY that would fit the model so I think a SURROGATE KEY is a good fit here.

So if you had this, then to get the distinct list of competitors in a particular competition you can issue a query like this:

DECLARE @CompetitionName VARCHAR(50) SET @CompetitionName = 'London Marathon'

    SELECT
        p.[CompetitorName] AS [CompetitorName]
    FROM
        Competitor AS p
    WHERE
        EXISTS (
            SELECT 1
            FROM
                CompetitionCompetitors AS cc
                JOIN Competition AS c ON c.[CompetitionID] = cc.[CompetitionID]
            WHERE
                cc.[CompetitorID] = p.[CompetitorID]
                AND c.[CompetitionName] = @CompetitionName
        )

And if you wanted the score for each competition a competitor is in:

SELECT
    p.[CompetitorName]
    , c.[CompetitionName]
    , cc.[Score]
FROM
    Competitor AS p
    JOIN CompetitionCompetitors AS cc ON cc.[CompetitorID] = p.[CompetitorID]
    JOIN Competition AS c ON c.[CompetitionID] = cc.[CompetitionID]

And when you have a new competition with new competitors, you simply check which ones already exist in the Competitor table. For those that already exist you don't insert into Competitor; you insert only the new ones.

Then you insert the new Competition in Competition and finally you just make all the links in CompetitionCompetitors.
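The three steps could be sketched like this. It's only a sketch under assumptions: #NewResults is a hypothetical staging table for the incoming results, the sample rows are invented, competitor names are assumed to identify a competitor uniquely (as in the question), and each competitor is assumed to appear once per competition. The syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical staging table holding the incoming results.
CREATE TABLE #NewResults (
    [Name] NVARCHAR(255) NOT NULL
    , [Score] INT NULL
    );

INSERT INTO #NewResults ([Name], [Score])
VALUES (N'Alice', 1), (N'Bob', 2);  -- made-up sample rows

DECLARE @CompetitionName NVARCHAR(255) = N'London Marathon';
DECLARE @CompetitionID INT;

BEGIN TRANSACTION;

-- 1. Add only the competitors that are not in Competitor yet.
INSERT INTO Competitor ([CompetitorName])
SELECT DISTINCT nr.[Name]
FROM #NewResults AS nr
WHERE NOT EXISTS (
    SELECT 1
    FROM Competitor AS p
    WHERE p.[CompetitorName] = nr.[Name]
    );

-- 2. Add the competition and capture its generated key.
INSERT INTO Competition ([CompetitionName])
VALUES (@CompetitionName);

SET @CompetitionID = SCOPE_IDENTITY();

-- 3. Link every result row to its competitor and the new competition.
INSERT INTO CompetitionCompetitors ([CompetitionID], [CompetitorID], [Score])
SELECT @CompetitionID, p.[CompetitorID], nr.[Score]
FROM #NewResults AS nr
JOIN Competitor AS p ON p.[CompetitorName] = nr.[Name];

COMMIT TRANSACTION;

DROP TABLE #NewResults;
```

If the same name can show up twice in one batch, step 3 would hit the primary key on CompetitionCompetitors, so you'd need to decide which row wins before linking.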

You will need to join the tables together and get a list of unique competitors that don't already exist in Competitors.

This will insert unique records.

INSERT Competitors (cName)
SELECT DISTINCT cr.Name
FROM CompResults cr LEFT JOIN Competitors c ON cr.Name = c.cName
WHERE c.cName IS NULL

There may come a time when this insert needs to be done quickly without being able to wait for the selection of unique names. In that case, you could insert the unique names into a temporary table, and then use that temporary table to insert into your real table. This works well because all the processing happens at the time you are inserting into a temporary table, so it doesn't affect your real table. Then when you have all the processing finished, you do a quick insert into the real table. I might even wrap the last part, where you insert into the real table, inside a transaction.
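A rough sketch of that temp-table approach, using the table and column names from the question. #NewCompetitors is a made-up name, and the second existence check is my own addition to cover names that were inserted by someone else between the two steps.

```sql
-- Step 1: do the expensive work (DISTINCT plus the existence check)
-- into a temp table, outside any transaction on the real table.
SELECT DISTINCT cr.Name
INTO #NewCompetitors
FROM CompResults AS cr
WHERE NOT EXISTS (
    SELECT 1
    FROM Competitors AS c
    WHERE c.cName = cr.Name
    );

-- Step 2: a short transaction that only does the final insert.
BEGIN TRANSACTION;

INSERT INTO Competitors (cName)
SELECT nc.Name
FROM #NewCompetitors AS nc
WHERE NOT EXISTS (          -- re-check in case rows arrived since step 1
    SELECT 1
    FROM Competitors AS c
    WHERE c.cName = nc.Name
    );

COMMIT TRANSACTION;

DROP TABLE #NewCompetitors;
```

The point of the split is that the real table is only involved in a write for the short second step, which works on the already reduced list of names.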

The answers above which talk about normalizing are great! But what if you find yourself in a position like mine, where you're not allowed to touch the database schema or structure as it stands? E.g., the DBAs are 'gods' and all suggested revisions go to /dev/null?

In that respect, I feel this has also been answered by this Stack Overflow posting, in addition to all the users above who gave code samples.

I'm reposting the code from INSERT VALUES WHERE NOT EXISTS which helped me the most since I can't alter any underlying database tables:

INSERT INTO #table1 (Id, guidd, TimeAdded, ExtraData)
SELECT Id, guidd, TimeAdded, ExtraData
FROM #table2
WHERE NOT EXISTS (Select Id, guidd From #table1 WHERE #table1.id = #table2.id)
-----------------------------------
MERGE #table1 as [Target]
USING  (select Id, guidd, TimeAdded, ExtraData from #table2) as [Source]
(id, guidd, TimeAdded, ExtraData)
    on [Target].id =[Source].id
WHEN NOT MATCHED THEN
    INSERT (id, guidd, TimeAdded, ExtraData)
    VALUES ([Source].id, [Source].guidd, [Source].TimeAdded, [Source].ExtraData);
------------------------------
INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData)
SELECT id, guidd, TimeAdded, ExtraData from #table2
EXCEPT
SELECT id, guidd, TimeAdded, ExtraData from #table1
------------------------------
INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData)
SELECT #table2.id, #table2.guidd, #table2.TimeAdded, #table2.ExtraData
FROM #table2
LEFT JOIN #table1 on #table1.id = #table2.id
WHERE #table1.id is null

The above code uses different fields than what you have, but you get the general gist with the various techniques.

Note that as per the original answer on Stack Overflow, this code was copied from here.

Anyway, my point is that "best practice" often comes down to what you can and can't do, as well as theory.

If you're able to normalize and generate indexes/keys -- great!
If not, and you have to resort to code hacks like me, hopefully the above helps.

Good luck!
