SET versus SELECT when assigning variables?

What are the differences between the SET and SELECT statements when assigning variables in T-SQL?


Quote, which summarizes from this article:

  1. SET is the ANSI standard for variable assignment, SELECT is not.
  2. SET can only assign one variable at a time, SELECT can make multiple assignments at once.
  3. If assigning from a query, SET can only assign a scalar value. If the query returns multiple values/rows then SET will raise an error. SELECT will assign one of the values to the variable and hide the fact that multiple values were returned (so you'd likely never know why something was going wrong elsewhere - have fun troubleshooting that one)
  4. When assigning from a query if there is no value returned then SET will assign NULL, where SELECT will not make the assignment at all (so the variable will not be changed from its previous value)
  5. As far as speed differences - there are no direct differences between SET and SELECT. However SELECT's ability to make multiple assignments in one shot does give it a slight speed advantage over SET.

I believe SET is ANSI standard whereas the SELECT is not. Also note the different behavior of SET vs. SELECT in the example below when a value is not found.

declare @var varchar(20)
set @var = 'Joe'
set @var = (select name from master.sys.tables where name = 'qwerty')
select @var /* @var is now NULL */

set @var = 'Joe'
select @var = name from master.sys.tables where name = 'qwerty'
select @var /* @var is still equal to 'Joe' */

When writing queries, this difference should be kept in mind :

DECLARE @A INT = 2

SELECT  @A = TBL.A
FROM    ( SELECT 1 A ) TBL
WHERE   1 = 2

SELECT  @A
/* @A is 2*/

---------------------------------------------------------------

DECLARE @A INT = 2

SET @A = ( 
            SELECT  TBL.A
            FROM    ( SELECT 1 A) TBL
            WHERE   1 = 2
         )

SELECT  @A
/* @A is null*/

Aside from the one being ANSI and speed etc., there is a very important difference that always matters to me; more than ANSI and speed. The number of bugs I have fixed due to this important overlook is large. I look for this during code reviews all the time.

-- Arrange
create table Employee (EmployeeId int);
insert into dbo.Employee values (1);
insert into dbo.Employee values (2);
insert into dbo.Employee values (3);

-- Act
declare @employeeId int;
select @employeeId = e.EmployeeId from dbo.Employee e;

-- Assert
-- This will print 3, the last EmployeeId from the query (an arbitrary value)
-- Almost always, this is not what the developer was intending. 
print @employeeId; 

Almost always, that is not what the developer is intending. In the above, the query is straight forward but I have seen queries that are quite complex and figuring out whether it will return a single value or not, is not trivial. The query is often more complex than this and by chance it has been returning single value. During developer testing all is fine. But this is like a ticking bomb and will cause issues when the query returns multiple results. Why? Because it will simply assign the last value to the variable.

Now let's try the same thing with SET:

 -- Act
 set @employeeId = (select e.EmployeeId from dbo.Employee e);

You will receive an error:

Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

That is amazing and very important because why would you want to assign some trivial "last item in result" to the @employeeId. With select you will never get any error and you will spend minutes, hours debugging.

Perhaps, you are looking for a single Id and SET will force you to fix your query. Thus you may do something like:

-- Act
-- Notice the where clause
set @employeeId = (select e.EmployeeId from dbo.Employee e where e.EmployeeId = 1);
print @employeeId;

Cleanup

drop table Employee;

In conclusion, use:

  • SET: When you want to assign a single value to a variable and your variable is for a single value.
  • SELECT: When you want to assign multiple values to a variable. The variable may be a table, temp table or table variable etc.