Many KILLED/ROLLBACK tasks running on SQL Server with 0% progress after 2 days

I created a stored procedure which sends an email and accidentally called the stored procedure within itself creating an endless loop. Within a few seconds of executing the stored procedure I realized what I had done and fixed the loop, but it had already created 517 processes. I killed all the SPID's but they are stuck in a KILLED/ROLLBACK state.

This code shows me the processes:

select session_id,handle.percent_complete, *
from sys.dm_exec_requests handle   
outer apply sys.fn_get_sql(handle.sql_handle) spname
where cast(handle.start_time as date) = '2022-01-10' 

spname.text is showing 'xp_sysmail_format_query' for all the SPID's. It's been two days, and all 517 processes have been stuck in this rollback state with 0% progress. We are still able to use all our business applications and execute queries, with the exception of EXEC msdb.dbo.sp_send_dbmail which, when starting even a test email, gets stuck executing and has to be cancelled. This is not good because any auto generated email warnings will not be sent, and all other sql email functions are blocked. I'm not sure what other jobs are being blocked at this time.

This is a huge problem and I cannot find a solution. I've read every post I can find about this. I've tried everything I can think of except restarting the SQL server. Some posts state that restarting the SQL server can fix this and some state not to restart it or that the tasks will just resume in the killed/rollback state when restarted. I tried killing the spids again with statusonly but that just informs me that they are in a rollback state with 0% complete.

Should I restart the server and will this fix anything? Is there another solution other than restoring the DB to a backup that is more than 2 days old and losing all the work the entire business has done in the last couple days?

Any assistance will be greatly appreciated.


Solution 1:

As robust as it is, sometimes (fortunately rarely) SQL Server leaves us no choice when killing a process to adopt the IT mantra of turning it off an on again when the rollback does not complete in a timely fashion.

This can be more prevalent when a transaction enlists external methods or functions, email is notorious for this inparticular.

As unwelcome as it is, it's often the least-expensive in terms of time and should be considered an option soon in the diagnosis process when the low-hanging fruit options have been exhausted.