Our SQL Server stored procedure "dies" - but how can I find the cause?

We have a SQL Server Stored Procedure which does a very long, painful INSERT INTO.. SELECT FROM... command.

It then writes a status message, via a second stored procedure, to say if it was successful or not.

BEGIN TRY

    EXEC [Add_To_Log_Table] 'Starting the INSERT command...'

    INSERT INTO [Very_Large_Table]
    SELECT /* About 30 fields */
    FROM /* Lots of tables */ 

    EXEC [Add_To_Log_Table] 'The INSERT was successful.'

END TRY

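BEGIN CATCH
    -- EXEC arguments cannot be expressions, so build the message first
    DECLARE @Message NVARCHAR(4000) = 'The INSERT failed, ' + ERROR_MESSAGE()
    EXEC [Add_To_Log_Table] @Message
END CATCH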

When our data set is small, this all works beautifully. The [Very_Large_Table] gets populated, and our 'The INSERT was successful.' message gets written to our log table.

But sometimes, when we are dealing with very large data sets, the 'Starting the INSERT command...' message gets written to our log file, Activity Monitor shows that the INSERT is running, but after several minutes, the Stored Procedure just "dies".

The CATCH command simply doesn't kick in, so we have no way to detect or recover from this situation. Is there a way to find out what was the cause ?

I know that SQL Server Management Studio does have some logs (Management \ SQL Server Logs) but these don't show any errors occurring.

I was wondering if SQL Server somehow stores statuses/most recent error messages in one of its internal tables ?

We're running SQL Server 2008 R2 on Windows Server 2008 R2. There's plenty of free disk space on the drives used by our database, and the database recovery model is set to SIMPLE.


Solution 1:

but after several minutes, the Stored Procedure just "dies".

The CATCH command simply doesn't kick in, so we have no way to detect or recover from this situation. Is there a way to find out what was the cause ?

If a command timeout occurs while the proc is executing, the client API sends an attention signal to cancel the running query. The CATCH block is not executed in this case, but the timeout exception should be raised in the application code. Make sure the application handles and logs these exceptions properly.

You could create a trace (Extended Events or SQL Trace) to capture attention events if you don't have access to the app code.
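As a rough illustration of the trace approach, here is a minimal server-side SQL Trace sketch that records Attention events (event class 16). SQL Trace is used here because the question is about SQL Server 2008 R2. The file path `C:\Traces\attention_trace` and the 50 MB size limit are assumptions for illustration; the folder must already exist and be writable by the SQL Server service account.

```sql
-- Sketch: server-side trace capturing Attention events (event class 16).
-- The path below is a placeholder; ".trc" is appended automatically.
DECLARE @TraceID INT;
DECLARE @MaxFileSizeMB BIGINT = 50;   -- assumed limit, adjust as needed
DECLARE @On BIT = 1;

EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Traces\attention_trace', @MaxFileSizeMB;

-- Data columns to capture for the Attention event
EXEC sp_trace_setevent @TraceID, 16, 10, @On;  -- ApplicationName
EXEC sp_trace_setevent @TraceID, 16, 11, @On;  -- LoginName
EXEC sp_trace_setevent @TraceID, 16, 12, @On;  -- SPID
EXEC sp_trace_setevent @TraceID, 16, 14, @On;  -- StartTime
EXEC sp_trace_setevent @TraceID, 16, 35, @On;  -- DatabaseName

EXEC sp_trace_setstatus @TraceID, 1;  -- 1 = start the trace
SELECT @TraceID AS TraceID;           -- keep this id to stop the trace later
```

After the procedure has "died", the captured rows can be read with `SELECT * FROM sys.fn_trace_gettable(N'C:\Traces\attention_trace.trc', DEFAULT)`. A row whose SPID matches the session that ran the procedure would point to a client-side cancel or timeout. Stop and remove the trace with `sp_trace_setstatus` using status 0 and then 2.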

Solution 2:

This part looks suspicious:

when we are dealing with very large data sets, the 'Starting the INSERT command...' message gets written to our log file, Activity Monitor shows that the INSERT is running, but after several minutes, the Stored Procedure just "dies".

This can be due to the SELECT part taking a long time, the INSERT being slowed down by many indexes, or a number of other factors.

This is how I would troubleshoot the problem:

1. Run the query again.
2. Open a new query window and run the command below:

select status,blocking_session_id,wait_type,last_Wait_type
from
sys.dm_exec_requests where session_id=<<your query sessionid>>

The above query helps you identify where the bottleneck is.

For example, I ran the query below, which inserts rows into a table one at a time, and the wait type showed WRITELOG. Each single-row INSERT commits on its own and has to wait for a log flush, so committing in batches (say, 100 rows per transaction) rather than one row at a time would help reduce this wait type.

create table test
(
id int identity(1,1) primary key
)

-- endless loop of single-row inserts; each one is its own commit
while 1=1
begin
insert into test
default values
end
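For comparison, a batched version of the same loop might look like the sketch below. It assumes the `test` table created above; the batch size of 100 and the total of 100 batches are arbitrary values chosen for illustration, and the loop is bounded so that it terminates.

```sql
-- Sketch: commit 100 rows per transaction instead of one row per commit,
-- so the session waits on a log flush once per batch rather than once per row.
DECLARE @Batch INT = 0;
DECLARE @Row INT;

WHILE @Batch < 100              -- assumed number of batches
BEGIN
    BEGIN TRANSACTION;

    SET @Row = 0;
    WHILE @Row < 100            -- assumed batch size
    BEGIN
        INSERT INTO test DEFAULT VALUES;
        SET @Row = @Row + 1;
    END

    COMMIT TRANSACTION;         -- one commit for the whole batch
    SET @Batch = @Batch + 1;
END
```

Running the `sys.dm_exec_requests` query from another window while each version executes should show whether WRITELOG remains the dominant wait.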

You will not always see your session in the sys.dm_exec_requests DMV, because it only lists requests that are currently executing. In that case, you can use the query below:

select * from sys.sysprocesses
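A narrower version of that lookup, limited to the columns that matter for wait analysis, could look like the following sketch. It queries the `sys.sysprocesses` compatibility view; the session id 53 is a hypothetical value, to be replaced with the session id of the window running the procedure.

```sql
-- Sketch: wait-related columns for one session from sys.sysprocesses.
DECLARE @SessionId SMALLINT = 53;   -- hypothetical session id

SELECT spid,
       status,
       blocked,        -- session id of the blocker, 0 if not blocked
       lastwaittype,
       waittime,       -- current wait time in milliseconds
       waitresource,
       cmd,
       open_tran
FROM sys.sysprocesses
WHERE spid = @SessionId;
```

If the query returns no row at all, the session itself is gone, which suggests the connection was closed rather than the query merely being slow.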

You may get a different wait type, such as an I/O wait (for example PAGEIOLATCH_SH). In that case, troubleshoot based on that wait and try adding suitable indexes so that your SELECT runs faster. It can also be a memory issue: if the server is low on memory, SQL Server has to keep moving pages between disk and memory, and even a good query will suffer.

So, in summary: find out what the wait type is and troubleshoot based on that.

Solution 3:

Most of the comments assume this process is really running and just needs to be debugged. I have personally experienced a couple of instances where it did indeed just "die": nothing in the error log, no return code. Debug messages would come out right up to the stored procedure call, then nothing, and it would be over.

But if you called the stored procedure independently, it would work fine. The phenomenon would only occur in a more complex situation, where a stored procedure fired a trigger, which called another stored procedure, which called another one.

In one case, changing the length of the procedure (its actual physical length) resolved the issue. While trying things out, I found it would die at a later point if I took some code out earlier.

I stumbled on this thread because I'm running into that scenario now and still need to find the reason or the workaround. I'll post when I find it for this instance.