When is it better to write "ad hoc sql" vs stored procedures [duplicate]

SQL Server caches the execution plans for ad-hoc queries, so (discounting the time taken by the first call) the two approaches will be identical in terms of speed.

In general, the use of stored procedures means taking a portion of the code needed by your application (the T-SQL queries) and putting it in a place that is not under source control (it can be, but usually isn't) and where it can be altered by others without your knowledge.

Having the queries in a central place like this may be a good thing, depending upon how many different applications need access to the data they represent. I generally find it much easier to keep the queries used by an application resident in the application code itself.

In the mid-1990s, the conventional wisdom said that stored procedures in SQL Server were the way to go in performance-critical situations, and at the time they definitely were. The reasons behind this conventional wisdom have not been valid for a long time, however.

Update: Also, in debates over the viability of stored procedures, the need to prevent SQL injection is frequently invoked in defense of procs. Surely, no one in their right mind thinks that assembling ad hoc queries through string concatenation is the correct thing to do (although this will only expose you to a SQL injection attack if you're concatenating user input). Obviously ad hoc queries should be parameterized, not only to prevent the monster-under-the-bed of a SQL injection attack, but also just to make your life as a programmer generally easier (unless you enjoy having to figure out when to use single quotes around your values).

Update 2: I have done more research. Based on this MSDN white paper, it appears that the answer depends on what you mean by "ad-hoc" with your queries, exactly. For example, a simple query like this:

SELECT ID, [DESC] FROM tblSTUFF WHERE ITEM_COUNT > 5

... will have its execution plan cached. Moreover, because the query does not contain certain disqualifying elements (like nearly anything other than a simple SELECT from one table), SQL Server will actually "auto-parameterize" the query and replace the literal constant "5" with a parameter, and cache the execution plan for the parameterized version. This means that if you then execute this ad-hoc query:

SELECT ID, [DESC] FROM tblSTUFF WHERE ITEM_COUNT > 23

... it will be able to use the cached execution plan.

Unfortunately, the list of disqualifying query elements for auto-parameterization is long (for example, forget about using DISTINCT, TOP, UNION, GROUP BY, OR etc.), so you really cannot count on this for performance.

If you do have a "super complex" query that won't be auto-parameterized, like:

SELECT ID, [DESC] FROM tblSTUFF WHERE ITEM_COUNT > 5 OR ITEM_COUNT < 23

... it will still be cached by the exact text of the query, so if your application calls this query with the same literal "hard-coded" values repeatedly, each query after the first will re-use the cached execution plan (and thus be as fast as a stored proc).

If the literal values change (based on user actions, for example, like filtering or sorting viewed data), then the queries will not benefit from caching (except occasionally when they accidentally match a recent query exactly).
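One way to check this on your own server, rather than take it on faith, is to look at the plan cache directly. The query below is only a sketch: it assumes SQL Server 2005 or later and permission to read the dynamic management views (VIEW SERVER STATE), and the filter on tblSTUFF simply matches the example table used above.

```sql
-- List cached plans whose text mentions the example table, most-used first.
-- usecounts > 1 means the plan was reused; objtype separates 'Adhoc' plans
-- (cached by exact text) from 'Prepared' (parameterized) and 'Proc' plans.
SELECT cp.usecounts,
       cp.objtype,
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%tblSTUFF%'
  AND st.text NOT LIKE '%dm_exec_cached_plans%'  -- hide this query itself
ORDER BY cp.usecounts DESC;
```

If you run the same literal query several times and then change a literal, you would expect to see the use count climb on one entry and a new entry appear for the changed text.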

The way to benefit from caching with "ad-hoc" queries is to parameterize them. Creating a query on the fly in C# like this:

int itemCount = 5;
string query = "DELETE FROM tblSTUFF WHERE ITEM_COUNT > " + 
        itemCount.ToString();

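is incorrect. The correct way (using ADO.NET) would be something like this:

using System.Data;
using System.Data.SqlClient;

using (SqlConnection conn = new SqlConnection(connStr))
using (SqlCommand com = conn.CreateCommand())
{
    conn.Open();
    com.CommandType = CommandType.Text;
    com.CommandText = 
        "DELETE FROM tblSTUFF WHERE ITEM_COUNT > @ITEM_COUNT";
    int itemCount = 5;
    // Prepare() expects each parameter's type to be set explicitly,
    // so declare it instead of relying on AddWithValue.
    com.Parameters.Add("@ITEM_COUNT", SqlDbType.Int).Value = itemCount;
    com.Prepare();
    com.ExecuteNonQuery();
}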

The query contains no literals and is already fully parameterized, so subsequent queries using the identical parameterized statement would use the cached plan (even if called with different parameter values). Note that the code here is virtually the same as the code you would use for calling a stored procedure anyway (the only difference being the CommandType and the CommandText), so it somewhat comes down to where you want the text of that query to "live" (in your application code or in a stored procedure).
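For reference, a parameterized text command from ADO.NET normally reaches the server as a call to sp_executesql (or sp_prepexec when Prepare() is used), which is why the plan is keyed on the statement text and not on the values. A hand-written T-SQL equivalent of the DELETE above would look roughly like this:

```sql
-- The statement text stays constant; only the parameter value changes,
-- so both calls can share one cached plan.
EXEC sp_executesql
    N'DELETE FROM tblSTUFF WHERE ITEM_COUNT > @ITEM_COUNT',
    N'@ITEM_COUNT int',
    @ITEM_COUNT = 5;

EXEC sp_executesql
    N'DELETE FROM tblSTUFF WHERE ITEM_COUNT > @ITEM_COUNT',
    N'@ITEM_COUNT int',
    @ITEM_COUNT = 23;
```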

Finally, if by "ad-hoc" queries you mean you're dynamically constructing queries with different columns, tables, filtering parameters and whatnot, like maybe these:

SELECT ID, [DESC] FROM tblSTUFF WHERE ITEM_COUNT > 5

SELECT ID, FIRSTNAME, LASTNAME FROM tblPEEPS 
    WHERE AGE >= 18 AND LASTNAME LIKE '%What the'

SELECT ID, FIRSTNAME, LASTNAME FROM tblPEEPS 
    WHERE AGE >= 18 AND LASTNAME LIKE '%What the'
    ORDER BY LASTNAME DESC

... then you pretty much can't do this with stored procedures (without the EXEC hack which is not to be spoken of in polite society), so the point is moot.
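For completeness, the "EXEC hack" usually means building the statement as a string inside the procedure. If you do go that way, a somewhat safer sketch uses sp_executesql with real parameters for values and a whitelist for anything that cannot be a parameter, such as the sort column. The procedure name usp_SearchPeeps and its parameters are made up for illustration, and the inline DECLARE initializer assumes SQL Server 2008 or later.

```sql
CREATE PROCEDURE dbo.usp_SearchPeeps   -- hypothetical name
    @MinAge     int,
    @SortColumn sysname = N'LASTNAME'
AS
BEGIN
    SET NOCOUNT ON;

    -- Column names cannot be parameters, so accept only known values.
    IF @SortColumn NOT IN (N'LASTNAME', N'FIRSTNAME', N'ID')
    BEGIN
        RAISERROR(N'Unsupported sort column.', 16, 1);
        RETURN;
    END;

    DECLARE @sql nvarchar(max) =
        N'SELECT ID, FIRSTNAME, LASTNAME FROM tblPEEPS
          WHERE AGE >= @MinAge
          ORDER BY ' + QUOTENAME(@SortColumn) + N';';

    -- Values still travel as parameters, never as concatenated text.
    EXEC sp_executesql @sql, N'@MinAge int', @MinAge = @MinAge;
END
```

Note that the inner statement is compiled and cached just like a parameterized ad hoc query, so wrapping it in a procedure buys nothing performance-wise.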

Update 3: Here is the only really good performance-related reason (that I can think of, anyway) for using a stored procedure. If your query is a long-running one where the process of compiling the execution plan takes significantly longer than the actual execution, and the query is only called infrequently (like a monthly report, for example), then putting it in a stored procedure might make SQL Server keep the compiled plan in the cache long enough for it to still be around next month. Beats me if that's true or not, though.

There's nothing about stored procedures that makes them magically speedier or more secure. There are cases where a well-designed stored proc can be faster for certain types of tasks, but the reverse is also true for ad hoc SQL.

Code the way you find most productive.

"Make it right before you make it faster." -- Brian Kernighan

If you are not writing stored procedures, investigate parameterized queries. If you build the SQL yourself including parameter concatenation, you're inviting a SQL injection attack.

There are a couple of myths related to this topic that you should disabuse yourself of:

Myth 1: Stored procedures are pre-compiled: http://scarydba.wordpress.com/2009/09/30/pre-compiled-stored-procedures-fact-or-myth/

Myth 2: Ad Hoc SQL queries do not reuse execution plans: http://scarydba.wordpress.com/2009/10/05/ad-hoc-queries-dont-reuse-execution-plans-myth-or-fact/

IMHO procs have the edge when you absolutely need to lock down the database. In these situations, you can use an account that only has rights to execute stored procedures. Additionally, they can provide a layer of abstraction between your app and the database from the DBA perspective.
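A minimal sketch of that lock-down in SQL Server, assuming a database user named app_user already exists and the procedures live in the dbo schema (both names are placeholders):

```sql
-- Role that may run procedures but has no direct table permissions.
CREATE ROLE app_exec_only;
GRANT EXECUTE ON SCHEMA::dbo TO app_exec_only;

-- SQL Server 2012+ syntax; older versions use sp_addrolemember instead.
ALTER ROLE app_exec_only ADD MEMBER app_user;
```

This relies on ownership chaining: a procedure can read and write tables that have the same owner even though the caller has no rights on those tables. Dynamic SQL inside a procedure breaks that chain, so it would need direct table permissions again.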

Likewise, dynamic SQL is better in situations where the query may need to change some and be... well... dynamic. Or if you know you have to port to multiple databases.

Both are equally safe with regard to SQL injection as long as all user-supplied values are parameterized.

A lot has been said about performance, caching, and security in this thread already, and I won't repeat those points. There are two things that I haven't read yet in this thread: portability issues and roundtrips.

If you are interested in maximum portability of your application across programming languages, then stored procedures are a good idea: the more program logic you store in the database outside the app, the less you have to recode if you're moving to another framework or language. In addition, the code to call a stored procedure is much smaller than the actual raw SQL itself, so the database interface in your application code will have a smaller footprint.

If you need the same logic in multiple applications, then stored procedures are a convenient way to have a single definition of that logic that can be re-used by other applications. However, this benefit is often exaggerated, as you could also isolate that logic in a library that you share across applications. Of course, if the applications are in different languages, then there is a true benefit in stored procedures, as it is probably easier to call a procedure through your language's DB interface than to link to a library written in another language.

If you are interested in RDBMS portability, then stored procedures are likely to become one of your biggest problems. Core features of all major and minor RDBMSs are quite similar. The largest differences can be found in the syntax and available built-in functionality for stored procedures.

Regarding roundtrips:

If you have many multi-statement transactions, or in general, functions in your application that require multiple SQL statements, then performance can improve if you put those multiple statements inside a stored procedure. The reason is that calling the stored procedure (and possibly returning multiple results from it) is just a single roundtrip. With raw SQL, you have (at least) one roundtrip per SQL statement.
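As a rough illustration, here is a sketch of a procedure that does two updates and a read in one call. The procedure name usp_MoveItems and its parameters are invented for the example; the table and columns are borrowed from the tblSTUFF examples earlier in the thread.

```sql
CREATE PROCEDURE dbo.usp_MoveItems   -- hypothetical name
    @FromId int,
    @ToId   int,
    @Qty    int
AS
BEGIN
    SET NOCOUNT ON;     -- suppress per-statement row-count messages
    SET XACT_ABORT ON;  -- roll back the whole transaction on a run-time error

    BEGIN TRANSACTION;
        UPDATE tblSTUFF SET ITEM_COUNT = ITEM_COUNT - @Qty WHERE ID = @FromId;
        UPDATE tblSTUFF SET ITEM_COUNT = ITEM_COUNT + @Qty WHERE ID = @ToId;
    COMMIT TRANSACTION;

    -- The updated rows come back in the same response.
    SELECT ID, ITEM_COUNT FROM tblSTUFF WHERE ID IN (@FromId, @ToId);
END
```

The application then makes a single call such as EXEC dbo.usp_MoveItems @FromId = 1, @ToId = 2, @Qty = 5. Many client libraries can also send several statements as one batch, so it is worth measuring before assuming the procedure is the only way to save roundtrips.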
