When should you use full-text indexing?
We have a whole bunch of queries that "search" for clients, customers, etc. You can search by first name, email, etc. We're using LIKE statements in the following manner:
SELECT *
FROM customer
WHERE fname LIKE '%someName%'
Does full-text indexing help in the scenario? We're using SQL Server 2005.
Solution 1:
It will depend upon your DBMS. I believe that most systems will not take advantage of the full-text index unless you use the full-text functions. (e.g. MATCH/AGAINST in mySQL or FREETEXT/CONTAINS in MS SQL)
Here is two good articles on when, why, and how to use full-text indexing in SQL Server:
- How To Use SQL Server Full-Text Searching
- Solving Complex SQL Problems with Full-Text Indexing
Solution 2:
FTS can help in this scenario, the question is whether it is worth it or not.
To begin with, let's look at why LIKE
may not be the most effective search. When you use LIKE
, especially when you are searching with a %
at the beginning of your comparison, SQL Server needs to perform both a table scan of every single row and a byte by byte check of the column you are checking.
FTS has some better algorithms for matching data as does some better statistics on variations of names. Therefore FTS can provide better performance for matching Smith, Smythe, Smithers, etc when you look for Smith.
It is, however, a bit more complex to use FTS, as you'll need to master CONTAINS
vs FREETEXT
and the arcane format of the search. However, if you want to do a search where either FName or LName match, you can do that with one statement instead of an OR.
To determine if FTS is going to be effective, determine how much data you have. I use FTS on a database of several hundred million rows and that's a real benefit over searching with LIKE
, but I don't use it on every table.
If your table size is more reasonable, less than a few million, you can get similar speed by creating an index for each column that you're going to be searching on and SQL Server should perform an index scan rather than a table scan.
Solution 3:
According to my test scenario:
- SQL Server 2008
- 10.000.000 rows each with a string like "wordA wordB wordC..." (varies between 1 and 30 words)
- selecting count(*) with CONTAINS(column, "wordB")
- result size several hundred thousands
- catalog size approx 1.8GB
Full-text index was in range of 2s whereas like '% wordB %' was in range of 1-2 minutes.
But this counts only if you don't use any additional selection criteria! E.g. if I used some "like 'prefix%'" on a primary key column additionally, the performance was worse since the operation of going into the full-text index costs more than doing a string search in some fields (as long those are not too much).
So I would recommend full-text index only in cases where you have to do a "free string search" or use some of the special features of it...