mysql fulltext search failure
This is what I tested.
mysql> select * from product;
+------------+---------+---------------+
| Id | Product | ProductIdType |
+------------+---------+---------------+
| B00005N5PF | one pen | ASIN |
| B000J5XS3C | | ASIN |
+------------+---------+---------------+
2 rows in set (0.00 sec)
mysql> select * from product p where match (p.Product) against ('pen' in boolean mode);
Empty set (0.00 sec)
mysql> select * from product p where match (p.Product) against ('one pen' in boolean mode);
Empty set (0.00 sec)
This is create statement.
CREATE TABLE product
(
Id VARCHAR(16),
Product VARCHAR(128),
ProductIdType VARCHAR(8),
PRIMARY KEY (Id),
FULLTEXT (Product)
) ENGINE=MyISAM;
Equal sign and 'LIKE' work fine. So Why?
-- drop table testproduct;
CREATE TABLE testproduct
(
Id VARCHAR(16),
prod_name TEXT,
ProductIdType VARCHAR(8),
PRIMARY KEY (Id),
FULLTEXT (prod_name)
) ENGINE=MyISAM;
insert into testproduct (id,prod_name,productidtype) values ('B00005N5PF','one pen and a good price for a pen','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('B570J5XS3C',null,'ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C00ZZ5N5PF','let us get rid of some noise','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D00LL5N5PA','four score and seven years ago our fore...','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('EEEZZ5N5PF','he has a harpoon','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C01ZZ5N5PF','and then we','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('B00ZZ5N5PF','he has a pen in his pocket not a banana','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C02ZZ5N5PF','went to the store','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C03ZZ5N5PF','and decided that we should buy some','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C04ZZ5N5PF','fruit cups or fruit or berries or pebbles','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C037Z5N5PF','then he and her she and it','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('C04K95N5PF','threw some daggers and a harpoon','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D88895N5PF','more noise and some of this','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D34595N5PF','this article about harpoons really drills into the throwing of harpoon or harpoons to those that deserve a harpoon','ASIN');
insert into testproduct (id,prod_name,productidtype) values ('D12395N5PF','and there we go','ASIN');
Full Text Search needs some variety to rid itself of repeat 'noise'. Testing with minimal data will yield poor results. Throw your whole collection at it for anything meaningful to come out. There are settings for minimum size of words even attempted to be searched as seen in some links below.
Stop Words
There are MySql Lists of Stop Words in various languages representing insignificant words skipped during the search process. That list is compiled into the server, but can be overriden as seen in this Manual Page and text:
To override the default stopword list, set the ft_stopword_file system variable. (See Section 5.1.4, “Server System Variables”.) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.
Some Sample Queries
-- select * from testproduct
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('score' IN BOOLEAN MODE);
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('harpoon' IN BOOLEAN MODE);
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('banana' IN BOOLEAN MODE);
SELECT * FROM testproduct WHERE MATCH(prod_name) AGAINST('years' IN BOOLEAN MODE);
to get multiple words matching:
SELECT id,prod_name, match( prod_name )
AGAINST ( '+harpoon +article' IN BOOLEAN MODE ) AS relevance
FROM testproduct
ORDER BY relevance DESC
Gives a real weight in relevance
column:
SELECT id,prod_name, match( prod_name )
AGAINST ( '+harpoon +article' IN NATURAL LANGUAGE MODE) AS relevance
FROM testproduct
ORDER BY relevance DESC
+------------+--------------------------------------------------------------------------------------------------------------------+--------------------+
| id | prod_name | relevance |
+------------+--------------------------------------------------------------------------------------------------------------------+--------------------+
| D34595N5PF | this article about harpoons really drills into the throwing of harpoon or harpoons to those that deserve a harpoon | 3.6207125186920166 |
| EEEZZ5N5PF | he has a harpoon | 1.2845110893249512 |
| C04K95N5PF | threw some daggers and a harpoon | 1.2559525966644287 |
|------------+--------------------------------------------------------------------------------------------------------------------+--------------------+
Lifted the multiple words section from here. Thanks spencer