Mysql: create index on 1.4 billion records

I have a table with 1.4 billion records. The table structure is as follows:

CREATE TABLE text_page (
    text VARCHAR(255),
    page_id INT UNSIGNED
) ENGINE=MYISAM DEFAULT CHARSET=ascii

The requirement is to create an index over the column text.

The table size is about 34G.

I have tried to create the index by the following statement:

ALTER TABLE text_page ADD KEY ix_text (text)

After 10 hours' waiting I finally give up this approach.

Is there any workable solution on this problem?

UPDATE: the table is unlikely to be updated or inserted or deleted. The reason why to create index on the column text is because this kind of sql query would be frequently executed:

SELECT page_id FROM text_page WHERE text = ?

UPDATE: I have solved the problem by partitioning the table.

The table is partitioned into 40 pieces on column text. Then creating index on the table takes about 1 hours to complete.

It seems that MySQL index creation becomes very slow when the table size becomes very big. And partitioning reduces the table into smaller trunks.


Solution 1:

Could it be your system just is not up to the task? I do not use MySQL (SQL Server here), but I know the pain of indexing an 800 million entry table. Basically.... you need the right hardware for that (as in: lots of fast discs). I now use nearly a dozen Velociraptors and performance is great ;)

SQL Servers (not as MS SQL Server, but as database servers using SQL) live and die with disc access, and normal discs are just not up to the task of larger operations.

Solution 2:

You might want to create an index on first (for instance, 10) characters of the text field.

From the Docs:

Indexes can be created that use only the leading part of column values, using col_name(length) syntax to specify an index prefix length:

CREATE INDEX ix_text ON text_page (text(10))

Solution 3:

I have solved the problem by partitioning the table.

The table is partitioned into 40 pieces on column text. Then creating index on the table takes about 1 hours to complete.

It seems that MySQL index creation becomes very slow when the table size becomes very big. And partitioning reduces the table into smaller trunks.

Solution 4:

Set the sort_buffer_size to 4GB (or however much you can depending on how much memory you have).

Right now the create index is doing a sort but since you have a 32MB sort_buffer_size, it is basically thrashing the hard drive needlessly.