Maintenance of tables: do I need to REINDEX a table after truncating and repopulating?
Solution 1:
No, you don't generally need to REINDEX after a TRUNCATE - and if you do, you're much better off dropping the indexes, loading the data, then re-creating the indexes at the end.

It's somewhat similar to this answer about CLUSTER - Pg truncates the indexes along with the table during TRUNCATE and then incrementally rebuilds them as you insert data, so there's no index bloat held over from before the TRUNCATE.
You may get somewhat more compact and efficient indexes if you drop the indexes, truncate, insert the data, and re-create the indexes. They'll certainly be faster to build. The difference in index performance once built is unlikely to be enough to warrant the extra effort for most applications using just b-tree indexes, but the difference in the time required to populate tables can be well worth it. If you're using GiST or (especially) GIN indexes, it's really best to drop the index and re-create it at the end.
If it's convenient to do so, drop the indexes and add them back at the end, just don't worry too much if this isn't practical for you.
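As a minimal sketch of that pattern, reusing the `demo` table and `composite_idx` index from the demo session below (your table, index, and column definitions will differ):

```sql
-- Drop the index, bulk-load, then rebuild the index in one pass.
-- Wrapping it in a transaction means other sessions never see the
-- table half-loaded (TRUNCATE takes an ACCESS EXCLUSIVE lock anyway).
BEGIN;
DROP INDEX IF EXISTS composite_idx;
TRUNCATE TABLE demo;
INSERT INTO demo (someint, sometext)
SELECT x, (x % 100)::text
FROM generate_series(1, 100000) x;
CREATE INDEX composite_idx ON demo (sometext, someint);
COMMIT;
```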
For a regular b-tree in my test, an incrementally built composite index was 3720 kB vs 2208 kB for one created in a single pass at the end. Build time was 164 ms (inserts) + 347 ms (index) vs 742 ms (inserts+index). The difference is significant, but not enough to be a huge concern unless you're doing large scale DW work. A REINDEX took a further 342 ms after the inserts+index run. See the demo session below.
So, @TomTom is right (unsurprisingly) in that it can be worth dropping and re-creating indexes if it's convenient to do so, like if you're bulk-populating tables for OLAP work.
However, reindexing is likely to be the wrong answer since it means you do a whole bunch of expensive work to create an index you then throw away. Drop the index and re-create it instead of reindexing.
Demo session:
regress=# -- Create, populate, then create indexes:
regress=# CREATE TABLE demo (someint integer, sometext text);
CREATE TABLE
regress=# \timing on
regress=# INSERT INTO demo (someint, sometext)
SELECT x, (x%100)::text
FROM generate_series(1,100000) x;
INSERT 0 100000
Time: 164.678 ms
regress=# CREATE INDEX composite_idx ON demo(sometext, someint);
CREATE INDEX
Time: 347.958 ms
regress=# SELECT pg_size_pretty(pg_indexes_size('demo'::regclass));
pg_size_pretty
----------------
2208 kB
(1 row)
regress=# -- Total time: 347.958+164.678=512.636ms, index size 2208kB
regress=# -- Now, with truncate and insert:
regress=# TRUNCATE TABLE demo;
TRUNCATE TABLE
regress=# INSERT INTO demo (someint, sometext)
SELECT x, (x%100)::text
FROM generate_series(1,100000) x;
INSERT 0 100000
Time: 742.813 ms
regress=# SELECT pg_size_pretty(pg_indexes_size('demo'::regclass));
pg_size_pretty
----------------
3720 kB
(1 row)
regress=# -- Total time 742ms, index size 3720kB
regress=# -- Difference: about 44% time increase, about 68% index size increase.
regress=# -- Big-ish, but whether you care depends on your application. Now:
regress=# REINDEX INDEX composite_idx ;
REINDEX
Time: 342.283 ms
regress=# SELECT pg_size_pretty(pg_indexes_size('demo'::regclass));
pg_size_pretty
----------------
2208 kB
(1 row)
regress=# -- Index is back to the same size, but total time for the insert with
regress=# -- progressive index build plus a reindex at the end is 1085.096 ms, about
regress=# -- twice as long as dropping the indexes, inserting the data, and
regress=# -- re-creating the indexes took.
So:

- For OLAP, drop indexes, insert, re-create indexes.
- For OLTP you'll probably just want to stick with progressive index builds. Consider a non-100% fillfactor on indexes to reduce insert costs.
- Avoid inserting with progressive index builds and then re-indexing; it's the worst of both worlds.
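The fillfactor tweak for the OLTP case is a one-liner; the value 90 here is purely illustrative, not a recommendation for your workload:

```sql
-- Leave 10% of each index page free at build time, so later inserts are
-- less likely to cause page splits. The right value depends on workload.
CREATE INDEX composite_idx ON demo (sometext, someint) WITH (fillfactor = 90);

-- Or change an existing index; the new setting applies on the next
-- REINDEX or rebuild, not retroactively to existing pages.
ALTER INDEX composite_idx SET (fillfactor = 90);
```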
Of course, the sizes used in this test are toy table sizes, so you should repeat this testing on a sample of your real world data and indexes to get a solid idea of how much difference it makes for you. I repeated these tests at a scale factor 100 greater than the above and consistently found the index was almost exactly twice the size if built incrementally, though the relative build time difference actually fell for this particular test.
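To rerun the comparison at a larger scale, only the `generate_series` upper bound needs to change; e.g. for the 100x scale factor mentioned above:

```sql
-- Same test data as the demo session, but 10 million rows instead of
-- 100 thousand. Expect this to take correspondingly longer and to make
-- the incremental-vs-one-off index size gap easier to see.
INSERT INTO demo (someint, sometext)
SELECT x, (x % 100)::text
FROM generate_series(1, 10000000) x;
```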
So: Test with your data and schema.