How can I check if my DB needs more RAM?

How would you check if your postgresql DB instance needs more RAM memory to handle their current working data?


Solution 1:

If all you are on Linux, your total physical RAM should be larger than your database size on disk in order to minimize I/O. Eventually it the entire database will be in the OS read cache and I/O will be limited to committing changes to disk. I prefer to find the DB size by running "du -shc $PGDATA/base"--that method aggregates all databases into a single number. As long as you are larger than that, it should be fine.

Additionally, you can look at the cache hit rate of heap and index block fetches. These measure the rate of hits into PostgreSQL's shared buffers. The numbers can be a little misleading--even though it may have been a miss in the shared buffers cache, it may still be a hit in the OS read cache. Still, hits in shared buffers are still less expensive than hits in the OS read cache (which are, in turn, less expensive by a couple of orders of magnitude than having to go back to disk).

In order to look at the shared buffers hit rate, I use this query:

SELECT relname, heap_blks_read, heap_blks_hit,
    round(heap_blks_hit::numeric/(heap_blks_hit + heap_blks_read),3)
FROM pg_statio_user_tables
WHERE heap_blks_read > 0
ORDER BY 4
LIMIT 25;

This gives you the top 25 worst offenders where the buffer cache is missed for all tables where at least one block had to be fetched from "disk" (again, which could be either the OS read cache or actual disk I/O). You can increase the value in the WHERE clause or add another condition for heap_blks_hit to filter out seldom-used tables.

The same basic query can be used to check the per-table total index hit rate by globally replacing the string "heap" with "idx". Take a look at pg_statio_user_indexes to get a per-index breakdown.

A quick note about shared buffers: a good rule of thumb for this in Linux is to set the configuration parameter shared_buffers to 1/4 of RAM, but no more than 8GB. This is not a hard-and-fast rule, but rather a good starting point for tuning a server. If your database is only 4GB and you have a 32GB server, 8GB of shared buffers is actually overkill and you should be able to set this to 5 or 6 GB and still have room for future growth.

Solution 2:

I made this SQL to show the tables vs disk hits ratio:

-- perform a "select pg_stat_reset();" when you want to reset counter statistics
with 
all_tables as
(
SELECT  *
FROM    (
    SELECT  'all'::text as table_name, 
        sum( (coalesce(heap_blks_read,0) + coalesce(idx_blks_read,0) + coalesce(toast_blks_read,0) + coalesce(tidx_blks_read,0)) ) as from_disk, 
        sum( (coalesce(heap_blks_hit,0)  + coalesce(idx_blks_hit,0)  + coalesce(toast_blks_hit,0)  + coalesce(tidx_blks_hit,0))  ) as from_cache    
    FROM    pg_statio_all_tables  --> change to pg_statio_USER_tables if you want to check only user tables (excluding postgres's own tables)
    ) a
WHERE   (from_disk + from_cache) > 0 -- discard tables without hits
),
tables as 
(
SELECT  *
FROM    (
    SELECT  relname as table_name, 
        ( (coalesce(heap_blks_read,0) + coalesce(idx_blks_read,0) + coalesce(toast_blks_read,0) + coalesce(tidx_blks_read,0)) ) as from_disk, 
        ( (coalesce(heap_blks_hit,0)  + coalesce(idx_blks_hit,0)  + coalesce(toast_blks_hit,0)  + coalesce(tidx_blks_hit,0))  ) as from_cache    
    FROM    pg_statio_all_tables --> change to pg_statio_USER_tables if you want to check only user tables (excluding postgres's own tables)
    ) a
WHERE   (from_disk + from_cache) > 0 -- discard tables without hits
)
SELECT  table_name as "table name",
    from_disk as "disk hits",
    round((from_disk::numeric / (from_disk + from_cache)::numeric)*100.0,2) as "% disk hits",
    round((from_cache::numeric / (from_disk + from_cache)::numeric)*100.0,2) as "% cache hits",
    (from_disk + from_cache) as "total hits"
FROM    (SELECT * FROM all_tables UNION ALL SELECT * FROM tables) a
ORDER   BY (case when table_name = 'all' then 0 else 1 end), from_disk desc

enter image description here

Solution 3:

It also works, as said in Heroku doc:

SELECT
    'cache hit rate' AS name,
     sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
FROM pg_statio_user_tables;