PostgreSQL track_counts and autovacuum are not working
The startup log entries indicate that autovacuum is not working. Querying the pg_stat_user_tables view shows that the last_vacuum and last_autovacuum columns are empty, even though I ran a vacuum query just before. Connecting pgAdmin to the database also indicates that vacuum is not working.
I am running PostgreSQL on two Ubuntu Azure VMs. One VM is set up as the master, the second as the replica using streaming replication, roughly as described in https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-on-postgresql-on-an-ubuntu-12-04-vps.
Everything seems to work except for autovacuum. During startup the following error is logged:
LOG: test message did not get through on socket for statistics collector
LOG: disabling statistics collector for lack of working socket
WARNING: autovacuum not started because of misconfiguration
HINT: Enable the "track_counts" option.
LOG: database system was shut down at 2017-01-19 14:07:13 UTC
DEBUG: checkpoint record is at 38/F6000028
In postgresql.conf I use the following settings:
track_counts = on
autovacuum = on
log_autovacuum_min_duration = 200
autovacuum_max_workers = 1
autovacuum_naptime = 960
autovacuum_vacuum_threshold = 128
autovacuum_analyze_threshold = 256
A query on the database (select * from pg_stat_user_tables) to find the last (auto)vacuum shows empty columns for last_vacuum and last_autovacuum instead of a datetime, even though just before I had run VACUUM FULL VERBOSE; and it gave me vacuum results.
If I query for the vacuum settings with:
select *
from pg_settings
where name like 'autovacuum%'
This is the result:
"autovacuum";"on"<br />
"autovacuum_analyze_scale_factor";"0.1"
"autovacuum_analyze_threshold";"256"
"autovacuum_freeze_max_age";"200000000"
"autovacuum_max_workers";"1"<br />
"autovacuum_multixact_freeze_max_age";"400000000"
"autovacuum_naptime";"960"<br />
"autovacuum_vacuum_cost_delay";"20"
"autovacuum_vacuum_cost_limit";"-1"
"autovacuum_vacuum_scale_factor";"0.2"
"autovacuum_vacuum_threshold";"128"
"autovacuum_work_mem";"-1"
These are the 'track_' results:
"track_activities";"on"
"track_activity_query_size";"1024"
"track_commit_timestamp";"off"
"track_counts";"off"
"track_functions";"none"
"track_io_timing";"off"
The pg_hba.conf (without the replication and network/user settings) looks like this:
local all all trust
host all all localhost trust
host all all 10.1.1.5/32 md5
host all all 127.0.0.1/32 md5
host all all 0.0.0.0 0.0.0.0 md5
the /etc/hosts:
127.0.0.1 localhost
127.0.1.1 ubuntu
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
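One way to double-check that localhost really resolves to the IPv4 loopback address (the statistics collector binds to whatever localhost resolves to) is to ask the system resolver directly; a small sketch:

```shell
# Resolve localhost via the system resolver (the same path PostgreSQL uses).
getent hosts localhost

# Specifically confirm that an IPv4 answer exists.
getent ahostsv4 localhost | grep 127.0.0.1
```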
This is the result of 'netstat -ant|grep 5432'. It is cleaned up and formatted.
User@Machine:/datadrive/log/postgresql/pg_log$ netstat -ant|grep 5432
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN
tcp 39 0 InternIpMaster:5432 InternIpSlave:36338 ESTABLISHED
tcp 0 0 InternIpMaster:5432 IpJob:63814 TIME_WAIT
tcp 0 0 InternIpMaster:5432 IpJob:22192 TIME_WAIT
tcp 0 0 InternIpMaster:5432 IpJob:47729 TIME_WAIT
tcp 0 0 InternIpMaster:5432 IpJob:55663 TIME_WAIT
tcp6 0 0 :::5432 :::* LISTEN
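The netstat output above only covers TCP. The statistics collector uses a UDP socket on the loopback address, so it is worth listing UDP sockets as well; a sketch using ss (netstat -anu works too, if net-tools is installed):

```shell
# List all UDP sockets; when the stats collector is healthy you should
# see a 127.0.0.1:<port> entry belonging to the postgres stats process.
ss -u -a -n
```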
I don't expect that autovacuum needed to do any work yet, given the thresholds configured above. The underlying problem is that track_counts is being disabled at runtime during startup.
I have been looking for solutions by changing the iptables rules, but even without any iptables rules it does not work. I have connected to localhost as a host. I have changed the firewall settings in Azure and opened port 5432 so the VM is accessible from all IPs; I am able to access the database from other systems. I have reset the conf to the defaults with only the replication changes, and I have restarted the service many times.
What am I missing?
You want to fix this:
LOG: test message did not get through on socket for statistics collector
LOG: disabling statistics collector for lack of working socket
The stats collector expects UDP packets from localhost. Given that localhost looks fine in your /etc/hosts (specifically, it does not resolve to IPv6), the next most plausible explanation is that a firewall is filtering these packets.
Related: a similar problem creating UDP sockets was resolved the same way: it turned out to be the OS firewall (iptables) preventing the creation of working UDP sockets.
I want to elaborate on the answer @Daniel gave and the solution to my problem.
I had set up iptables to allow access to PostgreSQL like this:
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A OUTPUT -o lo -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 5432 -m state --state NEW,ESTABLISHED -j ACCEPT
sudo iptables -A INPUT -j DROP
I assumed this was plenty. However, when I ran sudo iptables --flush and restarted the PostgreSQL server, the error disabling statistics collector for lack of working socket was gone.
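Before flushing everything, it can help to look at the per-rule packet counters: a DROP counter that increases while PostgreSQL starts up points directly at the offending rule. A diagnostic sketch (requires root):

```shell
# Show INPUT rules with packet/byte counters and rule numbers.
sudo iptables -L INPUT -n -v --line-numbers
```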
I also used iptraf to investigate the traffic (sudo apt-get install iptraf; sudo iptraf). I noticed that traffic originated on the local (subnet) IP address of the server, but on different ports. This is the traffic on the slave machine (without the Azure traffic):
SubnetIpSlave:22
SubnetIpSlave:45622
SubnetIpSlave:44770
SubnetIpSlave:48948
SubnetIpMaster:5432
I assume this traffic was blocked by iptables, because it does not go via the loopback interface. Therefore I cleaned up the iptables rules. This is the result:
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A OUTPUT -o lo -j ACCEPT
sudo iptables -A INPUT -p icmp -j ACCEPT
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 5432 -j ACCEPT
sudo iptables -A INPUT -s 10.1.1.0/24 -j ACCEPT
sudo iptables -A INPUT -j DROP
I added the subnet rule. I think this is what makes it work, because SubnetIpSlave and SubnetIpMaster fall within this range. I can probably remove the ESTABLISHED,RELATED rule.
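Note that iptables rules entered this way are lost on reboot. One common way to persist them on Ubuntu, assuming the iptables-persistent package, is:

```shell
# Install once, then save the current ruleset so it is restored at boot.
sudo apt-get install iptables-persistent
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
```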
The log looks like it should:
2017-01-24 09:19:38 UTC [1482-1] LOG: database system was shut down in recovery at 2017-01-24 09:17:41 UTC
2017-01-24 09:19:38 UTC [1483-1] [unknown]@[unknown] LOG: incomplete startup packet
2017-01-24 09:19:38 UTC [1482-2] LOG: entering standby mode
2017-01-24 09:19:38 UTC [1482-3] DEBUG: checkpoint record is at 5D/F2042CA8
I am happy ;)