Munin aggregate graphs are not working

I know this has been asked several times on various forums, but I am still stuck with a similar problem.

Individual graphs are working fine; aggregate graphs, however, are not. I don't even get an empty graph (a graph without data).

All the machines are Ubuntu 12.04 m1.medium EC2 instances. The Munin version is 1.4.6.

My munin.conf looks like this:

[localhost.localdomain]
address 127.0.0.1
use_node_name yes

[.us-west-1.compute.internal]
address
use_node_name yes

[.us-west-1.compute.internal]
address
use_node_name yes

[.us-west-1.compute.internal]
address
use_node_name yes

[us-west-1.compute.internal;totalcheckpoints]
update no
contacts no

postgres_checkpoints_checkpoints_req.update no  
postgres_checkpoints_checkpoints_req.graph yes  
postgres_checkpoints_checkpoints_req.graph_args --base 1000 -l 0  
postgres_checkpoints_checkpoints_req.cdef 0  
postgres_checkpoints_checkpoints_req.graph_category PG Total Checkpoints  
postgres_checkpoints_checkpoints_req.graph_title Aggregated checkpoints  
postgres_checkpoints_checkpoints_req.graph_vlabel Total Checkpoints  
postgres_checkpoints_checkpoints_req.checkpoints_req_total.label Total checkpoints  
postgres_checkpoints_checkpoints_req.graph_order checkpoints_req_total  
postgres_checkpoints_checkpoints_req.checkpoints_req_total.sum \  
        <internal_ip>.us-west-1.compute.internal:postgres_checkpoints_<internal_ip>.us-west-1.compute.internal_checkpoints_req.checkpoints_req \  
        <internal_ip>.us-west-1.compute.internal:postgres_checkpoints_<internal_ip>.us-west-1.compute.internal_checkpoints_req.checkpoints_req \  
        <internal_ip>.us-west-1.compute.internal:postgres_checkpoints_<internal_ip>.us-west-1.compute.internal_checkpoints_req.checkpoints_req  

I have tried the following symlinks in /etc/munin/plugins:

postgres_checkpoints -> /usr/share/munin/plugins/postgres_checkpoints
postgres_checkpoints_ -> /usr/share/munin/plugins/postgres_checkpoints
postgres_checkpoints__ -> /usr/share/munin/plugins/postgres_checkpoints

Run as the munin user, the following munin commands work fine, and I don't see anything obviously wrong in their output:

sudo su - munin -s /bin/bash
/usr/share/munin/munin-update --debug --nofork
/usr/share/munin/munin-graph --debug --nofork --nolazy
/usr/share/munin/munin-html --debug

telnet returns the correct info for the postgres_checkpoints plugin:

munin@hostname:~$ telnet 4949
Trying ...
Connected to .
Escape character is '^]'.
# munin node at internal-ip-of-munin-node.us-west-1.compute.internal
config postgres_checkpoints
graph_title PostgreSQL checkpoints
graph_vlabel Checkpoints / minute
graph_category PostgreSQL
graph_info Number of checkpoints per minute
graph_args --base 1000
graph_period minute
checkpoints_timed.label Timed checkpoints
checkpoints_timed.info Checkpoints started by timeout
checkpoints_timed.type DERIVE
checkpoints_timed.draw LINE1
checkpoints_req.label Requested checkpoints
checkpoints_req.info Checkpoints started by request
checkpoints_req.type DERIVE
checkpoints_req.draw STACK
.
fetch postgres_checkpoints
checkpoints_timed.value 2860
checkpoints_req.value 37
.
quit
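
The same check can be scripted instead of typed into telnet. Below is a minimal Python sketch, assuming the node listens on port 4949 as in the session above and ends each reply with a line containing a single "."; the function names are mine and not part of Munin.

```python
import socket


def parse_values(lines):
    """Collect 'field.value N' lines from a fetch reply into a dict."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and the banner comment
        if line == ".":
            break  # a lone dot ends the reply
        key, _, raw = line.partition(" ")
        if key.endswith(".value"):
            # Keep the value as text; a node may report "U" for unknown.
            values[key[: -len(".value")]] = raw.strip()
    return values


def fetch(host, plugin, port=4949, timeout=10):
    """Ask a munin-node for the current values of one plugin."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="utf-8")
        reader.readline()  # banner: "# munin node at ..."
        sock.sendall(f"fetch {plugin}\n".encode("utf-8"))
        reply = []
        for line in reader:
            if line.strip() == ".":
                break
            reply.append(line)
        sock.sendall(b"quit\n")
    return parse_values(reply)
```

Calling fetch() with a node's address and "postgres_checkpoints" should return the same two fields the telnet session shows, which makes it easy to loop over all nodes and confirm the plugin name is identical everywhere.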

The logs on munin-master and munin-node do not show any obvious errors. I have also verified that all hostnames are correct FQDNs everywhere.
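
One way to double-check the hostnames mechanically is to compare the hosts referenced in the .sum lines with the host sections defined in munin.conf. This is only a rough Python sketch: it assumes sections are written as [host] or [group;host], that references look like host:plugin.field, and that a trailing backslash continues a line. The helper names are made up for illustration.

```python
import re


def join_continuations(text):
    """Merge lines ending in a backslash with the line that follows."""
    return re.sub(r"\\[ \t]*\n", " ", text)


def defined_hosts(text):
    """Host names taken from [host] or [group;host] section headers."""
    hosts = set()
    for line in text.splitlines():
        match = re.match(r"\s*\[([^\]]+)\]\s*$", line)
        if match:
            hosts.add(match.group(1).split(";")[-1])
    return hosts


def referenced_hosts(text):
    """Host names used on the right-hand side of .sum / .stack lines."""
    hosts = set()
    for line in join_continuations(text).splitlines():
        parts = line.split()
        if len(parts) > 1 and parts[0].endswith((".sum", ".stack")):
            for ref in parts[1:]:
                if ":" in ref:
                    hosts.add(ref.split(":", 1)[0].split(";")[-1])
    return hosts


def undefined_hosts(text):
    """Hosts that an aggregate refers to but no section defines."""
    return referenced_hosts(text) - defined_hosts(text)
```

An empty result from undefined_hosts(open("/etc/munin/munin.conf").read()) only rules out mistyped hostnames; it does not validate the plugin or field part of a reference.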

Any idea what I am missing?

I have checked many forums and links, but Server Fault does not allow me to paste more than two of the links I referred to:
1. http://munin-monitoring.org/wiki/aggregate_examples
2. http://blog.loftninjas.org/2010/04/08/an-evening-with-munin-graph-aggregation/

Thanks for your attention.


Solution 1:

Finally, I got it working. Munin is not that bad; all you need is to spend a couple of nights with it.

I misunderstood the documentation: you must not include the hostname in the plugin name. The plugin name should be exactly the same as on the munin nodes. The same plugin should also exist on the Munin master, as a symlink whose name ends in __.

So the symlink in /etc/munin/plugins now looks like this:

postgres_checkpoints__ -> /usr/share/munin/plugins/postgres_checkpoints

And here is the new configuration; note that the plugin name after the ":" doesn't have the hostname in it:

postgres_checkpoints_total.update no  
pg_checkpoints.label Graph label  
postgres_checkpoints_total.graph yes  
postgres_checkpoints_total.graph_args --base 1000 -l 0  
postgres_checkpoints_total.cdef 0  
postgres_checkpoints_total.graph_category PG Total Checkpoints  
postgres_checkpoints_total.graph_title Aggregated checkpoints  
postgres_checkpoints_total.graph_vlabel Total Checkpoints  
postgres_checkpoints_total.checkpoints_req_total.label Total Req checkpoints  
postgres_checkpoints_total.checkpoints_timed_total.label Total Timed checkpoints  
postgres_checkpoints_total.graph_order checkpoints_req_total checkpoints_timed_total  
postgres_checkpoints_total.checkpoints_req_total.sum \  
        <internal_ip>.us-west-1.compute.internal:postgres_checkpoints.checkpoints_req \  
        <internal_ip>.us-west-1.compute.internal:postgres_checkpoints.checkpoints_req \  
        <internal_ip>.us-west-1.compute.internal:postgres_checkpoints.checkpoints_req

postgres_checkpoints_total.checkpoints_timed_total.sum \  
        <internal_ip>.us-west-1.compute.internal:postgres_checkpoints.checkpoints_timed \  
        <internal_ip>.us-west-1.compute.internal:postgres_checkpoints.checkpoints_timed \  
        <internal_ip>.us-west-1.compute.internal:postgres_checkpoints.checkpoints_timed

Also, please note that the configuration above now aggregates two fields, checkpoints_req and checkpoints_timed.
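
Since the .sum blocks are repetitive and a single typo in one reference is easy to miss, it may help to generate them. Here is a small Python sketch; the function name and the example hostnames are hypothetical, and it only builds the text in the host:plugin.field form used above.

```python
def sum_stanza(graph, target_field, hosts, plugin, source_field):
    """Build a '<graph>.<target_field>.sum' block, one reference per host."""
    # Guard against the original mistake: a hostname inside the plugin name.
    if any(host in plugin for host in hosts):
        raise ValueError("plugin name must not contain a hostname")
    refs = [f"{host}:{plugin}.{source_field}" for host in hosts]
    lines = [f"{graph}.{target_field}.sum \\"]
    for index, ref in enumerate(refs):
        # Every reference except the last ends with a continuation backslash.
        tail = " \\" if index < len(refs) - 1 else ""
        lines.append(f"        {ref}{tail}")
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = ["node-a.example.internal", "node-b.example.internal"]  # placeholders
    for src in ("checkpoints_req", "checkpoints_timed"):
        print(sum_stanza("postgres_checkpoints_total", src + "_total",
                         nodes, "postgres_checkpoints", src))
        print()
```

The output can be pasted under the aggregate section in munin.conf; the label, graph_order and other graph_* lines still have to be written by hand.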