How to fix hanging Server Admin, log shows servermgrd error in getAndLockContext: flock(servermgr_info) FATAL time out

On a 10.7.4 Lion Server installation the Server Admin.app fails to start correctly. The window is drawn and the last known server and its services are displayed, but a spinning wheel keeps spinning on the bottom status bar.

Console.app log shows:

Jun 24 18:19:56 mac01 servermgrd[437]: [437] error in getAndLockContext: flock(servermgr_info) FATAL time out
Jun 24 18:19:56 mac01 servermgrd[437]: [437] process will force-quit to avoid deadlock
Jun 24 18:19:56 mac01 servermgrd[437]: outstanding requests are: (
            {
            Command = getHistory;
            Module = "servermgr_info";
            Timestamp = "2012-06-24 16:17:51 +0000";
        },
            {
            Command = getState;
            Module = "servermgr_info";
            Timestamp = "2012-06-24 16:17:51 +0000";
        },
            {
            Command = Idle;
            Module = "servermgr_info";
            Timestamp = "2012-06-24 16:18:50 +0000";
        }
    )
Jun 24 18:19:56 mac01 com.apple.launchd[1] (com.apple.servermgrd[437]): Exited with code: 1

There are 3 services enabled: dns, firewall, mail. DNS plugin gave an error message which cannot be reproduced.

Other errors in Console.app are:

Jun 28 12:49:58 mac01 ServerBackup[32087]: Error in calling backup command for service postgresql, error :=69
Jun 28 12:51:24 mac01 mds[99]: (Error) Volume: Could not find requested backup type:2 for volume
Jun 28 12:53:01 mac01 mds[99]: (Error) Server: ==== XPC handleXPCMessage XPC_ERROR_CONNECTION_INVALID

(Regression) Other sources suggested to try:

review DNS settings, no problem found there
check the hostname with sudo changeip -checkhostname, no problem found there
destroy the Open Directory installation, OD is not used here
reboot, that did not solve the problem
replace /Library/Preferences/com.apple.servermgrd.plist with a known good copy, that did not help
replace the Server Admin.app itself with a known good copy, that did not help either
issuing a sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist;sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist, as in my answer below, which only partly solves the issue. Server Admin started correctly once, but on the second and all subsequent launches the spinning wheel keeps spinning and the log shows the deadlock again.
untick 'require valid digital signature (SSL)' in Server Admin prefs, just in case the server certificate has not been trusted, no change.
check whether it is a user issue by creating a new admin user and starting Server Admin from that login, errors out (Server Admin reports kReceivedUnknownError; "process will force-quit to avoid deadlock" is in the log, but no outstanding requests are listed)
Changed the connecting server address from 'name.local' to '127.0.0.1', no change.
sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist;sudo rm -r /var/servermgrd;sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist, no change
Server.app > hardware > hostname > settings > uncheck dedicate resources to server services, reboot, check again, reboot, no change
$ diskutil repairPermissions /;sudo reboot, no change
corrected the rDNS entry to match the machine name, so that Server.app no longer displays an alert that the network has changed on every boot.

Other repeating log file entries are:

29-06-12 02:58:01,943 configd: network configuration changed.
29-06-12 02:58:33,575 configd: network configuration changed.
29-06-12 02:58:33,597 configd: network configuration changed.
29-06-12 02:58:34,266 servermgrd: servermgr_ipfilter:ipfw config:Notice:Flushed IPv4 rules
29-06-12 02:58:34,578 servermgrd: servermgr_ipfilter:ipfw config:Notice:Flushed IPv6 rules
29-06-12 02:59:50,707 servermgrd: servermgr_filebrowser:Error:servermgr_filebrowser: Error getting quotas for volume /Volumes/H of type exfat

After hours of trying to isolate the problem, I can reproduce that the flock(servermgr_info) FATAL time out is related to the local DNS service. When the DNS service is running, Server Admin.app triggers the flock(servermgr_info) FATAL time out. When DNS is then stopped (followed by $ sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist;sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist to be sure of a clean starting point), Server Admin.app behaves correctly on the next launch and no flock(servermgr_info) FATAL time out is logged.
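To make that comparison repeatable, the time-outs can be counted before and after toggling the DNS service. The following is only a sketch: the function names are made up for illustration, and the default log path /var/log/system.log is an assumption (the excerpts above were read in Console.app, which does not say which file they came from). The launchctl commands are the ones already quoted above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Apple tool.

# Count how often servermgrd logged the flock time-out in a log file.
# Assumption: the messages end up in /var/log/system.log.
count_flock_timeouts() {
    logfile=${1:-/var/log/system.log}
    # -F matches the text literally, -c prints the number of matching lines
    grep -c -F 'flock(servermgr_info) FATAL time out' "$logfile"
}

# Reload servermgrd to get a clean starting point (commands taken from the post).
restart_servermgrd() {
    plist=/System/Library/LaunchDaemons/com.apple.servermgrd.plist
    sudo launchctl unload -w "$plist"
    sudo launchctl load -w "$plist"
}
```

A possible sequence: note the value of count_flock_timeouts, stop or start DNS, call restart_servermgrd, open Server Admin.app, wait a few minutes, and compare the count again.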

The anonymised serveradmin DNS configuration (extracted using $ sudo serveradmin settings dns) is here:

dns:acls:_array_index:0:name = "com.apple.ServerAdmin.DNS.public"
dns:acls:_array_index:0:addressMatchList:_array_index:0 = "localhost"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:secondaryZones:_array_id:xyz.nl:name = "xyz.nl"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:secondaryZones:_array_id:xyz.nl:ipAddresses:_array_index:0 = "1.1.1.237"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:secondaryZones:_array_id:xyz.nl:ipAddresses:_array_index:1 = "2.2.1.247"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:secondaryZones:_array_id:xyz.nl:ipAddresses:_array_index:2 = "1.2.1.59"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:secondaryZones:_array_id:xyz.nl:ipAddresses:_array_index:3 = "2.1.1.23"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:primaryZones = _empty_array
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:allowRecursion = "com.apple.ServerAdmin.DNS.public"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:reverseZones = _empty_array
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:name = "com.apple.ServerAdmin.DNS.public"
dns:isBonjourClientBrowsingEnabled = no

Ironically, localhost is not even configured as a DNS resolver:

$ networksetup -getdnsservers Ethernet
91.196.170.5
8.8.8.8
8.8.4.4

Even when the DNS service is stopped, Server Admin sometimes keeps spinning. Servername > Network shows that the DNS name is empty in the "Network Interfaces" pane.

nslookup -type=ptr z.y.x.91.in-addr.arpa returns the correct value, and dig PTR z.y.x.91.in-addr.arpa also succeeds, but dig PTR z.y.x.91.in-addr.arpa +trace stops after the fourth delegation step:

.           77760   IN  NS  m.root-servers.net.
...<cut>...
.           77760   IN  NS  l.root-servers.net.
;; Received 244 bytes from 91.196.170.5#53(91.196.170.5) in 5 ms

in-addr.arpa.       172800  IN  NS  a.in-addr-servers.arpa.
...<cut>...
in-addr.arpa.       172800  IN  NS  f.in-addr-servers.arpa.
;; Received 432 bytes from 192.203.230.10#53(e.root-servers.net) in 7 ms

91.in-addr.arpa.    86400   IN  NS  sec3.apnic.net.
...<cut>...
91.in-addr.arpa.    86400   IN  NS  sns-pb.isc.org.
;; Received 200 bytes from 193.0.9.1#53(f.in-addr-servers.arpa) in 2 ms

170.196.91.in-addr.arpa. 172800 IN  NS  ns2.technotop.nl.
170.196.91.in-addr.arpa. 172800 IN  NS  ns3.technotop.nl.
170.196.91.in-addr.arpa. 172800 IN  NS  ns1.technotop.nl.
;; Received 110 bytes from 193.0.9.5#53(pri.authdns.ripe.net) in 1 ms

;; connection timed out; no servers could be reached

Now there is also a new message logged:

Jun 29 14:49:10 mac01 servermgrd[1039]: Still servicing 0:0 requests after 300 seconds, with 1 sessions outstanding, resetting idleTimer

Update #1 It looks like the issue might be an rDNS (reverse DNS) PTR configuration error. A dig +trace from a 10.5 machine at a different location results in "couldn't get address for 'ns3.technotop.nl': not found". The network provider has been asked to fix this flaky (and faulty) DNS configuration.

Update #2 Another suspect is the built-in firewall. When the firewall is stopped there are no hangs. When the firewall is started, the spinning wheel never stops.

Update #3 The provider fixed ns3.technotop.nl by assigning it a DNS A record with IP address 91.196.170.72 on August 23rd. Result: still a spinning wheel at the bottom right of Server Admin, and after a few minutes:

"The service has encountered an error. Try to refresh the view (127.0.0.1/Server). (kReceivedUnkownError)."

Followed by:

"The service has encountered an error. Try to refresh the view (127.0.0.1/Certificates). (kReceivedUnkownError)."

Update #4 The next issue is a software firewall on the CentOS 5.4 servers that host both ns1.technotop.nl and ns2.technotop.nl. It was triggered by the DNS traffic patterns of Mac OS X 10.7.4 Lion + Server + Admin Tools:

123.45.67.89 # lfd: (PERMBLOCK) 123.45.67.89 has had more than 4 temp blocks in the last 86400 secs - Sun Jun 17 21:20:08 2012

After this block is cleared in the CentOS firewall, it returns within 10 hours for every IP address that runs 10.7.4 and has a reverse DNS name served authoritatively by the CentOS 5.4 machines.

Let's see what a whitelist on the CentOS firewall will do for machines running Mac OS 10.7.4.
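For whoever administers the CentOS side, a small helper can show which addresses keep being re-blocked. This is a sketch under assumptions the post does not confirm: that lfd here is the daemon shipped with ConfigServer Security & Firewall (CSF), and that the line quoted above comes from its deny list, /etc/csf/csf.deny. The function name is invented for illustration.

```shell
#!/bin/sh
# Sketch only: assumes a CSF/lfd setup; the post does not name the firewall product.

# Print the unique addresses that have an lfd PERMBLOCK entry in a deny file.
# Assumption: the default location is /etc/csf/csf.deny.
list_permblocks() {
    denyfile=${1:-/etc/csf/csf.deny}
    # The address is the first field of lines such as
    # "123.45.67.89 # lfd: (PERMBLOCK) 123.45.67.89 has had more than 4 temp blocks ..."
    awk '/lfd: \(PERMBLOCK\)/ { print $1 }' "$denyfile" | sort -u
}
```

If the firewall really is CSF, each listed address could then be whitelisted with csf -a followed by the address, which is CSF's allow command; on any other firewall the equivalent rule has to be added by hand.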

Question

How can I fix this DNS-service-related servermgrd flock(servermgr_info) error (for instance by resetting the DNS service to factory defaults)?

I had the same issue. As suggested here, I tracked the problem down to a bad NS record (typo) for the reverse zone.

changeip -checkhostname reported no issues, and when I manually checked the forward and reverse lookups everything looked good.

But when I specifically checked the NS records for the reverse zone, I noticed the typo.
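A check along those lines can be scripted with dig. What follows is a rough sketch, not a definitive test: the function names are invented for illustration, and it assumes the reverse zone is delegated on a /24 boundary (as 170.196.91.in-addr.arpa is in the trace in the question). It lists the NS records of the reverse zone and reports any nameserver that does not resolve, which is how a typo in an NS record would show up.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; requires dig for the last function.

# Build the PTR name for an IPv4 address: a.b.c.d -> d.c.b.a.in-addr.arpa
reverse_name() {
    echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Strip the first label to get the enclosing zone.
# Assumption: the reverse zone is delegated per /24.
parent_zone() {
    echo "${1#*.}"
}

# For every NS record of a zone, report whether that nameserver resolves.
check_zone_ns() {
    zone=$1
    dig +short NS "$zone" | while read -r ns; do
        answer=$(dig +short A "$ns" | head -n 1)
        if [ -n "$answer" ]; then
            echo "ok      $ns -> $answer"
        else
            echo "MISSING $ns does not resolve"
        fi
    done
}
```

Usage would be check_zone_ns "$(parent_zone "$(reverse_name 192.0.2.10)")", with the server's own address in place of the example address. A line marked MISSING points at a nameserver name that is misspelled or has no address record.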
