How to fix hanging Server Admin, log shows servermgrd error in getAndLockContext: flock(servermgr_info) FATAL time out
On a 10.7.4 Lion Server installation, Server Admin.app fails to start correctly. The window is drawn and the last known server and its services are displayed, but a spinning wheel keeps spinning in the bottom status bar.
Console.app log shows:
Jun 24 18:19:56 mac01 servermgrd[437]: [437] error in getAndLockContext: flock(servermgr_info) FATAL time out
Jun 24 18:19:56 mac01 servermgrd[437]: [437] process will force-quit to avoid deadlock
Jun 24 18:19:56 mac01 servermgrd[437]: outstanding requests are: (
{
Command = getHistory;
Module = "servermgr_info";
Timestamp = "2012-06-24 16:17:51 +0000";
},
{
Command = getState;
Module = "servermgr_info";
Timestamp = "2012-06-24 16:17:51 +0000";
},
{
Command = Idle;
Module = "servermgr_info";
Timestamp = "2012-06-24 16:18:50 +0000";
}
)
Jun 24 18:19:56 mac01 com.apple.launchd[1] (com.apple.servermgrd[437]): Exited with code: 1
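To see how often the deadlock occurs, the log can be filtered for the time-out line. This is only a minimal sketch: the function name count_flock_timeouts is made up for illustration, and the pattern is taken from the log excerpt above.

```shell
#!/bin/sh
# Count servermgrd flock time-outs in log text read from standard input.
# The pattern mirrors the log line quoted above; the function name is hypothetical.
count_flock_timeouts() {
    grep -c 'servermgrd\[[0-9]*]: .*flock(servermgr_info) FATAL time out'
}
```

Assuming the system log lives at /var/log/system.log on this machine, it could be used as: count_flock_timeouts < /var/log/system.log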
Three services are enabled: DNS, firewall, and mail. The DNS plugin gave an error message that cannot be reproduced.
Other errors in Console.app are:
Jun 28 12:49:58 mac01 ServerBackup[32087]: Error in calling backup command for service postgresql, error :=69
Jun 28 12:51:24 mac01 mds[99]: (Error) Volume: Could not find requested backup type:2 for volume
Jun 28 12:53:01 mac01 mds[99]: (Error) Server: ==== XPC handleXPCMessage XPC_ERROR_CONNECTION_INVALID
(Regression) Other sources suggested to try:
- Review the DNS settings: no problem found.
- Check the hostname: no problem found.
    sudo changeip -checkhostname
- Destroy the Open Directory installation: not applicable, OD is not used here.
- Reboot: did not solve the problem.
- Replace /Library/Preferences/com.apple.servermgrd.plist with a known good copy: did not help.
- Replace Server Admin.app itself with a known good copy: no improvement either.
- Reload servermgrd (as in my answer below): this only partly solves the issue. Server Admin started correctly once, but on the second and subsequent launches the wheel keeps spinning and the log shows the deadlock again.
    sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist;sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist
- Untick 'require valid digital signature (SSL)' in the Server Admin preferences, in case the server certificate is not trusted: no change.
- Check whether it is a user issue by creating a new admin user and starting Server Admin from that login: it errors out (Server Admin reports kReceivedUnknownError and "process will force-quit to avoid deadlock" is in the log, but no outstanding requests are listed).
- Change the connecting server address from 'name.local' to '127.0.0.1': no change.
- Reload servermgrd after removing /var/servermgrd: no change.
    sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist;sudo rm -r /var/servermgrd;sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist
- In Server.app > hardware > hostname > settings, uncheck 'dedicate resources to server services', reboot, check it again, reboot: no change.
- Repair permissions and reboot: no change.
    diskutil repairPermissions /;sudo reboot
- Correct the rDNS entry to match the machine name, so that Server.app no longer displays an alert that the network has changed on every boot.
Other repeating log file entries are:
29-06-12 02:58:01,943 configd: network configuration changed.
29-06-12 02:58:33,575 configd: network configuration changed.
29-06-12 02:58:33,597 configd: network configuration changed.
29-06-12 02:58:34,266 servermgrd: servermgr_ipfilter:ipfw config:Notice:Flushed IPv4 rules
29-06-12 02:58:34,578 servermgrd: servermgr_ipfilter:ipfw config:Notice:Flushed IPv6 rules
29-06-12 02:59:50,707 servermgrd: servermgr_filebrowser:Error:servermgr_filebrowser: Error getting quotas for volume /Volumes/H of type exfat
After hours of trying to isolate the problem, I can reproduce that the flock(servermgr_info) FATAL time out is related to the local DNS service. When the DNS service is started, Server Admin.app generates the flock(servermgr_info) FATAL time out. When DNS is then stopped and servermgrd is reloaded (to be sure that there is a clean starting point) with
sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist;sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.servermgrd.plist
Server Admin.app behaves OK on the next launch and no flock(servermgr_info) FATAL time out is logged.
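The "stop DNS, then reload servermgrd" step can be wrapped in a small script. This is a sketch under assumptions: stopping the service with serveradmin stop dns is my guess at how DNS is stopped from the command line (the steps above do not say), and the DRY_RUN switch and function names are made up so the commands can be previewed first.

```shell
#!/bin/sh
# Plist path copied from the launchctl commands quoted above.
PLIST=/System/Library/LaunchDaemons/com.apple.servermgrd.plist

# Run a command, or only print it when DRY_RUN=1 (hypothetical preview switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the DNS service (assumed command), then reload servermgrd
# so that Server Admin starts from a clean state.
reset_servermgrd_without_dns() {
    run sudo serveradmin stop dns &&
    run sudo launchctl unload -w "$PLIST" &&
    run sudo launchctl load -w "$PLIST"
}
```

Setting DRY_RUN=1 before calling reset_servermgrd_without_dns only prints the three commands, which makes it easy to check them before running anything as root.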
The anonymised DNS configuration (extracted with sudo serveradmin settings dns) is:
dns:acls:_array_index:0:name = "com.apple.ServerAdmin.DNS.public"
dns:acls:_array_index:0:addressMatchList:_array_index:0 = "localhost"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:secondaryZones:_array_id:xyz.nl:name = "xyz.nl"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:secondaryZones:_array_id:xyz.nl:ipAddresses:_array_index:0 = "1.1.1.237"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:secondaryZones:_array_id:xyz.nl:ipAddresses:_array_index:1 = "2.2.1.247"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:secondaryZones:_array_id:xyz.nl:ipAddresses:_array_index:2 = "1.2.1.59"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:secondaryZones:_array_id:xyz.nl:ipAddresses:_array_index:3 = "2.1.1.23"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:primaryZones = _empty_array
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:allowRecursion = "com.apple.ServerAdmin.DNS.public"
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:reverseZones = _empty_array
dns:views:_array_id:com.apple.ServerAdmin.DNS.public:name = "com.apple.ServerAdmin.DNS.public"
dns:isBonjourClientBrowsingEnabled = no
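Since the only zone here is a secondary zone, it helps to see at a glance which master servers it pulls from. A rough sketch, assuming the settings output keeps the key layout shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Print "zone ip" pairs for each secondary zone found in the output of
# `serveradmin settings dns`, read from standard input.
# The key layout is assumed to match the dump above.
list_secondary_zone_masters() {
    sed -n 's/^dns:views:_array_id:[^:]*:secondaryZones:_array_id:\([^:]*\):ipAddresses:_array_index:[0-9]* = "\(.*\)"$/\1 \2/p'
}
```

Usage would be along the lines of: sudo serveradmin settings dns | list_secondary_zone_masters
Each listed address can then be checked for reachability on its own, which narrows down whether a zone transfer source is what servermgrd is waiting on.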
Ironically, localhost is not configured as a DNS resolver:
$ networksetup -getdnsservers Ethernet
91.196.170.5
8.8.8.8
8.8.4.4
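The same check can be scripted, for example to run it against several network services. A minimal sketch; the function name is made up, and it only looks for the two loopback addresses.

```shell
#!/bin/sh
# Succeed (exit status 0) when a loopback address appears in a resolver
# list read from standard input, one address per line.
uses_local_resolver() {
    grep -q -E '^(127\.0\.0\.1|::1)$'
}
```

For instance: networksetup -getdnsservers Ethernet | uses_local_resolver && echo "local DNS is in use"
With the resolver list shown above this prints nothing, because none of the three addresses is a loopback address.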
When the DNS service is stopped, Server Admin sometimes still keeps spinning. Servername > Network shows that the DNS name is empty in the "Network Interfaces" pane. The lookup
nslookup -type=ptr z.y.x.91.in-addr.arpa
returns the correct value. A plain query for the PTR,
dig PTR z.y.x.91.in-addr.arpa
is OK as well, but the traced query
dig PTR z.y.x.91.in-addr.arpa +trace
stops after recursion number 4:
. 77760 IN NS m.root-servers.net.
...<cut>...
. 77760 IN NS l.root-servers.net.
;; Received 244 bytes from 91.196.170.5#53(91.196.170.5) in 5 ms
in-addr.arpa. 172800 IN NS a.in-addr-servers.arpa.
...<cut>...
in-addr.arpa. 172800 IN NS f.in-addr-servers.arpa.
;; Received 432 bytes from 192.203.230.10#53(e.root-servers.net) in 7 ms
91.in-addr.arpa. 86400 IN NS sec3.apnic.net.
...<cut>...
91.in-addr.arpa. 86400 IN NS sns-pb.isc.org.
;; Received 200 bytes from 193.0.9.1#53(f.in-addr-servers.arpa) in 2 ms
170.196.91.in-addr.arpa. 172800 IN NS ns2.technotop.nl.
170.196.91.in-addr.arpa. 172800 IN NS ns3.technotop.nl.
170.196.91.in-addr.arpa. 172800 IN NS ns1.technotop.nl.
;; Received 110 bytes from 193.0.9.5#53(pri.authdns.ripe.net) in 1 ms
;; connection timed out; no servers could be reached
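To repeat these lookups for other addresses, the in-addr.arpa names can be generated instead of typed by hand. This is a sketch with made-up function names; the /24 zone cut is an assumption that matches the 170.196.91.in-addr.arpa delegation in the trace, and other networks may be delegated differently.

```shell
#!/bin/sh
# Print the PTR owner name for a dotted-quad IPv4 address.
ptr_name() {
    echo "$1" | awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Print the enclosing reverse zone, assuming it is delegated per /24.
reverse_zone_24() {
    echo "$1" | awk -F. 'NF == 4 { print $3 "." $2 "." $1 ".in-addr.arpa" }'
}
```

For example: dig PTR "$(ptr_name 91.196.170.5)" +trace
and, for the name servers of the zone itself: dig NS "$(reverse_zone_24 91.196.170.5)"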
Now there is also a new message logged:
Jun 29 14:49:10 mac01 servermgrd[1039]: Still servicing 0:0 requests after 300 seconds, with 1 sessions outstanding, resetting idleTimer
Update #1
It looks like the issue might be an rDNS (reverse DNS) PTR configuration error. A dig +trace from a 10.5 machine at a different location results in: couldn't get address for 'ns3.technotop.nl': not found. The network provider has been asked to fix this flaky (and faulty) DNS configuration.
Update #2 Another suspect is the built-in firewall. When the firewall is stopped there are no hangs. When the firewall is started, the spinning wheel never stops.
Update #3 The provider fixed ns3.technotop.nl on August 23rd by adding a DNS A record with IP address 91.196.170.72. Result: still a spinning wheel at the bottom right of Server Admin, and after a few minutes:
"The service has encountered an error. Try to refresh the view (127.0.0.1/Server). (kReceivedUnkownError)."
Followed by:
"The service has encountered an error. Try to refresh the view (127.0.0.1/Certificates). (kReceivedUnkownError)."
Update #4 The next issue is a software firewall on the CentOS 5.4 servers that host both ns1.technotop.nl and ns2.technotop.nl. It kicked in on the DNS traffic patterns of Mac OS X 10.7.4 Lion + Server + Admin Tools:
123.45.67.89 # lfd: (PERMBLOCK) 123.45.67.89 has had more than 4 temp blocks in the last 86400 secs - Sun Jun 17 21:20:08 2012
After this block is cleared in the CentOS firewall, it is back within 10 hours for each IP address that runs 10.7.4 and has a reverse DNS name served authoritatively by the CentOS 5.4 machines.
Let's see what a whitelist on the CentOS firewall will do for machines running Mac OS 10.7.4.
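To build that whitelist, the permanently blocked addresses first have to be collected. A small sketch, assuming the deny list uses the one-line format quoted above; the function name is made up, and where the list lives depends on the firewall setup.

```shell
#!/bin/sh
# Print the unique IP addresses that lfd has permanently blocked,
# given deny-list lines on standard input in the format quoted above.
permblocked_ips() {
    sed -n 's/^\([0-9][0-9.]*\) # lfd: (PERMBLOCK).*/\1/p' | sort -u
}
```

The resulting list can be compared against the addresses of the 10.7.4 machines before anything is whitelisted, so that unrelated blocks stay in place.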
Question
How can I fix this DNS-related servermgrd flock(servermgr_info) error (for instance by resetting the DNS service to factory defaults)?
I had the same issue. As suggested here, I tracked the problem down to a bad NS record (a typo) for the reverse zone.
changeip -checkhostname reported no issues, and manually checking the forward and reverse lookups all looked good.
But when I specifically checked the NS record for the reverse zone, I noticed the typo.
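A check along these lines might catch such a typo: list the NS records of the reverse zone and flag every name server name that has no A record. This is only a sketch, not the exact commands I ran; the function names are made up, and the zone name in the usage line is the one from the trace in the question.

```shell
#!/bin/sh
# Print the NS target names from dig answer lines read on standard input.
ns_targets() {
    awk '$3 == "IN" && $4 == "NS" { print $5 }' | sort -u
}

# Read name server names on standard input and print those
# for which dig returns no A record.
unresolvable_ns() {
    while read -r ns; do
        [ -n "$(dig +short A "$ns")" ] || echo "$ns"
    done
}
```

For example: dig NS 170.196.91.in-addr.arpa | ns_targets | unresolvable_ns
Any name it prints is a name server that the reverse zone points to but that cannot be resolved, which is what a misspelled NS record looks like from the outside.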