Does anyone know how to fix OMSA on Red Hat 5.1 when it reports "No Controllers found"?
Solution 1:
I assume you've done the basic troubleshooting steps of restarting OMSA (service dataeng restart) and making sure IPMI is loaded:
service dataeng stop
service dsm_sa_ipmi start
service dataeng start
One common but non-obvious cause of this problem is system semaphore exhaustion. Check your system logs; if you see something like this:
Server Administrator (Shared Library): Data Engine EventID: 0 A semaphore set has to be created but the system limit for the maximum number of semaphore sets has been exceeded
then you're running out of semaphores.
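A quick way to check for that message is to count matching lines in the log. This is a minimal sketch: the function name `count_semset_errors` is hypothetical, and it assumes syslog writes the OMSA Data Engine messages to /var/log/messages unless you pass another file.

```shell
# Hypothetical helper: count Data Engine "semaphore sets exceeded" errors.
# Assumes syslog writes OMSA messages to /var/log/messages unless a file is given.
count_semset_errors() {
    log="${1:-/var/log/messages}"
    # -F: match a fixed string; -c: print the number of matching lines (0 if none)
    grep -cF 'maximum number of semaphore sets has been exceeded' "$log"
}

# Example: count_semset_errors                      (checks /var/log/messages)
#          count_semset_errors /var/log/messages.1  (an older rotated log)
```

A non-zero count points at the semaphore problem rather than a driver or IPMI issue.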
You can run ipcs -s to list all of the semaphores currently allocated on your system, and then use ipcrm -s <id> to remove a semaphore (if you're reasonably sure it's no longer needed). You might also want to track down the program that created them (using information from ipcs -s -i <id>) to make sure it's not leaking semaphores. In my experience, though, most leaks come from programs that were interrupted (by segfaults or similar) before they could run their cleanup code.
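To spot which program is holding most of the semaphore sets, it can help to tally the ipcs -s listing by owner. A minimal sketch, assuming the usual util-linux layout where data rows start with a hexadecimal key (0x...) and the owner is the third column; `sem_sets_by_owner` is a hypothetical name.

```shell
# Hypothetical helper: count semaphore sets per owner from `ipcs -s` output on stdin.
# Data rows start with the hex key (0x...), so header lines are skipped.
sem_sets_by_owner() {
    awk '$1 ~ /^0x/ { n[$3]++ }
         END { for (o in n) print n[o], o }' | sort -rn
}

# Example: ipcs -s | sem_sets_by_owner
# The busiest owner is printed first; inspect one of its sets with ipcs -s -i <id>.
```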
If your system really needs all of the semaphores currently allocated, you can raise the limit. Run sysctl -a | grep kernel.sem to see the current settings. The final number (SEMMNI) is the maximum number of semaphore sets allowed on the system (normally 128). Copy that line into /etc/sysctl.conf, change the final number to a larger value, save the file, and run sysctl -p to load the new settings.
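The sysctl.conf change can be sketched as a filter that rewrites the kernel.sem line with a larger final value. This assumes the four-field Linux layout SEMMSL SEMMNS SEMOPM SEMMNI; `bump_semmni` and the value 256 are illustrative only, not a recommendation for your workload.

```shell
# Hypothetical helper: rewrite "kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI"
# with a new SEMMNI (maximum number of semaphore sets), keeping the other fields.
bump_semmni() {
    awk -v new="$1" '$1 == "kernel.sem" { print $1, "=", $3, $4, $5, new }'
}

# Example (as root; first check /etc/sysctl.conf has no kernel.sem line already):
#   sysctl -a | grep kernel.sem | bump_semmni 256 >> /etc/sysctl.conf
#   sysctl -p
```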
Solution 2:
Following asciiphil's instructions worked for me. In my case, nrpe had a lot of semaphores open related to OpenManage. I cleaned them out and restarted everything.
This failed:
omreport chassis memory
Memory Information
Error : Memory object not found
Make sure there are enough semaphores:
sysctl -a | grep kernel.sem
ipcs -s |wc -l
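Note that ipcs -s | wc -l also counts the header and blank lines ipcs prints. A minimal sketch that counts only the semaphore sets and compares them with the SEMMNI limit, assuming the limit is the fourth field of /proc/sys/kernel/sem; `sem_usage` is a hypothetical name.

```shell
# Hypothetical helper: print "used/limit" for semaphore sets and return
# non-zero when the limit has been reached. Reads `ipcs -s` output on stdin.
sem_usage() {
    limit="$1"
    # Count only data rows (they start with the hex key), not header lines
    used=$(awk '$1 ~ /^0x/ { n++ } END { print n + 0 }')
    echo "$used/$limit"
    [ "$used" -lt "$limit" ]
}

# Example: ipcs -s | sem_usage "$(awk '{ print $4 }' /proc/sys/kernel/sem)"
```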
Stop nrpe, which uses omreport:
/etc/init.d/nrpe stop
Remove the nrpe semaphores:
ipcs -s | awk '/nrpe/ {print "ipcrm -s ",$2} ' | sh
Restart the OMSA services:
/etc/init.d/dataeng stop
/etc/init.d/dsm_sa_ipmi stop
/etc/init.d/dsm_sa_ipmi start
/etc/init.d/dataeng start
Make sure everything started cleanly:
tail -n 50 /var/log/messages
Test:
omreport chassis memory
Restart nrpe:
/etc/init.d/nrpe restart
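The cleanup step uses awk '/nrpe/', which matches any line containing "nrpe" anywhere. A slightly stricter sketch matches the owner column exactly and lets you review the IDs before removing anything. It assumes the owner is the third column of ipcs -s output; `sem_ids_for_owner` is a hypothetical name.

```shell
# Hypothetical helper: print the IDs of semaphore sets owned exactly by $1.
# Reads `ipcs -s` output on stdin; data rows start with the hex key (0x...).
sem_ids_for_owner() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Review first:  ipcs -s | sem_ids_for_owner nrpe
# Then remove:   ipcs -s | sem_ids_for_owner nrpe | xargs -r -n 1 ipcrm -s
```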
Solution 3:
I ran into this on a host where a Nagios job was scheduled to check OpenManage. It would manifest as a large number of stale semaphores owned by Nagios.
I put in a nightly cron job to find the stale ones by simply taking two listings 10 minutes apart; anything present in both listings is assumed to be stale. (Adjust for your circumstances, obviously.)
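# List the IDs of semaphore sets owned by nagios.
# Use ipcs -s, not ipcs -a: -a also lists shared memory and message queue
# IDs, which must not be passed to ipcrm -s.
nagioi () {
ipcs -s | awk '$3 == "nagios" { print $2 }'
}
# Run two listings, 10 minutes apart
# The ones which are in both listings are presumed stuck
(nagioi; sleep 600; nagioi) |
sort | uniq -d |
xargs -n 1 -r -t ipcrm -s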
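If you'd rather keep the two listings around (for example, to log what was removed), the "present in both listings" test can be done on saved files with comm instead of sort | uniq -d. A minimal sketch; `stale_ids` and the /tmp file names are hypothetical.

```shell
# Hypothetical helper: print the IDs that appear in both saved listings ($1, $2),
# i.e. the ones presumed stale. comm needs sorted input, so sort each copy first.
stale_ids() {
    first=$(mktemp)
    second=$(mktemp)
    sort -u "$1" > "$first"
    sort -u "$2" > "$second"
    comm -12 "$first" "$second"   # -12: show only lines common to both files
    rm -f "$first" "$second"
}

# Example:
#   ipcs -s | awk '$3 == "nagios" { print $2 }' > /tmp/sems.1
#   sleep 600
#   ipcs -s | awk '$3 == "nagios" { print $2 }' > /tmp/sems.2
#   stale_ids /tmp/sems.1 /tmp/sems.2 | xargs -r -n 1 ipcrm -s
```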
Solution 4:
For this failure:
omreport chassis memory
Memory Information
Error : Memory object not found
Stop the OMSA services with srvadmin-services.sh:
srvadmin-services.sh stop
The following command clears the semaphores whose last-op field is "Not set":
for i in $(ipcs -st | grep "Not set" | cut -d ' ' -f1); do ipcrm -s "$i"; echo "$i cleared."; done
Start the services again with srvadmin-services.sh:
srvadmin-services.sh start
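The "Not set" loop removes every semaphore set whose last-op is "Not set", whichever program created it, so it may be worth reviewing the candidates and their owners before running it. A minimal sketch, assuming the util-linux ipcs -s -t layout (semid, owner, last-op, last-changed); `never_used_sems` is a hypothetical name.

```shell
# Hypothetical helper: list "semid owner" for semaphore sets whose last-op column
# reads "Not set". Reads `ipcs -s -t` output on stdin; data rows start with a numeric semid.
never_used_sems() {
    awk '$1 ~ /^[0-9]+$/ && $3 == "Not" && $4 == "set" { print $1, $2 }'
}

# Example: ipcs -s -t | never_used_sems
```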