NOC Situational Awareness
In our NOC, we maintain situational awareness of all physical security zones (reactive video feeds), some basic information about the physical characteristics of the data centers, the weather, and a national news feed. Are there other things you recommend a NOC monitor, or is this considered good enough?
Solution 1:
This is a pretty broad question, but I'm assuming we're avoiding service- or logical-level monitoring (i.e., SSH listening, web sites responding properly, disk space, CPU usage, etc.).
Your NOC should be doing both constant hands-off monitoring and periodic hands-on/eyes-on monitoring.
Constantly monitor:
- Ambient temperature/humidity from multiple sensors in datacenter
- Power draw for all circuits in datacenter
- AC units' self-reported load/health
- Video feeds of datacenter interior, all datacenter entrances, and entrance to NOC area
- UPS battery status
- Logs of all entrances to and exits from datacenter (whether by swipe card or manual sign-in)
- Switch port/router interface status events (up/down/change in speed)
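As a rough illustration of the hands-off side, the readings above can be compared against limits and turned into alerts. This is only a sketch: the metric names, sensor names, and limit values are hypothetical placeholders, and how you actually collect the readings (SNMP, vendor API, etc.) is up to your equipment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical limits for illustration only; take real values
# from your own equipment specifications.
LIMITS: Dict[str, Limit] = {
    "temp_c": Limit(18.0, 27.0),
    "humidity_pct": Limit(20.0, 80.0),
    "circuit_load_pct": Limit(0.0, 80.0),
    "ups_charge_pct": Limit(90.0, 100.0),
}


def check_reading(metric: str, value: float) -> Optional[str]:
    """Return an alert message if value is outside its limit, else None."""
    limit = LIMITS[metric]
    if value < limit.low:
        return f"{metric} low: {value} < {limit.low}"
    if value > limit.high:
        return f"{metric} high: {value} > {limit.high}"
    return None


def check_all(readings: Dict[Tuple[str, str], float]) -> List[str]:
    """Check a mapping of (sensor_name, metric) -> value; return alerts."""
    alerts = []
    for (sensor, metric), value in sorted(readings.items()):
        message = check_reading(metric, value)
        if message is not None:
            alerts.append(f"{sensor}: {message}")
    return alerts
```

Feeding `check_all` a dictionary such as `{("rack-a1", "temp_c"): 31.5}` returns one alert string per out-of-range reading, which you could forward to whatever paging or dashboard system the NOC already uses.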
Periodic walkthroughs:
- Racks closed and locked
- Any audible or visual hardware alarms (lights, tones, status LCDs)
- Floor tiles in place and in good condition
- Datacenter entrances closed and locked
- Cameras undisturbed
- No unexpected visitors in or around datacenter
- Failed lights, broken windows, damaged doors, anything that makes physical security easier to breach
Also, why bother with a national news feed? Unless you have datacenters distributed throughout the country, it makes much more sense to tune into local news stations. The value of this as a whole is questionable, though, since it's going to be a very high noise:signal ratio. If anything, just subscribe to an RSS feed from your relevant news outlets.
Finally, this may not be what you're looking for, but I've found testing outside my network to be invaluable. Testing for high latency or packet loss to several well known networks (Google, Yahoo, Microsoft, etc.) with small and large packets, checking common websites for proper return codes, and measuring round trip latency for email to/from several popular free mail sites (Yahoo, Hotmail, Gmail) has given me the jump on several subtle problems before my users started calling.
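A minimal sketch of that kind of outside-in check might look like the following. Note the substitutions: it times a TCP handshake rather than sending ICMP pings of different sizes (raw ICMP usually needs elevated privileges), and it does not cover the email round-trip test. The function names, thresholds, and defaults are all assumptions made for illustration.

```python
import socket
import time
import urllib.error
import urllib.request
from typing import List, Optional


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> Optional[float]:
    """Time a TCP handshake to host:port in milliseconds; None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for a GET, or None if there was no response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None


def summarize(samples: List[Optional[float]],
              max_loss_pct: float = 5.0,
              max_avg_ms: float = 150.0) -> dict:
    """Summarize samples (None = failed attempt); thresholds are assumed values."""
    total = len(samples)
    ok = [s for s in samples if s is not None]
    loss_pct = 100.0 * (total - len(ok)) / total if total else 100.0
    avg_ms = sum(ok) / len(ok) if ok else None
    degraded = loss_pct > max_loss_pct or avg_ms is None or avg_ms > max_avg_ms
    return {"loss_pct": loss_pct, "avg_ms": avg_ms, "degraded": degraded}


def probe_host(host: str, count: int = 5) -> dict:
    """Take several connect-time samples for one host and summarize them."""
    return summarize([tcp_connect_ms(host) for _ in range(count)])
```

Running `probe_host` against a handful of well-known hosts on a schedule, and alerting when `degraded` is true for more than one of them at once, helps separate a problem on your side from a problem at a single remote site.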
Solution 2:
The Radar available from Securitywizardry.com is always something I've wanted to put up on a big plasma in a NOC/SOC for the cool factor: http://www.securitywizardry.com/radar.htm
It includes security news, "health" levels, emerging threats, and information on tool updates.
Solution 3:
My thoughts are on the physical walk-through side of things.
Walk through often. If you are onsite and a 24/7 office, have two people walk through separately at the start of their shift. If they walk through together, they will just chat and not see what they should be seeing.
- If not on site, try to set up a regular walk-through by either an employee or a hired local contractor.
- I would say weekly at minimum, but daily is better.
- Make this more frequent if you have any liquid pipes in there (HVAC chiller lines, domestic water lines, sewage from a bathroom above you, etc.)
- Look for the out of place
- Dripping pipes can be caught by a human eye before an electronic sensor
- Check under the floor
- Feel the HVAC vents to make sure they are on
- LISTEN to the UPSes. If they are humming louder than normal, you might want to check them
- Same goes for any transformers you have
- SMELL, there are a lot of electronics to start frying in there
- Keep it clean
If you have a separate generator/utility room, check that too
- Make sure the fuel lines are not leaking
- Keep it clean
- Noise and smell
- If the generator is on an auto test, check it after the test to make sure nothing has sprung a leak.
It's not a bad idea to have your electrical panels thermo-scanned a couple of times a year. This will find gear that is nearing failure, since it will appear hotter.
Every failure starts small. If you catch it while it is small, you can fix it on your schedule.
Solution 4:
How about:
NOC staff RSS feeds/Tweets
How do each of you at the NOC share information? Yammer (https://www.yammer.com/) could be a good way to share enterprise-wide information/status (via email/IM/RSS).