How do I deal with the removal/eradication of an unknown worm on our network?
Solution 1:
These are my general suggestions for this kind of process. I appreciate you'll have covered some of them already, but it's better to be told something twice than to miss something important. These notes are orientated towards malware that's spreading on a LAN but could easily be scaled back to deal with more minor infections.
Stopping the rot, and finding the infection source.
Make sure you have an up-to-date backup of every system and every bit of data on this network that the business cares about. Label this backup media clearly as possibly compromised, so that people don't try to restore from it in three months' time while your back is turned and infect the network again. If you have a backup from before the infection happened, put this safely to one side too.
Shut down the live network, if you possibly can (you will probably need to do this as part of the cleanup process, at least). At the very least, seriously consider keeping this network, including servers, off the internet until you know what is going on - what if this worm is stealing info?
Don't get ahead of yourself. It's tempting to just say clean build everything at this point, force everyone to change passwords, etc, and call that 'good enough'. While you will probably need to do this sooner or later, it's likely to leave you with pockets of infection if you don't understand what is happening on your LAN. (If you don't want to investigate the infection further go to step 6)
Copy an infected machine to a virtual environment of some kind, and isolate this virtual environment from everything else, including the host machine, before you boot the compromised guest.
Create another couple of clean virtual guest machines for it to infect, then isolate that network. Use tools like Wireshark to monitor the network traffic (time to take advantage of that Linux background and create another guest on this virtual LAN that can watch all this traffic without being infected by any Windows worm!) and Process Monitor to monitor changes happening on all these machines. Also consider that the issue may be a well-hidden rootkit: try using a reputable tool for finding these, but remember that this is a bit of an uphill struggle, so finding nothing doesn't mean there is nothing there.
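As a rough complement to Process Monitor in the sandbox, you could snapshot file hashes on a clean guest before and after exposing it to the infected one, then compare the two. This is only a minimal sketch: the function names are my own, it reads whole files into memory, and a rootkit that hides files from the running OS will also hide them from this, so ideally run it against the guest's disk mounted from a clean system.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path under root (relative to root) to its SHA-256 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                with open(full, "rb") as handle:
                    digest = hashlib.sha256(handle.read()).hexdigest()
            except OSError:
                # Locked or unreadable files are recorded, not silently skipped.
                digest = "unreadable"
            result[os.path.relpath(full, root)] = digest
    return result


def diff_snapshots(before, after):
    """Return sorted lists of added, removed and modified paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(
        path for path in set(before) & set(after) if before[path] != after[path]
    )
    return added, removed, modified
```

Take one `snapshot()` of the clean guest, let the worm do its thing, take a second, and feed both to `diff_snapshots()`. Anything added or modified under the system directories is a candidate sample for your AV vendor.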
(Assuming you haven't / can't shut down the main LAN) Use Wireshark on the main LAN to look at traffic being sent to/from the infected machines. Treat any unexplainable traffic from any machine as potentially suspicious: an absence of visible symptoms is not evidence of an absence of compromise. You should be especially worried about servers and any workstations holding business-critical information.
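If the capture is too big to eyeball, one way to triage it is to boil it down to flows and flag any destination you can't explain. The sketch below assumes you've already reduced the capture to CSV text with `src`, `dst` and `dport` columns; those column names and the allowlist addresses are made up for illustration and are not what Wireshark exports by default.

```python
import csv
import io
from collections import Counter

# Hypothetical allowlist: destinations this LAN is expected to talk to.
EXPECTED_DESTINATIONS = frozenset({"192.168.1.1", "192.168.1.10"})


def unexplained_flows(csv_text, expected=EXPECTED_DESTINATIONS):
    """Count (src, dst, dport) flows whose destination is not on the allowlist.

    Returns a list of ((src, dst, dport), count) pairs, busiest first.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["dst"] not in expected:
            counts[(row["src"], row["dst"], row["dport"])] += 1
    return counts.most_common()
```

An allowlist is deliberately the pessimistic choice here: everything is suspicious until you can say why a machine is talking to that address.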
Once you have isolated any infected processes on the virtual guests, you should be able to send a sample to the company that made the antivirus software you're using on these machines. They will be keen to examine samples and produce fixes for any new malware they see. In fact, if you have not done so already, you should contact them with your tale of woe as they might have some way of helping.
Try very hard to work out what the original infection vector was - this worm may be an exploit that was hidden inside a compromised website that someone visited, it may have been brought in from someone's home on a memory stick or received by email, to name but a few ways. Did the exploit compromise these machines via a user with admin rights? If so, don't give users admin rights in future. You need to try and make sure the infection source is fixed and you need to see if there is any procedural change you can make to make that infection route more difficult for exploits in the future.
Clean-up
Some of these steps will seem over the top. Heck, some of them probably are over the top, especially if you determine that only a few machines are actually compromised, but they should leave your network as clean as it can be. Bosses won't be keen on some of these steps either, but there's not much to be done about that.
Shut down all machines on the network. All workstations. All servers. Everything. Yes, even the boss's teenage son's laptop which the son uses to sneak onto the network while waiting for dad to finish work so the son can play 'dubious-javascript-exploit-Ville' on whatever the current social media site du jour is. In fact, thinking about it, shut this machine down especially. With a brick if that's what it takes.
Start up each server in turn. Apply any fix you've discovered for yourself or have been given by an AV company. Audit the users and groups for any unexplained accounts (both local accounts and AD accounts), audit installed software for anything unexpected, and use Wireshark on another system to watch traffic coming from this server. (If you find any issues at this point then seriously consider rebuilding that server.) Shut each system down before you start the next one, so that a compromised machine can't attack the others. Or unplug them from the network, so you can do several at once but they can't talk to each other; it's all good.
Once you're as sure as you can be that all your servers are clean, start them up and, using Wireshark, Process Monitor, etc., observe them again for any strange behaviour.
Reset every single user password. And if possible, service account passwords, too. Yeah, I know it's a pain. We're about to head into "possibly over the top" territory at this point. Your call.
Rebuild all the workstations. Do so one at a time, so that possibly infected machines aren't sitting there idle on the LAN attacking freshly rebuilt ones. Yes this will take a while, sorry about that.
If that's not possible then:
Carry out the steps I outlined above for servers on all the "hopefully clean" workstations.
Rebuild all the ones that showed any hint of suspicious activity, and do so while all the "hopefully clean" machines are powered off.
If you haven't already then consider centralised AV that will report problems back to a server where you can watch for problems, centralised event logging, network monitoring, etc. Obviously pick and choose which of these are right for this network's needs and budgets, but there's clearly a problem here, right?
Review user rights and software installs on these machines, and set up a periodic audit to make sure things are still how you expect them to be. Also make sure that users are encouraged to report things asap without being moaned at, encourage a business culture of fixing IT problems rather than shooting the messenger, etc.
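The account and software audits above, and the periodic re-check, boil down to comparing what's there now against a list you trust. Here's a minimal sketch of that comparison. How you collect the lists (local accounts, AD accounts, installed software) is up to you, and the function name is my own invention. Names are compared case-insensitively because Windows account names are.

```python
def audit_against_baseline(baseline, current):
    """Compare a current inventory (account names, installed software, ...)
    against a known-good baseline, ignoring case and surrounding whitespace.
    """
    baseline_set = {item.strip().lower() for item in baseline}
    current_set = {item.strip().lower() for item in current}
    return {
        # Present now but never approved: investigate these first.
        "unexpected": sorted(current_set - baseline_set),
        # Approved but gone: may be harmless, may be tampering.
        "missing": sorted(baseline_set - current_set),
    }
```

Keep the baseline somewhere the audited machines can't write to, otherwise whatever adds a rogue account can add it to your baseline too.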
Solution 2:
You've done all the things I would do (if I were still a Windows admin) -- The canonical steps are (or were, last time I was a Windows guy):
- Isolate the affected machines.
- Update anti-virus definitions.
- Run AV/Malware/etc. scans on the whole network.
- Blow away the affected machines (completely wipe the suckers out) and reinstall.
- Restore user data from backups (making sure it's clean).
Note that there's always a chance the virus/worm/whatever is lurking in email (on your mail server), or inside a macro in a Word/Excel document -- if the problem comes back you may need to be more aggressive in your cleaning the next time around.
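For the macro angle, a quick first pass over restored documents is to check which ones carry a VBA project at all. Modern Office files (.docx, .docm, .xlsx, .xlsm and friends) are ZIP containers, and macros live in a part named `vbaProject.bin`. The sketch below only covers that format: the function name is mine, legacy .doc/.xls files aren't ZIPs so it returns `None` for them rather than pretending they're clean, and a macro being present doesn't by itself mean it's malicious.

```python
import io
import zipfile


def has_vba_macros(data):
    """Check the raw bytes of an Office Open XML file for a VBA project.

    Returns True or False for ZIP-based Office files, and None when the
    data is not a ZIP container and so could not be inspected this way.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return any(
                name.lower().endswith("vbaproject.bin")
                for name in archive.namelist()
            )
    except zipfile.BadZipFile:
        # Legacy binary formats or non-Office files: needs a different tool.
        return None
```

Anything that comes back `True` or `None` is worth a proper AV scan before it goes back onto a rebuilt machine.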
Solution 3:
The first lesson to take from this is that AV solutions aren't perfect. Not even close.
If you are up to date with the AV software vendors, call them. All of them have support numbers for exactly this sort of thing. As a matter of fact they'll probably be very interested in what hit you.
As others have said, take each machine down, wipe it and reinstall. You might take this opportunity to get everyone off of XP anyway. It's been a dead OS for quite some time. At the very least this should involve destroying the HD partitions and reformatting them. Although, it sounds like there aren't that many machines involved, so buying completely new replacements might be a better option.
Also, let your boss(es) know that this just got expensive.
Finally, why in the world would you run all of that off of a single server? (Rhetorical, I know you "inherited" it) A DC should NEVER be accessible from the internet. Fix this by getting the appropriate hardware in place to take care of the functionality you need.