Open Source Filtering of HTTPS Traffic

We used to have a filtering/proxy setup in which about 500 users were routed through a Unix-based system running Squid with SquidGuard, to block questionable content and log what users were doing. What we found was that some people were able to bypass the block by going to free online proxies that use the HTTPS protocol. I couldn't find a way to block HTTPS traffic without disabling it altogether, which wasn't going to work for purchasing and HR (or regular users with banking sites).

Is there a way to block or filter HTTPS traffic with Squid/SquidGuard, or with some other open source monitoring program? How do others deal with it without resorting to commercial appliances?

EDIT: We're not trying to watch the traffic. We're trying to monitor the URLs and block them as necessary. We don't want credit card numbers and such...we wanted to know when Johnny was accessing https://schoolssucksoweproxyforyou.com/playboy.com, add the domain to a blockfile, and prevent him from doing it again. See some added comments below...

EDIT2: (Re: speaking of HR, banking, etc...) You may not think about it that way, but we have a person who does purchasing. Our district covers 7 buildings scattered around the county, so we have someone in charge of distributing materials and inventory. We have the same people working with checking and banking records, people in charge of unions (one for staff and another for faculty), and people in charge of dealing with HR information like insurance, injury claims, and liability...there are a lot of things involved in running schools that I don't think the general public ever thinks about. Plus, it was considered overbearing to prevent staff/faculty from doing online banking, or to keep students from accessing certain sites for online use (such as web quests) that sometimes required SSL connections to work, so banning HTTPS outright at the router would not be feasible.

Anyway, moving on to your suggestion about blocking only certain people: I didn't see a way to integrate authentication on a per-user basis. If credentials could be passed along by Windows, so users didn't need to keep authenticating with yet another password, it would be a workable solution, but anything I found was kludgy and didn't work reliably. Then it's an issue of getting people to remember yet another password (or students/staff stealing/sharing passwords). I personally liked having filtering done fairly across the board for staff and students, rather than making it an issue of teachers being able to do X while students aren't considered enough of a person to do that same thing.

In the end I couldn't find a way to get the server to reliably authenticate against Active Directory (to centralize user management and reduce the number of passwords users have to remember). If I used a different authentication scheme, it would mean another database to keep in sync with...what, at last count, ~1200 users or so? With high churn, too, as every year we have kids moving in and out of the district, graduating, and new ones coming in...


Solution 1:

I'm not going to get into ethics here - maybe you want to do this on all sites you don't "know" the domain for... anyway, ethical issues aside:

It is possible. You definitely need to add bits for Squid 2, but AFAIK Squid 3 will do this out of the box, as will some commercial web filter vendors. A MITM-style attack is generally the only way.

Solution 2:

The entire point of HTTPS traffic is that it's encrypted between the server and the end-user so no one else can snoop on it - including your filters. You won't be able to do any content filtering on it. The only HTTPS filtering you'll be able to do is blocking the SSL port to specific IP addresses.

If you whitelist, you'll have loads of false positives - banks you didn't think of, useful sites that require HTTPS to login or access, etc. If you blacklist, you'll have loads of false negatives - new proxy sites pop up every second.
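For what it's worth, here is a minimal sketch of the blacklist idea in Python. It assumes the browsers are explicitly configured to use the proxy, in which case the browser sends a plain-text `CONNECT host:port` request line before the encrypted tunnel starts, so the destination host name (but not the path after it) is visible to the proxy. The function names and the in-memory blocklist are made up for illustration; this is not how SquidGuard itself is implemented.

```python
def parse_connect_host(request_line):
    """Return the host from a 'CONNECT host:port HTTP/1.1' line, or None."""
    parts = request_line.strip().split()
    if len(parts) != 3 or parts[0].upper() != "CONNECT":
        return None
    # Split on the last colon so the port is dropped and the host is kept.
    host, _, _port = parts[1].rpartition(":")
    return host.lower() or None


def is_blocked(host, blocklist):
    """True if the host itself or any parent domain is in the blocklist."""
    labels = host.lower().rstrip(".").split(".")
    # Check "www.example.com", then "example.com", then "com".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


# Hypothetical blocklist; in practice this would be loaded from a blockfile.
blocklist = {"schoolssucksoweproxyforyou.com"}
host = parse_connect_host(
    "CONNECT www.schoolssucksoweproxyforyou.com:443 HTTP/1.1"
)
if host and is_blocked(host, blocklist):
    print("deny", host)
```

This only helps with hosts you already know about, so the false-negative problem with new proxy sites still applies.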

This is something that needs to be addressed at a policy level, not a technical one. If someone's goofing off on porn sites at work and using proxies to get around your filters, HR should be smacking them on the hand and threatening termination if it continues.

Solution 3:

Here's a very ugly solution that's implemented by a commercial vendor:

  • Replace the browsers' CA certs with your own in-house one

  • When a connection is requested to an unknown address, the proxy connects with its own client and fetches the site's cert

  • Then generate a fake cert for that site, signed by your own CA

  • The proxy then effectively acts as a MITM (man-in-the-middle)

You can't do that with stock Squid, but it would take me about a day of mod_perl hacking to implement that with Apache.
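To make the flow of those steps a bit more concrete, here is a rough Python sketch of the bookkeeping only. `ssl.get_server_certificate` is a real standard-library call that fetches a server's certificate as PEM text; everything else (`FakeCertStore`, `sign_with_inhouse_ca`) is a hypothetical name of mine, and the actual signing with your in-house CA is left as a callable you would have to supply, since the standard library cannot mint certificates.

```python
import ssl


def fetch_real_cert(host, port=443):
    """Connect as a client and return the site's real certificate (PEM)."""
    return ssl.get_server_certificate((host, port))


class FakeCertStore:
    """Caches one in-house-signed certificate per (host, port)."""

    def __init__(self, sign_with_inhouse_ca, fetch=fetch_real_cert):
        # sign_with_inhouse_ca(host, real_pem) -> fake cert; hypothetical hook.
        self._sign = sign_with_inhouse_ca
        self._fetch = fetch
        self._cache = {}

    def cert_for(self, host, port=443):
        host = host.lower()
        key = (host, port)
        if key not in self._cache:
            # Unknown address: fetch the real cert, then mint a fake one.
            real_pem = self._fetch(host, port)
            self._cache[key] = self._sign(host, real_pem)
        return self._cache[key]
```

The proxy would then present `store.cert_for(host)` to the browser while holding its own separate TLS connection to the real site, which is the man-in-the-middle part.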

Solution 4:

One solution might be to do what IRC servers sometimes do: when they see a connection from XXX.XXX.XXX.XXX, they try to connect back to that IP to see whether it's an open proxy server, and if it is, they block that IP.

That's the closest thing I can think of to a fully automatic solution, but it would require work on your end. That, combined with the other suggestions regarding whitelisting, or just manually looking at the logs and checking the remote IP of each HTTPS connection to see whether it's hosting a site, might be the only solution.
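A rough Python sketch of the IRC-style check, for illustration only. It assumes the remote machine is a classic HTTP proxy that answers an unauthenticated `CONNECT` with a 2xx status; the port list and the `example.com` probe target are my own guesses, and a web-based proxy page served over HTTPS would not be caught by this kind of probe.

```python
import socket

# Assumption: ports where HTTP proxies are commonly found.
COMMON_PROXY_PORTS = (80, 3128, 8080)


def build_probe(target_host="example.com", target_port=443):
    """Build an unauthenticated CONNECT request to send to the suspect IP."""
    target = f"{target_host}:{target_port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def reply_means_open_proxy(reply):
    """A 2xx status line in reply to our CONNECT suggests an open proxy."""
    status_line = reply.split(b"\r\n", 1)[0].split()
    return (
        len(status_line) >= 2
        and status_line[0].startswith(b"HTTP/")
        and status_line[1][:1] == b"2"
    )


def probe(ip, port, timeout=3.0):
    """Return True if ip:port appears to relay CONNECT for anyone."""
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            sock.sendall(build_probe())
            return reply_means_open_proxy(sock.recv(1024))
    except OSError:
        # Refused, timed out, unreachable: treat as "not an open proxy".
        return False


def looks_like_open_proxy(ip):
    """Probe each common proxy port on the given IP."""
    return any(probe(ip, port) for port in COMMON_PROXY_PORTS)
```

You would feed it the remote IPs pulled from the proxy logs and add any hits to the blocklist, ideally after a manual look, since probing other people's servers may not be welcome.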