Mac OS X Lion 10.7.2 update breaks SSL

Summary

After updating from 10.7.1 to 10.7.2, neither Safari nor Google Chrome can load GMail. Spinning Beachballs all around.

The problem isn't GMail; Firefox loads GMail just fine.

The problem isn't limited to Safari or Google Chrome; other applications, such as Gilgamesh, also have trouble with SSL. Any program that uses WebKit (Google Chrome, Safari) or a Cocoa library (Gilgamesh) to access the Internet has trouble loading secure sites.

The various forums online suggest a handful of fixes, none of which work.

Analysis

Fix #1: Open Keychain Access.app and delete the Unknown certificate.

The 10.7.2 update also prevents Keychain Access from loading. The Keychain program itself hangs with a Spinning Beachball, so this fix can't even be attempted.

Fix #2: Delete ~/Library/Keychains/login.keychain and /Library/Keychains/System.keychain.

This temporarily resolves the issue and lets you load secure sites, but a minute or two after rebooting or waking from hibernation the fix is somehow magically undone, so you have to delete these files over and over.
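Since Fix #2 has to be repeated, it may be safer to move the keychains aside than to delete them outright. Here is a minimal Python sketch of that idea. The function name move_keychains_aside and the backup folder are my own assumptions, not part of any Apple tool, and moving /Library/Keychains/System.keychain would need to be run as root.

```python
import shutil
import time
from pathlib import Path


def move_keychains_aside(keychains, backup_root):
    """Move each existing keychain file into a timestamped backup folder.

    Hypothetical helper: returns the list of new locations so the files
    can be restored later if needed.
    """
    backup_dir = Path(backup_root) / time.strftime("keychains-%Y%m%d-%H%M%S")
    moved = []
    for keychain in map(Path, keychains):
        if not keychain.exists():
            continue  # skip files that are already gone
        backup_dir.mkdir(parents=True, exist_ok=True)
        target = backup_dir / keychain.name
        shutil.move(str(keychain), str(target))
        moved.append(target)
    return moved
```

You would call it with the two paths from Fix #2, for example move_keychains_aside([Path.home() / "Library/Keychains/login.keychain", "/Library/Keychains/System.keychain"], Path.home() / "keychain-backups"). Like the plain delete, this only buys time until the keychain breaks again.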

Fix #3: Delete ~/Library/Application\ Support/Mob* and /Library/Application\ Support/Mob*.

There is a rumor that the new MobileMe/iCloud service ubd is causing the issue. This fix does not resolve the issue.

Fix #4: Open Keychain Access, open the Preferences, and disable OCSP and CRL.

This fix does not resolve the issue.

Fix #5: Use the 10.7.0 -> 10.7.2 combo installer, rather than the 10.7.1 -> 10.7.2 installer.

When I run the combo installer, it stays forever at the "Validating Packages..." screen. The combo installer itself is bugged to hell.

http://speely.files.wordpress.com/2011/10/validating-packages.png

I force-quit the installer, ran "sudo killall installd" to force-quit the background installer process, and reran the combo installer.

Same problem: it stalls at "Validating Packages..."

Recap

The only fix that works is deleting the keychains, but you have to do this every time you reboot or wake from hibernate. There is some evidence that ubd continually corrupts the keychain files, but the suggested ubd fix of deleting ~/Library/Application\ Support/Mob* and /Library/Application\ Support/Mob* does not resolve this issue.

Evidently, something is corrupting the keychain over and over and over.

Also posted on the Apple Support Communities.


Our Mac support person has had success running DiskWarrior to fix the problem. None of his customers have reported the issue popping back up so far.

UPDATE:

I've figured out a fix. The issue is happening because the captive portal replies to EVERYTHING. I adjusted the captive portal's DNS to give bad results for OCSP and CRL sites. I used 127.0.0.1 in this case. The requests now time out instead of giving back incorrect data. It also works locally if you edit "/private/etc/hosts" and add entries like this:

127.0.0.1    crl.usertrust.com
127.0.0.1    ocsp.usertrust.com
127.0.0.1    crl.incommon.org
127.0.0.1    ocsp.incommon.org

The correct entries may depend on the CA for the certificate. I found these addresses while watching the connection using Wireshark.
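If you have several OCSP and CRL hosts to block, a small script can work out which lines are still missing from the hosts file. This is only a rough Python sketch. The names parse_hosts and missing_entries are made up for illustration, and it just returns the lines to append rather than writing to /private/etc/hosts, which needs root.

```python
def parse_hosts(text):
    """Return the set of (address, hostname) pairs found in hosts-file text."""
    pairs = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            pairs.add((address, name.lower()))
    return pairs


def missing_entries(hosts_text, hostnames, address="127.0.0.1"):
    """Return hosts-file lines for the hostnames not yet mapped to address."""
    present = parse_hosts(hosts_text)
    lines = []
    seen = set()
    for name in hostnames:
        name = name.strip().lower()
        if not name or name in seen or (address, name) in present:
            continue
        seen.add(name)
        lines.append(f"{address}    {name}")
    return lines
```

Feed it the current contents of the hosts file and the host names you saw in Wireshark, then append the returned lines by hand with sudo. Remember to remove them again once you are off the captive portal, since they disable revocation checks for those CAs.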


Might I add that MobileMe is now iCloud so the folder is not Application Support/Mobi*, but rather Application Support/Ubiquity.

Delete that, though I had mixed results. It only worked one time out of three. The way that definitely works is deleting:

~/Library/Keychains/login.keychain and /Library/Keychains/System.keychain

Just rinse and repeat. I'm not completely certain when the Keychain Access will break, but at some point (typically about 3 days in for me) everything stops working.

Firefox seems to get around things, and if you don't want to do anything at all, you can turn off OCSP in Firefox (about:config) just to log in to your wireless portal, and then remember to turn it back on. This won't fix Safari or Chrome though (what's the word on Opera?)

But the best solution is to restore to 10.7.1 or 10.7. I happened to have the DMG file from earlier and it was 10.7.1. Doing a "reinstall" only copies your Lion system files over and keeps your entire install in shape. So you're really just reinstalling the OS but keeping ALL your apps and data. So far this has been perfect. Just remember not to update to 10.7.2 if you're going to roll back.


Turning off OCSP and CRL checks is a very bad idea. Essentially you are saying you don't care about certificate revocation. This is not good given the number of certificate authorities getting hacked these days. It is why Apple upgraded its security for captive portals. The problem is in the captive portal connection itself. If you go to one, you cannot check for CRL or OCSP because (duh!) you are in the captive portal. Whoever provides this portal must also poke holes in their firewall to allow you out from the captive portal to check the certificates that the https captive portal page is giving you. We had to do this on our enterprise wireless system before Lion could get anywhere.


After weeks of frustration with this constantly recurring problem (and waiting for Apple to release a fix), I decided to look for any solution that would roll back the 10.7.2 update. Unfortunately I didn't find any way of rolling back to 10.7.1 or 10.7.0.

I therefore decided to use the Lion Recovery and see how it affects the situation. I performed the recovery procedure by following the steps outlined in this Apple support article.

I'm happy to report that in my case, the problem has (hopefully) disappeared! For some reason, even though the Lion Recovery process reinstalled OS X back to the 10.7.2 version, I haven't had any problems with Keychain or SSL for over a week now.

I used the online recovery mode and it seems that the OS X version that is downloaded during the recovery process does have some sort of setup that doesn't corrupt the keychain. The Lion Recovery process was super smooth and I did not have to reinstall any apps or recover files from backup (I would still recommend doing backups). My MacBook Pro is version 5,1.

This is the first time for me that Apple's security update has broken OS X in such a big way. I still wonder why they haven't released a fix/update to correct this issue.