HW/SW Design: 2 petabytes of storage

Well, you didn't mention a budget... so buy this now. Data at that scale should probably be left in the hands of a team with experience in that realm; it's nice having support and someone to yell at :)

http://www.racktopsystems.com/products/brickstor-superscalar/

http://www.racktopsystems.com/products/brickstor-superscalar/tech-specs/

4 x BrickStor Foundation Units (storage heads)
10 x BrickStor Bricks (36 x 3.5″-bay JBODs)
2 x 16-port SAS switches
1 x pull-out rackmount KVM
1 x 48U rack
1 x 10Gb network switch (24 x 10Gb, non-blocking)
NexentaStor plug-ins: VMDC, WORM, HA-cluster or Simple-HA
5-day onsite installation
24/7/365 email and phone support
Onsite support

Since the application you describe doesn't really call for clustered storage (given the use case), use ZFS. You'll get effectively unlimited scalability, you'll get the chance to offload some of the compression work to the storage system, and you can tell all of your friends about it :)

More than that, L2ARC caching (using SSDs) will keep the hot data available for analysis at SSD speed.
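For a concrete picture, here's a minimal sketch of that setup. The pool name, the Solaris-style device names and the single raidz2 vdev are all placeholder assumptions, not a sizing recommendation for 2PB:

```
# Hypothetical pool: one raidz2 vdev of six disks. A real 2PB build
# would have many such vdevs spread across the JBODs.
zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0

# Offload compression to the storage system (lz4 is the usual choice
# on current ZFS; older builds may only offer lzjb/gzip):
zfs set compression=lz4 tank

# Add two SSDs as L2ARC cache devices so hot data is read at SSD speed:
zpool add tank cache c1t0d0 c1t1d0
```

Once it's running, `zpool iostat -v tank` will show the cache devices filling up as the working set warms.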

Edit: Another ZFS-based solution - http://www.aberdeeninc.com/abcatg/petarack.htm


Also, Red Hat is now in the scale-out storage business with Red Hat Storage, built on GlusterFS.

See: http://www.redhat.com/products/storage/storage-software/
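Since it's GlusterFS underneath, the scale-out model looks roughly like the sketch below: bricks from several servers aggregated into one namespace. Hostnames, brick paths and the volume name are made up:

```
# Join the servers into one trusted pool (run on server1):
gluster peer probe server2

# Aggregate a brick from each server into a single distributed volume:
gluster volume create logvol server1:/bricks/b1 server2:/bricks/b1
gluster volume start logvol

# Clients mount the combined namespace from any server in the pool:
mount -t glusterfs server1:/logvol /mnt/logvol
```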


As MDMarra mentions, you need Splunk for this. I'm a big user and fan at very similar volumes to the ones you discuss, and right away it'll save you having to buy anywhere near that much storage and cut out a lot of the complexity. One decent-sized server (maybe 150-200TB max) will do the job when paired with Splunk: its on-the-fly indexing is perfect for this kind of thing, and its search capabilities far outstrip anything you'll manage yourself. It's not free, of course, but I wouldn't consider anything else.
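To give a feel for the workflow, here's a rough sketch using the CLI that ships with Splunk. The index name, monitored path and search terms are placeholders for your own data:

```
# Create a dedicated index for the incoming data:
/opt/splunk/bin/splunk add index applogs

# Point Splunk at a directory; new files get indexed on the fly:
/opt/splunk/bin/splunk add monitor /data/incoming -index applogs

# Ad-hoc search from the shell (the web UI is the usual front end):
/opt/splunk/bin/splunk search 'index=applogs error | stats count by host'
```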