Is there a .NET equivalent to Apache Hadoop? [closed]

Solution 1:

Have you looked at using Hadoop's streaming?

I use it from Python all the time :-).
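
To make that concrete, here is a minimal word-count sketch for Hadoop streaming. It assumes the usual streaming contract: the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout, and the reducer sees its input sorted by key. The file name `wordcount.py`, the function names, and the `map`/`reduce` argument convention are my own choices for illustration, not part of Hadoop.

```python
#!/usr/bin/env python3
"""Word count for Hadoop streaming (sketch): `wordcount.py map` or `wordcount.py reduce`."""
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one tab-separated "word<TAB>1" record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word.

    Assumes records arrive sorted by key, so equal words are adjacent
    and groupby can total them in a single pass.
    """
    records = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(records, key=lambda rec: rec[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(argv):
    # Hypothetical convention: the first argument selects the phase.
    phases = {"map": map_lines, "reduce": reduce_lines}
    if len(argv) != 2 or argv[1] not in phases:
        return  # no phase given, nothing to do
    for record in phases[argv[1]](sys.stdin):
        print(record)


if __name__ == "__main__":
    main(sys.argv)
```

You would then hand `wordcount.py map` and `wordcount.py reduce` to the streaming jar as its `-mapper` and `-reducer` options; the jar location and the input/output paths depend on your installation. The same stdin/stdout contract is what lets you swap in an executable written in any other language.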

I'm starting to see that the heterogeneous approach is often the best and it looks like other folks are doing the same.

If you look at projects like Protocol Buffers or Facebook's Thrift, you see that sometimes it's best to use an app written in another language and build the glue in the language you prefer.

Solution 2:

Recently, MySpace released their .NET MapReduce framework, Qizmt, as open source, so this is also a potential contender in this space.

Solution 3:

See http://research.microsoft.com/en-us/projects/dryadlinq/default.aspx or http://msdn.microsoft.com/en-us/library/dd179423.aspx

Solution 4:

I answered your question in my question here

To summarize it here:

Microsoft dropped its alternative (Dryad) in favor of Hadoop. Next year they will release MS SQL Server 2012 with Hadoop integration. Azure and Windows Server support is being developed even as we speak.

It will be available in the first half of 2012.

Hadoop is the #1 big data platform and is going to be supported from both open-source and proprietary stacks (Java, .NET, Python, ...); even Oracle is adopting it.

If you are on the .NET platform and about to start developing something, you should wait.

More information about what is possible will be available here

Solution 5:

I would say that DryadLINQ is the closest thing that we .NET folk have to Hadoop. But it depends on what you want to use Hadoop for. If you are looking for the optimized, self-maintaining distributed file system (DFS), then DryadLINQ isn't what you are looking for. It has an analog to the DFS, but you have to build the partitions manually and distribute each partition yourself.

That being said, if it's the distributed execution aspect of Hadoop that you are looking for, then DryadLINQ is truly wonderful (and no, I'm not affiliated with MS). As long as you have a Microsoft HPC cluster set up, getting going with DryadLINQ is really easy.

The code you write is really just straight LINQ code, except that instead of executing the LINQ on IEnumerable<T> you execute it on PartitionedTable<T> (the distributed data structure you build yourself).

What has really been cool about DryadLINQ is the fast turnaround time (try, test, adjust, repeat) when developing algorithms. You just write LINQ code to do your calculations and DryadLINQ takes care of the whole distributed execution part. It's the most natural analog I've come across that makes writing code for distributed processing just like writing code for a single process.
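
For readers who haven't seen that style, here is a rough analogy in Python rather than C#. It is not DryadLINQ's API: `make_partitions`, `query`, and `run_partitioned` are hypothetical names, and local threads stand in for cluster nodes. The point it tries to show is the one above: the query is written once against a plain iterable, and a separate layer runs it over hand-built partitions and merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_partitions(records, n):
    """Manually split records into n partitions (round-robin), much as you
    have to build the partitions yourself in the setup described above."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def query(words):
    """The 'query': count words longer than three letters.
    Written against any iterable, with no knowledge of partitions."""
    return Counter(w for w in words if len(w) > 3)


def run_partitioned(query_fn, partitions):
    """Run the same query on every partition in parallel, then merge.
    Threads are a stand-in for cluster nodes in this sketch."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(query_fn, partitions)
        return sum(partials, Counter())


words = ["linq", "dryad", "hadoop", "net", "dryad", "hadoop", "hadoop"]
single = query(words)                                    # single-process run
distributed = run_partitioned(query, make_partitions(words, 3))
print(single == distributed)
```

Merging by simple addition only works here because counting decomposes cleanly across partitions; a real engine has to plan that merge step for each operator, which is the part DryadLINQ handles for you.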