anybody tried neo4j vs titan - pros and cons [closed]
We have a social graph in which in a day we add almost 1 millions of node and twice as many edges. We started with neo4j graph because yes, it is very fast due to fact that its storage is on the same machine on which graph engine runs. But following are the experiences that we would like to share with you about neo4j.
- Not good fit for real time query. We have social structure like twitter. We have to show latest 20 activities (and its associated activities) of all the users that a user follow on his time line. We have some users who follows more than 1000 users. The gremlin query that we wrote for this (if you are interested then we can share gremlin query) really produced so much GC that a server with 8 cpu and 48 gb ram used to freeze and we had to restart the server to get it online again.
- Many a time network partition observed.
- There is not vertex centric index that is very much required in graoh database.
Ultimately we are so much fade up with server performance with gremlin query that we had to change the database to titan.
On titan we are getting reasonable performance and also scaling is very easy as we are using cassandra as backend storage. But mind you that .. using gremlin here also not a good idea as multiget query is very ugly to write and without multiget its query becomes very slow.
Great to see you exploring graph databases. I will speak to the Neo4j part of your question:
More than 30 of the Global 2000 now use Neo4j in production for a wide range of use cases, many of them surprising, even to us! (And we invented the property graph!)
A partial list of customers can be found below: www.neotechnology.com/customers
Neo4j has been in 24x7 production for 10 years, and while the product has of course evolved significantly since then, it's built on a very solid foundation.
Most the companies moving to graph databases--speaking for Neo4j, which is what I know about-- are doing so because either a) their RDBMSs weren't able to handle the scope & scale of their connected query requirements, and/or b) the immense convenience and speed that comes from modeling domains that are a graph (social, network & data center management, fraud, portfolios, identity, etc.) as a graph, not as tables.
For kicks, you can find a number of customer talks here, from the four (soon five) GraphConnect conferences that were held this year in major cities around the world:
http://watch.neo4j.org/
If you're in London, the last one will be held next week: http://www.graphconnect.com
You'll find a summary below of some of the technology behind Neo4j, with some customer examples. To speak very directly to your question about scaling: Neo4j has a unique architecture designed to maximize query response time & query predictability, by allowing horizontal scale-out in such a way that each instance can access the graph without having to hop over the network. (Need more read throughput. Just add instances.) It turns out that this approach works well for 95+% of the graphs out there, including some production customers who have more than half of the Facebook social graph running in a single Neo4j cluster, backing an "always on" 24x7 web site.
www.neotechnology.com/neo4j-scales-for-the-enterprise/
One of the world's largest postal delivery services does all of their real-time package routing with Neo4j. Railroads are building routing systems on Neo4j. Some of the world's largest customers are using them for HR and data governance, alternate-path routing, network & data center management, real-time fraud detection, bioinformatics, etc.
Neo4j's Cypher query language is the only declarative query language built expressly for property graphs. It takes all of the lessons learned from our 13-year old native Java API (which was the basis for Blueprints, which some of the other graph databases have since adopted) and rolls them into a next-generation language. Cypher is a great way to learn graphs, and to develop applications; and there's always the native Java API if you have special needs or value "bare metal" performance (i.e. sub millisecond vs. single-digit millisecond) performance above convenience. Neo4j is built from the ground up to support graphs, and has a graph storage engine that is built to store graphs; unlike some of the more recent additions to the graph database ecosystem, which are architected as graph libraries on top of non-graph databases, and are subject to some of the inherent limitations. (e.g. FlockDB, because it is based on MySQL, will still be very slow for anything greater than one hop.)
Definitely feel free to contact the Neo team if you need anything more specific. We'll be more than happy to help you! http://info.neotechnology.com/ContactUs.html
Good luck!