I'm learning traditional Relational Databases (with PostgreSQL) and doing some research I've come across some new types of databases. CouchDB, Drizzle, and Scalaris to name a few, what is going to be the next database technologies to deal with?


Solution 1:

I would say next-gen database, not next-gen SQL.

SQL is a language for querying and manipulating relational databases. SQL is dictated by an international standard. While the standard is revised, it seems to always work within the relational database paradigm.

Here are a few new data storage technologies that are getting attention currently:

  • CouchDB is a non-relational database. They call it a document-oriented database.
  • Amazon SimpleDB is also a non-relational database accessed in a distributed manner through a web service. Amazon also has a distributed key-value store called Dynamo, which powers some of its S3 services.
  • Dynomite and Kai are open source solutions inspired by Amazon Dynamo.
  • BigTable is a proprietary data storage solution used by Google, and implemented using their Google File System technology. Google's MapReduce framework uses BigTable.
  • Hadoop is an open-source technology inspired by Google's MapReduce, and serving a similar need, to distribute the work of very large scale data stores.
  • Scalaris is a distributed transactional key/value store. Also not relational, and does not use SQL. It's a research project from the Zuse Institute in Berlin, Germany.
  • RDF is a standard for storing semantic data, in which data and metadata are interchangeable. It has its own query language SPARQL, which resembles SQL superficially, but is actually totally different.
  • Vertica is a highly scalable column-oriented analytic database designed for distributed (grid) architecture. It does claim to be relational and SQL-compliant. It can be used through Amazon's Elastic Compute Cloud.
  • Greenplum is a high-scale data warehousing DBMS, which implements both MapReduce and SQL.
  • XML isn't a DBMS at all, it's an interchange format. But some DBMS products work with data in XML format.
  • ODBMS, or Object Databases, are for managing complex data. There don't seem to be any dominant ODBMS products in the mainstream, perhaps because of lack of standardization. Standard SQL is gradually gaining some OO features (e.g. extensible data types and tables).
  • Drizzle is a relational database, drawing a lot of its code from MySQL. It includes various architectural changes designed to manage data in a scalable "cloud computing" system architecture. Presumably it will continue to use standard SQL with some MySQL enhancements.
  • Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store, developed at Facebook by one of the authors of Amazon Dynamo, and contributed to the Apache project.
  • Project Voldemort is a non-relational, distributed, key-value storage system. It is used at LinkedIn.com
  • Berkeley DB deserves some mention too. It's not "next-gen" because it dates back to the early 1990's. It's a popular key-value store that is easy to embed in a variety of applications. The technology is currently owned by Oracle Corp.

Also see this nice article by Richard Jones: "Anti-RDBMS: A list of distributed key-value stores." He goes into more detail describing some of these technologies.

Relational databases have weaknesses, to be sure. People have been arguing that they don't handle all data modeling requirements since the day it was first introduced.

Year after year, researchers come up with new ways of managing data to satisfy special requirements: either requirements to handle data relationships that don't fit into the relational model, or else requirements of high-scale volume or speed that demand data processing be done on distributed collections of servers, instead of central database servers.

Even though these advanced technologies do great things to solve the specialized problem they were designed for, relational databases are still a good general-purpose solution for most business needs. SQL isn't going away.


I've written an article in php|Architect magazine about the innovation of non-relational databases, and data modeling in relational vs. non-relational databases. http://www.phparch.com/magazine/2010-2/september/

Solution 2:

I'm missing graph databases in the answers so far. A graph or network of objects is common in programming and can be useful in databases as well. It can handle semi-structured and interconnected information in an efficient way. Among the areas where graph databases have gained a lot of interest are semantic web and bioinformatics. RDF was mentioned, and it is in fact a language that represents a graph. Here's some pointers to what's happening in the graph database area:

  • Graphs - a better database abstraction
  • Graphd, the backend of Freebase
  • Neo4j open source graph database engine
  • AllegroGraph RDFstore
  • Graphdb abstraction layer for bioinformatics
  • Graphdb behind Directed Edge recommendation engine

I'm part of the Neo4j project, which is written in Java but has bindings to Python, Ruby and Scala as well. Some people use it with Clojure or Groovy/Grails. There is also a GUI tool evolving.

Solution 3:

Might be not the best place to answer with this, but I'd like to share this taxonomy of noSQL world created by Steve Yen (please find it at http://de.slideshare.net/northscale/nosqloakland-200911021)

  1. key‐value‐cache

    • memcached
    • repcached
    • coherence
    • 
infinispan
    • eXtreme
scale
    • 
jboss
cache
    • velocity
    • terracoqa

  2. key‐value‐store

    • 
keyspace
    • 
flare
    • schema‐free
    • 
RAMCloud

  3. eventually‐consistent key‐value‐store

    • dynamo
    • 
voldemort
    • 
Dynomite
    • 
SubRecord
    • 
MongoDb
    • 
Dovetaildb
  4. ordered‐key‐value‐store

    • tokyo
tyrant
    • lightcloud
    • 
NMDB
    • luxio
    • 
memcachedb
    • 
actord

  5. data‐structures server

    • redis

  6. tuple‐store

    • gigaspaces
    • 
coord
    • 
apache
river
  7. object database

    • ZopeDB
    • db4o
    • 
Shoal

  8. document store

    • CouchDB
    • Mongo
    • 
Jackrabbit
    • 
XML
Databases
    • 
ThruDB
    • 
CloudKit
    • 
Perservere
    • 
Riak
Basho
    • 
Scalaris

  9. wide columnar store

    • BigTable
    • Hbase
    • Cassandra
    • Hypertable
    • KAI
    • 
OpenNep