Cassandra: text vs varchar
Does anyone know the difference between the two CQL data types text
and varchar
in Cassandra? The Cassandra documentation describes both types as "UTF-8 encoded string" and nothing more.
text
is just an alias for varchar
!
The documentation:
- Using cql3 - Datastax
- CQL commands - Datastax
- Documentation on Datatypes in CQL - Datastax
- CQL3 Documentation - Apache
EDIT
Here's the link to the C* 1.2 docs. The text vs varchar info is still the same, however this document contains some extra datatypes.
EDIT v2 Documentation links have been updated to the docs for C* 3. I couldn't find a good alternative for the C* 1.2 docs.
Probably you meant the CQL storage types, if not, disregard my answer.
In CQL there has been a ongoing trend to try to distance from the internals of cassandra. Whether that is a good thing, or a bad thing, is open to interpretation. What is relevant, however, is in latest versions of CQL developers have been trying to come up with syntax that is more familiar to people who are not that in depth into cassandra's internals.
If you were to take a look into this SO question, you will get a nice illustration of the situation: Creating column family or table in Cassandra while working Datastax API(which uses new Binary protocol)
In recent CQL versions, some aliases, alien to cassandra, but very well known to DBA's have started to appear. For example, the native to cassandra ColumnFamily has been aliased with Table, and text is just an alias for varchar and vice versa. Again, it is a matter of opinion if that is a good thing or not.
So, in conclusion, you can use varchar and text interchangeably.
Cassandra CQL Data Types text and varchar are synonmys/alias for each other.
- Data Type associated to Varchar is blob(The max theoretical size for a blob is 2 GB)
- Data Type associated to text is Varchar (meaning even you have used text but Cassandra internally treats as Varchar)
- blob type association will not create performance issues because Cassandra stores data in constant hexadecimal number.
- Reads will be faster due to Cassandra queries the right coordinates using primary key (partition key, clustering column) depending on how we design our table.
This threw me too when I started with Cassandra.
Both text and varchar are UTF8 encoded strings and are synonyms for each other, that is they are exactly the same thing.
As an added side note if one comes from a relational world like MS SQL, one would perhaps also be hesitant to use these types (especially TEXT) as the primary field for an entity. TEXT is especially usually associated with big blobs of text content that don't scream primary key to ones 3rd normal form relational mind. But since all Cassandra types are essentially stored as hexadecimal byte arrays on the disk there is no real significant performance when using them as the primary key.