How to create auto increment IDs in Cassandra
We know that it is easy to create auto increment IDs in SQL databases, is there a good solution for it in Cassandra? The IDs should be for key or column name.
How about the following, using Cassandra's Lightweight transactions
1 - Create IDs table:
CREATE TABLE ids (
id_name varchar,
next_id int,
PRIMARY KEY (id_name)
)
2 - Insert every id you'd like to use a global sequence with
For example:
INSERT INTO ids (id_name, next_id)
VALUES ('person_id', 1)
3 - Then, when inserting to a table where you'd like to use an auto-incremented key, do the following:
3.1 - Get the next_id from the ids table:
SELECT next_id FROM ids WHERE id_name = 'person_id'
Let's say the result is next_id = 1
3.2 - Increment next_id, the following way:
UPDATE ids SET next_id = 2 WHERE id_name = 'person_id' IF next_id = 1
The result should look like this:
[{[applied]: True}]
If it was updated successfully, OR
[{[applied]: False, next_id: 2}]
If someone else has already updated it.
So, if you got True, use id '1' - it is yours. Otherwise, increment next_id (or just use the returned next_id) and repeat the process.
Creating a global sequential sequence of number does not really make any sense in a distributed system. Use UUIDs.
(Because you would have to make all participants agree and accept the evolution of the sequence -- under a naive implementation)
There is no good solution.
- Create a column with a number, increase the number and save it to all replicas together with a temporary id, read all replicas and check if the temporary id is "yours", if not do it again.. not a great solution and will not scale.
or
- Build your own id service where you fetch your next id. This service will only be run in a single instance and will be a non scaling scary factor.
As soon as anything goes beyond a single instance the sequencing of id's gets complicated, at least if you want it to scale. That includes relational databases.
there is a counter datatype which can be used. Consider the below example.
CREATE KEYSPACE counterks WITH REPLICATION =
{ 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
Create a table for the counter column.
CREATE TABLE counterks.page_view_counts
(counter_value counter,
url_name varchar,
page_name varchar,
PRIMARY KEY (url_name, page_name)
);
Load data into the counter column.
UPDATE counterks.page_view_counts
SET counter_value = counter_value + 1
WHERE url_name='www.datastax.com' AND page_name='home';
Take a look at the counter value.
SELECT * FROM counterks.page_view_counts;
Output is:
url_name | page_name | counter_value
------------------+-----------+---------------
www.datastax.com | home | 1
Increase the value of the counter.
UPDATE counterks.page_view_counts
SET counter_value = counter_value + 2
WHERE url_name='www.datastax.com' AND page_name='home';
Take a look at the counter value.
url_name | page_name | counter_value
------------------+-----------+---------------
www.datastax.com | home | 3
Refer this for more details: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
I think IMHO expecting Cassandra to provide an auto-incrementing field is WRONG
Cassandra is a elegant decentralised database, hence expecting it to provide a auto-incrmenting field is, taxing and defeats the original purpose, because this value has to be then maintained in a central place
Hence, don't make any solution which DB based to get a auto-incrementing number
Instead generate the ID in the code or service in your app, which can keep generating random unique IDs and use that to apply on your data model, this way the objective & benefit of Cassandra will not be defeated