How to obtain number of rows in Cassandra table

This is a super basic question but it's actually been bugging me for days. Is there a good way to obtain the equivalent of a COUNT(*) of a given table in Cassandra?

I will be moving several hundreds of millions of rows into C* for some load testing and I'd like to at least get a row count on some sample ETL jobs before I move massive amounts of data over the network.

The best idea I have is to basically loop over each row with Python and auto increment a counter. Is there a better way to determine (or even estimate) the row size of a C* table? I've also poked around Datastax Ops Center to see if I can determine the row size there. If you can, I don't see how it's possible.

Anyone else needed to get a count(*) of a table in C*? If so, how'd you go about doing it?

Solution 1:

Yes, you can use COUNT(*). Here's the documentation.

A SELECT expression using COUNT(*) returns the number of rows that matched the query. Alternatively, you can use COUNT(1) to get the same result.

Count the number of rows in the users table:
SELECT COUNT(*) FROM users;

Solution 2:

You can use copy to avoid cassandra timeout usually happens on count(*)

cqlsh -e "copy keyspace.table_name (first_partition_key_name) to '/dev/null'" | sed -n 5p | sed 's/ .*//'

How to obtain number of rows in Cassandra table

Solution 1:

Solution 2:

Related

Recent Posts