How to obtain number of rows in Cassandra table
This is a super basic question but it's actually been bugging me for days. Is there a good way to obtain the equivalent of a COUNT(*) of a given table in Cassandra?
I will be moving several hundred million rows into C* for some load testing, and I'd like to at least get a row count on some sample ETL jobs before I move massive amounts of data over the network.
The best idea I have is to basically loop over each row with Python and increment a counter. Is there a better way to determine (or even estimate) the row count of a C* table? I've also poked around DataStax OpsCenter to see if I can find the row count there. If it's possible, I don't see how.
Has anyone else needed to get a count(*) of a table in C*? If so, how'd you go about doing it?
Solution 1:
Yes, you can use COUNT(*). Here's the documentation.
A SELECT expression using COUNT(*) returns the number of rows that matched the query. Alternatively, you can use COUNT(1) to get the same result.
Count the number of rows in the users table:
SELECT COUNT(*) FROM users;
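If COUNT(*) times out on a large table, the looping approach from the question can still work, because the Python driver fetches results page by page rather than all at once. A minimal sketch, assuming the DataStax cassandra-driver package; the contact point 127.0.0.1, the keyspace my_keyspace, and the column user_id are hypothetical placeholders:

```python
def count_rows(session, statement):
    """Count rows by iterating over a result set.

    `session` is anything with an execute() method that returns an
    iterable of rows, such as a cassandra-driver Session.
    """
    return sum(1 for _ in session.execute(statement))


def main():
    # Imported here so count_rows() can be used without the driver installed.
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # Hypothetical contact point and keyspace; adjust for your cluster.
    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")
    try:
        # Select a single key column to keep each page small; fetch_size
        # sets how many rows the driver requests per page.
        stmt = SimpleStatement("SELECT user_id FROM users", fetch_size=5000)
        print(count_rows(session, stmt))
    finally:
        cluster.shutdown()
```

Calling main() against a live cluster prints the count. This still reads every row, so it will be slow on hundreds of millions of rows, but each page is a separate, smaller request.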
Solution 2:
You can use COPY to avoid the Cassandra timeout that usually happens on count(*):
cqlsh -e "copy keyspace.table_name (first_partition_key_name) to '/dev/null'" | sed -n 5p | sed 's/ .*//'
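The sed pipeline above picks the count from a fixed line of the cqlsh output, which can break if the output layout shifts. A rough alternative sketch in Python, assuming cqlsh ends a COPY TO with a summary line of the form "N rows exported to ..." (worth verifying against your cqlsh version); the function name parse_exported_rows is hypothetical:

```python
import re


def parse_exported_rows(cqlsh_output):
    """Return the row count from cqlsh COPY TO output, or None if absent.

    Assumes a summary line such as
    "42 rows exported to 1 files in 0.5 seconds."
    """
    match = re.search(r"(\d+) rows exported", cqlsh_output)
    return int(match.group(1)) if match else None
```

Feed it the captured output of the cqlsh command. A None result means the expected summary line was not found, so the assumption about the output format should be rechecked.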