What is the difference between scan and query in dynamodb? When use scan / query?

When creating a Dynamodb table select Primary Keys and Local Secondary Indexes (LSIs) so that a Query operation returns the items you want.

Query operations only support an equal operator evaluation of the Primary Key, but conditional (=, <, <=, >, >=, Between, Begin) on the Sort Key.

Scan operations are generally slower and more expensive as the operation has to iterate through each item in your table to get the items you are requesting.

Example:

Table: CustomerId, AccountType, Country, LastPurchase

Primary Key: CustomerId + AccountType

In this example, you can use a Query operation to get:

  1. A CustomerId with a conditional filter on AccountType

A Scan operation would need to be used to return:

  1. All Customers with a specific AccountType
  2. Items based on conditional filters by Country, ie All Customers from USA
  3. Items based on conditional filters by LastPurchase, ie All Customers that made a purchase in the last month

To avoid scan operations on frequently used operations create a Local Secondary Index (LSI) or Global Secondary Index (GSI).

Example:

Table: CustomerId, AccountType, Country, LastPurchase

Primary Key: CustomerId + AccountType
GSI: AccountType + CustomerId
LSI: CustomerId + LastPurchase

In this example a Query operation can allow you to get:

  1. A CustomerId with a conditional filter on AccountType
  2. [GSI] A conditional filter on CustomerIds for a specific AccountType
  3. [LSI] A CustomerId with a conditional filter on LastPurchase

You are having dynamodb table partition key/primary key as customer_country. If you use query, customer_country is the mandatory field to make query operation. All the filters can be made only items that belongs to customer_country.

If you perform table scan the filter will be performed on all partition key/primary key. First it fetched all data and apply filter after fetching from table.

eg:

here customer_country is the partition key/primary key and id is the sort_key

-----------------------------------

customer_country | name   | id

-----------------------------------
VV               | Tom    | 1

VV               | Jack   | 2

VV               | Mary   | 4

BB               | Nancy  | 5

BB               | Lom    | 6

BB               | XX     | 7

CC               | YY     | 8

CC               | ZZ     | 9

------------------------------------
  • If you perform query operation it applies only on customer_country value. The value should only be equal operator (=).

  • So only items equal to that partition key/primary key value are fetched.

  • If you perform scan operation it fetches all items in that table and filter out data after it takes that data.

Note: Don't perform scan operation it exceeds your RCU.


Its similar as in the relational database.

Get query you are using a primary key in where condition, The computation complexity is log(n) as the most of key structure is binary tree.

while scan query you have to scan whole table then apply filter on every single row to find the right result. The performance is O(n). Its much slower if your table is big.

In short, Try to use query if you know primary key. only scan for only the worst case.

Also, think about the global secondary index to support a different kind of queries on different keys to gain performance objective


In terms of performance, I think it's good practice to design your table for applications to use Query instead of Scan. Because a scan operation always scan the entire table before it filters out the desired values, which means it takes more time and space to process data operations such as read, write and delete. For more information, please refer to the official document