Dynamo Db - Scan vs. Query
Query
A query operation finds items in a table based on the Primary Key attribute and a distinct value to search for.
For example, select an item where the user ID is equal to 212, will select all of the attributes for that item (e.g. first name, surname, e-mail, etc.)
- Use an optional Sort Key name and value to refine results, if you have multiple results.
- If your Sort Key is a timestamp, you can refine the query to only select items with a teimstamp of the last 7 days.
- By deffault, a Query returns all the attributes for the items but you can use the ProjectionExpression parameter if you want the query to only return the specific attributes you want.
- If you only want to see the email address rather than all the attributes.
- Results are always sorted by the Sort Key
- Numeric order - by default in ascending order (e.g. 1, 2, 3, 4)
- ASCII character code values
- You can reverse the order by setting the ScanIndexForward parameter to false (THIS IS ONLY AVAILBLE FOR QUERIES, NOT SCANS)
- By default, queries are Eventually Consistent
- You need to explicitly set the query to be Strongly Consistent
Scan
- A scan operation examines every item in the table
- By default returns all data attributes
- Use the ProjectionExpression parameter to refine the scan to only return the attributes that you want
Query vs. Scan
- A query is much more efficient than a scan.
- Scans dump the entire table, then filters out the values to provide the desired result, removing the unwanted data. This adds an extra step of remove the data you don't want.
- As the table grows, the scan operation takes longer
- Scan operation on a large table can use up the provisioned throughput for a large table in just a single operation.
Ways to improve performance
- You can reduce the impact of a query or scan by setting a smaller page size, which uses fewer operation.
- E.g. set the page size to return 40 items
- Larger number of smaller operations will allow other requests to succeed without throttling.
- Avoid using scan operations if you can. Design tables in a way that you can use the Query, GET, or BatchGetItem APIs.
How to Improve Scan Performance
- By default, a scan operation processes data sequentially in returning 1 MB incremenets before moving on to retrieve the next 1 MB of data. It can only scan one partition at a time.
- You can configure DynamoDB to use Parallel scans instead by logically dividing a table or index into segments and scanning each segment in parallel.
- Best to avoid parallel scans if your table or index is already incurring heavy read / write activity from other applications.