Understanding DynamoDB Condition Expressions

If you're working with DynamoDB, you're likely to rely on Condition Expressions when manipulating items in your table. Condition Expressions can ensure you don't overwrite existing users, allow bank account balances to drop below $0, or give Admin access to every user in your application.

Yet despite their usefulness, I see Condition Expressions misunderstood quite often. My hunch is that this is due to an underdeveloped mental model of how DynamoDB works and why it makes the choices it makes.

In this post, you'll learn all about DynamoDB's Condition Expressions. First, we'll start with some background on what Condition Expressions are. We'll see why they are helpful and the API operations to which they apply.

Second, we'll discuss how to think about Condition Expressions. Here, we'll build the proper mental model about DynamoDB and scaling in order to understand how Condition Expressions fit in.

Finally, we'll go through some common patterns and mistakes that I see with Condition Expressions. In doing so, you'll see practical examples, including:

Confirming existence or non-existence of an item;
Enforcing multiple unique attributes;
Enforcing business rules;
Enforcing rules based on aggregates.

Let's get started!

What are DynamoDB Condition Expressions?

Before we learn the specifics of DynamoDB Condition Expressions, let's learn what they are and why you would want to use them.

A ConditionExpression is an optional parameter that you can use on write-based operations. If you include a Condition Expression in your write operation, it will be evaluated prior to executing the write. If the Condition Expression evaluates to false, the write will be aborted.

This can be useful in a number of common patterns, such as:

Ensuring uniqueness,
Validating business rules, and
Confirming existence.

By using Condition Expressions, you can reduce the number of trips you make to DynamoDB and avoid race conditions when multiple clients are writing to DynamoDB at the same time.
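To see why a single conditional write avoids the race that a read-then-write approach has, here is a small in-memory sketch. It does not talk to DynamoDB. The `table` Map and the `checkStep`, `writeStep`, and `conditionalPut` helpers are hypothetical stand-ins, and the two clients' steps are interleaved by hand to mimic concurrent requests.

```javascript
// In-memory stand-in for a table keyed by primary key.
// Hypothetical sketch, not the DynamoDB API.
const table = new Map();

// Read-then-write: the existence check and the write are two separate steps,
// so another client can write in between them.
function checkStep(key) {
  return !table.has(key); // "safe to create?"
}
function writeStep(key, item) {
  table.set(key, item); // unconditional overwrite
}

// Conditional write: check and write happen as one atomic step,
// which is what a ConditionExpression gives you on the server side.
function conditionalPut(key, item) {
  if (table.has(key)) return false; // like attribute_not_exists(PK) failing
  table.set(key, item);
  return true;
}

// Two clients both try to create user "alexdebrie", interleaved by hand.
const okA = checkStep("user#alexdebrie"); // true
const okB = checkStep("user#alexdebrie"); // true: client A hasn't written yet
if (okA) writeStep("user#alexdebrie", { email: "a@example.com" });
if (okB) writeStep("user#alexdebrie", { email: "b@example.com" }); // silently overwrites A
const afterRace = table.get("user#alexdebrie").email;

// Same scenario with conditional writes: the second create is rejected.
table.clear();
const firstCreate = conditionalPut("user#alexdebrie", { email: "a@example.com" });
const secondCreate = conditionalPut("user#alexdebrie", { email: "b@example.com" });
console.log(afterRace, firstCreate, secondCreate, table.get("user#alexdebrie").email);
```

With the read-then-write approach, client B's data wins and client A's write is lost without any error. With the conditional write, client B gets a rejection it can handle.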

Condition Expressions can be used in the following write-based API operations:

PutItem
UpdateItem
DeleteItem
TransactWriteItems

Careful observers might notice there's one write-based operation missing -- BatchWriteItem. You cannot use Condition Expressions with BatchWriteItem (BatchWriteItem has a number of deficiencies, and fixing them is a major #awswishlist item for me).

One final note -- Condition Expressions have expanded power within the TransactWriteItems operation. You can use a special action, ConditionCheck, that solely asserts a condition without actually performing a write against that item. Because a DynamoDB Transaction must succeed or fail together, the failure of a ConditionCheck will fail the entire transaction.
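As a sketch of how a ConditionCheck sits alongside a real write, here is a function that builds TransactWriteItems parameters for the low-level DynamoDB client in the AWS SDK for JavaScript v2. The table name, key patterns, the `status` attribute, and the function name are hypothetical. The transaction only writes the order item; the ConditionCheck merely asserts that the customer item exists and is active.

```javascript
// Builds parameters for client.transactWriteItems(params) (AWS SDK for JavaScript v2,
// low-level client). Table name, keys, attributes, and function name are hypothetical.
function buildOrderTransaction(customerId, orderId) {
  return {
    TransactItems: [
      {
        // Asserts a condition on the customer item without writing to it.
        ConditionCheck: {
          TableName: "MyTable",
          Key: {
            PK: { S: `CUSTOMER#${customerId}` },
            SK: { S: `CUSTOMER#${customerId}` },
          },
          // "status" is a DynamoDB reserved word, so it goes through a placeholder.
          ConditionExpression: "attribute_exists(PK) AND #status = :active",
          ExpressionAttributeNames: { "#status": "status" },
          ExpressionAttributeValues: { ":active": { S: "ACTIVE" } },
        },
      },
      {
        // The only actual write. If the ConditionCheck fails,
        // the whole transaction is canceled and this Put does not happen.
        Put: {
          TableName: "MyTable",
          Item: {
            PK: { S: `CUSTOMER#${customerId}` },
            SK: { S: `ORDER#${orderId}` },
          },
          ConditionExpression: "attribute_not_exists(PK)",
        },
      },
    ],
  };
}

const params = buildOrderTransaction("alexdebrie", "1234");
console.log(JSON.stringify(params, null, 2));
```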

How to think about Condition Expressions

Now that we know the basics around Condition Expressions, let's discuss how you should think about Condition Expressions. There's some subtlety in how Condition Expressions are evaluated that confuses folks who are new to DynamoDB.

In the benign case, this can cause frustration around writing Condition Expressions. In the worst case, this can lead to incorrect logic that results in bad data in your table.

Let's take a look at the process for evaluating Condition Expressions in DynamoDB. Before we begin, recall that each item in DynamoDB is uniquely identified by its primary key. You can't have two items with the same primary key. Additionally, each write operation must include the primary key so that you know which item you're manipulating.

When evaluating a Condition Expression, DynamoDB uses the following steps:

[Figure: Condition Expression process]

First, using the primary key given for the write operation, identify the existing item (if any) with that primary key.

Second, evaluate the Condition Expression against the existing item (or null, if there is no existing item).

Finally, proceed with the write operation if the Condition Expression evaluates to true. Otherwise, reject the write.

The key point here is that when DynamoDB evaluates a Condition Expression, it is comparing it against at most one item in your table.
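The three steps can be modeled in a few lines. This is a simplified, hypothetical in-memory sketch, not DynamoDB's implementation: the table is a `Map` keyed on the full primary key, and the condition is a plain JavaScript predicate standing in for a Condition Expression. Note that the predicate only ever receives one item, or `null`.

```javascript
// Simplified model of conditional writes. Not DynamoDB's real implementation:
// the "table" is a Map keyed on the full primary key (PK + SK), and the
// condition is a JS predicate standing in for a Condition Expression.
const keyOf = (item) => `${item.PK}|${item.SK}`;

function conditionalPut(table, item, condition) {
  // Step 1: use the primary key to find at most one existing item.
  const existing = table.get(keyOf(item)) ?? null;
  // Step 2: evaluate the condition against that item (or null).
  if (!condition(existing)) {
    return { ok: false, reason: "ConditionalCheckFailed" };
  }
  // Step 3: proceed with the write.
  table.set(keyOf(item), item);
  return { ok: true };
}

// attribute_not_exists(PK): true when there is no matched item,
// or the matched item has no PK attribute.
const attributeNotExistsPK = (existing) => existing === null || !("PK" in existing);

const table = new Map();
const first = conditionalPut(table, { PK: "user#alexdebrie", SK: "profile" }, attributeNotExistsPK);
const second = conditionalPut(table, { PK: "user#alexdebrie", SK: "profile" }, attributeNotExistsPK);
console.log(first.ok, second.ok); // true false
```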

[Figure: Condition Expression process, with a note that the condition is compared against at most one item]

We'll look into the practical implications of that in the next section, but first let's understand why DynamoDB processes Condition Expressions in this way.

Recall that DynamoDB puts a premium on predictable performance. It wants a write operation to take the same time when your database is empty as when your database has 10 TB of data.

One of the big sources of unpredictable performance is unbounded queries. If your database needs to validate against an increasing number of records as your data grows, your conditional writes are going to get slower and slower.

To avoid this, DynamoDB doesn't allow you to assert conditions against an unbounded number of items. In fact, it doesn't allow you to make assertions against multiple items. It's going to locate a single item (which will take <10ms regardless of the size of your table), and compare the Condition Expression against that item.

Condition Expressions in action

If you're having trouble grokking this, a walkthrough example may be helpful.

Imagine we have a book review application, similar to Goodreads. Users can sign up for an account and leave reviews for books. Other users can browse through books to find aggregated reviews.

When storing reviews in your table, you might decide on the following primary key pattern:

PK: user#${username}#book#${book}
SK: ${timestamp}

This isn't a pattern I would recommend, for reasons we'll see in a second.

With this pattern, you might have some data in your table as follows:

[Figure: Book reviews table with timestamp as sort key]

You can see that the user alexdebrie has reviewed Goldilocks, To Kill a Mockingbird, and Dune.

When alexdebrie tries to add a new review for Goldilocks, you might issue a PutItem request that looks as follows:

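const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
const client = new AWS.DynamoDB();

const response = await client.putItem({
   TableName: "BookReviews",
   Item: {
     "PK": { "S": "user#alexdebrie#book#goldilocks" },
     "SK": { "S": "2021-01-21T14:59:21" },
     // ...additional properties...
   },
   ConditionExpression: "attribute_not_exists(PK)"
}).promise()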

Notice the ConditionExpression parameter of attribute_not_exists(PK). You could think this means "Only write this item if there are no items with this PK value." However, that's incorrect! Remember that DynamoDB will first look for a single item with the exact primary key value, then do the comparison.

In this case, there is no existing item that has the same primary key (PK and SK) as our new item. Because of that, the Condition Expression will be evaluated against a null item. Because the null item does not have a PK value, the Condition Expression will evaluate to true and the write will succeed.

The core problem here is that we've added a non-deterministic element (timestamp) into our primary key. Using properties like timestamp or UUIDs can be useful in DynamoDB primary keys, but only if you don't want uniqueness on other attributes of that item.
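Using the same kind of simplified in-memory model (hypothetical, not DynamoDB itself), you can see why the timestamped sort key lets a duplicate review through: each write carries a new timestamp, so the lookup in step one never finds a match. The earlier review's timestamp below is made up for illustration.

```javascript
// Simplified in-memory model of a conditional PutItem (hypothetical sketch).
// The table is a Map keyed on the full primary key (PK + SK).
function conditionalPut(table, item, condition) {
  const key = `${item.PK}|${item.SK}`;
  const existing = table.get(key) ?? null; // at most one matched item
  if (!condition(existing)) return false;  // condition failed: write rejected
  table.set(key, item);
  return true;
}

// attribute_not_exists(PK), evaluated against the matched item (or null).
const attributeNotExistsPK = (existing) => existing === null || !("PK" in existing);

const table = new Map();
// Pre-existing review, keyed with a (made-up) timestamp in the sort key.
table.set("user#alexdebrie#book#goldilocks|2020-12-01T09:00:00", {
  PK: "user#alexdebrie#book#goldilocks",
  SK: "2020-12-01T09:00:00",
});

// A "new" review for the same book gets a fresh timestamp...
const accepted = conditionalPut(
  table,
  { PK: "user#alexdebrie#book#goldilocks", SK: "2021-01-21T14:59:21" },
  attributeNotExistsPK
);

// ...so no item matches the full primary key, the condition sees null, and the
// write succeeds. The user now has two reviews of the same book.
const reviewsForBook = [...table.values()].filter(
  (item) => item.PK === "user#alexdebrie#book#goldilocks"
).length;
console.log(accepted, reviewsForBook); // true 2
```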

Let's fix our example. We'll change our primary key pattern to be as follows:

PK: user#${username}
SK: book#${book}

Notice that we've removed the non-deterministic element (timestamp) from our primary key, which makes it easier for us to match and uniquely identify items.

Our updated PutItem request would be as follows:

const response = await client.putItem({
   TableName: "BookReviews",
   Item: {
     "PK": { "S": "user#alexdebrie" },
     "SK": { "S": "book#goldilocks" },
     // ...additional properties...
   },
   ConditionExpression: "attribute_not_exists(PK)"
}).promise()

Using our table below, let's think through the steps:

[Figure: Book reviews table with proper primary key]

First, it will match the existing item for alexdebrie reviewing the book 'Goldilocks'. Then, it will evaluate the Condition Expression. Because the PK attribute exists on the matched item, the Condition Expression evaluates to false and the write is rejected.
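The same simplified model, with the revised key design, shows the duplicate being rejected. As before, this is a hypothetical in-memory sketch rather than DynamoDB itself, and `book#emma` is just a made-up example of a book that hasn't been reviewed yet.

```javascript
// Simplified in-memory model of a conditional PutItem (hypothetical sketch).
function conditionalPut(table, item, condition) {
  const key = `${item.PK}|${item.SK}`;
  const existing = table.get(key) ?? null; // at most one matched item
  if (!condition(existing)) return false;  // condition failed: write rejected
  table.set(key, item);
  return true;
}

// attribute_not_exists(PK), evaluated against the matched item (or null).
const attributeNotExistsPK = (existing) => existing === null || !("PK" in existing);

const table = new Map();
// Existing review: alexdebrie already reviewed Goldilocks.
table.set("user#alexdebrie|book#goldilocks", { PK: "user#alexdebrie", SK: "book#goldilocks" });

// A second Goldilocks review matches the existing item, PK exists, so it's rejected.
const duplicate = conditionalPut(
  table,
  { PK: "user#alexdebrie", SK: "book#goldilocks" },
  attributeNotExistsPK
);

// A review of a book alexdebrie hasn't reviewed matches nothing, so it succeeds.
const newBook = conditionalPut(
  table,
  { PK: "user#alexdebrie", SK: "book#emma" },
  attributeNotExistsPK
);

console.log(duplicate, newBook, table.size); // false true 2
```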

With this basic model in mind, we can move on to some common patterns with Condition Expressions.

Common patterns and mistakes

Let's get into some practical use cases of how to be effective with Condition Expressions in DynamoDB. For each use case, I'll describe when you're most likely to see it. I'll also describe any common mistakes I see with a given pattern.

1. Confirming existence or non-existence of an item

The first, and most common, use case for Condition Expressions is to confirm the existence or non-existence of a particular item. When creating an item, you often want to ensure an item with the same primary key doesn't already exist to avoid overwriting data. When updating or deleting an item, you may want to confirm the item exists first to avoid unexpected states in your application.

To handle this, you can use the attribute_exists() and attribute_not_exists() functions in your Condition Expression.

To confirm that you're not overwriting an existing item, your Condition Expression would be attribute_not_exists(pk).

To ensure that the item exists before manipulating it, your Condition Expression would be attribute_exists(pk).
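For the low-level JavaScript client (AWS SDK for JavaScript v2), the two checks might look like the parameter builders below. The table name `MyTable`, the lowercase `pk`/`sk` key names, and the helper names are assumptions for illustration. When a condition fails, the v2 SDK rejects the request with an error whose `code` is `ConditionalCheckFailedException`, which the small helper at the end recognizes.

```javascript
// Parameter builders for the low-level DynamoDB client (AWS SDK for JavaScript v2).
// Table name, key names (pk/sk), and helper names are hypothetical.

// Create only: fail if an item with this primary key already exists.
function createItemParams(item) {
  return {
    TableName: "MyTable",
    Item: item, // must include pk and sk as typed attribute values
    ConditionExpression: "attribute_not_exists(pk)",
  };
}

// Delete only if the item exists. Without the condition,
// DeleteItem on a missing key succeeds without any error.
function deleteExistingItemParams(pk, sk) {
  return {
    TableName: "MyTable",
    Key: { pk: { S: pk }, sk: { S: sk } },
    ConditionExpression: "attribute_exists(pk)",
  };
}

// A failed condition surfaces as an error with this code in the v2 SDK.
function isConditionFailure(err) {
  return Boolean(err) && err.code === "ConditionalCheckFailedException";
}

// Usage with a real client:
//   try { await client.putItem(createItemParams(item)).promise(); }
//   catch (err) { if (isConditionFailure(err)) { /* item already exists */ } else throw err; }

const put = createItemParams({ pk: { S: "user#alexdebrie" }, sk: { S: "profile" } });
const del = deleteExistingItemParams("user#alexdebrie", "profile");
console.log(put.ConditionExpression, del.ConditionExpression);
```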

A common mistake I see here is to have multiple statements in the Condition Expression, such as attribute_not_exists(pk) AND attribute_not_exists(sk). While this isn't harmful, the second statement is extraneous. Recall that DynamoDB will first identify an item to compare against, then run the Condition Expression. If DynamoDB finds an item, it will have both the pk and the sk (assuming that's your primary key structure).

Again, this won't do any harm for your running code. However, I prefer to remove the extraneous statement for clarity. It requires less consideration if you need to change the Condition Expression in the future.
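To make the redundancy concrete, here is a hypothetical sketch that evaluates both forms of the check against every case step two can encounter: no matched item, or a matched item, which by definition carries both key attributes. The two expressions always agree.

```javascript
// Hypothetical sketch: evaluate attribute_not_exists checks against whatever
// step one of a conditional write can return: null, or an item that matched
// on the full primary key (and therefore has both pk and sk).
const attributeNotExists = (name) => (item) => item === null || !(name in item);

const pkOnly = attributeNotExists("pk");
const pkAndSk = (item) => attributeNotExists("pk")(item) && attributeNotExists("sk")(item);

const possibleMatches = [
  null,                                                  // no item with this primary key
  { pk: "user#alexdebrie", sk: "profile" },              // existing item: both keys present
  { pk: "user#alexdebrie", sk: "profile", name: "Alex" } // extra attributes don't matter
];

const agree = possibleMatches.every((item) => pkOnly(item) === pkAndSk(item));
console.log(agree); // true
```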

2. Enforcing multiple unique attributes

A similar but more pernicious problem arises when developers try to handle uniqueness on two different attributes with a single item. If you're not careful, you may implement it in a way that doesn't handle uniqueness on either attribute!

The canonical example I use here is for a user creation workflow. Imagine that you want to ensure both that the username is unique and that the email address hasn't been used for another user.

I may use a primary key pattern that encodes the username in the PK and the email in the SK, as shown in the following model from NoSQL Workbench for DynamoDB:

[Figure: Users table with two user items]

Notice how alexdebrie has signed up with email address of alexdebrie1@gmail.com (my real address -- email me if you have questions!), whereas lukeskywalker has an email of rebel@gmail.com.

Imagine a new user tries to sign up with the username alexdebrie but with an email address of evilalex@hotmail.com. The PutItem request would look something like the following:

const response = await client.putItem({
   TableName: "MyTable",
   Item: {
     "PK": { "S": "user#alexdebrie" },
     "SK": { "S": "email#evilalex@hotmail.com" },
     // ...additional properties...
   },
   ConditionExpression: "attribute_not_exists(PK) AND attribute_not_exists(SK)"
}).promise()

Notice the Condition Expression is the same one we saw in the previous section -- attribute_not_exists(PK) AND attribute_not_exists(SK). However, the result is much worse this time!

When evaluating this write, DynamoDB will first look for an item with the exact primary key. Though there is an item with the same PK value, there is no item with the same combination of PK and SK. As a result, no item is found, and the Condition Expression is evaluated against a null item. Both attribute_not_exists() statements evaluate to true, so the write succeeds, and we now have two users with the username alexdebrie.

When you combine multiple elements into a primary key (here, username and email address), you can only confirm that the combination of those two elements is unique.
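The same simplified in-memory model (hypothetical, not DynamoDB itself) reproduces the problem: with username and email combined into one primary key, the lookup in step one finds nothing, the condition sees `null`, and a second `alexdebrie` gets created.

```javascript
// Simplified in-memory model of a conditional PutItem (hypothetical sketch).
function conditionalPut(table, item, condition) {
  const key = `${item.PK}|${item.SK}`;
  const existing = table.get(key) ?? null; // at most one matched item
  if (!condition(existing)) return false;
  table.set(key, item);
  return true;
}

// attribute_not_exists(PK) AND attribute_not_exists(SK)
const neitherKeyExists = (existing) =>
  existing === null || (!("PK" in existing) && !("SK" in existing));

const table = new Map();
table.set("user#alexdebrie|email#alexdebrie1@gmail.com", {
  PK: "user#alexdebrie",
  SK: "email#alexdebrie1@gmail.com",
});

// Same username, different email: no item matches the full (PK, SK) pair.
const accepted = conditionalPut(
  table,
  { PK: "user#alexdebrie", SK: "email#evilalex@hotmail.com" },
  neitherKeyExists
);

const alexUsers = [...table.values()].filter((u) => u.PK === "user#alexdebrie").length;
console.log(accepted, alexUsers); // true 2
```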

If you want to ensure each element is unique, you'll need to create two separate items and use a DynamoDB Transaction to confirm that neither exists. For an example of this, check out my post on DynamoDB Transactions.
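As a rough sketch of that approach, the parameters below put two items in one transaction: one keyed on the username and one keyed on the email address, each guarded by `attribute_not_exists(PK)`. The `USER#`/`USEREMAIL#` key prefixes, the table name, and the function name are hypothetical. If either item already exists, the whole transaction is canceled (the v2 SDK reports a `TransactionCanceledException`) and neither item is written.

```javascript
// Builds parameters for client.transactWriteItems(params) (AWS SDK for JavaScript v2,
// low-level client). Key prefixes, table name, and function name are hypothetical.
function createUserParams(username, email) {
  return {
    TransactItems: [
      {
        // Uniqueness on username: the user item itself.
        Put: {
          TableName: "MyTable",
          Item: {
            PK: { S: `USER#${username}` },
            SK: { S: `USER#${username}` },
            Email: { S: email },
          },
          ConditionExpression: "attribute_not_exists(PK)",
        },
      },
      {
        // Uniqueness on email: a separate marker item keyed on the address.
        Put: {
          TableName: "MyTable",
          Item: {
            PK: { S: `USEREMAIL#${email}` },
            SK: { S: `USEREMAIL#${email}` },
            Username: { S: username },
          },
          ConditionExpression: "attribute_not_exists(PK)",
        },
      },
    ],
  };
}

const params = createUserParams("alexdebrie", "evilalex@hotmail.com");
console.log(params.TransactItems.map((t) => t.Put.Item.PK.S));
```

Because each item's primary key is built from exactly one of the unique values, each Condition Expression now checks uniqueness of a single attribute, which is the case DynamoDB can evaluate against one item.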

3. Enforcing business rules

In addition to basic uniqueness, you may also want to enforce certain business rules from your application when performing writes to DynamoDB. Using Condition Expressions is an easier way to handle that compared to retrieving the item and evaluating the business logic in your application code.

So far, we've only used the existence functions (attribute_exists() and attribute_not_exists()), but you can also use additional functions or even math expressions when asserting your conditions.

For this example, imagine you have a banking application where users have accounts with balances. When a user initiates a transaction, you want to confirm that the account doesn't drop below $0 before approving. To do so, you might have an UpdateItem operation that looks as follows:

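const response = await client.updateItem({
    TableName: "MyTable",
    Key: {
        "PK": { "S": "user#alexdebrie" },
        "SK": { "S": "account#0123456789" }
    },
    // Only apply the update if the current balance exceeds the transaction amount
    ConditionExpression: "#balance > :amount",
    UpdateExpression: "SET #balance = #balance - :amount",
    ExpressionAttributeNames: {
        "#balance": "balance"
    },
    ExpressionAttributeValues: {
        // `amount` is a hypothetical variable holding the transaction amount.
        // Every value here must be used in an expression, or DynamoDB rejects the request.
        ":amount": { "N": String(amount) }
    }
}).promise()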

In this API call, we want to reduce the current balance of the account by the amount of the transaction. However, before we do so, we use the ConditionExpression to assert that the current balance is greater than the transaction amount so that the balance doesn't dip below $0.
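Here is a hypothetical in-memory sketch of what that conditional update does: the comparison and the decrement happen together against the one account item, so a withdrawal that would overdraw the account changes nothing. Balances are plain numbers and the `conditionalWithdraw` name is made up; it mirrors `#balance > :amount` from the request above.

```javascript
// Hypothetical model of the UpdateItem above: check "#balance > :amount"
// against the single matched item, and only then apply
// "SET #balance = #balance - :amount".
function conditionalWithdraw(account, amount) {
  // No matched item: balance doesn't exist, so the comparison is false.
  if (account === null) return { ok: false };
  if (!(account.balance > amount)) return { ok: false }; // ConditionalCheckFailed
  account.balance -= amount;
  return { ok: true, balance: account.balance };
}

const account = { PK: "user#alexdebrie", SK: "account#0123456789", balance: 100 };
const first = conditionalWithdraw(account, 60);  // 100 > 60: balance becomes 40
const second = conditionalWithdraw(account, 60); // 40 > 60 is false: rejected
console.log(first, second, account.balance);
```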

You can use this for a variety of use cases:

Comparing timestamps to ensure an item hasn't expired;
Ensuring a particular user has permissions on the given item;
Confirming you are working on the latest version of the item.
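For the last item in that list, a common approach is a version number that every write checks and increments (often called optimistic locking). The sketch below builds UpdateItem parameters for the low-level v2 client; the `version` and `ReviewText` attributes, the table name, and the function name are hypothetical.

```javascript
// Builds parameters for client.updateItem(params) (AWS SDK for JavaScript v2).
// The write succeeds only if the stored version still equals the version this
// client last read; the same update bumps the version for the next writer.
// Attribute, table, and function names are hypothetical.
function updateReviewParams(pk, sk, reviewText, expectedVersion) {
  return {
    TableName: "MyTable",
    Key: { PK: { S: pk }, SK: { S: sk } },
    ConditionExpression: "#version = :expected",
    UpdateExpression: "SET #text = :text, #version = :next",
    ExpressionAttributeNames: { "#version": "version", "#text": "ReviewText" },
    ExpressionAttributeValues: {
      ":expected": { N: String(expectedVersion) },
      ":next": { N: String(expectedVersion + 1) },
      ":text": { S: reviewText },
    },
  };
}

const params = updateReviewParams("user#alexdebrie", "book#dune", "Still a classic.", 3);
console.log(params.ExpressionAttributeValues[":next"].N); // prints 4
```

If another client updated the item after you read version 3, the stored version is now 4 and your write fails its condition instead of silently overwriting their change.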

4. Enforcing rules based on aggregates

The last example is a little tricky -- what if you need to enforce conditions based on aggregates across multiple rows?

In the most common example, you may want to limit the number of items for a particular relationship -- an organization using your SaaS product may only invite 10 users, or a GitHub user may only create 5 private repositories. It could also be a maximum or minimum value across a number of records.

Because DynamoDB does a comparison against only a single item, you'll have to be creative to handle this use case. You'll need to maintain the aggregate yourself on a separate item in order to satisfy this business logic.

Let's use one of the examples above. Imagine that organizations sign up for our SaaS product, and they are limited in the number of users based on the plan they chose. When creating a new user, we want to ensure they haven't exceeded their limit.

To handle this, we could include two properties -- ActiveUsers and AllowedUsers -- on the parent organization item. When an organization signs up or changes their plan, we would update the AllowedUsers property to the allowed value for their plan.

Our table might look something like this:

[Figure: SaaS table with organizations and users]

For the subset of data in this table, we can see two organizations, BigCorp and TinyInc, as well as some user items. Notice that both BigCorp and TinyInc have ActiveUsers and AllowedUsers aggregates that indicate the current values.

When the organization tries to invite a new user, we could wrap it in a DynamoDB Transaction to increment the ActiveUsers count and ensure they are not exceeding their AllowedUsers count.

The Transaction might look as follows:

const response = await client.transactWriteItems({
    TransactItems: [
        {
            'Update': {
                'TableName': 'SaaSTable',
                'Key': {
                    'PK': { 'S': 'ORG#BigCorp' },
                    'SK': { 'S': 'ORG#BigCorp' }
                },
                'ConditionExpression': '#active < #allowed',
                'UpdateExpression': 'SET #active = #active + :inc',
                'ExpressionAttributeNames': {
                  '#active': 'ActiveUsers',
                  '#allowed': 'AllowedUsers'
                },
                'ExpressionAttributeValues': {
                  ':inc': { 'N': '1' }
                }
            }
        },
        {
            'Put': {
                'TableName': 'SaaSTable',
                'Item': {
                    'PK': { 'S': 'ORG#BigCorp' },
                    'SK': { 'S': 'USER#newnick' },
                    // ... additional attributes ...
                },
                'ConditionExpression': 'attribute_not_exists(PK)'
            }
        }
    ]
}).promise()

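Notice that the first request has a ConditionExpression to confirm that ActiveUsers is less than AllowedUsers. If that passes, then it updates the ActiveUsers count to increment it by one. The second request creates the new user item, with attribute_not_exists(PK) ensuring that user doesn't already exist. Because both requests are in one transaction, if either condition fails, neither write is applied.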
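To see the all-or-nothing behavior, here is a hypothetical in-memory sketch of that transaction. It checks every condition first and applies the writes only if all of them pass, which mirrors the guarantee DynamoDB Transactions provide. The `inviteUser` function, the `ExampleOrg` organization, and its limits are made up for illustration.

```javascript
// Hypothetical model of the invite transaction: both conditions are checked
// before anything is written, and either both writes happen or neither does.
function inviteUser(table, orgName, username) {
  const orgKey = `ORG#${orgName}|ORG#${orgName}`;
  const userKey = `ORG#${orgName}|USER#${username}`;
  const org = table.get(orgKey) ?? null;

  // Condition 1 (Update): #active < #allowed on the organization item.
  const underLimit = org !== null && org.ActiveUsers < org.AllowedUsers;
  // Condition 2 (Put): attribute_not_exists(PK) on the new user item.
  const userIsNew = !table.has(userKey);

  if (!underLimit || !userIsNew) return false; // transaction canceled, nothing written

  org.ActiveUsers += 1; // SET #active = #active + :inc
  table.set(userKey, { PK: `ORG#${orgName}`, SK: `USER#${username}` });
  return true;
}

const table = new Map([
  ["ORG#ExampleOrg|ORG#ExampleOrg", { ActiveUsers: 1, AllowedUsers: 2 }],
]);
const results = [
  inviteUser(table, "ExampleOrg", "alice"), // 1 < 2: succeeds, ActiveUsers becomes 2
  inviteUser(table, "ExampleOrg", "bob"),   // 2 < 2 is false: canceled
];
console.log(results, table.get("ORG#ExampleOrg|ORG#ExampleOrg").ActiveUsers, table.size);
```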

This is in line with any aggregates with DynamoDB -- you need to maintain the aggregate yourself. This might feel burdensome, particularly if you're used to aggregates in a relational database. However, it is in line with DynamoDB's philosophy of consistent performance and making scaling decisions explicit. Computing aggregates on the fly would get slower and slower as your data grows.

Conclusion

Condition Expressions are a powerful part of DynamoDB, but they can be tricky if you don't have a solid mental model of how they work.

In this post, we saw what DynamoDB Condition Expressions are and why you would use them. Then, we built a model to understand how they work and why. We saw how this fits in with DynamoDB's philosophy to provide consistent performance at any scale. Finally, we looked at some common examples of using Condition Expressions in your application.

If you have questions or comments on this piece, feel free to leave a note below or email me directly.
