Understanding CloudFormation Updates: Replacement, Resource Policies, and Stack Policies

AWS CloudFormation is a powerful tool for provisioning resources in AWS. It allows you to describe your desired infrastructure in a configuration file, which can be checked into source control for easy review, automation, and tracking over time. You submit this configuration file to the CloudFormation service, which safely and reliably provisions your infrastructure.

One of the trickier aspects of CloudFormation is its update mechanism. Changing a property on an existing resource in your CloudFormation template can have drastic effects on your infrastructure, including replacing an existing piece of infrastructure by deleting the old one and creating a new one.

If the piece of infrastructure in question is a database or critical EC2 instance, this can be a very unwelcome surprise!

Fortunately, the mechanisms for updating resources in CloudFormation are well-defined, and AWS provides a number of ways to protect against an accidental configuration error.

In this post, we will cover:

The three kinds of update behaviors with CloudFormation;
Why you might not want the Replacement update behavior with CloudFormation;
Using Stack Policies to prevent undesirable updates to existing resources;
Using UpdateReplace Policies to manage resources that are replaced.

Let's get started.

The three kinds of update behaviors with CloudFormation

First, let's cover the three types of update behaviors in CloudFormation.

For an update behavior to be relevant, two things need to be true:

You have an existing resource that is managed by a CloudFormation stack; and
You are changing an attribute on the existing resource during an update to your CloudFormation stack.

When this happens, there are three potential behaviors for your existing resource:

Update with no interruption: Your resource is updated in place without any disruption to normal use.
Update with some interruption: Your resource is still the same resource, but there will be some downtime as the update is applied.
Replacement: An entirely new resource is created to replace your existing resource.

These concepts can be difficult to understand at an abstract level, so let's explore them with an example.

Update behavior examples on an EC2 instance

Imagine that we have provisioned an EC2 instance via CloudFormation with the following characteristics:

Using an EBS volume as the root volume;
In the us-east-1b Availability Zone;
With an m5.large Instance Type; and
Without detailed monitoring enabled.

To understand the update behaviors of the EC2 instance, head to the CloudFormation reference page for an EC2 instance.

The CloudFormation reference pages are well-organized for easy lookup. Click on the Properties quick link on the right-hand side to see the properties available for each resource.

Protip: AWS-provided CloudFormation resource types are named "AWS::<Resource Category>::<Resource Type>". You can quickly find the CloudFormation reference page for a resource by Googling its full type name, e.g. AWS::DynamoDB::Table.

In the Properties section, each potential property is shown. It includes a description of the property, whether the property is required, the type of object for the property, and, most relevant for our case, what happens when the property is updated.

Let's walk through an example for each of the update behavior types.

Update without interruption by adding monitoring

First, imagine we want to enable detailed monitoring for our existing EC2 instance.

Looking at the Monitoring property for EC2 instances, we can see that an update requires No interruption.

If we update this property and submit our updated stack to CloudFormation, our EC2 instance will continue to work without any issues. AWS will flip the switch for more detailed monitoring, but it won't affect our instance.

Update with interruption by changing the instance type

Second, imagine we want to change the instance size. Our m5.large instance isn't holding up, and we want to kick it up to an m5.4xlarge instance type.

Look at the InstanceType property on the EC2 instance reference page:

CloudFormation: EC2 InstanceType property

We're using an EBS-backed instance. AWS can resize an existing EBS-backed instance without a replacement. However, it does cause some downtime: AWS needs to stop your instance, change its instance type, and start it again with the same EBS volume attached.

This is an example of the Update requires some interruption update behavior in CloudFormation. You still have the same ID for your EC2 instance, but there is some downtime during reconfiguration.

Replace your resource by changing the availability zone

Finally, imagine we want to change the availability zone of our EC2 instance.

Look at the AvailabilityZone property on the EC2 instance reference page:

CloudFormation: EC2 AvailabilityZone property

A particular EC2 instance is fundamentally attached to a specific availability zone. To change the availability zone of the instance is to say you want a different instance altogether.

For this reason, changing the availability zone uses Replacement update behavior in CloudFormation. It will create a new EC2 instance in the new availability zone and destroy your existing instance.

Once the replacement is complete, you will still have an EC2 instance in your stack, but the instance will have a completely new ID.
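The three examples above can be summarized in a small lookup. This is only an illustrative sketch: the table hardcodes the three EC2 properties discussed in this section (it does not query CloudFormation), it assumes an EBS-backed instance, and `UpdateBehavior` and `most_disruptive` are hypothetical names. It also assumes that when several properties change in a single update, the resource gets the most disruptive of the behaviors involved.

```python
from enum import IntEnum


class UpdateBehavior(IntEnum):
    """The three update behaviors, ordered from least to most disruptive."""
    NO_INTERRUPTION = 0
    SOME_INTERRUPTION = 1
    REPLACEMENT = 2


# Behaviors for the AWS::EC2::Instance properties walked through above.
# Hardcoded for illustration; the reference page is the source of truth.
EC2_INSTANCE_BEHAVIORS = {
    "Monitoring": UpdateBehavior.NO_INTERRUPTION,
    "InstanceType": UpdateBehavior.SOME_INTERRUPTION,  # EBS-backed instances
    "AvailabilityZone": UpdateBehavior.REPLACEMENT,
}


def most_disruptive(changed_properties):
    """Return the most disruptive behavior among the changed properties."""
    return max(EC2_INSTANCE_BEHAVIORS[name] for name in changed_properties)


print(most_disruptive(["Monitoring"]).name)                        # NO_INTERRUPTION
print(most_disruptive(["Monitoring", "InstanceType"]).name)        # SOME_INTERRUPTION
print(most_disruptive(["InstanceType", "AvailabilityZone"]).name)  # REPLACEMENT
```

In other words, a single Replacement property in your change is enough to give the instance a new ID, no matter how harmless the other changes are.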

Why you might not want the Replacement update behavior

In the section above, we walked through an example of how each of the three CloudFormation update behaviors works.

With an EC2 instance, and particularly if you have a cattle, not pets approach to your infrastructure, you might not care that your EC2 instance was replaced. Ideally, you can move instances around and recreate them from scratch without any downtime in your infrastructure.

However, there are occasions when the Replacement behavior can be undesirable. This could be for more "pet-like" EC2 instances but more often is for stateful pieces of your infrastructure.

With serverless applications, I've noticed two areas where the Replacement update behavior is most problematic:

With a DynamoDB Table, by trying to update the KeySchema after a table has been created;
With an Amazon Cognito User Pool, by trying to update the Schema or AliasAttributes after the user pool has been created.

Both a DynamoDB table and a Cognito User Pool are stateful resources, and an unexpected replacement can be devastating. With a DynamoDB table, it means all the existing items in your table are lost. For a Cognito User Pool, all of your registered users are gone.

There are three ways to help protect against undesirable replacements in CloudFormation. First, you can use normal IAM permissions to prevent certain updates and/or deletes of particular resources. This approach isn't specific to CloudFormation itself and thus is a bit outside the scope of this post.

The second approach is to use a Stack Policy to block updates on certain resources.

Finally, you can use an UpdateReplace Policy to retain the existing resource, even if a new resource is created. This helps you save your existing data to be used in a migration.

We'll explore the latter two approaches in the following sections.

Using Stack Policies to prevent undesirable updates to existing resources

In this section, we'll learn how to use a CloudFormation Stack Policy to prevent updates on certain resources.

A CloudFormation Stack Policy can protect you from accidental replacement of existing resources. CloudFormation syntax can be complex, and it's not uncommon to make what looks like an innocent change that ends up deleting your existing DynamoDB table.

By default, none of the resources in your CloudFormation stack are protected from updates. If a user has permissions to update a CloudFormation stack and the resources in that stack, CloudFormation will not block them from destructive updates.

You can limit this behavior by attaching a stack policy to your CloudFormation stack. A stack policy is similar to an IAM policy. It contains a number of statements, like IAM policy statements, that describe the update actions that can be taken on a CloudFormation stack.

If you have an existing CloudFormation stack, you can add a CloudFormation stack policy by using the SetStackPolicy API. Here's an example with the AWS CLI:

aws cloudformation set-stack-policy \
  --stack-name <STACK-NAME> \
  --stack-policy-body '{}'

If you add an empty stack policy, all updates to existing resources will be prevented. You can then choose to Allow certain updates by adding statements.

One handy feature is you can add a temporary stack policy to be applied only during a particular update. This allows you to override your blanket policy of no updates in specific circumstances while still keeping strong protection on your resources.

To do this, use the StackPolicyDuringUpdateBody parameter when making an UpdateStack call.

For example, with the AWS CLI, your call would be:

aws cloudformation update-stack \
  --stack-name <STACK-NAME> \
  --template-body <YOUR-TEMPLATE> \
  --stack-policy-during-update-body <YOUR-POLICY>

The AWS documentation includes a number of example stack policy statements to help you out. This includes statements to prevent updates to specific resources, to specific resource types (e.g. DynamoDB tables), or to prevent all Replacement actions.
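To make the statement format concrete, here is a minimal sketch that generates one possible stack policy. It allows updates in general but denies replacing or deleting the two stateful resource types mentioned earlier in this post. The helper name `build_stack_policy` and the choice of protected types are assumptions for illustration; adjust the statements to fit your own stack.

```python
import json


def build_stack_policy(protected_types):
    """Allow all updates except replacing or deleting the protected types."""
    return {
        "Statement": [
            {
                # Baseline: permit update actions on every resource.
                "Effect": "Allow",
                "Action": "Update:*",
                "Principal": "*",
                "Resource": "*",
            },
            {
                # An explicit Deny overrides the Allow above for these types.
                "Effect": "Deny",
                "Action": ["Update:Replace", "Update:Delete"],
                "Principal": "*",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"ResourceType": list(protected_types)}
                },
            },
        ]
    }


policy = build_stack_policy(["AWS::DynamoDB::Table", "AWS::Cognito::UserPool"])
print(json.dumps(policy, indent=2))
```

You could save the printed JSON to a file such as stack-policy.json (a hypothetical filename) and pass it to the CLI with --stack-policy-body file://stack-policy.json. In-place modifications to your tables and user pools still go through, but any update that would replace or delete them is rejected.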

Using UpdateReplace Policies to manage resources that are replaced

In the previous section, we saw how to put up guardrails around accidental updates of existing resources. In this section, we'll learn how to keep our existing resource around even in the event of a replacement in CloudFormation.

When provisioning a resource in CloudFormation, you can add an UpdateReplacePolicy property on your resource.

For example, you might have a DynamoDB table resource as follows:

MyTable:
  Type: AWS::DynamoDB::Table
  UpdateReplacePolicy: Retain # <--- Look here
  Properties:
    TableName: "MyTable"
    ProvisionedThroughput:
      ReadCapacityUnits: 5
      WriteCapacityUnits: 5
    KeySchema:
      - AttributeName: id
        KeyType: HASH
    AttributeDefinitions:
      - AttributeName: id
        AttributeType: S

With an UpdateReplace Policy, you can specify what should happen to an existing resource in the event it is replaced due to an update.

By default, AWS will delete any resources in a stack that are replaced by an update. As noted above, this can be devastating for stateful resources like DynamoDB tables or Cognito User Pools.

By adding an UpdateReplacePolicy of Retain, we're indicating that our DynamoDB table should stick around even after it's replaced.

While this can protect against unexpected replacements, you really should be using the Stack Policy in the previous section for that protection. An UpdateReplace Policy is better for a different purpose -- assisting in migrations.

It's possible you have a stateful resource, like a DynamoDB table or a Cognito User Pool, that needs a fundamental change in its structure. Perhaps you configured your primary keys incorrectly for your DynamoDB table, or you set up the wrong schema for your User Pool.

By retaining the old resource, you can create a new table but still keep the old data around. Once your new resource is available, you can run a migration script to move your old data into the new resource.

Protip: Use the DeletionPolicy attribute on your stateful resources for similar reasons. If a resource or an entire stack is ever removed by accident, setting a DeletionPolicy can help make sure you at least retain your data.
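If you want to check a template for these attributes before deploying, a small lint along the following lines might help. This is a sketch under a few assumptions: the template has already been parsed into a Python dict (for example, from a JSON template), the set of "stateful" types is just the two discussed in this post, and `missing_retention_policies` is a hypothetical helper name.

```python
# Resource types treated as stateful in this post; extend as needed.
STATEFUL_TYPES = {"AWS::DynamoDB::Table", "AWS::Cognito::UserPool"}
RETENTION_ATTRIBUTES = ("DeletionPolicy", "UpdateReplacePolicy")


def missing_retention_policies(template):
    """Map each stateful resource's logical ID to the attributes not set to Retain."""
    findings = {}
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") not in STATEFUL_TYPES:
            continue  # only stateful resources are checked
        missing = [
            attr for attr in RETENTION_ATTRIBUTES
            if resource.get(attr) != "Retain"
        ]
        if missing:
            findings[logical_id] = missing
    return findings


template = {
    "Resources": {
        "MyTable": {
            "Type": "AWS::DynamoDB::Table",
            "UpdateReplacePolicy": "Retain",
            "Properties": {},
        },
        "MyInstance": {"Type": "AWS::EC2::Instance", "Properties": {}},
    }
}

print(missing_retention_policies(template))  # {'MyTable': ['DeletionPolicy']}
```

Here the table from the earlier example is flagged because it sets UpdateReplacePolicy but not DeletionPolicy, while the EC2 instance is ignored.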

Conclusion

In this post, we learned about the three kinds of update behaviors on CloudFormation resources.

We also saw why the Replacement behavior can result in bad outcomes for your infrastructure. AWS provides a few different ways to address these bad outcomes. We reviewed how a Stack Policy can prevent undesired replacements of infrastructure and how UpdateReplace Policies can retain existing data when a replacement is necessary.
