How can I quickly and effectively debug CloudFormation templates?

Solution 1:

Use the aws cloudformation validate-template command in the AWS CLI tool. It only validates whether your template is valid JSON or YAML, not whether your keys and values are correct (for example doesn't check for typos in keys)

Solution 2:

Another option, a year later, is to abstract these templates to a 3rd party library, such as troposphere. That library constructs the JSON payload for you, and does a lot of validation along the way. This also solves the "Wow managing a 1000-line JSON file sure is sad" problem.

Solution 3:

How can I make the template debugging process faster, or am I stuck forever noticing my mistakes half an hour after I make them?

Here are a few best-practice suggestions, focusing specifically on improving the iteration speed of complex CloudFormation-template development:

Use CloudFormation tools to validate templates and stack updates

AWS has already outlined these in its own Best Practices document, so I won't repeat them:

Validate Templates Before Using Them
Create Change Sets Before Updating Your Stacks

The point of this step is to catch obvious syntax or logical errors before actually performing a Stack creation/update.

Test Resources in isolation

Before using any individual CloudFormation Resource in a complex Stack, make sure you thoroughly understand the full extent of that Resource's creation/update/delete behavior, including any limits on usage and typical startup/teardown times, by testing their behavior in smaller, standalone Stacks first.

If you are developing or using any third-party Custom Resources, write unit tests using appropriate libraries for the language platform, to make sure the application logic behaves as expected across all use-cases.
Be aware that the amount of time for an individual Resource to create/update/delete can vary widely between Resource Types, depending on the behavior of the underlying API calls. For example, a complex AWS::CloudFront::Distribution resource can sometimes take 30-60 minutes to create/update/delete, while an AWS::EC2::SecurityGroup updates in seconds.
Individual Resources may have bugs/issues/limitations in their implementation, which are much easier to debug and develop workarounds for when tested in isolation, rather than within a much larger Stack. Keep in mind limitations such as AWS Service Limits depending on your individual AWS Account settings, or Region Availability of services depending on the Region within which you create your Stack.

Build complicated stacks in small increments

When performing a Stack creation/update, a failure in any single Resource will cause the Stack to rollback the entire set of Resource changes, which can unnecessarily destroy other successfully-created Resources and take a very long time when building a complicated stack with a long dependency-graph of associated Resources.

The solution to this is to build your Stack incrementally in smaller Update batches, adding Resources one (or a few) at a time. This way, if/when a failure occurs in a resource creation/update, the rollback doesn't cause your entire Stack's resources to be destroyed, just the set of Resources changed in the latest Update.

Monitor the progress of stack updates

Be sure to Monitor the Progress of your Stack Update by viewing the stack's events while a creation/update is performed. This will be the starting-point for debugging further issues with individual resources.

Solution 4:

Have you looked at the AWS CloudFormation Template Editor that is included in the AWS Toolkit for Eclipse? It has syntax highlighting, statement completion, and deployment to AWS CloudFormation.

Solution 5:

The AWS CloudFormation linter provides additional static analysis beyond aws cloudformation validate-template

It will inform you which resource types and instance types are unavailable in certain regions, validate property values against allowed values, catch circular resource dependencies, syntax errors, template limits, and much more

In addition to the CLI, one of the most popular mechanisms to remember to run the linter is installing an editor plugin like the Visual Studio Code extension which runs on every file save

Other mechanisms like pre-commit Git hooks are described here

Visual Studio Code extension example screenshot