Best Practice for Updating AWS ECS Service Tasks
I'm currently attempting to set up a simple CI pipeline that will rebuild my project, create a new Docker image, push the new image to an Amazon ECR repo, create a new revision of an existing task definition with the latest Docker image, update a running service with the new revision of the task definition, and finally stop the existing task running the old revision and start one running the new revision.
Everything is working fine except for starting the new revision of the task.
From a bash script, the final command I call is:
aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" --task-definition "$TASK_DEFINITION":"$REVISION"
This results in an event error of:
(service rj-api-service) was unable to place a task because no container instance met all of its requirements. The closest matching (container-instance bbbc23d5-1a09-45e7-b344-e68cc408e683) is already using a port required by your task.
And this makes sense because the container I am replacing is exactly the same as the new one and will be running on the same port; it just contains the latest version of my application.
I was under the impression the update-service command would stop the existing task and start the new one, but it looks like it starts the new one first, and if it succeeds stops the old one.
What is the best practice for handling this? Should I stop the old task first? Should I just delete the service in my script first and recreate the entire service each update?
Currently I only need 1 instance of the task running, but I don't want to box myself in if I later need this to auto scale to multiple instances. Any suggestions on the best way to address this?
The message that you are getting appears because ECS is trying to do a blue-green style deployment (ECS itself calls this a rolling update). It tries to place your new task revision without stopping the current task, to avoid downtime in your service. Once the new task is ready (steady state), the old one is finally removed.
The problem with this type of deployment is that you need enough free resources in your cluster to keep both tasks (old and new) running for a period of time. For example, if you are deploying a task with 2 GB of memory and 2 CPUs, your cluster needs that amount of free resources in order to update the service with a new task revision. In your case the missing resource is the host port: the old task still holds it, so the new task cannot be placed on the same container instance.
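To see why ECS wants both tasks at once, it can help to work out the task-count limits the scheduler applies during a deployment. The sketch below assumes the documented rounding (minimum healthy percent rounded up, maximum percent rounded down) and a service left at 100% and 200%; `deployment_bounds` is a hypothetical helper name, not part of the AWS CLI.

```shell
# Print "<min running tasks> <max running tasks>" for a deployment.
# Arguments: desired count, minimum healthy percent, maximum percent.
deployment_bounds() {
  local desired="$1"
  local min_pct="$2"
  local max_pct="$3"
  # Minimum is rounded up, maximum is rounded down.
  local min_tasks=$(( (desired * min_pct + 99) / 100 ))
  local max_tasks=$(( desired * max_pct / 100 ))
  echo "$min_tasks $max_tasks"
}

deployment_bounds 1 100 200   # prints "1 2": old and new task must coexist
deployment_bounds 1 0 100     # prints "0 1": old task may stop before the new one starts
```

With one desired task at 100/200, the lower limit of 1 means the old task cannot be stopped until the new one is running, which is exactly the situation that produces the port conflict.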
You have 2 options:
- Scale up your cluster by adding a new EC2 instance so you can have enough free resources and perform the deployment.
- Change your service configuration so that it does not perform a blue-green deployment (allow only 1 task at a time in your cluster).
In order to perform option number 2 you only need to set the following values:
- Minimum healthy percent: 0
- Maximum percent: 100
These values mean that you never want more than 100% of your desired tasks running, and that you are willing to accept downtime while a new version is deployed (0% of the service healthy).
For example, with 1 desired task, ECS can stop the old task first and then start the new one. The Minimum healthy percent and Maximum percent values work the same way for any number of desired tasks.
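The same two values can also be set from a script instead of the console. This is a minimal sketch using the `--deployment-configuration` option of `aws ecs update-service`; the function name `allow_stop_before_start` is made up for illustration, and the cluster and service names are whatever your script already uses.

```shell
# Let ECS stop the old task before starting the new one.
# Arguments: cluster name, service name.
allow_stop_before_start() {
  local cluster="$1"
  local service="$2"
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --deployment-configuration "maximumPercent=100,minimumHealthyPercent=0"
}

# Usage, assuming CLUSTER and SERVICE are set as in the question's script:
# allow_stop_before_start "$CLUSTER" "$SERVICE"
```

The setting is stored on the service, so it only needs to be applied once; later update-service calls that change the task definition will then replace the task in place.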
Hope it helps! Let me know if you have any other questions.
You can start the new revision of the task with the following steps, using a shell script in your build environment.
- Store the task definition JSON template in a file in your build environment (e.g. the template file is web-server.json and the task definition family is web-server).
- With the file's directory as the current directory, register the task definition (on the first run this creates it if it does not exist).
aws ecs register-task-definition --cli-input-json file://web-server.json
- Get the running task (TASK_ID) into a shell variable. The query returns the task's full ARN, which stop-task accepts in place of the bare ID.
TASK_ID=$(aws ecs list-tasks --cluster default --desired-status RUNNING --family web-server --query 'taskArns[0]' --output text)
- Get the latest task definition revision (TASK_REVISION) into a shell variable.
TASK_REVISION=$(aws ecs describe-task-definition --task-definition web-server --query 'taskDefinition.revision' --output text)
- Stop the currently running task.
aws ecs stop-task --cluster default --task "${TASK_ID}"
- Immediately start a new task by updating the service.
aws ecs update-service --cluster default --service web-server --task-definition web-server:${TASK_REVISION} --desired-count 1
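Note that list-tasks returns full task ARNs rather than bare IDs. If only the ID is needed (for log messages, say), shell parameter expansion can take the last path segment without a grep/awk pipeline. A minimal sketch; the function name and the sample ARN are made up for illustration.

```shell
# Print the task ID, i.e. everything after the last "/" in a task ARN.
task_id_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Hypothetical ARN, for illustration only.
task_id_from_arn "arn:aws:ecs:us-east-1:123456789012:task/default/0123456789abcdef"
# prints 0123456789abcdef
```

Taking the last segment works whether or not the ARN includes the cluster name before the ID.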
As a best practice, you can keep the desired count at a minimum of 2 tasks (two tasks running inside the service) and do rolling updates (one task at a time) with zero downtime, using the following script, which extends the steps above to multiple containers. Make sure you leave sufficient time after the first container updates (e.g. sleep 30) for it to be ready to accept new requests.
cd /<directory-containing-web-server.json>
# Register a new revision of the task definition.
aws ecs register-task-definition --cli-input-json file://web-server.json
# Pick one running task; the query returns its full ARN, which stop-task accepts in place of the ID.
OLD_TASK_ID=$(aws ecs list-tasks --cluster default --desired-status RUNNING --family web-server --query 'taskArns[0]' --output text)
TASK_REVISION=$(aws ecs describe-task-definition --task-definition web-server --query 'taskDefinition.revision' --output text)
# Stop that task, then look up a task that is still running.
aws ecs stop-task --cluster default --task "${OLD_TASK_ID}"
OLD_TASK_ID=$(aws ecs list-tasks --cluster default --desired-status RUNNING --family web-server --query 'taskArns[0]' --output text)
# Point the service at the new revision and give the replacement time to start.
aws ecs update-service --cluster default --service web-server --task-definition web-server:${TASK_REVISION} --desired-count 1
sleep 30
# Stop the remaining old task and restore the desired count.
aws ecs stop-task --cluster default --task "${OLD_TASK_ID}"
aws ecs update-service --cluster default --service web-server --task-definition web-server:${TASK_REVISION} --desired-count 2
Note: You need to configure the task definition family, desired-count of instances and task definition template accordingly.
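A fixed sleep 30 is a guess at how long the new container needs. The AWS CLI also ships a waiter, aws ecs wait services-stable, that polls the service until it reaches a steady state. The sketch below shows one way it might be swapped in; `wait_for_service` is a hypothetical wrapper name.

```shell
# Block until the service reports a steady state.
# Arguments: cluster name, service name.
wait_for_service() {
  local cluster="$1"
  local service="$2"
  # Returns non-zero if the service has not settled when the waiter gives up.
  aws ecs wait services-stable --cluster "$cluster" --services "$service"
}

# Usage, in place of "sleep 30":
# wait_for_service default web-server
```

Whether this fits depends on your deployment settings: if the service cannot become stable until the old task has been stopped, the waiter will time out rather than return early.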
Using the AWS CLI:
Get OLD_TASK_ID
OLD_TASK_ID=$(aws ecs list-tasks --cluster ${ecsClusterName} --desired-status RUNNING --family ${nameTaskDefinition} | egrep "task/" | sed -E "s/.*task\/(.*)\"/\1/")
Stop TASK
aws ecs stop-task --cluster ${ecsClusterName} --task ${OLD_TASK_ID}
Update ECS Service
aws ecs update-service --cluster ${ecsClusterName} --service ${nameService} --task-definition ${nameTaskDefinition}:${version} --desired-count 1 --force-new-deployment
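The three steps above can be sketched as one function that also copes with the case where no task is running yet. This is an illustration under assumptions, not a drop-in script: `redeploy_single_task` and its argument order are made up, and it uses the CLI's `--query` option instead of the egrep/sed pipeline.

```shell
# Stop the running task (if any), then point the service at a new revision.
# Arguments: cluster, service, task definition family, revision.
redeploy_single_task() {
  local cluster="$1"
  local service="$2"
  local family="$3"
  local version="$4"
  local old_task

  # First running task's ARN; with --output text an empty list prints "None".
  old_task=$(aws ecs list-tasks --cluster "$cluster" --desired-status RUNNING \
    --family "$family" --query 'taskArns[0]' --output text)

  # Only call stop-task when there is something to stop.
  if [ -n "$old_task" ] && [ "$old_task" != "None" ]; then
    aws ecs stop-task --cluster "$cluster" --task "$old_task" >/dev/null
  fi

  aws ecs update-service --cluster "$cluster" --service "$service" \
    --task-definition "$family:$version" --desired-count 1 --force-new-deployment
}

# Usage, with the variables from the steps above:
# redeploy_single_task "$ecsClusterName" "$nameService" "$nameTaskDefinition" "$version"
```

Skipping stop-task when nothing is running keeps the first deployment of a new service from failing on an empty task ID.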