Best way to move messages off DLQ in Amazon SQS?
Solution 1:
Here is a quick hack. It is definitely not the best or the recommended option.
- Set the main SQS queue as the dead-letter queue of the actual DLQ, with Maximum Receives set to 1.
- View the messages in the DLQ. This moves them to the main queue, because the main queue is now the DLQ of the actual DLQ.
- Remove the setting so that the main queue is no longer the DLQ of the actual DLQ.
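If you would rather script the first and last steps than click through the console, a minimal sketch could look like the one below. It assumes `sqs` is a boto3 SQS client (`boto3.client("sqs")`); the function names and the queue URLs are hypothetical, and the "view the messages" step in between is still up to you.

```python
import json


def point_dlq_at_main_queue(sqs, dlq_url, main_queue_url):
    """Temporarily make the main queue the dead-letter target of the DLQ."""
    # The redrive policy refers to the target queue by ARN, not by URL.
    attrs = sqs.get_queue_attributes(
        QueueUrl=main_queue_url, AttributeNames=["QueueArn"]
    )
    main_queue_arn = attrs["Attributes"]["QueueArn"]

    redrive_policy = {
        "deadLetterTargetArn": main_queue_arn,
        "maxReceiveCount": 1,  # "Maximum Receives" in the console
    }
    sqs.set_queue_attributes(
        QueueUrl=dlq_url,
        Attributes={"RedrivePolicy": json.dumps(redrive_policy)},
    )
    return redrive_policy


def remove_redrive_policy(sqs, dlq_url):
    """Undo the hack so the main queue is no longer the DLQ's DLQ."""
    # Assumption: an empty RedrivePolicy string clears the policy;
    # check the queue in the console afterwards to be sure.
    sqs.set_queue_attributes(QueueUrl=dlq_url, Attributes={"RedrivePolicy": ""})
```

Call `point_dlq_at_main_queue(sqs, dlq_url, main_queue_url)`, view the messages in the DLQ, then call `remove_redrive_policy(sqs, dlq_url)`.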
Solution 2:
There are a few scripts out there that do this for you:
- npm / nodejs based: http://github.com/garryyao/replay-aws-dlq
    # install
    npm install replay-aws-dlq
    # use
    npx replay-aws-dlq [source_queue_url] [dest_queue_url]
- go based: https://github.com/mercury2269/sqsmover
    # compile: https://github.com/mercury2269/sqsmover#compiling-from-source
    # use
    sqsmover -s [source_queue_url] -d [dest_queue_url]
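I have not checked how either tool is implemented, but conceptually a queue mover is a receive / send / delete loop. Here is a rough sketch of that idea, assuming `sqs` is a boto3 SQS client and both queues are standard (not FIFO) queues. The function name `move_messages` is made up for illustration.

```python
def move_messages(sqs, source_queue_url, dest_queue_url, wait_seconds=1):
    """Move messages from the source queue to the destination queue.

    Returns the number of messages moved.
    """
    moved = 0
    while True:
        response = sqs.receive_message(
            QueueUrl=source_queue_url,
            MaxNumberOfMessages=10,         # SQS returns at most 10 per call
            WaitTimeSeconds=wait_seconds,   # long polling
            MessageAttributeNames=["All"],  # carry custom attributes along
        )
        messages = response.get("Messages", [])
        if not messages:
            # Simplification: an empty poll is treated as "queue drained".
            return moved

        for message in messages:
            send_args = {
                "QueueUrl": dest_queue_url,
                "MessageBody": message["Body"],
            }
            if message.get("MessageAttributes"):
                send_args["MessageAttributes"] = message["MessageAttributes"]
            sqs.send_message(**send_args)

            # Delete only after the send succeeded, so a crash in between
            # duplicates a message instead of losing it.
            sqs.delete_message(
                QueueUrl=source_queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
            moved += 1
```

Deleting after sending means the worst case is a duplicate in the destination queue, so the consumer should tolerate seeing a message twice.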
Solution 3:
You don't need to move the messages at all. Moving them brings many other challenges, such as duplicate messages, recovery scenarios, lost messages, and de-duplication checks.
Here is the solution we implemented.
We usually use the DLQ for transient errors, not permanent ones, so we took the approach below:
- Read messages from the DLQ as you would from a regular queue.
  Benefits:
  - Avoids duplicate message processing.
  - Better control over the DLQ. For example, I added a check so that the DLQ is processed only once the regular queue has been completely processed.
  - The process can be scaled up based on the number of messages in the DLQ.
- Then run the same code that the regular queue's consumer runs. This is more reliable if the job is aborted or the process is terminated mid-processing (e.g. the instance is killed).
  Benefits:
  - Code reusability
  - Error handling
  - Recovery and message replay
- Extend the message visibility timeout so that no other thread processes the message.
  Benefit:
  - Avoids the same record being processed by multiple threads.
- Delete the message only when processing succeeds or fails with a permanent error.
  Benefit:
  - Processing keeps being retried for as long as the error is transient.
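To make the approach concrete, here is a minimal sketch of one pass over the DLQ. It is not our production code. It assumes `sqs` is a boto3 SQS client, and `TransientError`, `PermanentError`, `main_queue_is_idle`, `process_dlq_batch` and the `handler` callback are hypothetical names standing in for whatever your regular consumer already uses.

```python
class TransientError(Exception):
    """Hypothetical: the failure may go away on retry (e.g. a timeout)."""


class PermanentError(Exception):
    """Hypothetical: retrying cannot help (e.g. a malformed message)."""


def main_queue_is_idle(sqs, main_queue_url):
    """Return True when the regular queue has nothing waiting or in flight."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=main_queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )["Attributes"]
    # The counts are approximate, so treat this as a best-effort check.
    return (
        int(attrs["ApproximateNumberOfMessages"]) == 0
        and int(attrs["ApproximateNumberOfMessagesNotVisible"]) == 0
    )


def process_dlq_batch(sqs, dlq_url, handler, visibility_seconds=300):
    """Process one batch from the DLQ with the regular queue's handler."""
    response = sqs.receive_message(
        QueueUrl=dlq_url, MaxNumberOfMessages=10, WaitTimeSeconds=1
    )
    outcome = {"deleted": 0, "left_for_retry": 0}

    for message in response.get("Messages", []):
        receipt = message["ReceiptHandle"]
        # Extend visibility so no other worker picks the message up meanwhile.
        sqs.change_message_visibility(
            QueueUrl=dlq_url,
            ReceiptHandle=receipt,
            VisibilityTimeout=visibility_seconds,
        )
        try:
            handler(message["Body"])  # same code path as the regular queue
        except TransientError:
            # Leave the message in the DLQ; it becomes visible again later.
            outcome["left_for_retry"] += 1
            continue
        except PermanentError:
            pass  # retrying is pointless, so fall through and delete

        # Reached on success or on a permanent error.
        sqs.delete_message(QueueUrl=dlq_url, ReceiptHandle=receipt)
        outcome["deleted"] += 1

    return outcome
```

A worker would typically call `process_dlq_batch` in a loop, and only while `main_queue_is_idle(sqs, main_queue_url)` returns True.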