Why do SqS messages sometimes remain in-flight on queue

I'm using Amazon SQS queues in a very simple way. Usually, messages are written and immediately visible and read. Occasionally, a message is written, and remains In-Flight(Not Visible) on the queue for several minutes. I can see it from the console. Receive-message-wait time is 0, and Default Visibility is 5 seconds. It will remain that way for several minutes, or until a new message gets written that somehow releases it. A few seconds delay is ok, but more than 60 seconds is not ok.

There a 8 reader threads that are long polling always, so its not that something is not trying to read it, they are.

Edit : To be clear, none of the consumer reads are returning any messages at all and it happens regardless of whether or not the console is open. In this scenario, only one message is involved, and it is just sitting in the queue invisible to the consumers.

Has anyone else seen this behavior and what I can do to improve it?

Here is the sdk for java I am using:

<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.5.2</version>
</dependency>     

Here is the code that does the reading (max=10,maxwait=0 startup config):

void read(MessageConsumer consumer) {

  List<Message> messages = read(max, maxWait);

  for (Message message : messages) {
    if (tryConsume(consumer, message)) {
      delete(message.getReceiptHandle());
    }
  }
}

private List<Message> read(int max, int maxWait) {

  AmazonSQS sqs = getClient();
  ReceiveMessageRequest rq = new ReceiveMessageRequest(queueUrl);
  rq.setMaxNumberOfMessages(max);
  rq.setWaitTimeSeconds(maxWait);
  List<Message> messages = sqs.receiveMessage(rq).getMessages();

  if (messages.size() > 0) {
    LOG.info("read {} messages from SQS queue",messages.size());
  }

  return messages;
}

The log line for "read .." never appears when this is happening, and its what causes me to go in with the console and see if the message is there or not, and it is.


It sounds like you are misinterpreting what you are seeing.

Messages "in flight" are not pending delivery, they're messages that have already been delivered but not further acted on by the consumer.

Messages are considered to be in flight if they have been sent to a client but have not yet been deleted or have not yet reached the end of their visibility window.

— https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-available-cloudwatch-metrics.html

When a consumer receives a message, it has to -- at some point -- either delete the message, or send a request to increase the timeout for that message; otherwise the message becomes visible again after the timeout expires. If a consumer fails to do one of these things, the message automatically becomes visible again. The visibility timeout is how long the consumer has before one of these things must be done.

Messages should not be "in flight" without something having already received them -- but that "something" can include the console itself, as you'll note on the pop-up you see when you choose "View/Delete Messages" in the console (unless you already checked the "Don't show this again" checkbox):

Messages displayed in the console will not be available to other applications until the console stops polling for messages.

Messages displayed in the console are "in flight" while the console is observing the queue from the "View/Delete Messages" screen.

The part that does not make obvious sense is messages being in flight "for several minutes" if your default visibility timeout is only 5 seconds and nothing in your code is increasing that timeout... however... that could be explained almost perfectly by your consumers not properly disposing of the message, causing it to timeout and immediately be redelivered, giving the impression that a single instance of the message was remaining in-flight, when in fact, the message is briefly transitioning back to visible, only to be claimed almost immediately by another consumer, taking it back to in-flight again.


It may happen when you send or lock a message and within some seconds you try to get the fresh list of messages. Amazon SQS stores the data into multiple servers and in multiple data centers http://aws.amazon.com/sqs/faqs/#How_reliably_is_my_data_stored_in_Amazon_SQS.

To get rid of these issues you need to wait more so that queue would have more time to give appropriate results.