Relationship between JMS connections, sessions, and producers/consumers
Is it OK if some of the messages are duplicated or lost? When the JMS client connects to the JMS broker over the network there are three phases to any API call.
- The API call, including any message data, is transmitted over the wire to the broker.
- The API call is executed by the broker.
- The result code and any message data is transmitted back to the client.
Consider the producer for a minute. If the connection is broken in the first step then the broker never got the message and the app would need to send it again. If the connection is broken in the third step then the message has been successfully sent and sending it again would produce a duplicate message. The app cannot tell the difference between these and so the only safe choice is to resend the message on error. If the session is transacted the message can be safely resent in all cases because if the original had made it to the broker, it will be rolled back.
Consider the consumer. If the connection is lost in the third step then the message is deleted from the queue but never made it back to the client. But if the session is transacted the message will be redelivered when the application reconnects.
Outside of transactions there is the possibility of lost or duplicate messages. Inside of a transaction the same window of ambiguity exists but it is on the COMMIT call rather then the PUT or GET. With transacted sessions it is possible to send or receive a message twice but not to lose one.
The JMS spec recognizes this window of ambiguity and provides the following guidance:
If a failure occurs between the time a client commits its work on a Session and the commit method returns, the client cannot determine if the transaction was committed or rolled back. The same ambiguity exists when a failure occurs between the non-transactional send of a PERSISTENT message and the return from the sending method.
It is up to a JMS application to deal with this ambiguity. In some cases, this may cause a client to produce functionally duplicate messages.
A message that is redelivered due to session recovery is not considered a duplicate message.
JMS sessions should always be transacted except for cases where it really is OK to lose messages. If the sessions are transacted then you'd need session and connection per-thread due to the JMS thread model.
Any advice about performance impacts would be vendor-specific but in general persistent messages outside of syncpoint are hardened to disk before the API call returns. But a transacted call can return before the persistent message is written to disk so long as the message is persisted before the COMMIT returns. If the vendor optimizes based on this, then it is much more performant to write several messages to disk and then commit them in batches. This allows the broker to optimize writes and disk flushes by disk block rather than per-message. The number of messages to put in the transaction decreases with the size of the message and beyond a certain message size dwindles back down to one.
If your 20k messages are relatively small (measured in k and not mb) then you probably want to use transacted sessions per thread and tune the commit interval.
In most scenarios it is sufficient to work with one connection and multiple sessions, using one session per thread. In some environments you can gain additional performance by using multiple connections:
Some messaging systems support a cluster mode, where connections get loadbalanced to different nodes. With multiple connections you can use the performance of multiple nodes in this scenario. (which of course does only help, when the bottleneck is on the side of the message broker).
The best solution would be to us a pool of connections, and give the administrator some options to configure the behaviour in the specific area.