How to handle OpenSSL SSL_ERROR_WANT_READ / WANT_WRITE on non-blocking sockets
The OpenSSL library allows you to read from an underlying socket with SSL_read() and write to it with SSL_write(). These functions may fail with SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE (as reported by SSL_get_error()), depending on the needs of the SSL protocol (for example, when renegotiating a connection).
I don't really understand what the API wants me to do with these results.
Imagine a server app that accepts client connections, sets up a new SSL session for each one, makes the underlying socket non-blocking, and then adds the file descriptor to a select/poll/epoll loop.
If a client sends data, the main loop will dispatch it to SSL_read(). What has to be done here if SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE is returned? WANT_READ might be easy, because the next main loop iteration could just lead to another SSL_read(). But if SSL_read() returns WANT_WRITE, what should be called, and with what parameters? And why doesn't the library issue the call itself?
If the server wants to send a client some data, it will use SSL_write(). Again, what is to be done if WANT_READ or WANT_WRITE is returned? Can WANT_WRITE be answered by repeating the very same call that was just invoked? And if WANT_READ is returned, should one return to the main loop and let select/poll/epoll take care of it? But what about the message that should have been written in the first place?
Or should the read be done right after the failed write? If so, what prevents it from consuming bytes of the application protocol, which then have to be handled somewhere in the outskirts of the app, when the real parser sits in the main loop?
Solution 1:
With non-blocking sockets, SSL_WANT_READ means "wait for the socket to be readable, then call this function again"; conversely, SSL_WANT_WRITE means "wait for the socket to be writable, then call this function again". You can get either SSL_WANT_WRITE or SSL_WANT_READ from both an SSL_read() and an SSL_write() call.
Solution 2:
Have you read the OpenSSL documentation for SSL_read() and SSL_get_error() yet?
SSL_read:
If the underlying BIO is blocking, SSL_read() will only return, once the read operation has been finished or an error occurred, except when a renegotiation take place, in which case a SSL_ERROR_WANT_READ may occur. This behaviour can be controlled with the SSL_MODE_AUTO_RETRY flag of the SSL_CTX_set_mode(3) call.
If the underlying BIO is non-blocking, SSL_read() will also return when the underlying BIO could not satisfy the needs of SSL_read() to continue the operation. In this case a call to SSL_get_error(3) with the return value of SSL_read() will yield SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE. As at any time a re-negotiation is possible, a call to SSL_read() can also cause write operations! The calling process then must repeat the call after taking appropriate action to satisfy the needs of SSL_read(). The action depends on the underlying BIO. When using a non-blocking socket, nothing is to be done, but select() can be used to check for the required condition.
SSL_get_error:
SSL_ERROR_WANT_READ, SSL_ERROR_WANT_WRITE
The operation did not complete; the same TLS/SSL I/O function should be called again later. If, by then, the underlying BIO has data available for reading (if the result code is SSL_ERROR_WANT_READ) or allows writing data (SSL_ERROR_WANT_WRITE), then some TLS/SSL protocol progress will take place, i.e. at least part of an TLS/SSL record will be read or written. Note that the retry may again lead to a SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE condition. There is no fixed upper limit for the number of iterations that may be necessary until progress becomes visible at application protocol level.
For socket BIOs (e.g. when SSL_set_fd() was used), select() or poll() on the underlying socket can be used to find out when the TLS/SSL I/O function should be retried.
Caveat: Any TLS/SSL I/O function can lead to either of SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE. In particular, SSL_read() or SSL_peek() may want to write data and SSL_write() may want to read data. This is mainly because TLS/SSL handshakes may occur at any time during the protocol (initiated by either the client or the server); SSL_read(), SSL_peek(), and SSL_write() will handle any pending handshakes.
OpenSSL is implemented as a state machine. SSL_ERROR_WANT_READ means that more inbound data is needed, and SSL_ERROR_WANT_WRITE means that more outbound data must be sent, in order to make forward progress on the connection. If you get SSL_ERROR_WANT_WRITE on an SSL_read() operation, you need to send outbound data, or at least wait for the socket to become writable. If you get SSL_ERROR_WANT_READ on an SSL_write() operation, you need inbound data to arrive (wait for the socket to become readable) before retrying.
You should subscribe to the OpenSSL mailing lists. This question gets asked a lot.
Solution 3:
SSL_WANT_READ means that the SSL engine can't currently encrypt for you because it is waiting for more input data (either as part of the initial handshake or as part of a renegotiation). Once your next read has completed and you have pushed the data that arrived through the SSL engine, you can retry your write operation.
Likewise, SSL_WANT_WRITE means that the SSL engine is waiting for you to extract some data from it and send it to the peer.
I wrote about using OpenSSL with non-blocking and async sockets back in 2002 for Windows Developer Journal (reprinted here), and although the article is ostensibly aimed at Windows code, the principles are the same on other platforms. The article comes with code that integrates OpenSSL with async sockets on Windows and deals with the whole SSL_WANT_READ/SSL_WANT_WRITE issue.
Essentially, when you get SSL_WANT_READ you need to queue outbound data until a read has completed and you have passed the new inbound data into the SSL engine. Once that has happened, you can retry sending your outbound data.