Debugging a timeout with ldap auth on apache

I'm trying to debug a timeout problem I have with Apache, for some months now.

The pattern looks like this:

On every first request of a new session (or after some time after the last request) the browser asks instantly for credentials, then sends the request with basic auth. Then the server waits exactly 1 minute before sending the result.

enter image description here

Subsequent requests are answered instantly, this only happens for requests after some time (couldn't pinpoint it exactly yet, between 5 and 15 minutes).

The fact that the wait time is reproducible exactly 60 seconds smells like a timeout to me. If I cancel the request and hit reload I get the requested URL instantly.

Since the password prompt appears instantly as well I can rule out a problem with the SSL handshake between client and server, or DNS problems on that leg. It doesn't matter if I request a PHP script or a blank text file, which rules a problem with scripts on the server out as well. I guess the result of the auth process is then cached for a while, so it isn't necessary for subsequent requests.

Note that the authentication always succeeds, so I can rule out a "the domain controller didn't answer" problem as well.

Apache 2.4 is running on Windows Server 2012 R2. It is configured for LDAP auth:

<Location />
    AuthType Basic
    AuthName "AD Login"
    AuthBasicProvider ldap
    LDAPReferrals Off
    #AuthLDAPUrl ldap://dc01.domain.de:3268/dc=ad,dc=domain,dc=de?sAMAccountName?sub?(objectClass=*)
    #AuthLDAPUrl ldap://ad.domain.de:389/dc=ad,dc=domain,dc=de?sAMAccountName?sub?(objectClass=*) STARTTLS
    AuthLDAPUrl ldap://ad.domain.de:389/dc=ad,dc=domain,dc=de?sAMAccountName?sub?(objectClass=*) TLS
    AuthLDAPBindDN "[email protected]"
    AuthLDAPBindPassword "secret"
    Require valid-user
    Require all denied
</Location>

As you can see I tried different connection types to the domain controllers, it doesn't really seem to matter which encryption method I use, or if I pass on encryption at all.

ad.domain.de resolves to multiple domain controllers, but the behavior is the same if I connect to a specific DC.

No entries in the error log on LogLevel info, I'm yet hesitant to increase it to debug, as I know from experience that I have trouble sifting through the generated debug information.

Is there anything I missed yet that I can use to debug the problem, or do I have to go through debug level logging?


Solution 1:

After increasing the log level for the modules authnz_ldap and ldap the following error message appeared in the error log:

ldap_simple_bind() timed out on reused connection, dropped by firewall?

This led me to this mod_ldap bug report, which, while it turned out to be a configuration error, pointed me to the actual problem:

As reported elsewhere, windows closes an LDAP connection after 900 seconds, but the default Apache behavior appears to try to re-use the connection indefinitely. If Apache tries to re-use after windows has closed the connection, there's a 60 second delay waiting for the connection to timeout [...]

Doing a few quick checks to confirm this:

The default value of MaxConnIdleTime in the Microsoft LDAP Policies is 900 seconds, which matches with my observation that the problem appears again after 15 minutes. The 60 second delay also matches exactly with my problem.

According to that error report the issue should be resolved by setting LDAPConnectionPoolTTL to a value lower than MaxConnIdleTime and other than the default value -1, but that didn't work for me. I had to set the value to 0, disabling the reuse of existing connections.

LDAPConnectionPoolTTL 0

I don't expect any performance issues with this, since the ldap results are cached anyway.

The only thing that remains a mystery is why this issue only occurs on our Apache instance running on Windows, but not with the Apache instances running on Linux.