C++11 std::mutex in Visual Studio 2012 deadlock when locked from DllMain()

I am seeing a deadlock with std::mutex when the mutex is locked from DllMain() Below is a minimal DLL test case that exhibits the problem for me. My actual code does the mutex locking because it uses member functions that are also usable outside initialization during normal function.

I think that the problem is a deadlock between the scheduler as seen in the call stack of main() thread and the other thread (probably) spawned by the scheduler. The deadlock seems to happen before main() is actually executed.

I would appreciate any advice as to how to fix/resolve the deadlock.

Simple DLL:

static void testFunc()
{
    std::mutex mtx;
    mtx.lock();

    mtx.unlock();
}


BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        testFunc ();
        break;

    case DLL_THREAD_ATTACH:
        testFunc ();
        break;

    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

At the point of the deadlock there are two threads in the process:

Not Flagged >   6408    0   Main Thread Main Thread msvcr110d.dll!Concurrency::details::SchedulerBase::SchedulerBase    Normal
Not Flagged     7600    0   Worker Thread   ntdll.dll!_TppWaiterpThread@4() ntdll.dll!_NtDelayExecution@8   Normal

Here is the call stack of main() thread:

    ntdll.dll!_NtWaitForKeyedEvent@16() Unknown
    ntdll.dll!_TppWaitpSet@16() Unknown
    ntdll.dll!_TppSetWaitInterrupt@12() Unknown
    ntdll.dll!_RtlRegisterWait@24() Unknown
    kernel32.dll!_RegisterWaitForSingleObject@24()  Unknown
>   msvcr110d.dll!Concurrency::details::SchedulerBase::SchedulerBase(const Concurrency::SchedulerPolicy & policy) Line 152  C++
    msvcr110d.dll!Concurrency::details::ThreadScheduler::ThreadScheduler(const Concurrency::SchedulerPolicy & policy) Line 26   C++
    msvcr110d.dll!Concurrency::details::ThreadScheduler::Create(const Concurrency::SchedulerPolicy & policy) Line 34    C++
    msvcr110d.dll!Concurrency::details::SchedulerBase::CreateWithoutInitializing(const Concurrency::SchedulerPolicy & policy) Line 276  C++
    msvcr110d.dll!Concurrency::details::SchedulerBase::GetDefaultScheduler() Line 650   C++
    msvcr110d.dll!Concurrency::details::SchedulerBase::CreateContextFromDefaultScheduler() Line 567 C++
    msvcr110d.dll!Concurrency::details::SchedulerBase::CurrentContext() Line 399    C++
    msvcr110d.dll!Concurrency::details::LockQueueNode::LockQueueNode(unsigned int timeout) Line 616 C++
    msvcr110d.dll!Concurrency::critical_section::lock() Line 1017   C++
    msvcp110d.dll!mtx_do_lock(_Mtx_internal_imp_t * * mtx, const xtime * target) Line 65    C++
    msvcp110d.dll!_Mtx_lock(_Mtx_internal_imp_t * * mtx) Line 144   C++
    ConsoleApplicationDll.dll!std::_Mtx_lockX(_Mtx_internal_imp_t * * _Mtx) Line 68 C++
    ConsoleApplicationDll.dll!std::_Mutex_base::lock() Line 43  C++
    ConsoleApplicationDll.dll!testFunc() Line 16    C++
    ConsoleApplicationDll.dll!DllMain(HINSTANCE__ * hModule, unsigned long ul_reason_for_call, void * lpReserved) Line 29   C++
    ConsoleApplicationDll.dll!__DllMainCRTStartup(void * hDllHandle, unsigned long dwReason, void * lpreserved) Line 508    C
    ConsoleApplicationDll.dll!_DllMainCRTStartup(void * hDllHandle, unsigned long dwReason, void * lpreserved) Line 472 C
    ntdll.dll!_LdrpCallInitRoutine@16() Unknown
    ntdll.dll!_LdrpRunInitializeRoutines@4()    Unknown
    ntdll.dll!_LdrpInitializeProcess@8()    Unknown
    ntdll.dll!__LdrpInitialize@8()  Unknown
    ntdll.dll!_LdrInitializeThunk@8()   Unknown

The second thread's call stack is short:

>   ntdll.dll!_NtDelayExecution@8() Unknown
    ntdll.dll!__LdrpInitialize@8()  Unknown
    ntdll.dll!_LdrInitializeThunk@8()   Unknown

EDIT 1:

WinDbg confirms that it is loader lock issue:

PRIMARY_PROBLEM_CLASS:  APPLICATION_HANG_HungIn_LoaderLock

Check the Best Practices for Creating DLLs document:

You should never perform the following tasks from within DllMain:

  • Call LoadLibrary or LoadLibraryEx (either directly or indirectly). This can cause a deadlock or a crash.
  • Synchronize with other threads. This can cause a deadlock.

It seems that using QueueUserAPC() to queue initialization is always executed before main() but out of the dreaded loader lock. This looks like a solution to my problem.

EDIT 1

After some testing it seems that the APC method works if I queue the APC from DllMain() but it does not work if I queue the APC from a ctor of a static global instance of a class. IOW, using the APC is not uniformly usable across all possible combinations of compilers and build modes for me.