Why is our MonoTouch app breaking in the garbage collector? It is not out of memory
We have a simple question, but the cause is complicated. We are experienced developers, and have done a lot of research into what may be causing it. We are hoping that MonoTouch developers can work with us to identify what appears to be a common problem that people are having and for which no solution appears to exist yet. We've been working on this for over two weeks, and have not been able to resolve it.
The question is: Why is our MonoTouch app breaking in the garbage collector? It is not out of memory.
The situation is that we have an app that checks a web service regularly (perhaps every 5 seconds). After a period of time it fails with a memory management abort. This typically happens after about an hour and a half, but can be anywhere from ten minutes to overnight. This happens on all of our test devices (we have 7 in total, covering iOS3 and iOS4, iPod Touch, iPhones and iPads (1&2)). After looking on StackOverflow, we have added a System.GC.Collect call in a timer before we take any action. This improved things a little (it takes longer to fail), but the failure did not go away. It is also worth adding that the memory log from the iPad shows that there are 777 free blocks, and 2041 in use by our app, with a total of 26488 wired pages. Since we've garbage collected, and are not doing anything different to what we did 5 seconds before, it seems odd to run out of memory.
We upgraded to MonoTouch 4.0.1 but that has not fixed it.
StackOverflow questions that might be on the same issue, but not answering it: 5666905 / 4545383 / 5492469 / 5426733
The stack at failure on an iPad2 is below. The failure can happen in the main thread or an http thread, but always goes in this GC_ sequence. I have included the code for the memory manager GC_remap below, with discussion.
Thread 10 Crashed:
0 libsystem_kernel.dylib 0x34b4da1c __pthread_kill + 8
1 libsystem_c.dylib 0x3646a3b4 pthread_kill + 52
2 libsystem_c.dylib 0x36462bf8 abort + 72
3 MyApp 0x004ca92c mono_handle_native_sigsegv (mini-exceptions.c:2249)
4 MyApp 0x004f2208 sigabrt_signal_handler (mini-posix.c:195)
5 libsystem_c.dylib 0x36475728 _sigtramp + 36
6 libsystem_c.dylib 0x3646a3b4 pthread_kill + 52
7 libsystem_c.dylib 0x36462bf8 abort + 72
8 MyApp 0x0061dc94 GC_remap (os_dep.c:2092)
9 MyApp 0x00611678 GC_allochblk_nth (allchblk.c:730)
10 MyApp 0x00611028 GC_allochblk (allchblk.c:561)
11 MyApp 0x0061d0e0 GC_new_hblk (new_hblk.c:253)
12 MyApp 0x006133d0 GC_allocobj (alloc.c:1116)
13 MyApp 0x00617d30 GC_generic_malloc_inner (malloc.c:136)
14 MyApp 0x00617f40 GC_generic_malloc (malloc.c:192)
15 MyApp 0x00618264 GC_malloc_atomic (malloc.c:262)
16 MyApp 0x005a46d4 mono_object_allocate_ptrfree (object.c:4221)
17 MyApp 0x005a4aa0 mono_string_new_size (object.c:4848)
18 MyApp 0x005c1b14 ves_icall_System_String_InternalAllocateStr (string-icalls.c:213)
19 MyApp 0x002d34c4 wrapper_managed_to_native_string_InternalAllocateStr_int + 52
20 MyApp 0x002cff5c string_ToLower_System_Globalization_CultureInfo + 56
21 MyApp 0x003e6ac0 System_Net_WebRequest_GetCreator_string + 40
22 MyApp 0x003e694c System_Net_WebRequest_Create_System_Uri + 48
23 MyApp 0x003e68d8 System_Net_WebRequest_Create_string + 64
24 MyApp 0x004489c4 MyApp_Services_Client_GetResponseContent_string + 152
25 MyApp 0x00446288 MyApp_Services_Client_GetCurrentQuestion_long_long + 916
26 MyApp 0x00196fcc MyApp_Iphone_RootViewController_RetrieveCurrentQuestion + 868
27 MyApp 0x002e6368 System_Threading_Thread_StartUnsafe + 168
28 MyApp 0x00306890 wrapper_runtime_invoke_object_runtime_invoke_dynamic_intptr_intptr_intptr_intptr + 192
29 MyApp 0x004b0274 mono_jit_runtime_invoke (mini.c:5746)
30 MyApp 0x0059f924 mono_runtime_invoke (object.c:2756)
31 MyApp 0x005a1350 mono_runtime_delegate_invoke (object.c:3421)
32 MyApp 0x005ca884 start_wrapper_internal (threads.c:788)
33 MyApp 0x005ca924 start_wrapper (threads.c:830)
34 MyApp 0x005ef4b8 thread_start_routine (wthreads.c:285)
35 MyApp 0x0061f1d0 GC_start_routine (pthread_support.c:1468)
36 libsystem_c.dylib 0x3646a30a _pthread_start + 242
37 libsystem_c.dylib 0x3646bbb4 thread_start + 0
This is the GC_remap code that appears to be the point of failure, from https://github.com/mono/mono/blob/master/libgc/os_dep.c
#ifdef NACL
    {
        /* NaCl doesn't expose mprotect, but mmap should work fine */
        void * mmap_result;
        mmap_result = mmap(start_addr, len, PROT_READ | PROT_WRITE | OPT_PROT_EXEC,
                           MAP_PRIVATE | MAP_FIXED | OPT_MAP_ANON,
                           zero_fd, 0/* offset */);
        if (mmap_result != (void *)start_addr) ABORT("mmap as mprotect failed");
        /* Fake the return value as if mprotect succeeded. */
        result = 0;
    }
#else /* NACL */
    result = mprotect(start_addr, len, PROT_READ | PROT_WRITE | OPT_PROT_EXEC);
#endif /* NACL */
    if (result != 0) {
        GC_err_printf3(
            "Mprotect failed at 0x%lx (length %ld) with errno %ld\n",
            start_addr, len, errno);
        ABORT("Mprotect remapping failed");
    }
    GC_unmapped_bytes -= len;
It would appear that the ABORT is caused by the mprotect function failing. We have been unable to get the failure code as the problem does not manifest itself on the simulator. The mprotect function appears to just mark the memory as accessible for read/write/execute. How is the memory manager passing parameters that cause it to fail? Could it be passing an incorrect pointer, or an incorrect length? Or are certain areas or boundaries handled differently on iOS?
The code at https://github.com/mono/mono/blob/master/libgc/allchblk.c for GC_allochblk_nth implies that the GC_remap function is only called if the memory block found was valid. (This file doesn't quite match the line numbers of the stack trace, so presumably it is not exactly the same file.)
http://developer.apple.com/library/ios/#documentation/System/Conceptual/ManPages_iPhoneOS/man2/mprotect.2.html says that it might fail with EACCES, EINVAL, ENOTSUP which are 13, 22, & 45 respectively. One of the reports on SO says that they get an error 12 (ENOMEM). I'm not sure what that means, as mprotect shouldn't be allocating memory, and the documentation doesn't say that is valid.
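Since GC_remap logs errno only as a number, a small decoder makes a captured value easier to read. The following is a minimal sketch, assuming a POSIX libc; mprotect_errno_name and describe_mprotect_errno are hypothetical helper names of our own, not part of libgc or MonoTouch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libgc): map the errno values discussed
 * for mprotect to their symbolic names. The numbers come from the
 * platform's own <errno.h>, so nothing is hard-coded here. */
static const char *mprotect_errno_name(int err)
{
    switch (err) {
    case EACCES:  return "EACCES";
    case EINVAL:  return "EINVAL";
    case ENOMEM:  return "ENOMEM";
    case ENOTSUP: return "ENOTSUP";
    default:      return "other";
    }
}

/* Format "errno N (NAME): libc description" into buf for logging. */
static void describe_mprotect_errno(int err, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    snprintf(buf, buflen, "errno %d (%s): %s",
             err, mprotect_errno_name(err), strerror(err));
}
```

Calling describe_mprotect_errno(errno, buf, sizeof buf) straight after a failed mprotect yields a single readable line. The value of ENOTSUP differs between platforms (45 on Darwin, 95 on Linux), so a logged number should be decoded against the headers of the platform that produced it.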
More general documentation at http://linux.die.net/man/2/mprotect indicates that ENOMEM can be caused by "Internal kernel structures could not be allocated. Or: addresses in the range [addr, addr+len] are invalid for the address space of the process, or specify one or more pages that are not mapped." How could this be?
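To see the "pages that are not mapped" case in isolation, the following is a minimal sketch, assuming a POSIX system with anonymous mmap. It maps one page, unmaps it, and then calls mprotect on the stale range, which is roughly what would happen if the collector's bookkeeping pointed at memory the process no longer owns. The function probe_mprotect_on_unmapped_page is a hypothetical name of our own; this is not libgc code and does not claim to reproduce the actual bug.

```c
#define _DEFAULT_SOURCE 1   /* expose MAP_ANON on glibc under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical probe (not libgc code).
 * Returns the errno set by mprotect on an unmapped page,
 * 0 if mprotect unexpectedly succeeded,
 * or -1 if the setup (mmap/munmap) itself failed. */
static int probe_mprotect_on_unmapped_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Obtain a page-aligned address that is known to be valid. */
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Give the page back; p is now a stale, unmapped address. */
    if (munmap(p, (size_t)page) != 0)
        return -1;

    /* Same kind of call GC_remap makes, but on a range we no longer own. */
    errno = 0;
    if (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) == 0)
        return 0;
    return errno;
}
```

The Linux man page quoted above documents ENOMEM for this situation. What a given iOS version reports is something the sketch can measure on a device but does not establish by itself. If the value matches the errno seen in the crash, that would point at the address or length passed to mprotect rather than at genuine memory exhaustion.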
We would much appreciate any suggestions on how we might move this forward. Our app is written entirely in C#, and does nothing other than a periodic https read. What can we do to improve debugging? (We can't trace anything, as the app is killed by iOS.) We have tried creating a simpler demonstration, but it does not fail fast enough to be worth using. If a Novell MonoTouch developer wants our source, we can provide it subject to the obvious confidentiality.
Solution 1:
Thanks to your reproduction we have found and corrected a very obscure issue in the garbage collector. It will be included in MonoTouch 4.0.2.