Go 1.3 Garbage collector not releasing server memory back to system
We wrote the simplest possible TCP server (with minor logging) to examine the memory footprint (see tcp-server.go below)
The server simply accepts connections and does nothing. It is being run on an Ubuntu 12.04.4 LTS server (kernel 3.2.0-61-generic) with Go version go1.3 linux/amd64.
The attached benchmarking program (pulse.go) creates, in this example, 10k connections, disconnects them after 30 seconds, repeats this cycle three times, and then continuously repeats small pulses of 1k connections/disconnections. The command used to test was ./pulse -big=10000 -bs=30.
The first attached graph is obtained by recording runtime.ReadMemStats when the number of clients has changed by a multiple of 500, and the second graph is the RES memory size seen by “top” for the server process.
The server starts with a negligible 1.6KB of memory. Then the memory is set by the “big” pulses of 10k connections at ~60MB (as seen by top), or at about 16MB “SystemMemory” as seen by ReadMemStats. As expected, when the 10K pulses end, the in-use memory drops, and eventually the program starts releasing memory back to OS as evidenced by the grey “Released Memory” line.
The problem is that the System Memory (and correspondingly, the RES memory seen by “top”) never drops significantly (although it drops a little as seen in the second graph).
We would expect that after the 10K pulses end, memory would continue to be released until the RES size is the minimum needed for handling each 1k pulse (which is 8m RES as seen by “top” and 2MB in-use reported by runtime.ReadMemStats). Instead, the RES stays at about 56MB and in-use never drops from its highest value of 60MB at all.
We want to ensure scalability for irregular traffic with occasional spikes as well as be able to run multiple servers on the same box that have spikes at different times. Is there a way to effectively ensure that as much memory is released back to the system as possible in a reasonable time frame?
Code https://gist.github.com/eugene-bulkin/e8d690b4db144f468bc5 :
server.go:
package main
import (
"net"
"log"
"runtime"
"sync"
)
var m sync.Mutex
var num_clients = 0
var cycle = 0
func printMem() {
var ms runtime.MemStats
runtime.ReadMemStats(&ms)
log.Printf("Cycle #%3d: %5d clients | System: %8d Inuse: %8d Released: %8d Objects: %6d\n", cycle, num_clients, ms.HeapSys, ms.HeapInuse, ms.HeapReleased, ms.HeapObjects)
}
func handleConnection(conn net.Conn) {
//log.Println("Accepted connection:", conn.RemoteAddr())
m.Lock()
num_clients++
if num_clients % 500 == 0 {
printMem()
}
m.Unlock()
buffer := make([]byte, 256)
for {
_, err := conn.Read(buffer)
if err != nil {
//log.Println("Lost connection:", conn.RemoteAddr())
err := conn.Close()
if err != nil {
log.Println("Connection close error:", err)
}
m.Lock()
num_clients--
if num_clients % 500 == 0 {
printMem()
}
if num_clients == 0 {
cycle++
}
m.Unlock()
break
}
}
}
func main() {
printMem()
cycle++
listener, err := net.Listen("tcp", ":3033")
if err != nil {
log.Fatal("Could not listen.")
}
for {
conn, err := listener.Accept()
if err != nil {
log.Println("Could not listen to client:", err)
continue
}
go handleConnection(conn)
}
}
pulse.go:
package main
import (
"flag"
"net"
"sync"
"log"
"time"
)
var (
numBig = flag.Int("big", 4000, "Number of connections in big pulse")
bigIters = flag.Int("i", 3, "Number of iterations of big pulse")
bigSep = flag.Int("bs", 5, "Number of seconds between big pulses")
numSmall = flag.Int("small", 1000, "Number of connections in small pulse")
smallSep = flag.Int("ss", 20, "Number of seconds between small pulses")
linger = flag.Int("l", 4, "How long connections should linger before being disconnected")
)
var m sync.Mutex
var active_conns = 0
var connections = make(map[net.Conn] bool)
func pulse(n int, linger int) {
var wg sync.WaitGroup
log.Printf("Connecting %d client(s)...\n", n)
for i := 0; i < n; i++ {
wg.Add(1)
go func() {
m.Lock()
defer m.Unlock()
defer wg.Done()
active_conns++
conn, err := net.Dial("tcp", ":3033")
if err != nil {
log.Panicln("Unable to connect: ", err)
return
}
connections[conn] = true
}()
}
wg.Wait()
if len(connections) != n {
log.Fatalf("Unable to connect all %d client(s).\n", n)
}
log.Printf("Connected %d client(s).\n", n)
time.Sleep(time.Duration(linger) * time.Second)
for conn := range connections {
active_conns--
err := conn.Close()
if err != nil {
log.Panicln("Unable to close connection:", err)
conn = nil
continue
}
delete(connections, conn)
conn = nil
}
if len(connections) > 0 {
log.Fatalf("Unable to disconnect all %d client(s) [%d remain].\n", n, len(connections))
}
log.Printf("Disconnected %d client(s).\n", n)
}
func main() {
flag.Parse()
for i := 0; i < *bigIters; i++ {
pulse(*numBig, *linger)
time.Sleep(time.Duration(*bigSep) * time.Second)
}
for {
pulse(*numSmall, *linger)
time.Sleep(time.Duration(*smallSep) * time.Second)
}
}
First, note that Go, itself, doesn't always shrink its own memory space:
https://groups.google.com/forum/#!topic/Golang-Nuts/vfmd6zaRQVs
The heap is freed, you can check this using runtime.ReadMemStats(), but the processes virtual address space does not shrink -- ie, your program will not return memory to the operating system. On Unix based platforms we use a system call to tell the operating system that it can reclaim unused parts of the heap, this facility is not available on Windows platforms.
But you're not on Windows, right?
Well, this thread is less definitive, but it says:
https://groups.google.com/forum/#!topic/golang-nuts/MC2hWpuT7Xc
As I understand, memory is returned to the OS about 5 minutes after is has been marked as free by the GC. And the GC runs every two minutes top, if not triggered by an increase in memory use. So worst-case would be 7 minutes to be freed.
In this case, I think that the slice is not marked as freed, but in use, so it would never be returned to the OS.
It's possible you weren't waiting long enough for the GC sweep followed by the OS return sweep, which could be up to 7 minutes after the final "big" pulse. You can explicitly force this with runtime.FreeOSMemory
, but keep in mind that it won't do anything unless the GC has been run.
(Edit: Note that you can force garbage collection with runtime.GC()
though obviously you need to be careful how often you use it; you may be able to sync it with sudden downward spikes in connections).
As a slight aside, I can't find an explicit source for this (other than the second thread I posted where someone mentions the same thing), but I recall it being mentioned several times that not all of the memory Go uses is "real" memory. If it's allocated by the runtime but not actually in use by the program, the OS actually has use of the memory regardless of what top
or MemStats
says, so the amount of memory the program is "really" using is often very overreported.
Edit: As Kostix notex in the comments and supports JimB's answer, this question was crossposted on Golang-nuts and we got a rather definitive answer from Dmitri Vyukov:
https://groups.google.com/forum/#!topic/golang-nuts/0WSOKnHGBZE/discussion
I don't there is a solution today. Most of the memory seems to be occupied by goroutine stacks, and we don't release that memory to OS. It will be somewhat better in the next release.
So what I outlines only applies to heap variables, memory on a Goroutine stack will never be released. How exactly this interacts with my last "not all shown allocated system memory is 'real memory'" point remains to be seen.
As LinearZoetrope said, you should wait at least 7 minutes to check how much memory is freed. Sometimes it needs two GC passes, so it would be 9 minutes.
If that is not working, or it is too much time, you can add a periodic call to FreeOSMemory (no need to call runtime.GC() before, it is done by debug.FreeOSMemory() )
Something like this: http://play.golang.org/p/mP7_sMpX4F
package main
import (
"runtime/debug"
"time"
)
func main() {
go periodicFree(1 * time.Minute)
// Your program goes here
}
func periodicFree(d time.Duration) {
tick := time.Tick(d)
for _ = range tick {
debug.FreeOSMemory()
}
}
Take into account that every call to FreeOSMemory will take some time (not much) and it can be partly run in parallel if GOMAXPROCS>1
since Go1.3.