Best way to extract strings from binary data in golang using unsafe

I have an application that loads a byte array of several gigabytes. I don't control the binary format. The program spends most of its time converting sections of the array into strings, manipulating those strings, and then releasing them. It occasionally runs out of memory when many clients are connected at once and trigger large numbers of allocations.

Given that the byte array lives in memory for the entire life of the application, it seems like an ideal candidate for using the unsafe package to avoid memory allocation.

From testing this in the Go Playground, it appears that a "StringHeader" is needed to build an actual string. But this means a header must still be allocated every time a string is returned (i.e. the "x" variable in this example):

package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    t := []byte{
        65, 66, 67, 68, 69, 70,
        71, 72, 73, 74, 75, 76,
        77, 78, 79, 80, 81, 82,
        83, 84, 85,
    }
    // Preallocated string headers, filled in by hand below.
    var x [10]reflect.StringHeader

    // Point x[0] at t[8:12] and reinterpret it as a string.
    h := &x[0]
    h.Len = 4
    h.Data = uintptr(unsafe.Pointer(&t[8]))

    fmt.Printf("test %v\n", *(*string)(unsafe.Pointer(&x[0]))) // prints test IJKL

    // Point x[1] at t[3:7].
    h = &x[1]
    h.Len = 4
    h.Data = uintptr(unsafe.Pointer(&t[3]))

    fmt.Printf("test %v\n", *(*string)(unsafe.Pointer(&x[1]))) // prints test DEFG
}

I could probably attach a fixed-length array of string headers to each client when it connects to the server (recycled when new clients connect).

This means that:
1. string data would no longer be copied around,
2. string headers would not be allocated or garbage collected, and
3. the maximum number of clients per server is known, because each client has a fixed, hardcoded number of string headers available when pulling out strings.

Am I on track, or crazy? Let me know 😀 Thanks.


Solution 1:

Use the following function to convert a byte slice to a string without allocation:

func btos(p []byte) string {
    return *(*string)(unsafe.Pointer(&p))
}
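
On Go 1.20 or later, the same zero-copy conversion can be written with unsafe.String and unsafe.SliceData, which do not rely on the header layout; reflect.StringHeader and reflect.SliceHeader are deprecated in newer Go releases. A minimal sketch, assuming Go 1.20+; the name bytesToString is hypothetical:

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString returns a string that shares memory with p (no copy).
// Requires Go 1.20+ for unsafe.String and unsafe.SliceData.
// The bytes in p must not be modified while the string is in use.
func bytesToString(p []byte) string {
	// For a nil or empty slice this yields unsafe.String(nil, 0) == "".
	return unsafe.String(unsafe.SliceData(p), len(p))
}

func main() {
	t := []byte("ABCDEFGHIJKLMNOPQRSTU") // same bytes as 65..85 above
	fmt.Println(bytesToString(t[8:12]))   // IJKL
	fmt.Println(bytesToString(t[3:7]))    // DEFG
	fmt.Printf("%q\n", bytesToString(nil)) // ""
}
```

The same rule applies as with btos: the string is only safe while the underlying bytes stay unchanged.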

The function takes advantage of the fact that the memory layout of a string header (data pointer, length) is a prefix of the memory layout of a slice header (data pointer, length, capacity).
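
The prefix relationship can be checked with unsafe.Offsetof and unsafe.Sizeof against reflect.StringHeader and reflect.SliceHeader, which mirror the runtime representation of strings and slices (they are deprecated in newer Go releases but still compile). A small sketch; the function name stringIsSlicePrefix is hypothetical:

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// stringIsSlicePrefix reports whether the string header layout is a prefix
// of the slice header layout, which is the assumption btos relies on.
func stringIsSlicePrefix() bool {
	var sh reflect.StringHeader
	var bh reflect.SliceHeader
	return unsafe.Offsetof(sh.Data) == unsafe.Offsetof(bh.Data) && // same data pointer slot
		unsafe.Offsetof(sh.Len) == unsafe.Offsetof(bh.Len) && // same length slot
		unsafe.Offsetof(bh.Cap) == unsafe.Sizeof(sh) && // Cap starts right after the shared prefix
		unsafe.Sizeof("") == unsafe.Sizeof(sh) && // string is exactly a StringHeader
		unsafe.Sizeof([]byte(nil)) == unsafe.Sizeof(bh) // slice is exactly a SliceHeader
}

func main() {
	fmt.Println("string header is a prefix of slice header:", stringIsSlicePrefix())
}
```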

Do not modify the backing array of the slice after calling this function; doing so breaks the assumption that strings are immutable.
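
To illustrate the rule, here is a sketch that deliberately breaks it: a string produced by btos observes a later write to the slice, while an ordinary string(buf) conversion, which copies, does not. The function name mutateAfterConversion is hypothetical, and the write is shown only as a counterexample:

```go
package main

import (
	"fmt"
	"unsafe"
)

func btos(p []byte) string {
	return *(*string)(unsafe.Pointer(&p))
}

// mutateAfterConversion shows why the backing array must not change:
// the zero-copy string sees the write, the copied string does not.
func mutateAfterConversion() (shared, copied string) {
	buf := []byte("ABC")
	shared = btos(buf)   // shares buf's backing array
	copied = string(buf) // normal conversion: allocates and copies
	buf[0] = 'X'         // DON'T do this in real code: "shared" changes
	return shared, copied
}

func main() {
	shared, copied := mutateAfterConversion()
	fmt.Println(shared) // XBC
	fmt.Println(copied) // ABC
}
```

In the setup from the question this is not a problem as long as the multi-gigabyte array is loaded once and never written to afterwards.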

Use the function like this:

t := []byte{
    65, 66, 67, 68, 69, 70,
    71, 72, 73, 74, 75, 76,
    77, 78, 79, 80, 81, 82,
    83, 84, 85,
}
s := btos(t[8:12])
fmt.Printf("test %v\n", s) // prints test IJKL

s = btos(t[3:7])
fmt.Printf("test %v\n", s) // prints test DEFG
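
The "without allocation" claim can be checked with testing.AllocsPerRun, which reports the average number of heap allocations per call of a function. The string returned by btos is a two-word value that is copied like an int, so no per-client pool of headers should be needed. A minimal sketch, assuming a long-lived global slice; the names data, sink and measureAllocs are hypothetical:

```go
package main

import (
	"fmt"
	"testing"
	"unsafe"
)

func btos(p []byte) string {
	return *(*string)(unsafe.Pointer(&p))
}

// data stands in for the large, long-lived byte array from the question.
var data = []byte("ABCDEFGHIJKLMNOPQRSTU")

// sink keeps results live so the compiler cannot discard the conversions.
var sink string

// measureAllocs compares btos with an ordinary copying conversion.
func measureAllocs() (zeroCopy, copying float64) {
	zeroCopy = testing.AllocsPerRun(1000, func() {
		sink = btos(data[8:12]) // header copied by value, bytes shared
	})
	copying = testing.AllocsPerRun(1000, func() {
		sink = string(data[8:12]) // new backing array on every call
	})
	return zeroCopy, copying
}

func main() {
	zeroCopy, copying := measureAllocs()
	fmt.Println("btos allocs per call:    ", zeroCopy)
	fmt.Println("string() allocs per call:", copying)
}
```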