Recover in-memory Pages data from failed hibernation wakeup

My girlfriend's Macbook crashed while attempting to restore from a hibernated file. The progress bar stopped at ~10%, after which we restarted the computer for a normal startup.

This hibernated memory image had an unsaved document open in Pages, which we'd like to recover. There is a sleepimage in /private/var/vm, which I assume is the hibernate image which never got correctly restored. We backed up this thing to keep it alive.

We tried to strings sleepimage | grep known_substring but it returned nothing. grep -a known_substring sleepimage also did nothing, so I'm assuming that Pages didn't keep the text data in memory as plain text.

Edit: After reading this answer on Binary grep I tried to perl -ln0777e 'print unpack("H*",$1), "\n", pos() while /(null_padded_substring)/g' sleepimage, again being fruitless. I padded it with nulls in order to attempt a match for UTF-8 text. Then I tried with .* globs between each character –- still no dice.

So Pages probably doesn't store text by any common encoding in memory. I would need to find a translation rule between ASCII string and Pages data representation -- I'm thinking maybe some kind of Objective C string buffer. To me it seems very weird to store character data as anything else than a sequence of characters, but this seems to be what Pages is doing.

If you have any idea on how to figure out the in-memory representation of text inside Pages, it might be very helpful in solving this problem. Maybe I can dump and read the process memory in some simple way?

Another possible solution is simpler -- I'm assuming it is somehow possible to reboot the computer from this sleepimage, but I can't find any documentation as to how you would proceed with that. Some other users (macrumors) seem to have encountered this, but for all the forum questions I've found, none of them have responses.

The OS X version is Snow Leopard, 10.6.8.

Complex suggestions involving programming are welcome. I do C and Python.

Thank you.


Update with pictures:

  • that loobsdpkdbik identifier mentioned first, isn't one - just happend to be before my text the fist time I tried it.

  • part of the text seems to get "lost" (i.e. not saved in one continuous memory stretch) and this may worsen with RAM usage

  • you may not be able to recover meaningful text from the sleepimage

Now my original text (with typo in 1st paragraph, sry Mr. Matisse):

Hidden Gems: MoMa’s Abby Aldrich Rockefeller Sculpture Garden, designed by Philip Johnson in 1953, is a spectacular urban oasis with its reflecting pools and beautiful landscaping. This outdoor gallery is installed with changing displays of outdoor sculpture, including works by Aristide Maillol, Alexander Calder, Henri Maisse, Pablo Picasso, and Richard Serra.

While visiting the new painting and sculpture galleries at MoMa, be sure to traverse the staircase bridging the forth and fifth floors in order to see Henri Matisse’s monumental image of joy and energy, Dance (1909). The painting was originally intended to hang in the stair hall of a Russian palace in Moscow.

And the recovered text:

Hidden Gems: Ma s Abby Aldrich Rockeller Sculpre Gn, desigd by Phip John 1953, is spectacular ursithtseflecting pools autifulandscapg. This outdoor gallery is italled with changing displays of outor sculpre, includg workby Aristide Maillol, Alexander Calder, Henri Maisse, Pabloicasso, anchard Sea.

While ving the new paintg sculpture gallies at Ma, be sure to traver t stase bridging the forth fth flrsn ordeto s Henri Matse s mtal imagof joy and ey, Dan (19). The painting waorinally intded to hg t stair hall of Rsian palace Moscow.

And the screen-shots:

Original text in Pages

Recovered text from sleepimage


It seems that for an (unsaved) Pages document (almost) all characters in your text are separated by 0x00 in memory - thus STRING becomes S.T.R.I.N.G with . being 0x00. So you either have to search for that; I can recommend 0xED for a graphical front-end... ..or you search for loobsdpkdbik which seems to be (part of) an identifier, which comes 5 bytes before the text (at least only in one case).


First try, IF known_string WAS stored in plain text (not the case)

I guess you could try using

grep -Ubo --binary-files=text "known_substring" sleepimage 

From that, -U parameter specifies search on binary files, -b specifies that the offset in bytes to the matching part should be displayed and, lastly, -o specifies that only the matching part should be printed.

If that works, you would know the offset in bytes to get to that region, but I would not know exactly how to proceed there. Depending on the filetype, you could probably check for the filetype signature near that informed offset and try to isolate only the bytes that do make part of that file. For this, I guess you could either write a C program to do that, or maybe execute hexdump -s known_offset sleepimage and try getting only the bytes that relate to the file you need.

For instance, suppose I wanted to know something about Chrome:

$ sudo grep -Ubo --binary-files=text -i "chrome" sleepimage
3775011731:chrome

So I know I got an occurrence of chrome at the byte offset 3775011731. Hence I could:

$ sudo hexdump -s 3775011731 sleepimage | head -n 3
e1021b93 09 09 3c 73 74 72 69 6e 67 3e 2e 63 68 72 6f 6d
e1021ba3 65 2e 67 6f 6f 67 6c 65 2e 63 6f 6d 3c 2f 73 74
e1021bb3 72 69 6e 67 3e 0a 09 09 3c 6b 65 79 3e 45 78 70

The tricky part would be to get only the bytes you want. If the filetype has a known header you could maybe subtract the header size in bytes from the hexdump offset, so you get the file "since the beginning". If the filetype has a known "EOF" signature, you could try searching for it too and hence get only the bytes up to that point.

What is your filetype? Do you think that some procedure like this could be used in your case? Note that I have never done this before, and I am basing myself on a lot of "guesses", but I suppose something like this has a little chance of working..

Second try, a slow method for parsing all bytes

The method before does not work because it also searches only for plain text, my bet. For this second text I created a simple C program containing:

#include <stdio.h>

int main () {
  printf("assim");
  return 0;
}

So I could search for "assim", which would be your known_string, in that text. In order to know what bytes to search for I did:

$ echo -n "assim" | hexdump
0000000 61 73 73 69 6d                                 
0000005

Hence, I must find "61 73 73 69 6d". After compiling that simple C source into the program "tt", I did the following:

hexdump -v -e '/1 "%02X\n"' tt | # format output for hexdump of file tt
    pcregrep -M --color -A 3 -B 3 "61\n73\n73\n69\n6D" # get 3 bytes A-fter and 3 bytes B-fore the occurence

Which returned to me:

enter image description here

If you did something like that, I guess you could get your data.. It would be kind of slow to parse 2~8GBs of bytes though...

Note that in this approach you must find the hexes in capital letter (write 6D instead of 6d on the last grep), not in under-case letters, and use \n instead of white-spaces (so you can use -A and -B for the grep). You could use grep -i so it became case-insensitive, but it would be a little slower. Hence, just use capitals if this is used.

Or, if you want a do-all automated "script":

FILENAME=tt # file to parse looking for string
BEFORE=3 # bytes before occurrence
AFER=3 # bytes after occurrence
KNOWNSTRING="assim" # string to search for

ks_bytes="$(echo -n "$KNOWNSTRING" | hexdump | head -n1 | cut -d " " -f2- | tr '[:lower:]' '[:upper:]' | sed -e 's/ *$//g' -e 's/ /\\n/g')"

hexdump -v -e '/1 "%02X\n"' $FILENAME | pcregrep -M --color -A $AFER -B $BEFORE $ks_bytes