Recover in-memory Pages data from failed hibernation wakeup
My girlfriend's Macbook crashed while attempting to restore from a hibernated file. The progress bar stopped at ~10%, after which we restarted the computer for a normal startup.
This hibernated memory image had an unsaved document open in Pages, which we'd like to recover. There is a sleepimage
in /private/var/vm
, which I assume is the hibernate image which never got correctly restored. We backed up this thing to keep it alive.
We tried to strings sleepimage | grep known_substring
but it returned nothing. grep -a known_substring sleepimage
also did nothing, so I'm assuming that Pages didn't keep the text data in memory as plain text.
Edit: After reading this answer on Binary grep I tried to perl -ln0777e 'print unpack("H*",$1), "\n", pos() while /(null_padded_substring)/g' sleepimage
, again being fruitless. I padded it with nulls in order to attempt a match for UTF-8 text. Then I tried with .*
globs between each character –- still no dice.
So Pages probably doesn't store text by any common encoding in memory. I would need to find a translation rule between ASCII string and Pages data representation -- I'm thinking maybe some kind of Objective C string buffer. To me it seems very weird to store character data as anything else than a sequence of characters, but this seems to be what Pages is doing.
If you have any idea on how to figure out the in-memory representation of text inside Pages, it might be very helpful in solving this problem. Maybe I can dump and read the process memory in some simple way?
Another possible solution is simpler -- I'm assuming it is somehow possible to reboot the computer from this sleepimage
, but I can't find any documentation as to how you would proceed with that. Some other users (macrumors) seem to have encountered this, but for all the forum questions I've found, none of them have responses.
The OS X version is Snow Leopard, 10.6.8.
Complex suggestions involving programming are welcome. I do C and Python.
Thank you.
Update with pictures:
-
that
loobsdpkdbik
identifier mentioned first, isn't one - just happend to be before my text the fist time I tried it. -
part of the text seems to get "lost" (i.e. not saved in one continuous memory stretch) and this may worsen with RAM usage
-
you may not be able to recover meaningful text from the sleepimage
Now my original text (with typo in 1st paragraph, sry Mr. Matisse):
Hidden Gems: MoMa’s Abby Aldrich Rockefeller Sculpture Garden, designed by Philip Johnson in 1953, is a spectacular urban oasis with its reflecting pools and beautiful landscaping. This outdoor gallery is installed with changing displays of outdoor sculpture, including works by Aristide Maillol, Alexander Calder, Henri Maisse, Pablo Picasso, and Richard Serra.
While visiting the new painting and sculpture galleries at MoMa, be sure to traverse the staircase bridging the forth and fifth floors in order to see Henri Matisse’s monumental image of joy and energy, Dance (1909). The painting was originally intended to hang in the stair hall of a Russian palace in Moscow.
And the recovered text:
Hidden Gems: Ma s Abby Aldrich Rockeller Sculpre Gn, desigd by Phip John 1953, is spectacular ursithtseflecting pools autifulandscapg. This outdoor gallery is italled with changing displays of outor sculpre, includg workby Aristide Maillol, Alexander Calder, Henri Maisse, Pabloicasso, anchard Sea.
While ving the new paintg sculpture gallies at Ma, be sure to traver t stase bridging the forth fth flrsn ordeto s Henri Matse s mtal imagof joy and ey, Dan (19). The painting waorinally intded to hg t stair hall of Rsian palace Moscow.
And the screen-shots:
It seems that for an (unsaved) Pages document (almost) all characters in your text are separated by 0x00
in memory - thus
STRING
becomes S.T.R.I.N.G
with .
being 0x00
.
So you either have to search for that; I can recommend 0xED for a graphical front-end...
..or you search for only in one case).loobsdpkdbik
which seems to be (part of) an identifier, which comes 5 bytes before the text (at least
First try, IF known_string WAS stored in plain text (not the case)
I guess you could try using
grep -Ubo --binary-files=text "known_substring" sleepimage
From that, -U parameter specifies search on binary files, -b specifies that the offset in bytes to the matching part should be displayed and, lastly, -o specifies that only the matching part should be printed.
If that works, you would know the offset in bytes to get to that region, but I would not know exactly how to proceed there. Depending on the filetype, you could probably check for the filetype signature near that informed offset and try to isolate only the bytes that do make part of that file. For this, I guess you could either write a C program to do that, or maybe execute hexdump -s known_offset sleepimage
and try getting only the bytes that relate to the file you need.
For instance, suppose I wanted to know something about Chrome:
$ sudo grep -Ubo --binary-files=text -i "chrome" sleepimage
3775011731:chrome
So I know I got an occurrence of chrome at the byte offset 3775011731. Hence I could:
$ sudo hexdump -s 3775011731 sleepimage | head -n 3
e1021b93 09 09 3c 73 74 72 69 6e 67 3e 2e 63 68 72 6f 6d
e1021ba3 65 2e 67 6f 6f 67 6c 65 2e 63 6f 6d 3c 2f 73 74
e1021bb3 72 69 6e 67 3e 0a 09 09 3c 6b 65 79 3e 45 78 70
The tricky part would be to get only the bytes you want. If the filetype has a known header you could maybe subtract the header size in bytes from the hexdump offset, so you get the file "since the beginning". If the filetype has a known "EOF" signature, you could try searching for it too and hence get only the bytes up to that point.
What is your filetype? Do you think that some procedure like this could be used in your case? Note that I have never done this before, and I am basing myself on a lot of "guesses", but I suppose something like this has a little chance of working..
Second try, a slow method for parsing all bytes
The method before does not work because it also searches only for plain text, my bet. For this second text I created a simple C program containing:
#include <stdio.h>
int main () {
printf("assim");
return 0;
}
So I could search for "assim", which would be your known_string, in that text. In order to know what bytes to search for I did:
$ echo -n "assim" | hexdump
0000000 61 73 73 69 6d
0000005
Hence, I must find "61 73 73 69 6d". After compiling that simple C source into the program "tt", I did the following:
hexdump -v -e '/1 "%02X\n"' tt | # format output for hexdump of file tt
pcregrep -M --color -A 3 -B 3 "61\n73\n73\n69\n6D" # get 3 bytes A-fter and 3 bytes B-fore the occurence
Which returned to me:
If you did something like that, I guess you could get your data.. It would be kind of slow to parse 2~8GBs of bytes though...
Note that in this approach you must find the hexes in capital letter (write 6D instead of 6d on the last grep), not in under-case letters, and use \n instead of white-spaces (so you can use -A and -B for the grep). You could use grep -i
so it became case-insensitive, but it would be a little slower. Hence, just use capitals if this is used.
Or, if you want a do-all automated "script":
FILENAME=tt # file to parse looking for string
BEFORE=3 # bytes before occurrence
AFER=3 # bytes after occurrence
KNOWNSTRING="assim" # string to search for
ks_bytes="$(echo -n "$KNOWNSTRING" | hexdump | head -n1 | cut -d " " -f2- | tr '[:lower:]' '[:upper:]' | sed -e 's/ *$//g' -e 's/ /\\n/g')"
hexdump -v -e '/1 "%02X\n"' $FILENAME | pcregrep -M --color -A $AFER -B $BEFORE $ks_bytes