Convert var_dump of array back to array variable
I have never really thought about this until today, but after searching the web I didn't really find anything. Maybe I wasn't wording it right in the search.
Given an array (of multiple dimensions or not):
$data = array('this' => array('is' => 'the'), 'challenge' => array('for' => array('you')));
When var_dumped:
array(2) { ["this"]=> array(1) { ["is"]=> string(3) "the" } ["challenge"]=> array(1) { ["for"]=> array(1) { [0]=> string(3) "you" } } }
The challenge is this: what is the most efficient way to turn that dump back into a usable PHP array? Something like an undump_var() function. It should work whether the data is all on one line (as output in a browser) or contains line breaks (as output to a terminal).
Is it just a matter of regex? Or is there some other way? I am looking for creativity.
UPDATE: Note that I am familiar with serialize and unserialize, folks. I am not looking for alternative solutions. This is a code challenge to see whether it can be done in an optimized and creative way, so serialize and var_export are not solutions here, nor are they the best answers.
Solution 1:
var_export or serialize is what you're looking for. var_export will render PHP-parsable array syntax, and serialize will render a non-human-readable but reversible "array to string" conversion...
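As a minimal sketch of the two round trips described above (the variable names are illustrative, and eval() appears only to show that the exported text is valid PHP, not as a recommendation):

```php
<?php
$data = array('this' => array('is' => 'the'), 'challenge' => array('for' => array('you')));

// var_export() with true as the second argument returns PHP source as a string.
$code = var_export($data, true);

// Evaluating the exported source rebuilds the array.
$fromExport = eval('return ' . $code . ';');

// serialize()/unserialize() is the other reversible pair.
$fromSerialize = unserialize(serialize($data));

var_dump($fromExport === $data);    // bool(true)
var_dump($fromSerialize === $data); // bool(true)
```

Neither of these reads var_dump output, which is why the question rules them out.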
Edit: Alright, for the challenge:
Basically, I convert the output into a serialized string (and then unserialize it). I don't claim this to be perfect, but it appears to work on some pretty complex structures that I've tried...
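For orientation, here is the serialized form of the array from the question, which is the target format that a dump has to be rewritten into. This is only an illustration of PHP's serialize() output: array(n) { corresponds to a:n:{, string(n) "..." to s:n:"...";, and an integer key [0]=> to i:0;.

```php
<?php
$data = array('this' => array('is' => 'the'), 'challenge' => array('for' => array('you')));

$serialized = serialize($data);
echo $serialized, "\n";
// a:2:{s:4:"this";a:1:{s:2:"is";s:3:"the";}s:9:"challenge";a:1:{s:3:"for";a:1:{i:0;s:3:"you";}}}
```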
function unvar_dump($str) {
    if (strpos($str, "\n") === false) {
        // Single-line dump: put every key and every value token on its own line
        $regex = array(
            '#(\\[.*?\\]=>)#',
            '#(string\\(|int\\(|bool\\(|float\\(|array\\(|NULL|object\\(|})#',
        );
        $str = preg_replace($regex, "\n\\1", $str);
        $str = trim($str);
    }
    // Map each dump line to its serialize() equivalent
    $regex = array(
        '#^\\040*NULL\\040*$#m',
        '#^\\s*array\\((.*?)\\)\\s*{\\s*$#m',
        '#^\\s*string\\((.*?)\\)\\s*(.*?)$#m',
        '#^\\s*int\\((.*?)\\)\\s*$#m',
        '#^\\s*bool\\(true\\)\\s*$#m',
        '#^\\s*bool\\(false\\)\\s*$#m',
        '#^\\s*float\\((.*?)\\)\\s*$#m',
        '#^\\s*\\[(\\d+)\\]\\s*=>\\s*$#m',
        '#\\s*?\\r?\\n\\s*#m',
    );
    $replace = array(
        'N',
        'a:\\1:{',
        's:\\1:\\2',
        'i:\\1',
        'b:1',
        'b:0',
        'd:\\1',
        'i:\\1',
        ';'
    );
    $serialized = preg_replace($regex, $replace, $str);
    // String keys such as ["foo"]=> become s:3:"foo"
    $func = function ($match) {
        return 's:' . strlen($match[1]) . ':"' . $match[1] . '"';
    };
    $serialized = preg_replace_callback(
        '#\\s*\\["(.*?)"\\]\\s*=>#',
        $func,
        $serialized
    );
    // object(Class)#id (n) { becomes O:len:"Class":n:{
    $func = function ($match) {
        return 'O:' . strlen($match[1]) . ':"' . $match[1] . '":' . $match[2] . ':{';
    };
    $serialized = preg_replace_callback(
        '#object\\((.*?)\\).*?\\((\\d+)\\)\\s*{\\s*;#',
        $func,
        $serialized
    );
    // Remove the stray separators left around braces
    $serialized = preg_replace(
        array('#};#', '#{;#'),
        array('}', '{'),
        $serialized
    );
    return unserialize($serialized);
}
I tested it on a complex structure such as:
array(4) {
  ["foo"]=>
  string(8) "Foo"bar""
  [0]=>
  int(4)
  [5]=>
  float(43.2)
  ["af"]=>
  array(3) {
    [0]=>
    string(3) "123"
    [1]=>
    object(stdClass)#2 (2) {
      ["bar"]=>
      string(4) "bart"
      ["foo"]=>
      array(1) {
        [0]=>
        string(2) "re"
      }
    }
    [2]=>
    NULL
  }
}
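To get a dump string to experiment with, the output of var_dump can be captured with output buffering. The sketch below uses a hypothetical helper name, capture_var_dump, which is not part of the answer, and it assumes the stock var_dump format (extensions such as Xdebug can change it). Collapsing whitespace mimics the one-line browser form from the question, although it would also alter whitespace inside string values.

```php
<?php
// Hypothetical helper (an assumption, not from the answer):
// returns what var_dump() would print, as a string.
function capture_var_dump($value) {
    ob_start();
    var_dump($value);
    return ob_get_clean();
}

$data = array('this' => array('is' => 'the'), 'challenge' => array('for' => array('you')));

// Multi-line form, as printed to a terminal.
$dump = capture_var_dump($data);

// One-line form, roughly as a browser would render it.
$oneLine = preg_replace('/\s+/', ' ', trim($dump));

echo $dump;
echo $oneLine, "\n";
```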
Solution 2:
There's no other way than manual parsing, type by type. I didn't add support for objects at first, but it is very similar to the array case; you just need some reflection magic to populate non-public properties as well as public ones, and to avoid triggering the constructor.
EDIT: Added support for objects... Reflection magic...
function unserializeDump($str, &$i = 0) {
    $strtok = substr($str, $i);
    switch ($type = strtok($strtok, "(")) { // get type, before first parenthesis
        case "bool":
            // "bool(true)" is 10 characters long, "bool(false)" is 11
            return strtok(")") === "true"?(bool) $i += 10:!$i += 11;
        case "int":
            // skip the 4 characters of "int(" to read the value
            $int = (int)substr($str, $i + 4);
            $i += strlen($int) + 5;
            return $int;
        case "string":
            // skip the 7 characters of "string(" to read the length
            $i += 11 + ($len = (int)substr($str, $i + 7)) + strlen($len);
            return substr($str, $i - $len - 1, $len);
        case "float":
            return (float)($float = strtok(")")) + !$i += strlen($float) + 7;
        case "NULL":
            return NULL;
        case "array":
            $array = array();
            // skip the 6 characters of "array(" to read the element count
            $len = (int)substr($str, $i + 6);
            $i = strpos($str, "\n", $i) - 1;
            for ($entries = 0; $entries < $len; $entries++) {
                $i = strpos($str, "\n", $i);
                $indent = -1 - (int)$i + $i = strpos($str, "[", $i);
                // get key int/string
                if ($str[$i + 1] == '"') {
                    // use longest possible sequence to avoid key and dump structure collisions
                    $key = substr($str, $i + 2, - 2 - $i + $i = strpos($str, "\"]=>\n ", $i));
                } else {
                    $key = (int)substr($str, $i + 1);
                    $i += strlen($key);
                }
                $i += $indent + 5; // jump line
                $array[$key] = unserializeDump($str, $i);
            }
            $i = strpos($str, "}", $i) + 1;
            return $array;
        case "object":
            // build the object without running its constructor
            $reflection = new ReflectionClass(strtok(")"));
            $object = $reflection->newInstanceWithoutConstructor();
            $len = !strtok("(") + strtok(")");
            $i = strpos($str, "\n", $i) - 1;
            for ($entries = 0; $entries < $len; $entries++) {
                $i = strpos($str, "\n", $i);
                $indent = -1 - (int)$i + $i = strpos($str, "[", $i);
                // use longest possible sequence to avoid key and dump structure collisions
                $key = substr($str, $i + 2, - 2 - $i + $i = min(strpos($str, "\"]=>\n ", $i)?:INF, strpos($str, "\":protected]=>\n ", $i)?:INF, $priv = strpos($str, "\":\"", $i)?:INF));
                if ($priv == $i) {
                    // private property: the declaring class name follows the key
                    $ref = new ReflectionClass(substr($str, $i + 3, - 3 - $i + $i = strpos($str, "\":private]=>\n ", $i)));
                    $i += $indent + 13; // jump line
                } else {
                    $i += $indent + ($str[$i+1] == ":"?15:5); // jump line
                    $ref = $reflection;
                }
                $prop = $ref->getProperty($key);
                $prop->setAccessible(true);
                $prop->setValue($object, unserializeDump($str, $i));
            }
            $i = strpos($str, "}", $i) + 1;
            return $object;
    }
    throw new Exception("Type not recognized...: $type");
}
(There are a lot of "magic" numbers involved in incrementing the string position counter $i; they are mostly just the string lengths of the keywords plus some parentheses, etc.)
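To make those numbers less magical, here is a small sketch that derives a few of them from the literal text var_dump prints around each value. The variable names are illustrative and do not appear in the parser itself.

```php
<?php
// Each offset is the length of the fixed text surrounding a dumped value.
$boolTrueLen   = strlen('bool(true)');                            // 10
$boolFalseLen  = strlen('bool(false)');                           // 11
$intWrapper    = strlen('int(') + strlen(')');                    // 5, plus the digits
$floatWrapper  = strlen('float(') + strlen(')');                  // 7, plus the digits
$stringWrapper = strlen('string(') + strlen(') "') + strlen('"'); // 11, plus length digits and content

// Worked check against one dumped value.
$dumped = 'string(3) "the"';
$len    = (int) substr($dumped, strlen('string('));               // 3
$total  = $stringWrapper + $len + strlen((string) $len);

var_dump($total === strlen($dumped)); // bool(true)
```

So the 11 in the string case is the wrapper text, and the remaining terms are the content length and the number of digits used to print that length.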