How to properly deobfuscate a Perl script?

Caution: don't blindly run obfuscated Perl, especially if there's an eval, backticks, system, open, etc. call somewhere in it, and that might not be all that obvious*. De-obfuscating it with B::Deparse and carefully replacing the evals with print statements is a must until you understand what's going on. Running it in a sandbox, as an unprivileged user, or in a VM should be considered too.

*s&&$_&ee; evaluates $_, for instance.
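
To make that footnote concrete, here is a minimal and harmless sketch. The flag $ran and both payload strings are made up for illustration; nothing here comes from the script in the question.

```perl
use strict;
use warnings;

our $ran = 0;                  # hypothetical flag, only here to show that code ran
$_ = '$main::ran = 1';         # harmless stand-in for a hidden payload

# s&&$_&ee is s/PATTERN/REPLACEMENT/ee with '&' as the delimiter:
# empty pattern, replacement $_, and /ee evaluates the replacement
# once to get the string held in $_ and a second time as Perl code.
s&&$_&ee;

print "payload ran: $ran\n";

# Safer while analysing: print the code instead of evaluating it.
my $payload = 'system("echo pwned")';   # never executed
print "would eval: $payload\n";
```

No eval keyword appears anywhere, yet the string in $_ was executed, which is why grepping for "eval" alone is not enough.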


First observation: 034 is octal. It's equal to 28 (dec) or 0x1c (hex), so nothing fishy there.

The $; thing is purely obfuscation; I can't find a reason to use that in particular. ($; is Perl's subscript separator, "\034" by default.) $p will just be the string A.T.C.G (with each . replaced by $;, whatever it is).
So in the regex, [$p] matches any of {'A', 'T', 'C', 'G', $;}. Since $; never appears in $d, it's useless there. In turn, [$p]{4} matches any sequence of four characters from the above set, as if this had been used (ignoring the useless $;):

while ( $d =~ /([ATCG]{4})/g ) { ... }

If you had to write this yourself, after having removed whitespace, you'd just grab each successive substring of $d of length four (assuming there are no other chars in $d).
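
A quick way to convince yourself of this, as a sketch: the way $p is built below (join with $;) is my assumption about the original script, and the sample value of $d is made up.

```perl
use strict;
use warnings;

my $p = join $;, qw(A T C G);          # assumed construction: A, T, C, G glued with $;
my $d = 'CAATTCCTGGCT';                # sample data, whitespace already removed

# The obfuscated way: a character class that also contains $;
my @by_regex;
while ( $d =~ /([$p]{4})/g ) { push @by_regex, $1 }

# The plain way: fixed-width chunks of four characters
my @by_unpack = unpack '(a4)*', $d;

printf "separator is chr(%d)\n", ord($;);
print "@by_regex\n";
print "@by_unpack\n";
```

Both lists come out identical as long as $d contains nothing but A, T, C and G.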

Now this part is fun:

foreach $d ( 0 .. 3 ) {
    $c += $a{ substr $1, $d, 1 } * 4**$d;
}
  • $1 holds the current four-letter codepoint. substr $1, $d, 1 returns each successive letter from that codepoint.
  • %a maps A to 00b (binary), T to 01b, C to 10b, and G to 11b.

    A   00
    T   01
    C   10
    G   11
    
  • multiplying by 4**$d will be equivalent to a bitwise left shift of 0, 2, 4 and 6.

So this funny construct allows you to build any 8-bit value in the base-four system, with A, T, C and G as digits!

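i.e. it does the following conversions (the first letter is the least significant digit, so the letters shown above the bits are in reverse order):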

         A A A A
AAAA -> 00000000

         T A A T
TAAT -> 01000001 -> capital A in ascii

         T A A C
CAAT -> 01000010 -> capital B in ascii

CAATTCCTGGCTGTATTTCTTTCTGCCT -> BioGeek
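
The same conversion written out plainly, as a sketch: decode_dna and %digit are names I made up (%digit plays the role of %a), and the shift replaces the 4**$d multiplication.

```perl
use strict;
use warnings;

my %digit = ( A => 0, T => 1, C => 2, G => 3 );   # base-four digit values

sub decode_dna {                                   # hypothetical helper name
    my ($d) = @_;
    my $out = '';
    while ( $d =~ /([ATCG]{4})/g ) {
        my $quad = $1;
        my $c    = 0;
        # The first letter is the least significant digit, so letter
        # number $_ is shifted left by 2 * $_ bits (same as * 4**$_).
        $c += $digit{ substr $quad, $_, 1 } << ( 2 * $_ ) for 0 .. 3;
        $out .= chr $c;
    }
    return $out;
}

print decode_dna('CAATTCCTGGCTGTATTTCTTTCTGCCT'), "\n";   # prints BioGeek
```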

This part:

next if $j++ % 96 >= 16;

makes the above conversion run only for the first 16 "codepoints", skips the next 80, then converts for the next 16, skips the next 80, etc. It essentially just skips parts of the ellipse (junk DNA removal system).
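
To see which groups survive that test, here is a small sketch that runs the same condition over 192 dummy group indices (the count and the @kept array are only for illustration):

```perl
use strict;
use warnings;

my $j = 0;
my @kept;
for my $i ( 0 .. 191 ) {            # pretend there are 192 four-letter groups
    next if $j++ % 96 >= 16;        # same test as in the script
    push @kept, $i;
}

print "kept ", scalar(@kept), " groups: $kept[0]..$kept[15] and $kept[16]..$kept[31]\n";
```

This prints "kept 32 groups: 0..15 and 96..111", i.e. 16 groups converted, 80 skipped, and so on.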


Here's an ugly text to DNA converter that you could use to produce anything to replace the helix (doesn't handle the 80 skip thing):

use strict;
use warnings;
my $in = shift;

my %conv = ( 0 => 'A', 1 => 'T', 2 => 'C', 3 => 'G');

# Encode each character as four base-four digits, least significant first.
for (my $i=0; $i<length($in); $i++) {
    my $chr = substr($in, $i, 1);
    my $chv = ord($chr);
    my $encoded ="";
    $encoded .= $conv{($chv >> 0) & 0x3};   # bits 0-1
    $encoded .= $conv{($chv >> 2) & 0x3};   # bits 2-3
    $encoded .= $conv{($chv >> 4) & 0x3};   # bits 4-5
    $encoded .= $conv{($chv >> 6) & 0x3};   # bits 6-7
    print $encoded;
}
print "\n";

$ perl q.pl 'print "BioGeek\n";'
AAGTCAGTTCCTCGCTATGTAACACACACAATTCCTGGCTGTATTTCTTTCTGCCTAGTTCGCTCACAGCGA

Stick that into $d instead of the helix (and remove the skipping part in the decoder).