How do I open files sent to me in a 'document envelope'?

Solution 1:

I got one of those documents myself today.

Since explaining the problem to tech support seemed likely to take longer than extracting the file myself, I wrote a small Python script that extracts and decodes the PDF document embedded in the sig file.

It assumes that the sig file contains a single attached PDF and has the same format as mine.

I hope someone finds it useful.

import base64
import sys
import xml.etree.ElementTree as ET


def decode(infile, outfile):
    tree = ET.parse(infile)
    xmlns = '{http://www.w3.org/2000/09/xmldsig#}'
    # Path to the embedded document, as found in my sig file.
    node = tree.find("./SignaturePackage/{0}Signature/{0}Object/DocumentContent".format(xmlns))
    if node is None or not node.text:
        sys.exit('No DocumentContent element found in {0}'.format(infile))
    # The document is stored as base64 text; decode it back to raw bytes.
    data = base64.b64decode(node.text)

    with open(outfile, 'wb') as f:
        f.write(data)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print('usage: python unpack.py <input_filename>')
        sys.exit(1)
    infile = sys.argv[1]
    outfile = 'out.pdf'
    decode(infile, outfile)
    print('Done. Result saved to {0}'.format(outfile))

I created a gist for this script.

You need Python 3.x installed. Put the sig file and the Python script in the same folder (or pass the full path of the sig file as the argument) and run it like so:

python unpack.py <sig_filename>

This creates a file named out.pdf in the folder you run the command from.
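If your file does not match the layout the script assumes, the lookup will find nothing. The following is a minimal sketch, not part of the original script, that searches the whole tree for any element named DocumentContent regardless of its position or namespace, and checks whether the decoded bytes start with the PDF marker. The function names extract_documents and looks_like_pdf are made up for illustration, and the sketch assumes the payload is still plain base64 text inside that element.

```python
import base64
import xml.etree.ElementTree as ET


def extract_documents(xml_text):
    """Return the decoded bytes of every DocumentContent element found."""
    root = ET.fromstring(xml_text)
    documents = []
    for element in root.iter():
        # Namespaced tags look like '{namespace}Name'; compare the local name only.
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name == 'DocumentContent' and element.text:
            documents.append(base64.b64decode(element.text))
    return documents


def looks_like_pdf(data):
    # PDF files begin with the marker '%PDF-'.
    return data.startswith(b'%PDF-')
```

Reading the sig file into a string and passing it to extract_documents gives a list with one entry per embedded document. If the list is empty, or looks_like_pdf returns False for an entry, the file probably uses a different format from the one described here.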

Solution 2:

Here's a rudimentary script you can use on Unix-like systems (and probably on Windows too, with a little modification) to extract the PDF file from the document envelope. I call it sgn2pdf, since the document envelope files have an .sgn extension. Its command-line interface is

sgn2pdf [INPUT_FILENAME] [OUTPUT_FILENAME]

That is, if you give a first argument, the script reads from that file rather than from standard input; if you give a second argument, it writes the output to that file rather than to standard output.

Source:

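#!/bin/bash
#
# Extract a PDF file from an Israeli courts' .sgn PDF document envelope

exec 3<&0 # tie (new) file descriptor 3 to what is currently the standard input
exec 4>&1 # tie (new) file descriptor 4 to what is currently the standard output

# Optional first argument: read from this file instead of standard input
if [[ $# -gt 0 ]]; then
    exec 3<"$1"
    shift
fi
# Optional second argument: write to this file instead of standard output
if [[ $# -gt 0 ]]; then
    exec 4>"$1"
    shift
fi
exec <&3 >&4
# Strip everything around the DocumentContent payload, then decode the base64 text
sed -r 's/^.*<DocumentContent[^>]*>//; s/<\/Document.*$//;' | base64 -d -i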

The base64 decoder is part of the GNU coreutils package and should be available on any Linux distribution.
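For systems without bash, sed, or GNU base64 (Windows, for instance), the same text-based approach can be sketched in Python. This is an illustrative port, not the script above: the names CONTENT_RE, extract_pdf, and main are made up, and unlike the sed expression it takes the first DocumentContent element rather than the last one on a line.

```python
import base64
import re
import sys

# Mirrors the sed expressions: capture everything between the opening
# <DocumentContent ...> tag and the closing </DocumentContent tag.
CONTENT_RE = re.compile(rb'<DocumentContent[^>]*>(.*?)</DocumentContent', re.DOTALL)


def extract_pdf(envelope_bytes):
    """Return the decoded payload, or None if no DocumentContent element is found."""
    match = CONTENT_RE.search(envelope_bytes)
    if match is None:
        return None
    # By default b64decode discards characters outside the base64 alphabet,
    # which plays the same role as `base64 -d -i`.
    return base64.b64decode(match.group(1))


def main(argv):
    # Same interface as sgn2pdf: optional input file, optional output file.
    if len(argv) > 1:
        with open(argv[1], 'rb') as source:
            envelope = source.read()
    else:
        envelope = sys.stdin.buffer.read()

    pdf = extract_pdf(envelope)
    if pdf is None:
        print('No DocumentContent element found', file=sys.stderr)
        return 1

    if len(argv) > 2:
        with open(argv[2], 'wb') as target:
            target.write(pdf)
    else:
        sys.stdout.buffer.write(pdf)
    return 0
```

To use it as a command, add a final line calling sys.exit(main(sys.argv)) and run the file with Python 3, passing the same optional arguments as sgn2pdf.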