decompress ZIP with given encoding

I have one or more ZIP files containing files whose names are stored in a particular encoding. Say I know the encoding of those filenames, but I still don't know how to decompress them properly.

Here is an example file; it contains one file, "【SSK字幕组】The Vampire Diaries 吸血鬼日记S06E12.ass"

I know the encoding used is GB18030 (Chinese).

The question is: how do I unpack that file on FreeBSD, using unzip or another CLI utility, so that the filename comes out correctly encoded? I tried everything I could, but the result was never right. Please help.

I tried on OS X:

MBP1:test 2ge$ bsdtar xf gb18030.zip
MBP1:test 2ge$ ls
%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12/      gb18030.zip
MBP1:test 2ge$ cd %A1%BESSK%D7%D6Ļ%D7顿The\ Vampire\ Diaries\ %CE%FCѪ%B9%ED%C8ռ%C7S06E12/
MBP1:%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12 2ge$ ls
%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12.ass*
MBP1:%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12 2ge$ find . | iconv -f gb18030 -t utf-8
.
./%A1%BESSK%D7%D6L抬%D7椤縏he Vampire Diaries %CE%FC血%B9%ED%C8占%C7S06E12.ass 
MBP1:%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12 2ge$ convmv -r -f gb18030 -t utf-8 --notest .
Skipping, already UTF-8: ./%A1%BESSK%D7%D6Ļ%D7顿The Vampire Diaries %CE%FCѪ%B9%ED%C8ռ%C7S06E12.ass
Ready!

I tried something similar with unzip, but ran into the same kind of problem.

Thanks, I am now trying on FreeBSD, which I connect to over SSH from OS X (Terminal):

# locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=C

The first thing I would like to do is display Chinese names properly. I changed:

setenv LC_ALL zh_CN.GB18030
setenv LANG zh_CN.GB18030

Then I downloaded the file and ran "ls" to see the proper characters, but no luck. So I think I first have to sort out the Chinese locale, so that I can verify a correct result when I get one by comparing it. Can you also help me with this, please?

Here's what I do on Ubuntu 16.04 to unzip an archive whose filenames are in any encoding, as long as I know what that encoding is. The same method should work on FreeBSD, provided the unzip build there supports the -O and -I options (the help output below comes from the Debian build).

I double-check the exact name of the encoding, so as not to misspell it: https://www.iana.org/assignments/character-sets/character-sets.xhtml

I simply run one of:

$ unzip -O <encoding> <filename> -d <target_dir>

$ unzip -I <encoding> <filename> -d <target_dir>

choosing between -O and -I according to the help text:

$ unzip -h
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
  ...
  -O CHARSET  specify a character encoding for DOS, Windows and OS/2 archives
  -I CHARSET  specify a character encoding for UNIX and other archives
  ...

In practice this means I simply try -O first, and it usually works, because not many people create .zip files on Unix.
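
If you would rather not guess, the archive's own metadata gives a hint. The sketch below uses Python 3's zipfile module to read each entry's "version made by" host code and the UTF-8 name flag. The function name suggest_unzip_flag and the set of host codes I treat as DOS-like are my own assumptions for illustration, not something the unzip help text specifies, so take the result as a hint only.

```python
import zipfile

# Host codes treated as DOS/Windows/OS2 here. This set is an assumption
# (0 = FAT, 6 = HPFS, 10/11 = NTFS, 14 = VFAT); 3 is Unix.
DOS_LIKE_HOSTS = {0, 6, 10, 11, 14}
UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def suggest_unzip_flag(zip_source):
    """Return "-O", "-I", or None when every name is already flagged UTF-8."""
    with zipfile.ZipFile(zip_source) as zf:
        # Collect the creating system of every entry without the UTF-8 flag.
        hosts = {
            info.create_system
            for info in zf.infolist()
            if not info.flag_bits & UTF8_FLAG
        }
    if not hosts:
        return None  # nothing needs a charset option
    return "-O" if hosts <= DOS_LIKE_HOSTS else "-I"


if __name__ == "__main__":
    import sys
    print(suggest_unzip_flag(sys.argv[1]))
```

A return value of None means the names are flagged as UTF-8 and no charset option should be needed.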

So, for your specific example:

The exact encoding name is GB18030.

I use the -O flag:

$ unzip -O GB18030 gb18030.zip -d target_dir
Archive:  gb18030.zip
   creating: target_dir/【SSK字幕组】The Vampire Diaries 吸血鬼日记S06E12/
  inflating: target_dir/【SSK字幕组】The Vampire Diaries 吸血鬼日记S06E12/【SSK字幕组】The Vampire Diaries 吸血鬼日记S06E12.ass

... it works.
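
If the archive has already been extracted with bsdtar as in the question, the names on disk look like valid UTF-8 with literal %XX escapes for the bytes that could not be interpreted, which would explain why convmv reports "already UTF-8". A minimal Python 3 sketch for turning such a name back into the intended one follows. It assumes the mangling is exactly "escape the invalid bytes, keep the rest", that OS X stored the name in decomposed (NFD) form, and that the original name contained no literal percent signs; recover_name is a hypothetical helper name.

```python
import unicodedata
from urllib.parse import unquote_to_bytes


def recover_name(mangled, encoding="gb18030"):
    """Rebuild the original bytes of a percent-escaped name and decode them."""
    # OS X may hand back decomposed characters; recompose them so that
    # re-encoding to UTF-8 yields the byte pairs the archive contained.
    composed = unicodedata.normalize("NFC", mangled)
    # unquote_to_bytes encodes the non-escaped text as UTF-8 and turns
    # every %XX back into the single byte it stands for.
    raw = unquote_to_bytes(composed)
    return raw.decode(encoding)


if __name__ == "__main__":
    import sys
    for arg in sys.argv[1:]:
        print(recover_name(arg))
```

The result can then be passed to os.rename to fix the entry in place, one path component at a time.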

Method 1: use the unar utility

sudo apt-get install unar

unar -e gb18030 gb18030.zip

Method 2: use a Python script to unzip the file (reference: https://gist.github.com/usunyu/dfc6e56af6e6caab8018bef4c3f3d452#file-gbk-unzip-py )

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# unzip-gbk.py

import os
import sys
import zipfile
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--encoding", help="encoding for filename, default gbk")
parser.add_argument("-l", help="list filenames in zipfile, do not unzip", action="store_true")
parser.add_argument("file", help="process file.zip")
args = parser.parse_args()
print "Processing File " + args.file

# Note: this is Python 2 code (print statements, str.decode on the raw names).
encoding = args.encoding or "gbk"
if args.encoding:
    print "Encoding " + args.encoding
zf = zipfile.ZipFile(args.file, "r")
for name in zf.namelist():
    # Decode the raw filename bytes with the requested encoding.
    utf8name = name.decode(encoding)
    pathname = os.path.dirname(utf8name)
    if args.l:
        print "Filename " + utf8name
    else:
        print "Extracting " + utf8name
        if pathname != "" and not os.path.exists(pathname):
            os.makedirs(pathname)
        if not os.path.exists(utf8name):
            # Write in binary mode and let the with block close the file.
            with open(utf8name, "wb") as fo:
                fo.write(zf.read(name))
zf.close()

Extracting the example gb18030.zip produces the following entries:

【SSK字幕组】The Vampire Diaries 吸血鬼日记S06E12
【SSK字幕组】The Vampire Diaries 吸血鬼日记S06E12/【SSK字幕组】The Vampire Diaries 吸血鬼日记S06E12.ass
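
The script above is written for Python 2. On Python 3, zipfile already decodes names that lack the UTF-8 flag as CP437, so the usual workaround is to encode them back to CP437 and decode with the real encoding. Here is a rough Python 3 sketch of the same idea; the function names real_name and extract_with_encoding are hypothetical, and the path check is a precaution I added rather than part of the referenced gist.

```python
import os
import shutil
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def real_name(info, encoding):
    """Return the entry name decoded with the given encoding."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already decoded correctly by zipfile
    # zipfile decoded the raw bytes as CP437; undo that and redo it properly.
    return info.filename.encode("cp437").decode(encoding)


def extract_with_encoding(zip_path, target_dir, encoding="gb18030"):
    """Extract zip_path into target_dir and return the decoded entry names."""
    root = os.path.abspath(target_dir)
    names = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = real_name(info, encoding)
            dest = os.path.abspath(os.path.join(root, name))
            # Refuse entries that would land outside the target directory.
            if os.path.commonpath([root, dest]) != root:
                raise ValueError("unsafe path in archive: " + name)
            if info.is_dir():
                os.makedirs(dest, exist_ok=True)
            else:
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                with zf.open(info) as src, open(dest, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            names.append(name)
    return names


if __name__ == "__main__":
    import sys
    for extracted in extract_with_encoding(sys.argv[1], sys.argv[2]):
        print("Extracted " + extracted)
```

Run it as, for example, python3 unzip_enc.py gb18030.zip target_dir (the script filename is arbitrary). As far as I know, Python 3.11 and later also accept a metadata_encoding argument to zipfile.ZipFile, which may make the manual re-decoding unnecessary.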
