Get files names inside a zip file on FTP server without downloading whole archive
You can implement a file-like object that reads data from FTP, instead of a local file. And pass that to ZipFile
constructor, instead of a (local) file name.
A trivial implementation can be like:
from ftplib import FTP
from ssl import SSLSocket
class FtpFile:
def __init__(self, ftp, name):
self.ftp = ftp
self.name = name
self.size = ftp.size(name)
self.pos = 0
def seek(self, offset, whence):
if whence == 0:
self.pos = offset
if whence == 1:
self.pos += offset
if whence == 2:
self.pos = self.size + offset
def tell(self):
return self.pos
def read(self, size = None):
if size == None:
size = self.size - self.pos
data = B""
# Based on FTP.retrbinary
# (but allows stopping after certain number of bytes read)
# An alternative implementation is at
# https://stackoverflow.com/q/58819210/850848#58819362
ftp.voidcmd('TYPE I')
cmd = "RETR {}".format(self.name)
conn = ftp.transfercmd(cmd, self.pos)
try:
while len(data) < size:
buf = conn.recv(min(size - len(data), 8192))
if not buf:
break
data += buf
# shutdown ssl layer (can be removed if not using TLS/SSL)
if SSLSocket is not None and isinstance(conn, SSLSocket):
conn.unwrap()
finally:
conn.close()
try:
ftp.voidresp()
except:
pass
self.pos += len(data)
return data
And then you can use it like:
ftp = FTP(host, user, passwd)
ftp.cwd(path)
ftpfile = FtpFile(ftp, "archive.zip")
zip = zipfile.ZipFile(ftpfile)
print(zip.namelist())
The above implementation is rather trivial and inefficient. It starts numerous (three at minimum) downloads of small chunks of data to retrieve a list of contained files. It can be optimized by reading and caching larger chunks. But it should give your the idea.
Particularly you can make use of the fact that you are going to read a listing only. The listing is located at the and of a ZIP archive. So you can just download last (about) 10 KB worth of data at the start. And you will be able to fulfill all read
calls out of that cache.
Knowing that, you can actually do a small hack. As the listing is at the end of the archive, you can actually download the end of the archive only. While the downloaded ZIP will be broken, it still can be listed. This way, you won't need the FtpFile
class. You can even download the listing to memory (StringIO
).
zipstring = StringIO()
name = "archive.zip"
size = ftp.size(name)
ftp.retrbinary("RETR " + name, zipstring.write, rest = size - 10*2024)
zip = zipfile.ZipFile(zipstring)
print(zip.namelist())
If you get BadZipfile
exception because the 10 KB is too small to contain whole listing, you can retry the code with a larger chunk.