Get Image size WITHOUT loading image into memory
If you don't care about the image contents, PIL is probably overkill.
I suggest parsing the output of the python-magic module:
>>> import re
>>> import magic
>>> t = magic.from_file('teste.png')
>>> t
'PNG image data, 782 x 602, 8-bit/color RGBA, non-interlaced'
>>> re.search(r'(\d+) x (\d+)', t).groups()
('782', '602')
This is a wrapper around libmagic, which reads as few bytes as possible in order to identify a file type signature.
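The parsing step can be wrapped in a small helper. This is only a sketch: the name `parse_dimensions` is made up here, and it assumes the description string contains a `W x H` pair like the one shown above, which libmagic does not guarantee for every format.

```python
import re

# Matches the "782 x 602" fragment of a libmagic description string.
_DIMENSIONS = re.compile(r"(\d+) x (\d+)")


def parse_dimensions(description):
    """Return (width, height) as ints, or None if no 'W x H' pair is present."""
    match = _DIMENSIONS.search(description)
    if match is None:
        return None
    width, height = match.groups()
    return int(width), int(height)


print(parse_dimensions("PNG image data, 782 x 602, 8-bit/color RGBA, non-interlaced"))
print(parse_dimensions("JPEG image data, EXIF standard 2.21"))
```

Returning `None` instead of raising lets the caller fall back to another method when the description carries no dimensions.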
Relevant version of script:
https://raw.githubusercontent.com/scardine/image_size/master/get_image_size.py
[update]
Hmmm, unfortunately, when applied to jpegs, the above gives "'JPEG image data, EXIF standard 2.21'". No image size! – Alex Flint
Seems like jpegs are magic-resistant. :-)
I can see why: in order to get the image dimensions for JPEG files, you may have to read more bytes than libmagic likes to read.
I rolled up my sleeves and came up with this very untested snippet (get it from GitHub) that requires no third-party modules.
#!/usr/bin/env python
#-------------------------------------------------------------------------------
# Name:        get_image_size
# Purpose:     extract image dimensions given a file path using just
#              core modules
#
# Author:      Paulo Scardine (based on code from Emmanuel VAÏSSE)
#
# Created:     26/09/2013
# Copyright:   (c) Paulo Scardine 2013
# Licence:     MIT
#-------------------------------------------------------------------------------
import os
import struct


class UnknownImageFormat(Exception):
    pass


def get_image_size(file_path):
    """
    Return (width, height) for a given img file content - no external
    dependencies except the os and struct modules from core
    """
    size = os.path.getsize(file_path)

    # Open in binary mode so the header bytes are read unmodified;
    # the signatures below are bytes literals for the same reason.
    with open(file_path, "rb") as input:
        height = -1
        width = -1
        data = input.read(25)

        if (size >= 10) and data[:6] in (b'GIF87a', b'GIF89a'):
            # GIFs
            w, h = struct.unpack("<HH", data[6:10])
            width = int(w)
            height = int(h)
        elif ((size >= 24) and data.startswith(b'\211PNG\r\n\032\n')
              and (data[12:16] == b'IHDR')):
            # PNGs
            w, h = struct.unpack(">LL", data[16:24])
            width = int(w)
            height = int(h)
        elif (size >= 16) and data.startswith(b'\211PNG\r\n\032\n'):
            # older PNGs?
            w, h = struct.unpack(">LL", data[8:16])
            width = int(w)
            height = int(h)
        elif (size >= 2) and data.startswith(b'\377\330'):
            # JPEG
            msg = " raised while trying to decode as JPEG."
            input.seek(0)
            input.read(2)
            b = input.read(1)
            try:
                # Walk the segments until a frame header (0xC0-0xC3) or
                # the start of scan (0xDA) is reached.
                while (b and ord(b) != 0xDA):
                    while (ord(b) != 0xFF): b = input.read(1)
                    while (ord(b) == 0xFF): b = input.read(1)
                    if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                        input.read(3)
                        h, w = struct.unpack(">HH", input.read(4))
                        break
                    else:
                        input.read(int(struct.unpack(">H", input.read(2))[0])-2)
                    b = input.read(1)
                width = int(w)
                height = int(h)
            except struct.error:
                raise UnknownImageFormat("StructError" + msg)
            except ValueError:
                raise UnknownImageFormat("ValueError" + msg)
            except Exception as e:
                raise UnknownImageFormat(e.__class__.__name__ + msg)
        else:
            raise UnknownImageFormat(
                "Sorry, don't know how to get information from this file."
            )

    return width, height
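A quick way to sanity-check the fixed-offset part of this logic without any image files is to build headers in memory and parse them back. The sketch below is an illustration only: `size_from_header` is a hypothetical helper that covers just the GIF and PNG branches, and the headers are synthetic, not complete images.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def size_from_header(data):
    """Return (width, height) from the first bytes of a GIF or PNG, else None."""
    if len(data) >= 10 and data[:6] in (b"GIF87a", b"GIF89a"):
        # GIF stores width and height as little-endian 16-bit integers.
        return struct.unpack("<HH", data[6:10])
    if len(data) >= 24 and data.startswith(PNG_SIGNATURE) and data[12:16] == b"IHDR":
        # PNG stores them as big-endian 32-bit integers inside the IHDR chunk.
        return struct.unpack(">LL", data[16:24])
    return None


# Synthetic headers built in memory, so no image files are needed.
gif_header = b"GIF89a" + struct.pack("<HH", 782, 602)
png_header = (PNG_SIGNATURE + struct.pack(">L", 13) + b"IHDR"
              + struct.pack(">LL", 782, 602))

print(size_from_header(gif_header))
print(size_from_header(png_header))
```

Both calls should report the 782 x 602 dimensions that were packed into the headers.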
[update 2019]
Check out a Rust implementation: https://github.com/scardine/imsz
As the comments allude, PIL does not load the image into memory when calling .open. Looking at the docs of PIL 1.1.7, the docstring for .open says:
def open(fp, mode="r"):
    "Open an image file, without loading the raster data"
There are a few file operations in the source like:
...
prefix = fp.read(16)
...
fp.seek(0)
...
but these hardly constitute reading the whole file. In fact, on success .open simply returns an image object that holds on to the file object and the filename. In addition the docs say:
open(file, mode="r")
Opens and identifies the given image file.
This is a lazy operation; this function identifies the file, but the actual image data is not read from the file until you try to process the data (or call the load method).
Digging deeper, we see that .open calls _open, which is an image-format-specific overload. Each implementation of _open lives in its own file, e.g. the one for .jpeg files is in JpegImagePlugin.py. Let's look at that one in depth.
Here things get a bit trickier: it contains an infinite loop that is only broken out of when the start-of-scan marker is found:
while True:
    s = s + self.fp.read(1)
    i = i16(s)
    if i in MARKER:
        name, description, handler = MARKER[i]
        # print hex(i), name, description
        if handler is not None:
            handler(self, i)
        if i == 0xFFDA: # start of scan
            rawmode = self.mode
            if self.mode == "CMYK":
                rawmode = "CMYK;I" # assume adobe conventions
            self.tile = [("jpeg", (0,0) + self.size, 0, (rawmode, ""))]
            # self.__offset = self.fp.tell()
            break
        s = self.fp.read(1)
    elif i == 0 or i == 65535:
        # padded marker or junk; move on
        s = "\xff"
    else:
        raise SyntaxError("no marker found")
This looks like it could read the whole file if the file were malformed. If it reads the info marker OK, however, it should break out early. The function handler ultimately sets self.size, which holds the dimensions of the image.
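The early exit is easier to see in a stripped-down form. The sketch below is not PIL's code: `jpeg_size` is a made-up name, it only recognises the 0xC0 to 0xC3 frame headers, and it assumes every segment before the frame header carries a length field. The test data is a synthetic byte string laid out like a JPEG, not a decodable image.

```python
import io
import struct


def jpeg_size(stream):
    """Return (width, height) from a JPEG stream, or None if no frame header is found."""
    if stream.read(2) != b"\xff\xd8":  # start-of-image marker
        return None
    while True:
        byte = stream.read(1)
        # Skip to the next 0xFF, then past any 0xFF fill bytes.
        while byte and byte != b"\xff":
            byte = stream.read(1)
        while byte == b"\xff":
            byte = stream.read(1)
        if not byte:
            return None  # ran out of data
        marker = byte[0]
        if marker == 0xDA:
            return None  # start of scan reached without a frame header
        length_bytes = stream.read(2)
        if len(length_bytes) < 2:
            return None
        (length,) = struct.unpack(">H", length_bytes)
        if length < 2:
            return None  # malformed segment length
        if 0xC0 <= marker <= 0xC3:
            # Frame header: 1 byte of precision, then height, then width.
            payload = stream.read(5)
            if len(payload) < 5:
                return None
            _, height, width = struct.unpack(">BHH", payload)
            return width, height
        stream.seek(length - 2, io.SEEK_CUR)  # skip the rest of this segment


# Synthetic data: SOI, one APP0 segment, one SOF0 segment, then filler bytes.
soi = b"\xff\xd8"
app0 = b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
sof0 = b"\xff\xc0" + struct.pack(">HBHHB", 11, 8, 602, 782, 1) + b"\x01\x11\x00"
data = soi + app0 + sof0 + bytes(1000)

stream = io.BytesIO(data)
print(jpeg_size(stream))
print(stream.tell(), "of", len(data), "bytes consumed")
```

The stream position after the call shows that parsing stopped at the frame header instead of running through the filler bytes.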
There is a package on PyPI called imagesize that currently works for me, although it doesn't look like it is very active.
Install:
pip install imagesize
Usage:
import imagesize
width, height = imagesize.get("test.png")
print(width, height)
Homepage: https://github.com/shibukawa/imagesize_py
PyPi: https://pypi.org/project/imagesize/
I often fetch image sizes from the Internet. Downloading the whole image and then loading it just to read the dimensions is too time-consuming. My method is to feed chunks to an image parser and check after each chunk whether it can already parse the image, stopping the loop as soon as I have the information I want.
I extracted the core of my code and modified it to parse local files.
from PIL import ImageFile

ImPar = ImageFile.Parser()
with open(r"D:\testpic\test.jpg", "rb") as f:
    chunk = f.read(2048)
    count = 2048
    # Stop at end of file; comparing the bytes chunk against "" is never false.
    while chunk:
        ImPar.feed(chunk)
        if ImPar.image:
            break
        chunk = f.read(2048)
        count += 2048
print(ImPar.image.size)
print(count)
Output:
(2240, 1488)
38912
The actual file size is 1,543,580 bytes, yet only 38,912 bytes had to be read to get the image size. Hope this helps.
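The same feed-until-parsed loop can be sketched with only the standard library. This is an illustration under assumptions, not a replacement for ImageFile.Parser: `try_png_size` and `size_from_chunks` are hypothetical names, only PNG headers are recognised, and the stream is an in-memory stand-in for a file or network response.

```python
import io
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def try_png_size(buffer):
    """Return (width, height) once the buffer holds a full IHDR header, else None."""
    if len(buffer) >= 24 and buffer.startswith(PNG_SIGNATURE) and buffer[12:16] == b"IHDR":
        return struct.unpack(">LL", buffer[16:24])
    return None


def size_from_chunks(stream, chunk_size=8):
    """Read chunk_size bytes at a time; return (size, bytes_read), size is None on failure."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None, len(buffer)  # end of data without a parseable header
        buffer += chunk
        size = try_png_size(buffer)
        if size is not None:
            return size, len(buffer)


# Synthetic PNG header followed by filler bytes standing in for the image data.
header = (PNG_SIGNATURE + struct.pack(">L", 13) + b"IHDR"
          + struct.pack(">LL", 2240, 1488))
stream = io.BytesIO(header + bytes(1000))

print(size_from_chunks(stream))
```

The tiny default chunk size is only there to make the loop visible on a 24-byte header; for real downloads a size in the kilobyte range, like the 2048 used above, is more sensible.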