Python - Find dominant/most common color in an image

I'm looking for a way to find the most dominant color/tone in an image using Python. Either the average shade or the most common RGB value will do. I've looked at the Python Imaging Library and could not find anything in its manual relating to what I was looking for. I also looked briefly at VTK.

I did, however, find a PHP script which does what I need, here (login required to download). The script seems to resize the image to 150*150 to bring out the dominant colors. After that, however, I am fairly lost. I did consider writing something that would resize the image to a small size and then check every other pixel or so for its color, though I imagine this would be very inefficient (implementing this idea as a C Python module might help).

After all of that, I am still stumped. So I turn to you, SO. Is there an easy, efficient way to find the dominant color in an image?


Solution 1:

Here's code making use of Pillow and SciPy's cluster package.

For simplicity I've hardcoded the filename as "image.jpg". Resizing the image is for speed: if you don't mind the wait, comment out the resize call. When run on this sample image of blue peppers it usually says the dominant colour is #d8c865, which corresponds roughly to the bright yellowish area to the lower left of the two peppers. I say "usually" because the clustering algorithm used has a degree of randomness to it. There are various ways you could change this, but for your purposes it may suit well. (Check out the options on the kmeans2() variant if you need deterministic results.)

import binascii
from PIL import Image
import numpy as np
import scipy.cluster.vq

NUM_CLUSTERS = 5

print('reading image')
im = Image.open('image.jpg').convert('RGB')     # force three channels
im = im.resize((150, 150))      # optional, to reduce time
ar = np.asarray(im)
shape = ar.shape
# flatten to one row per pixel; kmeans expects floating-point input
ar = ar.reshape(np.prod(shape[:2]), shape[2]).astype(float)

print('finding clusters')
codes, dist = scipy.cluster.vq.kmeans(ar, NUM_CLUSTERS)
print('cluster centres:\n', codes)

vecs, dist = scipy.cluster.vq.vq(ar, codes)         # assign codes
counts = np.bincount(vecs, minlength=len(codes))    # count occurrences

index_max = np.argmax(counts)                       # find most frequent
peak = codes[index_max]
colour = binascii.hexlify(bytearray(int(c) for c in peak)).decode('ascii')
print('most frequent is %s (#%s)' % (peak, colour))

Note: when I expanded the number of clusters from 5 to 10 or 15, it frequently gave results that were greenish or bluish. Given the input image, those are reasonable results too... I can't tell which colour is really dominant in that image either, so I don't fault the algorithm!
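The remark about kmeans2() and deterministic results can be sketched as follows. This is only one possible approach, not the method used above: it passes kmeans2() an explicit matrix of starting centroids (minit='matrix') so that no random initialisation is involved. The helper name dominant_colour and the rule of seeding from evenly spaced pixels are assumptions made for illustration, and the input is synthetic data rather than a real image.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def dominant_colour(pixels, num_clusters):
    """Return the centroid of the largest k-means cluster.

    pixels is an (n, 3) array of RGB values. The starting centroids are
    chosen deterministically, so repeated runs give the same answer.
    """
    pixels = np.asarray(pixels, dtype=float)
    # Hypothetical seeding rule: take evenly spaced pixels as initial centroids.
    idx = np.linspace(0, len(pixels) - 1, num_clusters).astype(int)
    init = pixels[idx]
    # With minit='matrix' the second argument is used as the initial centroids.
    centroids, labels = kmeans2(pixels, init, minit='matrix')
    counts = np.bincount(labels, minlength=len(centroids))
    return centroids[np.argmax(counts)]


# Synthetic data: 70 reddish pixels followed by 30 bluish ones.
pixels = np.array([[200, 30, 30]] * 70 + [[10, 10, 240]] * 30, dtype=float)
first = dominant_colour(pixels, 2)
second = dominant_colour(pixels, 2)
```

On a real photo, evenly spaced pixels may repeat the same colour, in which case a cluster can end up empty; a different seeding rule may then be a better fit.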

Also a small bonus: save the reduced-size image with only the N most-frequent colours:

# bonus: save image using only the N most common colours
import imageio
c = ar.copy()
for i, code in enumerate(codes):
    c[vecs == i] = code     # paint every pixel with its cluster centre
imageio.imwrite('clusters.png', c.reshape(*shape).astype(np.uint8))
print('saved clustered image')

Solution 2:

Try Color-thief. It is based on Pillow and works really well.

Installation

pip install colorthief

Usage

from colorthief import ColorThief
color_thief = ColorThief('/path/to/imagefile')
# get the dominant color
dominant_color = color_thief.get_color(quality=1)

It can also build a color palette:

palette = color_thief.get_palette(color_count=6)

Solution 3:

The Python Imaging Library has a getcolors method on Image objects:

im.getcolors() => a list of (count, color) tuples or None

I guess you can still try resizing the image before that and see if it performs any better.
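A minimal sketch of how getcolors() might be used for this, assuming an RGB image. The helper name most_common_color is made up for illustration, and the images are built in memory so the example runs without a file. It also shows the None result you get when the image has more colors than the default maxcolors limit of 256.

```python
from PIL import Image


def most_common_color(im, maxcolors=None):
    """Return the most frequent pixel value of an image."""
    if maxcolors is None:
        # An image cannot hold more distinct colors than it has pixels.
        maxcolors = im.width * im.height
    counts = im.getcolors(maxcolors)      # list of (count, color) tuples
    return max(counts, key=lambda pair: pair[0])[1]


# Synthetic 10x10 red image with a 3x3 blue patch in one corner.
im = Image.new("RGB", (10, 10), (255, 0, 0))
im.paste((0, 0, 255), (0, 0, 3, 3))
result = most_common_color(im)

# A 300x1 strip with 300 distinct colors exceeds the default limit of 256,
# so getcolors() without arguments gives None here.
strip = Image.new("RGB", (300, 1))
strip.putdata([(i % 256, i // 256, 0) for i in range(300)])
default_result = strip.getcolors()
```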

Solution 4:

You can do this in many different ways. You don't really need scipy and k-means, since Pillow already reduces colors for you when you either resize the image or reduce it to a certain palette.

Solution 1: resize image down to 1 pixel.

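def get_dominant_color(pil_img):
    # convert() returns a new image, so keep the result
    img = pil_img.convert("RGB")
    # resample=0 is nearest-neighbour, which samples a single pixel;
    # an averaging filter such as Image.BOX blends all pixels instead
    img = img.resize((1, 1), resample=0)
    dominant_color = img.getpixel((0, 0))
    return dominant_color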

Solution 2: reduce image colors to a palette

from PIL import Image

def get_dominant_color(pil_img, palette_size=16):
    # Work on an RGB copy, resized to speed up processing
    img = pil_img.convert("RGB")
    img.thumbnail((100, 100))

    # Reduce colors (adaptive palette; Pillow uses median cut here, not k-means)
    paletted = img.convert('P', palette=Image.ADAPTIVE, colors=palette_size)

    # Find the palette index that occurs most often
    palette = paletted.getpalette()
    color_counts = sorted(paletted.getcolors(), reverse=True)
    palette_index = color_counts[0][1]
    dominant_color = palette[palette_index*3:palette_index*3+3]

    return dominant_color

Both solutions give similar results. The latter is probably more accurate, since it keeps the aspect ratio when resizing the image. It also gives you more control, since you can tweak palette_size.
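The two solutions answer slightly different questions. Shrinking to one pixel with an averaging filter effectively computes a mean color, while the palette approach picks the most frequent one. The pure-Python sketch below illustrates the difference on a synthetic pixel list. The function names are hypothetical, and it assumes pixels are plain (r, g, b) tuples, for example from list(img.getdata()).

```python
from collections import Counter


def average_color(pixels):
    """Mean of each channel, rounded down to an integer."""
    n = len(pixels)
    return tuple(sum(p[ch] for p in pixels) // n for ch in range(3))


def most_common_color(pixels):
    """The exact RGB tuple that occurs most often."""
    return Counter(pixels).most_common(1)[0][0]


# Synthetic data: 60 pure red pixels and 40 pure blue ones.
pixels = [(255, 0, 0)] * 60 + [(0, 0, 255)] * 40
avg = average_color(pixels)       # (153, 0, 102), a purple not in the data
mode = most_common_color(pixels)  # (255, 0, 0)
```

The average can be a color that never appears in the image, which may or may not be what you want from a "dominant" color.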

Solution 5:

It's not necessary to use k-means to find the dominant color as Peter suggests. This overcomplicates a simple problem. You're also restricted by the number of clusters you select, so you basically need an idea of what you're looking at.

As you mentioned and as suggested by zvone, a quick way to find the most common/dominant color is to use the Pillow library. We just need to sort the colors by their pixel count.

from PIL import Image

def find_dominant_color(filename):
    # Resizing parameters
    width, height = 150, 150
    image = Image.open(filename)
    image = image.resize((width, height), resample=0)
    # Get colors from the image object; maxcolors covers every pixel
    pixels = image.getcolors(width * height)
    # Sort them by count (first element of each tuple)
    sorted_pixels = sorted(pixels, key=lambda t: t[0])
    # Get the most frequent color
    dominant_color = sorted_pixels[-1][1]
    return dominant_color

The only problem is that getcolors() returns None when the image contains more colors than its maxcolors argument, which defaults to 256. You can deal with this by resizing the original image and passing width * height as maxcolors, as the code above does.

In all, it might not be the most precise solution but it gets the job done.
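One reason exact counting can be imprecise is that photographs rarely repeat the same RGB value, so many near-identical shades each get a small count. A possible refinement, which is not part of the answer above, is to round each channel into coarse buckets before counting. The sketch below is pure Python; the name dominant_bucket and the step of 32 are arbitrary choices for illustration.

```python
from collections import Counter


def dominant_bucket(pixels, step=32):
    """Round each channel down to a multiple of step, then count."""
    buckets = Counter(
        tuple((c // step) * step for c in p) for p in pixels
    )
    return buckets.most_common(1)[0][0]


# Three slightly different reds and two identical blues.
pixels = [(250, 10, 10), (251, 12, 9), (249, 11, 11),
          (0, 0, 255), (0, 0, 255)]
exact = Counter(pixels).most_common(1)[0][0]   # (0, 0, 255)
bucketed = dominant_bucket(pixels)             # (224, 0, 0)
```

Exact counting picks blue because each red differs by a few units, while bucketing groups the reds together. The returned value is the bucket's lower corner, not an actual pixel color.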