Compare similarity of images using OpenCV with Python


I'm trying to compare a image to a list of other images and return a selection of images (like Google search images) of this list with up to 70% of similarity.

I get this code in this post and change for my context

# Load the images
img =cv2.imread(MEDIA_ROOT + "/uploads/imagerecognize/armchair.jpg")

# Convert them to grayscale
imgg =cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

# SURF extraction
surf = cv2.FeatureDetector_create("SURF")
surfDescriptorExtractor = cv2.DescriptorExtractor_create("SURF")
kp = surf.detect(imgg)
kp, descritors = surfDescriptorExtractor.compute(imgg,kp)

# Setting up samples and responses for kNN
samples = np.array(descritors)
responses = np.arange(len(kp),dtype = np.float32)

# kNN training
knn = cv2.KNearest()

modelImages = [MEDIA_ROOT + "/uploads/imagerecognize/1.jpg", MEDIA_ROOT + "/uploads/imagerecognize/2.jpg", MEDIA_ROOT + "/uploads/imagerecognize/3.jpg"]

for modelImage in modelImages:

    # Now loading a template image and searching for similar keypoints
    template = cv2.imread(modelImage)
    templateg= cv2.cvtColor(template,cv2.COLOR_BGR2GRAY)
    keys = surf.detect(templateg)

    keys,desc = surfDescriptorExtractor.compute(templateg, keys)

    for h,des in enumerate(desc):
        des = np.array(des,np.float32).reshape((1,128))

        retval, results, neigh_resp, dists = knn.find_nearest(des,1)
        res,dist =  int(results[0][0]),dists[0][0]

        if dist<0.1: # draw matched keypoints in red color
            color = (0,0,255)

        else:  # draw unmatched in blue color
            #print dist
            color = (255,0,0)

        #Draw matched key points on original image
        x,y = kp[res].pt
        center = (int(x),int(y)),center,2,color,-1)

        #Draw matched key points on template image
        x,y = keys[h].pt
        center = (int(x),int(y)),center,2,color,-1)


My question is, how can I compare the image with the list of images and get only the similar images? Is there any method to do this?

5/23/2017 11:48:33 AM

Accepted Answer

I suggest you to take a look to the earth mover's distance (EMD) between the images. This metric gives a feeling on how hard it is to tranform a normalized grayscale image into another, but can be generalized for color images. A very good analysis of this method can be found in the following paper:

It can be done both on the whole image and on the histogram (which is really faster than the whole image method). I'm not sure of which method allow a full image comparision, but for histogram comparision you can use the cv.CalcEMD2 function.

The only problem is that this method does not define a percentage of similarity, but a distance that you can filter on.

I know that this is not a full working algorithm, but is still a base for it, so I hope it helps.


Here is a spoof of how the EMD works in principle. The main idea is having two normalized matrices (two grayscale images divided by their sum), and defining a flux matrix that describe how you move the gray from one pixel to the other from the first image to obtain the second (it can be defined even for non normalized one, but is more difficult).

In mathematical terms the flow matrix is actually a quadridimensional tensor that gives the flow from the point (i,j) of the old image to the point (k,l) of the new one, but if you flatten your images you can transform it to a normal matrix, just a little more hard to read.

This Flow matrix has three constraints: each terms should be positive, the sum of each row should return the same value of the desitnation pixel and the sum of each column should return the value of the starting pixel.

Given this you have to minimize the cost of the transformation, given by the sum of the products of each flow from (i,j) to (k,l) for the distance between (i,j) and (k,l).

It looks a little complicated in words, so here is the test code. The logic is correct, I'm not sure why the scipy solver complains about it (you should look maybe to openOpt or something similar):

#original data, two 2x2 images, normalized
x = rand(2,2)
y = rand(2,2)

#initial guess of the flux matrix
# just the product of the image x as row for the image y as column
#This is a working flux, but is not an optimal one
F = (y.flatten()*x.flatten().reshape((y.size,-1))).flatten()

#distance matrix, based on euclidean distance
row_x,col_x = meshgrid(range(x.shape[0]),range(x.shape[1]))
row_y,col_y = meshgrid(range(y.shape[0]),range(y.shape[1]))
rows = ((row_x.flatten().reshape((row_x.size,-1)) - row_y.flatten().reshape((-1,row_x.size)))**2)
cols = ((col_x.flatten().reshape((row_x.size,-1)) - col_y.flatten().reshape((-1,row_x.size)))**2)
D = np.sqrt(rows+cols)

D = D.flatten()
x = x.flatten()
y = y.flatten()

#cost function
fun = lambda F: sum(F*D)
jac = lambda F: D
#array of constraint
#the constraint of sum one is implicit given the later constraints
cons  = []
#each row and columns should sum to the value of the start and destination array
cons += [ {'type': 'eq', 'fun': lambda F:  sum(F.reshape((x.size,y.size))[i,:])-x[i]}     for i in range(x.size) ]
cons += [ {'type': 'eq', 'fun': lambda F:  sum(F.reshape((x.size,y.size))[:,i])-y[i]} for i in range(y.size) ]
#the values of F should be positive
bnds = (0, None)*F.size

from scipy.optimize import minimize
res = minimize(fun=fun, x0=F, method='SLSQP', jac=jac, bounds=bnds, constraints=cons)

the variable res contains the result of the minimization...but as I said I'm not sure why it complains about a singular matrix.

The only problem with this algorithm is that is not very fast, so it's not possible to do it on demand, but you have to perform it with patience on the creation of the dataset and store somewhere the results

3/29/2013 11:40:54 AM

You are embarking on a massive problem, referred to as "content based image retrieval", or CBIR. It's a massive and active field. There are no finished algorithms or standard approaches yet, although there are a lot of techniques all with varying levels of success.

Even Google image search doesn't do this (yet) - they do text-based image search - e.g., search for text in a page that's like the text you searched for. (And I'm sure they're working on using CBIR; it's the holy grail for a lot of image processing researchers)

If you have a tight deadline or need to get this done and working soon... yikes.

Here's a ton of papers on the topic:

Generally you will need to do a few things:

  1. Extract features (either at local interest points, or globally, or somehow, SIFT, SURF, histograms, etc.)
  2. Cluster / build a model of image distributions

This can involve feature descriptors, image gists, multiple instance learning. etc.

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow