# Optimized dot product in Python

### Question

The dot product of two n-dimensional vectors `u=[u1,u2,...,un]` and `v=[v1,v2,...,vn]` is given by `u1*v1 + u2*v2 + ... + un*vn`.

A question posted yesterday encouraged me to find the fastest way to compute dot products in Python using only the standard library, no third-party modules or C/Fortran/C++ calls.

I timed four different approaches; so far the fastest seems to be `sum(starmap(mul,izip(v1,v2)))` (where `starmap` and `izip` come from the `itertools` module).
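(Side note for readers on Python 3: `izip` and `imap` were removed there because the built-in `zip` and `map` are already lazy, so the winning expression can be written more directly. A sketch of the equivalent:)

```python
from operator import mul

u = [1, 2, 3]
v = [4, 5, 6]

# Python 3's map is lazy, like Python 2's itertools.imap, so this is
# the direct equivalent of sum(starmap(mul, izip(u, v))):
print(sum(map(mul, u, v)))  # 1*4 + 2*5 + 3*6 = 32
```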

For the code presented below, these are the elapsed times (in seconds, for one million runs):

```
d0: 12.01215
d1: 11.76151
d2: 12.54092
d3: 09.58523
```

Can you think of a faster way to do this?

```
import timeit   # timing utilities
import random   # random number generation
from itertools import imap, starmap, izip
from operator import mul

def v(N=50, min=-10, max=10):
    """Generates a random vector (as a list) of dimension N; the
    values are integers in the range [min, max]."""
    out = []
    for k in range(N):
        out.append(random.randint(min, max))
    return out

def check(v1, v2):
    if len(v1) != len(v2):
        raise ValueError, "the length of both vectors must be the same"

def d0(v1, v2):
    """
    d0 is the nominal approach: an explicit loop over indices.
    """
    check(v1, v2)
    out = 0
    for k in range(len(v1)):
        out += v1[k] * v2[k]
    return out

def d1(v1, v2):
    """
    d1 uses imap (from itertools).
    """
    check(v1, v2)
    return sum(imap(mul, v1, v2))

def d2(v1, v2):
    """
    d2 uses the built-in map.
    """
    check(v1, v2)
    return sum(map(mul, v1, v2))

def d3(v1, v2):
    """
    d3 uses starmap (from itertools) to apply mul to an izipped (v1, v2).
    """
    check(v1, v2)
    return sum(starmap(mul, izip(v1, v2)))

# generate the two test vectors of dimension N
v1 = v()
v2 = v()

if __name__ == '__main__':
    t0 = timeit.Timer("d0(v1,v2)", "from dot_product import d0,v1,v2")
    t1 = timeit.Timer("d1(v1,v2)", "from dot_product import d1,v1,v2")
    t2 = timeit.Timer("d2(v1,v2)", "from dot_product import d2,v1,v2")
    t3 = timeit.Timer("d3(v1,v2)", "from dot_product import d3,v1,v2")

    print "d0 elapsed: ", t0.timeit()
    print "d1 elapsed: ", t1.timeit()
    print "d2 elapsed: ", t2.timeit()
    print "d3 elapsed: ", t3.timeit()
```

Note that the file must be named `dot_product.py` for the script to run; I used Python 2.5.1 on Mac OS X 10.5.8.
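As an aside, the fixed-filename requirement can be avoided by passing a callable to `timeit` instead of statement/setup strings (callables are accepted since Python 2.6). A minimal sketch, shown here in Python 3 syntax:

```python
import timeit
from operator import mul

v1 = list(range(50))
v2 = list(range(50))

def d3(a, b):
    # Python 3 port of the starmap/izip version: map is lazy here.
    return sum(map(mul, a, b))

# Passing a callable removes the "from dot_product import ..." setup
# string, so the script can live in a file with any name.
elapsed = timeit.timeit(lambda: d3(v1, v2), number=10000)
print("d3 elapsed:", elapsed)
```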

EDIT:

I ran the script for N=1000 and these are the results (in seconds, for one million runs):

```
d0: 205.35457
d1: 208.13006
d2: 230.07463
d3: 155.29670
```

I guess it is safe to conclude that d3 is indeed the fastest and d2 the slowest of the four approaches presented.
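One standard-library variant not timed above is a plain generator expression, which needs no imports at all; in my experience it tends to land close to, but usually a bit behind, the starmap version, so it may be worth adding to the benchmark (the name `d_gen` is just for illustration):

```python
def d_gen(v1, v2):
    # Multiply the pairs directly; zip here is Python 3's lazy zip
    # (use itertools.izip on Python 2).
    return sum(x * y for x, y in zip(v1, v2))

print(d_gen([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```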


Just for fun I wrote a "d4" which uses numpy:

```
from numpy import dot

def d4(v1, v2):
    check(v1, v2)
    return dot(v1, v2)
```

My results (Python 2.5.1, XP Pro sp3, 2GHz Core2 Duo T7200):

```
d0 elapsed:  12.1977242918
d1 elapsed:  13.885232341
d2 elapsed:  13.7929552499
d3 elapsed:  11.0952246724
d4 elapsed:  56.3278584289  # go numpy!
```

And, for even more fun, I turned on psyco:

```
d0 elapsed:  0.965477735299
d1 elapsed:  12.5354792299
d2 elapsed:  12.9748163524
d3 elapsed:  9.78255448667
d4 elapsed:  54.4599059378
```

Based on that, I declare d0 the winner :)

Update

@kaiser.se: I probably should have mentioned that I did convert everything to numpy arrays first:

```
from numpy import array
v3 = [array(vec) for vec in v1]
v4 = [array(vec) for vec in v2]

# then
t4 = timeit.Timer("d4(v3,v4)","from dot_product import d4,v3,v4")
```

And I included `check(v1, v2)` since it's included in the other tests. Leaving it off would give numpy an unfair advantage (though it looks like it could use one). The array conversion shaved off about a second (much less than I thought it would).

All of my tests were run with N=50.

@nikow: I'm using numpy 1.0.4, which is undoubtedly old; it's certainly possible that performance has improved in the year and a half since I installed it.

Update #2

@kaiser.se Wow, you are totally right. I must have been thinking that these were lists of lists or something (I really have no idea what I was thinking ... +1 for pair programming).

How does this look:

```
v3 = array(v1)
v4 = array(v2)
```

New results:

```
d4 elapsed:  3.22535741274
```

With Psyco:

```
d4 elapsed:  2.09182619579
```

d0 still wins with Psyco, but numpy is probably better overall, especially with larger data sets.

Yesterday I was a bit bothered by my slow numpy result, since presumably numpy is used for a lot of computation and has had a lot of optimization. Obviously, though, not bothered enough to check my result :)


Here is a comparison with numpy. We compare the fast starmap approach with `numpy.dot`.

First, iteration over normal Python lists:

```
$ python -mtimeit "import numpy as np; r = range(100)" "np.dot(r,r)"
10 loops, best of 3: 316 usec per loop

$ python -mtimeit "import operator; r = range(100); from itertools import izip, starmap" "sum(starmap(operator.mul, izip(r,r)))"
10000 loops, best of 3: 81.5 usec per loop
```

Then numpy ndarray:

```
$ python -mtimeit "import numpy as np; r = np.arange(100)" "np.dot(r,r)"
10 loops, best of 3: 20.2 usec per loop

$ python -mtimeit "import operator; import numpy as np; r = np.arange(100); from itertools import izip, starmap;" "sum(starmap(operator.mul, izip(r,r)))"
10 loops, best of 3: 405 usec per loop
```
``````

Seeing this, it seems that `numpy.dot` on numpy arrays is fastest, and that the functional constructs are the fastest option for plain Python lists; mixing the two (calling `np.dot` on lists, or iterating in pure Python over ndarrays) is slower than either.
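For reference, the list-based shell one-liner above translates directly to the `timeit` module; a sketch in Python 3, where `izip` becomes the built-in `zip`:

```python
import timeit

# Pure-Python functional dot product over a plain list, timed the
# same way as the shell one-liner above:
t = timeit.timeit(
    "sum(starmap(mul, zip(r, r)))",
    setup="from itertools import starmap; from operator import mul; "
          "r = list(range(100))",
    number=10000,
)
print("starmap over list: %.2f usec per loop" % (t / 10000 * 1e6))
```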