The dot product of two n-dimensional vectors
u = [u1, u2, ..., un] and v = [v1, v2, ..., vn] is given by
u1*v1 + u2*v2 + ... + un*vn.
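As a minimal illustration of that formula (not one of the timed variants below), the definition translates directly into plain Python:

```python
def dot(u, v):
    # Dot product straight from the definition:
    # the sum of the elementwise products u_i * v_i.
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```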
A question posted yesterday encouraged me to find the fastest way to compute dot products in Python using only the standard library, no third-party modules or C/Fortran/C++ calls.
I timed four different approaches; so far the fastest seems to be d3, which uses starmap and izip from itertools.
For the code presented below, these are the elapsed times (in seconds, for one million runs):
```
d0: 12.01215
d1: 11.76151
d2: 12.54092
d3: 09.58523
```
Can you think of a faster way to do this?
```python
import timeit    # module with timing subroutines
import random    # module to generate random numbers
from itertools import imap, starmap, izip
from operator import mul

def v(N=50, min=-10, max=10):
    """Generates a random vector (in an array) of dimension N; the
    values are integers in the range [min, max]."""
    out = []
    for k in range(N):
        out.append(random.randint(min, max))
    return out

def check(v1, v2):
    if len(v1) != len(v2):
        raise ValueError, "the length of both arrays must be the same"
    pass

def d0(v1, v2):
    """d0 is the nominal approach: multiply/add in a loop."""
    check(v1, v2)
    out = 0
    for k in range(len(v1)):
        out += v1[k] * v2[k]
    return out

def d1(v1, v2):
    """d1 uses imap (from itertools)."""
    check(v1, v2)
    return sum(imap(mul, v1, v2))

def d2(v1, v2):
    """d2 uses a conventional map."""
    check(v1, v2)
    return sum(map(mul, v1, v2))

def d3(v1, v2):
    """d3 uses starmap (itertools) to apply the mul operator
    on an izipped (v1, v2)."""
    check(v1, v2)
    return sum(starmap(mul, izip(v1, v2)))

# generate the test vectors
v1 = v()
v2 = v()

if __name__ == '__main__':

    # time each function for one million runs
    t0 = timeit.Timer("d0(v1,v2)", "from dot_product import d0,v1,v2")
    t1 = timeit.Timer("d1(v1,v2)", "from dot_product import d1,v1,v2")
    t2 = timeit.Timer("d2(v1,v2)", "from dot_product import d2,v1,v2")
    t3 = timeit.Timer("d3(v1,v2)", "from dot_product import d3,v1,v2")

    print "d0 elapsed: ", t0.timeit()
    print "d1 elapsed: ", t1.timeit()
    print "d2 elapsed: ", t2.timeit()
    print "d3 elapsed: ", t3.timeit()
```
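One more stdlib variant that might be worth timing (a sketch, not part of my script above): a generator expression, which avoids the explicit operator.mul call. It is written here with the built-in zip so it also runs under Python 3; under 2.5 you would use itertools.izip instead. Whether it actually beats d3 would need measuring.

```python
def check(v1, v2):
    # same length check as in the timed functions above
    if len(v1) != len(v2):
        raise ValueError("the length of both arrays must be the same")

def d5(v1, v2):
    """d5 uses a generator expression instead of starmap/mul.
    (In Python 2.5, substitute itertools.izip for zip.)"""
    check(v1, v2)
    return sum(a * b for a, b in zip(v1, v2))
```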
Notice that the file must be named dot_product.py for the script to run; I used Python 2.5.1 on Mac OS X 10.5.8.
I ran the script for N=1000 and these are the results (in seconds, for one million runs):
```
d0: 205.35457
d1: 208.13006
d2: 230.07463
d3: 155.29670
```
I guess it is safe to assume that, indeed, option three is the fastest and option two the slowest (of the four presented).
Just for fun I wrote a "d4" which uses numpy:
```python
from numpy import dot

def d4(v1, v2):
    check(v1, v2)
    return dot(v1, v2)
```
My results (Python 2.5.1, XP Pro sp3, 2GHz Core2 Duo T7200):
```
d0 elapsed:  12.1977242918
d1 elapsed:  13.885232341
d2 elapsed:  13.7929552499
d3 elapsed:  11.0952246724
d4 elapsed:  56.3278584289  # go numpy!
```
And, for even more fun, I turned on psyco:
```
d0 elapsed:  0.965477735299
d1 elapsed:  12.5354792299
d2 elapsed:  12.9748163524
d3 elapsed:  9.78255448667
d4 elapsed:  54.4599059378
```
Based on that, I declare d0 the winner :)
@kaiser.se: I probably should have mentioned that I did convert everything to numpy arrays first:
```python
from numpy import array

v3 = [array(vec) for vec in v1]
v4 = [array(vec) for vec in v2]

# then
t4 = timeit.Timer("d4(v3,v4)", "from dot_product import d4,v3,v4")
```
And I included check(v1, v2) since it's included in the other tests. Leaving it off would give numpy an unfair advantage (though it looks like it could use one). The array conversion shaved off about a second (much less than I thought it would).
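If you want to know how much check itself contributes, its overhead can be measured in isolation. This is a standalone sketch using timeit's Python API rather than the dot_product module, so the numbers will differ from the ones quoted in this thread.

```python
import timeit

# Measure the length check alone, on two 50-element vectors,
# matching the N=50 used in the tests above.
setup = """
def check(v1, v2):
    if len(v1) != len(v2):
        raise ValueError("the length of both arrays must be the same")
v1 = list(range(50))
v2 = list(range(50))
"""

t = timeit.Timer("check(v1, v2)", setup)
print(t.timeit(number=100000))  # elapsed seconds for 100,000 calls
```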
All of my tests were run with N=50.
@nikow: I'm using numpy 1.0.4, which is undoubtedly old; it's certainly possible that they've improved performance over the year and a half since I installed it.
@kaiser.se Wow, you are totally right. I must have been thinking that these were lists of lists or something (I really have no idea what I was thinking ... +1 for pair programming).
How does this look:
```python
v3 = array(v1)
v4 = array(v2)
```
d4 elapsed: 3.22535741274
d4 elapsed: 2.09182619579
d0 still wins with Psyco, but numpy is probably better overall, especially with larger data sets.
Yesterday I was a bit bothered by my slow numpy result, since presumably numpy is used for a lot of computation and has had a lot of optimization. Obviously, though, not bothered enough to check my result :)
Here is a comparison with numpy: the fast starmap approach versus numpy.dot.
First, iteration over normal Python lists:
```
$ python -mtimeit "import numpy as np; r = range(100)" "np.dot(r,r)"
10 loops, best of 3: 316 usec per loop

$ python -mtimeit "import operator; r = range(100); from itertools import izip, starmap" "sum(starmap(operator.mul, izip(r,r)))"
10000 loops, best of 3: 81.5 usec per loop
```
Then numpy ndarray:
```
$ python -mtimeit "import numpy as np; r = np.arange(100)" "np.dot(r,r)"
10 loops, best of 3: 20.2 usec per loop

$ python -mtimeit "import operator; import numpy as np; r = np.arange(100); from itertools import izip, starmap;" "sum(starmap(operator.mul, izip(r,r)))"
10 loops, best of 3: 405 usec per loop
```
Seeing this, it seems that numpy on numpy arrays is fastest, followed by Python functional constructs working on lists.
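As an aside for anyone reading this on a later Python (an addition, not part of the original timings): in Python 3, itertools.izip is gone and the built-in zip and map are both lazy, so d1 and d2 collapse into the same thing and d3 can be written as:

```python
from itertools import starmap
from operator import mul

def d3_py3(v1, v2):
    # starmap over the lazy built-in zip; equivalent to the d3 above,
    # since Python 3's zip behaves like Python 2's itertools.izip
    return sum(starmap(mul, zip(v1, v2)))

print(d3_py3([1, 2, 3], [4, 5, 6]))  # prints 32
```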