What is the fastest way to know if a value exists in a list (a list with millions of values in it) and what its index is?
I know that all values in the list are unique as in this example.
The first method I tried is (3.8 sec in my real code):
    a = [4,2,3,1,5,6]
    if a.count(7) == 1:
        b = a.index(7)
        "Do something with variable b"
The second method I tried is (2x faster: 1.9 sec in my real code):
    a = [4,2,3,1,5,6]
    try:
        b = a.index(7)
    except ValueError:
        "Do nothing"
    else:
        "Do something with variable b"
A method proposed by a Stack Overflow user (2.74 sec in my real code):
    a = [4,2,3,1,5,6]
    if 7 in a:
        a.index(7)
In my real code, the first method takes 3.81 sec and the second method takes 1.88 sec. It's a good improvement, but:
I'm a beginner with Python/scripting. Is there a faster way to do the same thing and save more processing time?
A more specific explanation of my application:
In the Blender API I can access a list of particles:
particles = [1, 2, 3, 4, etc.]
From there, I can access a particle's location:
particles[x].location = [x,y,z]
And for each particle I test if a neighbour exists by searching each particle location like so:
    if [x+1,y,z] in particles.location:
        "Find the identity of this neighbour particle in x: the particle's index in the array"
        particles.index([x+1,y,z])
7 in a
Clearest and fastest way to do it.
You can also consider using a set, but constructing that set from your list may take more time than the faster membership testing will save. The only way to be certain is to benchmark well. (This also depends on what operations you require.)
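Since the question asks for the index as well as membership, a dict from value to position is one way to get both from a single hash lookup. This is only a minimal sketch: `build_index`, `index_of` and `loc_index` are hypothetical names, and it assumes the values are unique and hashable (a location such as `[x, y, z]` would first have to become a tuple) and that the list does not change between building the dict and querying it.

```python
def build_index(values):
    """Map each value to its position; assumes values are unique and hashable."""
    return {value: i for i, value in enumerate(values)}

a = [4, 2, 3, 1, 5, 6]
index_of = build_index(a)      # one pass over the list, done once

b = index_of.get(7)            # None when the value is absent
if b is not None:
    print("found at index", b)

# Same idea for 3D positions, with tuples as keys (hypothetical sample data).
locations = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
loc_index = build_index(locations)
x, y, z = locations[0]
neighbour = loc_index.get((x + 1, y, z))   # index of the neighbour in x, or None
```

Building the dict costs one pass over the list, so it is likely to pay off only when many lookups are made against the same list; as with the set, benchmarking on the real data is the way to be sure.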
As stated by others, `in` can be very slow for large lists. Here are some comparisons of the performance of `in`, `set` and `bisect`. Note that the time (in seconds) is plotted on a log scale.
Code for testing:
    import random
    import bisect
    import matplotlib.pyplot as plt
    import math
    import time

    def method_in(a, b, c):
        # Linear scan of list b for every element of a.
        start_time = time.time()
        for i, x in enumerate(a):
            if x in b:
                c[i] = 1
        return time.time() - start_time

    def method_set_in(a, b, c):
        # Build a set once, then use hash lookups.
        start_time = time.time()
        s = set(b)
        for i, x in enumerate(a):
            if x in s:
                c[i] = 1
        return time.time() - start_time

    def method_bisect(a, b, c):
        # Sort b once (in place), then binary-search it.
        start_time = time.time()
        b.sort()
        for i, x in enumerate(a):
            index = bisect.bisect_left(b, x)
            # The bound must be the length of b, the list being searched.
            if index < len(b):
                if x == b[index]:
                    c[i] = 1
        return time.time() - start_time

    def profile():
        time_method_in = []
        time_method_set_in = []
        time_method_bisect = []

        Nls = [x for x in range(1000, 20000, 1000)]
        for N in Nls:
            a = [x for x in range(0, N)]
            random.shuffle(a)
            b = [x for x in range(0, N)]
            random.shuffle(b)
            c = [0 for x in range(0, N)]

            time_method_in.append(math.log(method_in(a, b, c)))
            time_method_set_in.append(math.log(method_set_in(a, b, c)))
            time_method_bisect.append(math.log(method_bisect(a, b, c)))

        plt.plot(Nls, time_method_in, marker='o', color='r', linestyle='-', label='in')
        plt.plot(Nls, time_method_set_in, marker='o', color='b', linestyle='-', label='set')
        plt.plot(Nls, time_method_bisect, marker='o', color='g', linestyle='-', label='bisect')
        plt.xlabel('list size', fontsize=18)
        plt.ylabel('log(time)', fontsize=18)
        plt.legend(loc='upper left')
        plt.show()

    profile()
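The bisect lookup used in the test code can also be pulled out into a small helper that reports the position as well as membership. This is a sketch rather than part of the benchmark: `find_sorted` is a hypothetical name, the input is assumed to be sorted already, and the position it returns refers to the sorted list, not to the original unsorted one.

```python
import bisect

def find_sorted(sorted_values, x):
    """Return the position of x in sorted_values, or -1 if it is absent."""
    i = bisect.bisect_left(sorted_values, x)
    # bisect_left can return len(sorted_values), so check the bound first.
    if i < len(sorted_values) and sorted_values[i] == x:
        return i
    return -1

values = sorted([4, 2, 3, 1, 5, 6])
pos_found = find_sorted(values, 5)     # position within the sorted list
pos_missing = find_sorted(values, 7)   # -1, since 7 is not present
```

If the index in the original list is what matters, sorting loses it, so a value-to-index dict is probably the more direct tool for that case.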