from list of integers, get number closest to a given value


Question

Given a list of integers, I want to find which number is the closest to a number I give in input:

>>> myList = [4, 1, 88, 44, 3]
>>> myNumber = 5
>>> takeClosest(myList, myNumber)
...
4

Is there any quick way to do this?

1
136
5/7/2017 9:24:26 PM

Accepted Answer

If we are not sure that the list is sorted, we could use the built-in min() function, to find the element which has the minimum distance from the specified number.

>>> min(myList, key=lambda x:abs(x-myNumber))
4

Note that it also works with dicts with int keys, like {1: "a", 2: "b"}. This method takes O(n) time.


If the list is already sorted, or you could pay the price of sorting the array once only, use the bisection method illustrated in @Lauritz's answer which only takes O(log n) time (note however checking if a list is already sorted is O(n) and sorting is O(n log n).)

286
5/23/2017 12:34:39 PM

I'll rename the function take_closest to conform with PEP8 naming conventions.

If you mean quick-to-execute as opposed to quick-to-write, min should not be your weapon of choice, except in one very narrow use case. The min solution needs to examine every number in the list and do a calculation for each number. Using bisect.bisect_left instead is almost always faster.

The "almost" comes from the fact that bisect_left requires the list to be sorted to work. Hopefully, your use case is such that you can sort the list once and then leave it alone. Even if not, as long as you don't need to sort before every time you call take_closest, the bisect module will likely come out on top. If you're in doubt, try both and look at the real-world difference.

from bisect import bisect_left

def take_closest(myList, myNumber):
    """
    Assumes myList is sorted. Returns closest value to myNumber.

    If two numbers are equally close, return the smallest number.
    """
    pos = bisect_left(myList, myNumber)
    if pos == 0:
        return myList[0]
    if pos == len(myList):
        return myList[-1]
    before = myList[pos - 1]
    after = myList[pos]
    if after - myNumber < myNumber - before:
       return after
    else:
       return before

Bisect works by repeatedly halving a list and finding out which half myNumber has to be in by looking at the middle value. This means it has a running time of O(log n) as opposed to the O(n) running time of the highest voted answer. If we compare the two methods and supply both with a sorted myList, these are the results:

$ python -m timeit -s "
from closest import take_closest
from random import randint
a = range(-1000, 1000, 10)" "take_closest(a, randint(-1100, 1100))"

100000 loops, best of 3: 2.22 usec per loop

$ python -m timeit -s "
from closest import with_min
from random import randint
a = range(-1000, 1000, 10)" "with_min(a, randint(-1100, 1100))"

10000 loops, best of 3: 43.9 usec per loop

So in this particular test, bisect is almost 20 times faster. For longer lists, the difference will be greater.

What if we level the playing field by removing the precondition that myList must be sorted? Let's say we sort a copy of the list every time take_closest is called, while leaving the min solution unaltered. Using the 200-item list in the above test, the bisect solution is still the fastest, though only by about 30%.

This is a strange result, considering that the sorting step is O(n log(n))! The only reason min is still losing is that the sorting is done in highly optimalized c code, while min has to plod along calling a lambda function for every item. As myList grows in size, the min solution will eventually be faster. Note that we had to stack everything in its favour for the min solution to win.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon