What is an efficient way to find the most common element in a Python list?
My list items may not be hashable so can't use a dictionary. Also in case of draws the item with the lowest index should be returned. Example:
>>> most_common(['duck', 'duck', 'goose']) 'duck' >>> most_common(['goose', 'duck', 'duck', 'goose']) 'goose'
With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [
itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools import operator def most_common(L): # get an iterable of (item, iterable) pairs SL = sorted((x, i) for i, x in enumerate(L)) # print 'SL:', SL groups = itertools.groupby(SL, key=operator.itemgetter(0)) # auxiliary function to get "quality" for an item def _auxfun(g): item, iterable = g count = 0 min_index = len(L) for _, where in iterable: count += 1 min_index = min(min_index, where) # print 'item %r, count %r, minind %r' % (item, count, min_index) return count, -min_index # pick the highest-count/earliest item return max(groups, key=_auxfun)
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two
print most_common(['goose', 'duck', 'duck', 'goose'])
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)] item 'duck', count 2, minind 1 item 'goose', count 2, minind 0 goose
As you see,
SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby groups by the item only (via
operator.itemgetter). The auxiliary function, called once per grouping during the
max computation, receives and internally unpacks a group - a tuple with two items
(item, iterable) where the iterable's items are also two-item tuples,
(item, original index) [[the items of
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the
max operation will consider "better" those items that occurred earlier in the original list.
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L): groups = itertools.groupby(sorted(L)) def _auxfun((item, iterable)): return len(list(iterable)), -L.index(item) return max(groups, key=_auxfun)
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the
L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g def most_common_oneliner(L): return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))
A simpler one-liner:
def most_common(lst): return max(set(lst), key=lst.count)