Is there a built-in that removes duplicates from a list in Python, whilst preserving order? I know that I can use a set to remove duplicates, but that destroys the original order. I also know that I can roll my own like this:
```python
def uniq(input):
    output = []
    for x in input:
        if x not in output:
            output.append(x)
    return output
```
But I'd like to avail myself of a built-in or a more Pythonic idiom if possible.
Here are some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark
```python
def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
```
Why assign `seen.add` to `seen_add` instead of just calling `seen.add`? Python is a dynamic language, and resolving `seen.add` on each iteration is more costly than resolving a local variable. `seen.add` could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.
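To make that cost concrete, here is a small `timeit` sketch (timings vary by machine and Python version; the point is only the relative difference between attribute lookup and a pre-bound local):

```python
import timeit

setup = "seen = set(); data = list(range(1000))"

# Attribute lookup on every iteration: seen.add is re-resolved each time.
attr = timeit.timeit("for x in data: seen.add(x)", setup=setup, number=1000)

# Bind the method once to a local name, then call the local.
bound = timeit.timeit(
    "for x in data: seen_add(x)",
    setup=setup + "; seen_add = seen.add",
    number=1000,
)

print(f"attribute lookup: {attr:.3f}s, pre-bound local: {bound:.3f}s")
```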
If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set (http://code.activestate.com/recipes/528878/), which gives you O(1) insertion, deletion, and membership checks per operation.
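The linked recipe predates modern Python; as a minimal sketch (not the recipe's actual implementation), an ordered set with the same O(1) operations can be built on a plain dict, whose keys preserve insertion order since Python 3.7:

```python
class OrderedSet:
    """Minimal ordered-set sketch backed by an insertion-ordered dict."""

    def __init__(self, iterable=()):
        self._data = dict.fromkeys(iterable)  # keys act as the set members

    def add(self, item):           # O(1) amortized
        self._data[item] = None

    def discard(self, item):       # O(1)
        self._data.pop(item, None)

    def __contains__(self, item):  # O(1)
        return item in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

s = OrderedSet([1, 2, 0, 1, 3, 2])
print(list(s))  # [1, 2, 0, 3]
```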
(Small additional note: `seen.add()` always returns `None`, so the `or` above is there only as a way to attempt a set update, and not as an integral part of the logical test.)
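To spell the short-circuit out (a small demonstration, not part of the original answer): `x in seen` is the real test, and `seen.add(x)` only runs when `x` is new; since it returns `None` (falsy), it never flips the result.

```python
seen = set()

# The membership check carries the logic; the add() call is a side effect
# that fires only when the `or` has to evaluate its right-hand side.
result = [x for x in [1, 2, 0, 1, 3, 2] if not (x in seen or seen.add(x))]
print(result)                # [1, 2, 0, 3]
print(seen.add(0) is None)   # True: set.add always returns None
```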
As Raymond pointed out, in Python 3.5+, where `OrderedDict` is implemented in C, the list comprehension approach will be slower than `OrderedDict` (unless you actually need the list at the end, and even then only if the input is very short). So the best solution for 3.5+ is `OrderedDict`.
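Going slightly beyond the 3.5+ note above: on Python 3.7+ the plain built-in dict also preserves insertion order, so the same idiom works without any import:

```python
items = [1, 2, 0, 1, 3, 2]

# dict keys are unique and, since Python 3.7, insertion-ordered,
# so fromkeys deduplicates while keeping first-seen order.
print(list(dict.fromkeys(items)))  # [1, 2, 0, 3]
```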
**Important Edit 2015**
As @abarnert notes, the `more_itertools` library (`pip install more_itertools`) contains a `unique_everseen` function that is built to solve this problem without any unreadable (`not seen.add`) mutations in list comprehensions. This is also the fastest solution:
```python
>>> from more_itertools import unique_everseen
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(unique_everseen(items))
[1, 2, 0, 3]
```
Just one simple library import and no hacks.
This comes from an implementation of the itertools recipe `unique_everseen`, which looks like:
```python
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
```
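A quick check of the `key=` parameter, matching the docstring's examples; the recipe is restated here in condensed form so the snippet runs standalone:

```python
def unique_everseen(iterable, key=None):
    # Condensed restatement of the itertools recipe above.
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element

# With no key, elements themselves are compared; with key=str.lower,
# 'c' is treated as a duplicate of 'C'.
print(list(unique_everseen('AAAABBBCCDAABBB')))     # ['A', 'B', 'C', 'D']
print(list(unique_everseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'D']
```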
```python
>>> from collections import OrderedDict
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(OrderedDict.fromkeys(items))
[1, 2, 0, 3]
```
This looks much nicer than:
```python
seen = set()
[x for x in seq if x not in seen and not seen.add(x)]
```
and doesn't utilize the ugly hack `not seen.add(x)`, which relies on the fact that `set.add` is an in-place method that always returns `None`, so `not None` evaluates to `True`.
Note, however, that the hack solution is faster in raw speed, though both have the same O(N) runtime complexity.
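To verify on your own machine (the relative results vary by Python version, which is exactly the 3.5+ caveat above), a minimal benchmark sketch:

```python
import timeit
from collections import OrderedDict

items = list(range(500)) * 4  # 2000 elements, 75% duplicates

def dedupe_hack(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

def dedupe_odict(seq):
    return list(OrderedDict.fromkeys(seq))

# Both approaches produce the same order-preserving result.
assert dedupe_hack(items) == dedupe_odict(items)

t_hack = timeit.timeit(lambda: dedupe_hack(items), number=1000)
t_odict = timeit.timeit(lambda: dedupe_odict(items), number=1000)
print(f"set hack: {t_hack:.3f}s  OrderedDict: {t_odict:.3f}s")
```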