In Python, the itertools.groupby() function lets developers group the values of an iterable by a specified key, producing another iterable of (key, group) pairs.
itertools.groupby(iterable, key=None)
iterable: any Python iterable
key: function (criterion) on which to group the iterable; if None, each element is used as its own key
groupby() is tricky, but a general rule to keep in mind when using it is this:
Always sort the items you want to group with the same key you want to use for grouping, because groupby() only groups consecutive items that share a key.
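As a minimal sketch of this rule, assuming a hypothetical list of words grouped by length (the names words and by_length are illustrative and not part of the original text):

```python
from itertools import groupby

# Hypothetical data, made up for illustration.
words = ["kiwi", "fig", "plum", "pear", "yam"]

# Sort with the same key that groupby() uses, so equal keys become adjacent.
by_length = {
    length: list(group)
    for length, group in groupby(sorted(words, key=len), key=len)
}

print(by_length)  # {3: ['fig', 'yam'], 4: ['kiwi', 'plum', 'pear']}
```

Because sorted() is stable, words of the same length keep their original relative order inside each group.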
The reader is encouraged to look at the official itertools documentation, which explains groupby() using a roughly equivalent class definition.
Say you have the string
and you would like to split it so that all the 'A's are in one list, all the 'B's in another, all the 'C's in another, and so on.
You could do something like this
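A minimal sketch of such a manual approach, assuming a hypothetical string s (the value below is made up for illustration, as are the names s_dict and ch):

```python
# Hypothetical input string, assumed for illustration.
s = "AAAABBBCCDAABBB"

s_dict = {}
for ch in s:
    # Create the list the first time a character is seen, then append to it.
    s_dict.setdefault(ch, []).append(ch)

print({ch: len(items) for ch, items in s_dict.items()})
# {'A': 6, 'B': 6, 'C': 2, 'D': 1}
```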
But for a large data set you would be building up all of these items in memory. This is where groupby() comes in.
We could get the same result in a more efficient manner by doing the following
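One possible sketch of the groupby() version, assuming a hypothetical string s and storing each group in a dictionary (the value of s and the names dic, key and group are assumptions for illustration):

```python
from itertools import groupby

# Hypothetical input string, assumed for illustration.
s = "AAAABBBCCDAABBB"

dic = {}
for key, group in groupby(s):
    # Each run of equal characters is one group; a later run with the
    # same key overwrites the earlier entry in the dictionary.
    dic[key] = list(group)

print(dic)
# {'A': ['A', 'A'], 'B': ['B', 'B', 'B'], 'C': ['C', 'C'], 'D': ['D']}
```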
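Notice that the number of 'A's in the result when we used groupby() is smaller than the actual number of 'A's in the original string. This happens because groupby() starts a new group every time the key changes, so a later run of 'A's replaces the earlier one. We can avoid that loss of information by sorting the items in s before passing them to groupby(), as shown below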
Now we have all our 'A's.
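A sketch of the sorted variant, assuming the same kind of hypothetical string s (its value and the name dic are assumptions, not taken from the original):

```python
from itertools import groupby

# Hypothetical input string, assumed for illustration.
s = "AAAABBBCCDAABBB"

# Sorting first makes all equal characters adjacent, so each key
# appears in exactly one group and nothing is overwritten.
dic = {key: list(group) for key, group in groupby(sorted(s))}

print({key: len(items) for key, items in dic.items()})
# {'A': 6, 'B': 6, 'C': 2, 'D': 1}
```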
This example illustrates how the default key is chosen if we do not specify one
Notice here that the tuple as a whole counts as one key in this list
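A small sketch of the default-key behaviour, assuming a hypothetical mixed list (the contents of list_things and the name dic are made up for illustration):

```python
from itertools import groupby

# Hypothetical data: strings plus one tuple, assumed for illustration.
list_things = ["goat", "dog", "dog", ("persons", "man", "woman"), "cow", "cow"]

# With no key function, each element itself is the key, so consecutive
# equal elements are grouped and the tuple acts as a single key.
dic = {key: list(group) for key, group in groupby(list_things)}

for key, items in dic.items():
    print(repr(key), "->", items)
```

The tuple can serve as a dictionary key here only because tuples are hashable.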
Notice in this example that mulato and camel don't show up in our result. Only the last group of elements with a given key shows up, because each later group overwrites the earlier one stored under the same key. The last result for c actually wipes out two previous results. But watch the new version, where the data is sorted first on the same key.
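The overwriting effect can be reproduced with a short sketch. It assumes a hypothetical list of fruit names and a helper group_by_first_letter, neither of which comes from the original example:

```python
from itertools import groupby


def first_letter(item):
    """Key function: group items by their first character."""
    return item[0]


def group_by_first_letter(items, presort):
    """Hypothetical helper: collect groupby() groups into a dict."""
    if presort:
        items = sorted(items, key=first_letter)
    return {key: list(group) for key, group in groupby(items, key=first_letter)}


# Hypothetical data, made up for illustration.
fruits = ["apple", "avocado", "banana", "apricot", "cherry", "blueberry"]

unsorted_result = group_by_first_letter(fruits, presort=False)
sorted_result = group_by_first_letter(fruits, presort=True)

print(unsorted_result)
# {'a': ['apricot'], 'b': ['blueberry'], 'c': ['cherry']}
print(sorted_result)
# {'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

Without sorting, the later groups for 'a' and 'b' replace the earlier ones; with sorting on the same key, every item survives.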
In this example we see what happens when we use different types of iterable.
This example below is essentially the same as the one above it. The only difference is that I have changed all the tuples to lists.
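As a rough sketch of the tuple-versus-list comparison, assuming hypothetical records and a helper group_sorted (all names and values below are illustrative):

```python
from itertools import groupby
from operator import itemgetter

first = itemgetter(0)  # key function: the first field of each record


def group_sorted(items):
    """Hypothetical helper: sort on the key, then group on the same key."""
    return {key: list(group) for key, group in groupby(sorted(items, key=first), key=first)}


# Hypothetical data, made up for illustration.
as_tuples = [("animal", "bear"), ("plant", "cactus"), ("animal", "duck")]
as_lists = [list(record) for record in as_tuples]

print(group_sorted(as_tuples))
# {'animal': [('animal', 'bear'), ('animal', 'duck')], 'plant': [('plant', 'cactus')]}
print(group_sorted(as_lists))
# {'animal': [['animal', 'bear'], ['animal', 'duck']], 'plant': [['plant', 'cactus']]}
```

With an explicit key function the grouping is the same for both container types. One difference to keep in mind is that a list, unlike a tuple, is not hashable, so it could not itself be used as a dictionary key if the default key were used.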
This modified text is an extract of the original Stack Overflow Documentation created by following contributors and released under CC BY-SA 3.0