How do I use Python's itertools.groupby()?


Question

I haven't been able to find an understandable explanation of how to actually use Python's itertools.groupby() function. What I'm trying to do is this:

  • Take a list - in this case, the children of an objectified lxml element
  • Divide it into groups based on some criteria
  • Then later iterate over each of these groups separately.

I've reviewed the documentation, and the examples, but I've had trouble trying to apply them beyond a simple list of numbers.

So, how do I use of itertools.groupby()? Is there another technique I should be using? Pointers to good "prerequisite" reading would also be appreciated.

1
444
8/6/2014 9:34:45 PM

Accepted Answer

IMPORTANT NOTE: You have to sort your data first.


The part I didn't get is that in the example construction

groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
   groups.append(list(g))    # Store group iterator as a list
   uniquekeys.append(k)

k is the current grouping key, and g is an iterator that you can use to iterate over the group defined by that grouping key. In other words, the groupby iterator itself returns iterators.

Here's an example of that, using clearer variable names:

from itertools import groupby

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]

for key, group in groupby(things, lambda x: x[0]):
    for thing in group:
        print "A %s is a %s." % (thing[1], key)
    print " "

This will give you the output:

A bear is a animal.
A duck is a animal.

A cactus is a plant.

A speed boat is a vehicle.
A school bus is a vehicle.

In this example, things is a list of tuples where the first item in each tuple is the group the second item belongs to.

The groupby() function takes two arguments: (1) the data to group and (2) the function to group it with.

Here, lambda x: x[0] tells groupby() to use the first item in each tuple as the grouping key.

In the above for statement, groupby returns three (key, group iterator) pairs - once for each unique key. You can use the returned iterator to iterate over each individual item in that group.

Here's a slightly different example with the same data, using a list comprehension:

for key, group in groupby(things, lambda x: x[0]):
    listOfThings = " and ".join([thing[1] for thing in group])
    print key + "s:  " + listOfThings + "."

This will give you the output:

animals: bear and duck.
plants: cactus.
vehicles: speed boat and school bus.

597
10/24/2018 10:30:28 PM

Can you show us your code?

The example on the Python docs is quite straightforward:

groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))      # Store group iterator as a list
    uniquekeys.append(k)

So in your case, data is a list of nodes, keyfunc is where the logic of your criteria function goes and then groupby() groups the data.

You must be careful to sort the data by the criteria before you call groupby or it won't work. groupby method actually just iterates through a list and whenever the key changes it creates a new group.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon