Python join: why is it string.join(list) instead of list.join(string)?


Question

This has always confused me. It seems like this would be nicer:

my_list = ["Hello", "world"]
print(my_list.join("-"))
# Would produce: "Hello-world"

Than this:

my_list = ["Hello", "world"]
print("-".join(my_list))
# Produces: "Hello-world"

Is there a specific reason it is like this?

12/21/2016 11:29:18 AM

Accepted Answer

It's because any iterable can be joined, not just lists, but the result and the "joiner" are always strings.

For example:

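# Python 2 example: urlopen() returns a file-like object that
# iterates over the lines of the response, and it is not a list.
import urllib2
print('\n############\n'.join(
    urllib2.urlopen('http://data.stackexchange.com/users/7095')))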
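
The same point can be seen offline. The sketch below is not from the original answer; it assumes Python 3 (3.7+ for the dict ordering), and all variable names are made up for illustration. It joins a tuple, a generator, and a dict with the same call:

```python
# Any iterable of strings can be joined; the names below are illustrative only.
as_tuple = ("Hello", "world")
as_generator = (word.upper() for word in as_tuple)
as_dict = {"Hello": 1, "world": 2}  # iterating a dict yields its keys

from_tuple = "-".join(as_tuple)
from_generator = "-".join(as_generator)
from_dict = "-".join(as_dict)

print(from_tuple)      # Hello-world
print(from_generator)  # HELLO-WORLD
print(from_dict)       # Hello-world (insertion order, Python 3.7+)
```

None of these three types needs a join method of its own; the string does all the work.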
10/27/2015 5:28:55 AM

This was discussed in the "String methods... finally" thread in the Python-Dev archive, and was accepted by Guido. The thread began in June 1999, and str.join was included in Python 1.6, which was released in September 2000 (and supported Unicode). Python 2.0, which also supported str methods including join, was released in October 2000.

  • There were four options proposed in this thread:
    • str.join(seq)
    • seq.join(str)
    • seq.reduce(str)
    • join as a built-in function
  • Guido wanted to support not only lists and tuples, but all sequences/iterables.
  • seq.reduce(str) is difficult for newcomers.
  • seq.join(str) introduces an unexpected dependency from sequences to str/unicode.
  • join() as a built-in function would support only specific data types, so giving it a name in the built-in namespace is hard to justify. If join() supported many data types, writing an optimized implementation would be difficult, and a generic implementation based on the __add__ method would be O(n²).
  • The separator string (sep) should not be omitted. Explicit is better than implicit.
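
To make the __add__ concern concrete, here is a minimal sketch (not taken from the thread) of a join built only on +, roughly in the spirit of the seq.reduce(str) option. The helper naive_join is hypothetical, and the example assumes Python 3:

```python
from functools import reduce


def naive_join(sep, items):
    """Hypothetical join built only on +, for illustration."""
    items = list(items)
    if not items:
        return ""
    # Each + builds a brand-new string and copies everything accumulated
    # so far, which is where the quadratic cost comes from in general.
    return reduce(lambda acc, item: acc + sep + item, items)


words = ["Hello", "world", "again"]
naive_result = naive_join("-", words)
builtin_result = "-".join(words)

# Both approaches give the same string; only the cost differs.
print(naive_result == builtin_result)
```

Because str.join knows it is producing a string, it can work out the total length first and copy each piece once, which a generic +-based version cannot do.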

There are no other reasons offered in this thread.

Here are some additional thoughts (my own, and my friend's):

  • Unicode support was coming, but it was not final. At that time, UTF-8 seemed the most likely candidate to replace UCS-2/UCS-4. To calculate the total buffer length of UTF-8 strings, the implementation needs to know the character encoding rules.
  • At that time, Python had already decided on a common sequence interface rule where a user could create a sequence-like (iterable) class. But Python didn't support extending built-in types until 2.2, so it was difficult to provide a basic iterable class (which is mentioned in another comment).
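
As an illustration of the sequence interface point, the sketch below defines a user-written sequence-like class that implements only __getitem__ and shows that str.join still accepts it. Countdown is a made-up class, not something from the thread, and the example assumes Python 3:

```python
class Countdown:
    """Hypothetical sequence-like class: supports indexing only."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # Raising IndexError tells the iteration machinery where to stop.
        if index >= self.start:
            raise IndexError(index)
        return str(self.start - index)


joined = "-".join(Countdown(3))
print(joined)  # 3-2-1
```

Countdown has no join method and does not inherit from any built-in type, yet it works with str.join because the string only needs to iterate over it.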

Guido's decision is recorded in a historical mail, deciding on str.join(seq):

Funny, but it does seem right! Barry, go for it...
--Guido van Rossum


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow