Split a string at uppercase letters


Question

What is the pythonic way to split a string before the occurrences of a given set of characters?

For example, I want to split 'TheLongAndWindingRoad' at any occurrence of an uppercase letter (possibly except the first), and obtain ['The', 'Long', 'And', 'Winding', 'Road'].

Edit: It should also split single occurrences, i.e. from 'ABC' I'd like to obtain ['A', 'B', 'C'].

73
12/10/2018 11:46:55 AM

Accepted Answer

In Python versions before 3.7, re.split cannot split on a zero-width match. re.findall works in every version:

>>> import re
>>> re.findall('[A-Z][^A-Z]*', 'TheLongAndWindingRoad')
['The', 'Long', 'And', 'Winding', 'Road']
>>> re.findall('[A-Z][^A-Z]*', 'ABC')
['A', 'B', 'C']
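
For Python 3.7 and later, where re.split does honor zero-width matches, a lookahead can do the split directly. The following is a minimal sketch under that version assumption; the helper name split_before_upper is made up for illustration and is not part of the original answer:

```python
import re

def split_before_upper(text):
    # Split at the zero-width position before each uppercase ASCII letter.
    # Assumes Python 3.7+; earlier versions do not split on empty matches.
    parts = re.split(r'(?=[A-Z])', text)
    # A leading uppercase letter produces an empty first piece; drop empties.
    return [p for p in parts if p]

print(split_before_upper('TheLongAndWindingRoad'))
print(split_before_upper('ABC'))
```

Unlike the findall pattern, this variant also keeps any characters that come before the first uppercase letter.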
111
2/17/2010 12:22:52 AM

Here is an alternative regex solution. The problem can be rephrased as "how do I insert a space before each uppercase letter, before doing the split":

>>> import re
>>> s = "TheLongAndWindingRoad ABC A123B45"
>>> re.sub(r"([A-Z])", r" \1", s).split()
['The', 'Long', 'And', 'Winding', 'Road', 'A', 'B', 'C', 'A123', 'B45']

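This has the advantage of preserving all non-whitespace characters, which most other solutions do not; the findall pattern above, for instance, silently drops anything that precedes the first uppercase letter.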
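
To illustrate that difference, here is a small comparison of the two approaches on an input with a lowercase prefix. The sample string 'theLongRoad' is an assumed example, not taken from the original posts:

```python
import re

s = 'theLongRoad'  # assumed sample input with a lowercase prefix

# findall only returns runs that start with an uppercase letter,
# so the leading 'the' is lost.
found = re.findall(r'[A-Z][^A-Z]*', s)

# Inserting a space before each uppercase letter and then splitting
# on whitespace keeps every non-whitespace character.
spaced = re.sub(r'([A-Z])', r' \1', s).split()

print(found)   # ['Long', 'Road']
print(spaced)  # ['the', 'Long', 'Road']
```

The trade-off is that the substitution approach also splits on any whitespace already present in the input, so it may not suit strings where existing spaces must be kept inside a piece.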


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow