What is the pythonic way to split a string before the occurrences of a given set of characters?
For example, I want to split
'TheLongAndWindingRoad'
at any occurrence of an uppercase letter (possibly except the first), and obtain
['The', 'Long', 'And', 'Winding', 'Road'].
Edit: It should also split single occurrences, i.e.
from 'ABC' I'd like to obtain
['A', 'B', 'C'].
Unfortunately, before Python 3.7 it was not possible to split on a zero-width match with re.split. But you can use re.findall instead:
>>> import re
>>> re.findall('[A-Z][^A-Z]*', 'TheLongAndWindingRoad')
['The', 'Long', 'And', 'Winding', 'Road']
>>> re.findall('[A-Z][^A-Z]*', 'ABC')
['A', 'B', 'C']
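Since Python 3.7, re.split does accept patterns that can match the empty string, so a lookahead can do the split directly. The following is only a sketch under that version assumption; the function name split_before_uppercase is made up for illustration and is not part of the re module.

```python
import re

def split_before_uppercase(text):
    # (?=[A-Z]) matches the empty position just before an uppercase letter.
    # (?<!^) excludes the very start of the string, so the result does not
    # begin with an empty piece. Needs Python 3.7+ for zero-width splitting.
    return re.split(r"(?<!^)(?=[A-Z])", text)

print(split_before_uppercase("TheLongAndWindingRoad"))
print(split_before_uppercase("ABC"))
```

Unlike the findall pattern, this keeps any lowercase characters that come before the first uppercase letter.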
Here is an alternative regex solution. The problem can be rephrased as "how do I insert a space before each uppercase letter, before doing the split":
>>> import re
>>> s = "TheLongAndWindingRoad ABC A123B45"
>>> re.sub(r"([A-Z])", r" \1", s).split()
['The', 'Long', 'And', 'Winding', 'Road', 'A', 'B', 'C', 'A123', 'B45']
This has the advantage of preserving all non-whitespace characters, which most other solutions do not.
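To make the difference concrete, here is a small comparison on an input that starts with lowercase letters. The sample string "theLongRoad" is an assumption chosen for illustration; it does not come from the question.

```python
import re

s = "theLongRoad"  # hypothetical input with a lowercase prefix

# findall only returns pieces that start with an uppercase letter,
# so the leading "the" is silently dropped.
found = re.findall(r"[A-Z][^A-Z]*", s)

# Inserting a space before each uppercase letter and then splitting
# keeps every non-whitespace character.
spaced = re.sub(r"([A-Z])", r" \1", s).split()

print(found)
print(spaced)
```

Note that the sub-and-split approach also splits on whitespace already present in the input, which may or may not be what you want.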