Python: use regular expression to remove the white space from all lines


Question

The pattern ^(\s+) only removes the whitespace from the first line. How do I remove the leading whitespace from every line?

4/29/2016 1:57:22 AM

Accepted Answer

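Python's re module does not treat ^ as a multi-line anchor by default: without a flag, ^ matches only at the very start of the string. To make it match at the start of every line, you need to specify the re.MULTILINE flag explicitly.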

import re

r = re.compile(r"^\s+", re.MULTILINE)
r.sub("", "a\n b\n c") # "a\nb\nc"

# or without compiling (only possible for Python 2.7+ because the flags option
# didn't exist in earlier versions of re.sub)

re.sub(r"^\s+", "", "a\n b\n c", flags=re.MULTILINE)

# but note that \s also matches newlines, so blank lines are swallowed too:
r.sub("", "a\n\n\n\n b\n c") # "a\nb\nc"

It's also possible to put the flag inline in the pattern:

re.sub(r"(?m)^\s+", "", "a\n b\n c")
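
To make the difference concrete, here is a small sketch that runs the same substitution three ways: with no flag, with re.MULTILINE, and with the inline (?m) form. The sample string `text` and the variable names are only illustrative.

```python
import re

text = " a\n b\n c"

# Default: ^ matches only at the very start of the string,
# so only the first line loses its leading space.
default_result = re.sub(r"^\s+", "", text)

# re.MULTILINE: ^ also matches immediately after each newline.
flag_result = re.sub(r"^\s+", "", text, flags=re.MULTILINE)

# Inline (?m) has the same effect as passing re.MULTILINE.
inline_result = re.sub(r"(?m)^\s+", "", text)

print(repr(default_result))  # 'a\n b\n c'
print(repr(flag_result))     # 'a\nb\nc'
print(repr(inline_result))   # 'a\nb\nc'
```

The inline form can be handy when the pattern is passed to code that does not let you supply flags separately.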

An easier solution is to avoid regular expressions because the original problem is very simple:

content = 'a\n b\n\n c'
stripped_content = ''.join(line.lstrip(' \t') for line in content.splitlines(True))
# stripped_content == 'a\nb\n\nc'
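
If this is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the name `lstrip_lines` and its `chars` parameter are made up for illustration and are not part of the standard library.

```python
def lstrip_lines(content, chars=" \t"):
    """Strip leading `chars` from every line, keeping line endings intact."""
    # splitlines(True) keeps the line terminator on each piece, so joining
    # the pieces back together preserves blank lines and the final newline.
    return "".join(line.lstrip(chars) for line in content.splitlines(True))


result = lstrip_lines("a\n b\n\n\t c\n")
print(repr(result))  # 'a\nb\n\nc\n'
```

Because only spaces and tabs are stripped, a blank line stays a blank line instead of being merged into its neighbours.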
9/28/2016 12:23:17 PM

@AndiDog acknowledges in his (currently accepted) answer that it munches consecutive newlines.

Here's how to fix that deficiency, which is caused by the fact that \n is BOTH whitespace and a line separator. What we need is a character class that matches only whitespace characters other than newline.

We want whitespace and not newline, which can't be expressed directly in a character class. Let's rewrite that as not not (whitespace and not newline), i.e. not (not whitespace or not not newline) (thanks, Augustus De Morgan), i.e. not (not whitespace or newline), i.e. [^\S\n] in re notation.

So:

>>> re.sub(r"(?m)^[^\S\n]+", "", "  a\n\n   \n\n b\n c\nd  e")
'a\n\n\n\nb\nc\nd  e'
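
As a quick sanity check of that derivation, the sketch below tests the class against a few single characters and then compares it with the plain \s version on the same input. The names `LEADING_WS`, `horizontal` and `checks` are illustrative only.

```python
import re

# [^\S\n] = "not (non-whitespace or newline)" = whitespace other than \n.
LEADING_WS = re.compile(r"^[^\S\n]+", re.MULTILINE)

# The class accepts space and tab but rejects newline and ordinary characters.
horizontal = re.compile(r"[^\S\n]")
checks = {ch: bool(horizontal.fullmatch(ch)) for ch in (" ", "\t", "\n", "a")}
print(checks)  # {' ': True, '\t': True, '\n': False, 'a': False}

text = "  a\n\n   \n\n b\n c\nd  e"
kept_blank_lines = LEADING_WS.sub("", text)
lost_blank_lines = re.sub(r"(?m)^\s+", "", text)

print(repr(kept_blank_lines))  # 'a\n\n\n\nb\nc\nd  e'
print(repr(lost_blank_lines))  # 'a\nb\nc\nd  e'
```

Note that [^\S\n] still matches other whitespace such as \r, \f and \v; if only spaces and tabs should be removed, [ \t] is a narrower alternative.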

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow