Handling backreferences to capturing groups in re.sub replacement pattern


Question

I want to take the string 0.71331, 52.25378 and return 0.71331,52.25378 - i.e. just look for a digit, a comma, a space and a digit, and strip out the space.

This is my current code:

import re

coords = '0.71331, 52.25378'
coord_re = re.sub("(\d), (\d)", "\1,\2", coords)
print coord_re

But this gives me 0.7133,2.25378. What am I doing wrong?

10/17/2018 8:05:06 AM

Accepted Answer

You should be using raw strings for regex. Try the following:

coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)

With your current code, the backslashes in your replacement string are escaping the digits, so you are replacing every match with the equivalent of chr(1) + "," + chr(2):

>>> '\1,\2'
'\x01,\x02'
>>> print '\1,\2'
,
>>> print r'\1,\2'   # this is what you actually want
\1,\2

Any time you want to leave the backslash in the string, use the r prefix, or escape each backslash (\\1,\\2).
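To see the difference end to end, here is a small sketch that runs the same substitution three ways. It assumes Python 3 (hence print() as a function), and the variable names broken, fixed_raw and fixed_escaped are only illustrative. The pattern is written as a raw string in every case so that only the replacement string changes.

```python
import re

coords = '0.71331, 52.25378'
pattern = r"(\d), (\d)"

# Non-raw replacement: Python converts \1 and \2 into the control
# characters chr(1) and chr(2) before re.sub ever sees them.
broken = re.sub(pattern, "\1,\2", coords)

# A raw string and doubled backslashes both pass a literal backslash
# followed by a digit, which re.sub treats as a backreference.
fixed_raw = re.sub(pattern, r"\1,\2", coords)
fixed_escaped = re.sub(pattern, "\\1,\\2", coords)

print(repr(broken))    # '0.7133\x01,\x022.25378'
print(fixed_raw)       # 0.71331,52.25378
print(fixed_escaped)   # 0.71331,52.25378
```

Printing repr(broken) makes the otherwise invisible control characters show up, which is why the original output only looked like 0.7133,2.25378.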

11/16/2011 7:19:30 PM

Python interprets the \1 as a character with ASCII value 1, and passes that to sub.

Use raw strings, in which Python doesn't interpret the \.

coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)

This is covered right at the beginning of the re documentation, should you need more info.
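The same documentation also describes the \g<number> form of a backreference, which may be worth knowing alongside raw strings. The sketch below assumes Python 3. The second input string '7, 5' is a made-up example, not from the question, chosen to show a case where a literal digit has to follow the group.

```python
import re

coords = '0.71331, 52.25378'

# \g<1> and \g<2> refer to the same groups as \1 and \2.
# The replacement still needs to be a raw string (or use \\g).
explicit = re.sub(r"(\d), (\d)", r"\g<1>,\g<2>", coords)
print(explicit)  # 0.71331,52.25378

# \g<1>0 means "group 1, then a literal 0"; writing \10 instead
# would be read as a reference to group 10.
padded = re.sub(r"(\d), ", r"\g<1>0,", "7, 5")
print(padded)  # 70,5
```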


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow