Python: Replace with regex


I need to replace part of a string. I was looking through the Python documentation and found re.sub.

import re
s = '<textarea id="Foo"></textarea>'
output = re.sub(r'<textarea.*>(.*)</textarea>', 'Bar', s)
print(output)


I was expecting this to print '<textarea id="Foo">Bar</textarea>', but it prints just 'Bar'.

Could anybody tell me what I did wrong?

10/22/2010 2:02:54 PM

Accepted Answer

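re.sub replaces the entire match, not just the captured group. Your pattern matches the whole string, so the whole string becomes 'Bar'. Instead of capturing the part you want to replace, capture the parts you want to keep and refer to them with backreferences such as \1 and \2 in the replacement string.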

Try this instead:

output = re.sub(r'(<textarea.*>).*(</textarea>)', r'\1Bar\2', s)
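A small variation on the same backreference idea, offered as a sketch rather than as part of the original answer. It assumes the opening tag contains no '>' inside attribute values. It swaps `.*` for `[^>]*` and a non-greedy `.*?` so each textarea is matched on its own, and it uses the `\g<1>` spelling, which stays unambiguous when the inserted text starts with a digit:

```python
import re

s = '<textarea id="Foo"></textarea>'

# Capture the parts to keep (opening and closing tag), replace what is between.
# [^>]* stops at the first '>' and .*? is non-greedy, so each textarea in a
# string is matched separately (assumes no '>' inside attribute values).
pattern = r'(<textarea[^>]*>).*?(</textarea>)'

# \g<1> means the same as \1, but cannot be misread: r'\10' would be a
# reference to group 10, not group 1 followed by '0'.
print(re.sub(pattern, r'\g<1>Bar\g<2>', s))
# <textarea id="Foo">Bar</textarea>

two = '<textarea id="A">x</textarea><textarea id="B">y</textarea>'
print(re.sub(pattern, r'\g<1>0\g<2>', two))
# <textarea id="A">0</textarea><textarea id="B">0</textarea>
```

For comparison, the greedy pattern `(<textarea.*>).*(</textarea>)` treats the two-textarea string as a single match, so only the last textarea's body gets replaced.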

Also, if this is HTML, you should consider using an HTML parser such as Beautiful Soup for this task.
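Beautiful Soup is a third-party package, so here is a dependency-free illustration of the parser idea instead. It uses the standard library's xml.etree.ElementTree, a swapped-in technique and not Beautiful Soup's API. It works here only because this particular snippet happens to be well-formed XML. Real-world HTML often is not, and that is where a lenient parser like Beautiful Soup is the better fit:

```python
import xml.etree.ElementTree as ET

s = '<textarea id="Foo"></textarea>'

# Parse the fragment into an element tree. This succeeds only because the
# snippet is well-formed XML; ElementTree is not a general HTML parser.
textarea = ET.fromstring(s)

# Set the element's text content instead of editing the markup as a string.
textarea.text = 'Bar'

# encoding='unicode' makes tostring return a str rather than bytes.
output = ET.tostring(textarea, encoding='unicode')
print(output)
# <textarea id="Foo">Bar</textarea>
```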

10/22/2010 2:04:57 PM

Or you could use re.search instead and rebuild the string from the captured groups:

match = re.search(r'(<textarea.*>).*(</textarea>)', s)
output = match.group(1) + 'bar' + match.group(2)
print(output)  # <textarea id="Foo">bar</textarea>
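The re.search version above assumes the pattern always matches and drops any text outside the match. Here is a slightly more defensive sketch of the same approach. `replace_textarea_body` is a hypothetical helper name, and like re.search it handles only the first textarea:

```python
import re

def replace_textarea_body(s, new_body):
    # Hypothetical helper: rebuild the string from the pieces of the match.
    match = re.search(r'(<textarea[^>]*>).*?(</textarea>)', s)
    if match is None:
        # re.search returns None when nothing matches; calling .group() on
        # None would raise AttributeError, so return the input unchanged.
        return s
    # Keep the text before and after the match, and reassemble the tag
    # around new_body.
    return (s[:match.start()] + match.group(1) + new_body
            + match.group(2) + s[match.end():])

print(replace_textarea_body('<textarea id="Foo"></textarea>', 'bar'))
# <textarea id="Foo">bar</textarea>
print(replace_textarea_body('<p>x</p><textarea id="Foo"></textarea>', 'bar'))
# <p>x</p><textarea id="Foo">bar</textarea>
```

One difference from re.sub is that new_body is inserted literally here. A replacement string passed to re.sub has its backslash escapes (such as \1) processed.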

Licensed under: CC-BY-SA with attribution