Regular expression: match start or whitespace


Can a regular expression match whitespace or the start of a string?

I'm trying to replace the currency abbreviation GBP with a £ symbol. I could just match anything starting with GBP, but I'd like to be a bit more conservative and look for certain delimiters around it.

>>> import re
>>> text = u'GBP 5 Off when you spend GBP75.00'

>>> re.sub(ur'GBP([\W\d])', ur'£\g<1>', text) # matches GBP with any prefix
u'\xa3 5 Off when you spend \xa375.00'

>>> re.sub(ur'^GBP([\W\d])', ur'£\g<1>', text) # matches at start only
u'\xa3 5 Off when you spend GBP75.00'

>>> re.sub(ur'(\W)GBP([\W\d])', ur'\g<1>£\g<2>', text) # matches only after a non-word character (e.g. whitespace)
u'GBP 5 Off when you spend \xa375.00'

Can I do both of the latter examples at the same time?

4/3/2009 9:49:24 PM

Accepted Answer

Use the OR "|" operator to match either the start of the string or a non-word character:

>>> re.sub(ur'(^|\W)GBP([\W\d])', ur'\g<1>£\g<2>', text)
u'\xa3 5 Off when you spend \xa375.00'
2/8/2009 12:59:38 PM
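
For readers on Python 3, the same idea can be sketched as below. This is only an illustration: the function names replace_gbp and replace_gbp_lookaround are made up for this example, and the lookaround variant is an alternative that the answer itself does not mention.

```python
import re

def replace_gbp(text):
    # (^|\W) matches either the start of the string or one non-word
    # character; group 2 keeps the delimiter or digit that follows GBP.
    return re.sub(r'(^|\W)GBP([\W\d])', r'\g<1>£\g<2>', text)

def replace_gbp_lookaround(text):
    # Hypothetical variant: the lookbehind and lookahead check the
    # surrounding characters without consuming them, so no groups are
    # needed in the replacement.
    return re.sub(r'(?<!\w)GBP(?=[\W\d])', '£', text)

text = 'GBP 5 Off when you spend GBP75.00'
print(replace_gbp(text))             # £ 5 Off when you spend £75.00
print(replace_gbp_lookaround(text))  # £ 5 Off when you spend £75.00
```

One difference worth knowing: the grouped pattern consumes the delimiter after GBP, so in 'GBP GBP 5' only the first occurrence is replaced, whereas the lookaround form replaces both.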

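\b is a word boundary: a zero-width position between a word character and a non-word character (such as whitespace or punctuation), or at the start or end of the string, so \bGBP\b covers both cases. Note that it will not match GBP75.00, because P and 7 are both word characters and there is no boundary between them.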
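
To see how \b behaves on the question's sample text, here is a minimal Python 3 sketch. The variable names and the lookahead-based variant are assumptions added for illustration, not part of the original answer.

```python
import re

text = 'GBP 5 Off when you spend GBP75.00'

# \b needs a word character on one side and a non-word character (or the
# edge of the string) on the other. "P" and "7" are both word characters,
# so \bGBP\b skips "GBP75.00".
strict = re.sub(r'\bGBP\b', '£', text)

# Replacing the trailing \b with a lookahead also accepts a digit after GBP.
relaxed = re.sub(r'\bGBP(?=[\W\d])', '£', text)

print(strict)   # £ 5 Off when you spend GBP75.00
print(relaxed)  # £ 5 Off when you spend £75.00
```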
