How to exclude a character from a regex group?


Question

I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string (Python). How can I change this regular expression to match any non-alphanumeric character except the hyphen?

re.compile(r'[\W_]')

Thanks.

11/5/2010 5:45:58 PM

Accepted Answer

You could just use a negated character class instead:

re.compile(r"[^a-zA-Z0-9-]")

This will match anything that is not an ASCII letter, an ASCII digit, or a hyphen. Like your current regex, it also matches the underscore.

>>> import re
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens  *#"
>>> r.sub("",s)
'sometextwith----5hy-phens'

Note that this also removes spaces, which may or may not be what you want.
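If you would rather keep spaces, one option is to add a literal space to the negated class. The sketch below wraps both variants in a small helper; the function name strip_except_hyphen and its keep_spaces flag are hypothetical, introduced only for illustration.

```python
import re

# Matches anything that is not an ASCII letter, ASCII digit or hyphen.
NOT_ALNUM_OR_HYPHEN = re.compile(r"[^a-zA-Z0-9-]")
# Same class with a literal space added, so spaces are kept as well.
NOT_ALNUM_HYPHEN_OR_SPACE = re.compile(r"[^a-zA-Z0-9 -]")


def strip_except_hyphen(text, keep_spaces=False):
    """Remove every character except ASCII alphanumerics and hyphens.

    keep_spaces (hypothetical flag): when True, spaces are preserved too.
    """
    pattern = NOT_ALNUM_HYPHEN_OR_SPACE if keep_spaces else NOT_ALNUM_OR_HYPHEN
    return pattern.sub("", text)


s = "some#%te_xt&with--##%--5 hy-phens  *#"
print(strip_except_hyphen(s))                     # sometextwith----5hy-phens
print(repr(strip_except_hyphen(s, keep_spaces=True)))
```

With keep_spaces=True the result is 'sometextwith----5 hy-phens  ', including the two trailing spaces that sat before the final "*#".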


Edit: SilentGhost has suggested that it is likely cheaper for the engine to process this with a quantifier, in which case you can simply use:

re.compile(r"[^a-zA-Z0-9-]+")

The + causes each run of consecutive unwanted characters to be matched (and replaced) as a single match, rather than one character at a time.
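To see what the quantifier changes, Pattern.subn returns both the new string and the number of replacements made. This sketch only shows that the match count drops while the output stays identical; whether that translates into a measurable speed-up depends on the input and is not measured here.

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens  *#"

single = re.compile(r"[^a-zA-Z0-9-]")   # one match per unwanted character
runs = re.compile(r"[^a-zA-Z0-9-]+")    # one match per run of unwanted characters

print(single.subn("", s))   # ('sometextwith----5hy-phens', 12)
print(runs.subn("", s))     # ('sometextwith----5hy-phens', 6)
```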

11/5/2010 6:14:59 PM

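\w matches alphanumerics (and the underscore), so add in the hyphen, then negate the entire set: r"[^\w-]". Note that because \w itself matches the underscore, this pattern leaves underscores in the string, unlike the original [\W_].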
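A quick check of this pattern, assuming Python 3, where str patterns are Unicode-aware by default, so \w also keeps accented letters that the ASCII-only class [^a-zA-Z0-9-] would strip. The alternation r"[^\w-]|_" is a hypothetical variant, not part of the answer, that removes underscores as well; the "café-crème!" input is illustrative.

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens  *#"

word_based = re.compile(r"[^\w-]")        # keeps letters, digits, underscore, hyphen
no_underscore = re.compile(r"[^\w-]|_")   # hypothetical variant: also drops "_"
ascii_only = re.compile(r"[^a-zA-Z0-9-]")

print(word_based.sub("", s))      # somete_xtwith----5hy-phens
print(no_underscore.sub("", s))   # sometextwith----5hy-phens

# Unicode letters survive \w but not the ASCII ranges.
print(word_based.sub("", "café-crème!"))   # café-crème
print(ascii_only.sub("", "café-crème!"))   # caf-crme
```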


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow