I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string (python). How can I change this regular expression to match any non-alphanumeric char except the hyphen?
You could just use a negated character class instead:
[^a-zA-Z0-9-]
This matches any character that is neither in the alphanumeric ranges nor a hyphen. It also matches the underscore (so underscores are stripped too), as per your current regex.
>>> import re
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens *#"
>>> r.sub("", s)
'sometextwith----5hy-phens'
Notice that this also removes spaces (which may well be what you want).
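If the spaces should survive, one option is to add a literal space to the negated class. This is only a sketch of that variation, not part of the original answer; the name `keep_spaces` is made up for illustration.

```python
import re

# Hypothetical variant: the same negated class with a literal space added,
# so spaces are no longer matched (and therefore no longer removed).
# The hyphen stays last in the class so it is read as a literal hyphen.
keep_spaces = re.compile(r"[^a-zA-Z0-9 -]")

s = "some#%te_xt&with--##%--5 hy-phens *#"
result = keep_spaces.sub("", s)
print(repr(result))
```

Note that the space before `*#` at the end of the sample string is kept as well, so the result ends with a trailing space; call `.strip()` on it if that is unwanted.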
Edit: SilentGhost has suggested that it may be cheaper for the engine to process the string with a quantifier, in which case you can simply use:
[^a-zA-Z0-9-]+
The + simply causes each run of consecutive matching characters to be matched (and replaced) in one go.
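A small sketch of what the quantifier changes, assuming the same sample string as above. The names `single` and `runs` are illustrative. `re.subn` returns the new string together with the number of substitutions made, which makes the difference visible.

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens *#"

single = re.compile(r"[^a-zA-Z0-9-]")   # one match per unwanted character
runs = re.compile(r"[^a-zA-Z0-9-]+")    # one match per run of unwanted characters

# With an empty replacement, both patterns produce the same string.
single_result, single_count = single.subn("", s)
runs_result, runs_count = runs.subn("", s)

print(single_result == runs_result)
print(single_count, runs_count)  # the quantified pattern does fewer substitutions
```

The two patterns are only interchangeable when the replacement is the empty string. With a non-empty replacement such as `" "`, the quantified version inserts one replacement per run rather than one per character. Whether it is actually faster is worth measuring with `timeit` on your own data.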
\w matches alphanumerics (and the underscore); add in the hyphen, then negate the entire set:
[^\w-]
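A short sketch of how the \w-based class behaves, assuming Python 3 `str` patterns. Two differences from the explicit ranges are worth knowing: the underscore is kept, and \w matches Unicode word characters unless `re.ASCII` is passed. The variable names are illustrative.

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens *#"

# \w covers letters, digits and the underscore, so "_" is NOT stripped here.
word_or_hyphen = re.compile(r"[^\w-]")
print(word_or_hyphen.sub("", s))

# With re.ASCII, \w is limited to [a-zA-Z0-9_]; without it, accented
# letters and other Unicode word characters are kept as well.
ascii_only = re.compile(r"[^\w-]", re.ASCII)
print(word_or_hyphen.sub("", "caf\u00e9-1!"))
print(ascii_only.sub("", "caf\u00e9-1!"))
```

If underscores must be removed too, the explicit `[^a-zA-Z0-9-]` class from the other answer is the simpler choice.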