repeating multiple characters regex


Question

Is there a way using a regex to match a repeating set of characters? For example:

ABCABCABCABCABC

ABC{5}

I know that's wrong. But is there anything to match that effect?

Update:

Can you use nested capture groups, something like (?<cap>(ABC){5}) ?

9/2/2010 8:41:34 PM

Accepted Answer

Enclose the regex you want to repeat in parentheses. For instance, if you want 5 repetitions of ABC:

(ABC){5}

Or if you want any number of repetitions (0 or more):

(ABC)*

Or one or more repetitions:

(ABC)+
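Here is a quick check of the three forms, written for Python's re module because the rest of the answer uses it. It uses fullmatch(), so the whole string must consist of repetitions; re.search() would also accept a run of ABC embedded in a longer string. The variable names are only for illustration.

```python
import re

# Hypothetical names, one compiled pattern per quantifier.
five = re.compile(r"(ABC){5}")
any_count = re.compile(r"(ABC)*")
at_least_one = re.compile(r"(ABC)+")

# fullmatch() requires the entire string to be made of repetitions.
print(bool(five.fullmatch("ABC" * 5)))          # True
print(bool(five.fullmatch("ABC" * 4)))          # False: only four repetitions
print(bool(any_count.fullmatch("")))            # True: zero repetitions allowed
print(bool(at_least_one.fullmatch("")))         # False: at least one required
print(bool(at_least_one.fullmatch("ABC" * 3)))  # True
```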

Edit in response to the update:

Parentheses in regular expressions do two things: they group a sequence of items together, so that you can apply an operator to the entire sequence instead of just the last item, and they capture the contents of that group, so that you can extract the substring matched by that subexpression.
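Both roles show up when a quantifier is applied to a capturing group. In Python's re module, the group lets {5} apply to all of ABC, but the capture only keeps the text of the last repetition, which is why the whole repeated unit needs an outer group if you want all five copies back. A minimal sketch:

```python
import re

m = re.fullmatch(r"(ABC){5}", "ABCABCABCABCABC")

print(m.group(0))  # ABCABCABCABCABC: the whole match
print(m.group(1))  # ABC: a repeated group keeps only its last repetition
print(m.span(1))   # (12, 15): the position of that last repetition
```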

You can nest parentheses. Groups are numbered by the position of their opening parenthesis, counting from left to right starting at 1; group 0 is the entire match. For instance:

>>> import re
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(0)
'123 ABCDEF'
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(1)
'ABCDEF'
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(2)
'DEF'
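The same numbering can be inspected all at once from a single match object. This sketch uses Match.groups(), Match.span() and the pattern's groups attribute (the number of capturing groups); the loop is only for display.

```python
import re

m = re.search(r"[0-9]* (ABC(...))", "123 ABCDEF 456")

# Group 1 opens at "(ABC", group 2 at the inner "(...".
print(m.groups())   # ('ABCDEF', 'DEF')
print(m.re.groups)  # 2 capturing groups in the pattern

# Group 0 (the whole match) plus each numbered group, with its span.
for i in range(m.re.groups + 1):
    print(i, repr(m.group(i)), m.span(i))
```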

If you want to group without capturing, use (?:...) instead of (...). This is helpful when parentheses that exist only to apply an operator to a sequence would otherwise shift the numbering of your other groups. It may also be slightly faster, since the engine does not have to save what the group matched.

>>> re.search('[0-9]* (?:ABC(...))', '123 ABCDEF 456').group(1)
'DEF'
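Comparing the two compiled patterns shows the effect on numbering directly: the (?:...) group still groups, but it is not counted in the pattern's groups attribute. The variable names are illustrative only.

```python
import re

capturing = re.compile(r"[0-9]* (ABC(...))")
non_capturing = re.compile(r"[0-9]* (?:ABC(...))")

# The non-capturing group is not counted, so there is one group fewer.
print(capturing.groups, non_capturing.groups)  # 2 1

m = non_capturing.search("123 ABCDEF 456")
print(m.group(1))  # DEF: the inner group is now group 1
```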

So to answer your update, yes, you can use nested capture groups, or even avoid capturing with the inner group at all:

>>> re.search('((?:ABC){5})(DEF)', 'ABCABCABCABCABCDEF').group(1)
'ABCABCABCABCABC'
>>> re.search('((?:ABC){5})(DEF)', 'ABCABCABCABCABCDEF').group(2)
'DEF'
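The named group from the question can be written the same way. Note that Python's re module spells a named group (?P<name>...); the (?<cap>...) form in the question is the syntax used by some other engines, such as .NET. A sketch, with a made-up input string:

```python
import re

# Named outer group around a non-capturing repeated unit.
pattern = re.compile(r"(?P<cap>(?:ABC){5})(DEF)")

m = pattern.search("xxABCABCABCABCABCDEF")
print(m.group("cap"))  # ABCABCABCABCABC
print(m.group(1))      # the same group: named groups are also numbered
print(m.group(2))      # DEF
print(m.groupdict())   # {'cap': 'ABCABCABCABCABC'}: named groups only
```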
9/2/2010 8:57:19 PM

ABC{5} matches ABCCCCC, because {5} applies only to the C immediately before it. To match five ABCs, use (ABC){5}: the parentheses group the characters so that the quantifier applies to the whole sequence. You can also give a range of repetition counts: (ABC){3,5} matches ABCABCABC, ABCABCABCABC, and ABCABCABCABCABC.
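Both points can be checked in Python's re module. The sketch uses fullmatch() so that the whole string must fit the pattern; with re.search(), (ABC){3,5} would also find a match inside a longer run of ABCs.

```python
import re

# Without parentheses, {5} applies only to the preceding "C".
print(bool(re.fullmatch(r"ABC{5}", "ABCCCCC")))  # True
print(bool(re.fullmatch(r"ABC{5}", "ABC" * 5)))  # False

# With an interval, 3 to 5 repetitions of the whole group are accepted.
interval = re.compile(r"(ABC){3,5}")
print([n for n in range(1, 8) if interval.fullmatch("ABC" * n)])  # [3, 4, 5]
```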

(ABC){1,} means one or more repetitions, which is exactly the same as (ABC)+.

(ABC){0,} means zero or more repetitions, which is exactly the same as (ABC)*.
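A small spot check in Python, on a handful of made-up sample strings (an illustration, not a proof of equivalence), shows each pair accepting exactly the same inputs. The helper full_matches is hypothetical.

```python
import re

samples = ["", "ABC", "ABCABC", "ABCAB", "XYZ"]

def full_matches(pattern):
    """Return the samples that the pattern matches in full (hypothetical helper)."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(full_matches(r"(ABC){1,}"), full_matches(r"(ABC)+"))  # ['ABC', 'ABCABC'] for both
print(full_matches(r"(ABC){0,}"), full_matches(r"(ABC)*"))  # ['', 'ABC', 'ABCABC'] for both
```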


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow