In the snippet below, the non-capturing group "(?:aaa)" should be ignored in the matching result, so the result should be "_bbb" only. However, I get "aaa_bbb" in the matching result; only when I specify group(2) does it show "_bbb".
>>> import re
>>> s = "aaa_bbb"
>>> print(re.match(r"(?:aaa)(_bbb)", s).group())
aaa_bbb
group() with no argument is equivalent to group(0), which returns the entire match. Groups numbered 1 and up are the actual capture groups.
>>> import re
>>> string1 = "aaa_bbb"
>>> print(re.match(r"(?:aaa)(_bbb)", string1).group(0))
aaa_bbb
>>> print(re.match(r"(?:aaa)(_bbb)", string1).group(1))
_bbb
>>> print(re.match(r"(?:aaa)(_bbb)", string1).group(2))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: no such group
If you want the same behavior as group(), but containing only the captured groups, join the result of groups():
" ".join(re.match(r"(?:aaa)(_bbb)", string1).groups())
I think you're misunderstanding the concept of a "non-capturing group". The text matched by a non-capturing group still becomes part of the overall regex match.
Both the regex (?:aaa)(_bbb) and the regex (aaa)(_bbb) return aaa_bbb as the overall match. The difference is that the first regex has one capturing group, which returns _bbb as its match, while the second regex has two capturing groups that return aaa and _bbb as their respective matches. In your Python code, to get _bbb, you'd need to use group(1) with the first regex, and group(2) with the second regex.
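To make the comparison concrete, here is a small sketch that runs both patterns against the string from the question. The variable names `non_capturing` and `capturing` are only illustrative.

```python
import re

s = "aaa_bbb"

non_capturing = re.match(r"(?:aaa)(_bbb)", s)
capturing = re.match(r"(aaa)(_bbb)", s)

# Both patterns consume the same text, so the overall match is identical.
print(non_capturing.group(0))  # aaa_bbb
print(capturing.group(0))      # aaa_bbb

# Only the set of captured groups, and therefore the numbering, differs.
print(non_capturing.groups())  # ('_bbb',)
print(capturing.groups())      # ('aaa', '_bbb')
print(non_capturing.group(1))  # _bbb
print(capturing.group(2))      # _bbb
```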
The main benefit of non-capturing groups is that you can add them to a regex without upsetting the numbering of the capturing groups in the regex. They also offer (slightly) better performance as the regex engine doesn't have to keep track of the text matched by non-capturing groups.
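As a rough illustration of the numbering point, consider a made-up pattern that extracts a numeric id. The `order-1234` string and the `order|invoice` alternation are assumptions invented for this example, not part of the original question.

```python
import re

text = "order-1234"

# Hypothetical starting pattern: the numeric id is group 1.
base = re.match(r"order-(\d+)", text)

# Adding an alternation inside a non-capturing group leaves the id at group 1.
with_non_capturing = re.match(r"(?:order|invoice)-(\d+)", text)

# The same alternation inside a capturing group shifts the id to group 2.
with_capturing = re.match(r"(order|invoice)-(\d+)", text)

print(base.group(1))                # 1234
print(with_non_capturing.group(1))  # 1234
print(with_capturing.group(1))      # order
print(with_capturing.group(2))      # 1234
```

Code that already reads group(1) keeps working after the non-capturing change, but would need updating after the capturing one.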
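If you really want to exclude aaa from the overall regex match then you need to use lookaround. In this case, positive lookbehind does the trick: (?<=aaa)_bbb. With this regex, group() returns _bbb in Python. No capturing groups needed. Note that this pattern has to be used with re.search rather than re.match: re.match only tries to match at the start of the string, where there is no preceding aaa for the lookbehind to find.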
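A minimal sketch of the lookbehind approach, using the string from the question. It uses re.search because re.match only attempts a match at position 0, where the lookbehind has nothing to look back at.

```python
import re

s = "aaa_bbb"

# re.search scans the string, so the lookbehind can see "aaa" before "_bbb".
found = re.search(r"(?<=aaa)_bbb", s)
print(found.group())  # _bbb
print(found.span())   # (3, 7)

# re.match only tries position 0, where nothing precedes the match,
# so the lookbehind fails and no match object is returned.
print(re.match(r"(?<=aaa)_bbb", s))  # None
```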
My recommendation is that if you have the ability to use capturing groups to get part of the regex match, use that method instead of lookaround.