How to test for a regex match


Question

I have a string. Let's call it 'test'. I want to test a match for this string, but only using the backref of a regex.

Can I do something like this:

import re

for line in f.readlines():
   if '<a href' in line:
      if re.match('<a href="(.*)">', line) == 'test':
         print 'matched!'

This, of course, doesn't work, but I think I might be close. Basically, the question is: how can I get re to return only the backref for comparison?

1/20/2011 1:35:36 AM

Accepted Answer

re.match matches only at the beginning of the string.

import re

def url_match(line, url):
    match = re.match(r'<a href="(?P<url>[^"]*?)"', line)
    # Return a real bool (not None) when nothing matches.
    return match is not None and match.group('url') == url

Example usage:

>>> url_match('<a href="test">', 'test')
True
>>> url_match('<a href="test">', 'te')
False
>>> url_match('this is a <a href="test">', 'test')
False
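
Separately from where the match is anchored, the comparison in the question cannot succeed because re.match returns a match object (or None), never a string. The captured text has to be extracted with group(). A minimal sketch, assuming Python 3 and a simplified pattern with one unnamed group:

```python
import re

line = '<a href="test">'
match = re.match(r'<a href="([^"]*)"', line)

# A match object is never equal to a string, so this is always False.
print(match == 'test')           # False

# group(1) returns the text captured by the first parenthesised group.
print(match.group(1) == 'test')  # True

# With no match, re.match returns None, so guard before calling group().
no_match = re.match(r'<a href="([^"]*)"', 'plain text')
print(no_match is None)          # True
```

url_match above does the same thing with a named group, which reads more clearly once a pattern has several groups.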

If the pattern could occur anywhere in the line, use re.search.

def url_search(line, url):
    match = re.search(r'<a href="(?P<url>[^"]*?)"', line)
    # Return a real bool (not None) when nothing matches.
    return match is not None and match.group('url') == url

Example usage:

>>> url_search('<a href="test">', 'test')
True
>>> url_search('<a href="test">', 'te')
False
>>> url_search('this is a <a href="test">', 'test')
True
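
Applied to the loop in the question, the same idea might look like the sketch below. It assumes Python 3. The helper name lines_linking_to and the sample lines are made up for illustration, and only the first href on each line is checked.

```python
import re

# Hypothetical helper; the name and signature are illustrative only.
HREF_RE = re.compile(r'<a href="([^"]*)"')

def lines_linking_to(lines, url):
    """Return the lines whose first <a href="..."> value equals url."""
    hits = []
    for line in lines:
        match = HREF_RE.search(line)   # search: the tag may start mid-line
        if match and match.group(1) == url:
            hits.append(line)
    return hits

sample = [
    'see <a href="test">here</a>',
    '<a href="other">there</a>',
    'no link on this line',
]
print(lines_linking_to(sample, 'test'))
```

An open file object can be passed as lines directly, since iterating over a file yields one line at a time. That also avoids reading the whole file into memory with readlines().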

N.B.: If you are trying to parse HTML using a regex, read "RegEx match open tags except XHTML self-contained tags" before going any further.
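
If the input really is HTML, one regex-free option is the standard library's parser. The following is only a sketch, assuming Python 3 (where the module is html.parser). The class name HrefCollector is invented for this example and is not part of the original answer.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.hrefs.append(value)

collector = HrefCollector()
collector.feed('<p>this is a <A class="x" href="test">link</A></p>')
print('test' in collector.hrefs)
```

Unlike the regex, this copes with extra attributes before href and with upper-case tag names.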

5/23/2017 10:31:06 AM

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow