Python Regular Expression example


Question

I want to write a simple regular expression in Python that extracts a number from HTML. The HTML sample is as follows:

Your number is <b>123</b>

Now how can I extract "123", i.e. the contents of the first bold text after the string "Your number is"?

10/26/2016 1:05:22 PM

Accepted Answer

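import re

# Raw string so that \d reaches the regex engine unchanged.
m = re.search(r"Your number is <b>(\d+)</b>",
              "xxx Your number is <b>123</b>  fdjsk")
if m:
    # group(1) is the text captured by (\d+)
    print(m.group(1))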
6/23/2012 4:56:45 PM

Given s = "Your number is <b>123</b>" then:

 import re 
 m = re.search(r"\d+", s)

will work and give you

 m.group()
'123'

The regular expression looks for one or more consecutive digits in your string, and re.search() returns the first such run it finds.
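
To illustrate what that means in practice, here is a small sketch. The second input string is made up for this example. A bare `\d+` picks up the first run of digits anywhere in the string, so it only gives the wanted number when no other digits come earlier:

```python
import re

s = "Your number is <b>123</b>"
first_digits = re.search(r"\d+", s).group()

# Hypothetical input with an earlier digit run: the bare pattern finds that one.
other = "Order 7: Your number is <b>123</b>"
bare = re.search(r"\d+", other).group()

# Anchoring on the surrounding text, as in the accepted answer, avoids this.
anchored = re.search(r"Your number is <b>(\d+)</b>", other).group(1)

print(first_digits, bare, anchored)
```

If the input can contain other numbers, the anchored pattern is probably the safer choice.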

Note that in this specific case we knew that there would be a numeric sequence. Otherwise you would have to test the return value of re.search(): it returns None when nothing matches, and calling m.group() on None raises an AttributeError.
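
A minimal sketch of that check, assuming you want None back when no digits are found. The function name `extract_number` is just an illustration, not something from the answer:

```python
import re

def extract_number(text):
    """Return the first run of digits in text, or None if there is none."""
    m = re.search(r"\d+", text)
    if m is None:
        # No match: calling m.group() here would raise AttributeError.
        return None
    return m.group()

print(extract_number("Your number is <b>123</b>"))
print(extract_number("Your number is <b>unknown</b>"))
```

The first call prints 123 and the second prints None instead of raising.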

Of course if you are going to process a lot of HTML you want to take a serious look at BeautifulSoup - it's meant for that and much more. The whole idea with BeautifulSoup is to avoid "manual" parsing using string ops or regular expressions.
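
To give a feel for the parser-based approach without installing anything, here is a rough sketch using the standard library's html.parser instead of BeautifulSoup (BeautifulSoup's own API is different and more convenient). The class name `FirstBoldText` is made up for this example, and it simply grabs the text of the first <b> element it sees:

```python
from html.parser import HTMLParser

class FirstBoldText(HTMLParser):
    """Collects the text of the first <b> element in the input."""

    def __init__(self):
        super().__init__()
        self.in_bold = False
        self.parts = []
        self.result = None

    def handle_starttag(self, tag, attrs):
        # Only start collecting if no <b> element has been completed yet.
        if tag == "b" and self.result is None:
            self.in_bold = True

    def handle_data(self, data):
        if self.in_bold:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "b" and self.in_bold:
            self.result = "".join(self.parts)
            self.in_bold = False

parser = FirstBoldText()
parser.feed("xxx Your number is <b>123</b>  fdjsk")
parser.close()
print(parser.result)
```

Here the parser deals with the tags, and the code only decides which element it cares about.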


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow