How to extract an IP address from an HTML string?


Question

I want to extract an IP address from a string (actually a one-line HTML document) using Python.

>>> s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"

-- '165.91.15.131' is what I want!

I tried using regular expressions, but so far I can only get to the first number.

>>> import re
>>> ip = re.findall( r'([0-9]+)(?:\.[0-9]+){3}', s )
>>> ip
['165']

But I don't have a firm grasp of regular expressions; the above code was found on the web and modified.

7/22/2019 4:56:32 PM

Accepted Answer

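Remove your capturing group. When a pattern contains a capturing group, re.findall returns only the text captured by that group instead of the whole match: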

ip = re.findall( r'[0-9]+(?:\.[0-9]+){3}', s )

Result:

['165.91.15.131']
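
To see why the group matters, here is a small sketch comparing the two patterns side by side. It also shows re.search, which exposes the whole match through group(0) whether or not the pattern has groups. The variable names are illustrative only.

```python
import re

s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")

# With a capturing group, findall returns only what the group captured.
with_group = re.findall(r'([0-9]+)(?:\.[0-9]+){3}', s)

# With no capturing groups, findall returns each whole match.
without_group = re.findall(r'[0-9]+(?:\.[0-9]+){3}', s)

# search() gives access to the whole match regardless of groups.
m = re.search(r'([0-9]+)(?:\.[0-9]+){3}', s)
whole = m.group(0) if m else None

print(with_group)     # ['165']
print(without_group)  # ['165.91.15.131']
print(whole)          # 165.91.15.131
```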

Notes:

  • If you are parsing HTML it might be a good idea to look at BeautifulSoup.
  • Your regular expression matches some invalid IP addresses such as 0.00.999.9999. This isn't necessarily a problem, but you should be aware of it and possibly handle this situation. You could change the + to {1,3} for a partial fix without making the regular expression overly complex.
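
One possible way to handle invalid matches, sketched here as an assumption rather than as part of the answer above, is to keep the regex loose (using {1,3}) and let the standard-library ipaddress module reject out-of-range candidates. The helper name find_ipv4 and the sample string are hypothetical.

```python
import ipaddress
import re

# Loose candidate pattern: four dot-separated runs of 1 to 3 digits.
CANDIDATE = re.compile(r'\b[0-9]{1,3}(?:\.[0-9]{1,3}){3}\b')

def find_ipv4(text):
    """Return the candidates in text that parse as IPv4 addresses."""
    valid = []
    for candidate in CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. an octet above 255
        valid.append(candidate)
    return valid

s = "<body>Current IP Address: 165.91.15.131, bogus: 999.1.1.1</body>"
print(find_ipv4(s))  # ['165.91.15.131']
```

This keeps the pattern short and leaves the range check to code that is easier to read than a long alternation.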
5/23/2010 7:36:57 AM

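You can use the following regex to match only valid IPv4 addresses. Each octet's alternatives are wrapped in a non-capturing group, because | has lower precedence than concatenation; without the grouping, the pattern matches the individual numbers instead of the full address.

re.findall(r'\b(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\b', s)

returns

['165.91.15.131']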
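
For readability, a strict pattern of this kind can be assembled from a single octet sub-pattern. This is only a sketch; the names OCTET and IPV4 are hypothetical. Note that the octet alternatives sit inside a non-capturing group, since | binds more loosely than concatenation and an ungrouped alternation would match the individual numbers rather than the full address.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (leading zeros allowed).
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Four octets separated by literal dots, bounded by word boundaries.
IPV4 = re.compile(r'\b' + OCTET + r'(?:\.' + OCTET + r'){3}\b')

s = "<body>Current IP Address: 165.91.15.131</body>"
print(IPV4.findall(s))            # ['165.91.15.131']
print(IPV4.findall("256.1.1.1"))  # []
```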

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow