Python TypeError on regex


So, I have this code:

url = ''
linkregex = re.compile('<a\s*href=[\'|"](.*?)[\'"].*?>')
m = urllib.request.urlopen(url)
msg =
links = linkregex.findall(msg)

But then Python raises this error:

links = linkregex.findall(msg)
TypeError: can't use a string pattern on a bytes-like object

What did I do wrong?

Accepted Answer

TypeError: can't use a string pattern on a bytes-like object

You used a string pattern on a bytes object. Use a bytes pattern instead:

linkregex = re.compile(b'<a\s*href=[\'|"](.*?)[\'"].*?>')
Note the b prefix before the opening quote: it turns the pattern into a bytes object, which matches the type of the data being searched.
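Here is a minimal sketch of the bytes-pattern fix that runs without a network connection. The sample `msg` value is made up and stands in for whatever the response body contains. The pattern is the same one as above, except that it is written as a raw bytes literal (`rb'...'`) so that `\s` is passed to the regex engine unchanged.

```python
import re

# Hypothetical sample data standing in for the bytes of the response body.
msg = b'<a href="http://example.com/a">A</a> <a href=\'/b\'>B</a>'

# Bytes pattern: the b prefix makes the pattern type match the data type.
linkregex = re.compile(rb'<a\s*href=[\'|"](.*?)[\'"].*?>')

links = linkregex.findall(msg)
print(links)  # every match is itself a bytes object

# Decode the matches afterwards if text is needed (UTF-8 assumed here).
text_links = [link.decode("utf-8") for link in links]
print(text_links)
```

Keep in mind that with a bytes pattern the results are bytes too, so they need decoding before they can be mixed with ordinary strings.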


 >>> from disclaimer include dont_use_regexp_on_html
 "Use BeautifulSoup or lxml instead."

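To illustrate the disclaimer: a real HTML parser copes with attribute order, letter case and quoting styles that the regex can miss. The disclaimer names BeautifulSoup and lxml. The sketch below swaps in the standard library's html.parser instead, only so that it runs without installing anything. The class name `LinkCollector` and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical markup; note that href is not the first attribute in one tag
# and is upper-case in the other, both of which the regex would miss.
html = '<p><a class="x" href="http://example.com/a">A</a> <A HREF=\'/b\'>B</A></p>'

collector = LinkCollector()
collector.feed(html)  # feed() expects str, so decode bytes first
print(collector.links)
```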

If you are running Python 2.6, urllib has no "request" submodule, so the third line becomes:

m = urllib.urlopen(url) 

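In Python 3 you can also keep the string pattern and convert the data instead:

links = linkregex.findall(str(msg))

This is needed because 'msg' is a bytes object, not the string that findall() expects when the pattern is a string. Be aware that str(msg) without an encoding does not decode the page: it returns the printable representation of the bytes (text that starts with b' and contains escapes such as \n). It is therefore better to decode using the correct encoding. For instance, if "latin1" is the encoding then: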

links = linkregex.findall(msg.decode("latin1"))
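If you don't want to hard-code the encoding, one option is to read it from the response's Content-Type header. This is only a rough sketch. The helper name `decode_body` and the UTF-8 fallback are my own assumptions, and a hand-built `Message` stands in for the response headers so the example runs offline. With a real response, the headers object returned by urlopen() is a subclass of the same class and offers `get_content_charset()` as well.

```python
import re
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the charset named in Content-Type, if any.

    Hypothetical helper; the fallback encoding is an assumption.
    """
    charset = headers.get_content_charset() or fallback
    return raw.decode(charset, errors="replace")


# Stand-ins for the response headers and the bytes read from the response.
headers = Message()
headers["Content-Type"] = "text/html; charset=latin1"
raw = '<a href="/caf\u00e9">menu</a>'.encode("latin1")

# String pattern, now applied to a properly decoded string.
linkregex = re.compile(r'<a\s*href=[\'|"](.*?)[\'"].*?>')
links = linkregex.findall(decode_body(raw, headers))
print(links)

# For comparison: str() on bytes yields the repr, not the decoded text.
print(str(b"abc"))
```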

