How can I download a webpage with a user agent other than the default one on urllib2.urlopen?
The short story: You can use Request.add_header to do this.
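A minimal sketch of the add_header approach. The URL, the user agent string, and the helper name build_request are placeholders chosen for illustration. The try/except import is an assumption added so the same snippet also runs on Python 3, where urllib2 was folded into urllib.request.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (added for portability)


def build_request(url, user_agent):
    """Return a Request for `url` that carries a custom User-Agent header."""
    request = urllib2.Request(url)
    request.add_header('User-Agent', user_agent)
    return request


# Placeholder URL and user agent string.
request = build_request('http://www.example.com/', 'Mozilla/5.0')

# Nothing is sent until the request is opened:
# html = urllib2.urlopen(request).read()
```

The Request object stores header names via str.capitalize(), so when inspecting it you look the header up as request.get_header('User-agent').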
You can also pass the headers as a dictionary when creating the Request itself, as the docs note:
headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments. This is often used to “spoof” the User-Agent header, which is used by a browser to identify itself – some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", while urllib2's default user agent string is "Python-urllib/2.6" (on Python 2.6).
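The dictionary form described in the docs excerpt might look like the sketch below. The URL and the user agent string are placeholders, the variable name spoofed_request is made up for this example, and the import fallback is an assumption so the snippet also runs on Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent (added for portability)

# Placeholder user agent string; substitute whatever the target server expects.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)'}

# Each key/value pair is passed to add_header() by the Request constructor.
spoofed_request = urllib2.Request('http://www.example.com/', headers=headers)

# Nothing is sent until the request is opened:
# html = urllib2.urlopen(spoofed_request).read()
```

This does the same thing as calling add_header yourself; the dictionary is simply more convenient when several headers are set at once.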
There is example code in that question, but basically you can do something like this (note the capitalization of User-Agent, as given in RFC 2616, section 14.43):
    import urllib2

    # Build an opener and replace its default headers with a custom User-Agent.
    opener = urllib2.build_opener()
    opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
    response = opener.open('http://www.stackoverflow.com')