I have a string that contains Unicode escape sequences, e.g. \u2026. Somehow it is not received as unicode but as a str. How do I convert it back to unicode?
>>> a = "Hello\u2026"
>>> b = u"Hello\u2026"
>>> print a
Hello\u2026
>>> print b
Hello…
>>> print unicode(a)
Hello\u2026
unicode(a) is not the answer. Then what is?
Unicode escapes are only recognised in unicode string literals, so this is actually a string of 6 characters: '\', 'u', '2', '0', '2', '6'. To make unicode out of it, decode it with the unicode-escape codec:
a = "\u2026"
print repr(a)
print repr(a.decode('unicode-escape'))
## '\\u2026'
## u'\u2026'
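In Python 3 the picture changes: ordinary str literals already understand \u escapes, and str has no decode() method. A minimal sketch of the same fix there, assuming the text arrives as plain ASCII (the raw string r"\u2026" stands in for the six-character input from the question):

```python
# Raw string literal: six characters, '\', 'u', '2', '0', '2', '6'
a = r"\u2026"
print(len(a))

# Python 3 str has no .decode(), so convert to bytes first.
# Assumption: the input is ASCII-only, so encoding with 'ascii' is safe.
b = a.encode("ascii").decode("unicode_escape")

# b is now the single character U+2026 (horizontal ellipsis)
print(len(b), b == "\u2026")
```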
Decode it with the unicode-escape codec:
>>> a = "Hello\u2026"
>>> a.decode('unicode-escape')
u'Hello\u2026'
>>> print _
Hello…
This is because in a non-unicode string literal, \u2026 is not recognised as an escape but is treated as a literal series of characters (to put it more clearly, the value is 'Hello\\u2026'). You need to decode the escapes, and the unicode-escape codec can do that for you.
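The codec processes every Python backslash escape, not only \u. A small sketch for Python 3, using a hypothetical helper named decode_escapes (our name, not a standard function). It encodes with Latin-1 first because the unicode_escape codec reads any non-escape byte as Latin-1, so characters up to U+00FF survive the round trip; anything above U+00FF would raise UnicodeEncodeError:

```python
def decode_escapes(s: str) -> str:
    # Hypothetical helper: turn backslash escapes (\u2026, \t, \xe9, \n, ...)
    # into the characters they denote.
    # Encode with Latin-1 so that already-decoded characters in
    # U+0000..U+00FF map back to themselves when unicode_escape
    # decodes the non-escape bytes as Latin-1.
    return s.encode("latin-1").decode("unicode_escape")


raw = r"Hello\u2026\tcaf\xe9\n"       # escapes written out literally
text = decode_escapes(raw)             # escapes turned into real characters
mixed = decode_escapes("café \\u2026")  # a real 'é' plus one literal escape
print(len(raw), len(text))
```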
Note that you can get
unicode to recognise it in the same way by specifying the codec argument:
>>> unicode(a, 'unicode-escape')
u'Hello\u2026'
The a.decode() way is nicer, though.
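Python 3 has no unicode() builtin; the closest counterpart of unicode(a, 'unicode-escape') is codecs.decode, which takes the object and a codec name. A sketch, assuming ASCII-only input: a str argument is encoded to UTF-8 before the escapes are processed, so non-ASCII characters already in the string may come out mangled.

```python
import codecs

# Same 11 characters as the Python 2 str in the question: "Hello" + "\u2026" spelled out
a = "Hello\\u2026"
u = codecs.decode(a, "unicode_escape")

# Passing bytes works the same way and avoids the implicit UTF-8 step
u_from_bytes = codecs.decode(b"Hello\\u2026", "unicode_escape")

print(u == "Hello\u2026", u_from_bytes == u)
```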