Encode keys of dictionaries inside a list from unicode to ascii


Question

I have a sample response with a friends list from Facebook:

[{u'uid': 513351886, u'name': u'Mohammed Hossein', u'pic_small': u'http://profile.ak.fbcdn.net/hprofile-ak-snc4/hs643.snc3/27383_513351886_4933_t.jpg'},
    {u'uid': 516583220, u'name': u'Sim Salabim', u'pic_small': u'http://profile.ak.fbcdn.net/hprofile-ak-snc4/hs348.snc4/41505_516583220_5681339_t.jpg'}]

How can I go through this list and encode the keys of the dictionaries to ASCII? I've tried something like this:

response = simplejson.load(urllib.urlopen(REST_SERVER, data))
for k in response:
    for id, stuff in k.items():
        id.encode("ascii")
        logging.debug("id: %s" % id)
return response

But the encoded keys are not saved, and as a result I'm still getting unicode keys.

11/29/2010 11:34:29 AM

Accepted Answer

First: do you really need to do this? The strings are in Unicode for a reason: you simply can't represent everything in plain ASCII that you can in Unicode. This probably won't be a problem for your dictionary keys 'uid', 'name' and 'pic_small'; but it probably won't be a problem to leave them as Unicode, either. (The 'simplejson' library does not know anything about your data, so it uses Unicode for every string - better safe than sorry.)

Anyway:

In Python, strings cannot be modified. The .encode method does not change the string; it returns a new string that is the encoded version.
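A minimal sketch of that point, using a made-up key: calling .encode leaves the original string alone and hands back a separate object, so the result has to be assigned somewhere to be of any use. The names `key` and `encoded` are illustrative only. Note that on Python 2 the result is a `str`, while on Python 3 it is a `bytes` object.

```python
key = u'uid'

# The return value is discarded here, so this line has no lasting effect.
key.encode('ascii')

# Keeping the return value is the only way to get the encoded version.
encoded = key.encode('ascii')

# 'key' is still the original unicode string; 'encoded' is a different type.
print(type(key), type(encoded))
```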

What you want to do is produce a new dictionary, which replaces the keys with the encoded keys. We can do this by passing a generator of (encoded key, original value) pairs to the dict constructor, which accepts any iterable of key-value pairs.

That looks like:

dict((k.encode('ascii'), v) for (k, v) in original.items())

Similarly, we can use a list comprehension to apply this to every dictionary and create the new list. (We could also modify the list in place, but this way is cleaner.)

response = simplejson.load(urllib.urlopen(REST_SERVER, data))
# We create the list of modified dictionaries, and re-assign 'response' to it:
response = [
     dict((k.encode('ascii'), v) for (k, v) in original.items()) # the modified version
     for original in response # of each original dictionary.
]
return response
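The same idea can be sketched as a standalone function that runs without the network call. The function name `encode_keys` and the trimmed-down `friends` data are assumptions made for illustration, not part of the original code. On Python 2 the new keys are plain `str`; on Python 3 they come out as `bytes`, which is rarely what you want there.

```python
def encode_keys(records, encoding='ascii'):
    """Return a new list of dicts with encoded keys; values are left untouched."""
    return [
        dict((key.encode(encoding), value) for key, value in record.items())
        for record in records
    ]


# Hypothetical sample data, shortened from the response in the question.
friends = [
    {u'uid': 513351886, u'name': u'Mohammed Hossein'},
    {u'uid': 516583220, u'name': u'Sim Salabim'},
]

encoded = encode_keys(friends)
print(encoded)
```

Because a new list and new dictionaries are built, the original `friends` list keeps its unicode keys.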
11/29/2010 11:46:37 AM

The other answers hint at this but don't come out and say it: dictionary lookup and string comparison in Python 2 transparently convert between Unicode strings and byte strings, as long as the byte string contains only ASCII characters:

>>> x = {u'foo':'bar'}    # unicode key, ascii value
>>> x['foo']              # look up by ascii
'bar'
>>> x[u'foo']             # or by unicode
'bar'
>>> x['foo'] == u'bar'    # ascii value has a unicode equivalent
True

So for most uses of a dictionary converted from JSON, you don't usually need to worry about the fact that everything's Unicode.
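As a rough illustration with parsed JSON, here is a sketch that uses the standard library's `json` module in place of `simplejson` (an assumption; the two share the same `loads` call) and an inline string in place of the Facebook response. Lookups with plain literals work on both Python 2 and Python 3 without any re-encoding of the keys.

```python
import json

# Stand-in for the server response; the real data would come from the REST call.
friends = json.loads('[{"uid": 513351886, "name": "Sim Salabim"}]')
first = friends[0]

# Plain and u-prefixed literals both find the key.
print(first['uid'])
print(first[u'name'] == 'Sim Salabim')
```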


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow