Generate unique hashes for Django models


I want to use unique hashes for each model rather than ids.

I implemented the following function to use it across the board easily.

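import random, hashlib
from base64 import urlsafe_b64encode

def set_unique_random_value(model_object, field_name='hash_uuid', length=5, use_sha=True, urlencode=False):
    while 1:
        # Candidate value: the digits of a random float, optionally hashed.
        uuid_number = str(random.random())[2:]
        uuid = hashlib.sha256(uuid_number).hexdigest() if use_sha else uuid_number
        uuid = uuid[:length]
        if urlencode:
            uuid = urlsafe_b64encode(uuid)[:-1]
        hash_id_dict = {field_name: uuid}
        try:
            # If a row with this value already exists, loop and try another one.
            model_object.__class__.objects.get(**hash_id_dict)
        except model_object.__class__.DoesNotExist:
            # The value is unused: store it on the object and stop.
            setattr(model_object, field_name, uuid)
            return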

I'm seeking feedback: how else could I do it? How can I improve it? What is good, bad, and ugly about it?

3/25/2010 10:29:11 PM

I do not like this bit:

uuid = uuid[:5]

Even in the best scenario (the values are uniformly distributed), a collision becomes more likely than not after only about 1,200 elements!

This is the birthday problem. In brief, the probability of a collision exceeds 0.5 once the number of elements reaches about 1.18 times the square root of the number of possible labels.

You have 16^5 = 0x100000, roughly 10^6, labels (distinct 5-digit hex values), so you will start seeing collisions after roughly a thousand generated values.
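
The birthday estimate above can be checked numerically. The following is only a sketch under the same assumption of uniformly distributed values; the function names `collision_probability` and `draws_for_even_odds` are made up for this illustration.

```python
def collision_probability(n, labels):
    """Chance that n uniform random draws from `labels` values contain a duplicate."""
    p_unique = 1.0
    for i in range(n):
        # The (i+1)-th draw must avoid the i values already taken.
        p_unique *= (labels - i) / float(labels)
    return 1.0 - p_unique


def draws_for_even_odds(labels):
    """Smallest number of draws for which a duplicate is at least as likely as not."""
    n = 0
    p_unique = 1.0
    while p_unique > 0.5:
        p_unique *= (labels - n) / float(labels)
        n += 1
    return n


labels = 16 ** 5  # number of distinct 5-character hex prefixes
print(collision_probability(1000, labels))
print(draws_for_even_odds(labels))
```

For 16^5 labels this gives a collision probability of roughly 0.38 at 1,000 values, with the even-odds point a little past 1,200.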

Even if you enlarge length to -1, you still have a problem here:

uuid_number = str(random.random())[2:]

You will start having collisions after about 3 * 10^6 values (the same calculation applies).

I think your best bet is to use a uuid, which is far more likely to be unique. Here is an example:

>>> import uuid
>>> uuid.uuid1().hex
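
As a variation on the same idea, here is a minimal sketch of a helper that returns a hex identifier from the standard `uuid` module. It uses `uuid.uuid4()` (random-based) rather than `uuid1()`; that swap and the helper name `new_hash_id` are assumptions made for this illustration, not something from the question.

```python
import uuid


def new_hash_id():
    """Return a 32-character hex string from a random (version 4) UUID."""
    # uuid4() carries 122 random bits, so accidental collisions are
    # astronomically unlikely for any realistic table size.
    return uuid.uuid4().hex


# Quick sanity check: 10,000 generated ids are all distinct.
ids = set(new_hash_id() for _ in range(10000))
print(len(ids))
```

The value could then be assigned to the model field in place of the truncated hash, with a unique constraint on the column as a safety net.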

Update: If you do not trust the math, just run the following sample to see the collision:

 >>> len(set(hashlib.sha256(str(i)).hexdigest()[:5] for i in range(0,2000)))
 1999  # it would print 2000 if there were no collisions
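
The session above is Python 2. Under Python 3, `hashlib.sha256` needs bytes, so the same experiment might be written as in the sketch below. The helper name `count_distinct_prefixes` is made up for this illustration.

```python
import hashlib


def count_distinct_prefixes(n, length):
    """Hash the integers 0..n-1 and count distinct hex prefixes of the given length."""
    prefixes = set(
        hashlib.sha256(str(i).encode("ascii")).hexdigest()[:length]
        for i in range(n)
    )
    return len(prefixes)


# Fewer distinct prefixes than inputs means at least one collision.
print(count_distinct_prefixes(2000, 5))
# The full 64-character digest keeps every input distinct.
print(count_distinct_prefixes(2000, 64))
```

Comparing the two counts shows that the collisions come from the truncation, not from SHA-256 itself.
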
3/25/2010 11:30:21 PM

