How to make Django QuerySet bulk delete() more efficient


Question

Setup:
Django 1.1.2, MySQL 5.1

Problem:

Blob.objects.filter(foo=foo) \
            .filter(status=Blob.PLEASE_DELETE) \
            .delete()

This snippet makes the ORM first issue a SELECT * FROM xxx_blob WHERE ... query and then a DELETE FROM xxx_blob WHERE id IN (BLAH) query, where BLAH is a ridiculously long list of IDs. Since I'm deleting a large number of blobs, this makes both me and the DB very unhappy.

Is there a reason for this? I don't see why the ORM can't convert the above snippet into a single DELETE query. Is there a way to optimize this without resorting to raw SQL?
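To make the difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Django and MySQL (an assumption made so it runs anywhere). two_step_delete roughly mimics the pattern described above: one SELECT for the matching ids, then one DELETE ... WHERE id IN (...) that carries every one of them. single_delete is the single statement the question asks for. The table layout, the PLEASE_DELETE value and all helper names are hypothetical, and the statement counts are tallied by hand rather than read from the ORM.

```python
import sqlite3

PLEASE_DELETE = 2  # hypothetical value standing in for Blob.PLEASE_DELETE


def make_db(n_doomed, n_kept):
    """Build an in-memory stand-in for the xxx_blob table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE xxx_blob (id INTEGER PRIMARY KEY, foo INTEGER, status INTEGER)")
    rows = [(1, PLEASE_DELETE)] * n_doomed + [(1, 0)] * n_kept
    conn.executemany("INSERT INTO xxx_blob (foo, status) VALUES (?, ?)", rows)
    return conn


def two_step_delete(conn, foo, status):
    """Roughly the ORM pattern from the question: SELECT the ids, then DELETE ... IN (...).

    Returns (rows_deleted, statements_issued).
    """
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM xxx_blob WHERE foo = ? AND status = ?", (foo, status))]
    if not ids:
        return 0, 1  # only the SELECT was needed
    placeholders = ", ".join("?" * len(ids))  # "?, ?, ?, ..." one per id
    cur = conn.execute(f"DELETE FROM xxx_blob WHERE id IN ({placeholders})", ids)
    return cur.rowcount, 2


def single_delete(conn, foo, status):
    """One statement: the database finds and removes the matching rows itself."""
    cur = conn.execute("DELETE FROM xxx_blob WHERE foo = ? AND status = ?", (foo, status))
    return cur.rowcount, 1


a = make_db(500, 20)
print(two_step_delete(a, 1, PLEASE_DELETE))  # (500, 2)
b = make_db(500, 20)
print(single_delete(b, 1, PLEASE_DELETE))    # (500, 1)
```

Both end with the same rows left in the table; the difference is that the two-step version ships every matching id from the database to the client and back again inside one huge IN list.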

9/9/2014 11:59:46 AM

Accepted Answer

Not without writing your own custom SQL, a custom manager, or something similar; the Django developers are apparently working on it, though.

http://code.djangoproject.com/ticket/9519

2/1/2011 9:50:52 PM

For those who are still looking for an efficient way to bulk delete in Django, here's a possible solution:

There are two reasons delete() can be so slow: 1) Django has to make sure cascading deletes work properly, so it looks for foreign key references to your models; 2) Django has to send the pre_delete and post_delete signals for every deleted object, which means it must load those objects first.
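The reasoning above can be illustrated with a toy, pure-Python model of a collector-based delete. It is not Django's actual Collector; the Comment rows, their blob_id foreign key, and the callback names are hypothetical. It shows why the matching rows have to be loaded before anything is deleted: the signal callbacks receive each object, and the cascade needs the primary keys of the matching rows to find dependent rows.

```python
def collector_delete(blobs, comments, matches, on_pre_delete, on_post_delete):
    """Toy model of a collector-based delete (not Django's real implementation).

    blobs:    dict mapping id -> Blob row (a dict)
    comments: dict mapping id -> Comment row; 'blob_id' is a hypothetical
              foreign key to Blob that should cascade.
    Returns the total number of rows deleted.
    """
    # 1) The matching rows must be loaded, because signal receivers get instances.
    doomed = [row for row in blobs.values() if matches(row)]
    doomed_ids = {row["id"] for row in doomed}

    # 2) Follow the foreign key to find dependent rows that must go too (cascade).
    dependents = [c for c in comments.values() if c["blob_id"] in doomed_ids]

    # 3) A pre_delete-style callback fires for every object before anything is removed.
    for obj in dependents + doomed:
        on_pre_delete(obj)

    # 4) Delete dependents first, then the Blobs themselves, by primary key.
    for c in dependents:
        del comments[c["id"]]
    for blob_id in doomed_ids:
        del blobs[blob_id]

    # 5) A post_delete-style callback fires for every object after removal.
    for obj in dependents + doomed:
        on_post_delete(obj)

    return len(dependents) + len(doomed)


PLEASE_DELETE = 2  # hypothetical status value
blobs = {i: {"id": i, "foo": 1, "status": PLEASE_DELETE if i <= 3 else 0} for i in range(1, 6)}
comments = {10: {"id": 10, "blob_id": 1}, 11: {"id": 11, "blob_id": 5}}
seen = []
deleted = collector_delete(
    blobs,
    comments,
    matches=lambda r: r["foo"] == 1 and r["status"] == PLEASE_DELETE,
    on_pre_delete=lambda obj: seen.append(("pre", obj["id"])),
    on_post_delete=lambda obj: seen.append(("post", obj["id"])),
)
print(deleted)           # 4 (3 blobs + 1 dependent comment)
print(sorted(blobs))     # [4, 5]
print(sorted(comments))  # [11]
```

If there were no callbacks and no dependent table, steps 1 to 3 and 5 would have nothing to do, and a single DELETE ... WHERE ... would give the same result without loading any rows.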

If you know that no foreign keys cascade from your models and that no delete signals need to be handled, you can speed this up considerably by resorting to the private API _raw_delete as follows:

queryset._raw_delete(queryset.db)

More details here. Note that Django already tries to handle these cases efficiently, but in many situations the raw delete is still much faster.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow