I am looking for a linked list and related algorithms implementation for Python. Everyone I ask just recommends using built-in Python lists, but performance measurements indicate that list insertion and removal are a bottleneck for our application. It's trivial to implement a simple linked list, but I wonder if there is a mature library which includes operations like sort, merge, splice, search, lower/upper bound, etc.
I know this is a dupe, but searching for "python list" on any search engine gives predictably poor results, with most people just saying that linked lists are unneeded in Python (pfft!).
PS: I need to insert and remove from anywhere in the list, not just the ends.
OK, you asked for it: I need to maintain an ordered list of several hundred thousand entries. I will iterate over the list forwards (one by one), using a visitor on each entry, starting from the beginning or a position found by a binary search. When an entry matching a predicate is found it is removed from the list, and then, another binary search is performed on a subset of the list beginning from the removed entry's previous position, until a position determined statistically beforehand. Ignoring the error condition, the modified entry may be used to create another linked list which is spliced into the new position found through the second binary search. Iteration is continued from the position where the entry was removed. On occasion, several thousand contiguous ordered entries may be added to/removed from any place in the list. Sometimes several thousand non-contiguous entries must be searched for and removed incrementally.
Python's list is unacceptable as the cost of insertion/removal is prohibitive, and the minor gains in speed for the binary search are totally irrelevant to the total cost. Our in-house tests confirm this.
If I have neglected any detail, perhaps I can e-mail you a copy of my company's non-disclosure agreement and I can privately correspond with you on the matter. sarcasm.end().
Here's a blog post sharing your pain. It includes an implementation of a linked list and a performance comparison.
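For reference, here is a minimal sketch of the kind of doubly linked list being discussed. It is not the code from that post; the names `Node`, `LinkedList`, `insert_after`, `remove` and `splice_after` are made up for illustration. The point is that insertion, removal and splicing are O(1) once you already hold a reference to the node.

```python
class Node:
    """One entry of the list; a fresh node links to itself."""
    __slots__ = ("value", "prev", "next")

    def __init__(self, value=None):
        self.value = value
        self.prev = self
        self.next = self


class LinkedList:
    """Circular doubly linked list with a sentinel head node (hypothetical sketch)."""

    def __init__(self, values=()):
        self.head = Node()  # sentinel: head.next is first, head.prev is last
        self.size = 0
        for v in values:
            self.insert_after(self.head.prev, v)

    def insert_after(self, node, value):
        """Insert value right after node in O(1); return the new node."""
        new = Node(value)
        new.prev, new.next = node, node.next
        node.next.prev = new
        node.next = new
        self.size += 1
        return new

    def remove(self, node):
        """Unlink node in O(1); return its predecessor so iteration can resume."""
        node.prev.next = node.next
        node.next.prev = node.prev
        self.size -= 1
        return node.prev

    def splice_after(self, node, other):
        """Move every node of other to just after node in O(1); other ends up empty."""
        if other.size == 0:
            return
        first, last = other.head.next, other.head.prev
        last.next = node.next
        node.next.prev = last
        node.next = first
        first.prev = node
        self.size += other.size
        other.head.prev = other.head.next = other.head
        other.size = 0

    def __iter__(self):
        node = self.head.next
        while node is not self.head:
            yield node.value
            node = node.next


items = LinkedList([1, 2, 5, 6])
two = items.head.next.next                    # node holding 2
items.splice_after(two, LinkedList([3, 4]))   # splice a second list in after it
items.remove(items.head.next)                 # drop the first entry
print(list(items))
```

Note that a plain linked list gives you no O(log n) binary search, since finding the middle node means walking the links.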
blist would probably be better, though. From its description:
The use cases where the BList is slightly slower than Python's list are as follows (O(log n) vs. O(1)):
- A large list that never changes length.
- A large list where inserts and deletes are only at the end of the list (LIFO).
With that disclaimer out of the way, here are some of the use cases where the BList is dramatically faster than the built-in list:
- Insertion into or removal from a large list (O(log n) vs. O(n))
- Taking large slices of large lists (O(log n) vs O(n))
- Making shallow copies of large lists (O(1) vs. O(n))
- Changing large slices of large lists (O(log n + log k) vs. O(n + k))
- Multiplying a list to make a large, sparse list (O(log k) vs. O(kn))
Note that it's actually implemented as a B+ tree, allowing great performance for all those operations.
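Since blist is meant as a drop-in replacement for `list`, the search/remove/reinsert loop from the question could be sketched with the standard `bisect` module, roughly as below. This assumes the third-party `blist` package is installed and falls back to the built-in list otherwise; `SeqType`, the sample data and the `+ 3` modification are placeholders, not anything from your application.

```python
import bisect

try:
    # Assumption: the third-party blist package is available.
    from blist import blist as SeqType
except ImportError:
    # Fallback so the sketch still runs; inserts and pops are O(n) here.
    SeqType = list

entries = SeqType(range(0, 20, 2))        # ordered entries: 0, 2, ..., 18

# First binary search: find where to start.
pos = bisect.bisect_left(entries, 7)

# Remove the entry found there (the "predicate matched" case).
removed = entries.pop(pos)

# Second binary search, restricted to the part of the list from pos onward.
modified = removed + 3                    # placeholder for the modified entry
new_pos = bisect.bisect_left(entries, modified, lo=pos)

# Splice the new run of entries in with a slice assignment.
entries[new_pos:new_pos] = [modified]

print(list(entries))
```

With a blist each of those pops and slice assignments should cost O(log n) rather than O(n), and the binary search still works by index, which a linked list cannot offer.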