Randomly mix the lines of a 3-million-line file


Question

Everything is in the title. I'm wondering if anyone knows a quick way, with reasonable memory demands, to randomly shuffle all the lines of a 3-million-line file. I guess it is not possible with a simple vim command, so any simple Python script would do. I tried in Python using a random number generator, but did not manage to find a simple solution.

1
26
2/8/2018 4:02:25 PM

Accepted Answer

import random

# Pair every line with a random key; sorting by that key shuffles the lines.
with open('the_file', 'r') as source:
    data = [(random.random(), line) for line in source]
data.sort()
with open('another_file', 'w') as target:
    for _, line in data:
        target.write(line)

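That should do it. 3 million lines will fit into most machines' memory unless the lines are HUGE (over 512 characters). At 512 characters per line, the raw text alone comes to roughly 1.5 GB at one byte per character, before Python's per-string overhead.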
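If the lines really are too big to hold in memory, one alternative (not what the answer above does) is to keep only the byte offset of each line, shuffle the offsets, and copy the lines over in a second pass. This is a minimal sketch under that assumption; the function name `shuffle_lines_by_offset` and its `seed` parameter are made up for illustration.

```python
import random

def shuffle_lines_by_offset(src_path, dst_path, seed=None):
    """Write the lines of src_path to dst_path in random order.

    Hypothetical helper: only one integer per line is kept in memory.
    """
    rng = random.Random(seed)
    offsets = []
    with open(src_path, 'rb') as source:
        # First pass: remember where each line starts, not the line itself.
        while True:
            pos = source.tell()
            if not source.readline():
                break
            offsets.append(pos)
        rng.shuffle(offsets)
        with open(dst_path, 'wb') as target:
            # Second pass: jump to each line in shuffled order and copy it.
            for pos in offsets:
                source.seek(pos)
                line = source.readline()
                if not line.endswith(b'\n'):
                    line += b'\n'  # the file's last line may lack a newline
                target.write(line)
```

The trade-off is speed: the second pass does one random seek per line, so on a spinning disk it will likely be much slower than the in-memory version.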

31
1/6/2011 9:54:21 PM

Takes only a few seconds in Python:

>>> import random
>>> with open('3mil.txt') as f:
...     lines = f.readlines()
...
>>> random.shuffle(lines)
>>> with open('3mil.txt', 'w') as f:
...     f.writelines(lines)
...
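The session above overwrites `3mil.txt` directly, so an interrupted write could leave the file truncated. A slightly more defensive sketch of the same `random.shuffle` idea writes to a temporary file first and then swaps it into place. The function name `shuffle_file_in_place` and the `seed` parameter are assumptions for illustration, not part of the answer.

```python
import os
import random
import tempfile

def shuffle_file_in_place(path, seed=None):
    """Shuffle the lines of a text file, replacing it only after a full write."""
    rng = random.Random(seed)
    with open(path) as source:
        lines = source.readlines()
    # Without this, a last line lacking '\n' would fuse with its new neighbour.
    if lines and not lines[-1].endswith('\n'):
        lines[-1] += '\n'
    rng.shuffle(lines)
    # Write next to the original so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, 'w') as target:
        target.writelines(lines)
    os.replace(tmp_path, path)
```

Usage would be a single call such as `shuffle_file_in_place('3mil.txt')`. Like the original, this still holds every line in memory.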

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow