I have a very big file (4 GB), and when I try to read it my computer hangs. So I want to read it piece by piece, and after processing each piece, store the processed piece in another file and read the next piece.
Is there any method to yield these pieces?
I would love to have a lazy method.
To write a lazy function, just use yield:
def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


f = open('really_big_file.dat')
for piece in read_in_chunks(f):
    process_data(piece)
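Since the question also wants each processed piece stored in another file, here is a rough sketch of how the generator could be combined with an output file. The output file name is a placeholder, and process_data is assumed to return whatever should be written:

with open('really_big_file.dat') as f, open('processed_file.dat', 'w') as out:
    for piece in read_in_chunks(f):
        # process_data() is assumed to return the transformed piece
        out.write(process_data(piece))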
Another option would be to use
iter and a helper function:
f = open('really_big_file.dat')

def read1k():
    return f.read(1024)

for piece in iter(read1k, ''):
    process_data(piece)
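The same pattern can be written without a named helper, for example with functools.partial. Note that the sentinel passed to iter must match what read() returns at end of file: '' in text mode, b'' in binary mode. A small sketch:

from functools import partial

with open('really_big_file.dat', 'rb') as f:
    for piece in iter(partial(f.read, 1024), b''):
        # each piece is at most 1024 bytes; the loop stops at EOF
        process_data(piece)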
If the file is line-based, the file object is already a lazy generator of lines:
for line in open('really_big_file.dat'):
    process_data(line)
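If you would rather process lines in larger batches than one at a time, itertools.islice can group the lazy line iterator into fixed-size pieces (the batch size of 10000 below is only illustrative):

from itertools import islice

with open('really_big_file.dat') as f:
    while True:
        lines = list(islice(f, 10000))  # next batch of lines; empty list at EOF
        if not lines:
            break
        process_data(lines)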
If your computer, OS, and Python are all 64-bit, then you can use the mmap module to map the contents of the file into memory and access it with indices and slices. Here is an example from the documentation:
import mmap

with open("hello.txt", "r+b") as f:
    # memory-map the file, size 0 means whole file
    mm = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print(mm.readline())  # prints b"Hello Python!\n"
    # read content via slice notation
    print(mm[:5])  # prints b"Hello"
    # update content using slice notation;
    # note that new content must have same size
    mm[6:] = b" world!\n"
    # ... and read again using standard file methods
    mm.seek(0)
    print(mm.readline())  # prints b"Hello  world!\n"
    # close the map
    mm.close()
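To tie this back to the question, a mapped file can also be processed piece by piece by slicing it in fixed-size steps; the OS only pages in the parts you actually touch. A read-only sketch (the chunk size is just an example value):

import mmap

with open('really_big_file.dat', 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        chunk_size = 1024 * 1024  # 1 MB per slice
        for start in range(0, len(mm), chunk_size):
            process_data(mm[start:start + chunk_size])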
If your computer, OS, or Python is 32-bit, then mmap-ing large files can reserve large parts of your address space and starve your program of memory.
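One way to limit that is to map only a window of the file at a time, using the length and offset arguments of mmap.mmap; the offset has to be a multiple of mmap.ALLOCATIONGRANULARITY. A rough sketch (the window size is arbitrary):

import mmap
import os

window = mmap.ALLOCATIONGRANULARITY * 256  # map only this many bytes at once

with open('really_big_file.dat', 'rb') as f:
    size = os.fstat(f.fileno()).st_size
    for offset in range(0, size, window):
        length = min(window, size - offset)
        with mmap.mmap(f.fileno(), length, offset=offset,
                       access=mmap.ACCESS_READ) as mm:
            process_data(mm[:])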