Requests is a really nice library. I'd like to use it to download big files (>1GB). The problem is that it's not possible to keep the whole file in memory; I need to read it in chunks. And that is the problem with the following code:
    import requests

    def DownloadFile(url):
        local_filename = url.split('/')[-1]
        r = requests.get(url)
        f = open(local_filename, 'wb')
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
        f.close()
        return
For some reason it doesn't work this way: it still loads the whole response into memory before saving it to a file.
If you need a small client (Python 2.x/3.x) that can download big files from FTP, you can find it here. It supports multithreading and reconnects (it monitors connections), and it also tunes socket parameters for the download task.
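That client isn't reproduced here, but for illustration, a minimal chunked FTP download can be built on the standard library's ftplib alone. This is just a sketch, not the client referenced above; the host, path, and credentials are placeholders:

    # Minimal sketch using only the standard library's ftplib;
    # host/path/credentials are placeholders.
    from ftplib import FTP

    def ftp_download(host, remote_path, local_path, user='anonymous', passwd=''):
        with FTP(host) as ftp:
            ftp.login(user=user, passwd=passwd)
            with open(local_path, 'wb') as f:
                # retrbinary streams the file block by block, so memory
                # stays bounded no matter how large the file is
                ftp.retrbinary('RETR ' + remote_path, f.write, blocksize=32 * 1024)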
With the following streaming code, Python memory usage stays bounded regardless of the size of the downloaded file:
    import requests

    def download_file(url):
        local_filename = url.split('/')[-1]
        # NOTE the stream=True parameter below
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(local_filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    if chunk:  # filter out keep-alive new chunks
                        f.write(chunk)
                        # f.flush()
        return local_filename
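Usage is straightforward; the URL below is just a placeholder:

    # Usage sketch; the URL is hypothetical.
    path = download_file('https://example.com/big-file.iso')
    print('saved to', path)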
Note that the number of bytes returned per chunk from iter_content is not exactly chunk_size: the final chunk is usually smaller, and when the response is compressed and transparently decoded, a decoded chunk can be larger than the requested size, so the length can differ from one iteration to the next.
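A quick way to see this is to print the length of the first few chunks (the URL is again a placeholder):

    # Illustrative only; any large downloadable URL works here.
    import requests

    with requests.get('https://example.com/big-file.iso', stream=True) as r:
        for i, chunk in enumerate(r.iter_content(chunk_size=8192)):
            print(i, len(chunk))  # chunk lengths need not equal chunk_size
            if i >= 4:
                break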
See http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow for further reference.
    import requests
    import shutil

    def download_file(url):
        local_filename = url.split('/')[-1]
        with requests.get(url, stream=True) as r:
            with open(local_filename, 'wb') as f:
                shutil.copyfileobj(r.raw, f)
        return local_filename
This streams the file to disk without using excessive memory, and the code is simple.
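One caveat: r.raw bypasses requests' transparent content decoding, so if the server sends the body gzip- or deflate-compressed, the bytes land on disk still compressed. A common workaround, sketched below under the assumption that you want the decoded bytes (the function name is just for illustration), is to enable decoding on the underlying urllib3 response:

    import requests
    import shutil

    def download_file_decoded(url):  # hypothetical variant of the above
        local_filename = url.split('/')[-1]
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            # Ask urllib3 to decode gzip/deflate while streaming; without this,
            # copyfileobj writes the raw (possibly compressed) wire bytes.
            r.raw.decode_content = True
            with open(local_filename, 'wb') as f:
                shutil.copyfileobj(r.raw, f)
        return local_filename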