Reading entire file in Python


Question

If you read an entire file with content = open('Path/to/file', 'r').read() is the file handle left open until the script exits? Is there a more concise method to read a whole file?

1
314
9/13/2011 11:44:12 PM

Accepted Answer

The answer to that question depends somewhat on the particular Python implementation.

To understand what this is all about, pay particular attention to the actual file object. In your code, that object is mentioned only once, in an expression, and becomes inaccessible immediately after the read() call returns.

This means that the file object is garbage. The only remaining question is "When will the garbage collector collect the file object?".

in CPython, which uses a reference counter, this kind of garbage is noticed immediately, and so it will be collected immediately. This is not generally true of other python implementations.

A better solution, to make sure that the file is closed, is this pattern:

with open('Path/to/file', 'r') as content_file:
    content = content_file.read()

which will always close the file immediately after the block ends; even if an exception occurs.

Edit: To put a finer point on it:

Other than file.__exit__(), which is "automatically" called in a with context manager setting, the only other way that file.close() is automatically called (that is, other than explicitly calling it yourself,) is via file.__del__(). This leads us to the question of when does __del__() get called?

A correctly-written program cannot assume that finalizers will ever run at any point prior to program termination.

-- https://devblogs.microsoft.com/oldnewthing/20100809-00/?p=13203

In particular:

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

[...]

CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.

-- https://docs.python.org/3.5/reference/datamodel.html#objects-values-and-types

(Emphasis mine)

but as it suggests, other implementations may have other behavior. As an example, PyPy has 6 different garbage collection implementations!

516
6/20/2019 2:01:00 AM

You can use pathlib.

For Python 3.5 and above:

from pathlib import Path
contents = Path(file_path).read_text()

For lower versions of Python use pathlib2:

$ pip install pathlib2

Then:

from pathlib2 import Path
contents = Path(file_path).read_text()

This is the actual read_text implementation:

def read_text(self, encoding=None, errors=None):
    """
    Open the file in text mode, read it, and close the file.
    """
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
        return f.read()

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon