Files & Folders I/O
When it comes to storing, reading, or communicating data, working with the files of an operating system is both necessary and easy with Python. Unlike other languages where file input and output requires complex reading and writing objects, Python simplifies the process, needing only commands to open, read/write, and close the file. This topic explains how Python can interface with files on the operating system.
- file_object = open(filename [, access_mode][, buffering])
|filename||the path to your file or, if the file is in the working directory, the filename of your file|
|access_mode||a string value that determines how the file is opened|
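|buffering||an optional integer that sets the buffering policy: 0 switches buffering off (allowed only in binary mode), 1 selects line buffering (text mode only), and an integer greater than 1 sets the buffer size in bytes|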
Avoiding the cross-platform Encoding Hell
When using Python's built-in open(), it is best practice to always pass the encoding argument if you intend your code to be run cross-platform.
The reason is that a system's default encoding differs from platform to platform. While Linux systems typically use utf-8 as the default, this is not necessarily true for macOS and Windows.
To check the encoding that open() uses when none is given, call locale.getpreferredencoding(False) from any Python interpreter.
Hence, it is wise to always specify an encoding, to make sure the strings you're working with are encoded as what you think they are, ensuring cross-platform compatibility.
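As a minimal sketch of the advice above, the following writes and reads a file with an explicit encoding. The file name greeting.txt and the temporary directory are assumptions made only for this illustration.

```python
import locale
import os
import tempfile

# Encoding that open() falls back to when no encoding argument is passed.
print(locale.getpreferredencoding(False))

# Hypothetical file, created in a throwaway directory for this demo.
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")

# Passing encoding explicitly makes the bytes on disk platform-independent.
with open(path, "w", encoding="utf-8") as f:
    f.write("héllo wörld")

# Read the text back with the same encoding.
with open(path, encoding="utf-8") as f:
    text = f.read()

# Read the raw bytes to see what was actually stored.
with open(path, "rb") as f:
    raw = f.read()
```

Reading the file back in binary mode shows the stored bytes, which is a quick way to confirm that the encoding is the one you asked for.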
There are different modes you can open a file with, specified by the mode parameter. These include:
'r' - reading mode. The default. It allows you only to read the file, not to modify it. When using this mode the file must exist.
'w' - writing mode. It will create a new file if it does not exist; otherwise it will erase the file's contents and allow you to write to it.
'a' - append mode. It will write data to the end of the file. It does not erase the file, and it will create a new file if one does not exist.
'rb' - reading mode in binary. This is similar to r except that the file is read in binary mode, so reads return bytes rather than str.
'r+' - reading mode plus writing mode at the same time. This allows you to read and write the same file through a single file object. The file must exist, and it is not truncated.
'rb+' - reading and writing mode in binary. The same as r+ except the data is in binary.
'wb' - writing mode in binary. The same as w except the data is in binary.
'w+' - writing and reading mode. The same as r+, except that if the file does not exist a new one is created; otherwise, the existing contents are erased.
'wb+' - writing and reading mode in binary. The same as w+ but the data is in binary.
'ab' - appending in binary mode. Similar to a except that the data is in binary.
'a+' - appending and reading mode. Similar to w+ in that it will create a new file if the file does not exist. Otherwise, the existing contents are kept and the file pointer starts at the end of the file.
'ab+' - appending and reading mode in binary. The same as a+ except that the data is in binary.
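The differences between the modes listed above are easier to see side by side. The following is a small sketch, assuming a hypothetical file modes.txt in a temporary directory.

```python
import os
import tempfile

# Hypothetical file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:    # creates the file, or truncates an existing one
    f.write("first\n")

with open(path, "a") as f:    # appends; would also create a missing file
    f.write("second\n")

with open(path, "r+") as f:   # read and write; the file must already exist
    original = f.read()
    f.seek(0)                 # go back to the start before writing
    f.write("FIRST")          # overwrites in place, does not truncate

with open(path) as f:         # no mode given, so 'r' (text, read-only)
    after_update = f.read()

with open(path, "rb") as f:   # binary mode returns bytes, not str
    raw = f.read()

with open(path, "w+") as f:   # truncates first, then allows reading too
    emptied = f.read()
```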
Python 3 added a new mode for exclusive creation so that you will not accidentally truncate or overwrite an existing file.
'x' - open for exclusive creation; raises FileExistsError if the file already exists.
'xb' - exclusive creation in binary writing mode. The same as x except the data is in binary.
'x+' - reading and writing mode. Similar to w+ in that it will create a new file if the file does not exist. Otherwise, it will raise FileExistsError.
'xb+' - writing and reading mode in binary. The same as x+ but the data is binary.
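The exclusive-creation modes allow you to write your file-opening code in a more Pythonic manner: attempt to create the file and handle FileExistsError, rather than checking for the file beforehand.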
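A minimal sketch of that pattern follows. The helper create_once and the file name report.txt are hypothetical names chosen for this example.

```python
import os
import tempfile

# Hypothetical target file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "report.txt")


def create_once(path, text):
    """Create path containing text; return False if it already exists."""
    try:
        # 'x' fails with FileExistsError instead of overwriting.
        with open(path, "x", encoding="utf-8") as fout:
            fout.write(text)
    except FileExistsError:
        return False
    return True


first = create_once(path, "v1")    # the file is created
second = create_once(path, "v2")   # the existing file is left untouched

with open(path, encoding="utf-8") as f:
    kept = f.read()
```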
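In Python 2, which lacks the 'x' mode, you would have had to check for the file yourself (for example with os.path.isfile) before opening it for writing.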
Check whether a file or path exists
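Employ the EAFP ("easier to ask forgiveness than permission") coding style and try to open it.
This also avoids a race condition in which another process deletes the file between the check and the moment it is used. That race can occur whenever code first tests for the file, for example with os.path.isfile or os.path.exists, and only then opens it.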
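The EAFP approach can be sketched as follows. The helper read_or_default and the path no_such_file.txt are hypothetical and exist only for this illustration.

```python
import os
import tempfile

# A path that is known not to exist (fresh temporary directory).
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")


def read_or_default(path, default=""):
    """Return the file's text, or default if the file does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        # No separate existence check, so there is no window for a race.
        return default


result = read_or_default(missing, default="<missing>")
```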
To check whether a given path exists or not, you can follow the above EAFP procedure, or explicitly check the path with os.path.exists, os.path.isfile, or os.path.isdir.
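A short sketch of the explicit checks, using a temporary directory and a hypothetical file data.txt. The pathlib call at the end is an addition for comparison and is not mentioned in the text above.

```python
import os
import tempfile
from pathlib import Path

directory = tempfile.mkdtemp()
file_path = os.path.join(directory, "data.txt")  # hypothetical file
with open(file_path, "w") as f:
    f.write("x")

checks = {
    "dir_exists": os.path.exists(directory),      # any kind of path
    "dir_is_dir": os.path.isdir(directory),       # directories only
    "file_is_file": os.path.isfile(file_path),    # regular files only
    "dir_is_file": os.path.isfile(directory),     # a directory is not a file
    "missing_exists": os.path.exists(os.path.join(directory, "nope")),
    "pathlib_exists": Path(file_path).exists(),   # object-oriented equivalent
}
```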
Checking if a file is empty
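Two common ways to test for emptiness are to compare os.path.getsize(path) or os.stat(path).st_size to 0. However, both will raise an exception if the file does not exist. To avoid having to catch such an error, guard the size check with os.path.isfile(path); the combined expression evaluates to a boolean.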
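One way to write such a guarded check is sketched below. The function name is_empty_file and the file names are hypothetical.

```python
import os
import tempfile


def is_empty_file(path):
    """True only if path is an existing regular file of size 0."""
    # isfile() short-circuits, so getsize() never sees a missing path.
    return os.path.isfile(path) and os.path.getsize(path) == 0


base = tempfile.mkdtemp()
empty = os.path.join(base, "empty.txt")
full = os.path.join(base, "full.txt")
open(empty, "w").close()            # create a zero-byte file
with open(full, "w") as f:
    f.write("data")

results = (
    is_empty_file(empty),
    is_empty_file(full),
    is_empty_file(os.path.join(base, "missing.txt")),
)
```

Note that this sketch reports a missing file as "not empty"; whether that is the right answer depends on the caller.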
Copy a directory tree
Use shutil.copytree to copy a directory tree. By default, the destination directory must not exist already.
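A small sketch of copying a tree with shutil.copytree, using hypothetical src and dst directories inside a temporary folder. It also shows what happens when the destination already exists.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, "src")
dst = os.path.join(base, "dst")     # must not exist before the copy

# Build a tiny source tree: src/sub/a.txt
os.makedirs(os.path.join(src, "sub"))
with open(os.path.join(src, "sub", "a.txt"), "w") as f:
    f.write("data")

shutil.copytree(src, dst)
copied = os.path.isfile(os.path.join(dst, "sub", "a.txt"))

# A second copy into the same destination is rejected by default.
try:
    shutil.copytree(src, dst)
    second_copy_failed = False
except FileExistsError:
    second_copy_failed = True
```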
Copying contents of one file to a different file
- Using the
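The standard-library shutil module is one option here, and a manual read/write loop is another. The sketch below shows both under the assumption that the file is plain text; the file names are hypothetical.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, "input.txt")
with open(src, "w", encoding="utf-8") as f:
    f.write("line 1\nline 2\n")

# Option 1: let shutil copy the contents in one call.
copy_a = os.path.join(base, "copy_a.txt")
shutil.copyfile(src, copy_a)

# Option 2: copy line by line with two open files.
copy_b = os.path.join(base, "copy_b.txt")
with open(src, encoding="utf-8") as in_file, \
        open(copy_b, "w", encoding="utf-8") as out_file:
    for line in in_file:
        out_file.write(line)
```

The manual loop is mainly useful when each line needs to be inspected or transformed on the way through.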
Getting the full contents of a file
The preferred method of file I/O is to use the with keyword. This will ensure the file handle is closed once the reading or writing has been completed.
Alternatively, to handle closing the file manually, you can forgo with and call the file object's close() method yourself.
Keep in mind that without a with statement, you might accidentally keep the file open if an unexpected exception arises before close() is reached.
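Both styles can be sketched as follows, with a hypothetical file notes.txt. In the manual version, try/finally is used so that close() still runs if read() raises.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # hypothetical file
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\nworld\n")

# Preferred: the file is closed when the with block exits.
with open(path, encoding="utf-8") as in_file:
    content = in_file.read()
closed_after_with = in_file.closed

# Manual: close() must be called explicitly, even on errors.
in_file = open(path, encoding="utf-8")
try:
    manual_content = in_file.read()
finally:
    in_file.close()
```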
Iterate files (recursively)
To iterate over all files, including those in subdirectories, use os.walk. The starting directory passed to it (called root_dir here) can be "." to start from the current directory, or any other path to start from.
If you also wish to get information about each file, you may use the more efficient os.scandir, whose directory entries can expose file-type and stat information without extra system calls on many platforms.
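The following sketch walks a small temporary tree with os.walk and then lists one directory with os.scandir. The directory layout is made up for the example; note that os.scandir on its own is not recursive.

```python
import os
import tempfile

# Build a tiny tree: root_dir/a.txt and root_dir/sub/b.txt
root_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(root_dir, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(root_dir, rel), "w") as f:
        f.write("xyz")

# os.walk visits root_dir and every subdirectory below it.
found = []
for dirpath, dirnames, filenames in os.walk(root_dir):
    for name in filenames:
        full_path = os.path.join(dirpath, name)
        found.append(os.path.relpath(full_path, root_dir))

# os.scandir lists a single directory and yields DirEntry objects.
sizes = {}
with os.scandir(root_dir) as entries:
    for entry in entries:
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
```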
Random File Access Using mmap
The mmap module allows the user to randomly access locations in a file by mapping the file into memory. This is an alternative to using normal file operations.
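A minimal sketch of random access through mmap, assuming a small hypothetical file mapped.bin. The mapping is read and modified with slice syntax, and with seek() and read() as on a file object.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mapped.bin")  # hypothetical file
with open(path, "wb") as f:
    f.write(b"Hello, world!")

# The file must be opened for reading and writing to modify the mapping.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:   # length 0 maps the whole file
        first_five = mm[:5]                # slicing reads bytes directly
        mm[7:12] = b"WORLD"                # replacement must be the same length
        mm.seek(7)                         # file-like positioning also works
        tail = mm.read(5)
        mm.flush()                         # push changes back to the file

with open(path, "rb") as f:
    final = f.read()
```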
Read a file between a range of lines
Suppose you want to iterate only over some specific lines of a file. You can make use of itertools for that.
For example, itertools.islice(f, 12, 20) yields lines 13 to 20 of the file object f, because Python indexing starts from 0 (line number 1 has index 0) and the stop index is exclusive.
You can also read some extra lines by calling the built-in next() function on the file object.
When you are using the file object as an iterable, do not also call readline() on it, as the two techniques of traversing a file are not to be mixed together.
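The range selection described above can be sketched like this, using a generated 30-line file with the hypothetical name numbered.txt.

```python
import itertools
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "numbered.txt")  # hypothetical file
with open(path, "w") as f:
    for n in range(1, 31):
        f.write("line %d\n" % n)

with open(path) as f:
    # Indices 12..19 correspond to line numbers 13..20.
    selected = [line.rstrip("\n") for line in itertools.islice(f, 12, 20)]
    # next() continues from where islice stopped.
    following = next(f).rstrip("\n")
```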
Reading a file line-by-line
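The simplest way to iterate over a file line by line is to loop over the file object itself in a for loop.
readline() allows for more granular control over line-by-line iteration: call it repeatedly, for instance in a while loop, until it returns an empty string, which marks the end of the file.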
Using the for loop iterator and readline() together is considered bad practice.
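The two approaches are sketched separately below, each on its own file object, using a hypothetical two-line file myfile.txt in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")  # hypothetical file
with open(path, "w") as fp:
    fp.write("hello\nworld\n")

# Approach 1: iterate over the file object directly.
for_lines = []
with open(path) as fp:
    for line in fp:
        for_lines.append(line)

# Approach 2: call readline() until it returns an empty string.
readline_lines = []
with open(path) as fp:
    while True:
        line = fp.readline()
        if not line:          # '' means end of file
            break
        readline_lines.append(line)
```

Both lists keep the trailing newline of each line; strip it if it is not wanted.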
More commonly, the readlines() method is used to store the file's lines in a list.
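A brief sketch of readlines(), assuming a hypothetical file myfile.txt that contains the two lines hello and world. Each line is printed with its index.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")  # hypothetical file
with open(path, "w") as fp:
    fp.write("hello\nworld\n")

with open(path) as fp:
    lines = fp.readlines()    # a list of strings, newlines included

output = []
for index, line in enumerate(lines):
    # strip() removes the trailing newline kept by readlines().
    output.append("Line {}: {}".format(index, line.strip()))

print("\n".join(output))
```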
For a file containing the lines hello and world, printing each line together with its index would print the following:
Line 0: hello
Line 1: world
Replacing text in a file
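A minimal sketch of one way to do this, assuming the file is small enough to read into memory: read the text, apply each replacement, and write the result back. The file name and the search/replace pairs are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "filename.txt")  # hypothetical file
with open(path, "w", encoding="utf-8") as f:
    f.write("Search1 and Search2\nnothing here\n")

# Hypothetical search -> replacement pairs.
replacements = {"Search1": "Replace1", "Search2": "Replace2"}

with open(path, encoding="utf-8") as f:
    text = f.read()

for old, new in replacements.items():
    text = text.replace(old, new)

# Reopening in 'w' mode truncates the file before the new text is written.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    updated = f.read()
```

For very large files, processing line by line into a second file is a safer design than holding everything in memory.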
Writing to a file
If you write the strings "Line 1" through "Line 4" to myfile.txt with successive write() calls and then open the file, you will see that its contents are:
Line 1Line 2Line 3Line 4
write() doesn't automatically add line breaks; you need to add them manually by ending each string with "\n".
Do not use os.linesep as a line terminator when writing files opened in text mode (the default); use a single "\n" instead, on all platforms.
If you want to specify an encoding, you simply add the encoding parameter to the open function.
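The points above can be sketched as follows, with hypothetical file names in a temporary directory: first writing without line breaks, then with explicit "\n" terminators and an explicit encoding.

```python
import os
import tempfile

base = tempfile.mkdtemp()

# Without line breaks: everything ends up on one line.
joined_path = os.path.join(base, "joined.txt")   # hypothetical file
with open(joined_path, "w", encoding="utf-8") as f:
    f.write("Line 1")
    f.write("Line 2")

# With explicit "\n" terminators, one string per line.
path = os.path.join(base, "myfile.txt")          # hypothetical file
with open(path, "w", encoding="utf-8") as f:
    f.write("Line 1\n")
    f.write("Line 2\n")
    f.writelines(["Line 3\n", "Line 4\n"])       # writelines adds no newlines

with open(joined_path, encoding="utf-8") as f:
    joined = f.read()
with open(path, encoding="utf-8") as f:
    contents = f.read()
```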
It is also possible to use print to write to a file. The mechanics differ between Python 2 and Python 3, but the concept is the same: you take the output that would have gone to the screen and send it to a file instead.
In Python 3, pass the file object as print's file keyword argument. In Python 2 you would have used the redirection form of the print statement, print >>file_object, followed by the values to write.
Unlike the write method, the print function automatically adds a line break after each call.
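A short Python 3 sketch of printing to a file, using a hypothetical file printed.txt. The end argument shows how to suppress the automatic line break when it is not wanted.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "printed.txt")  # hypothetical file

with open(path, "w", encoding="utf-8") as f:
    print("Line 1", file=f)                    # newline added automatically
    print("Line 2", "with", "parts", file=f)   # arguments joined by spaces
    print("no newline", end="", file=f)        # end="" suppresses the newline

with open(path, encoding="utf-8") as f:
    printed = f.read()
```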