Python recursive folder read


Question

I have a C++/Obj-C background and I am just discovering Python (been writing it for about an hour). I am writing a script to recursively read the contents of text files in a folder structure.

The problem I have is the code I have written will only work for one folder deep. I can see why in the code (see #hardcoded path), I just don't know how I can move forward with Python since my experience with it is only brand new.

Python Code:

import os
import sys

rootdir = sys.argv[1]

for root, subFolders, files in os.walk(rootdir):

    for folder in subFolders:
        outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName

        for file in files:
            filePath = rootdir + '/' + file
            f = open( filePath, 'r' )
            toWrite = f.read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
            f.close()

        folderOut.close()
1
179
5/17/2015 8:04:19 PM

Accepted Answer

Make sure you understand the three return values of os.walk:

for root, subdirs, files in os.walk(rootdir):

has the following meaning:

  • root: Current path which is "walked through"
  • subdirs: Files in root of type directory
  • files: Files in root (not in subdirs) of type other than directory

And please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file - you must concatenate the currently "walked" folder instead of the topmost folder. So that must be filePath = os.path.join(root, file). BTW "file" is a builtin, so you don't normally use it as variable name.

Another problem are your loops, which should be like this, for example:

import os
import sys

walk_dir = sys.argv[1]

print('walk_dir = ' + walk_dir)

# If your current working directory may change during script execution, it's recommended to
# immediately convert program arguments to an absolute path. Then the variable root below will
# be an absolute path as well. Example:
# walk_dir = os.path.abspath(walk_dir)
print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))

for root, subdirs, files in os.walk(walk_dir):
    print('--\nroot = ' + root)
    list_file_path = os.path.join(root, 'my-directory-list.txt')
    print('list_file_path = ' + list_file_path)

    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            print('\t- subdirectory ' + subdir)

        for filename in files:
            file_path = os.path.join(root, filename)

            print('\t- file %s (full path: %s)' % (filename, file_path))

            with open(file_path, 'rb') as f:
                f_content = f.read()
                list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                list_file.write(f_content)
                list_file.write(b'\n')

If you didn't know, the with statement for files is a shorthand:

with open('filename', 'rb') as f:
    dosomething()

# is effectively the same as

f = open('filename', 'rb')
try:
    dosomething()
finally:
    f.close()
297
8/21/2014 7:50:29 AM

If you are using Python 3.5 or above, you can get this done in 1 line.

import glob

for filename in glob.iglob(root_dir + '**/*.txt', recursive=True):
     print(filename)

As mentioned in the documentation

If recursive is true, the pattern '**' will match any files and zero or more directories and subdirectories.

If you want every file, you can use

import glob

for filename in glob.iglob(root_dir + '**/*', recursive=True):
     print(filename)

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon