I have a C++/Obj-C background and I am just discovering Python (been writing it for about an hour). I am writing a script to recursively read the contents of text files in a folder structure.
The problem I have is the code I have written will only work for one folder deep. I can see why in the code (see
#hardcoded path), I just don't know how I can move forward with Python since my experience with it is only brand new.
import os import sys rootdir = sys.argv for root, subFolders, files in os.walk(rootdir): for folder in subFolders: outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path folderOut = open( outfileName, 'w' ) print "outfileName is " + outfileName for file in files: filePath = rootdir + '/' + file f = open( filePath, 'r' ) toWrite = f.read() print "Writing '" + toWrite + "' to" + filePath folderOut.write( toWrite ) f.close() folderOut.close()
Make sure you understand the three return values of
for root, subdirs, files in os.walk(rootdir):
has the following meaning:
root: Current path which is "walked through"
subdirs: Files in
rootof type directory
files: Files in
subdirs) of type other than directory
And please use
os.path.join instead of concatenating with a slash! Your problem is
filePath = rootdir + '/' + file - you must concatenate the currently "walked" folder instead of the topmost folder. So that must be
filePath = os.path.join(root, file). BTW "file" is a builtin, so you don't normally use it as variable name.
Another problem are your loops, which should be like this, for example:
import os import sys walk_dir = sys.argv print('walk_dir = ' + walk_dir) # If your current working directory may change during script execution, it's recommended to # immediately convert program arguments to an absolute path. Then the variable root below will # be an absolute path as well. Example: # walk_dir = os.path.abspath(walk_dir) print('walk_dir (absolute) = ' + os.path.abspath(walk_dir)) for root, subdirs, files in os.walk(walk_dir): print('--\nroot = ' + root) list_file_path = os.path.join(root, 'my-directory-list.txt') print('list_file_path = ' + list_file_path) with open(list_file_path, 'wb') as list_file: for subdir in subdirs: print('\t- subdirectory ' + subdir) for filename in files: file_path = os.path.join(root, filename) print('\t- file %s (full path: %s)' % (filename, file_path)) with open(file_path, 'rb') as f: f_content = f.read() list_file.write(('The file %s contains:\n' % filename).encode('utf-8')) list_file.write(f_content) list_file.write(b'\n')
If you didn't know, the
with statement for files is a shorthand:
with open('filename', 'rb') as f: dosomething() # is effectively the same as f = open('filename', 'rb') try: dosomething() finally: f.close()
If you are using Python 3.5 or above, you can get this done in 1 line.
import glob for filename in glob.iglob(root_dir + '**/*.txt', recursive=True): print(filename)
As mentioned in the documentation
If recursive is true, the pattern '**' will match any files and zero or more directories and subdirectories.
If you want every file, you can use
import glob for filename in glob.iglob(root_dir + '**/*', recursive=True): print(filename)