How to write a multidimensional array to a text file?


Question

In another question, other users offered some help if I could supply the array I was having trouble with. However, I even fail at a basic I/O task, such as writing an array to a file.

Can anyone explain what kind of loop I would need to write a 4x11x14 numpy array to file?

This array consist of four 11 x 14 arrays, so I should format it with a nice newline, to make the reading of the file easier on others.

Edit: So I've tried the numpy.savetxt function. Strangely, it gives the following error:

TypeError: float argument required, not numpy.ndarray

I assume that this is because the function doesn't work with multidimensional arrays? Any solutions as I would like them within one file?

1
99
9/11/2010 2:30:43 PM

Accepted Answer

If you want to write it to disk so that it will be easy to read back in as a numpy array, look into numpy.save. Pickling it will work fine, as well, but it's less efficient for large arrays (which yours isn't, so either is perfectly fine).

If you want it to be human readable, look into numpy.savetxt.

Edit: So, it seems like savetxt isn't quite as great an option for arrays with >2 dimensions... But just to draw everything out to it's full conclusion:

I just realized that numpy.savetxt chokes on ndarrays with more than 2 dimensions... This is probably by design, as there's no inherently defined way to indicate additional dimensions in a text file.

E.g. This (a 2D array) works fine

import numpy as np
x = np.arange(20).reshape((4,5))
np.savetxt('test.txt', x)

While the same thing would fail (with a rather uninformative error: TypeError: float argument required, not numpy.ndarray) for a 3D array:

import numpy as np
x = np.arange(200).reshape((4,5,10))
np.savetxt('test.txt', x)

One workaround is just to break the 3D (or greater) array into 2D slices. E.g.

x = np.arange(200).reshape((4,5,10))
with file('test.txt', 'w') as outfile:
    for slice_2d in x:
        np.savetxt(outfile, slice_2d)

However, our goal is to be clearly human readable, while still being easily read back in with numpy.loadtxt. Therefore, we can be a bit more verbose, and differentiate the slices using commented out lines. By default, numpy.loadtxt will ignore any lines that start with # (or whichever character is specified by the comments kwarg). (This looks more verbose than it actually is...)

import numpy as np

# Generate some test data
data = np.arange(200).reshape((4,5,10))

# Write the array to disk
with open('test.txt', 'w') as outfile:
    # I'm writing a header here just for the sake of readability
    # Any line starting with "#" will be ignored by numpy.loadtxt
    outfile.write('# Array shape: {0}\n'.format(data.shape))

    # Iterating through a ndimensional array produces slices along
    # the last axis. This is equivalent to data[i,:,:] in this case
    for data_slice in data:

        # The formatting string indicates that I'm writing out
        # the values in left-justified columns 7 characters in width
        # with 2 decimal places.  
        np.savetxt(outfile, data_slice, fmt='%-7.2f')

        # Writing out a break to indicate different slices...
        outfile.write('# New slice\n')

This yields:

# Array shape: (4, 5, 10)
0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00   
10.00   11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00   19.00  
20.00   21.00   22.00   23.00   24.00   25.00   26.00   27.00   28.00   29.00  
30.00   31.00   32.00   33.00   34.00   35.00   36.00   37.00   38.00   39.00  
40.00   41.00   42.00   43.00   44.00   45.00   46.00   47.00   48.00   49.00  
# New slice
50.00   51.00   52.00   53.00   54.00   55.00   56.00   57.00   58.00   59.00  
60.00   61.00   62.00   63.00   64.00   65.00   66.00   67.00   68.00   69.00  
70.00   71.00   72.00   73.00   74.00   75.00   76.00   77.00   78.00   79.00  
80.00   81.00   82.00   83.00   84.00   85.00   86.00   87.00   88.00   89.00  
90.00   91.00   92.00   93.00   94.00   95.00   96.00   97.00   98.00   99.00  
# New slice
100.00  101.00  102.00  103.00  104.00  105.00  106.00  107.00  108.00  109.00 
110.00  111.00  112.00  113.00  114.00  115.00  116.00  117.00  118.00  119.00 
120.00  121.00  122.00  123.00  124.00  125.00  126.00  127.00  128.00  129.00 
130.00  131.00  132.00  133.00  134.00  135.00  136.00  137.00  138.00  139.00 
140.00  141.00  142.00  143.00  144.00  145.00  146.00  147.00  148.00  149.00 
# New slice
150.00  151.00  152.00  153.00  154.00  155.00  156.00  157.00  158.00  159.00 
160.00  161.00  162.00  163.00  164.00  165.00  166.00  167.00  168.00  169.00 
170.00  171.00  172.00  173.00  174.00  175.00  176.00  177.00  178.00  179.00 
180.00  181.00  182.00  183.00  184.00  185.00  186.00  187.00  188.00  189.00 
190.00  191.00  192.00  193.00  194.00  195.00  196.00  197.00  198.00  199.00 
# New slice

Reading it back in is very easy, as long as we know the shape of the original array. We can just do numpy.loadtxt('test.txt').reshape((4,5,10)). As an example (You can do this in one line, I'm just being verbose to clarify things):

# Read the array from disk
new_data = np.loadtxt('test.txt')

# Note that this returned a 2D array!
print new_data.shape

# However, going back to 3D is easy if we know the 
# original shape of the array
new_data = new_data.reshape((4,5,10))

# Just to check that they're the same...
assert np.all(new_data == data)
180
12/28/2018 6:57:38 AM

I am not certain if this meets your requirements, given I think you are interested in making the file readable by people, but if that's not a primary concern, just pickle it.

To save it:

import pickle

my_data = {'a': [1, 2.0, 3, 4+6j],
           'b': ('string', u'Unicode string'),
           'c': None}
output = open('data.pkl', 'wb')
pickle.dump(my_data, output)
output.close()

To read it back:

import pprint, pickle

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)
pprint.pprint(data1)

pkl_file.close()

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon