numpy.savetxt()


numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None): this method is used to save an array to a text file.

Parameters:
fname : [str or file handle] Name of the output file. If the filename ends in .gz, the file is automatically saved in compressed gzip format. loadtxt understands gzipped files transparently.
X : [1D or 2D array_like] Data to be saved to a text file.
fmt : A single format (e.g. %10.5f), a sequence of formats, or a multi-format string, e.g. 'Iteration %d - %10.5f', in which case delimiter is ignored.
delimiter : String or character separating columns.
newline : String or character separating lines.
header : String that will be written at the beginning of the file.
footer : String that will be written at the end of the file.
comments : String that will be prepended to the header and footer strings, to mark them as comments. Default: '# ', as expected by e.g. numpy.loadtxt.
encoding : Encoding used to encode the output file. Does not apply to output streams. If the encoding is anything other than 'bytes' or 'latin1', you will not be able to load the file in NumPy versions < 1.14. Default is 'latin1'.
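
As a quick, illustrative sketch of how these parameters interact (the file name params_demo.csv is only a placeholder for this example), fmt, delimiter, header, footer and comments can be combined like this:

import numpy as np

a = np.array([[1.5, 2.25],
              [3.0, 4.75]])

# 4 decimal places, comma-separated columns, commented header and footer lines
np.savetxt("params_demo.csv", a,
           fmt="%.4f",
           delimiter=",",
           header="col_a,col_b",
           footer="end of data",
           comments="# ")

# numpy.loadtxt skips the '#'-prefixed header/footer lines automatically
b = np.loadtxt("params_demo.csv", delimiter=",")
print(b)

The header and footer strings are written prefixed by the comments string, so loadtxt can read the data portion back without any extra arguments beyond the delimiter.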

Code #1:

# Python program explaining
# savetxt() function

import numpy as geek

x = geek.arange(0, 10, 1)
print("x is:")
print(x)

# x is the array to be saved to the text file
geek.savetxt('geekfile.txt', x, delimiter=',')

a = open("geekfile.txt", 'r')  # open file in read mode

print("the file contains:")
print(a.read())

Output:

x is:
[0 1 2 3 4 5 6 7 8 9]
the file contains:
0.000000000000000000e+00
1.000000000000000000e+00
2.000000000000000000e+00
3.000000000000000000e+00
4.000000000000000000e+00
5.000000000000000000e+00
6.000000000000000000e+00
7.000000000000000000e+00
8.000000000000000000e+00
9.000000000000000000e+00
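
The long '%.18e' values above come from the default fmt. As a small, hedged variation (the file name geekfile_int.txt is only illustrative), passing an integer format keeps the file compact:

import numpy as geek

x = geek.arange(0, 10, 1)

# fmt='%d' writes plain integers instead of the default '%.18e' notation
geek.savetxt('geekfile_int.txt', x, fmt='%d')

print(open('geekfile_int.txt').read())   # one value per line: 0 ... 9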
Code #2:

# Python program explaining
# savetxt() function

import numpy as geek

x = geek.arange(0, 10, 1)
y = geek.arange(10, 20, 1)
z = geek.arange(20, 30, 1)

print("x is:")
print(x)

print("y is:")
print(y)

print("z is:")
print(z)

# x, y, z are 3 arrays of the same size
geek.savetxt('geekfile.txt', (x, y, z))

a = open("geekfile.txt", 'r')  # open file in read mode

print("the file contains:")
print(a.read())

Output:

x is:
[0 1 2 3 4 5 6 7 8 9]
y is:
[10 11 12 13 14 15 16 17 18 19]
z is:
[20 21 22 23 24 25 26 27 28 29]

the file contains:
0.000000000000000000e+00 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00
1.000000000000000000e+01 1.100000000000000000e+01 1.200000000000000000e+01 1.300000000000000000e+01 1.400000000000000000e+01 1.500000000000000000e+01 1.600000000000000000e+01 1.700000000000000000e+01 1.800000000000000000e+01 1.900000000000000000e+01
2.000000000000000000e+01 2.100000000000000000e+01 2.200000000000000000e+01 2.300000000000000000e+01 2.400000000000000000e+01 2.500000000000000000e+01 2.600000000000000000e+01 2.700000000000000000e+01 2.800000000000000000e+01 2.900000000000000000e+01
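
Note that passing (x, y, z) writes each array as one row of the file. If you would rather have them as columns, one common approach (shown here as an illustrative sketch, with a hypothetical output file name) is to stack them column-wise first:

import numpy as geek

x = geek.arange(0, 10, 1)
y = geek.arange(10, 20, 1)
z = geek.arange(20, 30, 1)

# column_stack gives a 10x3 array, so x, y and z become columns in the file
geek.savetxt('geekfile_columns.txt', geek.column_stack((x, y, z)), fmt='%d')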

Code #3: TypeError

# Python program explaining
# savetxt() function

import numpy as geek

x = geek.arange(0, 10, 1)
y = geek.arange(0, 20, 1)
z = geek.arange(0, 30, 1)

print("x is:")
print(x)

print("y is:")
print(y)

print("z is:")
print(z)

# x, y, z are 3 arrays of different sizes
geek.savetxt('geekfile.txt', (x, y, z))

Output:

fh.write(asbytes(format % tuple(row) + newline))
TypeError: only length-1 arrays can be converted to Python scalars

During handling of the above exception, another exception occurred:

% (str(X.dtype), format))
TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e')

This is a size-mismatch error: x, y and z have different lengths, so (x, y, z) cannot be stacked into a regular 2D numeric array. savetxt instead receives an object-dtype array whose rows cannot be formatted with the default '%.18e' specifier, which raises the TypeError above.
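
A minimal sketch of two ways to avoid this error (file names are illustrative only): either save each array separately, or make the arrays the same length so they stack into a regular 2D array.

import numpy as geek

x = geek.arange(0, 10, 1)
y = geek.arange(0, 20, 1)
z = geek.arange(0, 30, 1)

# Option 1: save each array on its own; no stacking is needed
for name, arr in (("x.txt", x), ("y.txt", y), ("z.txt", z)):
    geek.savetxt(name, arr)

# Option 2: trim to a common length so (x, y, z) forms a regular 2D array
n = min(len(x), len(y), len(z))
geek.savetxt("geekfile_trimmed.txt", (x[:n], y[:n], z[:n]))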





numpy.savetxt(): StackOverflow Questions

Answer #1

The most reliable way I have found to do this is to use np.savetxt with np.loadtxt, and not np.fromfile, which is better suited to binary files written with tofile. The np.fromfile and np.tofile methods write and read binary files, whereas np.savetxt writes a text file. So, for example:

a = np.array([1, 2, 3, 4])
np.savetxt("test1.txt", a, fmt="%d")
b = np.loadtxt("test1.txt", dtype=int)
a == b
# array([ True,  True,  True,  True], dtype=bool)

Or:

a.tofile("test2.dat")
c = np.fromfile("test2.dat", dtype=int)
c == a
# array([ True,  True,  True,  True], dtype=bool)

I use the former method even if it is slower and creates bigger files (sometimes): the binary format can be platform dependent (for example, the file format depends on the endianness of your system).

There is a platform independent format for NumPy arrays, which can be saved and read with np.save and np.load:

np.save("test3.npy", a)    # .npy extension is added if not given
d = np.load("test3.npy")
a == d
# array([ True,  True,  True,  True], dtype=bool)

Answer #2

numpy.savetxt saves an array to a text file.

import numpy
a = numpy.asarray([ [1,2,3], [4,5,6], [7,8,9] ])
numpy.savetxt("foo.csv", a, delimiter=";")

Answer #3

Well, I decided to work it out myself to solve the above problem. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how. (It is just for learning how to use KNearest for simple OCR purposes.)

1) My first question was about the letter_recognition.data file that comes with the OpenCV samples. I wanted to know what is inside that file.

It contains a letter, along with 16 features of that letter.

And this SOF helped me to find it. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers. (Although I didn't understand some of the features at the end.)

2) I knew that, without understanding all those features, it would be difficult to apply that method. I tried some other papers, but all of them were a little difficult for a beginner.

So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance, I just wanted it to work, at least with the least accuracy)

I took below image for my training data:

(training image: pitrain.png)

(I know the amount of training data is small. But since all the letters are of the same font and size, I decided to try with this.)

To prepare the data for training, I wrote a small piece of code in OpenCV. It does the following things:

  1. It loads the image.
  2. It selects the digits (obviously by contour finding and applying constraints on the area and height of the letters to avoid false detections).
  3. It draws a bounding rectangle around one letter and waits for a key press. This time we ourselves press the digit key corresponding to the letter in the box.
  4. Once the corresponding digit key is pressed, it resizes the box to 10x10 and saves the 100 pixel values in one array (here, samples) and the manually entered digit in another array (here, responses).
  5. Then it saves both arrays in separate txt files.

At the end of the manual classification of digits, all the digits in the train data (train.png) are labeled manually by ourselves; the image will look like below:

(labeled training image)

Below is the code I used for the above purpose (of course, not so clean):

import sys

import numpy as np
import cv2

im = cv2.imread("pitrain.png")
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

#################      Now finding Contours         ###################

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

samples =  np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)

        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow("norm",im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print "training complete"

np.savetxt("generalsamples.data",samples)
np.savetxt("generalresponses.data",responses)

Now we come to the training and testing part.

For the testing part, I used the image below, which has the same type of letters I used for training.

(test image: pi.png)

For training we do as follows:

  1. Load the txt files we saved earlier
  2. Create an instance of the classifier we are using (here, it is KNearest)
  3. Then use the KNearest.train function to train on the data

For testing purposes, we do as follows:

  1. We load the image used for testing
  2. Process the image as earlier and extract each digit using contour methods
  3. Draw a bounding box around it, then resize it to 10x10, and store its pixel values in an array as done earlier.
  4. Then we use the KNearest.find_nearest() function to find the nearest item to the one we gave. (If lucky, it recognises the correct digit.)

I included the last two steps (training and testing) in the single piece of code below:

import cv2
import numpy as np

#######   training part    ############### 
samples = np.loadtxt("generalsamples.data",np.float32)
responses = np.loadtxt("generalresponses.data",np.float32)
responses = responses.reshape((responses.size,1))

model = cv2.KNearest()
model.train(samples,responses)

############################# testing part  #########################

im = cv2.imread("pi.png")
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
            string = str(int((results[0][0])))
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow("im",im)
cv2.imshow("out",out)
cv2.waitKey(0)

And it worked, below is the result I got:

(result image: the recognised digits drawn on the output image)


Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.

But anyway, this is a good start for beginners (I hope so).

Answer #4

If you want to write it to disk so that it will be easy to read back in as a numpy array, look into numpy.save. Pickling it will work fine, as well, but it's less efficient for large arrays (which yours isn't, so either is perfectly fine).

If you want it to be human readable, look into numpy.savetxt.

Edit: So, it seems like savetxt isn't quite as great an option for arrays with >2 dimensions... But just to draw everything out to its full conclusion:

I just realized that numpy.savetxt chokes on ndarrays with more than 2 dimensions... This is probably by design, as there's no inherently defined way to indicate additional dimensions in a text file.

E.g. This (a 2D array) works fine

import numpy as np
x = np.arange(20).reshape((4,5))
np.savetxt("test.txt", x)

While the same thing would fail (with a rather uninformative error: TypeError: float argument required, not numpy.ndarray) for a 3D array:

import numpy as np
x = np.arange(200).reshape((4,5,10))
np.savetxt("test.txt", x)

One workaround is just to break the 3D (or greater) array into 2D slices. E.g.

x = np.arange(200).reshape((4,5,10))
with open("test.txt", "w") as outfile:
    for slice_2d in x:
        np.savetxt(outfile, slice_2d)

However, our goal is to be clearly human readable, while still being easily read back in with numpy.loadtxt. Therefore, we can be a bit more verbose, and differentiate the slices using commented out lines. By default, numpy.loadtxt will ignore any lines that start with # (or whichever character is specified by the comments kwarg). (This looks more verbose than it actually is...)

import numpy as np

# Generate some test data
data = np.arange(200).reshape((4,5,10))

# Write the array to disk
with open("test.txt", "w") as outfile:
    # I"m writing a header here just for the sake of readability
    # Any line starting with "#" will be ignored by numpy.loadtxt
    outfile.write("# Array shape: {0}
".format(data.shape))
    
    # Iterating through a ndimensional array produces slices along
    # the last axis. This is equivalent to data[i,:,:] in this case
    for data_slice in data:

        # The formatting string indicates that I"m writing out
        # the values in left-justified columns 7 characters in width
        # with 2 decimal places.  
        np.savetxt(outfile, data_slice, fmt="%-7.2f")

        # Writing out a break to indicate different slices...
        outfile.write("# New slice
")

This yields:

# Array shape: (4, 5, 10)
0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00   
10.00   11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00   19.00  
20.00   21.00   22.00   23.00   24.00   25.00   26.00   27.00   28.00   29.00  
30.00   31.00   32.00   33.00   34.00   35.00   36.00   37.00   38.00   39.00  
40.00   41.00   42.00   43.00   44.00   45.00   46.00   47.00   48.00   49.00  
# New slice
50.00   51.00   52.00   53.00   54.00   55.00   56.00   57.00   58.00   59.00  
60.00   61.00   62.00   63.00   64.00   65.00   66.00   67.00   68.00   69.00  
70.00   71.00   72.00   73.00   74.00   75.00   76.00   77.00   78.00   79.00  
80.00   81.00   82.00   83.00   84.00   85.00   86.00   87.00   88.00   89.00  
90.00   91.00   92.00   93.00   94.00   95.00   96.00   97.00   98.00   99.00  
# New slice
100.00  101.00  102.00  103.00  104.00  105.00  106.00  107.00  108.00  109.00 
110.00  111.00  112.00  113.00  114.00  115.00  116.00  117.00  118.00  119.00 
120.00  121.00  122.00  123.00  124.00  125.00  126.00  127.00  128.00  129.00 
130.00  131.00  132.00  133.00  134.00  135.00  136.00  137.00  138.00  139.00 
140.00  141.00  142.00  143.00  144.00  145.00  146.00  147.00  148.00  149.00 
# New slice
150.00  151.00  152.00  153.00  154.00  155.00  156.00  157.00  158.00  159.00 
160.00  161.00  162.00  163.00  164.00  165.00  166.00  167.00  168.00  169.00 
170.00  171.00  172.00  173.00  174.00  175.00  176.00  177.00  178.00  179.00 
180.00  181.00  182.00  183.00  184.00  185.00  186.00  187.00  188.00  189.00 
190.00  191.00  192.00  193.00  194.00  195.00  196.00  197.00  198.00  199.00 
# New slice

Reading it back in is very easy, as long as we know the shape of the original array. We can just do numpy.loadtxt("test.txt").reshape((4,5,10)). As an example (you can do this in one line, I'm just being verbose to clarify things):

# Read the array from disk
new_data = np.loadtxt("test.txt")

# Note that this returned a 2D array!
print(new_data.shape)

# However, going back to 3D is easy if we know the
# original shape of the array
new_data = new_data.reshape((4,5,10))

# Just to check that they're the same...
assert np.all(new_data == data)
