# numpy.square() in Python

Parameters:

`arr`: [array_like] Input array or object whose elements are to be squared.

Return:

An array containing the square of each element of `arr`.

Code #1: Basic usage

``````
# Python program explaining
# the square() function
import numpy as np

arr1 = [1, -3, 15, -466]
print("Square Value of arr1:", np.square(arr1))

arr2 = [23, -56]
print("Square Value of arr2:", np.square(arr2))
``````

Output:

``````
Square Value of arr1: [     1      9    225 217156]
Square Value of arr2: [ 529 3136]
``````

Code #2: Working with complex numbers

``````
# Python program explaining
# the square() function
import numpy as np

a = 4 + 3j
print("Square (4 + 3j):", np.square(a))

b = 16 + 13j
print("Square value (16 + 13j):", np.square(b))
``````

Output:

``````
Square (4 + 3j): (7+24j)
Square value (16 + 13j): (87+416j)
``````

Code #3: Graphical representation of numpy.square()

``````
# Python program explaining
# the square() function
import numpy as np
import matplotlib.pyplot as plt

a = np.linspace(start=-5, stop=5, num=6, endpoint=True)

print("Graphical Representation:", np.square(a))

plt.title("blue: with square, red: without square")
plt.plot(a, np.square(a))
plt.plot(a, a, color="red")
plt.show()
``````

Output:

``````
Graphical Representation: [25.  9.  1.  1.  9. 25.]
``````

## How to remove convexity defects in a Sudoku square?

I was doing a fun project: solving a Sudoku from an input image using OpenCV (as in Google Goggles etc.). I have completed the task, but at the end I found a small problem, which is why I came here.

I did the programming using the Python API of OpenCV 2.3.1.

Below is what I did:

2. Find the contours.
3. Select the one with maximum area (and also roughly square in shape).
4. Find the corner points.

   (Example image omitted. Notice there that the green line correctly coincides with the true boundary of the Sudoku, so the Sudoku can be correctly warped.)

5. Warp the image to a perfect square.
6. Perform OCR (for which I used the method I have given in Simple Digit Recognition OCR in OpenCV-Python).

And the method worked well.

Problem:

Check out this image.

Performing step 4 on this image gives the following result. The red line is the original contour, which is the true outline of the Sudoku boundary.

The green line is the approximated contour, which will be the outline of the warped image.

Of course, there is a difference between the green line and the red line at the top edge of the Sudoku. So while warping, I am not getting the original boundary of the Sudoku.

My question:

How can I warp the image on the correct boundary of the Sudoku, i.e. the red line OR how can I remove the difference between red line and green line? Is there any method for this in OpenCV?

## Is there a library function for Root mean square error (RMSE) in python?

I know I could implement a root mean squared error function like this:

``````
import numpy as np

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())
``````

What I'm looking for is whether this RMSE function is implemented in a library somewhere, perhaps in SciPy or scikit-learn.

## What's the difference between lists enclosed by square brackets and parentheses in Python?

``````
>>> x=[1,2]
>>> x[1]
2
>>> x=(1,2)
>>> x[1]
2
``````

Are they both valid? Is one preferred for some reason?
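Both forms are valid Python: square brackets build a `list` and parentheses build a `tuple`. As a minimal sketch of the main practical difference (mutability), using only built-ins:

```python
x_list = [1, 2]
x_tuple = (1, 2)

# Indexing works the same way for both.
same_second_item = x_list[1] == x_tuple[1]

# A list is mutable: items can be replaced in place.
x_list[1] = 99

# A tuple is immutable: item assignment raises TypeError.
try:
    x_tuple[1] = 99
    tuple_mutated = True
except TypeError:
    tuple_mutated = False
```

Which one is preferable depends on whether the collection needs to change after it is created.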

## How do I calculate square root in Python?

Why does Python give the "wrong" answer?

``````
x = 16

sqrt = x**(.5)  # returns 4
sqrt = x**(1/2) # returns 1
``````

Yes, I know I can `import math` and use `sqrt`. But I'm looking for an answer to the above.
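A likely explanation, assuming the snippet was run under Python 2: there, `1/2` between two integers is integer division and evaluates to `0`, so `x**(1/2)` becomes `x**0`, which is `1`. The sketch below reproduces the effect in Python 3, where `//` is the explicit floor-division operator and `/` is true division.

```python
x = 16

# Floor division of two ints gives 0 (this is what `1/2` meant in Python 2).
exponent_floor = 1 // 2
result_floor = x ** exponent_floor   # x ** 0

# True division gives 0.5 (the Python 3 meaning of `1/2`).
exponent_true = 1 / 2
result_true = x ** exponent_true     # x ** 0.5
```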

## What do square brackets mean in pip install?

I see more and more commands like this:

``````
$ pip install "splinter[django]"
``````

What do these square brackets do?

## How do I calculate r-squared using Python and NumPy?

I'm using Python and NumPy to calculate a best fit polynomial of arbitrary degree. I pass a list of x values, y values, and the degree of the polynomial I want to fit (linear, quadratic, etc.).

This much works, but I also want to calculate r (coefficient of correlation) and r-squared (coefficient of determination). I am comparing my results with Excel's best-fit trendline capability, and the r-squared value it calculates. Using this, I know I am calculating r-squared correctly for linear best-fit (degree equals 1). However, my function does not work for polynomials with degree greater than 1.

Excel is able to do this. How do I calculate r-squared for higher-order polynomials using NumPy?

Here's my function:

``````
import numpy

# Polynomial Regression
def polyfit(x, y, degree):
    results = {}

    coeffs = numpy.polyfit(x, y, degree)
    # Polynomial Coefficients
    results["polynomial"] = coeffs.tolist()

    correlation = numpy.corrcoef(x, y)[0,1]

    # r
    results["correlation"] = correlation
    # r-squared
    results["determination"] = correlation**2

    return results
``````
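One common definition that extends to higher-degree fits is R² = 1 − SS_res / SS_tot, computed from the fitted values instead of from the correlation between `x` and `y`. The sketch below assumes that definition is what is wanted; it is not a confirmed match for Excel's trendline, and the helper name `polyfit_r2` is hypothetical.

```python
import numpy as np

def polyfit_r2(x, y, degree):
    """Fit a polynomial and return its coefficients and R^2 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)           # fitted values

    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return {"polynomial": coeffs.tolist(),
            "determination": 1.0 - ss_res / ss_tot}

# Data that is exactly quadratic: the degree-2 fit should explain all variance.
xs = [0, 1, 2, 3, 4, 5]
ys = [v ** 2 for v in xs]
quadratic = polyfit_r2(xs, ys, 2)
linear = polyfit_r2(xs, ys, 1)
```

For degree 1 this value coincides with the squared correlation coefficient, which is consistent with the function in the question working only in the linear case.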

## What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?

I've noticed three methods of selecting a column in a Pandas DataFrame:

First method of selecting a column using loc:

``````
df_new = df.loc[:, "col1"]
``````

Second method - seems simpler and faster:

``````
df_new = df["col1"]
``````

Third method - most convenient:

``````
df_new = df.col1
``````

Is there a difference between these three methods? I don't think so, in which case I'd rather use the third method.

I'm mostly curious as to why there appear to be three methods for doing the same thing.

I think you're almost there. Try removing the extra square brackets around the `lst`s (also, you don't need to specify the column names when you're creating a dataframe from a dict like this):

``````
import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
percentile_list = pd.DataFrame(
    {"lst1Title": lst1,
     "lst2Title": lst2,
     "lst3Title": lst3
    })

percentile_list
   lst1Title  lst2Title  lst3Title
0          0         0         0
1          1         1         1
2          2         2         2
3          3         3         3
4          4         4         4
5          5         5         5
6          6         6         6
...
``````

If you need a more performant solution you can use `np.column_stack` rather than `zip` as in your first attempt. This has around a 2x speedup on the example here, though in my opinion it comes at a bit of a cost to readability:

``````
import numpy as np
percentile_list = pd.DataFrame(np.column_stack([lst1, lst2, lst3]),
                               columns=["lst1Title", "lst2Title", "lst3Title"])
``````

There is a clean, one-line way of doing this in Pandas:

``````
df["col_3"] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
``````

This allows `f` to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.

Example with data (based on original question):

``````
import pandas as pd

df = pd.DataFrame({"ID":["1", "2", "3"], "col_1": [0, 2, 3], "col_2":[1, 4, 5]})
mylist = ["a", "b", "c", "d", "e", "f"]

def get_sublist(sta,end):
    return mylist[sta:end+1]

df["col_3"] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)
``````

Output of `print(df)`:

``````
  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]
``````

If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:

``````
df["col_3"] = df.apply(lambda x: f(x["col 1"], x["col 2"]), axis=1)
``````

# Distribution Fitting with Sum of Square Error (SSE)

This is an update and modification to Saullo's answer that uses the full list of the current `scipy.stats` distributions and returns the distribution with the least SSE between the distribution's histogram and the data's histogram.

## Example Fitting

Using the El Niño dataset from `statsmodels`, the distributions are fit and the error is determined. The distribution with the least error is returned.

(Figures omitted: "All Distributions" and "Best Fit Distribution".)

### Example Code

``````
%matplotlib inline

import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams["figure.figsize"] = (16.0, 12.0)
matplotlib.style.use("ggplot")

# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
    """Model data by finding best fit distribution to data"""
    # Get histogram of original data
    y, x = np.histogram(data, bins=bins, density=True)
    x = (x + np.roll(x, -1))[:-1] / 2.0

    # Best holders
    best_distributions = []

    # Estimate distribution parameters from data
    for ii, distribution in enumerate([d for d in _distn_names if d not in ["levy_stable", "studentized_range"]]):

        print("{:>3} / {:<3}: {}".format(ii + 1, len(_distn_names), distribution))

        distribution = getattr(st, distribution)

        # Try to fit the distribution
        try:
            # Ignore warnings from data that can't be fit
            with warnings.catch_warnings():
                warnings.filterwarnings("ignore")

                # fit dist to data
                params = distribution.fit(data)

                # Separate parts of parameters
                arg = params[:-2]
                loc = params[-2]
                scale = params[-1]

                # Calculate fitted PDF and error with fit in distribution
                pdf = distribution.pdf(x, loc=loc, scale=scale, *arg)
                sse = np.sum(np.power(y - pdf, 2.0))

                # if an axis was passed in, add the fitted PDF to the plot
                try:
                    if ax:
                        pd.Series(pdf, x).plot(ax=ax)
                except Exception:
                    pass

                # record this distribution with its parameters and error
                best_distributions.append((distribution, params, sse))

        except Exception:
            pass

    # Sort by SSE (third tuple element), smallest error first
    return sorted(best_distributions, key=lambda x: x[2])

def make_pdf(dist, params, size=10000):
    """Generate distribution's Probability Distribution Function"""

    # Separate parts of parameters
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]

    # Get sane start and end points of distribution
    start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
    end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)

    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = dist.pdf(x, loc=loc, scale=scale, *arg)
    pdf = pd.Series(y, x)

    return pdf

# Load data from statsmodels datasets
# (assumption: the loading line is missing above; this uses the `elnino` dataset with its YEAR column)
data = pd.Series(sm.datasets.elnino.load_pandas().data.set_index("YEAR").values.ravel())

# Plot for comparison
plt.figure(figsize=(12,8))
ax = data.plot(kind="hist", bins=50, density=True, alpha=0.5, color=list(matplotlib.rcParams["axes.prop_cycle"])[1]["color"])

# Save plot limits
dataYLim = ax.get_ylim()

# Find best fit distribution
best_distributions = best_fit_distribution(data, 200, ax)
best_dist = best_distributions[0]  # (distribution, params, sse) with the smallest SSE

# Update plots
ax.set_ylim(dataYLim)
ax.set_title(u"El Niño sea temp.\n All Fitted Distributions")
ax.set_xlabel(u"Temp (°C)")
ax.set_ylabel("Frequency")

# Make PDF with best params
pdf = make_pdf(best_dist[0], best_dist[1])

# Display
plt.figure(figsize=(12,8))
ax = pdf.plot(lw=2, label="PDF", legend=True)
data.plot(kind="hist", bins=50, density=True, alpha=0.5, label="Data", legend=True, ax=ax)

param_names = (best_dist[0].shapes + ", loc, scale").split(", ") if best_dist[0].shapes else ["loc", "scale"]
param_str = ", ".join(["{}={:0.2f}".format(k,v) for k,v in zip(param_names, best_dist[1])])
dist_str = "{}({})".format(best_dist[0].name, param_str)

ax.set_title(u"El Niño sea temp. with best fit distribution\n" + dist_str)
ax.set_xlabel(u"Temp. (°C)")
ax.set_ylabel("Frequency")
``````

TL;DR

# Instead of this:

``````
def square_list(n):
    the_list = []                         # Replace
    for x in range(n):
        y = x * x
        the_list.append(y)                # these
    return the_list                       # lines
``````

# do this:

``````
def square_yield(n):
    for x in range(n):
        y = x * x
        yield y                           # with this one.
``````

Whenever you find yourself building a list from scratch, `yield` each piece instead.

This was my first "aha" moment with yield.

`yield` is a sugary way to say

> build a series of stuff

Same behavior:

``````
>>> for square in square_list(4):
...     print(square)
...
0
1
4
9
>>> for square in square_yield(4):
...     print(square)
...
0
1
4
9
``````

Different behavior:

Yield is single-pass: you can only iterate through once. When a function has a yield in it we call it a generator function. And an iterator is what it returns. Those terms are revealing. We lose the convenience of a container, but gain the power of a series that's computed as needed, and arbitrarily long.
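To make the single-pass point concrete, here is a small sketch reusing the `square_yield` generator function from above (redefined so the snippet runs on its own):

```python
def square_yield(n):
    for x in range(n):
        yield x * x

squares = square_yield(4)

first_pass = list(squares)    # consumes the iterator
second_pass = list(squares)   # nothing is left to produce
```

The second pass is empty because the iterator was already exhausted by the first one.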

Yield is lazy: it puts off computation. A function with a yield in it doesn't actually execute at all when you call it. It returns an iterator object that remembers where it left off. Each time you call `next()` on the iterator (this happens in a for-loop) execution inches forward to the next yield. `return` raises StopIteration and ends the series (this is the natural end of a for-loop).
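A sketch of this laziness, using a hypothetical `events` list to record when the function body actually runs:

```python
events = []

def lazy_squares(n):
    events.append("body started")
    for x in range(n):
        events.append("computing {}".format(x))
        yield x * x

it = lazy_squares(2)
events_after_call = list(events)    # calling the function ran none of its body

first = next(it)                    # runs the body up to the first yield
events_after_first = list(events)
```

Nothing is recorded until `next()` is called, and then only the work needed to reach the first `yield` is done.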

Yield is versatile. Data doesn't have to be stored all together; it can be made available one item at a time. It can be infinite.

``````
>>> def squares_all_of_them():
...     x = 0
...     while True:
...         yield x * x
...         x += 1
...
>>> squares = squares_all_of_them()
>>> for _ in range(4):
...     print(next(squares))
...
0
1
4
9
``````

If you need multiple passes and the series isn't too long, just call `list()` on it:

``````
>>> list(square_yield(4))
[0, 1, 4, 9]
``````

Brilliant choice of the word `yield` because both meanings apply:

yield — produce or provide (as in agriculture)

...provide the next data in the series.

yield — give way or relinquish (as in political power)

...relinquish CPU execution until the iterator advances.

This is kind of overkill, but let's give it a go. First, let's use statsmodels to find out what the p-values should be:

``````
import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
from scipy import stats

diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

X2 = sm.add_constant(X)  # adds the intercept column reported as "const" below
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())
``````

and we get

``````
                         OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.518
Method:                 Least Squares   F-statistic:                     46.27
Date:                Wed, 08 Mar 2017   Prob (F-statistic):           3.83e-62
Time:                        10:08:24   Log-Likelihood:                -2386.0
No. Observations:                 442   AIC:                             4794.
Df Residuals:                     431   BIC:                             4839.
Df Model:                          10
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        152.1335      2.576     59.061      0.000     147.071     157.196
x1           -10.0122     59.749     -0.168      0.867    -127.448     107.424
x2          -239.8191     61.222     -3.917      0.000    -360.151    -119.488
x3           519.8398     66.534      7.813      0.000     389.069     650.610
x4           324.3904     65.422      4.958      0.000     195.805     452.976
x5          -792.1842    416.684     -1.901      0.058   -1611.169      26.801
x6           476.7458    339.035      1.406      0.160    -189.621    1143.113
x7           101.0446    212.533      0.475      0.635    -316.685     518.774
x8           177.0642    161.476      1.097      0.273    -140.313     494.442
x9           751.2793    171.902      4.370      0.000     413.409    1089.150
x10           67.6254     65.984      1.025      0.306     -62.065     197.316
==============================================================================
Omnibus:                        1.506   Durbin-Watson:                   2.029
Prob(Omnibus):                  0.471   Jarque-Bera (JB):                1.404
Skew:                           0.017   Prob(JB):                        0.496
Kurtosis:                       2.726   Cond. No.                         227.
==============================================================================
``````

Ok, let's reproduce this. It is kind of overkill as we are almost reproducing a linear regression analysis using matrix algebra. But what the heck.

``````
lm = LinearRegression()
lm.fit(X,y)
params = np.append(lm.intercept_,lm.coef_)
predictions = lm.predict(X)

newX = pd.DataFrame({"Constant":np.ones(len(X))}).join(pd.DataFrame(X))
MSE = (sum((y-predictions)**2))/(len(newX)-len(newX.columns))

# Note if you don't want to use a DataFrame replace the two lines above with
# newX = np.append(np.ones((len(X),1)), X, axis=1)
# MSE = (sum((y-predictions)**2))/(len(newX)-len(newX[0]))

var_b = MSE*(np.linalg.inv(np.dot(newX.T,newX)).diagonal())
sd_b = np.sqrt(var_b)
ts_b = params/ sd_b

# Degrees of freedom: number of observations minus number of estimated coefficients
p_values =[2*(1-stats.t.cdf(np.abs(i),(len(newX)-newX.shape[1]))) for i in ts_b]

sd_b = np.round(sd_b,3)
ts_b = np.round(ts_b,3)
p_values = np.round(p_values,3)
params = np.round(params,4)

myDF3 = pd.DataFrame()
myDF3["Coefficients"],myDF3["Standard Errors"],myDF3["t values"],myDF3["Probabilities"] = [params,sd_b,ts_b,p_values]
print(myDF3)
``````

And this gives us.

``````
    Coefficients  Standard Errors  t values  Probabilities
0       152.1335            2.576    59.061         0.000
1       -10.0122           59.749    -0.168         0.867
2      -239.8191           61.222    -3.917         0.000
3       519.8398           66.534     7.813         0.000
4       324.3904           65.422     4.958         0.000
5      -792.1842          416.684    -1.901         0.058
6       476.7458          339.035     1.406         0.160
7       101.0446          212.533     0.475         0.635
8       177.0642          161.476     1.097         0.273
9       751.2793          171.902     4.370         0.000
10       67.6254           65.984     1.025         0.306
``````

So we can reproduce the values from statsmodels.

The problem is the use of `aspect="equal"`, which prevents the subplots from stretching to an arbitrary aspect ratio and filling up all the empty space.

Normally, this would work:

``````
import matplotlib.pyplot as plt

ax = [plt.subplot(2,2,i+1) for i in range(4)]

for a in ax:
    a.set_xticklabels([])
    a.set_yticklabels([])
``````

The result is this (figure omitted).

However, with `aspect="equal"`, as in the following code:

``````
import matplotlib.pyplot as plt

ax = [plt.subplot(2,2,i+1) for i in range(4)]

for a in ax:
    a.set_xticklabels([])
    a.set_yticklabels([])
    a.set_aspect("equal")
``````

This is what we get (figure omitted).

The difference in this second case is that you've forced the x- and y-axes to have the same number of units/pixel. Since the axes go from 0 to 1 by default (i.e., before you plot anything), using `aspect="equal"` forces each axis to be a square. Since the figure is not a square, pyplot adds in extra spacing between the axes horizontally.

To get around this problem, you can set your figure to have the correct aspect ratio. We're going to use the object-oriented pyplot interface here, which I consider to be superior in general:

``````
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8,8)) # Notice the equal aspect ratio
ax = [fig.add_subplot(2,2,i+1) for i in range(4)]

for a in ax:
    a.set_xticklabels([])
    a.set_yticklabels([])
    a.set_aspect("equal")
``````

Here's the result (figure omitted).

How about using `numpy.vectorize`?

``````
import numpy as np
x = np.array([1, 2, 3, 4, 5])
squarer = lambda t: t ** 2
vfunc = np.vectorize(squarer)
vfunc(x)
# Output : array([ 1,  4,  9, 16, 25])
``````
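For squaring specifically, `np.square` (or `x ** 2`) already operates elementwise, so `np.vectorize` is not strictly needed; the NumPy documentation describes `vectorize` as provided primarily for convenience, not for performance. A small sketch comparing the three spellings:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

via_vectorize = np.vectorize(lambda t: t ** 2)(x)   # Python-level loop under the hood
via_square = np.square(x)                           # NumPy ufunc
via_power = x ** 2                                  # operator form
```

All three produce the same array; `np.vectorize` remains useful when the function being applied has no array-aware equivalent.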

Use `imap` instead of `map`; it returns an iterator of processed values.

``````
from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == "__main__":
    with Pool(2) as p:
        r = list(tqdm.tqdm(p.imap(_foo, range(30)), total=30))
``````

I'd like to shed a little bit more light on the interplay of `iter`, `__iter__` and `__getitem__` and what happens behind the curtains. Armed with that knowledge, you will be able to understand why the best you can do is

``````
try:
    iter(maybe_iterable)
    print("iteration will probably work")
except TypeError:
    print("not iterable")
``````

I will list the facts first and then follow up with a quick reminder of what happens when you employ a `for` loop in python, followed by a discussion to illustrate the facts.

# Facts

1. You can get an iterator from any object `o` by calling `iter(o)` if at least one of the following conditions holds true:

   a) `o` has an `__iter__` method which returns an iterator object. An iterator is any object with an `__iter__` and a `__next__` (Python 2: `next`) method.

   b) `o` has a `__getitem__` method.

2. Checking for an instance of `Iterable` or `Sequence`, or checking for the attribute `__iter__` is not enough.

3. If an object `o` implements only `__getitem__`, but not `__iter__`, `iter(o)` will construct an iterator that tries to fetch items from `o` by integer index, starting at index 0. The iterator will catch any `IndexError` (but no other errors) that is raised and then raises `StopIteration` itself.

4. In the most general sense, there's no way to check whether the iterator returned by `iter` is sane other than to try it out.

5. If an object `o` implements `__iter__`, the `iter` function will make sure that the object returned by `__iter__` is an iterator. There is no sanity check if an object only implements `__getitem__`.

6. `__iter__` wins. If an object `o` implements both `__iter__` and `__getitem__`, `iter(o)` will call `__iter__`.

7. If you want to make your own objects iterable, always implement the `__iter__` method.

# `for` loops

In order to follow along, you need an understanding of what happens when you employ a `for` loop in Python. Feel free to skip right to the next section if you already know.

When you use `for item in o` for some iterable object `o`, Python calls `iter(o)` and expects an iterator object as the return value. An iterator is any object which implements a `__next__` (or `next` in Python 2) method and an `__iter__` method.

By convention, the `__iter__` method of an iterator should return the object itself (i.e. `return self`). Python then calls `next` on the iterator until `StopIteration` is raised. All of this happens implicitly, but the following demonstration makes it visible:

``````
import random

class DemoIterable(object):
    def __iter__(self):
        print("__iter__ called")
        return DemoIterator()

class DemoIterator(object):
    def __iter__(self):
        return self

    def __next__(self):
        print("__next__ called")
        r = random.randint(1, 10)
        if r == 5:
            print("raising StopIteration")
            raise StopIteration
        return r
``````

Iteration over a `DemoIterable`:

``````
>>> di = DemoIterable()
>>> for x in di:
...     print(x)
...
__iter__ called
__next__ called
9
__next__ called
8
__next__ called
10
__next__ called
3
__next__ called
10
__next__ called
raising StopIteration
``````
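Roughly, a `for` loop behaves like the explicit `iter`/`next` calls below. This is a simplified sketch (the hypothetical `manual_for` helper ignores `break` and `else` clauses), not the interpreter's actual implementation:

```python
def manual_for(iterable, body):
    """Call body(item) for each item, the way `for item in iterable` would."""
    iterator = iter(iterable)          # step 1: obtain an iterator
    while True:
        try:
            item = next(iterator)      # step 2: ask for the next item
        except StopIteration:
            break                      # step 3: stop once the iterator is exhausted
        body(item)

collected = []
manual_for([10, 20, 30], collected.append)
```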

# Discussion and illustrations

On point 1 and 2: getting an iterator and unreliable checks

Consider the following class:

``````
class BasicIterable(object):
    def __getitem__(self, item):
        if item == 3:
            raise IndexError
        return item
``````

Calling `iter` with an instance of `BasicIterable` will return an iterator without any problems because `BasicIterable` implements `__getitem__`.

``````
>>> b = BasicIterable()
>>> iter(b)
<iterator object at 0x7f1ab216e320>
``````

However, it is important to note that `b` does not have the `__iter__` attribute and is not considered an instance of `Iterable` or `Sequence`:

``````
>>> from collections.abc import Iterable, Sequence
>>> hasattr(b, "__iter__")
False
>>> isinstance(b, Iterable)
False
>>> isinstance(b, Sequence)
False
``````

This is why Fluent Python by Luciano Ramalho recommends calling `iter` and handling the potential `TypeError` as the most accurate way to check whether an object is iterable. Quoting directly from the book:

> As of Python 3.4, the most accurate way to check whether an object `x` is iterable is to call `iter(x)` and handle a `TypeError` exception if it isn't. This is more accurate than using `isinstance(x, abc.Iterable)`, because `iter(x)` also considers the legacy `__getitem__` method, while the `Iterable` ABC does not.

On point 3: Iterating over objects which only provide `__getitem__`, but not `__iter__`

Iterating over an instance of `BasicIterable` works as expected: Python constructs an iterator that tries to fetch items by index, starting at zero, until an `IndexError` is raised. The demo object's `__getitem__` method simply returns the `item` which was supplied as the argument to `__getitem__(self, item)` by the iterator returned by `iter`.

``````
>>> b = BasicIterable()
>>> it = iter(b)
>>> next(it)
0
>>> next(it)
1
>>> next(it)
2
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
``````

Note that the iterator raises `StopIteration` when it cannot return the next item and that the `IndexError` which is raised for `item == 3` is handled internally. This is why looping over a `BasicIterable` with a `for` loop works as expected:

``````
>>> for x in b:
...     print(x)
...
0
1
2
``````
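Conceptually, the iterator that `iter` builds for a `__getitem__`-only object behaves like the generator below. The name `getitem_iterator` is hypothetical, and this is an illustration of the behavior described in point 3, not CPython's implementation:

```python
def getitem_iterator(obj):
    """Yield obj[0], obj[1], ... until obj raises IndexError."""
    index = 0
    while True:
        try:
            item = obj[index]
        except IndexError:
            return          # ends the generator, so the caller sees StopIteration
        yield item
        index += 1

class BasicIterable(object):
    def __getitem__(self, item):
        if item == 3:
            raise IndexError
        return item

b = BasicIterable()
from_sketch = list(getitem_iterator(b))
from_builtin = list(iter(b))
```

Only `IndexError` is swallowed; any other exception raised by `__getitem__` propagates to the caller.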

Here's another example in order to drive home the concept of how the iterator returned by `iter` tries to access items by index. `WrappedDict` does not inherit from `dict`, which means instances won't have an `__iter__` method.

``````
class WrappedDict(object): # note: no inheritance from dict!
    def __init__(self, dic):
        self._dict = dic

    def __getitem__(self, item):
        try:
            return self._dict[item] # delegate to dict.__getitem__
        except KeyError:
            raise IndexError
``````

Note that calls to `__getitem__` are delegated to `dict.__getitem__` for which the square bracket notation is simply a shorthand.

``````
>>> w = WrappedDict({-1: "not printed",
...                   0: "hi", 1: "StackOverflow", 2: "!",
...                   4: "not printed",
...                   "x": "not printed"})
>>> for x in w:
...     print(x)
...
hi
StackOverflow
!
``````

On point 4 and 5: `iter` checks for an iterator when it calls `__iter__`:

When `iter(o)` is called for an object `o`, `iter` will make sure that the return value of `__iter__`, if the method is present, is an iterator. This means that the returned object must implement `__next__` (or `next` in Python 2) and `__iter__`. `iter` cannot perform any sanity checks for objects which only provide `__getitem__`, because it has no way to check whether the items of the object are accessible by integer index.

``````
class FailIterIterable(object):
    def __iter__(self):
        return object() # not an iterator

class FailGetitemIterable(object):
    def __getitem__(self, item):
        raise Exception
``````

Note that constructing an iterator from `FailIterIterable` instances fails immediately, while constructing an iterator from `FailGetitemIterable` succeeds, but will throw an Exception on the first call to `__next__`.

``````
>>> fii = FailIterIterable()
>>> iter(fii)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: iter() returned non-iterator of type 'object'
>>>
>>> fgi = FailGetitemIterable()
>>> it = iter(fgi)
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/iterdemo.py", line 42, in __getitem__
    raise Exception
Exception
``````

On point 6: `__iter__` wins

This one is straightforward. If an object implements `__iter__` and `__getitem__`, `iter` will call `__iter__`. Consider the following class

``````
class IterWinsDemo(object):
    def __iter__(self):
        return iter(["__iter__", "wins"])

    def __getitem__(self, item):
        return ["__getitem__", "wins"][item]
``````

and the output when looping over an instance:

``````
>>> iwd = IterWinsDemo()
>>> for x in iwd:
...     print(x)
...
__iter__
wins
``````

On point 7: your iterable classes should implement `__iter__`

You might ask yourself why most builtin sequences like `list` implement an `__iter__` method when `__getitem__` would be sufficient.

``````
class WrappedList(object): # note: no inheritance from list!
    def __init__(self, lst):
        self._list = lst

    def __getitem__(self, item):
        return self._list[item]
``````

After all, iteration over instances of the class above, which delegates calls to `__getitem__` to `list.__getitem__` (using the square bracket notation), will work fine:

``````
>>> wl = WrappedList(["A", "B", "C"])
>>> for x in wl:
...     print(x)
...
A
B
C
``````

The reasons your custom iterables should implement `__iter__` are as follows:

1. If you implement `__iter__`, instances will be considered iterables, and `isinstance(o, collections.abc.Iterable)` will return `True`.
2. If the object returned by `__iter__` is not an iterator, `iter` will fail immediately and raise a `TypeError`.
3. The special handling of `__getitem__` exists for backwards compatibility reasons. Quoting again from Fluent Python:

   > That is why any Python sequence is iterable: they all implement `__getitem__`. In fact, the standard sequences also implement `__iter__`, and yours should too, because the special handling of `__getitem__` exists for backward compatibility reasons and may be gone in the future (although it is not deprecated as I write this).
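The first reason can be checked directly. In this sketch the class names are hypothetical; `collections.abc.Iterable` recognizes the class that defines `__iter__` but not the one that only defines `__getitem__`, even though `iter()` accepts both:

```python
from collections.abc import Iterable

class WithIter(object):
    def __init__(self, lst):
        self._list = lst

    def __iter__(self):
        return iter(self._list)

class OnlyGetitem(object):
    def __init__(self, lst):
        self._list = lst

    def __getitem__(self, item):
        return self._list[item]

with_iter = WithIter(["A", "B"])
only_getitem = OnlyGetitem(["A", "B"])

with_iter_is_iterable = isinstance(with_iter, Iterable)
only_getitem_is_iterable = isinstance(only_getitem, Iterable)
```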

There are lots of things I have seen make a model diverge.

1. Too high of a learning rate. You can often tell if this is the case if the loss begins to increase and then diverges to infinity.

2. I am not too familiar with the DNNClassifier, but I am guessing it uses the categorical cross-entropy cost function. This involves taking the log of the prediction, which diverges as the prediction approaches zero. That is why people usually add a small epsilon value to the prediction to prevent this divergence. I am guessing the DNNClassifier probably does this or uses the TensorFlow op for it. Probably not the issue.

3. Other numerical stability issues can exist, such as division by zero, where adding the epsilon can help. Another less obvious one is the square root, whose derivative can diverge if not properly simplified when dealing with finite-precision numbers. Yet again, I doubt this is the issue in the case of the DNNClassifier.

4. You may have an issue with the input data. Try calling `assert not np.any(np.isnan(x))` on the input data to make sure you are not introducing NaN values. Also make sure all of the target values are valid. Finally, make sure the data is properly normalized. You probably want to have the pixels in the range [-1, 1] and not [0, 255].

5. The labels must be in the domain of the loss function, so if using a logarithmic-based loss function all labels must be non-negative (as noted by evan pu and the comments below).
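Point 2 can be illustrated with a small sketch. The function name, the epsilon value of `1e-7`, and the clipping approach are assumptions chosen for illustration; this is not how `DNNClassifier` is implemented.

```python
import math

def cross_entropy(true_index, predicted_probs, eps=1e-7):
    """Negative log-likelihood of the true class, with the probability clipped away from 0."""
    p = predicted_probs[true_index]
    p = min(max(p, eps), 1.0)      # clip into [eps, 1] so log() stays finite
    return -math.log(p)

# Without clipping, math.log(0.0) raises ValueError (NumPy would return -inf).
loss_confident_wrong = cross_entropy(0, [0.0, 1.0])
loss_correct = cross_entropy(1, [0.0, 1.0])
```

The clipped loss is large but finite, so a single confidently wrong prediction no longer turns the total loss into `inf` or `nan`.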