List to array conversion to use ravel() function

ravel | StackOverflow

I have a list in python and I want to convert it to an array to be able to use ravel() function.

Answer rating: 243

Use numpy.asarray:

import numpy as np
myarray = np.asarray(mylist)

List to array conversion to use ravel() function: StackOverflow Questions

What is the difference between flatten and ravel functions in numpy?

import numpy as np
y = np.array(((1,2,3),(4,5,6),(7,8,9)))
[1   2   3   4   5   6   7   8   9]
[1   2   3   4   5   6   7   8   9]

Both function return the same list. Then what is the need of two different functions performing same job.

List to array conversion to use ravel() function

I have a list in python and I want to convert it to an array to be able to use ravel() function.

Answer #1

The current API is that:

  • flatten always returns a copy.
  • ravel returns a view of the original array whenever possible. This isn"t visible in the printed output, but if you modify the array returned by ravel, it may modify the entries in the original array. If you modify the entries in an array returned from flatten this will never happen. ravel will often be faster since no memory is copied, but you have to be more careful about modifying the array it returns.
  • reshape((-1,)) gets a view whenever the strides of the array allow it even if that means you don"t always get a contiguous array.

Answer #2

Change this line:

model =, train_y)


model =, train_y.values.ravel())


.values will give the values in an array. (shape: (n,1)

.ravel will convert that array shape to (n, )

Answer #3

Distribution Fitting with Sum of Square Error (SSE)

This is an update and modification to Saullo"s answer, that uses the full list of the current scipy.stats distributions and returns the distribution with the least SSE between the distribution"s histogram and the data"s histogram.

Example Fitting

Using the El Niño dataset from statsmodels, the distributions are fit and error is determined. The distribution with the least error is returned.

All Distributions

All Fitted Distributions

Best Fit Distribution

Best Fit Distribution

Example Code

%matplotlib inline

import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams["figure.figsize"] = (16.0, 12.0)"ggplot")

# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
    """Model data by finding best fit distribution to data"""
    # Get histogram of original data
    y, x = np.histogram(data, bins=bins, density=True)
    x = (x + np.roll(x, -1))[:-1] / 2.0

    # Best holders
    best_distributions = []

    # Estimate distribution parameters from data
    for ii, distribution in enumerate([d for d in _distn_names if not d in ["levy_stable", "studentized_range"]]):

        print("{:>3} / {:<3}: {}".format( ii+1, len(_distn_names), distribution ))

        distribution = getattr(st, distribution)

        # Try to fit the distribution
            # Ignore warnings from data that can"t be fit
            with warnings.catch_warnings():
                # fit dist to data
                params =

                # Separate parts of parameters
                arg = params[:-2]
                loc = params[-2]
                scale = params[-1]
                # Calculate fitted PDF and error with fit in distribution
                pdf = distribution.pdf(x, loc=loc, scale=scale, *arg)
                sse = np.sum(np.power(y - pdf, 2.0))
                # if axis pass in add to plot
                    if ax:
                        pd.Series(pdf, x).plot(ax=ax)
                except Exception:

                # identify if this distribution is better
                best_distributions.append((distribution, params, sse))
        except Exception:

    return sorted(best_distributions, key=lambda x:x[2])

def make_pdf(dist, params, size=10000):
    """Generate distributions"s Probability Distribution Function """

    # Separate parts of parameters
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]

    # Get sane start and end points of distribution
    start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
    end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)

    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = dist.pdf(x, loc=loc, scale=scale, *arg)
    pdf = pd.Series(y, x)

    return pdf

# Load data from statsmodels datasets
data = pd.Series(sm.datasets.elnino.load_pandas().data.set_index("YEAR").values.ravel())

# Plot for comparison
ax = data.plot(kind="hist", bins=50, density=True, alpha=0.5, color=list(matplotlib.rcParams["axes.prop_cycle"])[1]["color"])

# Save plot limits
dataYLim = ax.get_ylim()

# Find best fit distribution
best_distibutions = best_fit_distribution(data, 200, ax)
best_dist = best_distibutions[0]

# Update plots
ax.set_title(u"El Niño sea temp.
 All Fitted Distributions")
ax.set_xlabel(u"Temp (°C)")

# Make PDF with best params 
pdf = make_pdf(best_dist[0], best_dist[1])

# Display
ax = pdf.plot(lw=2, label="PDF", legend=True)
data.plot(kind="hist", bins=50, density=True, alpha=0.5, label="Data", legend=True, ax=ax)

param_names = (best_dist[0].shapes + ", loc, scale").split(", ") if best_dist[0].shapes else ["loc", "scale"]
param_str = ", ".join(["{}={:0.2f}".format(k,v) for k,v in zip(param_names, best_dist[1])])
dist_str = "{}({})".format(best_dist[0].name, param_str)

ax.set_title(u"El Niño sea temp. with best fit distribution 
" + dist_str)
ax.set_xlabel(u"Temp. (°C)")

Answer #4

Disclaimer: I"m mostly writing this post with syntactical considerations and general behaviour in mind. I"m not familiar with the memory and CPU aspect of the methods described, and I aim this answer at those who have reasonably small sets of data, such that the quality of the interpolation can be the main aspect to consider. I am aware that when working with very large data sets, the better-performing methods (namely griddata and RBFInterpolator without a neighbors keyword argument) might not be feasible.

Note that this answer uses the new RBFInterpolator class introduced in SciPy 1.7.0. For the legacy Rbf class see the previous version of this answer.

I"m going to compare three kinds of multi-dimensional interpolation methods (interp2d/splines, griddata and RBFInterpolator). I will subject them to two kinds of interpolation tasks and two kinds of underlying functions (points from which are to be interpolated). The specific examples will demonstrate two-dimensional interpolation, but the viable methods are applicable in arbitrary dimensions. Each method provides various kinds of interpolation; in all cases I will use cubic interpolation (or something close1). It"s important to note that whenever you use interpolation you introduce bias compared to your raw data, and the specific methods used affect the artifacts that you will end up with. Always be aware of this, and interpolate responsibly.

The two interpolation tasks will be

  1. upsampling (input data is on a rectangular grid, output data is on a denser grid)
  2. interpolation of scattered data onto a regular grid

The two functions (over the domain [x, y] in [-1, 1]x[-1, 1]) will be

  1. a smooth and friendly function: cos(pi*x)*sin(pi*y); range in [-1, 1]
  2. an evil (and in particular, non-continuous) function: x*y / (x^2 + y^2) with a value of 0.5 near the origin; range in [-0.5, 0.5]

Here"s how they look:

fig1: test functions

I will first demonstrate how the three methods behave under these four tests, then I"ll detail the syntax of all three. If you know what you should expect from a method, you might not want to waste your time learning its syntax (looking at you, interp2d).

Test data

For the sake of explicitness, here is the code with which I generated the input data. While in this specific case I"m obviously aware of the function underlying the data, I will only use this to generate input for the interpolation methods. I use numpy for convenience (and mostly for generating the data), but scipy alone would suffice too.

import numpy as np
import scipy.interpolate as interp

# auxiliary function for mesh generation
def gimme_mesh(n):
    minval = -1
    maxval =  1
    # produce an asymmetric shape in order to catch issues with transpositions
    return np.meshgrid(np.linspace(minval, maxval, n),
                       np.linspace(minval, maxval, n + 1))

# set up underlying test functions, vectorized
def fun_smooth(x, y):
    return np.cos(np.pi*x) * np.sin(np.pi*y)

def fun_evil(x, y):
    # watch out for singular origin; function has no unique limit there
    return np.where(x**2 + y**2 > 1e-10, x*y/(x**2+y**2), 0.5)

# sparse input mesh, 6x7 in shape
N_sparse = 6
x_sparse, y_sparse = gimme_mesh(N_sparse)
z_sparse_smooth = fun_smooth(x_sparse, y_sparse)
z_sparse_evil = fun_evil(x_sparse, y_sparse)

# scattered input points, 10^2 altogether (shape (100,))
N_scattered = 10
rng = np.random.default_rng()
x_scattered, y_scattered = rng.random((2, N_scattered**2))*2 - 1
z_scattered_smooth = fun_smooth(x_scattered, y_scattered)
z_scattered_evil = fun_evil(x_scattered, y_scattered)

# dense output mesh, 20x21 in shape
N_dense = 20
x_dense, y_dense = gimme_mesh(N_dense)

Smooth function and upsampling

Let"s start with the easiest task. Here"s how an upsampling from a mesh of shape [6, 7] to one of [20, 21] works out for the smooth test function:

fig2: smooth upsampling

Even though this is a simple task, there are already subtle differences between the outputs. At a first glance all three outputs are reasonable. There are two features to note, based on our prior knowledge of the underlying function: the middle case of griddata distorts the data most. Note the y == -1 boundary of the plot (nearest the x label): the function should be strictly zero (since y == -1 is a nodal line for the smooth function), yet this is not the case for griddata. Also note the x == -1 boundary of the plots (behind, to the left): the underlying function has a local maximum (implying zero gradient near the boundary) at [-1, -0.5], yet the griddata output shows clearly non-zero gradient in this region. The effect is subtle, but it"s a bias none the less.

Evil function and upsampling

A bit harder task is to perform upsampling on our evil function:

fig3: evil upsampling

Clear differences are starting to show among the three methods. Looking at the surface plots, there are clear spurious extrema appearing in the output from interp2d (note the two humps on the right side of the plotted surface). While griddata and RBFInterpolator seem to produce similar results at first glance, producing local minima near [0.4, -0.4] that is absent from the underlying function.

However, there is one crucial aspect in which RBFInterpolator is far superior: it respects the symmetry of the underlying function (which is of course also made possible by the symmetry of the sample mesh). The output from griddata breaks the symmetry of the sample points, which is already weakly visible in the smooth case.

Smooth function and scattered data

Most often one wants to perform interpolation on scattered data. For this reason I expect these tests to be more important. As shown above, the sample points were chosen pseudo-uniformly in the domain of interest. In realistic scenarios you might have additional noise with each measurement, and you should consider whether it makes sense to interpolate your raw data to begin with.

Output for the smooth function:

fig4: smooth scattered interpolation

Now there"s already a bit of a horror show going on. I clipped the output from interp2d to between [-1, 1] exclusively for plotting, in order to preserve at least a minimal amount of information. It"s clear that while some of the underlying shape is present, there are huge noisy regions where the method completely breaks down. The second case of griddata reproduces the shape fairly nicely, but note the white regions at the border of the contour plot. This is due to the fact that griddata only works inside the convex hull of the input data points (in other words, it doesn"t perform any extrapolation). I kept the default NaN value for output points lying outside the convex hull.2 Considering these features, RBFInterpolator seems to perform best.

Evil function and scattered data

And the moment we"ve all been waiting for:

fig5: evil scattered interpolation

It"s no huge surprise that interp2d gives up. In fact, during the call to interp2d you should expect some friendly RuntimeWarnings complaining about the impossibility of the spline to be constructed. As for the other two methods, RBFInterpolator seems to produce the best output, even near the borders of the domain where the result is extrapolated.

So let me say a few words about the three methods, in decreasing order of preference (so that the worst is the least likely to be read by anybody).


The RBF in the name of the RBFInterpolator class stands for "radial basis functions". To be honest I"ve never considered this approach until I started researching for this post, but I"m pretty sure I"ll be using these in the future.

Just like the spline-based methods (see later), usage comes in two steps: first one creates a callable RBFInterpolator class instance based on the input data, and then calls this object for a given output mesh to obtain the interpolated result. Example from the smooth upsampling test:

import scipy.interpolate as interp

sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1)  # shape (N, 2) in 2d
dense_points = np.stack([x_dense.ravel(), y_dense.ravel()], -1)  # shape (N, 2) in 2d

zfun_smooth_rbf = interp.RBFInterpolator(sparse_points, z_sparse_smooth.ravel(),
                                         smoothing=0, kernel="cubic")  # explicit default smoothing=0 for interpolation
z_dense_smooth_rbf = zfun_smooth_rbf(dense_points).reshape(x_dense.shape)  # not really a function, but a callable class instance

zfun_evil_rbf = interp.RBFInterpolator(sparse_points, z_sparse_evil.ravel(),
                                       smoothing=0, kernel="cubic")  # explicit default smoothing=0 for interpolation
z_dense_evil_rbf = zfun_evil_rbf(dense_points).reshape(x_dense.shape)  # not really a function, but a callable class instance

Note that we had to do some array building gymnastics to make the API of RBFInterpolator happy. Since we have to pass the 2d points as arrays of shape (N, 2), we have to flatten the input grid and stack the two flattened arrays. The constructed interpolator also expects query points in this format, and the result will be a 1d array of shape (N,) which we have to reshape back to match our 2d grid for plotting. Since RBFInterpolator makes no assumptions about the number of dimensions of the input points, it supports arbitrary dimensions for interpolation.

So, scipy.interpolate.RBFInterpolator

  • produces well-behaved output even for crazy input data
  • supports interpolation in higher dimensions
  • extrapolates outside the convex hull of the input points (of course extrapolation is always a gamble, and you should generally not rely on it at all)
  • creates an interpolator as a first step, so evaluating it in various output points is less additional effort
  • can have output point arrays of arbitrary shape (as opposed to being constrained to rectangular meshes, see later)
  • more likely to preserving the symmetry of the input data
  • supports multiple kinds of radial functions for keyword kernel: multiquadric, inverse_multiquadric, inverse_quadratic, gaussian, linear, cubic, quintic, thin_plate_spline (the default). As of SciPy 1.7.0 the class doesn"t allow passing a custom callable due to technical reasons, but this is likely to be added in a future version.
  • can give inexact interpolations by increasing the smoothing parameter

One drawback of RBF interpolation is that interpolating N data points involves inverting an N x N matrix. This quadratic complexity very quickly blows up memory need for a large number of data points. However, the new RBFInterpolator class also supports a neighbors keyword parameter that restricts computation of each radial basis function to k nearest neighbours, thereby reducing memory need.


My former favourite, griddata, is a general workhorse for interpolation in arbitrary dimensions. It doesn"t perform extrapolation beyond setting a single preset value for points outside the convex hull of the nodal points, but since extrapolation is a very fickle and dangerous thing, this is not necessarily a con. Usage example:

sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1)  # shape (N, 2) in 2d
z_dense_smooth_griddata = interp.griddata(sparse_points, z_sparse_smooth.ravel(),
                                          (x_dense, y_dense), method="cubic")  # default method is linear

Note that the same array transformations were necessary for the input arrays as for RBFInterpolator. The input points have to be specified in an array of shape [N, D] in D dimensions, or alternatively as a tuple of 1d arrays:

z_dense_smooth_griddata = interp.griddata((x_sparse.ravel(), y_sparse.ravel()),
                                          z_sparse_smooth.ravel(), (x_dense, y_dense), method="cubic")

The output point arrays can be specified as a tuple of arrays of arbitrary dimensions (as in both above snippets), which gives us some more flexibility.

In a nutshell, scipy.interpolate.griddata

  • produces well-behaved output even for crazy input data
  • supports interpolation in higher dimensions
  • does not perform extrapolation, a single value can be set for the output outside the convex hull of the input points (see fill_value)
  • computes the interpolated values in a single call, so probing multiple sets of output points starts from scratch
  • can have output points of arbitrary shape
  • supports nearest-neighbour and linear interpolation in arbitrary dimensions, cubic in 1d and 2d. Nearest-neighbour and linear interpolation use NearestNDInterpolator and LinearNDInterpolator under the hood, respectively. 1d cubic interpolation uses a spline, 2d cubic interpolation uses CloughTocher2DInterpolator to construct a continuously differentiable piecewise-cubic interpolator.
  • might violate the symmetry of the input data


The only reason I"m discussing interp2d and its relatives is that it has a deceptive name, and people are likely to try using it. Spoiler alert: don"t use it (as of scipy version 1.7.0). It"s already more special than the previous subjects in that it"s specifically used for two-dimensional interpolation, but I suspect this is by far the most common case for multivariate interpolation.

As far as syntax goes, interp2d is similar to RBFInterpolator in that it first needs constructing an interpolation instance, which can be called to provide the actual interpolated values. There"s a catch, however: the output points have to be located on a rectangular mesh, so inputs going into the call to the interpolator have to be 1d vectors which span the output grid, as if from numpy.meshgrid:

# reminder: x_sparse and y_sparse are of shape [6, 7] from numpy.meshgrid
zfun_smooth_interp2d = interp.interp2d(x_sparse, y_sparse, z_sparse_smooth, kind="cubic")   # default kind is "linear"
# reminder: x_dense and y_dense are of shape (20, 21) from numpy.meshgrid
xvec = x_dense[0,:] # 1d array of unique x values, 20 elements
yvec = y_dense[:,0] # 1d array of unique y values, 21 elements
z_dense_smooth_interp2d = zfun_smooth_interp2d(xvec, yvec)   # output is (20, 21)-shaped array

One of the most common mistakes when using interp2d is putting your full 2d meshes into the interpolation call, which leads to explosive memory consumption, and hopefully to a hasty MemoryError.

Now, the greatest problem with interp2d is that it often doesn"t work. In order to understand this, we have to look under the hood. It turns out that interp2d is a wrapper for the lower-level functions bisplrep + bisplev, which are in turn wrappers for FITPACK routines (written in Fortran). The equivalent call to the previous example would be

kind = "cubic"
if kind == "linear":
    kx = ky = 1
elif kind == "cubic":
    kx = ky = 3
elif kind == "quintic":
    kx = ky = 5
# bisplrep constructs a spline representation, bisplev evaluates the spline at given points
bisp_smooth = interp.bisplrep(x_sparse.ravel(), y_sparse.ravel(),
                              z_sparse_smooth.ravel(), kx=kx, ky=ky, s=0)
z_dense_smooth_bisplrep = interp.bisplev(xvec, yvec, bisp_smooth).T  # note the transpose

Now, here"s the thing about interp2d: (in scipy version 1.7.0) there is a nice comment in interpolate/ for interp2d:

if not rectangular_grid:
    # TODO: surfit is really not meant for interpolation!
    self.tck = fitpack.bisplrep(x, y, z, kx=kx, ky=ky, s=0.0)

and indeed in interpolate/, in bisplrep there"s some setup and ultimately

tx, ty, c, o = _fitpack._surfit(x, y, z, w, xb, xe, yb, ye, kx, ky,
                                task, s, eps, tx, ty, nxest, nyest,
                                wrk, lwrk1, lwrk2)                 

And that"s it. The routines underlying interp2d are not really meant to perform interpolation. They might suffice for sufficiently well-behaved data, but under realistic circumstances you will probably want to use something else.

Just to conclude, interpolate.interp2d

  • can lead to artifacts even with well-tempered data
  • is specifically for bivariate problems (although there"s the limited interpn for input points defined on a grid)
  • performs extrapolation
  • creates an interpolator as a first step, so evaluating it in various output points is less additional effort
  • can only produce output over a rectangular grid, for scattered output you would have to call the interpolator in a loop
  • supports linear, cubic and quintic interpolation
  • might violate the symmetry of the input data

1I"m fairly certain that the cubic and linear kind of basis functions of RBFInterpolator do not exactly correspond to the other interpolators of the same name.
2These NaNs are also the reason for why the surface plot seems so odd: matplotlib historically has difficulties with plotting complex 3d objects with proper depth information. The NaN values in the data confuse the renderer, so parts of the surface that should be in the back are plotted to be in the front. This is an issue with visualization, and not interpolation.

Answer #5

To save some folks some time, here is a list I extracted from a small corpus. I do not know if it is complete, but it should have most (if not all) of the help definitions from upenn_tagset...

CC: conjunction, coordinating

& "n and both but either et for less minus neither nor or plus so
therefore times v. versus vs. whether yet

CD: numeral, cardinal

mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one forty-
seven 1987 twenty "79 zero two 78-degrees eighty-four IX "60s .025
fifteen 271,124 dozen quintillion DM2,000 ...

DT: determiner

all an another any both del each either every half la many much nary
neither no some such that the them these this those

EX: existential there


IN: preposition or conjunction, subordinating

astride among upon whether out inside pro despite on by throughout
below within for towards near behind atop around if like until below
next into if beside ...

JJ: adjective or numeral, ordinal

third ill-mannered pre-war regrettable oiled calamitous first separable
ectoplasmic battery-powered participatory fourth still-to-be-named
multilingual multi-disciplinary ...

JJR: adjective, comparative

bleaker braver breezier briefer brighter brisker broader bumper busier
calmer cheaper choosier cleaner clearer closer colder commoner costlier
cozier creamier crunchier cuter ...

JJS: adjective, superlative

calmest cheapest choicest classiest cleanest clearest closest commonest
corniest costliest crassest creepiest crudest cutest darkest deadliest
dearest deepest densest dinkiest ...

LS: list item marker

A A. B B. C C. D E F First G H I J K One SP-44001 SP-44002 SP-44005
SP-44007 Second Third Three Two * a b c d first five four one six three

MD: modal auxiliary

can cannot could couldn"t dare may might must need ought shall should
shouldn"t will would

NN: noun, common, singular or mass

common-carrier cabbage knuckle-duster Casino afghan shed thermostat
investment slide humour falloff slick wind hyena override subhumanity
machinist ...

NNP: noun, proper, singular

Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos
Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA
Shannon A.K.C. Meltex Liverpool ...

NNS: noun, common, plural

undergraduates scotches bric-a-brac products bodyguards facets coasts
divestitures storehouses designs clubs fragrances averages
subjectivists apprehensions muses factory-jobs ...

PDT: pre-determiner

all both half many quite such sure this

POS: genitive marker

" "s

PRP: pronoun, personal

hers herself him himself hisself it itself me myself one oneself ours
ourselves ownself self she thee theirs them themselves they thou thy us

PRP$: pronoun, possessive

her his mine my our ours their thy your

RB: adverb

occasionally unabatingly maddeningly adventurously professedly
stirringly prominently technologically magisterially predominately
swiftly fiscally pitilessly ...

RBR: adverb, comparative

further gloomier grander graver greater grimmer harder harsher
healthier heavier higher however larger later leaner lengthier less-
perfectly lesser lonelier longer louder lower more ...

RBS: adverb, superlative

best biggest bluntest earliest farthest first furthest hardest
heartiest highest largest least less most nearest second tightest worst

RP: particle

aboard about across along apart around aside at away back before behind
by crop down ever fast for forth from go high i.e. in into just later
low more off on open out over per pie raising start teeth that through
under unto up up-pp upon whole with you

TO: "to" as preposition or infinitive marker


UH: interjection

Goodbye Goody Gosh Wow Jeepers Jee-sus Hubba Hey Kee-reist Oops amen
huh howdy uh dammit whammo shucks heck anyways whodunnit honey golly
man baby diddle hush sonuvabitch ...

VB: verb, base form

ask assemble assess assign assume atone attention avoid bake balkanize
bank begin behold believe bend benefit bevel beware bless boil bomb
boost brace break bring broil brush build ...

VBD: verb, past tense

dipped pleaded swiped regummed soaked tidied convened halted registered
cushioned exacted snubbed strode aimed adopted belied figgered
speculated wore appreciated contemplated ...

VBG: verb, present participle or gerund

telegraphing stirring focusing angering judging stalling lactating
hankerin" alleging veering capping approaching traveling besieging
encrypting interrupting erasing wincing ...

VBN: verb, past participle

multihulled dilapidated aerosolized chaired languished panelized used
experimented flourished imitated reunifed factored condensed sheared
unsettled primed dubbed desired ...

VBP: verb, present tense, not 3rd person singular

predominate wrap resort sue twist spill cure lengthen brush terminate
appear tend stray glisten obtain comprise detest tease attract
emphasize mold postpone sever return wag ...

VBZ: verb, present tense, 3rd person singular

bases reconstructs marks mixes displeases seals carps weaves snatches
slumps stretches authorizes smolders pictures emerges stockpiles
seduces fizzes uses bolsters slaps speaks pleads ...

WDT: WH-determiner

that what whatever which whichever

WP: WH-pronoun

that what whatever whatsoever which who whom whosoever

WRB: Wh-adverb

how however whence whenever where whereby whereever wherein whereof why

Answer #6

The answer below pertains primarily to Signed Cookies, an implementation of the concept of sessions (as used in web applications). Flask offers both, normal (unsigned) cookies (via request.cookies and response.set_cookie()) and signed cookies (via flask.session). The answer has two parts, the first describes how a Signed Cookie is generated, and the second is presented in the form of a QA that addresses different aspects of the scheme. The syntax used for the examples is Python3, but the concepts apply also to previous versions.

What is SECRET_KEY (or how to create a Signed Cookie)?

Signing cookies is a preventive measure against cookie tampering. During the process of signing a cookie, the SECRET_KEY is used in a way similar to how a "salt" would be used to muddle a password before hashing it. Here"s a (wildly) simplified description of the concept. The code in the examples is meant to be illustrative. Many of the steps have been omitted and not all of the functions actually exist. The goal here is to provide an understanding of the general idea, actual implementations will be a bit more involved. Also, keep in mind that Flask does most of this for you in the background. So, besides setting values to your cookie (via the session API) and providing a SECRET_KEY, it"s not only ill-advised to reimplement this yourself, but there"s no need to do so:

A poor man"s cookie signature

Before sending a Response to the browser:

( 1 ) First a SECRET_KEY is established. It should only be known to the application and should be kept relatively constant during the application"s life cycle, including through application restarts.

# choose a salt, a secret string of bytes
>>> SECRET_KEY = "my super secret key".encode("utf8")

( 2 ) create a cookie

>>> cookie = make_cookie(
...     name="_profile", 
...     content="uid=382|membership=regular",
...     ...
...     expires="July 1 2030..."
... )

>>> print(cookie)
name: _profile
content: uid=382|membership=regular...
expires: July 1 2030, 1:20:40 AM UTC

( 3 ) to create a signature, append (or prepend) the SECRET_KEY to the cookie byte string, then generate a hash from that combination.

# encode and salt the cookie, then hash the result
>>> cookie_bytes = str(cookie).encode("utf8")
>>> signature = sha1(cookie_bytes+SECRET_KEY).hexdigest()
>>> print(signature)

( 4 ) Now affix the signature at one end of the content field of the original cookie.

# include signature as part of the cookie
>>> cookie.content = cookie.content + "|" + signature
>>> print(cookie)
name: _profile
content: uid=382|membership=regular|7ae0e9...  <--- signature
path: /
send for: Encrypted connections only
expires: July 1 2030, 1:20:40 AM UTC

and that"s what"s sent to the client.

# add cookie to response
>>> response.set_cookie(cookie)
# send to browser --> 

Upon receiving the cookie from the browser:

( 5 ) When the browser returns this cookie back to the server, strip the signature from the cookie"s content field to get back the original cookie.

# Upon receiving the cookie from browser
>>> cookie = request.get_cookie()
# pop the signature out of the cookie
>>> (cookie.content, popped_signature) = cookie.content.rsplit("|", 1)

( 6 ) Use the original cookie with the application"s SECRET_KEY to recalculate the signature using the same method as in step 3.

# recalculate signature using SECRET_KEY and original cookie
>>> cookie_bytes = str(cookie).encode("utf8")
>>> calculated_signature = sha1(cookie_bytes+SECRET_KEY).hexdigest()

( 7 ) Compare the calculated result with the signature previously popped out of the just received cookie. If they match, we know that the cookie has not been messed with. But if even just a space has been added to the cookie, the signatures won"t match.

# if both signatures match, your cookie has not been modified
>>> good_cookie = popped_signature==calculated_signature

( 8 ) If they don"t match then you may respond with any number of actions, log the event, discard the cookie, issue a fresh one, redirect to a login page, etc.

>>> if not good_cookie:
...     security_log(cookie)

Hash-based Message Authentication Code (HMAC)

The type of signature generated above that requires a secret key to ensure the integrity of some contents is called in cryptography a Message Authentication Code or MAC.

I specified earlier that the example above is an oversimplification of that concept and that it wasn"t a good idea to implement your own signing. That"s because the algorithm used to sign cookies in Flask is called HMAC and is a bit more involved than the above simple step-by-step. The general idea is the same, but due to reasons beyond the scope of this discussion, the series of computations are a tad bit more complex. If you"re still interested in crafting a DIY, as it"s usually the case, Python has some modules to help you get started :) here"s a starting block:

import hmac
import hashlib

def create_signature(secret_key, msg, digestmod=None):
    if digestmod is None:
        digestmod = hashlib.sha1
    mac =, msg=msg, digestmod=digestmod)
    return mac.digest()

The documentaton for hmac and hashlib.

The "Demystification" of SECRET_KEY :)

What"s a "signature" in this context?

It"s a method to ensure that some content has not been modified by anyone other than a person or an entity authorized to do so.

One of the simplest forms of signature is the "checksum", which simply verifies that two pieces of data are the same. For example, when installing software from source it"s important to first confirm that your copy of the source code is identical to the author"s. A common approach to do this is to run the source through a cryptographic hash function and compare the output with the checksum published on the project"s home page.

Let"s say for instance that you"re about to download a project"s source in a gzipped file from a web mirror. The SHA1 checksum published on the project"s web page is "eb84e8da7ca23e9f83...."

# so you get the code from the mirror
# you calculate the hash as instructed
> eb84e8da7c....

Both hashes are the same, you know that you have an identical copy.

What"s a cookie?

An extensive discussion on cookies would go beyond the scope of this question. I provide an overview here since a minimal understanding can be useful to have a better understanding of how and why SECRET_KEY is useful. I highly encourage you to follow up with some personal readings on HTTP Cookies.

A common practice in web applications is to use the client (web browser) as a lightweight cache. Cookies are one implementation of this practice. A cookie is typically some data added by the server to an HTTP response by way of its headers. It"s kept by the browser which subsequently sends it back to the server when issuing requests, also by way of HTTP headers. The data contained in a cookie can be used to emulate what"s called statefulness, the illusion that the server is maintaining an ongoing connection with the client. Only, in this case, instead of a wire to keep the connection "alive", you simply have snapshots of the state of the application after it has handled a client"s request. These snapshots are carried back and forth between client and server. Upon receiving a request, the server first reads the content of the cookie to reestablish the context of its conversation with the client. It then handles the request within that context and before returning the response to the client, updates the cookie. The illusion of an ongoing session is thus maintained.

What does a cookie look like?

A typical cookie would look like this:

name: _profile
content: uid=382|status=genie
path: /
send for: Encrypted connections only
expires: July 1 2030, 1:20:40 AM UTC

Cookies are trivial to peruse from any modern browser. On Firefox for example go to Preferences > Privacy > History > remove individual cookies.

The content field is the most relevant to the application. Other fields carry mostly meta instructions to specify various scopes of influence.

Why use cookies at all?

The short answer is performance. Using cookies, minimizes the need to look things up in various data stores (memory caches, files, databases, etc), thus speeding things up on the server application"s side. Keep in mind that the bigger the cookie the heavier the payload over the network, so what you save in database lookup on the server you might lose over the network. Consider carefully what to include in your cookies.

Why would cookies need to be signed?

Cookies are used to keep all sorts of information, some of which can be very sensitive. They"re also by nature not safe and require that a number of auxiliary precautions be taken to be considered secure in any way for both parties, client and server. Signing cookies specifically addresses the problem that they can be tinkered with in attempts to fool server applications. There are other measures to mitigate other types of vulnerabilities, I encourage you to read up more on cookies.

How can a cookie be tampered with?

Cookies reside on the client in text form and can be edited with no effort. A cookie received by your server application could have been modified for a number of reasons, some of which may not be innocent. Imagine a web application that keeps permission information about its users on cookies and grants privileges based on that information. If the cookie is not tinker-proof, anyone could modify theirs to elevate their status from "role=visitor" to "role=admin" and the application would be none the wiser.

Why is a SECRET_KEY necessary to sign cookies?

Verifying cookies is a tad bit different than verifying source code the way it"s described earlier. In the case of the source code, the original author is the trustee and owner of the reference fingerprint (the checksum), which will be kept public. What you don"t trust is the source code, but you trust the public signature. So to verify your copy of the source you simply want your calculated hash to match the public hash.

In the case of a cookie however the application doesn"t keep track of the signature, it keeps track of its SECRET_KEY. The SECRET_KEY is the reference fingerprint. Cookies travel with a signature that they claim to be legit. Legitimacy here means that the signature was issued by the owner of the cookie, that is the application, and in this case, it"s that claim that you don"t trust and you need to check the signature for validity. To do that you need to include an element in the signature that is only known to you, that"s the SECRET_KEY. Someone may change a cookie, but since they don"t have the secret ingredient to properly calculate a valid signature they cannot spoof it. As stated a bit earlier this type of fingerprinting, where on top of the checksum one also provides a secret key, is called a Message Authentication Code.

What about Sessions?

Sessions in their classical implementation are cookies that carry only an ID in the content field, the session_id. The purpose of sessions is exactly the same as signed cookies, i.e. to prevent cookie tampering. Classical sessions have a different approach though. Upon receiving a session cookie the server uses the ID to look up the session data in its own local storage, which could be a database, a file, or sometimes a cache in memory. The session cookie is typically set to expire when the browser is closed. Because of the local storage lookup step, this implementation of sessions typically incurs a performance hit. Signed cookies are becoming a preferred alternative and that"s how Flask"s sessions are implemented. In other words, Flask sessions are signed cookies, and to use signed cookies in Flask just use its Session API.

Why not also encrypt the cookies?

Sometimes the contents of cookies can be encrypted before also being signed. This is done if they"re deemed too sensitive to be visible from the browser (encryption hides the contents). Simply signing cookies however, addresses a different need, one where there"s a desire to maintain a degree of visibility and usability to cookies on the browser, while preventing that they"d be meddled with.

What happens if I change the SECRET_KEY?

By changing the SECRET_KEY you"re invalidating all cookies signed with the previous key. When the application receives a request with a cookie that was signed with a previous SECRET_KEY, it will try to calculate the signature with the new SECRET_KEY, and both signatures won"t match, this cookie and all its data will be rejected, it will be as if the browser is connecting to the server for the first time. Users will be logged out and their old cookie will be forgotten, along with anything stored inside. Note that this is different from the way an expired cookie is handled. An expired cookie may have its lease extended if its signature checks out. An invalid signature just implies a plain invalid cookie.

So unless you want to invalidate all signed cookies, try to keep the SECRET_KEY the same for extended periods.

What"s a good SECRET_KEY?

A secret key should be hard to guess. The documentation on Sessions has a good recipe for random key generation:

>>> import os
>>> os.urandom(24)

You copy the key and paste it in your configuration file as the value of SECRET_KEY.

Short of using a key that was randomly generated, you could use a complex assortment of words, numbers, and symbols, perhaps arranged in a sentence known only to you, encoded in byte form.

Do not set the SECRET_KEY directly with a function that generates a different key each time it"s called. For example, don"t do this:

# this is not good
SECRET_KEY = random_key_generator()

Each time your application is restarted it will be given a new key, thus invalidating the previous.

Instead, open an interactive python shell and call the function to generate the key, then copy and paste it to the config.

Answer #7

Actually the purpose of np.meshgrid is already mentioned in the documentation:


Return coordinate matrices from coordinate vectors.

Make N-D coordinate arrays for vectorized evaluations of N-D scalar/vector fields over N-D grids, given one-dimensional coordinate arrays x1, x2,..., xn.

So it"s primary purpose is to create a coordinates matrices.

You probably just asked yourself:

Why do we need to create coordinate matrices?

The reason you need coordinate matrices with Python/NumPy is that there is no direct relation from coordinates to values, except when your coordinates start with zero and are purely positive integers. Then you can just use the indices of an array as the index. However when that"s not the case you somehow need to store coordinates alongside your data. That"s where grids come in.

Suppose your data is:

1  2  1
2  5  2
1  2  1

However, each value represents a 3 x 2 kilometer area (horizontal x vertical). Suppose your origin is the upper left corner and you want arrays that represent the distance you could use:

import numpy as np
h, v = np.meshgrid(np.arange(3)*3, np.arange(3)*2)

where v is:

array([[0, 0, 0],
       [2, 2, 2],
       [4, 4, 4]])

and h:

array([[0, 3, 6],
       [0, 3, 6],
       [0, 3, 6]])

So if you have two indices, let"s say x and y (that"s why the return value of meshgrid is usually xx or xs instead of x in this case I chose h for horizontally!) then you can get the x coordinate of the point, the y coordinate of the point and the value at that point by using:

h[x, y]    # horizontal coordinate
v[x, y]    # vertical coordinate
data[x, y]  # value

That makes it much easier to keep track of coordinates and (even more importantly) you can pass them to functions that need to know the coordinates.

A slightly longer explanation

However, np.meshgrid itself isn"t often used directly, mostly one just uses one of similar objects np.mgrid or np.ogrid. Here np.mgrid represents the sparse=False and np.ogrid the sparse=True case (I refer to the sparse argument of np.meshgrid). Note that there is a significant difference between np.meshgrid and np.ogrid and np.mgrid: The first two returned values (if there are two or more) are reversed. Often this doesn"t matter but you should give meaningful variable names depending on the context.

For example, in case of a 2D grid and matplotlib.pyplot.imshow it makes sense to name the first returned item of np.meshgrid x and the second one y while it"s the other way around for np.mgrid and np.ogrid.

np.ogrid and sparse grids

>>> import numpy as np
>>> yy, xx = np.ogrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])

As already said the output is reversed when compared to np.meshgrid, that"s why I unpacked it as yy, xx instead of xx, yy:

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6), sparse=True)
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])

This already looks like coordinates, specifically the x and y lines for 2D plots.


yy, xx = np.ogrid[-5:6, -5:6]
plt.title("ogrid (sparse meshgrid)")
plt.scatter(xx, np.zeros_like(xx), color="blue", marker="*")
plt.scatter(np.zeros_like(yy), yy, color="red", marker="x")

enter image description here

np.mgrid and dense/fleshed out grids

>>> yy, xx = np.mgrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])

The same applies here: The output is reversed compared to np.meshgrid:

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6))
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])

Unlike ogrid these arrays contain all xx and yy coordinates in the -5 <= xx <= 5; -5 <= yy <= 5 grid.

yy, xx = np.mgrid[-5:6, -5:6]
plt.title("mgrid (dense meshgrid)")
plt.yticks(yy[:, 0])
plt.scatter(xx, yy, color="red", marker="x")

enter image description here


It"s not only limited to 2D, these functions work for arbitrary dimensions (well, there is a maximum number of arguments given to function in Python and a maximum number of dimensions that NumPy allows):

>>> x1, x2, x3, x4 = np.ogrid[:3, 1:4, 2:5, 3:6]
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print("x{}".format(i+1))
...     print(repr(x))




array([[[[3, 4, 5]]]])

>>> # equivalent meshgrid output, note how the first two arguments are reversed and the unpacking
>>> x2, x1, x3, x4 = np.meshgrid(np.arange(1,4), np.arange(3), np.arange(2, 5), np.arange(3, 6), sparse=True)
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print("x{}".format(i+1))
...     print(repr(x))
# Identical output so it"s omitted here.

Even if these also work for 1D there are two (much more common) 1D grid creation functions:

Besides the start and stop argument it also supports the step argument (even complex steps that represent the number of steps):

>>> x1, x2 = np.mgrid[1:10:2, 1:10:4j]
>>> x1  # The dimension with the explicit step width of 2
array([[1., 1., 1., 1.],
       [3., 3., 3., 3.],
       [5., 5., 5., 5.],
       [7., 7., 7., 7.],
       [9., 9., 9., 9.]])
>>> x2  # The dimension with the "number of steps"
array([[ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.]])


You specifically asked about the purpose and in fact, these grids are extremely useful if you need a coordinate system.

For example if you have a NumPy function that calculates the distance in two dimensions:

def distance_2d(x_point, y_point, x, y):
    return np.hypot(x-x_point, y-y_point)

And you want to know the distance of each point:

>>> ys, xs = np.ogrid[-5:5, -5:5]
>>> distances = distance_2d(1, 2, xs, ys)  # distance to point (1, 2)
>>> distances
array([[9.21954446, 8.60232527, 8.06225775, 7.61577311, 7.28010989,
        7.07106781, 7.        , 7.07106781, 7.28010989, 7.61577311],
       [8.48528137, 7.81024968, 7.21110255, 6.70820393, 6.32455532,
        6.08276253, 6.        , 6.08276253, 6.32455532, 6.70820393],
       [7.81024968, 7.07106781, 6.40312424, 5.83095189, 5.38516481,
        5.09901951, 5.        , 5.09901951, 5.38516481, 5.83095189],
       [7.21110255, 6.40312424, 5.65685425, 5.        , 4.47213595,
        4.12310563, 4.        , 4.12310563, 4.47213595, 5.        ],
       [6.70820393, 5.83095189, 5.        , 4.24264069, 3.60555128,
        3.16227766, 3.        , 3.16227766, 3.60555128, 4.24264069],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.        , 5.        , 4.        , 3.        , 2.        ,
        1.        , 0.        , 1.        , 2.        , 3.        ],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128]])

The output would be identical if one passed in a dense grid instead of an open grid. NumPys broadcasting makes it possible!

Let"s visualize the result:

plt.title("distance to point (1, 2)")
plt.imshow(distances, origin="lower", interpolation="none")
plt.xticks(np.arange(xs.shape[1]), xs.ravel())  # need to set the ticks manually
plt.yticks(np.arange(ys.shape[0]), ys.ravel())

enter image description here

And this is also when NumPys mgrid and ogrid become very convenient because it allows you to easily change the resolution of your grids:

ys, xs = np.ogrid[-5:5:200j, -5:5:200j]
# otherwise same code as above

enter image description here

However, since imshow doesn"t support x and y inputs one has to change the ticks by hand. It would be really convenient if it would accept the x and y coordinates, right?

It"s easy to write functions with NumPy that deal naturally with grids. Furthermore, there are several functions in NumPy, SciPy, matplotlib that expect you to pass in the grid.

I like images so let"s explore matplotlib.pyplot.contour:

ys, xs = np.mgrid[-5:5:200j, -5:5:200j]
density = np.sin(ys)-np.cos(xs)
plt.contour(xs, ys, density)

enter image description here

Note how the coordinates are already correctly set! That wouldn"t be the case if you just passed in the density.

Or to give another fun example using astropy models (this time I don"t care much about the coordinates, I just use them to create some grid):

from astropy.modeling import models
z = np.zeros((100, 100))
y, x = np.mgrid[0:100, 0:100]
for _ in range(10):
    g2d = models.Gaussian2D(amplitude=100, 
                           x_mean=np.random.randint(0, 100), 
                           y_mean=np.random.randint(0, 100), 
    z += g2d(x, y)
    a2d = models.AiryDisk2D(amplitude=70, 
                            x_0=np.random.randint(0, 100), 
                            y_0=np.random.randint(0, 100), 
    z += a2d(x, y)

enter image description here

Although that"s just "for the looks" several functions related to functional models and fitting (for example scipy.interpolate.interp2d, scipy.interpolate.griddata even show examples using np.mgrid) in Scipy, etc. require grids. Most of these work with open grids and dense grids, however some only work with one of them.

Answer #8

Pandas >= 0.25

Series and DataFrame methods define a .explode() method that explodes lists into separate rows. See the docs section on Exploding a list-like column.

Since you have a list of comma separated strings, split the string on comma to get a list of elements, then call explode on that column.

df = pd.DataFrame({"var1": ["a,b,c", "d,e,f"], "var2": [1, 2]})
    var1  var2
0  a,b,c     1
1  d,e,f     2


  var1  var2
0    a     1
0    b     1
0    c     1
1    d     2
1    e     2
1    f     2

Note that explode only works on a single column (for now). To explode multiple columns at once, see below.

NaNs and empty lists get the treatment they deserve without you having to jump through hoops to get it right.

df = pd.DataFrame({"var1": ["d,e,f", "", np.nan], "var2": [1, 2, 3]})
    var1  var2
0  d,e,f     1
1            2
2    NaN     3


0    [d, e, f]
1           []
2          NaN


  var1  var2
0    d     1
0    e     1
0    f     1
1          2  # empty list entry becomes empty string after exploding 
2  NaN     3  # NaN left un-touched

This is a serious advantage over ravel/repeat -based solutions (which ignore empty lists completely, and choke on NaNs).

Exploding Multiple Columns

Note that explode only works on a single column at a time, but you can use apply to explode multiple column at once:

df = pd.DataFrame({"var1": ["a,b,c", "d,e,f"], 
                   "var2": ["i,j,k", "l,m,n"], 
                   "var3": [1, 2]})
    var1   var2  var3
0  a,b,c  i,j,k     1
1  d,e,f  l,m,n     2

   .apply(lambda col: col.str.split(",").explode())
   .reindex(df.columns, axis=1))

  var1 var2  var3
0    a    i     1
1    b    j     1
2    c    k     1
3    d    l     2
4    e    m     2
5    f    n     2

The idea is to set as the index, all the columns that should NOT be exploded, then explode the remaining columns via apply. This works well when the lists are equally sized.

Answer #9

As explained here a key difference is that:

  • flatten is a method of an ndarray object and hence can only be called for true numpy arrays.

  • ravel is a library-level function and hence can be called on any object that can successfully be parsed.

For example ravel will work on a list of ndarrays, while flatten is not available for that type of object.

@IanH also points out important differences with memory handling in his answer.

Answer #10

I realize this is old but I figured I"d clear up a misconception for other travelers. Setting plt.pyplot.isinteractive() to False means that the plot will on be drawn on specific commands to draw (i.e. Setting plt.pyplot.isinteractive() to True means that every pyplot (plt) command will trigger a draw command (i.e. So what you were more than likely looking for is at the end of your program to display the graph.

As a side note you can shorten these statements a bit by using the following import command import matplotlib.pyplot as plt rather than matplotlib as plt.